FlyingNutcase Posted July 13, 2017 Hi folks, The age of AI seems to be fast approaching (it's my current YouTube distraction, lol). I'm curious how, specifically, machine learning could be applied to make the AI in IL2 more realistic/natural/"better". It's a wide topic with, I'm sure, wide applications within the game, and clearly constrained by CPU capabilities, but I'm just throwing it out there to pull in current thinking. Cheers.
Feathered_IV Posted July 13, 2017 I don't know if it would be very good. I imagine the AI would only become as smart as your average player on Wings of Liberty. It would quickly learn that the quickest path to victory is to grab an Me-109 and fly balls-out, rads-out towards the nearest enemy airfield and pick off the other players as they are taxiing. Once it discovered that tactic it would be unlikely to deviate from it.
unreasonable Posted July 13, 2017 That would probably apply to learning in MP, which just goes to show what a pitiful excuse for a simulation of warfare MP is. A more interesting case would be learning in SP, especially if success were measured as a cumulative score in a DiD context in a career mode or scripted campaign. Note the recent triumph (a draw actually, IIRC) of a computer in Go, which, unlike chess, does not have a fairly limited number of paths the PC can test by brute force. I expect that the AI would learn to maintain altitude, look around all the time, make attacks with speed and altitude, and then separate as fast as possible....
Bearfoot Posted July 13, 2017
Hi folks, The age of AI seems to be fast approaching (it's my current YouTube distraction, lol). I'm curious how, specifically, machine learning could be applied to make the AI in IL2 more realistic/natural/"better". It's a wide topic with, I'm sure, wide applications within the game, and clearly constrained by CPU capabilities, but I'm just throwing it out there to pull in current thinking. Cheers.
I am no expert in this field, but I have worked with some of these systems. There are many ways this could work, though most would require decomposing BFM/AFM into sets of little steps/rules/decisions in both the tactical-move and control domains. E.g., "given a target plane at such-and-such bearing, aspect, range, attitude, and speed, and you are at such-and-such heading, attitude, and speed, you should ... [set of possibilities]" for the former; and then, say the "move" decided on is to adjust heading, attitude, and speed in a particular way, the best way out of many options to achieve this while maintaining energy for the latter. Once you have this (massive) collection of decisions, then standard machine learning methods can be used to train the AI to optimize the decision tree -- nothing special there. So the real hurdle is the development of the sets of rules that express all the possible moves/responses for a given situation ... and that not only is a HUGE amount of work, it needs to be done by humans. (A rough sketch of what one corner of such a rule set might look like follows below.) As I said, though, I am no expert, and the field may be several universes more advanced than what I speculate here, so take this all with a grain of salt.
But, assuming that this gets done, one interesting side-effect will be the possibility of a range of AIs, each with a distinct and different character. That is, we might imagine individuals with varying degrees of skill and experience that are not easily quantifiable by, e.g., "veteran" or "ace", but rather sit on a nuanced spectrum ranging from "raw" to "ultra-optimized", reflecting the range and number of different opponents they have been trained on: human, AI, or otherwise. Furthermore, two equally experienced/optimized AIs might have "learned" very different approaches to success, so that they might even be recognizable by their distinct styles. I imagine the AIs will be aircraft-type-specific, as the learning is intimately interwoven with aircraft operation, but within a particular type of aircraft you could certainly clone a successful AI to occupy another individual aircraft of the same type ... though the moment the clone AI takes off and fights its first battle, its learning trajectory will fork from that of its original template. So individuality would not just be an interesting aspect of the AI system, but possibly a fundamental part of it too. This might really make MP a fascinating arena, as persistent AIs on both sides populate servers, and human players either hope to, or hope not to, meet known/famous/notorious aces, or alternatively try to grab the better AI wingmen.
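To make the situation-to-action idea a bit more concrete, here is a minimal sketch of what one tiny corner of such a rule collection might look like. Everything in it - the names, the thresholds, the candidate moves - is a made-up illustration, not anything from the actual game; a learned system would tune the weights (and much more) automatically.

```python
# Hypothetical sketch of a situation -> action rule table. All names,
# thresholds, and moves are invented for illustration.
from dataclasses import dataclass
import random

@dataclass
class Situation:
    bearing_deg: float       # target bearing relative to our nose
    range_m: float           # distance to the target
    alt_advantage_m: float   # our altitude minus the target's

# Each rule pairs a predicate over the situation with weighted candidate
# moves; training would adjust the weights toward what actually works.
RULES = [
    (lambda s: s.range_m > 2000 and s.alt_advantage_m > 500,
     {"dive_attack": 0.7, "extend": 0.3}),
    (lambda s: s.range_m < 500 and abs(s.bearing_deg) < 15,
     {"fire": 0.8, "barrel_roll_defence": 0.2}),
    (lambda s: True,  # fallback: always applicable
     {"maintain_energy": 1.0}),
]

def decide(situation: Situation) -> str:
    """Pick the first matching rule, then sample a move from its weights."""
    for predicate, moves in RULES:
        if predicate(situation):
            options, weights = zip(*moves.items())
            return random.choices(options, weights=weights)[0]
    return "maintain_energy"
```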
19//Moach Posted July 13, 2017 Any such self-optimizing system would almost invariably fall into the great pitfall of AI design, which is that it can easily become "too clever for its own good". Not that it would become unbeatable (though that is another problematic scenario that cannot be fully ruled out) - but much like online players, it would very rapidly start picking up on what in game design we call "dominant strategies". One rather controversial example, very close to home in this community, would be phenomena such as "magic flaps" and "109F spam", to name a few very obvious cases where the option that confers the largest advantage is also one which contradicts the premise (unhistorical) and/or reduces the enjoyment of the game (people don't like it when others do it) - however, you'd have to manually deny each such possibility (including unknown ones) to the learning system to prevent it abusing them - then reset it all over again. Any system designed to learn by itself would require some form of feedback loop to let it know which actions work and which don't -- by refining its choices based on this feedback, it would then converge on the behavior which provides it the largest returns... most often, it achieves this in completely unexpected, even bizarre ways - definitely not good for a historically accurate simulator. A more conventional approach offers better control to designers, and thus allows the AI to mimic correct historical behavior - such as pilots were trained to have, back in the day - and this can be much more finely tuned to offer a far more enjoyable experience. With a few cleverly placed variations within the system, this kind of AI will often surpass online human opponents in terms of behavior authenticity, and even the unpredictability of their actions. Fact is, such methods for optimizing performance, be it of man or machine, can be (and usually are) rather unnatural... this happens for a great many reasons, but generally it tends to result in behavior that is inauthentic and, ultimately, unfun. Edited July 13, 2017 by 19//Moach
JaffaCake Posted July 13, 2017 I work directly in this field, specifically on reinforcement learning for continuous control. There are different approaches that can be taken to solve this issue, as mentioned in the previous posts. One of the more interesting would be end-to-end training, where you do not need to hard-code anything at all. It is possible, and there are published examples of what I would call more complex tasks than flying. I have been toying with the idea of implementing an agent for IL2-style aircraft sims for a while, but the game doesn't lend itself well to the research field, mostly due to a complete lack of an API or any programmatic control or readout. I believe the AI field is now powerful enough to produce an end-to-end solution for the task of flying and dogfighting, but cooperation still remains a significant challenge. Just as AlphaGo has shaken up the Go field, where experts now look to the AI to find new strategies, I would be quite curious to find out what AI could do in the dogfighting field. I am happy to go into more detail if anyone is interested. @Moach The problems you describe can be easily avoided by rewarding the agent for the "correct" behaviours. You do not have to let it choose which aircraft to fly; you can "reward" formation flying, etc. There are many, many other non-purist approaches to "learning to fly" that still involve direct control (rather than meta-control), such as learning from demonstrations. These train faster and are more "reliable", but generally lack the magic of the AI coming up with something new. Edit: I suspect that there are a few quick gains the AI in this game could grab from using ML without implementing a full-blown ML agent. But I have no knowledge of the current AI implementation, so I cannot come up with specifics. Edited July 13, 2017 by JaffaCake
FlyingNutcase (Author) Posted July 13, 2017
I am happy to go into more detail if anyone is interested.
Oh yeah! Whatever you have the patience for. This was an interesting look at how machine learning took place with no specific training, in (of course) a relatively simple environment.
JaffaCake Posted July 13, 2017
Oh yeah! Whatever you have the patience for. This was an interesting look at how machine learning took place with no specific training, in (of course) a relatively simple environment.
Keep in mind that this video is a perfect example of over-fitting the network to the task. That is, it does not generalise, but memorises the actions required. The main attraction of neural networks is their ability to generalise to situations that were never encountered during the training phase, and there are different strategies used to avoid over-fitting.
The ML field is huge and quite old. The most recent explosion was mostly caused by "deep learning", which is basically a fast implementation of old neural networks that enables you to stack the neurons in multiple layers (which was extremely costly and barely tractable in the past). Neural networks themselves are just function approximators - they do not automagically solve the problem for you - instead they are able to map a high-dimensional, sparse data set onto a smaller set of variables. The act of function-fitting produces this "generalisation". But it is up to you to define what data to supply to the network, what you expect it to produce, how you wish to train it, etc.
A quick example of how an AI can learn to play an Atari game via reinforcement learning (DQN): We supply the neural network with a full image of the game (so that is 160x80x3 pixels, each with 256 values). Every frame of the game we are in a "state" that is represented by these pixels. Every frame we receive a reward of 0 if there was no change in score, or the delta-score if the score has changed. Every frame the agent has to decide which action to take; for example, in the "Breakout" game it can choose to do nothing, move left, or move right. So the agent needs to produce a value for each of the possible actions given the current frame (state). We can denote this as Q(s,a) (s = state/frame, a = action). At the beginning the agent knows nothing about the environment, so the values of Q(s,a) are mostly random. Imagine the agent took a certain action that it, at the time, thought was most valuable. Now you have a transition s,a -> s',r, where s' is the new frame and r is the reward the agent received in the transition. We can now update our estimate of the action-value for the state-action pair s,a by doing:
Q_new(s,a) = Q_old(s,a) + learning_rate * (reward + discount * max_a' Q(s',a') - Q_old(s,a))
Basically we "nudge" the estimate in the direction of the correct reward. Of course we also need to consider the "value" of the best action in the next state, because otherwise the Q value would be 0 for all states other than the ones right next to the reward.
The update above can be performed on tables in memory, but for games like Atari it would be impossible to allocate enough memory to represent every possible state, and the agent would also fail to generalise the problem if it was using tables. So instead of the tabular method we train a neural network that takes the pixels as input and maps them to 3 values representing the actions given the input state. The end result: an agent that learns to play the game from raw pixels alone.
You can read more on neural networks here: http://ufldl.stanford.edu/tutorial/ And if you want a selection of deep-learning applications check out this blog: https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471 Edited July 13, 2017 by JaffaCake
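For the curious, the update rule above translates almost line-for-line into code. Here is a minimal tabular Q-learning sketch; the `env` object is an assumed stand-in (anything with `reset()` and `step()` over discrete states would do). DQN then simply swaps the table for a neural network, so that similar frames share value estimates.

```python
# Tabular Q-learning sketch of the update described above. The environment
# is an assumed stand-in: reset() -> state, step(action) -> (state, reward, done).
import random
from collections import defaultdict

def q_learning(env, episodes=1000, lr=0.1, discount=0.99, epsilon=0.1):
    Q = defaultdict(lambda: [0.0] * env.n_actions)  # Q[s][a], zero-initialised
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly take the best-known action, sometimes explore
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda i: Q[s][i])
            s_next, reward, done = env.step(a)
            # Q_new(s,a) = Q_old(s,a) + lr * (r + discount * max_a' Q(s',a') - Q_old(s,a))
            target = reward + (0.0 if done else discount * max(Q[s_next]))
            Q[s][a] += lr * (target - Q[s][a])
            s = s_next
    return Q
```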
JaffaCake Posted July 13, 2017 I just wanted to add to the above that fully trained neural networks (and the resulting agents) are very fast to execute, and I wouldn't be surprised if they were faster than standard AI implementations. In terms of IL2, performance wouldn't be an issue; the work is mostly in the design, implementation and training of the agent. Actually, machine learning has been used in racing games for a long time now too... here we are only adding an extra dimension. Edited July 13, 2017 by JaffaCake
Jade_Monkey Posted July 13, 2017 Very interesting topic. I'm no expert but I have been learning about it recently. Thanks JaffaCake for the informative posts.
FlyingNutcase (Author) Posted July 13, 2017
I just wanted to add to the above that fully trained neural networks (and the resulting agents) are very fast to execute, and I wouldn't be surprised if they were faster than standard AI implementations. In terms of IL2, performance wouldn't be an issue; the work is mostly in the design, implementation and training of the agent. Actually, machine learning has been used in racing games for a long time now too... here we are only adding an extra dimension.
Now that is positive news; an important mention. :-)
Very interesting topic. I'm no expert but I have been learning about it recently. Thanks JaffaCake for the informative posts.
Yeah, JaffaCake, thanks for taking the time to write about this. The links make for an interesting read too. I'll probably do that Lynda short course at the weekend; it could be useful for our little business, actually/hopefully. It would indeed be interesting to see what a self-trained AI would come up with in terms of BFM, and indeed larger tactics if trained as a "swarm", but ultimately guided by realistic formations and human-like behavior etc. for gameplay.
Rjel Posted July 13, 2017 JaffaCake - please forgive some complete layman's questions, but would the type of AI you're talking about allow individual styles of flying AI? In other words, would one AI fly a Zero differently than an AI flying an F4F, so each would use the different strengths of each airplane? And does this type of AI have to learn in real time, or does it learn at computer speeds?
Rjel Posted July 13, 2017 We are the engines of our own undoing. Likely. On the upside though, you and I will likely be long gone before that happens.
Lusekofte Posted July 14, 2017 I have this hopeless wish for an AI that is not able to see through sky and other obstacles, one that could be sneaked up on by coming in low and on their six. I wish they would not be so obvious in their behaviour. They should react to more things than the distance between them and their opponents. They should behave like wingmen that care. But these wishes don't go only for this simulator. I think a good AI, like all aspects in life, costs money. You get what you pay for, and we did not pay much.
=WH=PangolinWranglin Posted July 14, 2017 I'm not sure that we should be teaching AI to fly warplanes... Just a thought.
Rjel Posted July 14, 2017
I have this hopeless wish for an AI that is not able to see through sky and other obstacles, one that could be sneaked up on by coming in low and on their six. I wish they would not be so obvious in their behaviour. They should react to more things than the distance between them and their opponents. They should behave like wingmen that care. But these wishes don't go only for this simulator. I think a good AI, like all aspects in life, costs money. You get what you pay for, and we did not pay much.
I agree with all of this. Honestly, if it were possible to add to this sim, it would be well worth paying for. Edited July 14, 2017 by Rjel
JaffaCake Posted July 14, 2017
JaffaCake - please forgive some complete layman's questions, but would the type of AI you're talking about allow individual styles of flying AI? In other words, would one AI fly a Zero differently than an AI flying an F4F, so each would use the different strengths of each airplane? And does this type of AI have to learn in real time, or does it learn at computer speeds?
This depends entirely on the implementation of the learning algorithm. You could try teaching one agent to fly any aircraft, and supply it with the "specs" of that aircraft prior to take-off - but in my mind that would be additional complexity that would increase the required learning time by several orders of magnitude. On the other hand, you could teach individual agents for different aircraft types, where each would have to learn to exploit the strengths/weaknesses while learning about the aircraft as well. In both of the above cases the AI will fly each aircraft differently.
With regard to the training - the agent of course does not care about real-time speeds. You can train it as fast as you can possibly run the game simulation (or many simulations simultaneously). You could initially use the hard-coded AI as opponents, followed by agent-vs-agent dogfights to perfect the training. This is somewhat similar to what was done with AlphaGo. I'm just going to leave this link for those interested in the opinion of a 9-dan (highest professional) Go player about the AlphaGo strategies (they also call it "Master" in the series).
I have this hopeless wish for an AI that is not able to see through sky and other obstacles, one that could be sneaked up on by coming in low and on their six. I wish they would not be so obvious in their behaviour. They should react to more things than the distance between them and their opponents. They should behave like wingmen that care. But these wishes don't go only for this simulator. I think a good AI, like all aspects in life, costs money. You get what you pay for, and we did not pay much.
If you check the links I posted above you will find that DeepMind's Atari-playing AI was playing from pixels. That is, the AI "saw" the game, instead of being told where things are. Machine learning has several strategies to deal with objects that occasionally disappear from view - LSTMs / neural computers etc. It's a fair bit less reliable than "direct" learning, as the agent also has to learn what to "remember" from the past, but it is possible.
From my POV, if this game got a dedicated ML AI guy who knows the engine and knows his field well, I'd give these estimates for development times (very, very rough and mostly to be ignored):
- Narrow optimisations, such as AI rudder control and flight stability - possibly even the model optimisation that the devs say takes such a long time to hand-tune: ~1-3 months (published works and implemented examples available online).
- Top-level AI decisions, where the aircraft is flown by the hard-coded AI but the choice of what to do is made by an ML agent: ~1-6 months (published works, generic implementations available).
- Reinforcement learning agent able to fly the aircraft to desired locations (no take-off/landing): ~2-6 months (generic implementations available).
- Reinforcement learning agent able to dogfight: ~6-18 months (generic implementations available, but heavy modifications + research needed to adapt them).
- RL agent able to cooperate with other agents: unknown - toy examples in toy systems exist, but applying them on top of an already very complex system is difficult.
- etc.
And of course the above is time spent working on the machine learning part - not the engine modifications to support ML training / data collection / etc. Overall, human-like AI is much closer than one might think. DeepMind is working on creating an AI that can play SC2 from pixels; other companies and labs are working on, and showing results in, continuous control / cooperative / etc. tasks for AI. I'd give it a few more years, but we really are much closer than many people think. And the AlphaGo success is just the first step.
Edit: There is also the fact that ML developers are currently in the highest demand, and anyone who knows their worth costs a lot more to hire than an equivalent software engineer.
More edit: The only serious hope I see for a flight sim to gain any traction with ML is to release an API for researchers in the field to be able to experiment and develop algorithms. For example, GTA V is already used to train agents for self-driving cars: https://github.com/ai-tor/DeepGTAV There are few air sims with the FM fidelity of IL2 available to the research field, though. Edited July 14, 2017 by JaffaCake
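To illustrate the "release an API" point: this is roughly what a research-friendly wrapper around a combat flight sim could look like, loosely following the OpenAI Gym convention. To be clear, every call on the sim side here (restart_mission, send_controls, get_telemetry, and so on) is hypothetical - no such interface exists for IL2 today.

```python
# Hypothetical gym-style environment wrapper for a combat flight sim.
# Every self.sim.* call is an assumed, not a real, interface.
import numpy as np

class DogfightEnv:
    def __init__(self, sim_connection):
        self.sim = sim_connection  # imagined handle to a running sim instance

    def reset(self):
        self.sim.restart_mission()
        return self._observe()

    def step(self, action):
        # action: e.g. 4 floats in [-1, 1] for pitch, roll, rudder, throttle
        self.sim.send_controls(*action)
        self.sim.advance(frames=1)
        obs = self._observe()
        reward = self.sim.delta_score()  # e.g. hits scored minus hits taken
        done = self.sim.player_destroyed() or self.sim.mission_over()
        return obs, reward, done

    def _observe(self):
        t = self.sim.get_telemetry()
        # own state plus the relative position of the nearest bandit
        return np.array([t.ias, t.alt, t.pitch, t.roll, t.yaw,
                         t.bandit_bearing, t.bandit_range, t.bandit_aspect])
```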
Rjel Posted July 14, 2017 Thx Jaffe. This all sounds like it could be an exciting development for simming.
hames123 Posted July 14, 2017 Well, one way they could improve the AI is to make each ground-attack plane choose a target that none of its wingmen is attacking, so they can quickly take out ground units and leave. Right now the ground-attack planes choose a target, they all go for it, it gets killed, and then they take 5 minutes to complete a circuit and get back into a position to kill the next ground unit. Repeat 5-6 times to actually kill a single artillery battery.
VeryOldMan Posted July 14, 2017
Hi folks, The age of AI seems to be fast approaching (it's my current YouTube distraction, lol). I'm curious how, specifically, machine learning could be applied to make the AI in IL2 more realistic/natural/"better". It's a wide topic with, I'm sure, wide applications within the game, and clearly constrained by CPU capabilities, but I'm just throwing it out there to pull in current thinking. Cheers.
You cannot... at least not in the way most people think. Machine learning is a rough name for a huge set of different and specific techniques tailored for specific applications. What do they all have in common? They are all about aligning a set of vectors in an N-dimensional space... Now you ask: how can the alignment of a vector in space be related to piloting? That is always the trick question with machine learning. 99% of the work is finding a way to map, in an algebraic way, the inputs and outputs of an operation. Machine learning will then learn how to get that result - NEVER how to get that result while making it look like what a real human would have done. The best you can expect is to make the computer value less and less the maneuvers it knows fail against YOU, and value more the maneuvers that work. But creating its own new moves? That is not within the scope of machine learning until someone can model air combat in a simple F(x1, x2, x3, ..., xn) = y model. If I were to try to make the computer able to adapt to how pilots fly, I would use genetic algorithms and not machine learning. The problem is that it needs a gazillion iterations to start getting somewhere. Edited July 14, 2017 by VeryOldMan
curiousGamblerr Posted July 14, 2017
Edit: There is also the fact that ML developers are currently in the highest demand, and anyone who knows their worth costs a lot more to hire than an equivalent software engineer. More edit: The only serious hope I see for a flight sim to gain any traction with ML is to release an API for researchers in the field to be able to experiment and develop algorithms. For example, GTA V is already used to train agents for self-driving cars: https://github.com/ai-tor/DeepGTAV There are few air sims with the FM fidelity of IL2 available to the research field, though.
Some great info Jaffe, and that last part is a great point: an ML dev with the skills to implement an AI for IL-2 would unfortunately be prohibitively expensive for this sim. Jason has already admitted to paying his dev(s) below market rate in another thread (to avoid any misunderstanding, that's not to say he's short-changing them; shoot, I might take a pay cut to work on something I was passionate about like Il-2). Edited July 14, 2017 by 19//curiousGamblerr
VeryOldMan Posted July 14, 2017 It is not simply that hiring one is expensive. I am an ML-focused developer working in the medical imaging field... and underpaid as well. But I cannot even fathom how I would start to develop such a system for a flight sim without a LARGE team of people to feed and train the system. Machine learning, as of today, is still not easy to use to mimic human behavior, since most research is not focused on that.
Livai Posted July 14, 2017 Real AI is extremely CPU-intensive. Real AI also tends to cheat, which is normal, by the way. What we have here is a hybrid between human and CPU. That allows more AI-controlled planes in the air. However, hybrid AI tends to repeat the same behaviour, which makes its actions foreseeable.
PatrickAWlson Posted July 14, 2017 My understanding of such cognitive systems is that they take lots of data points and gradually get better over time. I don't think that translates well to flight sim AI. First, will every installed instance learn independently? Will it be a cloud-based mind merge? How long will the AI have to make a decision? What kind of CPU power would be needed? Think of the AI success in chess. That is a strict 1:1 scenario with only a single AI entity. The situation always starts from a known point and develops from there in discrete steps. At each discrete step the situation is analyzed for possibilities moving forward. Now consider air combat. The starting position is infinitely variable. The activity is fluid, with all actors moving in real time rather than at discrete intervals. This fluidity creates many more scenarios than are possible on a chess board (which is not trivial in itself). Finally, you want AI to model realistic human behavior, not ideal fighter pilot behavior. Much of air combat is also decided by human emotions - fear being a primary one. To get realistic human behavior you have to model self-preservation. That is possible, as expert systems can account for competing imperatives, but it certainly does add another massive layer of complexity. Never say never, but not soon.
Lusekofte Posted July 15, 2017
If you check the links I posted above you will find that DeepMind's Atari-playing AI was playing from pixels. That is, the AI "saw" the game, instead of being told where things are.
Thanks for the links. Personally, I think if a learning AI came along it would strike gold. Many people like me prefer fighting against the machine instead of on a public server. No matter how stupid the AI is, it at least tries to do the mission and does not look at stats. I will read up, and again, I appreciate you taking the time to post in such a detailed manner.
Yakdriver Posted July 15, 2017 How could the devs apply machine learning to improving the AI? Should they do it at all? Self-learning AI is a lot different from humans trained in aerial warfare, WWII style.
sniperton Posted July 16, 2017 Yep, it would make more sense to make the AI 'learn' from the actions of human 'trainers'. On the other hand, I would already be happy if the AI couldn't see through obstacles, or could be ordered in a more specific way than 'attack nearby air targets' or 'attack nearby ground targets'.
JaffaCake Posted July 17, 2017
You cannot... at least not in the way most people think. Machine learning is a rough name for a huge set of different and specific techniques tailored for specific applications. What do they all have in common? They are all about aligning a set of vectors in an N-dimensional space...
You are describing supervised learning in this case, and you are completely right - it requires huge datasets to work. For AI applications in games, supervised learning is used as a method to train a function approximator to determine the value function of the agent. Please check out the DQN paper I linked previously. In short, in reinforcement learning the "huge dataset" problem is solved by running many instances of the environment to gather the experience. The experience is also "replayed" multiple times by the trained network to improve convergence and stability. The large data requirement is certainly the least important problem for an IL2 application.
Now you ask: how can the alignment of a vector in space be related to piloting? That is always the trick question with machine learning. 99% of the work is finding a way to map, in an algebraic way, the inputs and outputs of an operation.
The ML field that deals exactly with this issue is called "reinforcement learning". If you are interested in this area I highly suggest checking out this book: https://github.com/ShangtongZhang/reinforcement-learning-an-introduction
The best you can expect is to make the computer value less and less the maneuvers it knows fail against YOU, and value more the maneuvers that work. But creating its own new moves? That is not within the scope of machine learning until someone can model air combat in a simple F(x1, x2, x3, ..., xn) = y model.
Fortunately this has already been proven wrong. AlphaGo created a new set of moves for the game of Go that the experts never expected. These moves have been picked up by the community and are now taught on a regular basis. While Go may not seem anything like IL2, the principle is the same - huge complexity of the environment and very few "pathways" to victory.
If I were to try to make the computer able to adapt to how pilots fly, I would use genetic algorithms and not machine learning. The problem is that it needs a gazillion iterations to start getting somewhere.
A genetic algorithm is one method of training a neural network to fit the function to the dataset. Gradient descent is another. Neither suddenly solves the problem of creating the AI for the game.
Imagine you have a set of parameters that determine which action you'd like to take based on the in-game screenshot (the agent's state). Imagine this parameter space as a 3D landscape with valleys and hills. The agent's goal is to navigate this parameter space to reach the lowest point.
A genetic algorithm randomly places multiple agents in this space, determines their altitude, and kills the half of the population with the worse results. It then "spawns" a new half around the "successful" population, with some perturbation. This is a very old method of optimisation, and indeed a very inefficient one.
Gradient descent works by having only one agent in this space. Every learning step, the algorithm determines in which DIRECTION it needs to change the parameters to go down the valley. It doesn't know how far it needs to go (the learning step), or whether it is in a local valley with a much deeper valley right next to it (exploration). But this method is orders of magnitude faster than the genetic one. (A toy illustration of both methods is sketched at the end of this post.)
My understanding of such cognitive systems is that they take lots of data points and gradually get better over time. I don't think that translates well to flight sim AI.
Nothing stops you from running simulations against the hard-coded AI, and then versus the already-learned agents.
Think of the AI success in chess. That is a strict 1:1 scenario with only a single AI entity. The situation always starts from a known point and develops from there in discrete steps. At each discrete step the situation is analyzed for possibilities moving forward.
There was no ML in chess. It was a variation of the minimax algorithm, which is basically the equivalent of the hard-coded AI we have right now in the game: https://en.wikipedia.org/wiki/Minimax
Now consider air combat. The starting position is infinitely variable. The activity is fluid, with all actors moving in real time rather than at discrete intervals. This fluidity creates many more scenarios than are possible on a chess board (which is not trivial in itself).
Reinforcement learning, and ML in general, is made to deal exactly with this issue. Continuous states and continuous control are by now standard problems, with several algorithms capable of dealing with them. Here is an example of one if you would like to read: https://arxiv.org/pdf/1602.01783.pdf
Finally, you want AI to model realistic human behavior, not ideal fighter pilot behavior. Much of air combat is also decided by human emotions - fear being a primary one. To get realistic human behavior you have to model self-preservation.
Self-preservation is a rather obvious imperative for the agent... There is nothing stopping the agent from inventing something new that humans have not thought about... but do realise that these agents would be trained under the same conditions humans were - self-preservation, number of kills, completion of the objective, etc. It is now fairly standard for agents to deal with multiple such imperatives. For example, in the videos I have shown above, the walker DIES the moment it falls, but it is rewarded based on the distance travelled - two imperatives that it is able to learn.
Never say never, but not soon.
You have mostly referred to 10-20-year-old technology in your post. If that were the most recent tech available, I'd agree with your statement. Fortunately, we are well into solving the problems you described.
Should they do it at all? Self-learning AI is a lot different from humans trained in aerial warfare, WWII style.
That entirely depends on your approach to training the AI. Replication of realistic human behaviour is known as "imitation learning" and has been used to great success in robotics. The drawback is that it leaves nothing for the agent to truly discover on its own or come up with "new" solutions. I hope I have addressed some of the questions raised here... Still happy to discuss in more detail if anyone is interested. Edited July 17, 2017 by JaffaCake
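For anyone who wants to see the two optimisation strategies side by side, here is a toy illustration, minimising a simple two-parameter "valley". It is purely illustrative, nothing game-specific.

```python
# Toy comparison of genetic-style search vs gradient descent on a 2-D "landscape".
import numpy as np

def altitude(p):  # the landscape: the lowest point is at (3, -1)
    return (p[0] - 3) ** 2 + (p[1] + 1) ** 2

def genetic_step(population):
    """Rank the population, keep the better half, respawn near the survivors."""
    ranked = sorted(population, key=altitude)
    survivors = ranked[: len(ranked) // 2]
    children = [s + np.random.normal(0, 0.3, size=2) for s in survivors]
    return survivors + children

def gradient_step(p, lr=0.1):
    """One agent moves downhill; in deep RL the gradient comes from backprop."""
    grad = np.array([2 * (p[0] - 3), 2 * (p[1] + 1)])
    return p - lr * grad

p = np.array([0.0, 0.0])
for _ in range(100):
    p = gradient_step(p)  # converges towards (3, -1) far faster than the GA
```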
unreasonable Posted July 17, 2017 Very interesting JaffaCake, thank you. A couple of random thoughts: "A genetic algorithm randomly places multiple agents in this space, determines their altitude, and kills the half of the population with the worse results. It then "spawns" a new half around the "successful" population, with some perturbation. This is a very old method of optimisation, and indeed a very inefficient one." A very old method indeed. Strictly speaking, I think it is fair to say that individual agents within a population selected in this manner do not "learn" anything at all. That does not mean that it is not a good way to find patterns of AI behaviour that create an illusion of appearing to be human. As to the new-knowledge issue: I am not sure that we actually want AI agents to come up with some new solution to ACM; again, we just want them to look like a range of humans, from trainee to ace, flying with a certain set of tactics. At some point we have to have a real human examine the behaviour and determine whether it is what we want, which is probably harder in some respects than for the chess- and Go-type AIs, where the criterion for success is simply beating a succession of increasingly skillful players. For a CFS that might be a necessary condition, but it is not sufficient. As previous posters have pointed out, if the agent just does this by gaming the flaws of the FM's treatment of flaps (should there actually be any such flaws), this is not modeling the behaviour of a WW2 ace: it is modeling the behaviour of an MP nerd.
VeryOldMan Posted July 17, 2017
The best you can expect is to make the computer value less and less the maneuvers it knows fail against YOU, and value more the maneuvers that work. But creating its own new moves? That is not within the scope of machine learning until someone can model air combat in a simple F(x1, x2, x3, ..., xn) = y model.
Fortunately this has already been proven wrong. AlphaGo created a new set of moves for the game of Go that the experts never expected. These moves have been picked up by the community and are now taught on a regular basis. While Go may not seem anything like IL2, the principle is the same - huge complexity of the environment and very few "pathways" to victory.
The problem is that it would result in moves that were NOT used in real aviation, with very little connection to how someone could have come up with them. Considering that the type of public we have here complains when even the COLOR of a button in the cockpit is wrong... I bet it would not be well received. The problem is making the AI work in a way that feels like a human is in control. So creating something new that feels like it could have been created by a human is harder than just creating something new. The cutting-edge academic work might point to a path... but applying techniques that are still growing in academia to a small-scale, cost-conscious industry is expecting too much within the time frame of a game's life. Where I could see a VERY interesting AI advance would be at the strategic level, in both the campaign and the AI planning of missions.
JaffaCake Posted July 17, 2017
A very old method indeed. Strictly speaking, I think it is fair to say that individual agents within a population selected in this manner do not "learn" anything at all. That does not mean that it is not a good way to find patterns of AI behaviour that create an illusion of appearing to be human.
It's a matter of perception. Both genetic and gradient-descent methods optimise the parameter space with respect to the error space - you could even keep randomising the parameters until you reached an acceptable level of performance. Learning, in this case, is locating the better parameters...
As to the new-knowledge issue: I am not sure that we actually want AI agents to come up with some new solution to ACM; again, we just want them to look like a range of humans, from trainee to ace, flying with a certain set of tactics. At some point we have to have a real human examine the behaviour and determine whether it is what we want. As previous posters have pointed out, if the agent just does this by gaming the flaws of the FM's treatment of flaps, this is not modeling the behaviour of a WW2 ace: it is modeling the behaviour of an MP nerd.
You could call me a purist for this - I prefer the AI to learn from scratch within a given set of rules. And you are completely right - if you were to present it with a system that has an easily exploitable behaviour (such as the flaps), I do expect the AI to learn it. This is why, when you deploy the majority of ML systems, you run sanity checks to make sure that the ML agent has learnt to play by the "rules" and that you haven't missed enforcing a certain rule that the ML ended up exploiting. Issues like flaps, control overuse, or other "exploits" that people are afraid ML agents will discover are easily fixable via reward shaping: suddenly the agent has to pay a penalty every time it uses the flaps, etc. (See the sketch at the end of this post.)
The problem is that it would result in moves that were NOT used in real aviation, with very little connection to how someone could have come up with them. Considering that the type of public we have here complains when even the COLOR of a button in the cockpit is wrong... I bet it would not be well received.
If the AI actually discovered a set of manoeuvres that are unknown to humans and more effective than what we currently have, you would have a bigger problem than a rowdy community - you would be looking at publishing a paper, getting interest from the military, and generally attracting a lot of attention from outside the game's community. Tbh, I doubt everybody wants an AI that is only able to fly "by the book". It just doesn't make sense to do - we don't do it in the game either.
The problem is making the AI work in a way that feels like a human is in control. So creating something new that feels like it could have been created by a human is harder than just creating something new. The cutting-edge academic work might point to a path... but applying techniques that are still growing in academia to a small-scale, cost-conscious industry is expecting too much within the time frame of a game's life. Where I could see a VERY interesting AI advance would be at the strategic level, in both the campaign and the AI planning of missions.
I disagree. Imitation learning is the easiest way of going about teaching the AI how to fly human-like, but you lose the human-like flexibility and adaptability. You keep asking the AI to be able to react and be imaginative, but at the same time you are forcing it to fly by the book... In my mind the two are very close to being opposites of each other. I agree that in the life span of this game such an AI will not be implemented, unless they release a research API and have several research groups interested in using their sim for research. Just to summarise: if you want expected, human-like behaviour from the AI, hard-coded protocols with a ruleset for cooperation are the way to go - it is meaningless to spend research time forcing an agent to behave in such a way. If you want an AI that is able to pose a challenge to even the most experienced pilots, and that learns to adapt and "think on the fly", then reinforcement learning is a good starting point. Edited July 17, 2017 by JaffaCake
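As a rough sketch of the reward-shaping idea above - a minimal example assuming the hypothetical gym-style interface sketched earlier in the thread; the observation and action indices are invented for the illustration:

```python
# Reward-shaping sketch: penalise "magic flaps" abuse. The wrapped env and
# the obs/action indices are assumptions made up for this illustration.
class FlapPenaltyWrapper:
    def __init__(self, env, penalty=0.05, flap_speed_limit=250.0):
        self.env = env
        self.penalty = penalty
        self.flap_speed_limit = flap_speed_limit  # km/h, illustrative

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        # assume obs[0] is indicated airspeed and action[-1] the flap command
        if action[-1] > 0 and obs[0] > self.flap_speed_limit:
            reward -= self.penalty  # deploying flaps at high speed now costs
        return obs, reward, done
```

The agent still learns freely; it just finds that the exploit no longer pays.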
VeryOldMan Posted July 17, 2017 BTW, did you ever think about experimenting with FlightGear? It is open source, so it might have an API you could use. Even if it's a bad API, it is better than nothing. I work on medical diagnosis software and taught myself the ML that I needed; the usage is much different, though. We do not want an AI learning anything fancy on the fly; we want the AI to decide whether the doctor is conforming to a pattern, so it is mostly computer vision (in very speed-optimized scenarios) feeding simple pattern recognition. One usage is to know when a radiologist is starting to let errors slip (so they can be forced into a pause), or to redirect the data to another radiologist for review if the system doubts what the first one did.
VeryOldMan Posted July 17, 2017
It's a matter of perception. Both genetic and gradient-descent methods optimise the parameter space with respect to the error space... Learning, in this case, is locating the better parameters... You could call me a purist for this - I prefer the AI to learn from scratch within a given set of rules. Issues like flaps, control overuse, or other "exploits" that people are afraid ML agents will discover are easily fixable via reward shaping: suddenly the agent has to pay a penalty every time it uses the flaps, etc. If the AI actually discovered a set of manoeuvres that are unknown to humans and more effective than what we currently have, you would have a bigger problem than a rowdy community - you would be looking at publishing a paper, getting interest from the military, and generally attracting a lot of attention from outside the game's community. Tbh, I doubt everybody wants an AI that is only able to fly "by the book". It just doesn't make sense to do - we don't do it in the game either. I disagree. Imitation learning is the easiest way of going about teaching the AI how to fly human-like, but you lose the human-like flexibility and adaptability. You keep asking the AI to be able to react and be imaginative, but at the same time you are forcing it to fly by the book... In my mind the two are very close to being opposites of each other. Just to summarise: if you want expected, human-like behaviour from the AI, hard-coded protocols with a ruleset for cooperation are the way to go. If you want an AI that is able to pose a challenge to even the most experienced pilots, and that learns to adapt and "think on the fly", then reinforcement learning is a good starting point.
My view is that the AI should not be what makes the pilot fly by the book. The AI should add the human bit... the artificial "stupidity" that we see human pilots display. Panic reactions, things like that. The fly-by-the-book part, I agree, is "easy" to teach. What is hard is knowing when you have made an AI that realistically adds a human level of "stupidity" at the right moments. Making a mistake-generating machine well balanced is not easy. It's hard to define what realistic even means on these grounds anyway. At the same time, the AI should be able to learn and not make the same mistakes... but sometimes still make them (just as a human does). In the end, mapping human behavior as "what we expect" might be a deeper problem than the machine learning part. As an artist once said: "yes, you can put a thousand monkeys with brushes and paint to create artwork, and someday they will... but it takes an artist to know when to make the monkeys STOP".
JaffaCake Posted July 17, 2017
My view is that the AI should not be what makes the pilot fly by the book. The AI should add the human bit... the artificial "stupidity" that we see human pilots display. Panic reactions, things like that. The fly-by-the-book part, I agree, is "easy" to teach. What is hard is knowing when you have made an AI that realistically adds a human level of "stupidity" at the right moments. Making a mistake-generating machine well balanced is not easy. It's hard to define what realistic even means on these grounds anyway. At the same time, the AI should be able to learn and not make the same mistakes... but sometimes still make them (just as a human does). In the end, mapping human behavior as "what we expect" might be a deeper problem than the machine learning part. As an artist once said: "yes, you can put a thousand monkeys with brushes and paint to create artwork, and someday they will... but it takes an artist to know when to make the monkeys STOP".
Your comment has become so distant from anything actually implementable or interpretable that I do not know what to say, other than that the closest thing we know of that could approach human behaviour right now is reinforcement learning - an agent where you have no idea exactly what it has learned, where you cannot predict how it is going to react in every single situation, and which you hope has learned a higher-level representation of the problem than the simple if-else statements a hard-coded AI uses. It feels to me that a lot of people want the AI to magically do everything that fits their criteria of a perfect-but-beatable opponent in their scenario. That is best achieved by years of direct AI development, not machine learning.
VeryOldMan Posted July 17, 2017
Your comment has become so distant from anything actually implementable or interpretable that I do not know what to say, other than that the closest thing we know of that could approach human behaviour right now is reinforcement learning - an agent where you have no idea exactly what it has learned, where you cannot predict how it is going to react in every single situation, and which you hope has learned a higher-level representation of the problem than the simple if-else statements a hard-coded AI uses. It feels to me that a lot of people want the AI to magically do everything that fits their criteria of a perfect-but-beatable opponent in their scenario. That is best achieved by years of direct AI development, not machine learning.
That is why my first post was "you cannot"... simply because you cannot expect to achieve everything everyone wants just by "strapping on" some machine learning. I expect that in the future we might see interesting uses of machine learning in games, but for some tasks (at least TODAY) it might not be the best way. It is a lot of work to make an AI learn to fly (if you present the AI with just the controls of a physics model of a plane), and you can already do that in a more direct, "classical" way. ML becomes interesting only if you can use it to present something that you could not have made with "classical game AI".
A simple example, going back to the old IL2 game: if you pursued a Russian plane, it would always climb in a long climbing turn until it reached a height at which its performance was inferior to the pursuer's. At that exact moment it would do a split-S. Meanwhile, if you ever shot at the plane and undershot, the AI would not react; but if you sent any number of bullets over the AI, it would ALWAYS reverse its turn and drop a few hundred meters... easily exploitable. In the opposite scenario, if the AI was pursuing you, you knew it would pursue you EXACTLY until its supercharger gear changed... easily exploitable.
ML could be used not to teach the AI how to fly (well, it can, but that is not where I think it would gain us the most) but to make the AI learn that you always do this or that trick, and start to change its patterns (a rough sketch of this idea follows below). The ML would still need to behave badly sometimes, though, because we expect humans to make mistakes (most air kills are due to a mistake by the victim). Correct me if I am wrong, but I think that is an easier scenario to reach in the foreseeable future than completely AI-learned tactical flying.
Sorry if something in my post is hard to comprehend, but a non-native language at 5 am is sometimes hard.
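A crude sketch of that last idea - not learning to fly at all, just learning which scripted counter works against a particular player's habits, framed as a simple multi-armed bandit. All the maneuver names are placeholders.

```python
# Multi-armed-bandit sketch: adaptively pick a scripted counter-maneuver
# based on past outcomes against this particular player. Names are placeholders.
import random
from collections import defaultdict

class CounterPicker:
    def __init__(self, counters=("reverse_turn", "high_yoyo", "extend_away")):
        self.counters = counters
        self.wins = defaultdict(int)
        self.tries = defaultdict(int)

    def choose(self, epsilon=0.1):
        if random.random() < epsilon:  # occasionally try something else
            return random.choice(self.counters)
        return max(self.counters, key=lambda c:
                   self.wins[c] / self.tries[c] if self.tries[c] else 0.5)

    def record(self, counter, survived):
        self.tries[counter] += 1
        self.wins[counter] += bool(survived)  # True counts as 1
```

An untried counter starts at an assumed 0.5 success rate so it still gets explored, and occasional "bad" picks could even be kept in deliberately, per the point above about human-like mistakes.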
JaffaCake Posted July 17, 2017 Sigh... The fact that you can do something in a more classical way does not make it better... ML meets this argument in every single development that has been made. Computer vision specifically: even nowadays there is a hard line of software engineers stuck on the old idea of hand-crafted features for object detection, and they swear by it. Yet deep learning has long surpassed the capabilities of hand-crafted detection algorithms. My goal in spending time describing the current state of the art was to show that it would indeed be possible to implement an ML agent capable of flying and dogfighting. It would be human-like in its ability to invent new tactics and perform flexibly and unpredictably during a dogfight. To me that is achievable, and it would represent the pinnacle of AI in flight sims - to the point of removing the need for the MP component for players who seek a challenge that the standard AI does not satisfy. The mixture of misconceptions and obvious "problems" raised in this thread I have addressed to the best of my ability.
ML could be used not to teach the AI how to fly (well, it can, but that is not where I think it would gain us the most) but to make the AI learn that you always do this or that trick, and start to change its patterns. The ML would still need to behave badly sometimes, though, because we expect humans to make mistakes (most air kills are due to a mistake by the victim). Correct me if I am wrong, but I think that is an easier scenario to reach in the foreseeable future than completely AI-learned tactical flying.
This is like asking a painter to paint with 16 predetermined sprites. It is possible, and it is actually in one of the posts I previously mentioned, but it restricts the agent to the point where it, again, becomes easily predictable and exploitable. "Oh, he is going for an Immelmann."
VeryOldMan Posted July 17, 2017
Sigh... The fact that you can do something in a more classical way does not make it better... My goal in spending time describing the current state of the art was to show that it would indeed be possible to implement an ML agent capable of flying and dogfighting. It would be human-like in its ability to invent new tactics and perform flexibly and unpredictably during a dogfight. To me that is achievable, and it would represent the pinnacle of AI in flight sims. This is like asking a painter to paint with 16 predetermined sprites. It is possible, but it restricts the agent to the point where it, again, becomes easily predictable and exploitable. "Oh, he is going for an Immelmann."
From the way you put your statements, it seems you are thinking from a research point of view alone. In real-world scenarios, especially in small companies, if you can do it another way that is cheaper, that IS the correct way! Less effort with a good-enough result always wins when you are talking about companies with small budgets. While you seem to know a lot about AI, I dare to ask - since you have toned your answers with a subtle anger towards other people's answers - how many products have you worked on from their theoretical inception up to market delivery? You sound a lot like an academic. Safety always wins in small-company scenarios. If you have a way that works and is cheap and allows a small incremental design, or you can go all-in on a cutting-edge, revolutionary approach... you never, ever choose the second option (unless your company can afford "throwaway money" projects). Another sim already failed hard in the recent past because it aimed too high...
Herne Posted July 17, 2017 Just caught up on this thread. Really enjoyed JaffaCake's posts, fascinating stuff. I wonder, though, from a gameplay perspective, how difficulty levels might work. After the AI has learned how to defeat Scharfi, what chance do the rest of us mere mortals have?