Airfield attack DM tests.

January 1, 2020

With the anniversary of the Bodenplatte attacks upon us, I wanted to find out which of the BoBP fighters is most likely to survive in a ground attack role.

To do this I used my Airfield Defence mission, in which 46 aircraft in a series of waves fly over an airfield defended by eight LAA guns, circling once to make a second pass before flying off. This is not an exact model of the actual Bodenplatte attacks: just a scenario in which a large number of aircraft attack a defended target at low level. All runs were done on 4.002 or later. AFAIK there have been no relevant DM changes since 4.002. Each plane group was run through the mission thirty times.

For the guns I used the German 20mm Flak38. With a lighter shell the 20mm gun gives more granular results than the heavier guns where one hit tends to equal one kill. LAA AI was "Normal".

For the planes I used the P-51, P-47, Bf 109K, Fw190 A8, Spitfire IX and Tempest V from the standard BoBP set, all with 60% fuel and no external ordnance or mods. The waypoints are almost identical: I had to increase their size for some runs as not all aircraft could handle the original test mission.

After the guns had finished firing I counted the planes that were still flying in control, and then the number that were:

"Heavily Damaged" - which I took to be any streaming fuel, coolant or smoke significantly heavier than that normally seen from the exhausts
"Lightly Damaged" - those showing superficial holes, or missing parts.

I then categorized the planes as:

"Downed" = not making it in control to the rally
"Lost"= downed and heavily damaged planes. I am assuming that the heavily damaged planes would be unable to RTB.
"Hit" = any aircraft downed or damaged.

While I did not count damage to specific areas, or types of damage in more detail than the categories outlined, I will make a few observations later.
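The categories defined above are simple arithmetic on the counts taken after each run. A minimal sketch in Python, assuming hypothetical field names and made-up counts (none of these numbers come from the tests):

```python
from dataclasses import dataclass


@dataclass
class RunCounts:
    """Counts from one run of the mission (field names are hypothetical)."""
    planes: int   # aircraft sent through the mission (46 in these tests)
    flying: int   # still flying in control after the guns stop firing
    heavy: int    # of those still flying, heavily damaged
    light: int    # of those still flying, lightly damaged


def categorise(c: RunCounts) -> dict:
    """Apply the Downed / Lost / Hit definitions to one run."""
    downed = c.planes - c.flying       # did not make it to the rally in control
    lost = downed + c.heavy            # heavily damaged assumed unable to RTB
    hit = downed + c.heavy + c.light   # any aircraft downed or damaged
    return {"downed": downed, "lost": lost, "hit": hit}


# Made-up example run, for illustration only.
print(categorise(RunCounts(planes=46, flying=43, heavy=4, light=6)))
```

Summing these per-run figures over all thirty runs and dividing by the total number of planes gives the percentages discussed in the results.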

RESULTS

From a pilot's perspective:

If you are RAF, fly a Tempest rather than a Spitfire
If you are USAAF, fly a P-51 rather than a P-47
If you are GAF, fly a Fw190 rather than a Bf109

Overall best is the Fw190 A8; worst is a toss-up between the P-47 and the Bf109K.

The first point to notice about the results of each run is that they are extremely variable, so to get a good average it was necessary to do a lot of runs. The smaller an effect, the more runs you
need, so the results for Hits are a more reliable average than for Lost, and least reliable for Downed.
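The point about rarer outcomes needing more runs can be made concrete with the standard error of a proportion. This is a rough sketch that assumes each plane is an independent hit-or-miss trial (a binomial model); the rates used are illustrative placeholders, not the test results:

```python
import math


def relative_standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion p from n trials, as a fraction of p."""
    return math.sqrt(p * (1.0 - p) / n) / p


N = 46 * 30  # planes per type: 46 aircraft x 30 runs

# Illustrative rates only, chosen to show the trend.
for label, rate in [("Hit", 0.30), ("Lost", 0.10), ("Downed", 0.03)]:
    rse = relative_standard_error(rate, N)
    print(f"{label:7s} rate {rate:.2f}: relative standard error about {rse:.1%}")
```

For the same number of planes, the smaller the rate, the larger the uncertainty relative to the rate itself, which is why the Downed averages are the least stable.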

The survivability for the pilot is a function of the probability of getting hit and the consequences of getting hit.

So first the probability of getting hit; this is very strongly correlated with the size of the target. I counted aircraft hit, not hits, as often planes are hit more than once, but a rough estimate of the number of hits can be calculated using the Hits/Planes ratio: this shows a closer correlation with size.
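One possible way to turn a "planes hit" fraction into a rough "hits per plane" figure is to assume hits land independently, so the number of hits per plane follows a Poisson distribution. This is an assumption for illustration and not necessarily the estimate used in the post; in practice several shells from one burst can strike the same plane, so the real figure may differ.

```python
import math


def mean_hits_per_plane(fraction_hit: float) -> float:
    """Poisson estimate: P(no hits) = exp(-m) = 1 - fraction_hit, so m = -ln(1 - fraction_hit)."""
    if not 0.0 <= fraction_hit < 1.0:
        raise ValueError("fraction_hit must be in [0, 1)")
    return -math.log1p(-fraction_hit)


# Illustrative fractions only, not test results.
for f in (0.1, 0.3, 0.5):
    print(f"{f:.0%} of planes hit -> about {mean_hits_per_plane(f):.2f} hits per plane")
```

The estimate is always at least as large as the fraction of planes hit, and the gap grows as more planes are hit, which matches the idea that larger targets collect more multiple hits.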

The main outlier here was the P-51, which was getting hit a little less than I expected. This could be due to some combination of it flying the waypoints a little more accurately than the others, a few hits not registering damage decals, my not seeing them, or just natural variability even with a very large sample.

I measured size using the viewer to get a side, plan and tail view, counting pixels in GIMP, then equally weighting them as an index.
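An equally weighted size index of this kind could be computed as follows. This is a sketch under assumptions: the pixel counts are placeholders, the plane names are hypothetical, and normalising each view to a reference plane before averaging is one reasonable choice that the post does not spell out.

```python
VIEWS = ("side", "plan", "tail")


def size_index(pixels: dict, reference: str) -> dict:
    """Average of the three view areas, each expressed relative to the reference plane."""
    ref = pixels[reference]
    return {
        name: sum(counts[v] / ref[v] for v in VIEWS) / len(VIEWS)
        for name, counts in pixels.items()
    }


# Placeholder pixel counts, for illustration only.
example = {
    "plane_a": {"side": 100, "plan": 200, "tail": 50},
    "plane_b": {"side": 150, "plan": 300, "tail": 100},
}
print(size_index(example, reference="plane_a"))
```

Normalising per view stops the plan view, which has by far the most pixels, from dominating the index.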

I constructed an index for durability based on size and empty weight, without taking into account specific features such as radial/in line engines. Looking at the results of hits we can see much more variation and less of a clear pattern.

The Fw190 A8 is both small and tough, the best combination. The Tempest appears to be significantly tougher than the P-47. The P-51 "Lost" number includes a large proportion of planes losing fuel from a wing tank. A real P-51 would probably be able to isolate the leak and RTB (I think?), so the "Lost" figure might be an overestimate.

A FEW OBSERVATIONS

Gunner AI has had a couple of changes (Update 4.001) which look subtle but have a fairly large impact on LAA firing at low fast movers.

73. Simple ground vehicles have realistic gun aiming speeds;
75. Simple AI tanks and guns have a delay between initial aiming and opening fire and between destroying a target and engaging another;

They now very rarely get the "grouse shoot" quick swivel and short range hit. They also keep firing at a target moving away and are slow to pick up on another target moving towards them. The result in the tests is that apart from the first wave almost all of the hits came from behind the wing line, usually from only slightly below.

PKs or PStuns are rare, and whenever I paused to examine a plane suffering from these, the damage was close to the cockpit. So these tests give no support to the theory that distant HE hits are causing PKs or PSs.

I do not remember a single occasion in which a plane had a leaking radiator, even in planes that had visible damage in the radiator area. Possible reasons:
- The radiator area is quite small, especially from the aspect at which the hits occurred, so they were not hit
- Radiators are tougher in RL than we think
- Oddity in BoX DM treatment of radial vs in-line

P-47s are very vulnerable to hits from behind that cause the "fuel/oil leak" graphic, i.e. green/yellow stuff emerging from the rear fuselage, leading to engine failure, which I assume is from a splinter hit to the supercharger/turbo oil system. That is in addition to the vulnerability to hits to the engine area, documented previously.

[Edit: I have no idea why text that is correctly formatted in the entry area for a post gets messed up when posted.]

Edited January 1, 2020 by unreasonable

January 2, 2020

Thanks for taking your time, this must have been a lot of work.

Could you include a measure of variability as well in order to have an estimate whether the observed differences are statistically significant? Standard deviation, standard error of the mean, and CV (%) would work. As an alternative, you could plot box plot diagrams (can be done with Excel 2016). These include mean, median, as well as measures of the data distribution / variability.

January 2, 2020

Wow, that's a lot of work.

Personally, what I find most interesting, even though it's probably the most obvious thing: size matters. Thanks for providing an empirical result to confirm this.

It also looks to me as if the P-47 really is as screwed as subjective observations posted by a number of community members suggest.

Thank you for doing all this testing and counting and the presentation.

January 2, 2020

@JtD Thanks! It did take a long time but I did it bit by bit and it is oddly relaxing.... having been in the business of trying to make some kind of empirical sense of highly complex systems all my life I find it hard not to do it. But I do not think there is anything more I can do on DMs for the foreseeable future.

The RAF pair were more or less in line with my expectations as were the German: the US pair are the oddities and it is hard to say exactly why. It is very noticeable that the lightly damaged P-51s usually only have a light sprinkling of damage decals, where everyone else gets the big holes, so it really does seem to be structurally very tough. (I have no idea if it should be). I think the P-47 is suffering from having more vulnerable internal parts in the fuselage: if splinter damage is over the top, it is going to suffer those critical hits.

I am a bit puzzled by the lack of obvious radiator hits.

3 hours ago, JG27_PapaFly said:

Thanks for taking your time, this must have been a lot of work.

Could you include a measure of variability as well in order to have an estimate whether the observed differences are statistically significant? Standard deviation, standard error of the mean, and CV (%) would work. As an alternative, you could plot box plot diagrams (can be done with Excel 2016). These include mean, median, as well as measures of the data distribution / variability.

Essentially you are sampling for a binomial with each category. Each aircraft is either hit or not and whether each aircraft is hit is statistically independent (between tests absolutely, in each test almost). So the best measure is the binomial confidence limit: but there are two questions you can ask:

1) If the number of successes in your sample is the true average s/p, what is the probability of a number of successes s in a new sample of size n?

2) If the number of successes in your sample size n is s, between what two numbers are you x% confident the true average s/p lies?
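Both questions can be sketched in a few lines of Python using only the standard library. Assumptions: question (1) is answered with a central range from the exact binomial distribution, and question (2) with the Wilson score interval, which is swapped in here for illustration and may not be the method used by the calculator mentioned in the post. The counts in the example calls are placeholders.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def prediction_range(n: int, p: float, coverage: float = 0.90):
    """Question (1): central range of successes in a new sample of size n if p is the true rate."""
    tail = (1.0 - coverage) / 2.0
    cdf, lo, hi = 0.0, None, n
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)
        if lo is None and cdf >= tail:
            lo = k
        if cdf >= 1.0 - tail:
            hi = k
            break
    return lo, hi


def wilson_interval(s: int, n: int, z: float = 1.96):
    """Question (2): Wilson score interval for the true rate, given s successes in n trials."""
    phat = s / n
    denom = 1.0 + z * z / n
    centre = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1.0 - phat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Placeholder numbers, not test results: 1380 planes, 30% observed rate.
print(prediction_range(1380, 0.30))
print(wilson_interval(414, 1380))
```

With z = 1.96 the Wilson interval is roughly a 95% interval; a different z gives a different confidence level.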

I can do (1) in Open Office (I do not have Excel). It looks like this:

The limits for the Hit are fairly narrow: larger effect, and a large sample. So if the test averages were true, you would expect that if you ran another 1380 Spitfires through the system their Hit/Planes ratio would only be higher than the P-47's true average 5% of the time. Or here is the expected distribution of the number of Lost Tempests vs P-47s in graphical form. When you get to Downed the limits are wider: a much smaller effect, plus some variation in how long early vs late hit planes might take to give up the ghost, since they do two passes, makes this less reliable.

The best calculator I have found for (2) limits sample size to 500 so I scaled the whole sample pro rata.

These limits are slightly wider, but a lot of that is due to the sample size being less than half.

Whatever the limits, the observed averages are the best estimate of the true average. To get a better estimate than these someone would have to run the test mission more than 30 times per plane, which I do not expect to happen. (Especially if it is me!) Alternatively, produce a more controlled test (this one is more of a scenario) to limit the variables. This would still need a large sample.

Edited January 2, 2020 by unreasonable

January 2, 2020

The Mustang has lethal hit zones much more spread out than the P-47, and for this reason it was not a popular aircraft for ground attack, in marked contrast to the P-47. In the P-47 you have about the same lethal hit area as the Tempest, but the engine should be able to take more hits. For this reason Tempest pilots were actually looking forward to the Tempest Mk.II, evidently assuming that a British radial would have similar "taker qualities" to the R-2800.

I think if you match your losses to the respective non-permissive hit areas like fuel tanks, engine, oil tank, radiator, pilot then I would guess the numbers would be even more off.

January 3, 2020

I think the P-47 is way too easily shot down, while the Mustang soaks up a lot of damage. Thanks for taking the time to test - it very much mirrors my experience.

Would you mind running a test on the P-38, too? The amount of getting the controls smashed or structural failure seems exaggerated to me.

I have never read by either side that the airplanes were easy to de-control or easy to make fold up.

January 3, 2020

3 hours ago, Bremspropeller said:

I think the P-47 is way too easily shot down, while the Mustang soaks up a lot of damage. Thanks for taking the time to test - it very much mirrors my experience.

Would you mind running a test on the P-38, too? The amount of getting the controls smashed or structural failure seems exaggerated to me.

I have never read by either side that the airplanes were easy to de-control or easy to make fold up.

I will see if the P-38 can fly the test mission first: if I have to start moving or changing the way points more than a little we are not really comparing like with like.

If it can then I will do a few runs: but not 30!

Edit: It can. It is also getting hit a lot, as you would expect: it is the largest of my zoo, and the tail area is especially huge. Like the P-51, hits making only light damage produce rather limited decals, so I assume this means a fairly high level of structural strength, although I have seen a few missing flaps etc. I will post results from 10 runs in a day or two.

Edited January 3, 2020 by unreasonable

January 4, 2020

Here are the results of the runs with the P-38 for comparison.

Comments:

1) Size: as you can see the P-38 is getting hit a lot, although perhaps not as often as you would expect, just like the P-51. I cannot compare exactly using the same analysis as for the others: I did not save screenshots, so I am not sure exactly what foreshortening effect was in the viewer. Taking a single below, rear, side shot and pixel counting, I get a size comparison of about 2 times the Spitfire. From behind it is nearly 2.5.

2) Criteria: same as for the others: on one occasion a plane had one engine out but no leaks so that counted as Light Damage. As in the P-51 there are often fuel leaks from wing cells that could be survivable in RL. Really the only way to determine if planes could RTB in game from that kind of thing is to let them run for 30 minutes ...

3) Losses: downed is the smallest number so the least reliable but it is in the middle of the pack. I did not see any structural collapses: even three 20mm hits well distributed will not do this.

Random thoughts:

I would expect that if you replaced the guns with K-61s (oh 40mm Bofors where art thou?) you would get structural collapses much more often: all I recall in these runs is a Spitfire losing an outer wing section and going OOC and a 109 losing a tail. I expect there is some sort of trade-off, where against large shells you are better off being small, and against small shells you are better off being large.

From the POV of a staff officer you would have to factor in losses against the potential ordnance delivered, which I have not done.

January 5, 2020

One last chart to get this on the record: compares above results with the two planes I tested in 3.007 with the 20mm flak before the DM changes to make wings stronger and the changes in the gunner AI.

The proportion of planes hit has fallen sharply, and the consequences of being hit are much less severe. Overall, despite some reservations about particular aircraft (P-47!) and DM effects, I find the new DM/AI mix much more plausible than the old one in terms of its overall results.

January 5, 2020

I think in this case the best comparison is made between the Bf109K-4 and G-4, with the aircraft being same size and structurally similar. It's indeed a striking difference, already in terms of hit percentage, but mostly in terms of loss percentage. 6% vs. 20% is quite a drastic change.

January 7, 2020

On 1/5/2020 at 6:09 AM, JtD said:

I think in this case the best comparison is made between the Bf109K-4 and G-4, with the aircraft being same size and structurally similar. It's indeed a striking difference, already in terms of hit percentage, but mostly in terms of loss percentage. 6% vs. 20% is quite a drastic change.

It correlates with my observation that some aircraft, despite being similar in size, are getting hit by AI gunners way more often than others.

All in all it seems to be just another hint that the DM is in need of a serious overhaul.

January 8, 2020

7 hours ago, Operation_Ivy said:

It correlates with my observation that some aircraft, despite being similar in size, are getting hit by AI gunners way more often than others.

All in all it seems to be just another hint that the DM is in need of a serious overhaul.

No it does not!

It shows the effects of the changes to AI in update 4.001:

73. Simple ground vehicles have realistic gun aiming speeds;
75. Simple AI tanks and guns have a delay between initial aiming and opening fire and between destroying a target and engaging another;

Watching the AI during many runs of the test mission these changes are very clear.

In the tests done with the same DM and AI, the number of planes hit correlates extremely well with the size of the aircraft. I would expect that if you counted the number of hits, the correlation would be even closer.

I changed the size index a little from the one given in the table earlier by using a single image for each plane so as to get the P-38 in on a consistent basis. There is no single definitive measure of relative "size" in this context: it depends on the angle from which shells are coming.

Edited January 8, 2020 by unreasonable

January 8, 2020

I misunderstood, that's what you get for not reading charts carefully. However, is there a reason you used the g4 instead of the k4?

It would be interesting to see if there is a difference, even though I consider it unlikely.

January 9, 2020

6 hours ago, Operation_Ivy said:

I misunderstood, that's what you get for not reading charts carefully. However, is there a reason you used the g4 instead of the k4?

It would be interesting to see if there is a difference, even though I consider it unlikely.

No problem - we have all done it, or not read the thread before replying. The G-4 was used in my tests from about 3.007, so before the K-4 was released; the later tests for this post were specifically a BoBP set comparison.

After posting the comparison, I thought it would be worth checking the G-4 in 4.002. Not enough rounds yet to have reasonably stable averages, but the planes hit percent is very close to that of the K-4 so far. The damage is a little different, with G-4s less likely to survive a hit with only Light damage, so slightly higher Downed and Lost (due to fuel leaks). This could just be chance: I will post the results when I have completed my sample.

The K-4 was significantly heavier, but I am not an expert on 109 types: if anyone is, is the K-4 known to be structurally stronger, or in particular have better fuel tank protection?

January 9, 2020

No, it wasn't significantly different as far as I know. There are only slight differences, mostly simplifications with the K-4.

January 9, 2020

Having run the mission 20 times to get a reasonable average for the G-4 here is the updated comparison:

The % of planes hit is very close, as you would expect. Double the samples and I would expect these to converge a bit more. Doing a G-4 to K-4 pixel comparison, the G-4 is about 1.5% smaller on my chosen angle: slightly larger tail and prop vs no tail wheel. Given our margin of error and uncertainty about how prop hits are modeled, they are effectively the same size.

Hard to say if the difference in the Downed figure is significant: on its own perhaps not, but I do think the damage differences are real.

Finally, because I know JtD loves graphs here is the revised hit/size graph with a power fit regression and an arbitrary straight line.
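For anyone wanting to reproduce a power fit of hit rate against size, one common approach is ordinary least squares on the logs of both variables. This is a sketch of that technique under that assumption; whether the spreadsheet used for the graph fits it the same way is not stated in the post, and the data in the example are synthetic, generated from an exact power law, not the test results.

```python
import math


def power_fit(xs, ys):
    """Fit y = a * x**b by least squares on log(x) and log(y). Returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx               # slope in log-log space is the exponent
    a = math.exp(my - b * mx)   # intercept in log-log space is log(a)
    return a, b


# Synthetic check: points generated from y = 3 * x**0.5 should give back a = 3, b = 0.5.
sizes = [1.0, 2.0, 4.0, 8.0]
rates = [3.0 * s ** 0.5 for s in sizes]
print(power_fit(sizes, rates))
```

An exponent b below 1 means the hit rate grows more slowly than size, which is the shape you would expect when larger planes absorb several shells per "plane hit".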

January 9, 2020

Yeah, love'em.

Size and hit probability not being linear might have something to do with fire distribution. The region around the plane is not saturated with fire evenly.

January 9, 2020

I am sure that the total number of shells that hit would be closer to the straight line: P-38s were taking multiple hits much more often, not only from different bursts but from the same burst.

In my old P-47 tests looking at individual aircraft the hits per burst that hit looked like this:

I am sure that if I could do the same for a range of planes we would see a strong size effect. But my old plane-at-a-time test no longer works with the new nerfed gunner AI (which is a relief).

Edited January 9, 2020 by unreasonable

February 14, 2020

@unreasonable.

That is a really interesting test you did!

Thanks for your time!
