Pointers in Investigating Random Crashes on a New Build


Posted

Oooookey. How does one open Windows minidump files?
 

My new build is crashing randomly in my racing sim. Sometimes I can run two AI races with 24 AI cars back to back, and sometimes it crashes anywhere from 2 laps in, or while saving a replay after a 14-lap run in offline testing.

 

Not sure if I’m running into a thermal related issue, or a VR related issue, or (the horror) a hardware issue. 

 

The GPU-Z log shows a GPU temp of 63 C and a GPU memory temp of 84 C at the time of the crash. I don't have a log of the CPU temperature at the time of the crash, but while running my driving sim it is usually at 53-55 C, and 28-33 C at idle.
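For anyone wanting to pull the last few readings before a crash out of a sensor log, here is a minimal sketch. It assumes the log is comma-separated text with a header row and space-padded fields, which is my understanding of what GPU-Z writes; the function name and the column names you pass in are placeholders, so check them against your own log's header.

```python
import csv
import io


def last_readings(log_text, wanted, rows=5):
    """Return the last `rows` samples, keeping only columns whose
    header contains one of the strings in `wanted` (case-insensitive)."""
    reader = csv.reader(io.StringIO(log_text))
    header = [h.strip() for h in next(reader)]
    # Indexes of the columns we care about.
    keep = [
        i for i, h in enumerate(header)
        if any(w.lower() in h.lower() for w in wanted)
    ]
    samples = []
    for fields in reader:
        if len(fields) < len(header):
            continue  # skip truncated lines, e.g. a write cut short by the crash
        samples.append({header[i]: fields[i].strip() for i in keep})
    return samples[-rows:]
```

Usage would be something like `last_readings(open(path, encoding="utf-8", errors="replace").read(), ["Date", "Temperature"])`, where `path` points at your own log file. It only shows what was logged; it cannot tell you whether 84 C on the memory is actually the cause.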

 

I ran the CPU stress test in CPU-Z, and it pegged up to 88 C within a minute of the test’s start and I ran it 5-6 minutes before I stopped. 

 

I tried IL-2 yesterday and didn't have an issue, but that might just be coincidental since I've run my driving sim about five times as much as IL-2.
 

I’m hoping it has something to do with VR since the monitor blacks out for a second when I start my G2 (which others reported W11 started doing on their systems very recently)


It’s a brand new install of W11 with:

13900KF with Z790 board, no OC, 360mm AIO

EVGA hybrid 3090,

32GB 3600 DDR4 RAM running with XMP profile I,

NVMe drives (2)

 

WMR and OpenXR on Reverb G2. IL-2 is also on OpenXR. No SteamVR/WMR for SteamVR installed at this time. W11 is up to date with all updates.
 

Any pointers for investigating this are appreciated.

RAAF492SQNOz_Steve
Posted

I would recommend running a stress test on the RAM for several hours to see if it is stable.

Not all RAM chips actually run at advertised speeds.

I have used memtest86 in the past but many other test routines exist.

Posted

Thanks, I’ll try that, but these memory sticks ran in my previous PC for quite some time without any issues.
 

I’ll check the minidump files tonight, but it looks like it might be the BIOS update that Precision X1 forced when I first installed it, which I had forgotten about.

Posted

The first thing I would try is removing the mod used to run OpenXR and seeing how it behaves in SteamVR with no mods.

Posted
Just now, dburne said:

The first thing I would try is removing the mod used to run OpenXR and seeing how it behaves in SteamVR with no mods.

iRacing has a native OpenXR implementation, though.

Posted (edited)

Do you have the latest ME firmware, MEI drivers and chipset software installed? Device manager showing anything unidentified?

 

I run my P-cores/E-cores in adaptive mode, with the load-line calibration setting that gives the most Vdroop, and with C-states on. I also have TVB voltage optimizations on. This lowers the temps quite a bit.

Edited by DBCOOPER011
Posted
1 hour ago, DBCOOPER011 said:

Do you have the latest ME firmware, MEI drivers and chipset software installed? Device manager showing anything unidentified?

 

I run my P-cores/E-cores in adaptive mode, with the load-line calibration setting that gives the most Vdroop, and with C-states on. I also have TVB voltage optimizations on. This lowers the temps quite a bit.

I do have the ME firmware, MEI drivers and chipset updates done. 
 

I may have found what the issue might be, though. I was checking inside the case to see if there was anything obvious I had missed before, and there was. I was thinking maybe a memory module wasn't fully seated, but it’s the case fans, or more precisely their flow direction. I have 9 fans in the case: 5 in total on the AIOs plus 4 more, and I guess I installed all of them as exhaust fans.
 

Just to check, I took the side panel off and ran everything as normal. Not only did it not crash, but when I finished, the GPU, hot spot and GPU memory temperatures were about 9 C lower than before. I didn’t have much time today to investigate further, but I’ll continue like this for a couple of days and swap the case fans’ flow direction on the weekend.

Posted
On 12/6/2022 at 3:42 AM, kissTheSky said:

I have 9 fans in the case, 5 total on the AIOs, 4 more and I guess I installed all of them as exhaust fans.

Good to see you identified this. Intake and exhaust airflow has to be balanced. Another option would be to flip the 5 AIO fans, so the GPU and CPU always take in fresh air, but the RAM and mobo take in warmer air. I really don't know which is best.

 

I am sharing here my experience with PSUs just in case the issue is not solved:  

 

Posted (edited)

Hi, there is a school of thought that favours having more intake fans than exhaust fans, which saturates the case with cool air and forces the hot air out. It may not work so well with AIO cooling, but I get good results with my air-cooled system.

 

My system is all air cooled: 6 case fans plus the CPU fan, with 4 intakes (1 in the bottom, 3 in the front) and 2 exhausts (1 rear, 1 top). I have a 5800X3D and an RTX 4080; under load the CPU is around 55 degrees and the GPU is around 45 degrees, even with a 2850 MHz boost.

 

I also use Argus Monitor to control fan speed curves etc.: https://www.argusmonitor.com/?language=en

Edited by shirazjohn
Posted

Thanks @chiliwili69, and @shirazjohn
 

The PSU is the same 1000W unit that I’ve been using before, so I know it’s good. The new parts in this build are the case, mobo, CPU, CPU AIO, NVMe drives and case fans. The PSU, RAM and GPU are carryovers.
 

I’ll continue looking into it. It may not just be the ventilation. I appreciate the pointers. 

Posted (edited)

It might also be worth looking in your Windows event log filtered by WHEA (Windows Hardware Error Architecture):

[Attached image: image.png]

 

Any errors there point to hardware issues. Usually Event ID 1 is RAM-related and 18 is CPU-related, and the details in entries of the latter point you to which core caused the error.
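If you would rather tally the WHEA entries than scroll through Event Viewer, something along these lines might help. It is only a sketch: it shells out to the built-in `wevtutil` tool, filters on the `Microsoft-Windows-WHEA-Logger` provider, and assumes the text output carries an `Event ID:` line per entry. The function names are my own, and it has to be run on the Windows machine in question.

```python
import re
import subprocess
from collections import Counter

# XPath filter selecting only events written by the WHEA logger.
WHEA_QUERY = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"


def wevtutil_command(max_events=50):
    """Build the wevtutil call: newest events first, plain-text output."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{WHEA_QUERY}", f"/c:{max_events}", "/rd:true", "/f:text",
    ]


def count_event_ids(text):
    """Count occurrences of each Event ID in wevtutil's text output."""
    ids = re.findall(r"^\s*Event ID:\s*(\d+)", text, flags=re.MULTILINE)
    return Counter(int(i) for i in ids)


def query_whea_counts(max_events=50):
    """Run wevtutil (Windows only) and return the Event ID tally."""
    result = subprocess.run(
        wevtutil_command(max_events),
        capture_output=True, text=True, check=True,
    )
    return count_event_ids(result.stdout)
```

Calling `query_whea_counts()` returns a `Counter` keyed by Event ID, so a pile of 18s versus a pile of 1s is visible at a glance. An empty result just means nothing was logged, not that the hardware is proven good.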

 

If that ends up being the case, I believe you can use tools like CoreCycler (which was primarily created as an easy way to tune AMD processors but has nothing AMD-specific to it, since it basically just runs Prime95 scenarios on a single core at a time) to stress each core separately as you adjust its frequency and voltage.

 

On a similar note, a very good stress test utility for RAM is TestMem5; some of its scenarios in particular are quite demanding.

 

Edited by firdimigdi
Posted

Thanks again guys. 
 

Out of frustration, I’ve re-installed all drivers including the chipset drivers, and this time Device Manager threw an error for “missing RAID drivers”. A freshly downloaded Intel RST driver installed without issues, and the missing driver may have been what was causing the crashes.
 

The weird thing is, it didn’t show up as a “missing driver” before. When I was first installing it, though, I stopped the bloatware (Intel Optane software) installation halfway through, after the driver installation, so that may have somehow corrupted the install.
 

I’ll run it like this till the weekend, and if everything is still running ok, I’ll swap fan directions. Haven’t decided if I’ll reverse the AIOs or the case fans, but will be reversing some nonetheless. 

Posted (edited)
On 12/7/2022 at 10:31 AM, shirazjohn said:

there is a school of thought for having more intake fans than exhaust which saturates the case with cool air and forces the hot air out.

 

All this is about heat transfer. All the laws governing that are well known. But there are many factors to take into account: air coolers vs. AIO coolers, sizes and topology, number of fans and airflows, CPU model and load, GPU model and load, PSU design, RAM heat, mobo heat, ambient temperature, max temp for CPU, GPU and RAM, max fan noise, etc.

 

So it is difficult to have a general rule for all. Only a Computational Fluid Dynamics (CFD) simulation would determine the optimum configuration for every case:

 

 

There is a ton of literature on the internet about that, and like everything on the internet it could be very true or very wrong.

 

But having all fans pulling or all fans pushing is definitely wrong.

 

As a general rule, I would say that the best configuration is the one which minimizes the residence time of the air inside the case. This is normally achieved when the total forced airflow in equals the total forced airflow out, where "total forced airflow" is the number of fans multiplied by the airflow capacity of each fan.
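To put numbers on that rule of thumb, here is a small sketch of the sum. The 50 CFM per fan is a made-up figure purely for illustration (real fans differ, and rated airflow drops behind radiators and filters), and the function names are mine.

```python
def total_forced_airflow(fans):
    """Sum airflow over groups of fans given as (count, cfm_per_fan)."""
    return sum(count * cfm for count, cfm in fans)


def airflow_balance(intake, exhaust):
    """Return (airflow in, airflow out, in minus out) in CFM."""
    q_in = total_forced_airflow(intake)
    q_out = total_forced_airflow(exhaust)
    return q_in, q_out, q_in - q_out


# Hypothetical 50 CFM fans. All 9 fans blowing out, as first built:
print(airflow_balance([], [(9, 50.0)]))
# After flipping the 4 case fans to intake, AIO fans still exhausting:
print(airflow_balance([(4, 50.0)], [(5, 50.0)]))
```

A difference near zero is the balanced case described above; a large negative number means the fans are fighting to pull air in through whatever gaps the case has.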

 

I am really missing a software application that would let you configure your own designs and run a true CFD simulation to study the best cooling configuration (and destroy myths and false theories).

Edited by chiliwili69
Posted
5 minutes ago, chiliwili69 said:

All this is about heat transfer. All the laws governing that are well known. But there are many factors to take into account: air coolers vs. AIO coolers, sizes and topology, number of fans and airflows, CPU model and load, GPU model and load, PSU design, RAM heat, mobo heat, ambient temperature, max temp for CPU, GPU and RAM, max fan noise, etc.

I totally agree with you. Since I built my system in 2019 I have tried many different configurations for the cooling (the trial and error approach!). My case is only of average specification (Corsair Carbide Series 200R), and as I don't have any physical drives I decided to remove all the metalwork from the inside to allow better airflow and fit a case fan in the 5.25 drive bay.

 

I'm not too worried about fan noise whilst gaming, as I'm in a room away from everybody else and always use headphones or the G2 off-ear speakers, which cancel out the noise, so I'm quite happy for the fans to run near 100%. When using the PC for everyday use, a good fan curve profile keeps it quiet.

 

The current setup works quite well for me, but as you say most PCs are different, so a lot of experimenting and testing is required.

Posted
On 12/5/2022 at 10:58 PM, RAAF492SQNOz_Steve said:

Not all RAM chips actually run at advertised speeds.

 

With my DDR5 5600 chips the system crashes if set above 4600 MHz.

 

==========================================================================

 

I have built 2 systems now, each with a Corsair water cooler. Perfect regarding temperatures.

The latest models need a hardwired iCUE controller breakout box; I don't need all that showtime lighting shit.

It's a real downer, as you cannot bypass it and use your motherboard's system.

The older versions were better. Far too many wires make a mess of my PC's internals; it looks like a Japanese city now:

 

[Attached image: CityPowercables.jpg]

 

Posted
27 minutes ago, jollyjack said:

 

With my DDR5 5600 chips the system crashes if set above 4600 MHz.

 

==========================================================================

 

I have built 2 systems now, each with a Corsair water cooler. Perfect regarding temperatures.

The latest models need a hardwired iCUE controller breakout box; I don't need all that showtime lighting shit.

It's a real downer, as you cannot bypass it and use your motherboard's system.

The older versions were better. Far too many wires make a mess of my PC's internals; it looks like a Japanese city now:

 

[Attached image: CityPowercables.jpg]

 

Heeey!? Where did you get a photo of the inside of my case?
