I’m sure there’s something more important that can be written about, but sometimes it’s nice to just let it out and put rant to paper. Well, I’m not sure “rant” is the right word for this, but it will hopefully provide some insight as to how frustrating benchmarking can be when things don’t go according to plan. Especially when what’s wrong could be one of any number of things.
Earlier this week, I wrote about work that’s being done in the lab to automate our benchmark suites, which at this point focuses mostly on workstation GPU and CPU. Gaming GPUs will be automated to some degree, but no kind of testing is as time-consuming as it is for WS GPU and CPU. Our WS GPU suite takes about 8 hours to run on an RTX 2080 Ti, assuming there are no hiccups along the way. Run manually, it could effectively take all day. The CPU suite is likely to take even longer when it’s complete, but if it’s automated, I’m not that concerned.
What is concerning, though, is running into issues that are truly baffling.
Our Intel workstation rig, albeit with a different motherboard today
On the Intel workstation rig, sporting the Core i9-7980XE, I’d regularly blue screen when running the Adobe Premiere Pro test. Right off the bat, the issue made no sense to me, since the entire process of installation and every other benchmark completed without issue. But this crash happened on TITAN Xp, Quadro P6000, P5000, P4000 – you get the idea. The only card it didn’t happen on (up to this point – the OS will be reinstalled when moving to AMD) was P2000, which made me think that it had to do only with higher-end GPUs. But… the P4000 isn’t exactly top-end, and it crashed, too.
On another PC, the same GPUs didn’t crash Premiere Pro at all. And let me stress: PP was the only test or operation that caused a BSOD, and on the cards that were affected, it crashed about 75% of the time. I even tested different memory, and I came close to testing a different SSD, even though it made no sense that it’d be the problem.
Techgage’s Tom Roeder suggested I reseat the NVMe SSD, even though that seemed like an unlikely fix. But I did it, and it made no difference. At this point, I felt like the SSD had gone bad, so I installed a SATA-based SSD instead, and then had an idea. I took the NVMe SSD and put it in a PCIe add-in card, and plugged it in that way. As it turns out, this was the fix. I just didn’t realize a reseat would mean the SSD would be moved entirely.
Our poor Kingston KC1000 NVMe SSD that endured a few dozen BSODs
After I moved the NVMe drive from the internal slot into this add-in card, the BSOD problem disappeared. I went from crashing 75% of the time on most cards to 0% of the time on all cards. Why the issue affected only a single operation on the PC, I’m really not sure, but given I am not dealing with the issue any longer, I can’t help but believe that was the problem.
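As a rough sanity check on that drop from 75% to 0%, here is a minimal sketch of how quickly a streak of clean runs stops looking like luck. It assumes each Premiere Pro run crashes independently at the 75% rate mentioned above; the function names and the 1% cutoff are my own illustration, not part of the actual benchmark suite.

```python
def prob_all_clean(crash_rate: float, runs: int) -> float:
    """Chance that `runs` independent runs all finish without a crash."""
    return (1.0 - crash_rate) ** runs


def runs_needed(crash_rate: float, max_luck: float) -> int:
    """Smallest streak of clean runs whose chance of being luck is <= max_luck."""
    runs = 1
    while prob_all_clean(crash_rate, runs) > max_luck:
        runs += 1
    return runs


if __name__ == "__main__":
    # 0.75 is the crash rate from the post; 0.01 is an assumed cutoff.
    for n in (1, 2, 4, 8):
        print(f"{n} clean run(s) by luck alone: {prob_all_clean(0.75, n):.6f}")
    print("clean runs needed:", runs_needed(0.75, 0.01))
```

Under those assumptions, only a handful of consecutive clean runs per card is enough to make "the problem is still there and I got lucky" very unlikely, which fits with trusting the add-in card as the fix.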
On the motherboard I’m using, ASUS’ ROG STRIX X299-E GAMING, the M.2 slot at the bottom of the board sits under a large heatsink. Long story short, I am pretty sure now that the SSD was never as secure as it should have been, because the heatsink’s thermal pad was likely pulling on the drive. It’s a challenging area to install a drive in, but I’m just glad my second option fixed the issue entirely. I was prepared to replace an SSD, but am now happy that I don’t have to. I guess it’s important to never underestimate the power of reseating.
So there you have it, a mini-rant about one of the many mind-numbing complications that can happen in the lab. With so much stuff on tap, including i9-9900K, Radeon Pro WX 8200, GeForce RTX 2070, and others I don’t want to talk about quite yet, I am hoping that these nonsense issues keep away for a little bit.