To deliver a full-featured article for launch, my look at AMD’s Ryzen Threadripper 2990WX and 2950X combined Windows and Linux performance in the same article. As it turns out, that was a mistake, since few people noticed we even had Linux benchmarks, despite there being an obvious demand for them.
Before publication, I debated whether or not I should break Linux performance into its own article, but in this particular case, I opted for the combo because I felt the bigger picture was needed. That’s because in Windows, performance scaling on such a big CPU is hit-or-miss, whereas the Linux kernel seems to handle AMD’s biggest chip without a problem.
I am not going to stand here (or sit) and pretend to understand why the 2990WX doesn’t perform so well in all Windows tests, because getting a clear answer out of anyone is tough. No one wants to pass around the blame, but by all appearances, it looks like the bulk of the problem is Windows. This article exists to not only draw attention to that, but also highlight a bit better what the 2990WX is capable of – if the software in question can take advantage of it.
Instead of copying and pasting previous Linux results into this new article, I retested both the 2990WX and Intel’s Core i9-7980XE with a newer kernel (4.18.5, vs. 4.15.0), and with some extra tests added on. If you want to see more in-depth results involving six other CPUs, I’d recommend checking out the 2990WX launch article.
Almost all of the Linux benchmarking I perform is accomplished through the use of the Phoronix Test Suite, which makes it almost too easy to generate lots of useful results really quickly (well, depending on the hardware, of course). In contrast to the launch article, this one has 13 results total (from 8), representing a few more angles of where a 64-thread CPU can shine if given the appropriate opportunity.
Before testing, I run this command as sudo to enforce the performance power profile:
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
As I mentioned in the 2990WX launch article, running this command on the 2990WX machine kicked back an error saying the file didn’t exist. Interestingly (or is it?), that error no longer exists on the newer kernel, so that’s a plus. At the same time, performance hasn’t changed much between 4.18.5 and 4.15, so the error clearly didn’t mean too much.
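For anyone who wants to confirm that the governor change actually took effect on every core, here is a minimal Python sketch. It assumes the same sysfs layout used in the command above; the function names are hypothetical and the script is not part of the test process described in this article.

```python
from pathlib import Path


def read_governors(root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes a cpufreq directory.

    An empty dict means no scaling_governor files were found, which is the
    situation the "file doesn't exist" error points to.
    """
    governors = {}
    # cpu[0-9]* matches cpu0, cpu1, ... but skips entries such as cpuidle.
    for gov_file in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all report 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())
```

Calling all_performance(read_governors()) after running the command should return True if every logical CPU picked up the profile. An empty result from read_governors() suggests the kernel isn't exposing cpufreq for that machine.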
I admit that before testing, I expected the 2990WX to clobber the top-end of Intel’s stack, but that doesn’t happen. With the ImageMagick compile, Intel matches itself up almost perfectly with the AMD chip that has almost twice the number of cores. The kernel compile test fares a lot better for the 2990WX, but even still, I feel like the performance could scale even better with the right software.
One of the most interesting uses of high-performance hardware is rendering, and believe me, there’s no shortage of renderers that can take advantage of every single core you can give them (either CPU or GPU). With our Blender test, the Cycles renderer is used on only the CPU, and when that’s the case, AMD’s many cores help it overtake Intel’s top-end desktop chip – and pretty easily.
Blender is optimized quite well for these many core workloads, and seeing this performance makes me even more excited for version 2.80 of the suite, since that will introduce heterogeneous rendering (CPU+GPU) capabilities. Based on our previous V-Ray testing, that design could result in a dramatic uptick in rendering performance.
Despite the redundancy, I tested using HandBrake both in Linux and Windows for the launch article, because it’s always interesting (to me) to see the differences in performance for the same test across two different OSes. Going that route turned out to be a blessing, because while I used one version in Linux, I used a different, newer one in Windows – and that version had big issues with Zen.
That version is 1.1.1, which remains the current stable version available on HandBrake’s official website. I used 1.1.1 for Windows testing, which resulted in Zen-based chips taking twice as long for each encode, whereas 1.1.0 in Linux had no such issue, and gave us decent scaling. As it turns out, even newer builds of HandBrake in testing deliver even more performance boosts, and not just for Zen.
For x264, performance between 1.1.0 and the nightly is the same for both AMD and Intel. x265 shakes things up quite a bit, though, delivering improvements on both chips, although Intel doesn’t gain nearly as much as AMD here. The difference is simply incredible, so it goes without saying: if you’re a HandBrake user taking advantage of H.265, you’ll want to grab the latest nightly.
Rendering and ray tracing are two peas in a pod, so it’s no surprise to see some great scaling between the two CPUs here in these tests. I should note that none of these ray tracers are deemed “current”, but some have gone on to be reiterated in newer renderers. That doesn’t mean that the results from these tests are useless, though, because like any good renderer worth its weight in bytes, ray tracers are built to scale, and all three of these tests do that extremely well.
It’s hard to ignore the fact that AMD simply clobbers Intel in the smallpt test, which isn’t unusual ever since Zen dropped last spring. I am not sure how that particular test reflects performance in today’s landscape, but the safer scaling to look at is with Tachyon or C-ray, though as Blender has shown, perfect synthetic scaling doesn’t always lead to perfect real-world scaling. Still, the scaling ability here is clear.
Scaling is the name of the game with scientific tests, so this Rodinia set takes advantage of the 2990WX’s 64 threads no problem. The scaling is better with the LavaMD test than the solver, but a 33% gain in performance at the low-end is hardly anything to scoff at.
AMD is great at cryptography, and that’s easily proven here. That said, this is a test Intel typically cuts through like a hot knife through butter. However, the sheer performance on tap offered by AMD pegs it as the definite winner. With a result like this, it’s clear that Intel could compete extremely well if it decided to break through its 18-core barrier on the desktop (we’re still waiting for that 28-core chip shown off at Computex).
This set of results might be the most interesting in the article. In the launch review, the 2990WX beat out the Intel chip by 2,000 MIPS, but with the kernel upgrade, the performance on AMD has actually dropped. I had to sanity check this one, and it stuck. For some reason, 4.15 provides better performance for this test. That’s not a great reason to use an older kernel, and it could be that the next kernel will fix it once again. This isn’t the last time I’ll be testing the 2990WX, so I’ll test again once 4.20 (or 5.0, if that’s skipped) drops.
For interest’s sake, the same compression test in Windows hits just 55K MIPS on the 2990WX, so even with the performance drop, it still comes far ahead of that OS’ performance here.
This result requires a bit of explanation, because it wouldn’t be fair to pass it off as definitive. AMD beats Intel by about 10% here, which is notable, but that’s partly because AVX-512 wasn’t used. In Windows, SiSoftware’s Sandra will use the most appropriate instruction set, and so on certain CPUs and tests, Intel might come ahead. Only the top-end of Intel’s stack has AVX-512, though, so for a fuller look, I’d recommend hitting up the launch article.
These results also don’t highlight an issue with the 2990WX’s memory design to begin with. Overall bandwidth is fine, but latencies are what make the chip not so ideal for non-intensive purposes – like gaming. Due to time, I have not explored memory on the 2990WX as much as I’d like, and it might be a little bit of time before I can hop on it due to other content needing attention. Ultimately, the 2990WX delivers good bandwidth, but this result paints only part of the picture. That’s why it really pays to know your workload: you shouldn’t jump on a 2990WX over a chip like the 2950X or 7960X/7980XE unless you’re sure your software can take advantage of it.
For this set of results, I decided to opt for relative performance since it was easier to keep them all on the same graph. In the case of Darktable and Hackbench, the results are in seconds, and lower is better. The others each use their own units, but for all of them, higher is better.
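To illustrate how results with mixed units can share one relative graph, here is a small Python sketch. It is an assumption about how such a normalization could be done, not the exact method used for the chart, and the function name is hypothetical. The baseline chip is treated as 100%, and time-based results are inverted so that a bigger number always means faster.

```python
def relative_performance(baseline, contender, lower_is_better=False):
    """Return the contender's result as a percentage of the baseline.

    The output is oriented so that values above 100 always mean the
    contender is faster, whether the raw metric is a time (lower is
    better) or a score or rate (higher is better).
    """
    if baseline <= 0 or contender <= 0:
        raise ValueError("benchmark results must be positive")
    if lower_is_better:
        ratio = baseline / contender   # e.g. seconds: less time is a win
    else:
        ratio = contender / baseline   # e.g. scores: more is a win
    return ratio * 100.0


# Placeholder inputs for illustration only, not measurements from this article.
print(relative_performance(20.0, 10.0, lower_is_better=True))  # half the time
print(relative_performance(100.0, 150.0))                      # 1.5x the score
```

With this orientation, finishing in half the time and scoring twice as high both land at 200%, which is what lets seconds-based and score-based tests sit side by side.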
With Darktable, which is an Adobe Lightroom clone, and IndigoBench, a renderer, the overall gains are minor, but still notable. With Darktable, Intel’s multimedia prowess put it super-close to AMD’s much larger chip. Things change quite dramatically with the other tests, though.
With HPC Challenge, the raw compute power of the 2990WX helps it soar to a 55% gain over Intel’s chip. With HackBench, a benchmark for system calls, AMD again performs extremely well. Even the 44% gain in the chess engine test is impressive. I can’t really see too many chess fans integrating a 2990WX into their play, but how awesome is it that it would actually be taken advantage of?
As you can probably tell, AMD’s 32-core Ryzen Threadripper 2990WX can kick some serious ass in some tests, and still impress in the others. You might have noticed a lack of single-threaded tests here, and that’s because I was focusing on multi-threaded tests to show what happens when the going gets tough. It should be a secret to no one that the single-threaded performance on a 32-core processor is not going to be close to market-leading (single-threaded Windows tests were published in the launch article). That’s not going to impact regular use, but the fact is: you’d be seeing lower performance in scenarios that hinge a lot on single-threaded performance.
This is the perfect example of a product that highlights how important it is to know your workload. There are some applications that just don’t scale that well, even if they appear to. I didn’t have that issue in Linux that I can recall, but did in Windows with a few tests, where even though 100% of the CPU was being used, the gain over half the threads was barely any different.
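One rough way to put a number on "barely any different" is to compare wall-clock times at half the threads and at all of them. The Python sketch below is a generic illustration with hypothetical function names, and it is not a tool used for the testing in this article.

```python
def scaling_gain(time_half_threads, time_all_threads):
    """Speedup from doubling the thread count, from wall-clock seconds.

    2.0 would be perfect scaling; 1.0 means the extra threads bought nothing.
    """
    if time_half_threads <= 0 or time_all_threads <= 0:
        raise ValueError("run times must be positive")
    return time_half_threads / time_all_threads


def scaling_efficiency(time_half_threads, time_all_threads):
    """Fraction of the ideal 2x gain actually achieved (1.0 is perfect)."""
    return scaling_gain(time_half_threads, time_all_threads) / 2.0
```

An application can report 100% CPU usage and still return a gain close to 1.0 here, which is the situation described above.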
With HandBrake here in Linux, not all of the threads will be used, so the 64-thread chip doesn’t show off its strength as well as it maybe could. To get around that, I could have run the FFmpeg test that generates an FPS result, but I am not quite sure how that benchmark is relevant to the real world, as I’ve been unable to see the same kind of scaling elsewhere. The same could also be said about Darktable; the benchmark will use the full CPU, but I’ve never managed to get more than half of the 2990WX’s threads used from real-world testing (but that’s not to say it’s not possible; I’d love feedback).
If this article doesn’t cover the type of performance you were after, please leave a comment and I’ll see if I can tackle it the next time the machine is hooked up (the 7980XE and 2990WX are each in their own dedicated PC, so they are generally handy). You may wish to also check the 2990WX launch article even though it largely focused on Windows, as general performance may still carry over to your Tux solution.
There’s more 2990WX testing to come in time, but other content and launches are going to prevent me from jumping on it too much in the near future. Suggestions of relevant tests are appreciated, as is any feedback in general.