Date: December 23, 2020
Author(s): Rob Williams
With Blender 2.91 recently released, as well as a fresh crop of hardware from both AMD and NVIDIA, we’re tackling performance from many different angles here. On tap is rendering with the CPU, GPU, and CPU+GPU, as well as viewport – with wireframe testing making a rare appearance for important reasons. Let’s dig in.
The fourth and final major Blender release of 2020 dropped late last month, and as usual, we couldn’t waste too much time getting around to testing it out. As has become standard fare, unfortunately, we ran into a few issues that complicated our progress, but that has led to there being more stuff to talk about this time around.
As before, this article is going to tackle rendering with GPUs, CPUs, OptiX for NVIDIA GPUs, as well as heterogeneous rendering, combining the forces of CPUs with NVIDIA’s GeForce RTX 3070. On the second page, we’ll be poring over viewport performance, which is a lot more interesting in 2.91 than it has been in previous versions – leading us to even include wireframe performance for the first time.
For a look at all that the Blender 2.91 release brings to the table, we’d suggest heading over to the detailed release page. You can also refer to the even more in-depth release notes. The next time we’ll be revisiting Blender for in-depth testing will be with 2.92, due to release at the end of February.
Here’s a full list of all of the CPUs and GPUs we tested for this article:
CPUs & GPUs Tested in Blender 2.91

CPUs:
AMD Ryzen Threadripper 3990X (64-core; 2.9 GHz; $3,990)
AMD Ryzen Threadripper 3970X (32-core; 3.7 GHz; $1,999)
AMD Ryzen Threadripper 3960X (24-core; 3.8 GHz; $1,399)
AMD Ryzen 9 5950X (16-core; 3.4 GHz; $799)
AMD Ryzen 9 5900X (12-core; 3.7 GHz; $549)
AMD Ryzen 7 5800X (8-core; 3.8 GHz; $449)
AMD Ryzen 5 5600X (6-core; 3.7 GHz; $299)
Intel Core i9-10980XE (18-core; 3.0 GHz; $999)
Intel Core i9-10900K (10-core; 3.7 GHz; $499)
Intel Core i5-10600K (6-core; 3.8 GHz; $269)

GPUs:
AMD Radeon RX 6900 XT (16GB; $999)
AMD Radeon RX 6800 XT (16GB; $649)
AMD Radeon RX 6800 (16GB; $579)
AMD Radeon RX 5700 XT (8GB; $399)
AMD Radeon RX 5700 (8GB; $349)
AMD Radeon RX 5600 XT (6GB; $279)
AMD Radeon RX 5500 XT (8GB; $199)
NVIDIA GeForce RTX 3090 (24GB; $1,499)
NVIDIA GeForce RTX 3080 (10GB; $699)
NVIDIA GeForce RTX 3070 (8GB; $499)
NVIDIA GeForce RTX 3060 Ti (8GB; $399)
NVIDIA TITAN RTX (24GB; $2,499)
NVIDIA GeForce RTX 2080 Ti (11GB; $1,199)
NVIDIA GeForce RTX 2080 SUPER (8GB; $699)
NVIDIA GeForce RTX 2070 SUPER (8GB; $499)
NVIDIA GeForce RTX 2060 SUPER (8GB; $399)
NVIDIA GeForce RTX 2060 (6GB; $349)
NVIDIA GeForce GTX 1660 Ti (6GB; $279)

Test notes:
Motherboard chipset drivers were updated on each platform before testing.
All GPU testing was performed on our Ryzen Threadripper 3970X workstation.
All GPUs were tested with DDR4-3200 8GBx4 Corsair Vengeance RGB Pro.
All CPUs were tested with DDR4-3200 16GBx4 Corsair Dominator Platinum.
AMD Radeon Driver: Adrenalin 20.11.2
NVIDIA GeForce & TITAN Driver: GeForce 457.09 (457.40 for 3060 Ti)
All product links in this table are affiliated, and support the website.
All of our GPU testing was performed on our AMD Ryzen Threadripper workstation PC, with 32GB of Corsair memory. CPU and CPU+GPU testing was performed on each respective platform, with 64GB of memory, also from Corsair. Each platform is largely kept stock, aside from enabling the memory profile and disabling any automatic overclocking features that break the “stock” nature of the processor.
The latest version of Windows, 20H2, was also used, as well as recent GPU drivers. As always, our OS is optimized to reduce bottlenecks as much as possible, and all tests are run at least three times over to ensure accuracy.
The popular BMW Blender project from Mike Pan is one of the lightest we benchmark with across any one of our tested renderers, but it’s become a de facto standard over the years because it doesn’t take long to render, and still manages to show great scaling across product stacks – CPU and GPU alike. Nothing changes in that regard with the latest crop of hardware releases.
From the get-go here, we can see that AMD has a difficult fight on its hands, despite making great strides between the RDNA and RDNA2 generations on the ray tracing front. If NVIDIA’s Ampere didn’t exist, it’d be exciting to see a GPU like the Radeon RX 6800 XT catch up to the GeForce RTX 2080 Ti – but then NVIDIA comes along and releases a $399 GeForce RTX 3060 Ti that outperforms its last-gen $2,499 TITAN RTX.
The CUDA vs. OpenCL performance above doesn’t seem too harsh on AMD’s GPUs, but unfortunately for Team Red, NVIDIA has a trick up its sleeve in the form of OptiX accelerated ray tracing which utilizes Turing’s and Ampere’s dedicated RT cores. When the OptiX API is employed instead of CUDA, we see the 34 second render time of the 3060 Ti drop to 20 seconds:
These results must feel like a sting to AMD, because with OpenCL, the RX 6900 XT renders this project in 40 seconds, whereas the last-gen RTX 2060 hits the same exact performance with OptiX enabled. Another set of useful results to compare generational improvements would be between the last-gen $399 RTX 2060 SUPER and the current-gen RTX 3060 Ti – we see a drop from 33 to 20 seconds in that particular OptiX match-up.
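To put those OptiX gains in percentage terms, here’s a quick back-of-the-envelope calculation in plain Python, using the render times quoted above:

```python
def time_saved(base_s: float, new_s: float) -> float:
    """Percent reduction in render time going from base_s to new_s."""
    return (base_s - new_s) / base_s * 100

# RTX 3060 Ti on BMW: CUDA at 34 s vs. OptiX at 20 s
print(f"OptiX vs. CUDA: {time_saved(34, 20):.0f}% faster")  # ~41% faster
# Gen-on-gen: RTX 2060 SUPER (OptiX, 33 s) vs. RTX 3060 Ti (OptiX, 20 s)
print(f"Gen-on-gen:     {time_saved(33, 20):.0f}% faster")  # ~39% faster
```

In other words, flipping the API switch alone buys back roughly two-fifths of the render time on the same card.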
Here’s the same take with the more advanced Classroom project:
We’re not entirely sure why, but AMD’s Radeon GPUs have historically been pretty strong in the Classroom project compared to their scaling in the BMW one. While not listed, this is one project where the older Radeon VII would post a really strong render time. Once again, if OptiX didn’t exist, this would look great for Radeon, but OptiX once again works wonders here:
An important thing to note is that OptiX hasn’t yet reached feature parity with the CUDA API, so if you’re converting older projects, you may have to adjust the project to render correctly. There’s a short list on Blender’s developer site that shows which features are still missing. At the moment, baking, branched path tracing, CPU + GPU, and bevel support are missing, but the latter two will make it into the 2.92 release. 2.92 will also drop OptiX’s experimental tag, encouraging more folks to use it.
The Controller project up next is one that doesn’t render with OptiX in Blender 2.91 because of the missing bevel support, but it does render in the current 2.92 alpha. We’ll wait until the final release to test before and after, but for now, we can see the CUDA vs. OpenCL performance in 2.91:
Once again, AMD doesn’t perform too badly against NVIDIA’s last-gen competition, but Ampere is a serious force to be reckoned with. Even without the use of RT cores, NVIDIA’s ray traced rendering performance is really strong.
Fortunately for AMD, not all hope is lost: if someone dedicated their entire Blender use to the Eevee rendering engine, the scales change quite a bit. OptiX isn’t supported in Eevee, and while NVIDIA still delivers the stronger performance, the gap is nowhere near the degree of what we saw with Cycles + OptiX:
What’s made clear across all of these results is that you really don’t want to go with a low-end graphics card if you have a lot of work to get rendered. If you run 100 samples per frame, then you could consider the above Eevee performance to represent about 10 seconds of animation.
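To make that concrete, here’s a rough sketch of how per-frame render time adds up across a full animation. The numbers below are hypothetical, not from our charts; your per-frame time depends on the scene, sample count, and GPU:

```python
def animation_render_time_s(clip_seconds: float, fps: int,
                            per_frame_s: float) -> float:
    """Total render time, in seconds, for a clip of clip_seconds at fps."""
    return clip_seconds * fps * per_frame_s

# Hypothetical: a 10-second clip at 24 fps, at 3 s per Eevee frame
total = animation_render_time_s(10, 24, 3)
print(f"{total / 60:.0f} minutes of rendering")  # 240 frames -> 12 minutes
```

Even modest per-frame savings multiply across hundreds of frames, which is why a stronger GPU pays for itself quickly in animation work.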
Another important factor to keep in mind with your GPU is also the frame buffer. 8GB is likely to be suitable for the general audience for the time-being, but it’s hard to predict when it will begin to feel like a real limitation. AMD gives more attractive memory options for the dollar, but that doesn’t matter a great deal when the memory bandwidth is lower, and the cards still underperform vs. the NVIDIA competitors.
An interesting thing about CPU rendering in Blender is that for most of the software’s life, it was the only option. That of course meant that the more powerful the CPU, the faster your render times would be. Fast-forward to today, though, and current GPUs are so fast that they almost make the CPU seem irrelevant in some ways.
Take, for example, the fact that it takes AMD’s 64-core Ryzen Threadripper 3990X to hit a 43 second render time with the BMW project, a value roughly matched by the $649 Radeon RX 6800 XT. However, that ignores NVIDIA’s even stronger performance, allowing the $399 GeForce RTX 3060 Ti to hit 34 seconds with CUDA, or 20 seconds with OptiX. You’re reading that right: NVIDIA’s (currently) lowest-end Ampere GeForce renders these projects as fast or faster than AMD’s biggest CPU.
Because GPUs are so darn fast at rendering, the decision of which CPU you need will be dictated by your other workloads. You can still improve rendering time overall with more powerful CPUs, but the reality is, if you want faster rendering, you’d be better off buying a second GPU and enjoying nearly twice the rendering performance, rather than up to a 10% improvement from a faster CPU. However, having a large pool of system memory would still be to your advantage with more complex scenes, where the GPU’s framebuffer might be limited. Not to mention that rendering is just one part of the 3D design process, since baking, physics, and compositing are still CPU-bound.
Another thing to consider is that faster clock speeds and stronger single-threaded performance in general can speed up interactions in an OS and applications, so to us, a great option for a new high-end workstation would be AMD’s $549 Ryzen 9 5900X or Intel’s Core i9-10900K. That assumes you don’t need quad-channel memory, though – that feature can only be had on AMD’s Threadripper and Intel’s Core X enthusiast platforms.
Speaking of CPU + GPU, let’s see how scaling is impacted when combining each one of these CPUs with NVIDIA’s $499 GeForce RTX 3070:
If it wasn’t made clear from the previous results that the GPU is much quicker at rendering in Blender than even the biggest CPUs, these three charts should seal the deal. The RTX 3070 that was used for testing in these particular tests isn’t the highest-end card going, but it does highlight the fact that multiple GPUs would deliver far greater speed-ups than adding a big CPU to complement the GPU.
All of that being said, if you don’t have a second GPU, but still want a rendering speed-up, it makes sense to enable CUDA’s heterogeneous rendering to speed things up – unless OptiX can prove even quicker.
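For reference, enabling heterogeneous rendering can be scripted rather than clicked through the Preferences UI. The following is a minimal configuration sketch for Blender’s Python console (it assumes a CUDA-capable build; swap in 'OPTIX' once 2.92 adds its CPU + GPU support):

```python
import bpy

# Point Cycles at the CUDA backend.
prefs = bpy.context.preferences.addons['cycles'].preferences
prefs.compute_device_type = 'CUDA'

# Refresh the device list, then enable every detected compute
# device – GPUs and the CPU alike – for CPU+GPU rendering.
prefs.get_devices()
for device in prefs.devices:
    device.use = True

# Tell the scene to render on the enabled (GPU) devices.
bpy.context.scene.cycles.device = 'GPU'
```

This only runs inside Blender itself, but it’s handy for render-farm scripts where the same .blend needs to be rendered with different device combinations.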
On the next page, we’re going to shift topics from rendering to viewport performance.
When we first started testing viewport performance in Blender a few versions ago, we took advantage of the complex Racing Car project that was bundled with the 2.77 release. Back then, that project proved a bit too much for most of our tested GPUs, so we ended up replacing it with the simpler Controller project. Well, as GPUs have become more powerful, it made sense to bring the Racing Car project back, and see how things fare nowadays.
Generally speaking, the Wireframe and Solid viewport modes don’t require strong GPU performance to deliver a good frame rate, but Material Preview and Rendered do. The latter is just as it sounds – a real render inside of the viewport. When in this mode, moving the camera and stopping it will immediately start the render over, allowing you to get decently quick feedback. Material Preview lowers the performance bar a bit by not looking like the final render, but looking close enough to give you the impression you need of how things are coming together.
That all said, we don’t generally test with any viewport mode other than Material Preview, but we have exceptions coming up. First, let’s look at the Controller project at its three resolutions:
It’s important to not take these charts out of context. As we see with gaming, the lower the rendered resolution, the greater the chance the CPU will make top GPUs all perform the same. We get some hint of that here, although there are still some clear differences across the stack. At 4K, the scaling looks the best, as the GPU is put to best use, but at the top end, cards can flip-flop positions with one another. It’s this odd project scaling that can result in something like a 2070 SUPER performing better than a 3090. It’s for this reason that we wanted to re-add the Racing Car project to see if it would give us more linear scaling:
Even with this Racing Car project, the lower resolutions have the tendency to shuffle the top-end of the stack, usually only by a frame or two. With this particular project at 4K, however, that graph gives us the scaling from top-to-bottom we’d expect. Even the RTX 3090 manages to show a strong gain over the RTX 3080 in that test – and the RTX 3080 comes far ahead of the RTX 3070.
To be clear about one thing: whereas 60 FPS is pretty much a necessity in gaming for an enjoyable experience, the nature of manipulating a viewport allows for lower frame rates to feel apt. We wouldn’t recommend a GPU that dips below 20 FPS, however, because you don’t want it to feel outright janky – that only leads to frustration. But if you want a faster viewport, you can just disable objects you’re currently not working on to help improve things.
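On that last point, hiding objects from the viewport can also be scripted. A small sketch for Blender’s Python console, using a hypothetical “Background” naming convention for the heavy objects you aren’t working on:

```python
import bpy

# Hide matching objects from the viewport only; hide_viewport leaves
# the final render untouched, unlike hide_render.
for obj in bpy.data.objects:
    if obj.name.startswith("Background"):  # hypothetical naming scheme
        obj.hide_viewport = True
```

Collections can be excluded the same way, which is usually quicker for scenes organized into layers.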
The most glaring thing about the above performance is that AMD GPUs suffer hard compared to the NVIDIA competition. At 1080p, AMD’s biggest GPU, the RX 6900 XT, hits 34 FPS, whereas NVIDIA’s lower-end RTX 3070 hits 36 FPS at 4K.
Unfortunately for AMD, the problems extend even further than that. For as long as we’ve been testing Blender’s viewport, we also had our scripts benchmark Solid and Wireframe modes. However, because those have largely been a CPU bottleneck, the results are usually unimportant – 60 FPS is almost always easy to hit on even modest GPUs, unless the project is simply massive.
What we found in Blender 2.91, however, is dramatically degraded performance on Radeon GPUs. These next charts will spell that out quite clearly:
The results here are a little strange in many ways, so we’ll try to explain them as simply as possible. First and foremost, our viewport is tested with a simple method of revolving the camera around the center of the project without changing the angle – not up and down and all around, because it has never made a difference to the scaling in our LookDev (Material Preview) mode. However, wireframe behavior is a little different.
On GeForce cards, opening a project fresh, moving to Wireframe mode, and then immediately rotating the camera gives us our ~60 FPS result. However, if the camera is moved around with the mouse, and then the project view reset to our default, that ~60 FPS suddenly becomes ~90 FPS on those same GPUs – just like we see with the TITAN and Quadro cards on top of these charts. We must stress at this point that there is no intrinsic benefit to workstation-class cards here; what you are seeing in the charts is an artifact of our testing method that these workstation cards seem to be immune to.
As unimportant as this truly is, we found that those workstation cards don’t need to have the project camera moved with the mouse to bump the initial ~60 FPS to the higher value from the start. In the future, we’ll need to go about this viewport test differently, if future Blender versions continue to show such degraded performance on certain GPUs. Again, as mentioned above, while we’ve tested Wireframe for quite some time, this is the first occasion where the data has been interesting enough to talk about.
The most important takeaway from these results is that Radeon fares really poorly. People should expect to get great Wireframe performance on even modest GPUs, so what’s the deal here? We’re not quite sure, but we do know one thing: this weak performance is not reflected in 2.90.
An example of that performance detriment can be seen in the shot above. Note that the FPS values seen in those shots are not reflective of our benchmark results, but just what Fraps happened to want to report at the moment we could snap the screenshot. In reality, the 2.90 result was closer to 70 FPS, while the 2.91 one was glued to around 25-27 FPS in our manual testing.
Part of this explains the weak Material Preview performance, as well. In 2.90, we saw the RX 6800 XT hit 44 FPS with the Controller project at 4K, yet that value dropped to 31 FPS in 2.91. Perhaps this is something that Blender itself needs to fix more than AMD – we’re really not sure. Either way, we do know that current viewport performance is really rough on Radeon, and anyone with any one of the models we tested can run 2.90 and 2.91 back-to-back and check for a difference.
When we finally got around to posting our Blender 2.90 performance look, we noted that the biggest reason for the hold-up was the fact that certain Radeon GPUs kept crashing the software. That included the RX 5700 XT, which with five different drivers would fail to render basic projects like BMW and Classroom. Fortunately, that issue has seemingly disappeared in 2.91.
That said, while certain issues are remedied in 2.91, AMD has suffered a bunch of new ones, and we’re not quite sure who’s to blame. It does seem clear one key change was made in 2.91 that greatly impacted Radeon performance, but left GeForce alone. We even tested with the Radeon Pro W5700 to see if the same viewport issues carry over to that series, and sure enough, they do. Clearly, Radeon users are going to want to stick to 2.90 for the time-being.
Even besides those issues, NVIDIA has such a stranglehold on Blender performance right now. The gains its RT cores deliver with Cycles rendering quite literally take things to a new level, and even in Eevee, NVIDIA wins each one of its competitive match-ups. AMD’s fortunate that the performance gap isn’t quite as strong with Eevee, at least.
The reality is this: if you’re a Blender user needing a new GPU, it makes no sense to go with Radeon right now. With 2.90, we had stability issues, and in 2.91, we have noticeable viewport performance issues – and none of those problems have occurred on the NVIDIA side. If you do stick with Radeon, it’s best to use Blender 2.90, because something about 2.91 doesn’t jibe well with AMD’s cards. We’re hopeful that 2.92 will return us to previous viewport performance levels for AMD. A potential alternative is to not use Blender’s built-in rendering engine, Cycles, but instead use AMD’s own Radeon ProRender. This assumes that you’re aware that your scene will look different, and will need new shaders to get things looking close to how they did in Cycles (we can’t do a direct performance comparison between Cycles and RPR because the two engines generate visually different results).
After poring over all of the results here, it seems that NVIDIA’s GeForce RTX 3060 Ti is easily the best value of the bunch. It costs $399 (SRP), and outperforms all of the company’s last-gen top-end parts in Cycles – including the 2080 Ti and TITAN RTX. In Eevee, higher VRAM amounts are really favored, so moving up to the $699 RTX 3080 for that use will reveal even steeper performance gains than you’d expect, even though the VRAM bump is modest.
As for CPUs, it’s really hard to recommend high-core-count chips for Blender use right now when NVIDIA’s GPU performance is so strong. We saw a GeForce RTX 3060 Ti with OptiX outperform AMD’s top-end 64-core Threadripper, so we don’t know what else needs to be said. You should only opt for a many-core CPU if other workloads demand it, or if you want to eke as much performance out of a CPU+GPU combo as possible. However, it’s important to remember that a large pool of memory would still be beneficial in more complex scenes. As we stated earlier, though, your best value would be doubling up on GPUs. You can gain an impression of dual-GPU performance from this recent article.
Copyright © 2005-2021 Techgage Networks Inc. - All Rights Reserved.