At last August’s SIGGRAPH in Vancouver, it would have been difficult to walk around and not get a whiff of NVIDIA’s RTX. The company’s own booth was nearly impossible to miss, and others had RTX demos going on as well. That included HP, which showed off NVIDIA’s AI prowess with style transfers. With the promises of real-time ray tracing, it was understandably easy to get excited at the show.
With its Tensor and RT cores, Quadro RTX brings a lot of accelerated computing to the table. It’s suited for deep-learning and AI, and can accelerate real-time ray tracing in applications that support the latest version of NVIDIA’s OptiX engine.
Since the launch of the first three Quadro RTX cards, availability has seemed spotty. Multiple readers informed us that they had to wait longer than expected for their preorders to be fulfilled, and we’re honestly not sure at this point whether availability has improved much since then. The fact that there is now an RTX 4000 may answer that, but we know better than to jump to conclusions.
The RTX 4000, as this article might suggest, has become the first Quadro RTX to hit our doorstep. It packs a real punch in comparison to the previous-generation Pascal cards, and that’s before we get into the addition of Tensor and RT cores. The Tensor cores alone dramatically improve FP16 performance for deep-learning work. In this article, we’re going to see how the RTX 4000 compares to the outgoing Quadro P4000, which debuted a couple of years ago at the same general price point.
|NVIDIA’s Quadro Workstation GPU Lineup|
|GPU||Cores||Base MHz||Peak FP32||Memory||Bandwidth||TDP||Price|
|GV100||5120||1200||14.9 TFLOPS||32 GB 8||870 GB/s||185W||$8,999|
|RTX 8000||4608||1440||16.3 TFLOPS||48 GB 5||624 GB/s||???W||$10,000|
|RTX 6000||4608||1440||16.3 TFLOPS||24 GB 5||624 GB/s||295W||$6,300|
|RTX 5000||3072||1350||11.2 TFLOPS||16 GB 5||448 GB/s||265W||$2,300|
|RTX 4000||2304||???||7.1 TFLOPS||8 GB 1||416 GB/s||160W||$900|
|TITAN V||5120||1200||14.9 TFLOPS||12 GB 4||653 GB/s||250W||$2,999|
|P6000||3840||1417||11.8 TFLOPS||24 GB 6||432 GB/s||250W||$4,999|
|P5000||2560||1607||8.9 TFLOPS||16 GB 6||288 GB/s||180W||$1,999|
|P4000||1792||1227||5.3 TFLOPS||8 GB 3||243 GB/s||105W||$799|
|P2000||1024||1370||3.0 TFLOPS||5 GB 3||140 GB/s||75W||$399|
|P1000||640||1354||1.9 TFLOPS||4 GB 3||80 GB/s||47W||$299|
|P620||512||1354||1.4 TFLOPS||2 GB 3||80 GB/s||40W||$199|
|P600||384||1354||1.2 TFLOPS||2 GB 3||64 GB/s||40W||$179|
|P400||256||1070||0.6 TFLOPS||2 GB 3||32 GB/s||30W||$139|
We’re not sure of the exact base clock speed of the RTX 4000, but we do know it peaks at 7.1 TFLOPS FP32, which puts it in the same performance category as the GeForce RTX 2070 on the gaming side of the market. Based on our knowledge of that GPU, the RTX 4000 would offer great gameplay at both 1080p and 1440p, and with the right game, 4K could be possible, too.
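The peak figure does at least tell us roughly what clock the card has to reach. Below is a minimal sketch, assuming the usual convention that peak FP32 throughput counts one fused multiply-add (two floating-point operations) per core per clock. The function name is ours, and the result is the clock implied by the peak number (a boost clock), not the base clock listed in the table.

```python
def implied_clock_mhz(peak_tflops: float, cores: int,
                      flops_per_core_per_clock: int = 2) -> float:
    """Clock (MHz) implied by a peak FP32 figure.

    Assumes 2 FLOPs (one FMA) per core per clock, which is an
    assumption of this sketch rather than something stated in the article.
    """
    flops = peak_tflops * 1e12
    hertz = flops / (cores * flops_per_core_per_clock)
    return hertz / 1e6

# Core counts and peak FP32 figures are taken from the lineup table above.
rtx_4000 = implied_clock_mhz(7.1, 2304)
p4000 = implied_clock_mhz(5.3, 1792)

print(f"RTX 4000: ~{rtx_4000:.0f} MHz")
print(f"P4000:    ~{p4000:.0f} MHz")
```

By this estimate, the RTX 4000 needs to run at roughly 1,540 MHz to hit 7.1 TFLOPS. The same arithmetic puts the P4000 near 1,480 MHz, well above its listed 1,227 MHz base, so the estimate says nothing firm about the RTX 4000’s base clock.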
The bigger Quadro RTX cards escalate both the performance and the price just the same, with the ultimate top 10 (thousand dollar) card offering a staggering 48GB of GDDR6 ECC memory. Should you have heavier memory requirements or the need for better-than-average performance, the RTX 5000 should be a consideration, unless budget constraints act as a roadblock.
The Quadro P4000 is a 5.3 TFLOPS card, so based on that alone, the new RTX 4000 is 34% faster for the same price point. That performance boost hasn’t come without the addition of some watts, but the 160W TDP allows this 4000-series card to remain as a single-slot solution. The card’s power connector is at the end, not the top, which should suit smaller form-factor PCs better.
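For anyone who wants to check the math, the 34% figure falls straight out of the two peak FP32 numbers, and the same helper shows how much the TDP grew alongside it. This is a small sketch using only values from the lineup table; the helper name is ours, and these are paper specs, not measured performance.

```python
# Figures from the lineup table above: peak FP32 (TFLOPS) and TDP (W).
cards = {
    "RTX 4000": {"tflops": 7.1, "tdp_w": 160},
    "P4000": {"tflops": 5.3, "tdp_w": 105},
}

def uplift_percent(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0

new, old = cards["RTX 4000"], cards["P4000"]
fp32_gain = uplift_percent(new["tflops"], old["tflops"])
tdp_gain = uplift_percent(new["tdp_w"], old["tdp_w"])

print(f"Peak FP32 uplift: {fp32_gain:.0f}%")
print(f"TDP increase:     {tdp_gain:.0f}%")
```

The same function can be pointed at any two rows of the table to compare other cards on paper.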
Compared to previous generations, there’s more than immediately meets the eye with Quadro RTX. The architecture bump from Pascal to Turing in itself represents a big boost in performance (and efficiency), but the addition of Tensor and RT cores helps set RTX apart from the rest of the market. As covered above, Tensors will prove useful in deep-learning and AI, while the RT cores can be used to take advantage of real-time ray tracing in applications which support it.
In the table below, we highlight the performance differences between the four currently available Quadro RTX cards. Turing’s extra processors forced NVIDIA to create the “RTX-OPS” performance metric, so the higher the number, the more capable the card is overall.
|NVIDIA’s Quadro RTX Performance|
|GPU||RT Cores||RTX-OPS||Rays Cast||FP16||INT8||DL TFLOPS|
|RTX 8000||72||84 T||10 Giga Rays/s||32.6 TFLOPS||206.1 TOPS||130.5 TFLOPS|
|RTX 6000||72||84 T||10 Giga Rays/s||32.6 TFLOPS||206.1 TOPS||130.5 TFLOPS|
|RTX 5000||48||62 T||8 Giga Rays/s||22.3 TFLOPS||178.4 TOPS||89.2 TFLOPS|
|RTX 4000||36||43 T||6 Giga Rays/s||14.2 TFLOPS||28.5 TOPS||57 TFLOPS|
On all Pascal-based cards aside from the GP100, both half- and double-precision compute were crippled, with the performance on offer being all but worthless to those who could have taken advantage of it. With Turing, double-precision is still restricted to the highest-end cards, but the leash has been taken off of FP16, which gives the RTX 4000 14.2 TFLOPS to work with.
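To put a number on that unlocked FP16, the quick sketch below divides each Quadro RTX card’s FP16 figure by its peak FP32 figure. All values come from the two tables above; nothing here is measured.

```python
# (peak FP32, peak FP16) in TFLOPS, taken from the two tables above.
turing = {
    "RTX 8000": (16.3, 32.6),
    "RTX 6000": (16.3, 32.6),
    "RTX 5000": (11.2, 22.3),
    "RTX 4000": (7.1, 14.2),
}

# Ratio of half-precision to single-precision peak throughput per card.
ratios = {name: fp16 / fp32 for name, (fp32, fp16) in turing.items()}

for name, ratio in ratios.items():
    print(f"{name}: FP16 is {ratio:.2f}x FP32")
```

On paper, every card in the Quadro RTX lineup lands at about double its FP32 rate.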
With its Vega-based GPUs, AMD has offered unlocked half-precision for a couple of years, but it will still have a difficult time competing in deep-learning thanks to its lack of Tensor cores, or something comparable. AMD has talked about its future GPUs that will include similar technologies, so for now, we wait to see what the red team gets up to.
At the moment, there are a number of suites out there that support NVIDIA RTX technologies, and we’ve only begun to explore some of them. We have benchmarks included in this article that take great advantage of Turing itself, but as for the Tensors and RT cores, further analysis on those will come later.
Test PC & What We Test
The following pages contain the results of our workstation GPU test gauntlet. The tests chosen cover a wide range of scenarios, from rendering to compute, and include both synthetic benchmarks and tests with real-world applications from the likes of Adobe and Autodesk.
Seven graphics cards in total have been tested for this article: the six seen in our Radeon VII review from a few weeks ago, plus the Quadro P4000, since it acts as a useful comparison to the RTX 4000 that replaces it. Another interesting comparison will be AMD’s Radeon Pro WX 8200, which was released last fall at around the same price point ($999).
Here are the specs of the test machine:
Benchmark results are categorized and spread across the next four pages. On page 2, Adobe’s Premiere Pro and MAGIX’s Vegas Pro lead our encoding tests, with both AVC and HEVC codecs taken care of. On the same page, Sandra’s financial, scientific, and cryptography results can be seen.
On page 3, a few renderers are taken care of. These include the popular open-source design suite Blender, as well as LuxMark, and Radeon ProRender. For NVIDIA-specific renderers, Redshift, V-Ray, and OctaneRender also make an appearance.
Page 4 is home to viewport performance, covered with the help of SPEC and its SPECviewperf suite. In total, 8 test results are featured here, covering important design suites like CATIA, SolidWorks, Siemens NX, and Creo, as well as Autodesk’s 3ds Max and Maya.
Without further ado, let’s get this train moving.