NVIDIA Quadro RTX 4000 Review

by Rob Williams on March 16, 2019 in Graphics & Displays

NVIDIA’s Turing-infused Quadro RTX 4000 sets out to be a super-fast performer for its $900 price tag, but it also brings a couple of tricks. Those include RTX-specific features like Tensor and RT cores, which come in addition to architectural enhancements that help the card leap far ahead of the older P4000.

Page 1 – Introduction & Testing References

At last August’s SIGGRAPH in Vancouver, it would have been difficult to walk around and not get a whiff of NVIDIA’s RTX. The company’s own booth was nearly impossible to miss, and others had RTX demos going on as well. That included HP, which showed off NVIDIA’s AI prowess with style transfers. With the promises of real-time ray tracing, it was understandably easy to get excited at the show.

With its Tensor and RT cores, Quadro RTX brings a lot of accelerated computing to the table. It’s suited for deep-learning and AI work, and its real-time ray tracing can be put to use by applications that support the latest version of NVIDIA’s OptiX engine.

Since the launch of the first three Quadro RTX cards, availability has seemed to be spotty. Multiple readers informed us that they had to wait longer than expected for their preorders to be fulfilled, and we’re honestly not sure at this point if availability has improved that greatly since then. The fact that there is now an RTX 4000 may answer that, but we know better than to jump to conclusions.

NVIDIA Quadro RTX 4000 Workstation Graphics Card

The RTX 4000, as this article might suggest, has become the first Quadro RTX to hit our doorstep. It packs a real punch in comparison to the previous-generation Pascal cards, and that’s before we get into the addition of Tensor and RT cores. The Tensors alone dramatically improve FP16 performance for deep-learning work. In this article, we’re going to see how the RTX 4000 compares to the outgoing Quadro P4000, which debuted a couple of years ago at the same general price point.

NVIDIA’s Quadro Workstation GPU Lineup
Model     Cores  Base MHz  Peak FP32    Memory     Bandwidth  TDP   Price
GV100     5120   1200      14.9 TFLOPS  32 GB [8]  870 GB/s   185W  $8,999
RTX 8000  4608   1440      16.3 TFLOPS  48 GB [5]  624 GB/s   ???W  $10,000
RTX 6000  4608   1440      16.3 TFLOPS  24 GB [5]  624 GB/s   295W  $6,300
RTX 5000  3072   1350      11.2 TFLOPS  16 GB [5]  448 GB/s   265W  $2,300
RTX 4000  2304   ???       7.1 TFLOPS   8 GB [1]   416 GB/s   160W  $900
TITAN V   5120   1200      14.9 TFLOPS  12 GB [4]  653 GB/s   250W  $2,999
P6000     3840   1417      11.8 TFLOPS  24 GB [6]  432 GB/s   250W  $4,999
P5000     2560   1607      8.9 TFLOPS   16 GB [6]  288 GB/s   180W  $1,999
P4000     1792   1227      5.3 TFLOPS   8 GB [3]   243 GB/s   105W  $799
P2000     1024   1370      3.0 TFLOPS   5 GB [3]   140 GB/s   75W   $399
P1000     640    1354      1.9 TFLOPS   4 GB [3]   80 GB/s    47W   $299
P620      512    1354      1.4 TFLOPS   2 GB [3]   80 GB/s    40W   $199
P600      384    1354      1.2 TFLOPS   2 GB [3]   64 GB/s    40W   $179
P400      256    1070      0.6 TFLOPS   2 GB [3]   32 GB/s    30W   $139
Notes: [1] GDDR6; [2] GDDR5X; [3] GDDR5; [4] HBM2;
       [5] GDDR6 (ECC); [6] GDDR5X (ECC); [7] GDDR5 (ECC); [8] HBM2 (ECC)
Architecture: P = Pascal; V = Volta; RTX = Turing

We’re not sure of the exact base clock speed of the RTX 4000, but we do know it peaks at 7.1 TFLOPS FP32, which puts it in the same performance category as the GeForce RTX 2070 on the gaming side of the market. Based on our knowledge of that GPU, the RTX 4000 would offer great gameplay at both 1080p and 1440p, and with the right game, 4K could be possible, too.
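While the base clock is unknown, the quoted peak FP32 figure does imply a peak clock. Here is a minimal sketch of that estimate, assuming the conventional way peak throughput is quoted: each core retires one fused multiply-add per cycle, counted as two floating-point operations. The function name is our own, and the result approximates the peak (boost) clock, not the missing base clock.

```python
def implied_peak_clock_mhz(peak_tflops: float, cores: int) -> float:
    """Clock (MHz) implied by a quoted peak FP32 figure.

    Assumes peak FLOPS = 2 * cores * clock, i.e. one fused
    multiply-add (two operations) per core per cycle.
    """
    flops = peak_tflops * 1e12
    return flops / (2 * cores) / 1e6


# Peak FP32 figures and core counts taken from the lineup table.
cards = {
    "RTX 4000": (7.1, 2304),
    "P4000": (5.3, 1792),
}

for name, (tflops, cores) in cards.items():
    print(f"{name}: ~{implied_peak_clock_mhz(tflops, cores):.0f} MHz")
```

Under that assumption, the RTX 4000 works out to roughly 1.54GHz at peak. The P4000’s implied figure sits well above its listed 1227MHz base clock, which is a reminder that peak TFLOPS ratings reflect boost behavior.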

The bigger Quadro RTX cards scale up both performance and price, with the top $10,000 RTX 8000 offering a staggering 48GB of GDDR6 ECC memory. Should you have heavier memory requirements or the need for better-than-average performance, the RTX 5000 should be a consideration, unless budget constraints act as a roadblock.

The Quadro P4000 is a 5.3 TFLOPS card, so based on that alone, the new RTX 4000 is 34% faster on paper at roughly the same price point. That performance boost hasn’t come without the addition of some watts, but the 160W TDP still allows this 4000-series card to remain a single-slot solution. The card’s power connector is at the end, not the top, which should suit smaller form-factor PCs better.
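The 34% figure comes straight from the peak FP32 numbers in the lineup table. A quick sketch of the arithmetic follows, with the same helper (a name of our own choosing) applied to TDP and list price for context. These are paper specs, not measured results.

```python
def percent_gain(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


# (RTX 4000, P4000) values from the lineup table.
specs = {
    "Peak FP32 (TFLOPS)": (7.1, 5.3),
    "TDP (W)": (160, 105),
    "Price (USD)": (900, 799),
}

for label, (rtx4000, p4000) in specs.items():
    print(f"{label}: {percent_gain(rtx4000, p4000):+.0f}%")
```

Putting the three rows side by side shows how much of the paper gain is paid for with extra power and a slightly higher list price.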

Compared to previous generations, there’s more than immediately meets the eye with Quadro RTX. The architecture bump from Pascal to Turing in itself represents a big boost in performance (and efficiency), but the addition of Tensor and RT cores helps set RTX apart from the rest of the market. As covered above, Tensors will prove useful in deep-learning and AI, while the RT cores can be used to take advantage of real-time ray tracing in applications which support it.

In the table below, we highlight the performance differences between the four currently available Quadro RTX cards. Turing’s extra processors forced NVIDIA to create the “RTX-OPS” performance metric, so the higher the number, the more capable the card is overall.

NVIDIA’s Quadro RTX Performance
Model     RT Cores  RTX-OPS  Rays Cast       FP16         INT8        DL TFLOPS
RTX 8000  72        84 T     10 Giga Rays/s  32.6 TFLOPS  206.1 TOPS  130.5 TFLOPS
RTX 6000  72        84 T     10 Giga Rays/s  32.6 TFLOPS  206.1 TOPS  130.5 TFLOPS
RTX 5000  48        62 T     8 Giga Rays/s   22.3 TFLOPS  178.4 TOPS  89.2 TFLOPS
RTX 4000  36        43 T     6 Giga Rays/s   14.2 TFLOPS  28.5 TOPS   57 TFLOPS

On all Pascal-based cards aside from the GP100, both half- and double-precision compute were crippled, with the performance on offer being all but worthless to those who could have taken advantage of it. With Turing, double-precision is still restricted to the highest-end cards, but the leash has been taken off of FP16, which gives the RTX 4000 14.2 TFLOPS to work with.
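The unlocked FP16 rate can be read off the two tables above: dividing each card’s FP16 figure by its peak FP32 figure gives the half-precision multiplier. Here is a small sketch of that check. The dictionary layout and function name are ours, and the numbers come from the tables.

```python
# (peak FP32 TFLOPS, FP16 TFLOPS) per card, from the two tables.
turing = {
    "RTX 8000": (16.3, 32.6),
    "RTX 6000": (16.3, 32.6),
    "RTX 5000": (11.2, 22.3),
    "RTX 4000": (7.1, 14.2),
}


def fp16_ratio(fp32_tflops: float, fp16_tflops: float) -> float:
    """How many times faster FP16 is than FP32 on paper."""
    return fp16_tflops / fp32_tflops


for name, (fp32, fp16) in turing.items():
    print(f"{name}: FP16 runs at {fp16_ratio(fp32, fp16):.2f}x the FP32 rate")
```

Every Quadro RTX card in the table lands at about twice its FP32 rate, the RTX 4000 included.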

With its Vega-based GPUs, AMD has offered unlocked half-precision for a couple of years, but it will still have a difficult time competing in deep-learning thanks to its lack of Tensor cores, or something comparable. AMD has talked about its future GPUs that will include similar technologies, so for now, we wait to see what the red team gets up to.

At the moment, there are a number of suites out there that support NVIDIA RTX technologies, and we’ve only begun to explore some of them. We have benchmarks included in this article that take great advantage of Turing itself, but as for the Tensors and RT cores, further analysis on those will come later.

Test PC & What We Test

The following pages present the results of our workstation GPU test gauntlet. The tests chosen cover a wide range of scenarios, from rendering to compute, and include both synthetic benchmarks and tests with real-world applications from the likes of Adobe and Autodesk.

Seven graphics cards in total have been tested for this article: the six seen in our Radeon VII review from a few weeks ago, plus the Quadro P4000, since it acts as a useful comparison to the RTX 4000 that replaces it. Another interesting comparison will be AMD’s Radeon Pro WX 8200, which was released last fall at around the same price point ($999).

Here are the specs of the test machine:

Techgage Workstation Test System
Processor     Intel Core i9-7980XE (18-core; 2.6GHz)
Motherboard   ASUS ROG STRIX X299-E GAMING
Memory        HyperX FURY (4x16GB; DDR4-2666 16-18-18)
Graphics      AMD Radeon VII (16GB; Jan 22 Press Driver)
              AMD Radeon Pro WX 8200 (8GB; 18.Q4.1)
              NVIDIA GeForce RTX 2080 Ti (11GB; 417.71)
              NVIDIA TITAN Xp (12GB; 417.71)
              NVIDIA Quadro RTX 4000 (8GB; 412.16)
              NVIDIA Quadro P6000 (24GB; 412.16)
              NVIDIA Quadro P4000 (8GB; 412.16)
Audio         Onboard
Storage       Kingston KC1000 960GB M.2 SSD
Power Supply  Corsair 80 Plus Gold AX1200
Chassis       Corsair Carbide 600C Inverted Full-Tower
Cooling       NZXT Kraken X62 AIO Liquid Cooler
Et cetera     Windows 10 Pro build 17763 (1809)

Benchmark results are categorized and spread across the next four pages. On page 2, Adobe’s Premiere Pro and MAGIX’s Vegas Pro lead our encoding tests, with both AVC and HEVC codecs taken care of. On the same page, Sandra’s financial, scientific, and cryptography performance can be seen.

On page 3, a few renderers are taken care of. These include the popular open-source design suite Blender, as well as LuxMark, and Radeon ProRender. For NVIDIA-specific renderers, Redshift, V-Ray, and OctaneRender also make an appearance.

Page 4 is home to viewport performance, covered with the help of SPEC and its SPECviewperf suite. In total, 8 test results are featured here, covering important design suites like CATIA, SolidWorks, Siemens NX, and Creo, as well as Autodesk’s 3ds Max and Maya.

Without further ado, let’s get this train moving.


Rob Williams

Rob founded Techgage in 2005 to be an 'Advocate of the consumer', focusing on fair reviews and keeping people apprised of news in the tech world. The site caters to enthusiasts and businesses alike, from desktop gaming to professional workstations and all the supporting software.
