NVIDIA Quadro RTX 4000 Review

by Rob Williams on March 16, 2019 in Graphics & Displays

NVIDIA’s Turing-infused Quadro RTX 4000 sets out to be a super-fast performer for its $900 price tag, but it also brings a couple of tricks. Those include RTX-specific features like Tensor and RT cores, which come in addition to architectural enhancements that help the card leap far ahead of the older P4000.

Page 1 – Introduction & Testing References

At last August’s SIGGRAPH in Vancouver, it would have been difficult to walk around and not get a whiff of NVIDIA’s RTX. The company’s own booth was nearly impossible to miss, and others had RTX demos going on as well. That included HP, which showed off NVIDIA’s AI prowess with style transfers. With the promises of real-time ray tracing, it was understandably easy to get excited at the show.

With its Tensor and RT cores, Quadro RTX brings a lot of accelerated computing to the table. It’s suited for deep-learning and AI, and can accelerate real-time ray tracing in applications that support the latest version of NVIDIA’s OptiX engine.

Since the launch of the first three Quadro RTX cards, availability has seemed spotty. Multiple readers informed us that they had to wait longer than expected for their preorders to be fulfilled, and we’re honestly not sure at this point whether availability has improved much since then. The fact that there is now an RTX 4000 may answer that, but we know better than to jump to conclusions.

NVIDIA Quadro RTX 4000 Workstation Graphics Card

The RTX 4000, as this article might suggest, has become the first Quadro RTX to hit our doorstep. It packs a real punch in comparison to the previous-generation Pascal cards, and that’s before we get into the addition of Tensor and RT cores. The Tensors alone dramatically improve FP16 performance for deep-learning work. In this article, we’re going to see how the RTX 4000 compares to the outgoing Quadro P4000, which debuted a couple of years ago at the same general price point.

	NVIDIA’s Quadro Workstation GPU Lineup
	Cores	Base MHz	Peak FP32	Memory	Bandwidth	TDP	Price
GV100	5120	1200	14.9 TFLOPS	32 GB ⁸	870 GB/s	185W	$8,999
RTX 8000	4608	1440	16.3 TFLOPS	48 GB ⁵	624 GB/s	???W	$10,000
RTX 6000	4608	1440	16.3 TFLOPS	24 GB ⁵	624 GB/s	295W	$6,300
RTX 5000	3072	1350	11.2 TFLOPS	16 GB ⁵	448 GB/s	265W	$2,300
RTX 4000	2304	???	7.1 TFLOPS	8 GB ¹	416 GB/s	160W	$900
TITAN V	5120	1200	14.9 TFLOPS	12 GB ⁴	653 GB/s	250W	$2,999
P6000	3840	1417	11.8 TFLOPS	24 GB ⁶	432 GB/s	250W	$4,999
P5000	2560	1607	8.9 TFLOPS	16 GB ⁶	288 GB/s	180W	$1,999
P4000	1792	1227	5.3 TFLOPS	8 GB ³	243 GB/s	105W	$799
P2000	1024	1370	3.0 TFLOPS	5 GB ³	140 GB/s	75W	$399
P1000	640	1354	1.9 TFLOPS	4 GB ³	80 GB/s	47W	$299
P620	512	1354	1.4 TFLOPS	2 GB ³	80 GB/s	40W	$199
P600	384	1354	1.2 TFLOPS	2 GB ³	64 GB/s	40W	$179
P400	256	1070	0.6 TFLOPS	2 GB ³	32 GB/s	30W	$139
Notes	¹ GDDR6; ² GDDR5X; ³ GDDR5; ⁴ HBM2; ⁵ GDDR6 (ECC); ⁶ GDDR5X (ECC); ⁷ GDDR5 (ECC); ⁸ HBM2 (ECC)
	Architecture: P = Pascal; V = Volta; RTX = Turing

We’re not sure of the exact base clock speed of the RTX 4000, but we do know it peaks at 7.1 TFLOPS FP32, which puts it in the same performance category as the GeForce RTX 2070 on the gaming side of the market. Based on our knowledge of that GPU, the RTX 4000 would offer great gameplay at both 1080p and 1440p, and with the right game, 4K could be possible, too.
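As a rough sanity check on the missing clock figure, the quoted peak FP32 number can be turned back into an implied peak (boost) clock. This is only a sketch: it assumes the usual convention that each CUDA core retires one fused multiply-add (two floating-point operations) per clock, the function name is our own, and the result estimates the peak clock rather than the base clock left blank in the table.

```python
def implied_peak_clock_mhz(cores: int, peak_fp32_tflops: float) -> float:
    """Estimate the peak clock (MHz) behind a quoted peak FP32 figure.

    Assumption: 2 FLOPs (one fused multiply-add) per core per clock.
    """
    flops = peak_fp32_tflops * 1e12
    return flops / (2 * cores) / 1e6


# Core counts and peak FP32 figures taken from the lineup table above.
for name, cores, tflops in [("RTX 4000", 2304, 7.1), ("P4000", 1792, 5.3)]:
    clock = implied_peak_clock_mhz(cores, tflops)
    print(f"{name}: ~{clock:.0f} MHz implied peak clock")
```

For the RTX 4000 this works out to roughly 1.54GHz, which is only as precise as the rounded 7.1 TFLOPS figure it is derived from.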

The bigger Quadro RTX cards escalate both the performance and the price just the same, with the top-end $10,000 RTX 8000 offering a staggering 48GB of GDDR6 ECC memory. Should you have heavier memory requirements or the need for better-than-average performance, the RTX 5000 should be a consideration, unless budget constraints act as a roadblock.

The Quadro P4000 is a 5.3 TFLOPS card, so based on that alone, the new RTX 4000 is 34% faster for the same price point. That performance boost hasn’t come without the addition of some watts, but the 160W TDP allows this 4000-series card to remain as a single-slot solution. The card’s power connector is at the end, not the top, which should suit smaller form-factor PCs better.
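The 34% figure is simple spec-sheet arithmetic, and the same calculation can be applied to the TDP. The sketch below uses only the peak FP32 and TDP values from the lineup table; the helper name and dictionary layout are ours, and paper numbers like these say nothing about measured performance or actual power draw.

```python
def percent_uplift(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


# Peak FP32 (TFLOPS) and TDP (W) from the lineup table.
rtx4000 = {"tflops": 7.1, "tdp_w": 160}
p4000 = {"tflops": 5.3, "tdp_w": 105}

fp32_gain = percent_uplift(rtx4000["tflops"], p4000["tflops"])
tdp_gain = percent_uplift(rtx4000["tdp_w"], p4000["tdp_w"])

print(f"Peak FP32 uplift: {fp32_gain:.0f}%")
print(f"TDP increase:     {tdp_gain:.0f}%")
```

On these numbers, peak FP32 rises by about 34% while the rated TDP rises by about 52%.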

Compared to previous generations, there’s more than immediately meets the eye with Quadro RTX. The architecture bump from Pascal to Turing in itself represents a big boost in performance (and efficiency), but the addition of Tensor and RT cores helps set RTX apart from the rest of the market. As covered above, Tensors will prove useful in deep-learning and AI, while the RT cores accelerate real-time ray tracing in applications that support it.

In the table below, we highlight the performance differences between the four currently available Quadro RTX cards. Turing’s extra processors forced NVIDIA to create the “RTX-OPS” performance metric, so the higher the number, the more capable the card is overall.

	NVIDIA’s Quadro RTX Performance
	RT Cores	RTX-OPS	Rays Cast	FP16	INT8	DL TFLOPS
RTX 8000	72	84 T	10 Giga Rays/s	32.6 TFLOPS	206.1 TOPS	130.5 TFLOPS
RTX 6000	72	84 T	10 Giga Rays/s	32.6 TFLOPS	206.1 TOPS	130.5 TFLOPS
RTX 5000	48	62 T	8 Giga Rays/s	22.3 TFLOPS	178.4 TOPS	89.2 TFLOPS
RTX 4000	36	43 T	6 Giga Rays/s	14.2 TFLOPS	28.5 TOPS	57 TFLOPS

On all Pascal-based cards aside from the GP100, both half- and double-precision compute were crippled, with the performance on offer being effectively worthless to those who could have taken advantage of it. With Turing, double-precision is still restricted to the highest-end cards, but the leash has been taken off of FP16, which gives the RTX 4000 14.2 TFLOPS to work with.
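The unlocked half-precision rate can be read straight from the two tables above: dividing each card’s quoted FP16 figure by its peak FP32 figure gives roughly 2:1 across the Quadro RTX range. A minimal sketch of that arithmetic, using only the table values (the dictionary layout and function name are ours):

```python
# Peak FP32 and FP16 figures (TFLOPS) as quoted in the two tables above.
quadro_rtx = {
    "RTX 8000": {"fp32": 16.3, "fp16": 32.6},
    "RTX 6000": {"fp32": 16.3, "fp16": 32.6},
    "RTX 5000": {"fp32": 11.2, "fp16": 22.3},
    "RTX 4000": {"fp32": 7.1, "fp16": 14.2},
}


def fp16_to_fp32_ratio(card: dict) -> float:
    """Ratio of quoted FP16 throughput to quoted peak FP32 throughput."""
    return card["fp16"] / card["fp32"]


ratios = {name: fp16_to_fp32_ratio(specs) for name, specs in quadro_rtx.items()}
for name, ratio in ratios.items():
    print(f"{name}: FP16 is {ratio:.2f}x peak FP32")
```

Every card lands at or very near 2x; the small deviation on the RTX 5000 most likely comes from rounding in the quoted figures.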

With its Vega-based GPUs, AMD has offered unlocked half-precision for a couple of years, but it will still have a difficult time competing in deep-learning thanks to its lack of Tensor cores, or something comparable. AMD has talked about its future GPUs that will include similar technologies, so for now, we wait to see what the red team gets up to.

At the moment, there are a number of suites out there that support NVIDIA RTX technologies, and we’ve only begun to explore some of them. We have benchmarks included in this article that take great advantage of Turing itself, but as for the Tensors and RT cores, further analysis on those will come later.

Test PC & What We Test

On the following pages, you’ll find the results of our workstation GPU test gauntlet. The tests chosen cover a wide range of scenarios, from rendering to compute, and include both synthetic benchmarks and tests with real-world applications from the likes of Adobe and Autodesk.

Seven graphics cards in total have been tested for this article: the six seen in our Radeon VII review from a few weeks ago, plus the Quadro P4000, which acts as a useful comparison to the RTX 4000 that replaces it. Another interesting comparison will be AMD’s Radeon Pro WX 8200, which was released last fall at around the same price point ($999).

Here are the specs of the test machine:

	Techgage Workstation Test System
Processor	Intel Core i9-7980XE (18-core; 2.6GHz)
Motherboard	ASUS ROG STRIX X299-E GAMING
Memory	HyperX FURY (4x16GB; DDR4-2666 16-18-18)
Graphics	AMD Radeon VII (16GB; Jan 22 Press Driver)
	AMD Radeon Pro WX 8200 (8GB; 18.Q4.1)
	NVIDIA GeForce RTX 2080 Ti (11GB; 417.71)
	NVIDIA TITAN Xp (12GB; 417.71)
	NVIDIA Quadro RTX 4000 (8GB; 412.16)
	NVIDIA Quadro P6000 (24GB; 412.16)
	NVIDIA Quadro P4000 (8GB; 412.16)
Audio	Onboard
Storage	Kingston KC1000 960GB M.2 SSD
Power Supply	Corsair 80 Plus Gold AX1200
Chassis	Corsair Carbide 600C Inverted Full-Tower
Cooling	NZXT Kraken X62 AIO Liquid Cooler
Et cetera	Windows 10 Pro build 17763 (1809)
For an in-depth pictorial look at this build, head here.

Benchmark results are categorized and spread across the next four pages. On page 2, Adobe’s Premiere Pro and MAGIX’s Vegas Pro lead our encoding tests, with both AVC and HEVC codecs taken care of. On the same page, Sandra’s financial and scientific performance can be seen, as well as cryptography.

On page 3, a few renderers are taken care of. These include the popular open-source design suite Blender, as well as LuxMark, and Radeon ProRender. For NVIDIA-specific renderers, Redshift, V-Ray, and OctaneRender also make an appearance.

Page 4 is home to viewport performance, covered with the help of SPEC and its SPECviewperf suite. In total, 8 test results are featured here, covering important design suites like CATIA, SolidWorks, Siemens NX, Creo, as well as Autodesk’s 3ds Max and Maya.

Without further ado, let’s get this train moving.

Page List:

1. Introduction & Testing References
2. Adobe Premiere Pro, MAGIX Vegas & Sandra: Crypto, Financial & Scientific
3. V-Ray, Redshift, OctaneRender, Blender, LuxMark & Radeon ProRender
4. Viewport: SolidWorks, CATIA, Siemens NX, PTC Creo, 3ds Max & Maya
5. Final Thoughts

Rob Williams

Rob founded Techgage in 2005 to be an 'Advocate of the consumer', focusing on fair reviews and keeping people apprised of news in the tech world. The site caters to enthusiasts and businesses alike, from desktop gaming to professional workstations and all the supporting software.