NVIDIA TITAN RTX Workstation Performance Review

by Rob Williams on July 24, 2019 in Graphics & Displays

NVIDIA’s TITAN RTX means business – and a lot of it. This jack-of-all-trades graphics card caters to those with serious visual computing needs, whether it be designing and rendering 3D scenes, or poring over repositories of photos or other data with deep-learning work.

Introduction & Testing References

NVIDIA’s TITAN series of graphics cards has been an interesting one since the launch of the original in 2013. That Kepler-based GTX TITAN model peaked at 4.5 TFLOPS single-precision (FP32), performance that was boosted to 5.1 TFLOPS with the release of the TITAN Black the following year.

Fast-forward to the present day, where we now have the TITAN RTX, boasting 16.3 TFLOPS of single-precision, and 32.6 TFLOPS of half-precision (FP16). Double-precision (FP64) used to be standard fare on the earlier TITANs, but today, you’ll need the Volta-based TITAN V for unlocked performance (6.1 TFLOPS), or AMD’s Radeon VII for partially unlocked performance (3.4 TFLOPS).

Lately, half-precision has garnered a lot of attention from the ProViz market, since it’s ideal for use with deep-learning and AI, things that are growing in popularity at a ridiculously quick pace. Add specifically tuned Tensor cores to the mix, and deep-learning performance on Turing becomes truly impressive.

NVIDIA TITAN RTX Graphics Card

Tensor cores are not the only party trick the TITAN RTX has. Like the rest of the RTX line (on both the gaming and pro side), RT cores are present in the TITAN RTX, useful for accelerating real-time ray tracing workloads. The cores need to be specifically supported by developers, using APIs such as DXR and VKRay. While support for NVIDIA’s technology started off tepid, industry support has grown a lot since the original unveiling of RTX at SIGGRAPH last year.

At E3 in June, a handful of games had ray tracing-related announcements, including Watch_Dogs: Legion, Cyberpunk 2077, Call of Duty: Modern Warfare, and of course, Quake II RTX. On the design side, some developers have already released their RTX-accelerated solutions, while many more are in the works. NVIDIA has been talking a lot lately about the Adobes and Autodesks of the world helping to grow the list of RTX-infused software. We wouldn’t be surprised if more RTX goodness were revealed at SIGGRAPH again this year.

For deep-learning, the TITAN RTX’s FP16 throughput is strong on its own, but there are a few perks onboard to help take things to the next level. The Tensor cores handle much of the acceleration, and the ability to use mixed precision is another big part. With it, only the minimal data that needs the extra accuracy is stored in single-precision, while the bulk of the data gets crunched in half-precision. Everything combined, this can boost training performance by 3x over the base GPU.
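To see why mixed precision keeps anything in single-precision at all, here’s a minimal sketch using only Python’s standard library. It is not how any particular deep-learning framework implements the feature; the starting weight, update size, and step count are made-up values, and `to_fp16`, `to_fp32`, and `accumulate` are hypothetical helper names. The point is simply that a tiny update can vanish entirely when a running value is stored in FP16, but survives when it’s stored in FP32.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(weight, update, steps, rounder):
    """Add `update` to `weight` `steps` times, rounding the stored value each step."""
    for _ in range(steps):
        weight = rounder(weight + update)
    return weight


# Illustrative values only (assumptions, not taken from any real training run).
start, update, steps = 1.0, 1e-4, 1000

fp16_only = accumulate(start, update, steps, to_fp16)    # value stored in FP16
fp32_master = accumulate(start, update, steps, to_fp32)  # value stored in FP32

# Near 1.0, adjacent FP16 values are about 0.001 apart, so each 0.0001 update
# rounds straight back to 1.0 and the FP16 copy never moves.
print("FP16 storage:", fp16_only)
# The FP32 copy keeps every update and ends up close to 1.1.
print("FP32 storage:", fp32_master)
```

This is the reason mixed-precision schemes typically hold on to a small amount of higher-precision state while leaving the heavy math to the faster half-precision path.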

NVIDIA's TITAN RTX & GeForce RTX 2080 Ti - Backs

Also notable for Turing is concurrent integer / floating-point operations, which allows games (or software) to execute INT and FP operations in parallel without tripping over each other in the pipeline. NVIDIA has noted in the past that with games like Shadow of the Tomb Raider, a sample set of 100 instructions included 62 FP and 38 INT, and that this concurrent feature directly improves performance as a result.
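The arithmetic behind that claim can be sketched in a few lines. This is an idealized model, not a description of how Turing actually schedules work: it assumes one instruction per issue slot, perfect overlap between the INT and FP pipes, and no dependencies between instructions. The function names are ours, purely for illustration.

```python
def issue_slots_serial(fp_ops, int_ops):
    """Slots needed if INT and FP instructions share one pipe and take turns."""
    return fp_ops + int_ops


def issue_slots_concurrent(fp_ops, int_ops):
    """Slots needed if INT and FP pipes run fully in parallel (best case)."""
    return max(fp_ops, int_ops)


# Instruction mix quoted above for a 100-instruction sample.
fp_ops, int_ops = 62, 38

serial = issue_slots_serial(fp_ops, int_ops)
concurrent = issue_slots_concurrent(fp_ops, int_ops)
ideal_speedup = serial / concurrent

print(f"serial: {serial} slots, concurrent: {concurrent} slots")
print(f"best-case speedup: {ideal_speedup:.2f}x")
```

Under those assumptions, the 38 INT instructions hide entirely behind the 62 FP ones, which puts an upper bound of roughly 1.6x on the gain for this mix. Real workloads will land below that.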

Another important feature of TITAN RTX is its ability to use NVLink, which essentially combines the memory pools of two cards, resulting in a single framebuffer that can be used for the biggest possible projects. Since the sorts of workloads this card targets generally scale very well across GPUs, it’s the true memory pooling that’s going to offer the greatest benefit here. Gaming content that can take advantage of multi-GPU would also see a benefit with two cards and this connector.

Because it’s a feature exclusive to these RTX GPUs right now, it’s worth mentioning that NVIDIA also bundles a VirtualLink port at the back, allowing you to plug in your HMD for VR, or in the worst case, use it as a full-powered USB-C port, either for data transfer or phone charging.

With all of that covered, let’s take a quick look at the overall current NVIDIA workstation stack:

NVIDIA’s Quadro & TITAN Workstation GPU Lineup
Model | Cores | Base MHz | Peak FP32 | Memory | Bandwidth | TDP | Price
GV100 | 5120 | 1200 | 14.9 TFLOPS | 32 GB [8] | 870 GB/s | 185W | $8,999
RTX 8000 | 4608 | 1440 | 16.3 TFLOPS | 48 GB [5] | 624 GB/s | ???W | $5,500
RTX 6000 | 4608 | 1440 | 16.3 TFLOPS | 24 GB [5] | 624 GB/s | 295W | $4,000
RTX 5000 | 3072 | 1350 | 11.2 TFLOPS | 16 GB [5] | 448 GB/s | 265W | $2,300
RTX 4000 | 2304 | 1005 | 7.1 TFLOPS | 8 GB [1] | 416 GB/s | 160W | $900
TITAN RTX | 4608 | 1350 | 16.3 TFLOPS | 24 GB [1] | 672 GB/s | 280W | $2,499
TITAN V | 5120 | 1200 | 14.9 TFLOPS | 12 GB [4] | 653 GB/s | 250W | $2,999
P6000 | 3840 | 1417 | 11.8 TFLOPS | 24 GB [6] | 432 GB/s | 250W | $4,999
P5000 | 2560 | 1607 | 8.9 TFLOPS | 16 GB [6] | 288 GB/s | 180W | $1,999
P4000 | 1792 | 1227 | 5.3 TFLOPS | 8 GB [3] | 243 GB/s | 105W | $799
P2000 | 1024 | 1370 | 3.0 TFLOPS | 5 GB [3] | 140 GB/s | 75W | $399
P1000 | 640 | 1354 | 1.9 TFLOPS | 4 GB [3] | 80 GB/s | 47W | $299
P620 | 512 | 1354 | 1.4 TFLOPS | 2 GB [3] | 80 GB/s | 40W | $199
P600 | 384 | 1354 | 1.2 TFLOPS | 2 GB [3] | 64 GB/s | 40W | $179
P400 | 256 | 1070 | 0.6 TFLOPS | 2 GB [3] | 32 GB/s | 30W | $139
Notes: [1] GDDR6; [2] GDDR5X; [3] GDDR5; [4] HBM2; [5] GDDR6 (ECC); [6] GDDR5X (ECC); [7] GDDR5 (ECC); [8] HBM2 (ECC)
Architecture: P = Pascal; V = Volta; RTX = Turing

The TITAN RTX matches the Quadro RTX 6000 and 8000 for having the highest number of cores in the Turing lineup. NVIDIA says the TITAN RTX is about 3 TFLOPS faster in FP32 than the RTX 2080 Ti, and fortunately, we have results for both cards covering a wide range of tests to see how they compare.
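For anyone wondering where peak FP32 figures like these come from, the usual back-of-the-envelope method is cores × clock × 2, since each core can retire one fused multiply-add (two floating-point operations) per clock. The sketch below applies that to the TITAN RTX’s core count and base clock from the table. The RTX 2080 Ti’s 4,352 cores and 1,545 MHz boost clock are reference specs we’re assuming here, as they don’t appear in the table, and the helper names are our own.

```python
def peak_fp32_tflops(cuda_cores, clock_mhz):
    """Theoretical FP32 throughput: 2 FLOPs (one FMA) per core per clock."""
    return 2 * cuda_cores * clock_mhz * 1e6 / 1e12


def implied_clock_mhz(tflops, cuda_cores):
    """Clock speed that a quoted TFLOPS figure implies for a given core count."""
    return tflops * 1e12 / (2 * cuda_cores) / 1e6


# TITAN RTX, using the table's core count and base clock.
titan_at_base = peak_fp32_tflops(4608, 1350)

# The quoted 16.3 TFLOPS peak implies a clock well above base.
titan_implied_clock = implied_clock_mhz(16.3, 4608)

# Assumption: RTX 2080 Ti reference specs (4352 cores, 1545 MHz boost).
rtx_2080_ti_peak = peak_fp32_tflops(4352, 1545)
gap = 16.3 - rtx_2080_ti_peak

print(f"TITAN RTX at base clock: {titan_at_base:.2f} TFLOPS")
print(f"Clock implied by 16.3 TFLOPS: {titan_implied_clock:.0f} MHz")
print(f"Gap to RTX 2080 Ti: {gap:.2f} TFLOPS")
```

The takeaway is that the “Peak FP32” column reflects boost clocks rather than the base clocks listed beside it, and that the gap to the RTX 2080 Ti works out to a little under 3 TFLOPS under these assumptions.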

What’s not seen in the specs table above is the actual performance of the ray tracing and deep-learning components. This next table helps clear some of that up:

NVIDIA’s Quadro & TITAN – RTX Performance
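Model | RT Cores | RTX-OPS | Rays Cast [1] | FP16 [2] | INT8 [3] | Deep-learning [2]
TITAN RTX | 72 | 84 T | 11 | 32.6 | 206.1 | 130.5
RTX 8000 | 72 | 84 T | 10 | 32.6 | 206.1 | 130.5
RTX 6000 | 72 | 84 T | 10 | 32.6 | 206.1 | 130.5
RTX 5000 | 48 | 62 T | 8 | 22.3 | 178.4 | 89.2
RTX 4000 | 36 | 43 T | 6 | 14.2 | 28.5 | 57
Notes: [1] Giga Rays/s; [2] TFLOPS; [3] TOPS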

You’ll notice that the TITAN RTX has a higher “rays cast” spec than the top Quadros, which might be thanks to higher clocks. The other specs are identical across the top three GPUs, with obvious downgrading taking place as we move downward. The Quadro RTX 4000 (roughly a GeForce RTX 2070 equivalent) is currently the lowest-end current-gen Quadro from NVIDIA. Again, SIGGRAPH is almost upon us, so it could be that NVIDIA will have a hardware surprise in store; perhaps an RTX 2060 Quadro equivalent.
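One pattern worth pulling out of the RTX performance table is how the “Deep-learning” column relates to the plain FP16 column. The quick check below uses only the table’s own figures. We’re observing a ratio here, not claiming to know how NVIDIA derives its numbers, and the dictionary layout is just our own convenience.

```python
# (FP16 TFLOPS, Deep-learning TFLOPS) as listed in the table above.
table = {
    "TITAN RTX": (32.6, 130.5),
    "RTX 5000": (22.3, 89.2),
    "RTX 4000": (14.2, 57.0),
}

# Ratio of Tensor-accelerated throughput to standard FP16 throughput.
ratios = {name: dl / fp16 for name, (fp16, dl) in table.items()}

for name, ratio in ratios.items():
    print(f"{name}: deep-learning figure is {ratio:.1f}x the FP16 figure")
```

Across the stack, the deep-learning figure lands at roughly four times the standard FP16 rate, which gives a sense of how much of the work the Tensor cores are expected to shoulder.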

When the RTX 2080 Ti already offers so much performance, who exactly is the TITAN RTX for? NVIDIA is targeting it largely at researchers, but it secondarily acts as one of the fastest ProViz cards on the market. It will appeal to those who want the fastest GPU solution going, not to mention a huge 24GB framebuffer. 24GB might be a bit much for a lot of current visualization work, but with deep-learning, 24GB provides a lot of breathing room.

Despite all it offers, TITAN RTX can’t be called an “ultimate” solution for ProViz, since it lacks some of the optimizations that true Quadro GPUs have. That means in certain high-end design suites like Siemens NX, a true Quadro might prove better. But if you don’t use any workloads that benefit from those specific enhancements, the TITAN RTX is going to be quite attractive given its feature-set (and that framebuffer!). If you’re ever confused about optimizations in your software of choice, please leave a comment!

A couple of years ago, NVIDIA decided to give some love to the TITAN series with driver enhancements that bring some parity between TITAN and Quadro. We can now say that TITAN RTX enjoys the same kind of performance boosts that the TITAN Xp did two years ago, something that will be reflected in some of the graphs ahead.

Test PC & What We Test

On the following pages, you’ll find the results of our workstation GPU test gauntlet. The tests chosen cover a wide range of scenarios, from rendering to compute, and include both synthetic benchmarks and tests with real-world applications from the likes of Adobe and Autodesk.

Nineteen graphics cards have been tested for this article, with the list dominated by Quadro and Radeon Pro workstation cards. There’s a healthy sprinkling of gaming cards in there as well, however, to show you any possible optimization that may be taking place on one or the other.

Please note that the testing for this article was conducted a couple of months ago, before an onslaught of travel and product launches. Graphics card drivers released since our testing might improve performance in certain cases, but we wouldn’t expect any notable changes, having sanity-checked a bunch of the software we usually test on both AMD and NVIDIA GPUs. Likewise, the previous version of Windows was used for this particular testing, but that also didn’t reveal any disadvantages when we sanity-checked in 1903.

In recent months, we’ve spent a lot of time polishing our test suites, and also our internal testing scripts. We’re currently in the process of rebenchmarking a number of GPUs for an upcoming look at ProViz performance with cards from both AMD’s Radeon RX 5700 and NVIDIA’s GeForce SUPER series. Fortunately, results from those cards don’t really eat into a top-end card like the TITAN RTX, so tardiness hasn’t foiled us this time.

The specs of our test rig are seen below:

Techgage Workstation Test System
Processor: Intel Core i9-9980XE (18-core; 3.0GHz)
Motherboard: ASUS ROG STRIX X299-E GAMING
Memory: HyperX FURY (4x16GB; DDR4-2666 16-18-18)
Graphics:
  AMD Radeon VII (16GB)
  AMD Radeon RX Vega 64 (8GB)
  AMD Radeon RX 590 (8GB)
  AMD Radeon Pro WX 8200 (8GB)
  AMD Radeon Pro WX 7100 (8GB)
  AMD Radeon Pro WX 5100 (8GB)
  AMD Radeon Pro WX 4100 (4GB)
  AMD Radeon Pro WX 3100 (4GB)
  NVIDIA TITAN RTX (24GB)
  NVIDIA TITAN Xp (12GB)
  NVIDIA GeForce RTX 2080 Ti (11GB)
  NVIDIA GeForce RTX 2060 (6GB)
  NVIDIA GeForce GTX 1080 Ti (11GB)
  NVIDIA GeForce GTX 1660 Ti (6GB)
  NVIDIA Quadro RTX 4000 (8GB)
  NVIDIA Quadro P6000 (24GB)
  NVIDIA Quadro P5000 (16GB)
  NVIDIA Quadro P4000 (8GB)
  NVIDIA Quadro P2000 (5GB)
Audio: Onboard
Storage: Kingston KC1000 960GB M.2 SSD
Power Supply: Corsair 80 Plus Gold AX1200
Chassis: Corsair Carbide 600C Inverted Full-Tower
Cooling: NZXT Kraken X62 AIO Liquid Cooler
Et cetera: Windows 10 Pro build 17763 (1809)
Drivers:
  AMD Radeon: Adrenalin 19.4.1
  AMD Radeon Pro: Enterprise 19.Q1.2
  NVIDIA GeForce & TITAN: Creative Ready 419.67
  NVIDIA Quadro: Quadro 419.67

Our benchmark results are categorized and spread across the following six pages. On page 2, we’re looking at some CUDA-based renderers, including V-Ray, Redshift, OctaneRender, and Arnold GPU. Some of these will add support for non-CUDA GPUs in time, and when that happens, we’ll introduce the tests to our Radeon benchmarking suite. Page 3 includes a number of more neutral renderers, like Blender, Radeon ProRender, and LuxMark.

On page 4, we’re tackling encoding with the help of Adobe’s Premiere Pro and MAGIX’s Vegas Pro, while page 5 is home to viewport performance, largely covered with the help of SPECviewperf. In total, 8 test results are featured here, covering important design suites like CATIA, SolidWorks, Siemens NX, Creo, as well as Autodesk’s 3ds Max and Maya. Our own Blender viewport test wraps the page up. And speaking of wrapping things up, page 6 covers mathematical performance with the help of SiSoftware’s Sandra.

And with all of that covered, let’s get on with things:

Rob Williams

Rob founded Techgage in 2005 to be an 'Advocate of the consumer', focusing on fair reviews and keeping people apprised of news in the tech world. The site caters to enthusiasts and businesses alike, covering everything from desktop gaming to professional workstations and all the supporting software.
