NVIDIA’s Fastest Graphics Card Ever: A Look At The Quadro P6000

Date: February 14, 2017
Author(s): Rob Williams

NVIDIA’s latest and greatest-ever workstation graphics card has arrived: Quadro P6000. This top-tier card is built around NVIDIA’s Pascal architecture, which is produced on a 16nm FinFET process. The card boasts an impressive 3,840 CUDA cores, not to mention 24GB of super-fast GDDR5X. Let’s check it out.



Introduction

When NVIDIA released its Pascal GeForce series last spring and delivered downright impressive performance, we knew that the company’s Pascal Quadros were going to be something special. And well, the P6000 in particular does prove to be a very special card indeed, for a multitude of reasons.

Considering the fact that NVIDIA’s Maxwell-based Quadro M6000 shared similar specs with the first-gen GeForce TITAN X, it’s easy to jump to conclusions and assume that the P6000 is spec-comparable to the second-gen TITAN X. Well, the two cards are in fact similar, but NVIDIA managed to cram an additional 256 CUDA cores into the P6000, giving it a slight performance boost and securing its right to bear the title: “Fastest NVIDIA GPU Ever!”

As covered last week, NVIDIA has just fleshed out its entire Pascal-based Quadro lineup, now offering options to fit all budgets. The Quadro P6000 sits proud at the top, and like previous generation top-tier Quadros, the P6000 is priced at around $5,000 USD, with Newegg currently offering it for $5,400.

Despite being available for a couple of months now, the P6000 remains difficult to find at etail. Newegg seems to be an exception here; Amazon doesn’t offer a single Pascal Quadro at the moment. System builders like BOXX do, but with the warning of “extended lead time”. So while the P6000 is undeniably the fastest GPU NVIDIA has ever crafted, it might take a little bit of time to acquire.

Nonetheless, let’s take a harder look at what we’re dealing with:

NVIDIA Pascal Quadro Roundup
Model | Cores | Core MHz | Memory | Mem MHz | Mem Bus | TDP
Quadro GP100 | 3584 (FP32), 1792 (FP64) | TBD | 16GB [1] | TBD | TBD | TBD
Quadro P6000 | 3840 | 1417 | 24GB [2] | 9008 | 384-bit | 250W
Quadro P5000 | 2560 | 1607 | 16GB [2] | 9008 | 256-bit | 180W
Quadro P4000 | 1792 | TBD | 8GB [3] | TBD | TBD | TBD
Quadro P2000 | 1024 | TBD | 5GB [3] | TBD | TBD | TBD
Quadro P1000 | 640 | TBD | 4GB [3] | TBD | TBD | TBD
Quadro P600 | 384 | TBD | 2GB [3] | TBD | TBD | TBD
Quadro P400 | 256 | TBD | 2GB [3] | TBD | TBD | TBD
[1] HBM2; [2] GDDR5X; [3] GDDR5

To address the elephant in the room, the Quadro GP100 is different from the P6000 in its focus (and price; I’d expect the GP100 to cost at least 25% more). The GP100 is unique in that it bundles in dedicated CUDA cores for ultra-fast double-precision floating-point performance. Whereas the P6000 musters ~375 GFLOPS of DP performance, the GP100 stomps that with its 5 TFLOPS.

It’s also worth noting that the GP100 is ideal for those seeking out fast half-precision performance, as it boasts the incredible promise of 20 TFLOPS FP16 (2xFP32). Since the P6000 is a GP102 chip, it doesn’t have the same FP16 scaling, and in fact, the half-precision performance is 1/64 of its FP32 rate, or roughly 187 GFLOPS – yes, half the performance of its FP64 rating.
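The double- and half-precision figures quoted above follow from fixed ratios against the card's FP32 rate. Here is a minimal Python sketch of that arithmetic, assuming the 12 TFLOPS FP32 figure used in this article, a 1/32 FP64 ratio (implied by the ~375 GFLOPS number rather than stated outright), and the 1/64 FP16 ratio mentioned above. The function name is hypothetical.

```python
def derived_gflops(fp32_tflops, ratio):
    """Scale an FP32 rate given in TFLOPS by a precision ratio; returns GFLOPS."""
    return fp32_tflops * 1000 * ratio

P6000_FP32_TFLOPS = 12  # rated FP32 throughput quoted in the article

# Assumed ratios: 1/32 for FP64 (inferred), 1/64 for FP16 (stated in the text).
fp64 = derived_gflops(P6000_FP32_TFLOPS, 1 / 32)
fp16 = derived_gflops(P6000_FP32_TFLOPS, 1 / 64)

print(f"FP64: {fp64:.1f} GFLOPS")
print(f"FP16: {fp16:.1f} GFLOPS")
```

With these inputs, the FP16 result comes out at exactly half the FP64 result, which is the oddity pointed out above.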

That all said, the GP100 is designed almost as a solution for those who require both a high-end Quadro and a high-end Tesla, where not only market-leading compute is needed, but also huge graphics performance.

The P6000 still does have one trick up its sleeve, though, and that’s 256 more CUDA cores than the GP100. That means that for typical Quadro workloads, the P6000 is going to be faster overall. It’s when compute becomes an important requirement that the GP100 should be opted for instead.

The table below helps illustrate the improvements NVIDIA’s made to its top-end Quadro over the past couple of generations. Both the K6000 and M6000 included 12GB of VRAM at launch, although the second-gen M6000 bumped that to 24GB, preemptively matching the P6000. Both single- and double-precision performance have seen significant increases with each new generation, and the same applies to the chip’s complexity.

NVIDIA Quadro Generational Improvements
Model | Process | TDP | FP32 | FP64 | Memory | Transistors
Quadro P6000 | 16nm | 250W | 12 TFLOPS | 375 GFLOPS | 24GB | 12 Billion
Quadro M6000 | 28nm | 250W | 7 TFLOPS | 190 GFLOPS | 12GB | 8 Billion
Quadro K6000 | 28nm | 225W | 5.2 TFLOPS | 173 GFLOPS | 12GB | 7.1 Billion

Like the Quadro M6000, the P6000 includes 4x DisplayPort connectors in addition to a single DVI-D connector. A single card can support: 8K @ 30Hz, 5K @ 60Hz, and 4K @ 60Hz. I am not sure if multiple 8K monitors can be used off of a single card, but NVIDIA does give explicit support for 5K and 4K x 4.

NVIDIA Quadro P6000 Package Contents

PNY’s Quadro P6000 includes 3x DP-to-DVI adapters, a stereo extension card, and in case your power supply doesn’t include an 8-pin connector, a dual 6-pin to 8-pin adapter.

Alongside the Quadro P6000 is an update to another piece of NVIDIA gear: Quadro Sync. With Quadro Sync II, users can combine the efforts of up to four GPUs to make certain that the frames outputted to their displays are in perfect sync. In the vast majority of usage cases where multiple displays (or even windows) are used, an absolute perfect sync might not matter, but there are other use cases – like broadcast – where it’s imperative.

NVIDIA Quadro Sync II Card

Quadro Sync used to be called “G-SYNC”, before that name became a gaming technology in NVIDIA’s GeForce line. Whereas on the gaming side, monitors with G-SYNC technology baked-in are required (along with an NVIDIA graphics card), Quadro Sync II can synchronize frames regardless of the monitor model. The card calls the shots; not the monitors. Tying further into the broadcast example, the Sync II card can also be used to generate a house sync, saving you money if you don’t already own a sync generator (but need one).

Before moving into performance, there are a couple of other quick things to mention. The memory solution on the P6000, and also the P5000, is super-fast GDDR5X, much like it is with NVIDIA’s top-end Pascal gaming cards. On these Quadros, though, users are able to enable ECC mode if it’s needed (or simply desired).

While it hasn’t been covered up to this point, the VR push on the latest Quadros is in overdrive, with NVIDIA trying to prove that VR will be huge in the enterprise space – something I agree with. Over the past year, I’ve experienced a handful of VR demos, some revolving around Iray, and after spending just a few moments with each, it’s not hard to understand what kind of impact VR can have for product or video creation, or even architecture, for that matter. With NVIDIA’s annual GPU Technology Conference set to take place this May, I’m sure we’ll be finding many cool examples of this there.

Performance Testing The Quadro P6000

On the following pages, we’ll be putting NVIDIA’s latest top-end Quadro through a gauntlet of real-world and synthetic tests, utilizing apps from Autodesk, Adobe, SPEC, SiSoftware, and a handful of others (including light gaming tests for good measure).

All tests are run at least twice to produce an accurate result, and if for some reason an odd result creeps up, we do a third run. In the case of this particular review, no tests had to go that route, as most of the benchmarks are very good at delivering similar results with each repeated run.
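The repeat-run policy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3% tolerance for what counts as an "odd" result, the use of the mean as the reported figure, and the `run_benchmark` name are all hypothetical, as the article does not specify them.

```python
from statistics import mean

def run_benchmark(run_once, tolerance=0.03):
    """Run a test twice; add a third run if the first two disagree.

    `run_once` is any callable returning a positive score. The tolerance
    and the choice to report the mean are assumptions, not Techgage's rules.
    """
    results = [run_once(), run_once()]
    spread = abs(results[0] - results[1]) / max(results)
    if spread > tolerance:
        results.append(run_once())  # odd result: do a third run
    return mean(results), results

# Stand-in scores instead of a real benchmark.
consistent = iter([100.0, 101.0])
avg_a, runs_a = run_benchmark(lambda: next(consistent))

odd = iter([100.0, 120.0, 101.0])
avg_b, runs_b = run_benchmark(lambda: next(odd))

print(len(runs_a), "runs, average", avg_a)
print(len(runs_b), "runs, average", avg_b)
```

The first case stops after two runs because the scores agree within the tolerance; the second triggers the extra run.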

Our Windows 7 Ultimate x64 test OS has several key Windows services disabled (Search, Defender, Firewall, and Update), as well as Aero. During all testing, the display is kept at 4K resolution, with two exceptions: SPECapc Maya 2012 and SPECviewperf are run at 1080p. Further, Vsync, G-SYNC, and FreeSync are disabled.

Our test system is as follows:

Techgage Workstation Test System
Processor: Intel Core i7-5960X (8-core; 3GHz)
Motherboard: ASUS X99-DELUXE
Memory: Corsair Vengeance 32GB (8x4GB; DDR3-2133 11-12-11)
Graphics:
  NVIDIA GeForce GTX TITAN X 12GB (GeForce 353.30)
  NVIDIA Quadro P6000 24GB (Quadro 376.62)
  NVIDIA Quadro M6000 12GB (Quadro 352.86)
  NVIDIA Quadro M2000 4GB (Quadro 362.13)
  NVIDIA Quadro K5200 8GB (Quadro 353.30)
  NVIDIA Quadro K5000 4GB (Quadro 353.30)
  AMD Radeon Pro WX 5100 8GB (16.12.1)
  AMD Radeon Pro WX 4100 4GB (16.12.1)
  AMD FirePro W4300 4GB (FirePro 15.201)
Audio: Onboard
Storage: Kingston HyperX 3K 480GB SSD
Power Supply: Cooler Master Silent Pro Hybrid 1300W
Chassis: Cooler Master Storm Trooper
Cooling: Thermaltake WATER3.0 Extreme Liquid
Displays: Acer XB280HK 28″ 4K G-SYNC Monitor
Et cetera: Windows 7 Professional 64-bit

With that all covered, it’s time to jump right into the test results.

Rendering: Autodesk 3ds Max, OctaneBench, LuxMark & Cinebench

Autodesk 3ds Max 2017

Our 3ds Max testing takes advantage of the suite’s latest version, 2017, and with it, we render three complex scenes: the interior of a room and an Audi automobile, both using Iray, and a second room interior, using Iray+.

Because 3ds Max 2017 doesn’t support NVIDIA’s Pascal architecture out-of-the-box, an official Autodesk plugin had to be installed, which conveniently came out just at the start of this month. If you’re using the latest version of 3ds Max and want Pascal support, hit up the Product Updates section in your Autodesk account dashboard. For Iray+, testing was done using the latest version of Lightwork’s plugin (1.30).

[Chart: Autodesk 3ds Max 2017 - Iray Render]

Despite having a huge performance advantage over the Quadro M6000, the P6000 doesn’t decimate that card’s performance quite like I expected. From a pure throughput perspective, the P6000 is about 71% faster than the M6000 (something later benchmarks will agree with), but it proves only about 35% quicker here.
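For reference, the 71% figure falls straight out of the two cards' rated FP32 throughput. A small Python sketch of the comparison, using the 12 and 7 TFLOPS ratings from the generational table and the approximate 35% Iray gain quoted above (the helper name is hypothetical):

```python
def percent_faster(new, old):
    """Return how much faster `new` is than `old`, as a percentage."""
    return (new / old - 1) * 100

# Rated FP32 throughput in TFLOPS, from the article's generational table.
P6000_FP32 = 12
M6000_FP32 = 7

spec_gain = percent_faster(P6000_FP32, M6000_FP32)
observed_gain = 35  # approximate Iray gain quoted in the text

print(f"On-paper gain: {spec_gain:.0f}%")
print(f"Share of on-paper gain realized: {observed_gain / spec_gain:.0%}")
```

In other words, the Iray renders realize roughly half of what the spec sheet suggests is available.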

Given what I’ve seen from Iray scaling in the past, I’m simply led to believe that the plugins are not taking as much advantage of Pascal as they could. I’ll be revisiting Iray performance in a month or two, as I’ll be overhauling our test suite with updated tests (and test OS).

Synthetic: Cinebench & LuxMark

To compare our collection of workstation GPUs across other renderers, we rely on Cinebench and LuxMark. The latter is of particular interest as it renders using OpenCL. It also happens to be so good at what it does that we opt to use it for the sake of generating peak temperature and power information.

[Chart: Cinebench]
[Chart: LuxMark]

In the 3ds Max test at the top of the page, I mentioned that Pascal might not have perfect support for Iray right now, which is a bit of a theme with a brand-new architecture like this. As part of the Pascal launch, a new version of CUDA was also released, and the supporting software has yet to take full advantage of it. With that in mind, you can now probably take one guess why I didn’t include OctaneBench here!

In talking to OTOY, I was told that a new OctaneBench is en route, so once it drops, I’ll integrate it back into our testing.

Cinebench is one benchmark that’s growing long in the tooth, as both the M6000 and P6000 scored the same (likely a CPU bottleneck at this point), despite our gaming tests (coming up) showing great scaling between the two. In LuxMark, a test that exercises GPUs to their fullest, the P6000 dominates, becoming the first graphics card in our arsenal to ever breach 20,000 on the main LuxBall render.

The remaining LuxMark results highlight the fact that the P6000 can deliver even better results when the going gets tough. In the Hotel Lobby render, the most gruelling of the bunch, the P6000 actually manages to deliver a score more than twice that of the M6000. This is a great example of how much faster Pascal-based Quadros can be when they’re properly utilized.

Encoding & CAD: Adobe Premiere Pro CC & Autodesk AutoCAD 2015

Adobe Premiere Pro CC (2017)

To test the accelerated encoding perks of different GPUs, we make use of the de facto video editing tool Adobe Premiere Pro. In the past, we would have included After Effects results, thanks to its ability to tap into CUDA for accelerated rendering of ray traced elements, but recent versions of that app have failed to update support for Maxwell. Instead, Adobe is preferring to target the renderer bundled with PP, Cinema 4D “lite”.

It’s with this testing that I found the P6000 a little “too” powerful, as it simply didn’t exhibit real gains over the M6000, which is silly for a product almost twice as fast. NVIDIA was kind enough to ship over some updated workloads that help a bit with that, including dual RED encodes, and also a megamix of sorts.

[Chart: Adobe Premiere Pro 2017 - GPU-based Video Encodes]

With straightforward video encodes, such as the RED projects here, the gains are small (8~13%). But when the project grows larger and effects are tossed in, the deltas can increase quite a bit, with the P6000 proving about 40% faster than the M6000.

Autodesk AutoCAD 2015

For CAD testing, we’re taking advantage of the excellent Cadalyst benchmark.

[Chart: Autodesk AutoCAD 2015 - Cadalyst 2015]

If there’s one target application NVIDIA wouldn’t point its Quadro P6000 towards, it’s AutoCAD, and the results above can help explain why. Here, the M6000 and P6000 are considered equals, even though the 3D performance of the P6000 in most other tests would say otherwise.

Most CAD use is not going to exhibit huge gains on the 3D front. Even the lowbie Quadro M2000 performs admirably here. Higher-end CAD solutions should show much greater performance enhancements. And speaking of, we have SPECviewperf on the following page to help us see proof of that.

SPEC: SPECapc 3ds Max & Maya, SPECviewperf & SPECwpc

When it comes to benchmarking hardware for serious use cases, there are no better people to turn to than those at SPEC. I like to call them the “masters of benchmarking”, as each one of their tools is meticulously crafted by professionals to deliver results as relevant and accurate as possible – a goal shared by us at Techgage.

For testing the performance of workstation cards, we take advantage of two SPECapc benchmarks – 3ds Max 2015 and Maya 2012 – as well as two that don’t require a standalone application: SPECviewperf and SPECwpc. While the Maya benchmark might be growing a little long in the tooth at this point, it still scales well with current GPUs.

SPECapc 3ds Max 2015

[Chart: SPECapc 3ds Max 2015]
SPECapc 3ds Max 2015
Test | P6000 | M6000 | M2000 | WX 5100
1080p 0xAA (CPU) | 5.88 | 5.89 | 5.88 | 5.90
1080p 4xAA (CPU) | 5.88 | 5.88 | 5.87 | 5.88
1080p 8xAA (CPU) | 5.88 | 5.84 | 5.87 | 5.88
1080p 0xAA (Large Model) | 4.61 | 4.52 | 4.59 | 4.30
1080p 4xAA (Large Model) | 4.58 | 4.53 | 3.64 | 2.82
1080p 8xAA (Large Model) | 4.59 | 4.48 | 3.29 | 2.75
4K 0xAA (CPU) | 5.85 | 5.87 | 5.88 | 5.83
4K 4xAA (CPU) | 5.85 | 5.87 | 5.88 | 5.81
4K 8xAA (CPU) | 5.85 | 5.85 | 5.69
4K 0xAA (Large Model) | 4.61 | 4.50 | 4.59 | 3.88
4K 4xAA (Large Model) | 4.56 | 4.39 | 2.61 | 2.24
4K 8xAA (Large Model) | 4.54 | 3.95 | 2.12
The two 4K 8xAA rows carry only three scores each; which card lacks a result is not indicated.

SPECapc 3ds Max 2015 doesn’t take advantage of NVIDIA’s Iray, so the test gives us a great second look at general performance in the application, from viewport performance to rendering performance. Overall, the P6000 proves dominant, but as we’ve seen a few times already, the P6000 is almost too powerful for certain workloads.

SPECapc Maya 2012

[Chart: SPECapc Maya 2012]
SPECapc Maya 2012
Test | P6000 | M6000 | M2000 | WX 5100
Shaded | 5.02 | 4.71 | 4.09 | 3.58
Shaded HQ | 8.12 | 6.80 | 5.13 | 2.65
Textured | 5.42 | 5.10 | 4.39 | 3.74
Textured HQ | 8.96 | 7.70 | 5.75 | 2.97
Wireframe | 4.36 | 4.02 | 3.66 | 3.48
Selected | 5.85 | 5.46 | 4.72 | 3.94
Highlighted | 5.78 | 5.30 | 4.69 | 4.19

I admit that these results surprised me a bit. I’ve mentioned a couple of times already that some workloads are simply not strong enough to take full advantage of the P6000, but here we have a five-year-old test that manages to show further improvement on NVIDIA’s latest and greatest. It’s not a major gain, but neither was the gain between the Kepler-based K5200 and Maxwell-based M6000.

SPECviewperf 12

Whereas both SPECapc benchmarks used above stress a variety of different components of their respective tools, SPECviewperf’s target is singular: viewport performance. One reason I like this test is that it utilizes software we couldn’t otherwise test with (due to the lack of a license), namely CATIA, SolidWorks, and Siemens NX.

[Chart: SPECviewperf 12]

Here is where we begin to see NVIDIA’s Quadro P6000 show the rest of our lineup just who’s boss. In most of the tests, there are considerable gains seen with the P6000. Whereas an earlier test showed a 30% gain at best, that’s the starting point of the gains here. In particular, the Medical, Energy, and Showcase tests show huge jumps of about 70~75%. The high-end CAD suites CATIA, SolidWorks, and SNX show gains of about 35~40%.

SPECwpc

The “w” in SPECwpc stands for “workstation”, and it acts as a bit of an “overall” testing suite. In some ways, it rolls the goals of SPEC’s other tests into a single benchmark. Thus, the results are split into six categories, and the result of one might matter more to some people than others.

[Chart: SPECwpc]

From the bottom to the top, SPECwpc doesn’t show huge deltas between one card and the next, so the P6000 has a hard time strutting its stuff here. Nonetheless, the card still does give us notable gains in most tests.

Sandra: Processing, Cryptography, Scientific, Financial & Bandwidth

On the previous page, I mentioned that SPEC is an organization that crafts some of the best benchmarks going, and in a similar vein, I can compliment SiSoftware. This is a company that thrives on offering support for certain technologies before those technologies are even available to the consumer. In that regard, its Sandra benchmark might seem a little bleeding-edge, but at the same time, its tests are established, refined, and really accurate across multiple runs.

For the purposes of a workstation graphics card review, we focus on four main tests: general GPU processing, cryptography, financial analysis, and scientific analysis. Some of these tests produce complex results, so those will be displayed in a table rather than a graph.

SiSoftware Sandra

GPU Processing

Sandra 2015 – GPU Processing
Test | P6000 | M6000 | K5200 | M2000
CUDA: Single-Float | 17.38 GPix/s | 9.13 GPix/s | 4.16 GPix/s | 2.48 GPix/s
OpenCL: Single-Float | 15.4 GPix/s | 8.10 GPix/s | 3.37 GPix/s | 2.19 GPix/s
CUDA: Half-Float | 17.26 GPix/s | 9.05 GPix/s | 4.13 GPix/s | 2.47 GPix/s
OpenCL: Half-Float | 15.45 GPix/s | 8.2 GPix/s | 3.39 GPix/s | 2.19 GPix/s
CUDA: Double-Float | 646.59 MPix/s | 344.16 MPix/s | 272.68 MPix/s | 92.89 MPix/s
OpenCL: Double-Float | 646.76 MPix/s | 347.83 MPix/s | 268.22 MPix/s | 185.1 MPix/s
CUDA: Quad-Float | 27.24 MPix/s | 12.69 MPix/s | 11.54 MPix/s | 4 MPix/s
OpenCL: Quad-Float | 25.19 MPix/s | 13.59 MPix/s | 19.62 MPix/s | 8.37 MPix/s
Results in pixels-per-second. 1 GPix = 1,000 MPix; 1 MPix = 1,000 kPix.

In some of the tests on the previous pages, the P6000 has struggled to shine, but Sandra is having none of that. In raw throughput, the P6000 delivers roughly double the performance of the M6000. In some cases, it’s 88% faster, and in the quad-float CUDA test, the P6000 actually manages to be more than twice as fast (114% faster).

Cryptography

[Chart: Sandra - Cryptography (High)]
[Chart: Sandra - Cryptography (Higher)]

The awesome results keep coming for the Quadro P6000. Overall, it’s safe to say that the P6000 is twice as fast where encryption is concerned. I’m not sure of the reason for the specific gain, but CUDA hashing sees dramatic improvement on Pascal. Further testing showed that NVIDIA’s own driver improvements had some hand in these increases, but the architectural boost played the largest role.

Financial Analysis

Sandra 2015 – Financial Analysis (Single Precision)
Test | P6000 | M6000 | K5200 | M2000
CUDA: Black-Scholes | 11.62 G/s | 8.14 G/s | 3.44 G/s | 2.12 G/s
OpenCL: Black-Scholes | 11.54 G/s | 8.10 G/s | 4.49 G/s | 1.58 G/s
CUDA: Binomial | 3 M/s | 1.58 M/s | 676.64 k/s | 445.48 k/s
OpenCL: Binomial | 3.15 M/s | 1.60 M/s | 645.42 k/s | 375.33 k/s
CUDA: Monte Carlo | 6.49 M/s | 3 M/s | 1.20 M/s | 883.6 k/s
OpenCL: Monte Carlo | 6.42 M/s | 2.81 M/s | 1.18 M/s | 756.45 k/s
Results in options-per-second. 1 GOPS = 1,000 MOPS; 1 MOPS = 1,000 kOPS.

Sandra 2015 – Financial Analysis (Double Precision)
Test | P6000 | M6000 | K5200 | M2000
CUDA: Black-Scholes | 1.33 G/s | 700 M/s | 541.32 M/s | 193.91 M/s
OpenCL: Black-Scholes | 1.3 G/s | 691.82 M/s | 533.91 M/s | 235.91 M/s
CUDA: Binomial | 131.83 k/s | 70.32 k/s | 52.55 k/s | 19 k/s
OpenCL: Binomial | 132 k/s | 71.45 k/s | 52.93 k/s | 15.79 k/s
CUDA: Monte Carlo | 272.54 k/s | 147.71 k/s | 112.53 k/s | 40 k/s
OpenCL: Monte Carlo | 272.62 k/s | 147.79 k/s | 112.43 k/s | 35.86 k/s
Results in options-per-second. 1 GOPS = 1,000 MOPS; 1 MOPS = 1,000 kOPS.

The P6000 continues to impress here, with varying degrees of improvement being seen from test to test, but with all of the improvements being substantial. The OpenCL Monte Carlo test, for example, exhibited a 128% performance boost on the P6000, versus the M6000 (which is still a seriously powerful GPU!)
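Because Sandra reports these figures in mixed units (G/s, M/s, and k/s), comparing two cards means normalizing first. A rough Python sketch of how the 128% Monte Carlo figure can be reproduced from the table values. The helper names are hypothetical, and the parser assumes every figure is written as a number, a space, then a unit.

```python
# Multipliers for the unit suffixes used in the tables above.
UNIT_SCALE = {"k/s": 1e3, "M/s": 1e6, "G/s": 1e9}

def to_options_per_second(text):
    """Convert a figure such as '2.81 M/s' to plain options per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]

def percent_gain(new, old):
    """Percentage by which `new` exceeds `old`, after unit normalization."""
    return (to_options_per_second(new) / to_options_per_second(old) - 1) * 100

# OpenCL Monte Carlo, single precision: P6000 vs. M6000.
mc_gain = percent_gain("6.42 M/s", "2.81 M/s")

# CUDA Monte Carlo, single precision: K5200 vs. M2000 (mixed units).
mixed_gain = percent_gain("1.20 M/s", "883.6 k/s")

print(f"P6000 over M6000: {mc_gain:.1f}%")
print(f"K5200 over M2000: {mixed_gain:.1f}%")
```

The second comparison shows why the normalization matters: read without units, 1.20 against 883.6 would look like a loss rather than a gain of about a third.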

Scientific Analysis

Sandra 2015 – Scientific Analysis (Single Precision)
Test | P6000 | M6000 | K5200 | M2000
CUDA: GEMM | 5.53 TFLOPS | 3.2 TFLOPS | 1.1 TFLOPS | 951.73 GFLOPS
OpenCL: GEMM | 6.81 TFLOPS | 3.6 TFLOPS | 1 TFLOPS | 983.37 GFLOPS
CUDA: FFT | 261.88 GFLOPS | 204.3 GFLOPS | 80.8 GFLOPS | 54.77 GFLOPS
OpenCL: FFT | 268.44 GFLOPS | 220.7 GFLOPS | 97.0 GFLOPS | 65.24 GFLOPS
CUDA: NBDY | 5.78 TFLOPS | 2.9 TFLOPS | 1 TFLOPS | 915.53 GFLOPS
OpenCL: NBDY | 5 TFLOPS | 3 TFLOPS | 1 TFLOPS | 601.82 GFLOPS
Results in floating-point operations-per-second. GEMM = General Matrix Multiply; FFT = Fast Fourier Transform; NBDY = N-Body Simulation.

Sandra 2015 – Scientific Analysis (Double Precision)
Test | P6000 | M6000 | K5200 | M2000
CUDA: GEMM | 325 GFLOPS | 175.1 GFLOPS | 147.8 GFLOPS | 48.11 GFLOPS
OpenCL: GEMM | 325.11 GFLOPS | 174.6 GFLOPS | 148.0 GFLOPS | 49.64 GFLOPS
CUDA: FFT | 111.38 GFLOPS | 89.1 GFLOPS | 48.7 GFLOPS | 28 GFLOPS
OpenCL: FFT | 131.79 GFLOPS | 120.3 GFLOPS | 58.6 GFLOPS | 36.16 GFLOPS
CUDA: NBDY | 189.8 GFLOPS | 103.0 GFLOPS | 112.1 GFLOPS | 38.17 GFLOPS
OpenCL: NBDY | 190.25 GFLOPS | 103.6 GFLOPS | 111.9 GFLOPS | 51.18 GFLOPS
Results in floating-point operations-per-second. GEMM = General Matrix Multiply; FFT = Fast Fourier Transform; NBDY = N-Body Simulation.

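To help wrap up our Sandra results, we have more proof that the Quadro P6000 is a really, really fast card. Across the CUDA tests, the worst-case gain over the M6000 is 25% (double-precision FFT) and the best is 99% (single-precision N-Body). The OpenCL FFT results trail a little, with gains of about 22% (single precision) and 10% (double precision).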
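The 25% and 99% figures can be checked against the CUDA rows of the two scientific tables. A short Python sketch, with every value converted to GFLOPS and the larger figure in each pair taken as the P6000's (as the surrounding text implies). The dictionary layout and labels are just for illustration.

```python
# (P6000, M6000) CUDA results in GFLOPS, transcribed from the tables above.
# SP = single precision, DP = double precision; TFLOPS values are x1000.
CUDA_GFLOPS = {
    "SP GEMM": (5530.0, 3200.0),
    "SP FFT": (261.88, 204.3),
    "SP NBDY": (5780.0, 2900.0),
    "DP GEMM": (325.0, 175.1),
    "DP FFT": (111.38, 89.1),
    "DP NBDY": (189.8, 103.0),
}

# Percentage gain of the P6000 over the M6000 for each test.
gains = {name: (p6000 / m6000 - 1) * 100
         for name, (p6000, m6000) in CUDA_GFLOPS.items()}

worst = min(gains, key=gains.get)
best = max(gains, key=gains.get)

print(f"Worst case: {worst} at {gains[worst]:.0f}%")
print(f"Best case: {best} at {gains[best]:.0f}%")
```

FFT is the least responsive workload at both precisions, while the GEMM and N-Body tests land much closer to a doubling.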

Gaming: Futuremark 3DMark & Unigine Heaven

Gaming is generally not a big focus for professional GPU lines, but the fact of the matter is, they can game.

That especially applies to the top-tier cards on the market, as they all perform similarly to the top-tier gaming cards from the same vendor of the same generation.

So what’s the caveat with gaming on workstation cards? A lack of game-specific optimizations.

While on the GeForce or Radeon (non-Pro) side, the companies constantly roll out updates that improve general performance in gaming or performance specific to one title, Quadro and Radeon Pro drivers don’t have such granularity where gaming’s concerned.

To get a quick gauge on the performance of our workstation GPU collection in gaming, we use Futuremark’s 3DMark and Unigine’s Heaven.

[Chart: Futuremark 3DMark]
[Chart: Unigine Heaven]

According to these gaming benchmarks, the Quadro P6000 is about 60~64% faster than the M6000. I would not be surprised if select scenarios would exhibit even greater gains, and this is something I plan to evaluate more in a couple of months as we look to overhaul our test suite.

Power, Temperatures & Final Thoughts

To test workstation graphics cards for both their power consumption and temperature at load, we utilize a couple of different tools. On the hardware side, we use a trusty Kill-a-Watt power monitor which our GPU test machine plugs into directly. For software, we use LuxMark to stress the card, and GPU-Z to record the temperatures.

To test, the area around the chassis is checked with a temperature gun, with the average temp recorded. Once that’s established, the PC is turned on and left to sit idle for five minutes. At this point, we open GPU-Z along with LuxMark. After its initial (automatic) render is complete, we kick off a 15-minute stress test. Following this, we monitor the Kill-a-Watt for a minute to establish peak load wattage.

[Chart: GPU Temperatures]
[Chart: Power Consumption]

These results highlight some improvements I love to see from generation to generation. The Quadro P6000 is just about twice as fast as the M6000, yet it ran 4°C cooler in this stress test. And, if that wasn’t enough, it also drew 32W less at full load. You’ve gotta love progress.

Final Thoughts

When we’re handed the fastest product ever released in a certain product category, drumming up a conclusion isn’t too difficult. Nothing changes here. As it stands today, the Quadro P6000 is the fastest GPU NVIDIA’s ever produced; a 12 TFLOP monster in a single-GPU form-factor. And despite its huge performance, the P6000 draws less power than last-gen’s M6000, and it even runs a bit cooler, to boot.

Leading up to this review, I put a considerable number of hours into benchmarking the P6000, even more than I spent on the M6000. When I tackled the M6000, it came at a time when all of the software I tested with supported NVIDIA’s Maxwell architecture. I didn’t have the same luxury here with Pascal, as OctaneBench’s support for the architecture is coming soon, and the Iray performance I saw didn’t scale as well as I expected it to (even though considerable gains were seen).

As with all workstation graphics cards, understanding your needs and wants is of utmost importance. In some cases, the P6000 isn’t much (or any) faster than the last-gen M6000, despite its performance being spec’d 71% better. In AutoCAD, we saw that there is a definite point of diminishing returns. In 3D tests, the P6000 just about decimated the M6000.

Ultimately, the Quadro P6000 is cutting-edge hardware, and it requires software to catch up, either by way of the plugins or the entire suite. 3ds Max didn’t officially support Pascal until 10 or so days ago, and as mentioned above, OctaneBench (which is built around CUDA) has a new version coming.

Conveniently, I’ll be rebenchmarking our fleet of workstation graphics cards in a month or two, as a dedicated PC is going to be built around their testing (up to this point I’ve used our gaming GPU testbed as a dual-purpose machine), which will also bring us up to speed on the OS front (Windows 7 grew long in the tooth ages ago). At that time, performance gains on Pascal are likely to be notable.

While the gains seen in real-world tests varied, SiSoftware’s Sandra helped bring some sanity to our results madness. With it, we saw dramatic gains in performance in single-, double-, half-, and quad-float, and that carried on over to the more specific financial, scientific, and crypto tests. In some cases, the P6000 performed better than 2x over the M6000, although a year-and-a-half worth of driver improvements helped out a bit with that as well.

At the end of the day, the Quadro P6000 is the fastest Quadro ever created, and in fact the fastest GPU the company’s ever created. NVIDIA has provided the hardware; you just need to provide the software and workloads to take full advantage of this beast.

NVIDIA Quadro P6000 - Techgage Editor's Choice
NVIDIA Quadro P6000

Copyright © 2005-2017 Techgage Networks Inc. - All Rights Reserved.