NVIDIA’s Fastest Graphics Card Ever: A Look At The Quadro P6000

by Rob Williams on February 14, 2017 in Graphics & Displays

NVIDIA’s latest and greatest-ever workstation graphics card has arrived: Quadro P6000. This top-tier card is built around NVIDIA’s Pascal architecture, which is produced on a 16nm FinFET process. The card boasts an impressive 3,840 CUDA cores, not to mention 24GB of super-fast GDDR5X memory. Let’s check it out.

Sandra: Processing, Cryptography, Scientific, Financial & Bandwidth

On the previous page, I mentioned that SPEC is an organization that crafts some of the best benchmarks going, and in a similar vein, I can compliment SiSoftware. This is a company that thrives on offering support for certain technologies before those technologies are even available to the consumer. In that regard, its Sandra benchmark might seem a little bleeding-edge, but at the same time, its tests are established, refined, and very consistent across multiple runs.

For the purposes of a workstation graphics card review, we focus on four main tests: general GPU processing, cryptography, financial analysis, and scientific analysis. Some of these tests produce complex results, so those will be displayed in a table rather than a graph.

SiSoftware Sandra

GPU Processing

Sandra 2015 – GPU Processing

| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: Single-Float | 17.38 GPix/s | 9.13 GPix/s | 4.16 GPix/s | 2.48 GPix/s |
| OpenCL: Single-Float | 15.4 GPix/s | 8.10 GPix/s | 3.37 GPix/s | 2.19 GPix/s |
| CUDA: Half-Float | 17.26 GPix/s | 9.05 GPix/s | 4.13 GPix/s | 2.47 GPix/s |
| OpenCL: Half-Float | 15.45 GPix/s | 8.2 GPix/s | 3.39 GPix/s | 2.19 GPix/s |
| CUDA: Double-Float | 646.59 MPix/s | 344.16 MPix/s | 272.68 MPix/s | 92.89 MPix/s |
| OpenCL: Double-Float | 646.76 MPix/s | 347.83 MPix/s | 268.22 MPix/s | 185.1 MPix/s |
| CUDA: Quad-Float | 27.24 MPix/s | 12.69 MPix/s | 11.54 MPix/s | 4 MPix/s |
| OpenCL: Quad-Float | 25.19 MPix/s | 13.59 MPix/s | 19.62 MPix/s | 8.37 MPix/s |

Results in pixels-per-second. 1 GPix = 1,000 MPix; 1 MPix = 1,000 kPix.

In some of the tests on the previous pages, the P6000 has struggled to shine, but Sandra is having none of that. In raw throughput, the P6000 delivers roughly double the performance of the M6000. In some cases, it’s 88% faster, and in the quad-float CUDA test, the P6000 actually manages to be more than twice as fast (114% faster).
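For anyone who wants to check those percentages, here is a minimal Python sketch of the arithmetic. The helper names (`to_mpix`, `percent_faster`) are made up for illustration and have nothing to do with Sandra itself; the values are copied from the CUDA rows of the table above.

```python
# Unit multipliers relative to MPix/s, per the table footnote
# (1 GPix = 1,000 MPix; 1 MPix = 1,000 kPix).
UNIT_TO_MPIX = {"kPix/s": 0.001, "MPix/s": 1.0, "GPix/s": 1000.0}


def to_mpix(value, unit):
    """Convert a throughput figure to MPix/s (hypothetical helper)."""
    return value * UNIT_TO_MPIX[unit]


def percent_faster(new, old):
    """Return how much faster `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


# P6000 vs. M6000, CUDA rows of the GPU Processing table.
single = percent_faster(to_mpix(17.38, "GPix/s"), to_mpix(9.13, "GPix/s"))
double = percent_faster(to_mpix(646.59, "MPix/s"), to_mpix(344.16, "MPix/s"))
quad = percent_faster(to_mpix(27.24, "MPix/s"), to_mpix(12.69, "MPix/s"))

print(f"CUDA single-float: {single:.1f}% faster")
print(f"CUDA double-float: {double:.1f}% faster")
print(f"CUDA quad-float:   {quad:.1f}% faster")
```

Converting everything to one unit first matters here, because the table mixes GPix/s and MPix/s from row to row.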

Cryptography

NVIDIA Quadro P6000 - Sandra - Cryptography (High)
NVIDIA Quadro P6000 - Sandra - Cryptography (Higher)

The awesome results keep coming for the Quadro P6000. Overall, it’s safe to say that the P6000 is twice as fast where encryption is concerned. I’m not sure of the reason for the specific gain, but CUDA hashing sees dramatic improvement on Pascal. Further testing showed that NVIDIA’s own driver improvements had some hand in these increases, but the architectural boost played the largest role.

Financial Analysis

Sandra 2015 – Financial Analysis (Single Precision)

| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: Black-Scholes | 11.62 G/s | 8.14 G/s | 3.44 G/s | 2.12 G/s |
| OpenCL: Black-Scholes | 11.54 G/s | 8.10 G/s | 4.49 G/s | 1.58 G/s |
| CUDA: Binomial | 3 M/s | 1.58 M/s | 676.64 k/s | 445.48 k/s |
| OpenCL: Binomial | 3.15 M/s | 1.60 M/s | 645.42 k/s | 375.33 k/s |
| CUDA: Monte Carlo | 6.49 M/s | 3 M/s | 1.20 M/s | 883.6 k/s |
| OpenCL: Monte Carlo | 6.42 M/s | 2.81 M/s | 1.18 M/s | 756.45 k/s |

Results in options-per-second. 1 GOPS = 1,000 MOPS; 1 MOPS = 1,000 kOPS.

Sandra 2015 – Financial Analysis (Double Precision)

| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: Black-Scholes | 1.33 G/s | 700 M/s | 541.32 M/s | 193.91 M/s |
| OpenCL: Black-Scholes | 1.3 G/s | 691.82 M/s | 533.91 M/s | 235.91 M/s |
| CUDA: Binomial | 131.83 k/s | 70.32 k/s | 52.55 k/s | 19 k/s |
| OpenCL: Binomial | 132 k/s | 71.45 k/s | 52.93 k/s | 15.79 k/s |
| CUDA: Monte Carlo | 272.54 k/s | 147.71 k/s | 112.53 k/s | 40 k/s |
| OpenCL: Monte Carlo | 272.62 k/s | 147.79 k/s | 112.43 k/s | 35.86 k/s |

Results in options-per-second. 1 GOPS = 1,000 MOPS; 1 MOPS = 1,000 kOPS.

The P6000 continues to impress here, with the degree of improvement varying from test to test, but with all of the improvements being substantial. The single-precision OpenCL Monte Carlo test, for example, exhibited a 128% performance boost on the P6000 versus the M6000 (which is still a seriously powerful GPU!).

Scientific Analysis

Sandra 2015 – Scientific Analysis (Single Precision)

| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: GEMM | 5.53 TFLOPS | 3.2 TFLOPS | 1.1 TFLOPS | 951.73 GFLOPS |
| OpenCL: GEMM | 6.81 TFLOPS | 3.6 TFLOPS | 1 TFLOPS | 983.37 GFLOPS |
| CUDA: FFT | 261.88 GFLOPS | 204.3 GFLOPS | 80.8 GFLOPS | 54.77 GFLOPS |
| OpenCL: FFT | 268.44 GFLOPS | 220.7 GFLOPS | 97.0 GFLOPS | 65.24 GFLOPS |
| CUDA: NBDY | 5.78 TFLOPS | 2.9 TFLOPS | 1 TFLOPS | 915.53 GFLOPS |
| OpenCL: NBDY | 5 TFLOPS | 3 TFLOPS | 1 TFLOPS | 601.82 GFLOPS |

Results in floating-point operations-per-second. GEMM = General Matrix Multiply; FFT = Fast Fourier Transform; NBDY = N-Body Simulation.

Sandra 2015 – Scientific Analysis (Double Precision)

| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: GEMM | 325 GFLOPS | 175.1 GFLOPS | 147.8 GFLOPS | 48.11 GFLOPS |
| OpenCL: GEMM | 325.11 GFLOPS | 174.6 GFLOPS | 148.0 GFLOPS | 49.64 GFLOPS |
| CUDA: FFT | 111.38 GFLOPS | 89.1 GFLOPS | 48.7 GFLOPS | 28 GFLOPS |
| OpenCL: FFT | 131.79 GFLOPS | 120.3 GFLOPS | 58.6 GFLOPS | 36.16 GFLOPS |
| CUDA: NBDY | 189.8 GFLOPS | 103.0 GFLOPS | 112.1 GFLOPS | 38.17 GFLOPS |
| OpenCL: NBDY | 190.25 GFLOPS | 103.6 GFLOPS | 111.9 GFLOPS | 51.18 GFLOPS |

Results in floating-point operations-per-second. GEMM = General Matrix Multiply; FFT = Fast Fourier Transform; NBDY = N-Body Simulation.

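To wrap up our Sandra results, we have more proof that the Quadro P6000 is a really, really fast card. Across the CUDA tests, the worst case still shows a 25% gain over the M6000 (double-precision FFT), and the best case shows 99% (single-precision N-Body). The OpenCL FFT results are the one soft spot, with gains of roughly 10% to 22%.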
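The 25% and 99% figures line up with the CUDA rows of the two Scientific Analysis tables. As a rough sketch (the dictionary layout and labels are my own, not anything Sandra outputs), the smallest and largest gains can be picked out like this, with TFLOPS values converted to GFLOPS first:

```python
# (P6000, M6000) CUDA results in GFLOPS, copied from the Scientific
# Analysis tables. TFLOPS figures are converted at 1 TFLOPS = 1,000 GFLOPS.
cuda_results = {
    "GEMM (single)": (5530.0, 3200.0),
    "FFT (single)": (261.88, 204.3),
    "NBDY (single)": (5780.0, 2900.0),
    "GEMM (double)": (325.0, 175.1),
    "FFT (double)": (111.38, 89.1),
    "NBDY (double)": (189.8, 103.0),
}

# Percentage gain of the P6000 over the M6000 for each test.
gains = {
    name: (p6000 / m6000 - 1.0) * 100.0
    for name, (p6000, m6000) in cuda_results.items()
}

worst = min(gains, key=gains.get)
best = max(gains, key=gains.get)

print(f"Smallest gain: {worst} at {gains[worst]:.0f}%")
print(f"Largest gain:  {best} at {gains[best]:.0f}%")
```

Swapping in the OpenCL rows would be a one-line change to the dictionary, and it's worth doing, since the FFT numbers there tell a slightly different story.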

Rob Williams

Rob founded Techgage in 2005 to be an 'Advocate of the consumer', focusing on fair reviews and keeping people apprised of news in the tech world, and catering to enthusiasts and businesses alike, from desktop gaming to professional workstations and all the supporting software.
