by Rob Williams on February 14, 2017 in Graphics & Displays
NVIDIA’s latest and greatest-ever workstation graphics card has arrived: Quadro P6000. This top-tier card is built around NVIDIA’s Pascal architecture, which is produced on a 16nm FinFET process. The card boasts an impressive 3,840 CUDA cores, not to mention 24GB of super-fast GDDR5X. Let’s check it out.
On the previous page, I mentioned that SPEC is an organization that crafts some of the best benchmarks going, and in a similar vein, I can compliment SiSoftware. This is a company that thrives on offering support for certain technologies before those technologies are even available to the consumer. In that regard, its Sandra benchmark might seem a little bleeding-edge, but at the same time, its tests are established, refined, and really accurate across multiple runs.
For the purposes of a workstation graphics card review, we focus on four main tests: general GPU processing, cryptography, financial analysis, and scientific analysis. Some of these tests produce complex results, so those will be displayed in a table rather than a graph.
GPU Processing
Sandra 2015 – GPU Processing
| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: Single-Float | 17.38 GPix/s | 9.13 GPix/s | 4.16 GPix/s | 2.48 GPix/s |
| OpenCL: Single-Float | 15.4 GPix/s | 8.10 GPix/s | 3.37 GPix/s | 2.19 GPix/s |
| CUDA: Half-Float | 17.26 GPix/s | 9.05 GPix/s | 4.13 GPix/s | 2.47 GPix/s |
| OpenCL: Half-Float | 15.45 GPix/s | 8.2 GPix/s | 3.39 GPix/s | 2.19 GPix/s |
| CUDA: Double-Float | 646.59 MPix/s | 344.16 MPix/s | 272.68 MPix/s | 92.89 MPix/s |
| OpenCL: Double-Float | 646.76 MPix/s | 347.83 MPix/s | 268.22 MPix/s | 185.1 MPix/s |
| CUDA: Quad-Float | 27.24 MPix/s | 12.69 MPix/s | 11.54 MPix/s | 4 MPix/s |
| OpenCL: Quad-Float | 25.19 MPix/s | 13.59 MPix/s | 19.62 MPix/s | 8.37 MPix/s |
In some of the tests on the previous pages, the P6000 has struggled to shine, but Sandra is having none of that. In raw throughput, the P6000 delivers roughly double the performance of the M6000. In several tests, it’s about 88% faster, and in the quad-float CUDA test, the P6000 actually manages to be more than twice as fast (a gain of about 114%).
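For readers who want to check the percentages against the table, here is a minimal Python sketch of the arithmetic. The helper name `percent_faster` is made up for this illustration; the inputs are the P6000 and M6000 figures from the GPU Processing table.

```python
def percent_faster(new: float, old: float) -> float:
    """Return how much faster `new` is than `old`, as a percentage gain."""
    return (new / old - 1.0) * 100.0


# P6000 vs. M6000, both in MPix/s, taken from the GPU Processing table
cuda_double = percent_faster(646.59, 344.16)  # CUDA: Double-Float
cuda_quad = percent_faster(27.24, 12.69)      # CUDA: Quad-Float

print(f"CUDA double-float gain: {cuda_double:.1f}%")
print(f"CUDA quad-float gain:   {cuda_quad:.1f}%")
```

A gain of 100% means exactly twice as fast, so the double-float result lands a little under 2x while the quad-float result lands a little over it.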
Cryptography
The awesome results keep coming for the Quadro P6000. Overall, it’s safe to say that the P6000 is twice as fast where encryption is concerned. I’m not sure of the reason for the specific gain, but CUDA hashing sees dramatic improvement on Pascal. Further testing showed that NVIDIA’s own driver improvements had some hand in these increases, but the architectural boost played the largest role.
Financial Analysis
Sandra 2015 – Financial Analysis (Single Precision)
| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: Black-Scholes | 11.62 G/s | 8.14 G/s | 3.44 G/s | 2.12 G/s |
| OpenCL: Black-Scholes | 11.54 G/s | 8.10 G/s | 4.49 G/s | 1.58 G/s |
| CUDA: Binomial | 3 M/s | 1.58 M/s | 676.64 k/s | 445.48 k/s |
| OpenCL: Binomial | 3.15 M/s | 1.60 M/s | 645.42 k/s | 375.33 k/s |
| CUDA: Monte Carlo | 6.49 M/s | 3 M/s | 1.20 M/s | 883.6 k/s |
| OpenCL: Monte Carlo | 6.42 M/s | 2.81 M/s | 1.18 M/s | 756.45 k/s |
Sandra 2015 – Financial Analysis (Double Precision)
| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: Black-Scholes | 1.33 G/s | 700 M/s | 541.32 M/s | 193.91 M/s |
| OpenCL: Black-Scholes | 1.3 G/s | 691.82 M/s | 533.91 M/s | 235.91 M/s |
| CUDA: Binomial | 131.83 k/s | 70.32 k/s | 52.55 k/s | 19 k/s |
| OpenCL: Binomial | 132 k/s | 71.45 k/s | 52.93 k/s | 15.79 k/s |
| CUDA: Monte Carlo | 272.54 k/s | 147.71 k/s | 112.53 k/s | 40 k/s |
| OpenCL: Monte Carlo | 272.62 k/s | 147.79 k/s | 112.43 k/s | 35.86 k/s |
The P6000 continues to impress here, with varying degrees of improvement being seen from test to test, but with all of the improvements being substantial. The single-precision OpenCL Monte Carlo test, for example, exhibited a 128% performance boost on the P6000 versus the M6000 (which is still a seriously powerful GPU!).
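Because the financial tables mix k/s, M/s, and G/s within a single row, cards are easiest to compare after converting every cell to a common unit. The sketch below is one way to do that in Python; `UNIT_SCALE` and `to_ops_per_second` are hypothetical names for this illustration, not anything Sandra exposes.

```python
# Multipliers for the unit suffixes used in the Financial Analysis tables
UNIT_SCALE = {"k/s": 1e3, "M/s": 1e6, "G/s": 1e9}


def to_ops_per_second(cell: str) -> float:
    """Convert a table cell such as '756.45 k/s' to plain operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


# OpenCL: Monte Carlo, single precision
p6000 = to_ops_per_second("6.42 M/s")
m6000 = to_ops_per_second("2.81 M/s")
m2000 = to_ops_per_second("756.45 k/s")

gain_vs_m6000 = (p6000 / m6000 - 1.0) * 100.0  # percentage gain over the M6000
ratio_vs_m2000 = p6000 / m2000                 # plain multiple of the M2000

print(f"P6000 vs. M6000: {gain_vs_m6000:.0f}% faster")
print(f"P6000 vs. M2000: {ratio_vs_m2000:.1f}x")
```

With the units normalized, the comparison against the M2000 no longer depends on noticing that its result is reported in k/s rather than M/s.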
Scientific Analysis
Sandra 2015 – Scientific Analysis (Single Precision)
| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: GEMM | 5.53 TFLOPS | 3.2 TFLOPS | 1.1 TFLOPS | 951.73 GFLOPS |
| OpenCL: GEMM | 6.81 TFLOPS | 3.6 TFLOPS | 1 TFLOPS | 983.37 GFLOPS |
| CUDA: FFT | 261.88 GFLOPS | 204.3 GFLOPS | 80.8 GFLOPS | 54.77 GFLOPS |
| OpenCL: FFT | 268.44 GFLOPS | 220.7 GFLOPS | 97.0 GFLOPS | 65.24 GFLOPS |
| CUDA: NBDY | 5.78 TFLOPS | 2.9 TFLOPS | 1 TFLOPS | 915.53 GFLOPS |
| OpenCL: NBDY | 5 TFLOPS | 3 TFLOPS | 1 TFLOPS | 601.82 GFLOPS |
Sandra 2015 – Scientific Analysis (Double Precision)
| Test | P6000 | M6000 | K5200 | M2000 |
| --- | --- | --- | --- | --- |
| CUDA: GEMM | 325 GFLOPS | 175.1 GFLOPS | 147.8 GFLOPS | 48.11 GFLOPS |
| OpenCL: GEMM | 325.11 GFLOPS | 174.6 GFLOPS | 148.0 GFLOPS | 49.64 GFLOPS |
| CUDA: FFT | 111.38 GFLOPS | 89.1 GFLOPS | 48.7 GFLOPS | 28 GFLOPS |
| OpenCL: FFT | 131.79 GFLOPS | 120.3 GFLOPS | 58.6 GFLOPS | 36.16 GFLOPS |
| CUDA: NBDY | 189.8 GFLOPS | 103.0 GFLOPS | 112.1 GFLOPS | 38.17 GFLOPS |
| OpenCL: NBDY | 190.25 GFLOPS | 103.6 GFLOPS | 111.9 GFLOPS | 51.18 GFLOPS |
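To help wrap up our Sandra results, we have more proof that the Quadro P6000 is a really, really fast card. Against the M6000 in the CUDA tests, the smallest gain is 25% (double-precision FFT) and the largest is 99% (single-precision N-Body). The one soft spot is OpenCL double-precision FFT, where the gain shrinks to roughly 10%.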
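The worst-case and best-case CUDA figures can be reproduced by ranking the per-test gains. This is a small Python sketch under a couple of assumptions: the dictionary name `cuda_results` is invented for the example, and the TFLOPS entries from the single-precision table have been converted to GFLOPS by hand so every pair shares a unit.

```python
# (P6000, M6000) results in GFLOPS for the CUDA rows of the Scientific Analysis tables.
# TFLOPS values were multiplied by 1,000 so all entries use the same unit.
cuda_results = {
    "GEMM (single)": (5530.0, 3200.0),
    "FFT (single)": (261.88, 204.3),
    "N-Body (single)": (5780.0, 2900.0),
    "GEMM (double)": (325.0, 175.1),
    "FFT (double)": (111.38, 89.1),
    "N-Body (double)": (189.8, 103.0),
}

# Percentage gain of the P6000 over the M6000 for each test
gains = {
    name: (p6000 / m6000 - 1.0) * 100.0
    for name, (p6000, m6000) in cuda_results.items()
}

worst = min(gains, key=gains.get)
best = max(gains, key=gains.get)

print(f"Smallest gain: {worst} at {gains[worst]:.0f}%")
print(f"Largest gain:  {best} at {gains[best]:.0f}%")
```

Swapping in the OpenCL rows is a matter of replacing the dictionary values, which makes it easy to see how the two APIs differ on the same hardware.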