by Rob Williams on July 28, 2015 in Graphics & Displays
What NVIDIA’s GeForce TITAN X does for gaming, its Quadro M6000 does for workstations. As the company’s first Maxwell-based Quadro, the M6000 has a lot going for it: an impressive performance-per-watt rating, support for 4x 4K/60 displays, and, despite its 7 TFLOPS of performance, a power requirement of just a single 8-pin connector.
On the previous page, I mentioned that SPEC is an organization that crafts some of the best benchmarks going, and in a similar vein, I can compliment SiSoftware. This is a company that thrives on supporting certain technologies before they are even available to the consumer. In that regard, its Sandra benchmark might seem a little bleeding-edge, but its tests are established, refined, and consistent across multiple runs.
For the purposes of a workstation graphics card review, we focus on four main tests: general GPU processing, cryptography, financial analysis, and scientific analysis. Some of these tests produce complex results, so those will be displayed in a table rather than a graph.
|Sandra 2015 – GPU Processing|
|CUDA: Single-Float||9.13 GPix/s||4.16 GPix/s||2.57 GPix/s||9.40 GPix/s|
|OpenCL: Single-Float||8.10 GPix/s||3.37 GPix/s||2 GPix/s||7.75 GPix/s|
|CUDA: Half-Float||9.05 GPix/s||4.13 GPix/s||2.57 GPix/s||8.53 GPix/s|
|OpenCL: Half-Float||8.2 GPix/s||3.39 GPix/s||2 GPix/s||7.53 GPix/s|
|CUDA: Double-Float||344.16 MPix/s||272.68 MPix/s||144 MPix/s||348.07 MPix/s|
|OpenCL: Double-Float||347.83 MPix/s||268.22 MPix/s||140 MPix/s||351.54 MPix/s|
|CUDA: Quad-Float||12.69 MPix/s||11.54 MPix/s||6 MPix/s||12.83 MPix/s|
|OpenCL: Quad-Float||13.59 MPix/s||19.62 MPix/s||5 MPix/s||13.76 MPix/s|
For the most part, the performance differences between CUDA and OpenCL processing are minimal, though it is notable that the latter is faster in the quad-float test. This is another test where the M6000 and TITAN X are close to being equals, but that’s not much of a surprise given NVIDIA wouldn’t optimize its drivers for synthetic tests like these. Compared to the previous-generation K5200, though, the performance differences are stark in both the single- and double-float tests.
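To put a number on "minimal", the gap between the two APIs can be worked out directly from the table. The sketch below uses only the first results column and expresses OpenCL's result relative to CUDA's as a percentage. The dictionary layout and the function name are illustrative assumptions, not anything Sandra provides.

```python
# Values copied from the first results column of the GPU Processing table.
# Each entry: test name -> (CUDA result, OpenCL result), same unit within a row.
first_column = {
    "Single-Float (GPix/s)": (9.13, 8.10),
    "Half-Float (GPix/s)": (9.05, 8.2),
    # Assumption: both double-float values are in MPix/s, like the rest of that row.
    "Double-Float (MPix/s)": (344.16, 347.83),
    "Quad-Float (MPix/s)": (12.69, 13.59),
}


def opencl_vs_cuda_percent(cuda, opencl):
    """Return how much faster (+) or slower (-) OpenCL is than CUDA, in percent."""
    return (opencl - cuda) / cuda * 100.0


gaps = {name: opencl_vs_cuda_percent(c, o) for name, (c, o) in first_column.items()}

for name, gap in gaps.items():
    print(f"{name}: OpenCL is {gap:+.1f}% relative to CUDA")
```

A positive figure means OpenCL came out ahead in that row, which is the case for the quad-float test.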
The cryptography test is another great one for showing the dramatic improvements Maxwell can offer over Kepler. I regret not being able to offer up K6000 results here, but comparing to the K5200, it doesn’t take much to understand that we’ve at least doubled performance from one generation to the next. The results are most impressive in the “Higher” test, which represents AES256 and SHA512 testing (versus AES256 + SHA2-256); the M6000 is 7x faster than the K5200 when using CUDA, and 3x faster when using OpenCL.
|Sandra 2015 – Financial Analysis (Single Precision)|
|CUDA: Black-Scholes||8.14 G/s||3.44 G/s||1.47 G/s||8.14 G/s|
|OpenCL: Black-Scholes||8.10 G/s||4.49 G/s||1.48 G/s||8.11 G/s|
|CUDA: Binomial||1.58 M/s||676.64 k/s||381.43 k/s||1.54 M/s|
|OpenCL: Binomial||1.60 M/s||645.42 k/s||379.64 k/s||1.53 M/s|
|CUDA: Monte Carlo||3 M/s||1.20 M/s||771.30 k/s||3 M/s|
|OpenCL: Monte Carlo||2.81 M/s||1.18 M/s||689.37 k/s||2.67 M/s|
|Sandra 2015 – Financial Analysis (Double Precision)|
|CUDA: Black-Scholes||700 M/s||541.32 M/s||286.48 M/s||705.90 M/s|
|OpenCL: Black-Scholes||691.82 M/s||533.91 M/s||266.76 M/s||699.64 M/s|
|CUDA: Binomial||70.32 k/s||52.55 k/s||28.75 k/s||71.14 k/s|
|OpenCL: Binomial||71.45 k/s||52.93 k/s||28.79 k/s||72.48 k/s|
|CUDA: Monte Carlo||147.71 k/s||112.53 k/s||58.53 k/s||149.18 k/s|
|OpenCL: Monte Carlo||147.79 k/s||112.43 k/s||58.57 k/s||149.16 k/s|
It’s here where the results become complex really fast. While some of the performance is measured in the thousands of options-per-second, some is measured in the millions, and Black-Scholes at single precision even reaches the billions. That’s an obvious problem when trying to fit it all in a graph.
Nonetheless, as with the cryptography test, we can see some great performance improvements over the K5200, and once again, CUDA and OpenCL performance is quite close.
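Because the financial results mix k/s, M/s, and G/s, comparing cells by eye is error-prone. A minimal sketch of one way to handle it is to convert every cell to plain options-per-second before comparing. The helper name and the prefix table are hypothetical, and the sample row is the single-precision "CUDA: Binomial" line from the table above.

```python
# Hypothetical helper: SI-style prefixes as they appear in the table cells.
PREFIX = {"k": 1e3, "M": 1e6, "G": 1e9}


def to_options_per_second(cell):
    """Convert a cell such as '676.64 k/s' into a plain options-per-second float."""
    value, unit = cell.split()
    prefix = unit[0]  # 'k', 'M', or 'G' in front of '/s'
    return float(value) * PREFIX[prefix]


# The single-precision "CUDA: Binomial" row, copied from the table.
row = ["1.58 M/s", "676.64 k/s", "381.43 k/s", "1.54 M/s"]
normalized = [to_options_per_second(cell) for cell in row]

# With a common unit, the columns can be compared directly.
ratio_first_to_third = normalized[0] / normalized[2]
print(normalized)
print(f"first column vs. third column: {ratio_first_to_third:.2f}x")
```

Once everything shares a unit, the same values could also be plotted on a logarithmic axis, which is one way around the graphing problem.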
|Sandra 2015 – Scientific Analysis (Single Precision)|
|CUDA: GEMM||3.2 TFLOPS||1.1 TFLOPS||83.2 GFLOPS||3.2 TFLOPS|
|OpenCL: GEMM||3.6 TFLOPS||1 TFLOPS||374.1 GFLOPS||3.4 TFLOPS|
|CUDA: FFT||204.3 GFLOPS||80.8 GFLOPS||71.4 GFLOPS||205 GFLOPS|
|OpenCL: FFT||220.7 GFLOPS||97.0 GFLOPS||81 GFLOPS||221.5 GFLOPS|
|CUDA: NBDY||2.9 TFLOPS||1 TFLOPS||718.3 GFLOPS||2.9 TFLOPS|
|OpenCL: NBDY||3 TFLOPS||1 TFLOPS||622 GFLOPS||2.9 TFLOPS|
|Sandra 2015 – Scientific Analysis (Double Precision)|
|CUDA: GEMM||175.1 GFLOPS||147.8 GFLOPS||10.6 GFLOPS||177.2 GFLOPS|
|OpenCL: GEMM||174.6 GFLOPS||148.0 GFLOPS||28.2 GFLOPS||176.4 GFLOPS|
|CUDA: FFT||89.1 GFLOPS||48.7 GFLOPS||18.5 GFLOPS||89.3 GFLOPS|
|OpenCL: FFT||120.3 GFLOPS||58.6 GFLOPS||22.5 GFLOPS||120.9 GFLOPS|
|CUDA: NBDY||103.0 GFLOPS||112.1 GFLOPS||63.3 GFLOPS||101.4 GFLOPS|
|OpenCL: NBDY||103.6 GFLOPS||111.9 GFLOPS||63.4 GFLOPS||105.0 GFLOPS|
Wrapping up our Sandra testing is a set of results that backs up what we’ve seen with the others so far on this page: the M6000 is on par with the TITAN X overall in non-optimized applications, and in most cases, it’s dramatically faster than the K5200.
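As a rough check on the "on par" claim, the per-test ratios can be summarized with a geometric mean. The sketch below does this for the double-precision scientific table, comparing the first results column against the last. It assumes those two columns are the M6000 and TITAN X, since they are the pair that track each other closely; the variable and function names are illustrative only.

```python
import math

# Double-precision Scientific Analysis results in GFLOPS, copied from the table:
# (first results column, last results column) for each row.
pairs = [
    (175.1, 177.2),  # CUDA: GEMM
    (174.6, 176.4),  # OpenCL: GEMM
    (89.1, 89.3),    # CUDA: FFT
    (120.3, 120.9),  # OpenCL: FFT
    (103.0, 101.4),  # CUDA: NBDY
    (103.6, 105.0),  # OpenCL: NBDY
]


def geometric_mean(values):
    """Geometric mean, the usual way to average performance ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = [first / last for first, last in pairs]
overall = geometric_mean(ratios)
print(f"first column vs. last column, geometric mean: {overall:.3f}")
```

A result close to 1.0 means the two columns are effectively tied across these six tests.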