by Rob Williams on May 5, 2016 in Graphics & Displays
To those who’ve been waiting for a mid-range Maxwell Quadro to come along: your wait is over. To help wrap up its Maxwell-based lineup, NVIDIA’s Quadro team has released the ~$500 M2000. This card is largely targeted at CAD users and those with lighter 3D design needs, and promises to be much more efficient – and faster overall – than its predecessor. We gave the card a thorough test to see just how true that is.
On the previous page, I mentioned that SPEC is an organization that crafts some of the best benchmarks going, and in a similar vein, I can compliment SiSoftware. This is a company that thrives on offering support for certain technologies before those technologies are even available to the consumer. In that regard, its Sandra benchmark might seem a little bleeding-edge, but at the same time, its tests are established, refined, and highly consistent across multiple runs.
For the purposes of a workstation graphics card review, we focus on four main tests: general GPU processing, cryptography, financial analysis, and scientific analysis. Some of these tests produce complex results, so those will be displayed in a table rather than a graph.
GPU Processing
|  | Sandra 2015 – GPU Processing | 
|  | M6000 | K5200 | K5000 | M2000 | 
| CUDA: Single-Float | 9.13 GPix/s | 4.16 GPix/s | 2.57 GPix/s | 2.48 GPix/s | 
| OpenCL: Single-Float | 8.10 GPix/s | 3.37 GPix/s | 2 GPix/s | 2.19 GPix/s | 
| CUDA: Half-Float | 9.05 GPix/s | 4.13 GPix/s | 2.57 GPix/s | 2.47 GPix/s | 
| OpenCL: Half-Float | 8.2 GPix/s | 3.39 GPix/s | 2 GPix/s | 2.19 GPix/s | 
| CUDA: Double-Float | 344.16 MPix/s | 272.68 MPix/s | 144 MPix/s | 92.89 MPix/s | 
| OpenCL: Double-Float | 347.83 MPix/s | 268.22 MPix/s | 140 MPix/s | 185.1 MPix/s | 
| CUDA: Quad-Float | 12.69 MPix/s | 11.54 MPix/s | 6 MPix/s | 4 MPix/s | 
| OpenCL: Quad-Float | 13.59 MPix/s | 19.62 MPix/s | 5 MPix/s | 8.37 MPix/s | 
In the single- and half-precision tests, the M2000 and K5000 are effective equals, at least where CUDA is concerned; in the CUDA double- and quad-precision tests, the K5000 holds a clear lead. With OpenCL, the M2000 is faster across the board, especially in the double- and quad-precision tests.
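To put numbers on the M2000-versus-K5000 comparison, the table values can be turned into simple ratios. The following is a minimal sketch: the figures are copied from the table above and converted to MPix/s, while the names `results_mpix`, `speedup`, and `ratios` are hypothetical helpers for illustration, not anything Sandra provides.

```python
# (K5000, M2000) results from the GPU Processing table, converted to MPix/s.
results_mpix = {
    "CUDA: Single-Float":   (2570.0, 2480.0),
    "OpenCL: Single-Float": (2000.0, 2190.0),
    "CUDA: Half-Float":     (2570.0, 2470.0),
    "OpenCL: Half-Float":   (2000.0, 2190.0),
    "CUDA: Double-Float":   (144.0, 92.89),
    "OpenCL: Double-Float": (140.0, 185.1),
    "CUDA: Quad-Float":     (6.0, 4.0),
    "OpenCL: Quad-Float":   (5.0, 8.37),
}


def speedup(k5000, m2000):
    """Return M2000 throughput relative to the K5000 (1.0 means parity)."""
    return m2000 / k5000


# Hypothetical helper dict: test name -> M2000/K5000 ratio.
ratios = {test: speedup(*pair) for test, pair in results_mpix.items()}

for test, ratio in ratios.items():
    print(f"{test:<22} M2000 = {ratio:.2f}x K5000")
```

A ratio above 1.0 means the M2000 is ahead in that test, and a ratio below 1.0 means the K5000 is.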
Cryptography
It’s clear from these results that NVIDIA has made great strides with the cryptography performance on its Maxwell-based Quadros. The M2000 makes a complete mockery of the K5000 here – a card that has more often than not outperformed the M2000 in our other tests.
Financial Analysis
|  | Sandra 2015 – Financial Analysis (Single Precision) | 
|  | M6000 | K5200 | K5000 | M2000 | 
| CUDA: Black-Scholes | 8.14 G/s | 3.44 G/s | 1.47 G/s | 2.12 G/s | 
| OpenCL: Black-Scholes | 8.10 G/s | 4.49 G/s | 1.48 G/s | 1.58 G/s | 
| CUDA: Binomial | 1.58 M/s | 676.64 k/s | 381.43 k/s | 445.48 k/s | 
| OpenCL: Binomial | 1.60 M/s | 645.42 k/s | 379.64 k/s | 375.33 k/s | 
| CUDA: Monte Carlo | 3 M/s | 1.20 M/s | 771.30 k/s | 883.6 k/s | 
| OpenCL: Monte Carlo | 2.81 M/s | 1.18 M/s | 689.37 k/s | 756.45 k/s | 
|  | Sandra 2015 – Financial Analysis (Double Precision) | 
|  | M6000 | K5200 | K5000 | M2000 | 
| CUDA: Black-Scholes | 700 M/s | 541.32 M/s | 286.48 M/s | 193.91 M/s | 
| OpenCL: Black-Scholes | 691.82 M/s | 533.91 M/s | 266.76 M/s | 235.91 M/s | 
| CUDA: Binomial | 70.32 k/s | 52.55 k/s | 28.75 k/s | 19 k/s | 
| OpenCL: Binomial | 71.45 k/s | 52.93 k/s | 28.79 k/s | 15.79 k/s | 
| CUDA: Monte Carlo | 147.71 k/s | 112.53 k/s | 58.53 k/s | 40 k/s | 
| OpenCL: Monte Carlo | 147.79 k/s | 112.43 k/s | 58.57 k/s | 35.86 k/s | 
The results are quite cut-and-dried here. The Maxwell-based M2000 beats the K5000 in all but one of the single-precision tests (OpenCL Binomial, where the two are effectively tied), but it falls short in every double-precision test. Maxwell was released with notably decreased DP performance versus the previous-generation Kepler cards, so a result like this isn’t surprising.
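One way to see the Maxwell-versus-Kepler double-precision gap is to express each card's double-precision result as a share of its single-precision result. The sketch below does that for the CUDA Black-Scholes rows only, using values copied from the two tables above. The helper names (`PREFIX`, `parse_rate`, `dp_share`) are assumptions made for illustration, and the parser only handles the `k`, `M`, and `G` prefixes that appear in these tables.

```python
# Multipliers for the unit prefixes used in the tables (k, M, G).
PREFIX = {"k": 1e3, "M": 1e6, "G": 1e9}


def parse_rate(text):
    """Convert a table cell such as '193.91 M/s' into a plain per-second rate."""
    value, unit = text.split()
    return float(value) * PREFIX[unit[0]]


# CUDA Black-Scholes results from the tables: (single precision, double precision).
black_scholes = {
    "M6000": ("8.14 G/s", "700 M/s"),
    "K5200": ("3.44 G/s", "541.32 M/s"),
    "K5000": ("1.47 G/s", "286.48 M/s"),
    "M2000": ("2.12 G/s", "193.91 M/s"),
}

# Double-precision throughput as a fraction of single-precision throughput.
dp_share = {
    card: parse_rate(dp) / parse_rate(sp)
    for card, (sp, dp) in black_scholes.items()
}

for card, share in dp_share.items():
    print(f"{card}: double precision runs at {share:.1%} of single precision")
```

On these figures, the two Maxwell cards land near 9%, while the two Kepler cards sit between about 16% and 20%.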
Scientific Analysis
|  | Sandra 2015 – Scientific Analysis (Single Precision) | 
|  | M6000 | K5200 | K5000 | M2000 | 
| CUDA: GEMM | 3.2 TFLOPS | 1.1 TFLOPS | 83.2 GFLOPS | 951.73 GFLOPS | 
| OpenCL: GEMM | 3.6 TFLOPS | 1 TFLOPS | 374.1 GFLOPS | 983.37 GFLOPS | 
| CUDA: FFT | 204.3 GFLOPS | 80.8 GFLOPS | 71.4 GFLOPS | 54.77 GFLOPS | 
| OpenCL: FFT | 220.7 GFLOPS | 97.0 GFLOPS | 81 GFLOPS | 65.24 GFLOPS | 
| CUDA: NBDY | 2.9 TFLOPS | 1 TFLOPS | 718.3 GFLOPS | 915.53 GFLOPS | 
| OpenCL: NBDY | 3 TFLOPS | 1 TFLOPS | 622 GFLOPS | 601.82 GFLOPS | 
|  | Sandra 2015 – Scientific Analysis (Double Precision) | 
|  | M6000 | K5200 | K5000 | M2000 | 
| CUDA: GEMM | 175.1 GFLOPS | 147.8 GFLOPS | 10.6 GFLOPS | 48.11 GFLOPS | 
| OpenCL: GEMM | 174.6 GFLOPS | 148.0 GFLOPS | 28.2 GFLOPS | 49.64 GFLOPS | 
| CUDA: FFT | 89.1 GFLOPS | 48.7 GFLOPS | 18.5 GFLOPS | 28 GFLOPS | 
| OpenCL: FFT | 120.3 GFLOPS | 58.6 GFLOPS | 22.5 GFLOPS | 36.16 GFLOPS | 
| CUDA: NBDY | 103.0 GFLOPS | 112.1 GFLOPS | 63.3 GFLOPS | 38.17 GFLOPS | 
| OpenCL: NBDY | 103.6 GFLOPS | 111.9 GFLOPS | 63.4 GFLOPS | 51.18 GFLOPS | 
It looks like we couldn’t finish off our regular performance results without another doozy. Here, the M2000 outperforms the K5000 in most tests, dramatically so in GEMM, although the K5000 holds its ground in single-precision FFT and double-precision N-body. The newer and higher-end Kepler-based K5200 delivers its own impressive results, although the M6000 chimes in to remind us all that it’s the big dog in NVIDIA’s current Quadro line-up (well, aside from the 24GB variant released last month).
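With twelve head-to-head results between the M2000 and K5000, a quick tally helps show where each card leads. This is a rough sketch using the GFLOPS values from the two Scientific Analysis tables above; the names `single`, `double`, and `m2000_wins` are hypothetical and exist only for this example.

```python
# (K5000, M2000) results in GFLOPS, copied from the Scientific Analysis tables.
single = {
    "CUDA: GEMM":   (83.2, 951.73),
    "OpenCL: GEMM": (374.1, 983.37),
    "CUDA: FFT":    (71.4, 54.77),
    "OpenCL: FFT":  (81.0, 65.24),
    "CUDA: NBDY":   (718.3, 915.53),
    "OpenCL: NBDY": (622.0, 601.82),
}
double = {
    "CUDA: GEMM":   (10.6, 48.11),
    "OpenCL: GEMM": (28.2, 49.64),
    "CUDA: FFT":    (18.5, 28.0),
    "OpenCL: FFT":  (22.5, 36.16),
    "CUDA: NBDY":   (63.3, 38.17),
    "OpenCL: NBDY": (63.4, 51.18),
}


def m2000_wins(table):
    """List the tests in which the M2000 posts the higher GFLOPS figure."""
    return [test for test, (k5000, m2000) in table.items() if m2000 > k5000]


sp_wins = m2000_wins(single)
dp_wins = m2000_wins(double)

print(f"Single precision: M2000 ahead in {len(sp_wins)} of {len(single)}: {sp_wins}")
print(f"Double precision: M2000 ahead in {len(dp_wins)} of {len(double)}: {dp_wins}")
```

A win count says nothing about margins, so it is best read alongside the tables themselves: the GEMM gaps are far larger than the FFT or N-body ones.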