NVIDIA Accelerates Hyperscale Machine Learning With New Tesla M4 & M40 GPUs

Posted on November 10, 2015 12:45 PM by Rob Williams

Over the span of just a few years, machine learning and deep learning went from being mere murmurs to major focal points at some of the world’s biggest companies. Those focusing hard on the software side of things include Amazon, Google, and Microsoft, while on the hardware side, NVIDIA has been instrumental in designing hardware that can dramatically accelerate the processing of important data.

At the previous couple of GPU Technology Conferences, NVIDIA CEO Jen-Hsun Huang’s opening keynotes have included revelations about machine-learning. At last spring’s event, he invited Google’s Jeff Dean and Baidu’s Andrew Ng on stage to explain how they make use of GPUs to speed up their work. During the keynote, we didn’t just witness something expected, like a search engine that becomes smarter; we even saw examples of computers teaching themselves to play – and get better at – games, one being Breakout. There should be no doubt at this point that machine-learning is going to be a major part of our future, even if it’s not so obvious to the casual observer.

Appropriate hardware can dramatically improve machine-learning performance

Machine-learning capabilities are not restricted to certain pieces of hardware, but some hardware is far better suited to the job than others. At last spring’s GTC, Jen-Hsun explained just how powerful the desktop-targeted GeForce GTX TITAN X is in deep-learning, able to churn through an image recognition project with AlexNet much faster than a CPU alone. Alternatively, researchers could have made use of Quadro workstation or Tesla compute cards to get the job done.

Well, now there’s a new option. Or two, actually: Tesla M40, and Tesla M4.

NVIDIA’s Tesla M40 Hyperscale GPU Accelerator

NVIDIA’s Tesla M4 Hyperscale GPU Accelerator

The big gun, Tesla M40, is specs-equivalent to the GeForce GTX TITAN X and Quadro M6000. That’s to say that it has 12GB of GDDR5 memory, 3,072 CUDA cores, and a TDP of 250W. Despite the similarities, NVIDIA says that this card can use GPU Boost to achieve single-precision performance of 7 TFLOPS.

While both of these new Tesla accelerators are suitable for similar purposes, NVIDIA targets the M4 at a handful of specific workloads: video transcoding, video processing, image processing, and machine-learning inference. The M4’s form-factor is low-profile, so it can fit into tight enclosures. It includes 1,024 CUDA cores, 4GB of GDDR5, and peaks at 2.2 TFLOPS. Interestingly, it has a variable TDP based on the profile chosen, ranging from 50W to 75W.
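For a rough sense of what those numbers imply, the quoted specs can be run through some back-of-the-envelope arithmetic. The Python sketch below is not from NVIDIA; it assumes peak single-precision throughput equals CUDA cores × 2 FLOPs per clock (one fused multiply-add per core per cycle), and it assumes each card's peak figure is reached at its maximum TDP. The function names are made up for illustration, and the results are estimates rather than official clock or efficiency ratings.

```python
# Spec figures quoted in the article.
CARDS = {
    "Tesla M40": {"cuda_cores": 3072, "peak_tflops": 7.0, "max_tdp_watts": 250},
    "Tesla M4": {"cuda_cores": 1024, "peak_tflops": 2.2, "max_tdp_watts": 75},
}

# Assumption: each CUDA core retires one fused multiply-add (2 FLOPs) per clock.
FLOPS_PER_CORE_PER_CYCLE = 2


def implied_clock_mhz(cuda_cores, peak_tflops):
    """Clock speed (MHz) needed to reach the quoted peak under the FMA assumption."""
    flops = peak_tflops * 1e12
    return flops / (cuda_cores * FLOPS_PER_CORE_PER_CYCLE) / 1e6


def gflops_per_watt(peak_tflops, watts):
    """Peak single-precision GFLOPS per watt of TDP."""
    return peak_tflops * 1000 / watts


if __name__ == "__main__":
    for name, spec in CARDS.items():
        clock = implied_clock_mhz(spec["cuda_cores"], spec["peak_tflops"])
        # Assumption: the quoted peak is reached at the card's maximum TDP.
        efficiency = gflops_per_watt(spec["peak_tflops"], spec["max_tdp_watts"])
        print(f"{name}: ~{clock:.0f} MHz implied clock, "
              f"~{efficiency:.1f} GFLOPS/W at {spec['max_tdp_watts']}W")
```

Under those assumptions, both cards would need to boost to a little over 1GHz to hit their quoted peaks, and their peak performance-per-watt at maximum TDP lands in the same ballpark despite the large gap in raw throughput.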

Along with these cards, NVIDIA’s also shipping what it calls the NVIDIA Hyperscale Suite. This software suite includes cuDNN, for developing deep-learning algorithms; GPU-accelerated FFmpeg, for speeding up video processing; GPU REST Engine, for rolling out high-throughput / low-latency Web services; and Image Compute Engine, which works with the REST Engine to resize images up to 5x faster than a traditional CPU.

Pricing for these two new Tesla accelerators has not been announced, but the M40 and the Hyperscale Suite will become available “later this year”. The M4 will become available in Q1 2016.
