Date: March 26, 2015
Author(s): Rob Williams
The 2015 GPU Technology Conference proved to be an exciting event with a number of big announcements and a slew of other cool bits of information. In this article, we’re going to take an in-depth look at the biggest announcements made at the show, as well as some of the lesser talked-about items that are still worth highlighting.
As is common of NVIDIA’s GPU Technology Conference, this year’s event gave us a lot to think about, talk about, and get excited about. From deep-learning to super-fast graphics cards, the latest GTC had it all.
In this article, I am going to go over some of the biggest announcements and talked-about subjects that came out of the event, and also add in some quips about a few things that stood out to me personally.
Whenever NVIDIA CEO Jen-Hsun Huang takes the stage for a keynote, it’s rare that we know exactly what he’s going to say. At this year’s GTC, he didn’t beat around the bush: deep-learning was at the core of all four announcements.
At 2014’s GTC, NVIDIA talked a bit about using GPUs for the sake of deep-learning, and overall, the concepts, implementations, and executions, were mind-blowing (and at times, mind-numbing, due to their complexities). At the most recent GTC, the amount of focus deep-learning received was major; as mentioned above, Jen-Hsun managed to tie it into all four of his announcements.
The reason for this keen focus shouldn’t come as a surprise: GPUs have proven to be excellent computational devices, and thanks to their massively parallel nature, they’re commonly an order of magnitude faster than traditional CPUs at things like deep-learning, where massive amounts of data are involved.
This ties into NVIDIA’s CUDA, which it first introduced in 2007 and began making a big deal about in 2008, in particular at its first-ever GTC – then called NVISION. During his presentation, Jen-Hsun highlighted the progress that CUDA has made since then; in 2008, there were 150,000 CUDA downloads, 27 CUDA apps, and 60 universities teaching GPU computing. In 2015, those numbers have soared to 3,000,000 downloads, 319 apps, and 800 universities.
Perhaps the most staggering figures are those involving Tesla GPUs and overall performance. In 2008, 6,000 Tesla GPUs powered the world’s supercomputers, and delivered a total of 77 TFLOPs. Fast-forward to 2015, and the numbers boost to 450,000 GPUs and 54 petaflops.
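To put those keynote figures in perspective, here is a quick back-of-the-envelope sketch that turns each 2008 and 2015 pair into a growth factor. The numbers are the ones quoted above; the dictionary layout and the `growth_factor` helper are just illustrative names, and 54 petaflops is converted to 54,000 TFLOPs so both ends use the same unit.

```python
# Figures quoted from the keynote: (2008 value, 2015 value).
figures = {
    "CUDA downloads": (150_000, 3_000_000),
    "CUDA apps": (27, 319),
    "Universities teaching GPU computing": (60, 800),
    "Tesla GPUs in supercomputers": (6_000, 450_000),
    # 54 petaflops expressed in TFLOPs so it matches the 2008 unit.
    "Supercomputer TFLOPs": (77, 54_000),
}


def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


growth = {name: growth_factor(*pair) for name, pair in figures.items()}

for name, factor in growth.items():
    print(f"{name}: {factor:.1f}x")
```

By this math, downloads grew 20x and the Tesla GPU count 75x, while aggregate compute grew roughly 700x, which suggests that most of the performance gain came from faster GPUs rather than simply more of them.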
Speaking of flops, NVIDIA launched its fastest GPU ever at GTC, called TITAN X. At this point, you’re probably already familiar with the card, especially as we posted a review of it last week, so I won’t rehash all of what it brings to the table here. However, tying into NVIDIA’s deep-learning focus, NVIDIA provides some special performance comparisons:
AlexNet is a neural network used for image classification. This is a perfect example of deep-learning; computers act like a brain (a neural network) to piece together a chain of bits of information in order to give an answer with a high rate of success. For example: a computer telling you the difference between a rottweiler and a chocolate lab, or that there’s not only a car in an image, but that it’s parked next to a lake.
With that explained, you can look to the performance graph above to gain an understanding of just how much benefit a current TITAN X can offer a researcher versus the last-gen TITAN Black, and the TITAN before that. It’s almost hard to believe, given just how fast the original TITAN still is, but TITAN X can prove twice as fast two years later for the same cost. This graph also highlights why some data is better-suited for a GPU than a CPU.
To highlight the importance of deep-learning further, NVIDIA invited Google’s Jeff Dean and Baidu’s Andrew Ng to explain how they’ve both been using NVIDIA GPUs to accomplish some impressive things. Given that each keynote ran over an hour and the subject of deep-learning is broad, I’d highly recommend you head here to watch them if you’re interested.
There are a couple of highlights I’d like to bring up, though, such as one that impressed me a lot during Jeff Dean’s presentation. Through the research of Google’s DeepMind group in London, computers taught themselves to play Atari video games, such as Space Invaders and Breakout. It works because the computers come to grips with the fact that they’re losing, and understand why. Then, because a computer has better reflexes than any human, it progressively becomes good enough where some games simply can’t be lost.
While voice-to-text technology is nothing new, Baidu’s Andrew Ng showed off just how far it’s come. It’s not just about turning speech into written text anymore; it’s about doing so regardless of the environment. Examples were given where simulated crowd noise was played over top of a speech block, and even in horrible conditions, the end result was quite good.
Autonomous driving played a big role during Jen-Hsun’s deep-learning keynote, and while it’s an area with a different kind of end-goal than those examples mentioned above, the learning is done in the exact same way. Deep-learning is extremely important for something like autonomous driving because if computations are not accurate, people’s lives will be at risk.
In an example shown, we were able to see how a mini-ATV called DAVE benefits from NVIDIA’s DRIVE PX, an improvement NVIDIA pegs at 3,000x compared to the original model.
As with many of the examples above, DAVE had to teach itself how to drive through a complicated back yard. At first, the vehicle would run straight into the nearest roadblock; after a while, it understood which terrain and objects to avoid, and eventually made it to its end goal.
While this is a modest test, it’s the exact same kind of technology that feeds into our full-blown autonomous vehicles, and while things are impressive now, there’s still a way to go before autonomous driving can be considered ideal for real-world use.
In an interview with Tesla’s Elon Musk, a couple of interesting points about this were raised. For starters, Musk said that it’s not so much low-speed and high-speed (highway) driving that’s the big hurdle for autonomous driving; it’s mid-speed driving, where autos will likely be passing through urban environments or crowded cities. It’s more challenging there simply because there are so many more factors at play, with anything from someone at the side of a road about to get out of their car to an open manhole cover.
Musk fully believes that we’ll get there eventually, though; he even went as far as to claim that in time, regular autos could be outlawed simply because autonomous ones will be so much safer and result in far fewer deaths. I am not sure I can see that ever becoming a reality myself, but it’s not hard to understand where he’s coming from.
Deep-learning computation isn’t something we’re going to be doing at home, but make no mistake: Its impact on different facets of our life can be substantial. It’ll be interesting to see where things stand at next year’s GTC.
Following up 2013’s Quadro K6000, NVIDIA hauled the veil off of its Maxwell-infused M6000 at GTC. As the title of this page states, this is in effect a TITAN X for workstations: It features 3,072 CUDA cores, a 988MHz core clock (it supports Turbo but I’m not sure of its peak), 12GB of VRAM (317GB/s versus TITAN X’s 336GB/s), and has a 250W TDP.
Performance-wise, the M6000 peaks at 7 TFLOPs single-precision, a boost from 5.2 TFLOPs on the K6000. Unlike the top-end Kepler chips, which had good double-precision performance, Maxwell-based cards do not. NVIDIA told me that this was the result of what’s in demand right now, and its desire to deliver incredible ray tracing performance on these cards. The company says that anyone needing fast double-precision performance should look to the last-generation TITANs, although at this point, getting your hands on a new one is going to be difficult (or cost well beyond SRP).
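As a rough sanity check on that 7 TFLOPs figure, peak single-precision throughput is commonly estimated as CUDA cores × 2 FLOPs per clock × clock speed, where the factor of 2 assumes each core retires one fused multiply-add per cycle. The sketch below applies that rule of thumb to the M6000's 3,072 cores and 988MHz core clock; the function names are illustrative, and the implied clock is an estimate, not a confirmed Turbo spec.

```python
def peak_tflops(cuda_cores, clock_ghz, flops_per_core_per_clock=2):
    """Estimate peak single-precision TFLOPs.

    Assumes each CUDA core performs one fused multiply-add
    (counted as 2 FLOPs) per clock cycle.
    """
    return cuda_cores * flops_per_core_per_clock * clock_ghz / 1000.0


def implied_clock_ghz(tflops, cuda_cores, flops_per_core_per_clock=2):
    """Invert the estimate: what clock would a given TFLOPs figure need?"""
    return tflops * 1000.0 / (cuda_cores * flops_per_core_per_clock)


# M6000 at its 988MHz core clock.
base = peak_tflops(3072, 0.988)
# Clock implied by the quoted 7 TFLOPs peak.
turbo_estimate = implied_clock_ghz(7.0, 3072)

print(f"Estimated peak at 988MHz: {base:.2f} TFLOPs")
print(f"Clock implied by 7 TFLOPs: {turbo_estimate:.2f} GHz")
```

Under that assumption, 988MHz only accounts for about 6.07 TFLOPs, so the 7 TFLOPs peak would imply a Turbo clock somewhere around 1.14GHz.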
A great addition to the M6000 over the K6000 that can’t be seen in the above card shot is that it includes 4x DP 1.2 ports, as well as a single DVI-I port. Like the K6000, the M6000 supports up to 4 displays at once, but the new card doubles the number of 4K-supported displays from 2 to 4. Each added M6000 will allow for 4 more displays, although if you use 3 or 4 cards for 9~16 displays, you’ll need to add a Quadro Sync card (originally called G-SYNC, before the company repurposed that name for its adaptive sync technology).
NVIDIA introduced its first VCA (Visual Computing Appliance) model at last year’s GTC, and as expected, the M6000 release has resulted in an update.
The latest VCA includes 8x Quadro M6000s, dual Intel Xeon E5 10-core 2.8GHz processors (leading me to believe these are still v2, not v3), 256GB of system memory, 12GB of VRAM per GPU, 2TB worth of SSD storage, dual 1Gbps Ethernet ports, dual 10Gbps Ethernet ports, and one InfiniBand port. Pre-installed software includes CentOS 6.6, VCA Manager, Iray 2014 3.4+, V-Ray 3.0+, and OptiX 3.8+.
With each Quadro M6000 retailing for about $5,000, the latest VCA at $50,000 could be considered well-priced given all of the extra hardware it bundles in, and the package it’s in. Like the original VCA, the new ones can be stacked, and from what I saw at the show, stacks of 4 have been commonly used in the real world since the original launch. Even with K6000s at the helm, that’s an absurd amount of power at the ready: the type of power where a single heavily detailed ray traced scene could denoise itself to a great degree in mere seconds.
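The pricing argument is easy to check with a little arithmetic. This sketch uses the approximate $5,000 per-card price, the 8 GPUs and 12GB of VRAM per GPU in each VCA, and the $50,000 VCA price quoted in this article; the `stack_totals` helper is a hypothetical name, and it assumes a stack is simply priced as a multiple of single units.

```python
GPU_PRICE = 5_000        # approximate retail price per Quadro M6000
GPUS_PER_VCA = 8
VRAM_PER_GPU_GB = 12
VCA_PRICE = 50_000

# What the GPUs alone would cost if bought separately.
gpu_cost = GPU_PRICE * GPUS_PER_VCA
# What is left over for the Xeons, memory, SSDs, networking and chassis.
remainder = VCA_PRICE - gpu_cost


def stack_totals(vca_count):
    """Totals for a stack of VCAs, assuming linear pricing per unit."""
    gpus = vca_count * GPUS_PER_VCA
    return {
        "gpus": gpus,
        "vram_gb": gpus * VRAM_PER_GPU_GB,
        "price": vca_count * VCA_PRICE,
    }


print(f"GPUs alone: ${gpu_cost:,}; everything else: ${remainder:,}")
print(stack_totals(4))
```

In other words, the GPUs account for roughly $40,000 of the sticker price, and a stack of 4 works out to 32 GPUs for about $200,000.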
The above trailer is for an upcoming short film that’s rendered entirely using NVIDIA GPUs and Chaos V-Ray RT. I managed to catch the session at GTC to learn more, and I’m glad I did, because I was genuinely wowed.
In 2014, director Kevin Margo’s real-time filming solution involved a BOXX PC equipped with a Quadro K6000 and dual Tesla K40s. Overall, the solution was quite good given the hardware, but the scenes rendered on the camera were hardly ideal given the amount of noise. Fast-forward to 2015, and Margo has performed the same filming duties while taking advantage of NVIDIA VCA cloud servers to dramatically improve the rendering time. Yup – 32 M6000s are quite a bit faster than Margo’s original tri-GPU setup!
You can check out the process with the following two videos, with the latter talking about the use of VCAs.
After watching those, you should be able to better understand just how much easier faster GPUs and the VCAs can make the job of a CG filmmaker. In this scenario, they’ll have the option to either render a frame in real-time and view it on their camera before continuing filming, or run the recorded video in real-time before it’s rendered on a PC, and at any point pause it to render that particular frame so that things like lighting can be double-checked. It’s impressive stuff.
Also tying into the M6000 launch is NVIDIA’s promotion of its Iray renderer. As a physically based renderer, Iray can harness the power of current GPUs like the M6000 to deliver some incredible, realistic results. Take the images below as examples:
Iray can help with more than just final frame renders, though. With an ActiveShade window active in an Autodesk product, for example, you’ll be able to preview a scene in real-time, one that will begin rendering as soon as you pause the view. This is important because it allows you to get quick basic results for a particular frame before you settle on that being the one you want. You are able to manipulate the camera without lag to get the angle you want, let it run a few render iterations, and then decide whether you’ll leave the camera alone or adjust it further.
Thanks to Iray being a physically based renderer, its use can be expanded upon even further. For example, if you want to take the time to create a MAXScript, you’d be able to create a tool that lets you see how architecture is affected based on various real-world effects, like the sun. NVIDIA just so happens to have an example called “Death Ray” that highlights this capability.
Designed by Uruguayan architect Rafael Viñoly, London’s “20 Fenchurch Street” sports quite an interesting design. Some have dubbed it the “Walkie Talkie” due to this design, and as humorous as that might be, there’s a darker consequence of its shape. What the building’s designer didn’t realize was that because the entire building was covered with glass and arced a bit inward, it would create a “Death Ray” if multiple factors aligned properly.
You might recall hearing about the Vdara hotel in Las Vegas sizzling folks in the pool when the sun hits the building at just the right angle, and if so, prepare to be surprised: Similar shape, same designer.
This is something a physically based renderer can highlight before a building gets built. NVIDIA recreated London and the 20 Fenchurch Street building in 3ds Max, and developed a tool that would allow manipulation of the time of year, time of day, angle of the sun, and so forth. What you see in the below shot happened in real life: The beam of light became so strong that it began to melt the chassis of someone’s Jaguar.
Given the fact that both 20 Fenchurch Street and Vdara prove what can go wrong in building design, we’ll (hopefully) see physically based renderers like Iray become more relied-upon in the future.
While all of the example images above can be rendered on a single PC equipped with one or more GPUs, NVIDIA will be releasing Iray+ at some point this year which will allow people to take advantage of cloud rendering via NVIDIA VCA servers. This plugin will come at a cost, although I’m unsure of what that cost will be at the current time.
Deep-learning and M6000-related announcements were the biggest highlights (to me) at GTC, but there’s still more that came out of the event worth talking about. So, to help wind down this article, I’ll tackle those.
In case you hit this article and wound up right here, I’ll mention again that deep-learning was a massive part of this year’s GTC. But most of the deep-learning I’ve talked about up to this point has involved megacorporations running algorithms on seriously expensive setups. It’s not just those folks that want to take advantage of deep-learning though; so too do scientists and other researchers.
And for them, there’s DIGITS, the ‘Deep Learning GPU Training System’ for data scientists.
DIGITS isn’t just a PC loaded up with hardware; it’s tuned to be as beneficial as possible to deep-learning researchers. There are no Quadros or Teslas in here, but instead TITAN X cards, as those are perfectly suitable for this kind of GPGPU work.
NVIDIA’s reference design involves up to 4x GPUs, 64GB of memory, and 3x3TB RAID 5. It also includes an M.2 SSD for OS purposes, and a 1,500W PSU. For the OS, Ubuntu 14.04 is chosen, while pre-installed software includes the CUDA Toolkit 7.0, DIGITS software (which allows you to view a special administration page to monitor the workload), and learning libraries Caffe, Theano, Torch, and BIDMach.
Want to build your own DIGITS PC? You can request information from NVIDIA on how to do that.
It’s not a GTC without a sneak peek at what’s to come on the GPU front, and as such, we got an update on Pascal. There are three things coming with Pascal that are worth getting excited about:
Compared to Maxwell, Pascal can see a memory capacity increase of 2.7x, mixed-precision performance-per-watt of 4x, and up to 3x the memory bandwidth. NVIDIA hinted that a Pascal card could have 32GB of memory, which is quite something given the just-released TITAN X has 12GB and I deemed that to be overkill!
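For a sense of scale, those multipliers can be applied to the TITAN X's 12GB of memory and 336GB/s of bandwidth. NVIDIA only said "compared to Maxwell", so using TITAN X as the baseline is my assumption, and the results below are projections rather than announced specs; the `project` helper is just an illustrative name.

```python
# Baseline: TITAN X (assumed to be the Maxwell reference point).
TITAN_X_MEMORY_GB = 12
TITAN_X_BANDWIDTH_GBS = 336

# Pascal-vs-Maxwell multipliers quoted by NVIDIA.
CAPACITY_MULTIPLIER = 2.7
BANDWIDTH_MULTIPLIER = 3.0  # "up to" 3x, so this is a best case


def project(baseline, multiplier):
    """Scale a baseline spec by a quoted multiplier, rounded to 1 decimal."""
    return round(baseline * multiplier, 1)


projected_memory_gb = project(TITAN_X_MEMORY_GB, CAPACITY_MULTIPLIER)
projected_bandwidth_gbs = project(TITAN_X_BANDWIDTH_GBS, BANDWIDTH_MULTIPLIER)

print(f"Projected memory: {projected_memory_gb} GB")
print(f"Projected bandwidth: up to {projected_bandwidth_gbs} GB/s")
```

With that baseline, 2.7x the capacity lands at 32.4GB, which lines up with the 32GB hint, and 3x the bandwidth would be on the order of 1TB/s.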
The last thing I wanted to talk about is OTOY, and the few things it had to announce at the show. At the forefront, it revealed OctaneVR, a renderer that allows developers to create photorealistic content for VR, AR, and holographic cinema. When launched next month, it’ll be free, and the company promises that it’ll remain that way forever if it proves to be a success. It’ll offer easy project exporting to various platforms, and will be available for Windows, OS X, and Linux.
Perhaps the best thing here is OTOY’s ambitious new file format, called ORBX. The goal with ORBX is to allow easy exchanging of project assets, like materials, lighting, audio, and so forth. At launch, ORBX will only be supported by Octane, but in time, we could hope that it’d become interoperable with other content development software.
Octane Render 3 volumetric primitives example
Also worthy of note is that Octane Render 3 is on its way, and the number of features it introduces is downright incredible. We’re talking deep-pixel rendering, volumetric rendering, advanced live texture baking, infinite mesh and polygon sizes, and a really cool one: Support for OSL (Open Shading Language).
And there we have it, our 2015 GTC recap. Things are not going to be stalling until the next GTC, so we’ll be keeping up on the goings-on of various things we’ve talked about here and report on them when we can. In particular, I am looking forward to seeing Kevin Margo’s Construct being completed, and we might just have a deeper look at the Quadro M6000 for you in the not-too-distant future. Stay tuned.
Copyright © 2005-2020 Techgage Networks Inc. - All Rights Reserved.