When Jen-Hsun took the stage in Austin earlier this month, I am not sure anyone expected him to reveal so much about Pascal and its first GPU models. We were not just given performance ballparks, but were told about all of the new technologies that have come along for the ride.
Much of the information here was covered by Jamie at that time, so to avoid rewording all that he said, I am just going to tackle the basics here. Should a follow-up article complement one of these features, I’ll go into more depth on it there.
Before going further, I want to tackle a question I’ve seen asked numerous times since the GTX 1080’s unveiling: “Does it support Async Compute?” The answer is simply, “Yes”. Pascal’s architecture was designed with Async Compute in mind, and can benefit applications that make good use of physics, post-processing effects, and of course, virtual reality.
GeForce GTX 1080 Block Diagram
Async Compute is just one of the many things that make Pascal worth learning more about, though. There’s also the brand-new process that the chips are built on: 16nm FinFET.
Pascal Architecture Basics
NVIDIA has made great strides with its Pascal architecture. It’s fast, it’s efficient, and it’s feature-rich. That said, if NVIDIA hadn’t been able to take advantage of a long-overdue die shrink, the GTX 1080 wouldn’t be as impressive as it is. A smaller process can improve performance, and it can also dramatically reduce power usage, which in turn means less heat.
The GP104 die is built using a 16nm FinFET process, and FinFET isn’t exclusive to NVIDIA. AMD announced months ago that its upcoming Polaris architecture would be built on a comparable 14nm FinFET process, which gives us great hope that the red team might be able to surprise us with its own launch.
Improving efficiency even further, the first Pascal graphics cards come equipped with 8GB of GDDR5X memory, or G5X as NVIDIA likes to call it. This delivers a data rate of 10Gbps (10GHz effective), which in itself increases memory bandwidth over the GTX 980 by 43%. If applications take advantage of the company’s ever-evolving memory compression technologies, that gain can climb to as much as 70%.
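For anyone who wants to sanity-check that 43% figure, here is a quick back-of-the-envelope sketch. It leans on two assumptions that aren’t stated above: that both cards use a 256-bit memory bus, and that the GTX 980’s GDDR5 runs at 7Gbps per pin. The function name is purely illustrative.

```python
def bandwidth_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin data rate times bus width, in bytes."""
    return data_rate_gbps * bus_width_bits / 8


# Assumed specs (not taken from the article): 256-bit bus on both cards,
# 7Gbps GDDR5 on the GTX 980. The 10Gbps G5X figure is from the article.
gtx_1080 = bandwidth_gb_per_s(10.0, 256)
gtx_980 = bandwidth_gb_per_s(7.0, 256)

# Relative gain of the GTX 1080 over the GTX 980.
gain = gtx_1080 / gtx_980 - 1

print(f"GTX 1080: {gtx_1080:.0f} GB/s")
print(f"GTX 980:  {gtx_980:.0f} GB/s")
print(f"Gain:     {gain:.0%}")
```

Under those assumptions, the bus width cancels out and the gain is simply 10 divided by 7, which is where the 43% comes from.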
In all, the GTX 1080 GPU consists of 7.2 billion transistors, which is actually 800 million fewer than the TITAN X and 980 Ti. As you’ll see in the performance results, that “loss” of transistors sure doesn’t hurt the card’s performance versus the older gen.
During his keynote, Jen-Hsun highlighted “The Marvels of Pascal”, five separate things that the company believes make the architecture top-notch. For starters, there’s the architecture itself – a no-brainer. Beyond that, there’s the 16nm FinFET process, the use of G5X, and then craftsmanship and Simultaneous Multi-Projection. I’ll tackle the last of these in a couple of minutes, but to start, let’s dive into a new feature that came out of nowhere: Ansel.
As someone who takes a lot of screenshots in games, I found that NVIDIA’s Ansel spoke to me. It is, in effect, a robust screenshot tool that a game’s developer must implement and build around its rules (e.g., so cheating can’t happen). We were told that in most cases, a developer shouldn’t need to insert more than 150 lines of code – which is nothing in the grand scheme.
It’s the game implementation that makes Ansel so useful. Imagine being at a part of your game where a screenshot simply can’t do the landscape justice. With Ansel, you can detach your camera from the game, adjust it to your liking, and then capture an image in a resolution that redefines “high res”, or create a 360° view of the environment that can be enjoyed with a VR headset.
Once I can spend time testing Ansel out, I’ll follow up with a more detailed post. However, it doesn’t require much more information to appreciate what the feature can do. If you want to create a seriously large image, you could capture a given scene to create an image file that could weigh in at hundreds of megabytes – or perhaps even surpass 1GB if you happen to dial the settings up high enough.
After a shot is taken, you’ll be able to enjoy a massively detailed image, or explore the environment in VR. You can even export screenshots to EXR format, so that you can adjust their settings as if they were photographs shot in RAW. There will be a ton of different filters to pick and choose from here, so if you somehow can’t get a “perfect” shot, you’re clearly doing something wrong.
Ansel demonstration in The Witcher 3
In the example seen above, an image of The Witcher 3 was captured at a super-high resolution. When viewed to fit inside the screen, we can see Geralt standing on a castle balcony. When he is highlighted and zoomed into, though, the detail is so high that the writing in a book on another level can be read.
There is one thing to be aware of, though: Ansel can truly drag down your GPU’s performance. When a capture is made, you may have to wait several minutes for it to finish, as dozens of segments need to be rendered independently, and then stitched together afterwards. If that’s a downside, it’s a small one for those who want this kind of functionality (*raises hand*).
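To get a rough feel for why a capture involves dozens of segments and why the resulting files get so big, here’s a minimal sketch. It is not how Ansel works internally; it assumes a hypothetical capture that is 8x wider and 8x taller than a 1080p frame, segments that don’t overlap, and uncompressed pixel data. Every name and number in it is illustrative.

```python
import math


def capture_plan(target_w: int, target_h: int,
                 tile_w: int, tile_h: int,
                 bytes_per_pixel: int) -> tuple[int, int]:
    """Return (segment count, uncompressed size in bytes) for a tiled capture.

    Simplification: segments are assumed not to overlap, and no
    compression is applied to the final image.
    """
    tiles_x = math.ceil(target_w / tile_w)
    tiles_y = math.ceil(target_h / tile_h)
    raw_bytes = target_w * target_h * bytes_per_pixel
    return tiles_x * tiles_y, raw_bytes


# Hypothetical capture: 15360x8640, rendered as 1920x1080 segments.
tiles, rgb8_bytes = capture_plan(15360, 8640, 1920, 1080, 3)   # 8-bit RGB
_, half_rgba_bytes = capture_plan(15360, 8640, 1920, 1080, 8)  # 16-bit float RGBA

print(f"Segments to render: {tiles}")
print(f"8-bit RGB:          {rgb8_bytes / 1e6:.1f} MB")
print(f"16-bit float RGBA:  {half_rgba_bytes / 1e6:.1f} MB")
```

With these made-up settings, the capture works out to 64 segments, roughly 398MB of raw 8-bit data, and a little over 1GB if each pixel is stored as four 16-bit floats.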
Ansel is coming soon to a number of titles: Tom Clancy’s The Division, The Witness, Lawbreakers, The Witcher 3: Wild Hunt, Paragon, No Man’s Sky, and Unreal Tournament.
SMP doesn’t just mean “symmetric multi-processing” anymore; it now also means Simultaneous Multi-Projection. This is in reference to a new multi-monitor and VR feature that helps better align the game world to your displays.
It’s the Perspective Surround element of SMP that impresses me so much, perhaps because I’ve been wanting it for a while. While multi-monitor gaming offers a number of benefits, it also brings with it a number of caveats. When stretched across three monitors, games usually don’t take into account the angles that the screens are on, which is a downer given most people do in fact angle them. With Perspective Surround, the GeForce driver injects a bit of logic to scale the game to more realistic perspectives.
Unfortunately, we were not provided with an example outside of some diagrams to explain the technology, but believe me, if you’re a multi-monitor user, you will definitely want to be taking advantage of Perspective Surround.
Other SMP features include Lens Matched Shading for VR, which both improves pixel shading performance and renders the scene to be more accurate to the player; and Multi-Res Shading, which allows a game to render higher-resolution details in the center of the screen, where players look most often.
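To illustrate why rendering the edges of the screen at a lower resolution can save so much work, here’s a rough sketch. It assumes a simplified model that NVIDIA did not describe to us in this form: a centered region covering a fraction of the screen’s width and height is shaded at full resolution, and everything outside of it is shaded at a uniformly reduced scale. The function and its parameters are hypothetical.

```python
def shaded_pixel_fraction(center_fraction: float, outer_scale: float) -> float:
    """Fraction of full-resolution pixels shaded under a simplified model.

    center_fraction: share of screen width AND height kept at full resolution.
    outer_scale: per-axis resolution scale applied to the rest of the screen.
    """
    if not (0.0 < center_fraction <= 1.0 and 0.0 < outer_scale <= 1.0):
        raise ValueError("both arguments must be in the range (0, 1]")
    center_area = center_fraction ** 2
    outer_area = 1.0 - center_area
    # Halving resolution on both axes quarters the pixel count, hence the square.
    return center_area + outer_area * outer_scale ** 2


# Illustrative settings: center 60% of each axis at full res, periphery at half res.
fraction = shaded_pixel_fraction(0.6, 0.5)
print(f"Pixels shaded vs. full resolution: {fraction:.0%}")
```

With those illustrative settings, only about half of the pixels need to be shaded compared to rendering the whole frame at full resolution.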
If you play any games that run at sky-high framerates simply because they’re not that graphically intensive (most MOBAs, for example), NVIDIA has a new technology for you called “Fast Sync”. Fast Sync’s main goal is to continue delivering smooth gameplay even when your FPS is through the roof.
At a quick glance, it might seem like this kind of technology negates the need for G-SYNC, but that’s not the case at all. While G-SYNC is designed to benefit gameplay that dips below your desired refresh rate (sub-60 FPS, for example), Fast Sync tackles the opposite problem of games creating tearing simply because they are running too fast.
In this situation, most people would just enable Vsync to lock the framerate to 60, but that’s not ideal, either. It can cause the buffers to become overloaded and frames to be spit out in a less-than-ideal order. Fast Sync works by delivering only the final frame rendered in a given ‘Vsync off’ sequence. That means that of the 60 frames you see in a given second, each one is effectively the final frame from one of 60 separate segments.
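The “final frame from each segment” idea can be sketched with a tiny simulation. This is only a simplified model, assuming the GPU renders frames at a perfectly steady rate and the display simply picks the most recently completed frame at each refresh; it says nothing about how the driver actually manages its buffers, and the function name is made up for illustration.

```python
def fast_sync_frames(render_fps: int, refresh_hz: int, seconds: int = 1) -> list[int]:
    """Return the 0-based index of the rendered frame shown at each refresh.

    Simplified model: frames finish at a steady rate, and each refresh
    displays the latest frame that has fully completed by that moment.
    """
    shown = []
    for tick in range(1, refresh_hz * seconds + 1):
        # Frames completed by the time of this refresh (integer math avoids
        # floating-point rounding surprises).
        completed = (tick * render_fps) // refresh_hz
        if completed == 0:
            continue  # nothing has finished rendering yet
        shown.append(completed - 1)
    return shown


# Example: a game running at 240 FPS on a 60Hz display.
shown = fast_sync_frames(render_fps=240, refresh_hz=60)
print(f"Frames displayed in one second: {len(shown)}")
print(f"First few frames shown: {shown[:5]}")
```

In this example, 240 frames are rendered but only 60 are displayed: every fourth one, each being the newest frame available when the display refreshes. The other 180 are simply never shown.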
NVIDIA admits that Fast Sync isn’t a perfect solution, but it’s preferable to running with Vsync enabled since the ‘backpressure’ problem will no longer exist. The one thing that could be better? 200Hz+ monitors.
Don’t hold your breath.
New SLI Bridges
To coincide with the launch of Pascal, NVIDIA is releasing a trio of new SLI bridges, called “SLI HB”. The HB stands for “high bandwidth”, and these bridges are needed for Pascal GPUs to communicate with each other in SLI as efficiently as possible. Old bridges can be used with Pascal, but NVIDIA notes that the interconnect bandwidth will be capped at whatever the bridge is spec’d at.
Interestingly, while NVIDIA is offering bridges in 2-, 3-, and 4-way lengths, all of them support just 2 graphics cards. In case this is news to you, that means that NVIDIA is only officially supporting 2-way SLI with Pascal. That doesn’t mean that 3- and 4-way configurations won’t work; it’s just that they’re not recommended. At least for now, 3- and 4-way SLI is definitely a “your mileage will vary” technology.
Does this mean the end of officially supported 3- and 4-way configurations? Not likely. What I think NVIDIA’s move here signifies is that the world has way too many console ports, and as such, the need for more than 2 GPUs is minimal. Should game developers start pushing more of their games towards the PC’s much more advanced hardware, we could see NVIDIA promote 3- and 4-way configurations again.
For those who don’t care about NVIDIA’s recommendations, take note: to unlock 3- and 4-way support, you need to go to NVIDIA’s website to obtain an ‘Enthusiast Key’. Downloading and running an application will generate a unique signature for your GPU, at which point 3- and 4-GPU configurations will be enabled. Why NVIDIA has decided to make this so complicated, I’m not sure.
Since NVIDIA’s newest bridges support only two graphics cards, to use 3- or 4-way SLI you will need to use a new bridge for the main two cards, and an older bridge for the others. If you have one of the older, floppier connectors, you might be able to hide it underneath the new bridge. It’s an odd design, but at least the highest-end enthusiasts out there won’t be out of luck.
With that all taken care of, we can move on to the performance results. Please note that not everything notable about Pascal was covered here, but the important things were. Some of the features will be explored in more depth in the near future, in articles that better complement them.