The GTX 1080’s answer to AMD’s async compute
AMD’s Radeon cards hold an ace in the hole when it comes to games based on Microsoft’s radical new DirectX 12 graphics technology: asynchronous compute engines.
This dedicated hardware essentially allows multiple tasks to be run concurrently. The async shaders didn’t provide much of an advantage in DirectX 11 games, which run tasks in a largely linear fashion, but they can give certain DX12 titles a major performance boost, as you’ll see in our Ashes of the Singularity benchmark results later. They can also make a major difference in the asynchronous timewarp feature that the Oculus Rift VR headset uses to keep you from blowing chunks if there’s a hiccup in processing.
Nvidia’s Maxwell GPU-based GeForce 900-series cards don’t have a hardware-based equivalent for that. Instead, they rely on software-based “pre-emption” that allows a GPU to pause a task to perform a more critical one, then switch back to the original task. (Think of it like a traffic light.) Maxwell’s pre-emption gets the job done, but nowhere near as well as AMD’s dedicated hardware (which behaves more like the flow of cars yielding in traffic).
Pascal GPUs introduce several new hardware and software features to beef up their async compute capabilities, though none behaves exactly like the async hardware in Radeon GPUs.
The GeForce GTX 1080 adds flexibility in task execution with the introduction of dynamic load balancing, a new hardware-based feature that allows the GPU to adjust task partitioning on the fly rather than letting resources sit idle.
With the static partitioning technique used exclusively by all previous-generation GeForce cards, overlapping tasks each claimed a fixed portion of the available GPU resources: let’s say 50 percent for PhysX compute and 50 percent for graphics. But if the graphics task finishes first, the 50 percent of resources allocated to it sits idle until the compute task also completes. The Pascal GPU’s new dynamic load balancing allows unfinished tasks to tap into idle GPU resources, so the PhysX task in this example gains access to the resources freed up when the graphics task wraps up, allowing it to finish sooner than it would under the older static partitioning scheme.
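To make the 50/50 example concrete, here is a minimal sketch of the two schemes as a toy timing model. It is not Nvidia’s scheduler: the function names, the work units (one unit takes one time unit on the whole GPU), and the sample workloads are all assumptions chosen for illustration.

```python
def static_partition_time(graphics_work, compute_work, graphics_share=0.5):
    """Each task keeps its fixed share of the GPU until both are done."""
    graphics_time = graphics_work / graphics_share
    compute_time = compute_work / (1.0 - graphics_share)
    # The frame is finished only when the slower task completes.
    return max(graphics_time, compute_time)


def dynamic_balance_time(graphics_work, compute_work, graphics_share=0.5):
    """When one task finishes, the other takes over the whole GPU."""
    compute_share = 1.0 - graphics_share
    graphics_time = graphics_work / graphics_share
    compute_time = compute_work / compute_share
    first_done = min(graphics_time, compute_time)
    # Work still left in the slower task when the faster one finishes.
    if graphics_time <= compute_time:
        remaining = compute_work - first_done * compute_share
    else:
        remaining = graphics_work - first_done * graphics_share
    # The remaining work now runs on 100 percent of the GPU.
    return first_done + remaining


# Hypothetical workloads: graphics needs 2 units, PhysX compute needs 4.
static_time = static_partition_time(2.0, 4.0)
dynamic_time = dynamic_balance_time(2.0, 4.0)
print(f"static: {static_time} time units, dynamic: {dynamic_time} time units")
```

In this toy case the graphics half of the GPU sits idle for half of the static run, which is exactly the waste that dynamic load balancing is meant to reclaim. Real workloads won’t split this cleanly, so treat the numbers as a way to see the idea rather than a performance prediction.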
A fluid particle demo shown at Nvidia’s GTX 1080 Editors Day hit 78 frames per second with the feature disabled, and climbed to 94fps when it was turned on.
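For a sense of scale, the demo’s two frame rates can be converted into a percentage gain and a per-frame time saving. This is plain arithmetic on the figures quoted above; the helper name is just for illustration.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


fps_disabled = 78  # feature off, as quoted from the demo
fps_enabled = 94   # feature on, as quoted from the demo

gain_percent = (fps_enabled / fps_disabled - 1.0) * 100.0
saved_ms = frame_time_ms(fps_disabled) - frame_time_ms(fps_enabled)

print(f"Frame rate gain: {gain_percent:.1f} percent")
print(f"Time saved per frame: {saved_ms:.2f} ms")
```

That works out to roughly a 20 percent higher frame rate, or a little over 2 milliseconds shaved off each frame, in this one demo.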
The Pascal GPU also adds “Pixel level pre-emption” and “Thread level pre-emption” to its bag of async tricks, which are designed to help minimize the cost of switching tasks on the fly when time-critical tasks (like Oculus’ asynchronous timewarp) come in hot.
Previously, pre-emption occurred at a fairly high level of the computing process, between rendering commands from the game engine. Each rendering command can consist of up to hundreds of individual draw calls in the command push buffer, Nvidia says, with each draw call containing hundreds of triangles, and each triangle requiring hundreds of individual pixels to be rendered. Performing all that work before switching tasks can take a long time. (Well, relatively speaking.)
Pixel level pre-emption, which Nvidia says is achieved using a blend of hardware and software, allows Pascal GPUs to save their current workload at pixel-level granularity rather than at the much coarser rendering-command level, switch to another time-critical task (like asynchronous timewarp), then pick up exactly where they left off. That lets the GTX 1080 pre-empt tasks quickly, with minimal overhead; Nvidia says pixel-level pre-emption takes under 100 microseconds to kick into gear. We’ll talk about real-world results with Pascal’s new async compute tools when we dive into our DirectX 12 testing with Ashes of the Singularity. (Spoiler alert: They’re impressive.)
Thread level pre-emption will be available later this summer and performs similarly, but for CUDA computing tasks rather than graphical commands.
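The difference in granularity is easier to appreciate with rough numbers. The sketch below counts how much pixel work could be outstanding when a time-critical task shows up, under the old command-level scheme versus pixel-level pre-emption. The counts of 100 per level are an assumption standing in for Nvidia’s “hundreds,” and the model counts work, not time, since a real GPU shades many pixels in parallel.

```python
# Illustrative counts only: the description above says "hundreds" at each level.
DRAW_CALLS_PER_COMMAND = 100
TRIANGLES_PER_DRAW_CALL = 100
PIXELS_PER_TRIANGLE = 100


def worst_case_pixels_before_switch(granularity):
    """Pixels that may still need finishing before the GPU can switch tasks."""
    if granularity == "command":
        # The whole rendering command has to complete first.
        return (DRAW_CALLS_PER_COMMAND
                * TRIANGLES_PER_DRAW_CALL
                * PIXELS_PER_TRIANGLE)
    if granularity == "pixel":
        # Only the pixel currently in flight has to complete.
        return 1
    raise ValueError(f"unknown granularity: {granularity!r}")


for level in ("command", "pixel"):
    print(level, worst_case_pixels_before_switch(level))
```

Even with these conservative placeholder counts, a command-level switch can be stuck behind a million pixels of work, which is why moving the switch point down to the pixel matters for something as latency-sensitive as asynchronous timewarp.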
Simultaneous multi-projection (SMP) is a highly intriguing new technology that improves performance when a game needs to render multiple “viewports” for the same game, be it for a multi-monitor setup or the dual lenses inside a virtual reality headset. A more granular SMP feature can also greatly improve frame rates in games on standard displays by building on the groundwork laid by the multi-resolution shading feature already enabled in Nvidia’s Maxwell GPUs.
This fancy new technology is at the heart of Nvidia’s claim that the GeForce GTX 1080 is faster than two GTX 980s configured in SLI. The card never hits that lofty milestone in traditional gaming benchmarks, though it can come pretty damn close in some titles. But it’s theoretically possible in VR applications coded to take advantage of SMP, which uses dedicated hardware inside the Pascal GPU’s PolyMorph engine.
Displaying scenes on multiple displays traditionally involves some sort of compromise. In dual-lens VR, the scene has to have its geometry fully calculated and the scene fully rendered twice, once for each eye. Multi-monitor setups, on the other hand, tend to distort the imagery on the periphery screens, because they’re angled slightly to envelop the user, as shown above. Think of a straight line drawn across a piece of paper: Folding the paper in half makes the line appear slightly angled instead of truly straight.
Simultaneous multi-projection separates the geometry and rendering portions of creating a scene to fix both of those problems. The Pascal GPU calculates a scene’s geometry just once, then draws the scene to match the exact perspective of up to 16 different viewpoints as needed—a technique Nvidia calls “single-pass stereo.” Any parts of the scene that aren’t in view aren’t rendered.
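The “geometry once, projection many times” idea can be sketched in software. This is only an analogy, not how the PolyMorph hardware works: the function names, the simple pinhole projection, and the eye offsets are all assumptions made for illustration.

```python
geometry_calls = {"count": 0}


def model_transform(vertex):
    """Stand-in for the expensive geometry stage; counts how often it runs."""
    geometry_calls["count"] += 1
    x, y, z = vertex
    return (x, y, z + 5.0)  # push the model 5 units in front of the camera


def project(vertex, eye_offset_x, focal_length=1.0):
    """Pinhole projection for a camera shifted sideways by eye_offset_x."""
    x, y, z = vertex
    return ((x - eye_offset_x) * focal_length / z, y * focal_length / z)


def render_views_single_geometry_pass(vertices, eye_offsets):
    # Geometry is processed once, no matter how many viewpoints there are.
    world = [model_transform(v) for v in vertices]
    # Each viewpoint only re-projects the already-processed geometry.
    return [[project(v, offset) for v in world] for offset in eye_offsets]


triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
left_eye, right_eye = -0.03, 0.03  # hypothetical eye offsets
views = render_views_single_geometry_pass(triangle, [left_eye, right_eye])

print("views rendered:", len(views))
print("geometry stage ran for", geometry_calls["count"], "vertices")
```

The geometry stage runs three times for the three vertices, not six, even though two views come out the other end. The traditional approach described earlier would run that stage once per eye.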
If you’re using SMP with multiple monitors rather than a VR headset, new Perspective Surround settings in the Nvidia Control Panel will let you configure the output to match your specific setup, so straight lines in games no longer appear angled and instead render as the developers intended. Sweet!
But that’s not all simultaneous multi-projection does. A technique called “lens-matched shading”—the part that builds on Maxwell’s multi-res shading—pre-distorts output images to match the warped, curved lenses on VR headsets, rendering the edges of the scene at lower resolution rather than rendering them at full fidelity and throwing all that work away. Like SMP’s single-pass stereo, the idea is to render only the parts of the image that will actually be seen by the user in order to improve efficiency.
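To see why shading the edges at lower resolution saves so much work, consider a deliberately simplified model: a full-resolution center region surrounded by a periphery shaded at a reduced scale. This is not Nvidia’s actual lens-matched shading algorithm, and the center size and periphery scale below are made-up values chosen only to illustrate the arithmetic.

```python
def shaded_pixel_fraction(center_fraction, periphery_scale):
    """Fraction of full-resolution pixel work left in a center/periphery split.

    center_fraction: share of each screen dimension kept at full resolution.
    periphery_scale: per-dimension resolution scale applied outside the center.
    """
    center_area = center_fraction ** 2
    periphery_area = 1.0 - center_area
    return center_area + periphery_area * periphery_scale ** 2


width, height = 3840, 2160  # a 4K frame
fraction = shaded_pixel_fraction(center_fraction=0.6, periphery_scale=0.5)
shaded_pixels = width * height * fraction

print(f"Pixels shaded: {shaded_pixels:,.0f} of {width * height:,}")
print(f"Share of full-resolution work: {fraction:.0%}")
```

With these assumed values, only about half of the pixel work remains, even though 60 percent of each screen dimension is still rendered at full detail. The real savings depend entirely on how a game tunes the technique.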
Interestingly, lens-matched shading can also be used to improve overall frame rates even on traditional single-display setups. In a single-screen demo of Obduction, Cyan Worlds’ upcoming spiritual successor to Myst, frame rates hovered around 42fps in a particular scene with SMP disabled at 4K resolution. Activating SMP caused frame rates to leap to the 60fps maximum supported by the display, and you could only notice the reduced pixel fidelity at the edges of the display if you were standing still and actively looking for blemishes.
Simultaneous multi-projection is fascinating, potentially portentous stuff—and that’s why it’s a major bummer that developers have to explicitly add support for it, and it works only on GeForce cards running on Pascal GPUs. It’s a killer selling point for the GTX 1080, but whether games will support a feature that excludes every graphics card sold up until today is a big question mark.
Nvidia GeForce GTX 1080 Founders Edition
The Nvidia GeForce GTX 1080 is the first graphics card built using 16nm technology after GPUs stalled on 28nm for four long years. The performance and power efficiency gains are nothing short of astounding.
Pros
- Outrageous performance leap over GTX 980
- Hugely power efficient
- Attractive premium design
- Numerous new features
Cons
- Doesn’t blow away Radeon cards in heavily AMD-optimized games