RDNA 2. The graphics architecture at the heart of AMD’s kick-ass new Radeon RX 6000 graphics cards may sound like a simple iteration upon the original “RDNA” GPUs that came before it, but RDNA 2 (which also powers the next-gen Xbox Series X and PlayStation 5 consoles) is much more than a mere refresh. Significant tweaking has resulted in a stunning 54-percent increase in performance-per-watt over AMD’s last-gen Radeon RX 5000 GPUs. Perhaps more notably, the Radeon RX 6000 series introduces an innovative new “Infinity Cache” technology that reimagines how memory behaves in graphics cards. Oh, and ray tracing? AMD does that now, too.
AMD’s engineers approached RDNA 2 with lofty efficiency goals as their guiding lights. The original RDNA architecture provided a 50-percent performance-per-watt increase over its “GCN”-based predecessors, finally matching Nvidia’s vaunted power efficiency, and the company’s executives wanted RDNA 2 to keep that pace. Spoiler alert: They did. It took a lot of hard work though, as well as close collaboration with the Ryzen CPU architecture team, because RDNA 2 is built using the same TSMC 7nm manufacturing process as RDNA 1. A big part of the original RDNA’s efficiency gains came from the node leap from 14nm to 7nm, but RDNA 2’s improvements required more substantial tweaking.
Despite the intense rejiggering, the fundamental RDNA 2 building blocks remain largely similar to RDNA 1’s in broad strokes—aside from the addition of dedicated ray accelerator hardware, which we’ll get to later—only scaled up much further.
AMD stayed modest with last generation’s RDNA 1 products. Its flagship, the Radeon RX 5700 XT, topped out at 40 compute units and 10.3 billion transistors inside its 251mm² die, a surprise considering AMD’s previous GCN architectures scaled up to 64 CU designs. (We’ll get to why that was later as well.) RDNA 2 blows well past that. The $579 Radeon RX 6800 includes 60 CUs, the $649 Radeon RX 6800 XT ups that to 72 CUs, and the flagship $999 Radeon RX 6900 XT will fully double last generation’s RX 5700 XT with a whopping 80 CUs inside a massive 519mm² die with over 26 billion transistors. By contrast, the “Ampere” GPU die inside Nvidia’s rival $1,500 GeForce RTX 3090 packs a hair over 28 billion transistors into a much larger 628mm² die.
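To put those die figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the rounded numbers quoted above, so the RDNA 2 and Ampere results are best read as lower bounds, and the function name is purely illustrative.

```python
def density_mtr_per_mm2(transistors_billions: float, die_area_mm2: float) -> float:
    """Approximate transistor density in millions of transistors per mm^2."""
    return transistors_billions * 1000 / die_area_mm2


# (transistors in billions, die area in mm^2), taken from the figures in the text.
chips = {
    "RX 5700 XT (RDNA 1)": (10.3, 251),
    "RX 6900 XT (RDNA 2)": (26.0, 519),  # quoted as "over 26 billion"
    "RTX 3090 (Ampere)": (28.0, 628),    # quoted as "a hair over 28 billion"
}

for name, (transistors, area) in chips.items():
    print(f"{name}: ~{density_mtr_per_mm2(transistors, area):.1f} MTr/mm^2")
```

Density is only one lens on a chip, but it shows why the 80-CU die stays smaller than Nvidia’s despite a similar transistor budget.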
Swiping a page from AMD’s fantastic Ryzen 5000 CPUs, RDNA 2 implements pervasive fine-grain clock gating to allow parts of the GPU to slow down if they aren’t being used, improving power efficiency. RDNA 2 additionally features more robust clock tree splitting and gating (like server CPUs) for the same reason, but more parallelized to hit the higher bandwidths possible with GPUs. The company’s engineers also “aggressively” rebalanced data pipelines and even redesigned entire data paths, honing the architecture for maximum efficiency. Those optimizations accounted for about a third of the up to 54-percent performance-per-watt increase delivered in the Radeon RX 6800 and 6800 XT (and the whopping 65-percent increase promised for the flagship Radeon RX 6900 XT coming December 8).
Performance-per-watt isn’t all about power efficiency, though—hence the word “performance.” Another third of RDNA 2’s perf-per-watt improvement comes from pushing the pedal to the metal even harder. Once again, AMD’s engineers optimized the microarchitecture, logic, and performance libraries with a focus on speed. The most tangible results of their efforts have to be the insane clock speeds of the Radeon RX 6000 GPUs. AMD’s CPU engineers have spent a long time honing speeds on the 7nm process node by this point, and they shared their expertise with the Radeon team to great effect.
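A performance-per-watt gain can be cashed in two ways: more speed at the same power, or the same speed at less power. The short sketch below works through both readings for a 54-percent gain. The 100 fps and 225W baselines are hypothetical round numbers chosen for illustration, not measurements of any card.

```python
def perf_at_same_power(baseline_perf: float, gain: float) -> float:
    """Performance if the whole perf-per-watt gain is spent on speed."""
    return baseline_perf * (1 + gain)


def power_at_same_perf(baseline_watts: float, gain: float) -> float:
    """Power draw if the whole perf-per-watt gain is spent on efficiency."""
    return baseline_watts / (1 + gain)


GAIN = 0.54            # the "up to 54 percent" figure from the text
BASELINE_FPS = 100.0   # hypothetical baseline performance
BASELINE_WATTS = 225.0 # hypothetical baseline board power

print(f"Same power: {perf_at_same_power(BASELINE_FPS, GAIN):.0f} fps")
print(f"Same performance: {power_at_same_perf(BASELINE_WATTS, GAIN):.1f} W")
```

Real products land somewhere between the two extremes, which is why both the efficiency work and the clock speed work count toward the same headline number.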
The Radeon RX 6000-series graphics cards push well past the 2GHz barrier. Company representatives were keen to tout the “unprecedented” speeds in conversations with press. They should be. All three high-end options—the Radeon RX 6800, 6800 XT, and 6900 XT—have boost clock speeds that surpass a whopping 2.1GHz. The two XT models go all the way up to 2,250MHz. Those are under ideal conditions, but AMD says the XT cards hit 2,015MHz even in gaming workloads, keeping pace with Nvidia’s staggeringly powerful Ampere GPUs, which can boost to roughly 2GHz during gameplay.
AMD couldn’t have hit such fast speeds or achieved its power efficiency goals without the introduction of RDNA 2’s revolutionary Infinity Cache.
RDNA 2 Infinity Cache explained
RDNA 2’s standout feature also swipes a page from processor design—Epyc server processors, in this case. Traditional GPUs include L1 and L2 caches of various sizes. Radeon RX 6000 graphics cards add an “Infinity Cache” that behaves similarly to the “Game Cache” that helps modern Ryzen processors game so much better than earlier models did. Inspired by Epyc server CPUs, Infinity Cache is basically a massive 128MB L3 cache that has been heavily optimized for gaming workloads. It’s four times denser than the L3 SRAM in Epyc processors to help improve power efficiency, too.
Equipping the GPU with such a large, high-speed cache lets it keep most of the working data for any given frame on-die. This saves the GPU from having to keep sending signals all the way across the package to the 16GB of onboard GDDR6 memory in many cases, especially because the cache holds a lot of temporal and spatial data that can be reused in subsequent frames. That makes Infinity Cache much faster and much more power-efficient compared to simply increasing the bus width to the memory modules.
Sam Naffziger, AMD’s product technology architect, says that even though the Radeon RX 6000 GPUs stick to a modest 256-bit bus, the Infinity Cache helps RDNA 2 deliver massively more bandwidth-per-watt than traditional GDDR6 equipped with even a humongous 512-bit bus. By comparison, Nvidia’s rival high-end RTX 3080 and 3090 graphics cards utilize wider 320-bit and 384-bit buses, respectively, paired with cutting-edge GDDR6X memory that uses “PAM4” signaling technology, which lets them send four possible values per cycle, up from the traditional two. That lets GDDR6X move data at twice the rate of GDDR6, but with higher latency and power demands.
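For a feel of the numbers involved, here is a minimal Python sketch of the two ideas in play: peak bandwidth as a function of bus width, and PAM4 packing two bits into each transmitted symbol. The 16Gbps per-pin data rate is an assumption (the text above does not state the memory speed), and the bit-to-level mapping is illustrative only, not the actual GDDR6X line coding.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak theoretical bandwidth in GB/s: one pin per bus bit, 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


def pam4_symbols(bits: list) -> list:
    """Pack a bit sequence into 4-level symbols (0..3), two bits per symbol.

    Traditional two-level signaling would need one symbol per bit instead.
    """
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]


ASSUMED_GDDR6_GBPS = 16.0  # assumption: per-pin data rate, not given in the text

for width in (256, 512):
    print(f"{width}-bit bus: {peak_bandwidth_gb_s(width, ASSUMED_GDDR6_GBPS):.0f} GB/s")

payload = [1, 0, 0, 1, 1, 1, 0, 0]
print(f"{len(payload)} bits -> {len(pam4_symbols(payload))} PAM4 symbols")
```

Doubling the bus width doubles peak bandwidth but also the number of memory traces to drive, which is the power cost AMD is trying to avoid.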
The Infinity Cache also helps enable RDNA 2’s sky-high clock speeds. If AMD had tried to force the original RDNA memory subsystem on RDNA 2, Naffziger said, it would have required a massively larger memory configuration to avoid starving the GPU for bandwidth. That would have required upgrading to huge 512-bit buses, and more, faster memory, all of which would have sent the power demands skyrocketing—a no-go given RDNA 2’s design goals.
The overwhelming bandwidth enabled by Infinity Cache keeps RDNA 2’s CUs amply fed, as you can see in the chart above. When AMD’s engineers disable Infinity Cache in their labs and revert to the standard cache design with 16GB of GDDR6 memory over a 256-bit bus, GPU clock frequencies fall off a cliff.
By keeping so much frame data on die, the Infinity Cache helps the Radeon RX 6800 average 34 percent less latency than the older Radeon RX 5700 XT. When a scene fully “hits” the Infinity Cache, the latency reduces further. Naffziger says that AMD’s Infinity Fabric communication technology can scale its speeds up and down to optimize efficiency, ramping up to 550GB/s when the Infinity Cache becomes especially stressed. But even when the GPU needs to access your card’s actual VRAM, latency also improves compared to the last-gen Radeon cards thanks to a general speed increase for Infinity Fabric.
AMD tuned the Infinity Cache on this initial trio of enthusiast-class cards for 4K gaming, which is why it’s configured with an impressive 128MB. Naffziger says the large size lets Infinity Cache achieve a 58 percent “hit rate” across a wide range of titles at 4K resolution, and higher hit rates as the resolution scales down. Part of the reason these cards perform better than their Nvidia competition at 1440p gaming is their high Infinity Cache hit rates, AMD’s Laura Smith said.
But the Infinity Cache performance doesn’t scale linearly as resolution decreases, Naffziger warned. When you drop down to 1080p, games often become more CPU- or engine-bound than memory-bound. (I wouldn’t be surprised if more affordable Radeon RX 6000 offerings in the future decreased the Infinity Cache’s size because of that.)
Likewise, the Infinity Cache spreads its wings the most in applications that are more memory-bound, though its benefits can be felt even when a game needs to access traditional VRAM more often. Naffziger says in those cases, RDNA 2’s overall memory system behaves roughly on a par with what you’d see if you’d equipped these cards with a 512-bit bus.
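The hit-rate discussion can be made concrete with a toy model. If a fraction of memory requests is served from the cache and the rest goes out to VRAM, the total request rate the GPU can sustain is capped by whichever side saturates first. This is a simplified sketch, not AMD’s methodology, and both bandwidth figures in it are assumptions: 512GB/s for VRAM (a 256-bit bus at an assumed 16Gbps per pin) and a hypothetical 2,000GB/s for the cache.

```python
def max_request_bandwidth_gb_s(hit_rate: float,
                               cache_bw_gb_s: float,
                               dram_bw_gb_s: float) -> float:
    """Largest total request rate R such that cache hits (hit_rate * R) fit in
    the cache bandwidth and misses ((1 - hit_rate) * R) fit in the DRAM bandwidth."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    limits = []
    if hit_rate > 0.0:
        limits.append(cache_bw_gb_s / hit_rate)
    if hit_rate < 1.0:
        limits.append(dram_bw_gb_s / (1.0 - hit_rate))
    return min(limits)


CACHE_BW = 2000.0  # hypothetical cache bandwidth in GB/s
DRAM_BW = 512.0    # assumed VRAM bandwidth in GB/s

for hit_rate in (0.0, 0.25, 0.5, 0.75):
    total = max_request_bandwidth_gb_s(hit_rate, CACHE_BW, DRAM_BW)
    print(f"hit rate {hit_rate:.0%}: up to {total:.0f} GB/s of requests")
```

Even in this crude model, every extra point of hit rate takes pressure off the memory bus, which is why the benefit grows as resolution (and the working set) shrinks.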
Infinity Cache greatly helps with ray tracing too.
Ray tracing with RDNA 2
Yes, AMD’s Radeon GPUs can handle real-time ray tracing now. Nvidia kicked off the ray tracing party by adding dedicated “RT cores” for handling ray tracing to its older RTX 20-series GPUs. Now AMD is joining the fun by adding a single dedicated “ray accelerator” to each RDNA 2 compute unit. That means as you move up the Radeon RX 6000 stack, more powerful graphics cards with more compute units will also be better at ray tracing, as they’ll have more dedicated ray tracing hardware.
As you can see in our Radeon RX 6800 and 6800 XT review, RDNA 2 isn’t quite on a par with Nvidia’s second-gen ray tracing implementation. It still delivers surprisingly good ray tracing performance, achieving very playable frame rates at both 1440p and 1080p resolution. You won’t be able to play games at 4K with the intensive lighting technologies enabled, however, and AMD says it targeted 1440p gaming as its ray tracing goal. By and large, it delivered.
Infinity Cache comes through in the clutch here, too. We delved deeper into how ray tracing works in our original deep-dive of Nvidia’s Turing architecture, where the technology debuted, but basically it works by having dedicated ray tracing hardware perform calculations of how the light rays behave, using a technique known as bounding volume hierarchy (BVH) traversal. Performing that task is very memory-intensive, which is why VRAM demands leap upward when you enable ray tracing in a game.
AMD says it’s able to keep “a very high percentage of the BVH working set” directly inside the Infinity Cache, reducing latency and improving overall performance. The ray accelerator handles intersections in the BVH, while RDNA 2 uses standard shader code in the compute units for ray traversal and shading the actual scene.
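To illustrate that split of labor, here is a minimal CPU-side sketch of BVH traversal in Python. The box intersection test stands in for the kind of work a ray accelerator performs, and the loop that walks the tree stands in for the traversal done in shader code. The `Node` layout, the function names, and the classic “slab” test used here are assumptions for illustration; this is not AMD’s hardware algorithm or any real API.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                         # minimum corner of the bounding box
    hi: Vec3                         # maximum corner of the bounding box
    children: List["Node"] = field(default_factory=list)
    primitive: Optional[str] = None  # set on leaves only (hypothetical payload)


def ray_hits_box(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: intersect the ray's entry/exit intervals on each axis."""
    t_near, t_far = 0.0, math.inf
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[str]:
    """Walk the tree with an explicit stack, skipping any subtree whose box is missed."""
    candidates = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue
        if node.primitive is not None:
            candidates.append(node.primitive)
        else:
            stack.extend(node.children)
    return candidates
```

A real renderer would go on to test the actual triangles inside each candidate leaf. The point of the hierarchy is that whole branches get rejected with a single box test, but every step still reads node data from memory, which is where a large on-die cache pays off.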
That said, AMD does not have an answer for Nvidia’s Deep Learning Super Sampling (DLSS) technology. Ray tracing is incredibly computationally expensive, and activating it creates a striking performance impact. To counteract the loss in frame rate, DLSS renders games at a lower resolution, then upscales the final image to your game resolution using machine learning to spiff up the image, all powered by Nvidia’s dedicated AI-focused tensor cores.
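The frame rate savings come from simple pixel arithmetic, sketched below in Python. The resolution pairs are illustrative examples, not a statement of the internal resolutions any particular DLSS mode uses, and the function name is made up for this sketch.

```python
def pixel_fraction(render_res: tuple, output_res: tuple) -> float:
    """Fraction of the output's pixels that are actually rendered before upscaling."""
    render_w, render_h = render_res
    output_w, output_h = output_res
    return (render_w * render_h) / (output_w * output_h)


OUTPUT_4K = (3840, 2160)

# Illustrative internal render resolutions for a 4K output.
for render_res in ((2560, 1440), (1920, 1080)):
    share = pixel_fraction(render_res, OUTPUT_4K)
    print(f"{render_res[0]}x{render_res[1]} -> 4K: {share:.0%} of the pixels rendered")
```

Shading fewer than half the pixels leaves a lot of headroom for ray tracing, provided the upscaler can rebuild a convincing final image.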
Early iterations of DLSS could look like Vaseline smeared on your screen, but the DLSS 2.0 technology rolling out in newer games works like black magic. It’s wonderful, and truly makes flipping ray tracing on less painful. The tensor cores also handle “denoising” when ray tracing is on to avoid a gritty look common on older, less advanced ray tracing implementations.
AMD doesn’t include dedicated AI upscaling hardware in RDNA 2. Denoising is handled by the general compute units, and it works very well by my eye—but there’s no DLSS-like feature to claw back lost frames. During its Radeon RX 6000 reveal, AMD teased some sort of DLSS rival dubbed “Super Resolution” as part of its FidelityFX suite of open-source tools without going into detail. Representatives declined to say more, other than to state that Super Resolution will not be available immediately. That said, because AMD’s RDNA 2 powers both next-gen consoles as well, the company hopes its open-source alternative winds up gaining traction with developers when it does arrive. The company’s FidelityFX toolkit also includes a denoiser solution that developers can implement.
DirectX 12 Ultimate features and more
But wait, there’s more. Like Nvidia’s recent RTX-branded GPUs, RDNA 2 is fully DirectX 12 Ultimate-compliant. Microsoft calls DX12 Ultimate “a force multiplier for the entire gaming ecosystem” because it unifies an array of new features (mostly ones introduced in Nvidia’s Turing-based RTX 20-series, but largely ignored by developers) across all modern PC and next-gen Xbox Series X hardware.
That means Radeon RX 6000-series graphics cards also pick up nifty tricks like mesh shading, variable rate shading, and sampler feedback, which we covered in our look at DirectX 12 Ultimate. All of the features hold great potential to improve both performance and visual fidelity. AMD optimized various parts of RDNA 2 around them, such as improving the color compression behavior and adding dedicated sampler feedback logic.
AMD’s Radeon GPUs will also support Microsoft’s DirectStorage API when it debuts in 2021 (as will Nvidia’s RTX 30-series). DirectStorage lets your NVMe SSD talk directly to your graphics card’s memory for vastly improved loading and asset-streaming performance. Here’s how DirectStorage aims to kill game-loading times on the PC. It has the potential to be a real game-changer.
Other aspects of RDNA 2 received upgrades as well. The display engine now supports HDMI 2.1, for example. The multimedia engine can handle AV1 decoding for 8K videos and includes a high-quality 8K HEVC encode accelerator, matching advancements found in Nvidia’s Ampere GPUs. 8K is the most niche of niche cases at this point, though, and this is getting long enough.
Be sure to check out our full Radeon RX 6800 and RX 6800 XT review to see how all these RDNA 2 improvements translate into graphics cards you can actually buy. They’re fantastic, and they truly challenge Nvidia’s high-end gaming options for the first time since 2013’s Radeon R9 290X hit the streets. Whatever else you can say about 2020, it’s a great year to be a gamer.