Tested: DirectX 12's potential performance leap is insane


3DMark's new API Overhead Test comes with an Inception Hans Zimmer "bwah" sound.

Credit: Futuremark

We already know DirectX 12 will drastically improve game performance when Windows 10 ships later this year, but just how much of a "free" boost you'll get isn't exactly known.

Microsoft execs expect frame rates to more than double when comparing DX12 to the current DX11 API. But that estimate looks to be conservative if Futuremark's new API Overhead Feature test is to be believed. 

Microsoft and Futuremark gave us early access to the latest 3DMark test, which lets us measure just how much more efficient Microsoft's Windows 10-only gaming API is than its predecessor, and whoa mama, does it perform. But before we get too carried away and the hype train leaves the station with a big toot toot, remember that this is a theoretical test, and not based on an actual game engine.

3dmark12preview PCWorld

3DMark's new API Overhead Feature Test measures the performance difference between DirectX 11 and the new Windows 10-only DirectX 12.

How we tested

For the tests, I used an Intel Core i7-4770K processor in an Asus Z87 Deluxe/Dual motherboard alongside 16GB of DDR3/1600 RAM, a 240GB Corsair Neutron SSD, and either a Gigabyte WindForce Radeon R9 290X or a GeForce GTX Titan X. I also switched off the Asus board's "Core enhancement" feature, which essentially overclocks the chip a little for you. All of our tests were performed at 1280x720 resolution at Microsoft's recommendation.

3DMark and Microsoft point out that the new feature test is not a tool to compare GPUs but an easy way to gauge a single PC's performance and API efficiency. Don't use it to compare PC Y with PC X, nor as a GPU test: This is all about how your particular PC configuration performs when running DX11, and how that same PC configuration performs when running DX12.

Even better, the test is available immediately to anyone who owns 3DMark. Here's how to get it to run if you want to do your own testing.

The test works by tasking the GPU to draw something on the screen. This instruction goes through the API, whether it is DX11, DX12, or AMD's Mantle. The less efficient the API is in handling these "draw calls" from the CPU to the GPU, the fewer objects can be drawn on the screen. 3DMark rapidly ramps up the draw calls and objects until the frame rate drops under 30 frames per second (fps).
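To make the ramp-up idea concrete, here is a minimal toy model in Python. It is not Futuremark's code: the function name, the fixed per-frame cost, and the per-call CPU costs are all hypothetical values picked only to show the trend. It assumes frame time is a fixed cost plus a constant CPU cost per draw call, then keeps adding draw calls until the modeled frame rate would fall under 30 fps.

```python
def max_draw_calls(cost_per_call_us, fixed_frame_cost_us=2000.0,
                   target_fps=30.0, step=1000):
    """Ramp draw calls per frame until the modeled frame rate drops below target_fps.

    Returns (calls_per_frame, calls_per_second) for the last step that still
    held the target frame rate. All costs are hypothetical, in microseconds.
    """
    frame_budget_us = 1_000_000 / target_fps
    calls = 0
    # Keep adding draw calls while the next step still fits in the frame budget.
    while fixed_frame_cost_us + (calls + step) * cost_per_call_us <= frame_budget_us:
        calls += step
    frame_time_us = fixed_frame_cost_us + calls * cost_per_call_us
    calls_per_second = calls * 1_000_000 / frame_time_us
    return calls, calls_per_second


# Hypothetical per-call CPU costs: a "heavy" API vs. a "lean" one.
high_overhead = max_draw_calls(cost_per_call_us=1.0)
low_overhead = max_draw_calls(cost_per_call_us=0.1)
print("high-overhead API:", high_overhead)
print("low-overhead API: ", low_overhead)
```

In this simplified model, cutting the per-call cost is the only thing that raises the draw-call ceiling, which is the effect the feature test is designed to isolate.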

The first test is a comparison of DirectX 11's single-threaded performance vs. DirectX 11's multi-threaded performance. It then tests AMD's DirectX 12-like Mantle API—if Mantle is supported by the hardware—and finally DirectX 12 performance.

As you can see from the chart below, DirectX 11's single-threaded and multi-threaded performance is underwhelming, churning through roughly 900,000 draw calls before performance drops under 30 fps on the Gigabyte WindForce Radeon R9 290X card. Using AMD's Mantle, we see an incredible jump to 12.4 million draw calls per second.

Although it's difficult to say which API was developed first (I've read stories that said DirectX 12 has been in the works for numerous years, while others report AMD was likely the first to push it), this test at least shows the potential of both of the new APIs. DirectX 12, in fact, is even slightly more efficient, cranking out 13.4 million draw calls per second.
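For a sense of scale, the three R9 290X figures quoted above can be turned into ratios. This is only back-of-the-envelope arithmetic in Python: the variable names are mine, and it assumes the "roughly 900,000" DirectX 11 figure is a per-second rate like the other two.

```python
# Draw calls per second on the Radeon R9 290X, as quoted in the article.
dx11_calls = 900_000       # "roughly 900,000" (assumed to be per second)
mantle_calls = 12_400_000
dx12_calls = 13_400_000

mantle_vs_dx11 = mantle_calls / dx11_calls
dx12_vs_dx11 = dx12_calls / dx11_calls
dx12_vs_mantle_pct = (dx12_calls / mantle_calls - 1) * 100

print(f"Mantle vs. DX11: {mantle_vs_dx11:.1f}x")
print(f"DX12 vs. DX11:   {dx12_vs_dx11:.1f}x")
print(f"DX12 vs. Mantle: {dx12_vs_mantle_pct:.1f}% more draw calls")
```

That works out to roughly 14x for Mantle and 15x for DirectX 12 over DirectX 11, with DirectX 12 about 8 percent ahead of Mantle. Because the DirectX 11 number is approximate, the ratios are too.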

The whole Mantle vs. DirectX debate is over anyway, as AMD itself has encouraged developers to use DirectX 12 or Vulkan, OpenGL's gaming-focused successor, instead.

dx12 api perf mantle PCWorld

Mantle and DirectX 12 performance stack up nicely against the older DX11 API.

Remember that part where I said that Futuremark says not to use this as a GPU test? You still want to see this on Nvidia hardware, so I repeated the above test with the GeForce GTX Titan X in the R9 290X's place. There's no Mantle, of course, since that's an AMD-only feature.

dx12 api perf 4770k titanx PCWorld

If gaming only worked like theory. Here we see the GeForce GTX Titan X in action in DirectX 12.

You're also going to be curious about DX12's impact on integrated graphics processors (IGP). With Intel's graphics actually in use by more "gamers" than AMD and Nvidia's discrete cards, it's a valid question, but I'll be honest: Gaming with IGP ain't real gaming. I didn't see the point in running tests on Intel's IGP.

Fortunately, Microsoft provided results from its own IGP machine using a Core i7-4770R "Crystal Well" CPU with Intel Iris Pro 5200 graphics. The quad-core chip is used in high-end all-in-ones and Gigabyte's Brix Pro. That's pretty much the very best performance you can get out of an Intel chip today, and here are the results.

Not bad—until you glance back at the AMD and Nvidia charts above. 

3dmark api test 4770r iris pro PCWorld

Using results supplied by Microsoft, DX12 does indeed give a bump to Intel's integrated graphics, but it's pretty far short of real gaming metal.

But remember that DirectX 12 is about making the API more efficient so it can take better advantage of multi-core CPUs. It's not really about graphics cards. It's about exploiting more performance from CPUs so they don't bottleneck the GPU.

With that in mind, I decided to see how the CPU can change the results, by varying the core and thread count as well as clock speeds in various configurations. I limited the quad-core Core i7-4770K to two cores, switched Hyper-Threading on and off, and limited the clock speeds the CPU could run at.

The big winner was the Core i7-4770K set to its default state: four cores and Hyper-Threading on. All of the tests were conducted with the GeForce GTX Titan X card. I'm only comparing the DirectX 12 performance because that's all that matters here.

dx12 performance clock cores PCWorld

Here's how DirectX 12 performance could scale across different clock speeds and thread counts on Futuremark's new API Overhead Feature test.

Which CPU is best for DirectX 12?

To make this a little easier to understand from a system-buying or -building perspective, I also tried mimicking the different clock, core count and Hyper-Threading states of various Intel CPUs. (If you want to dig into the details of the chips I tried to simulate, I've lined up the details over at Intel's ARK.) 

Keep in mind that these are simulated performance results. The Core i7-4770K has 8MB of cache vs. the 2MB on a Celeron or 6MB on a Core i5 chip. I haven't found cache to make huge dents on most tests, and if anything my simulated CPUs would perform better than their actual counterparts thanks to the larger cache available. One other note: I had an issue underclocking the CPU to lower speeds while trying to simulate the Core i5-4670K. The lowest I could set it for was 3.6GHz, which is 200MHz higher than the stock part. 

dx12 performance simulated cpus updated PCWorld

"How many combat drops Lieutenant?" "Two. Including this one." These are tests of DirectX 12 performance between various simulated CPUs. 

No replacement for displacement

The takeaway from these simulated CPU tests is that thread-count trumps clock speed when it comes to DirectX 12 performance improvements.

Look at the results for the simulated dual-core Core i3-4330, which is a 3.5GHz CPU with Hyper-Threading but no Turbo Boost. Compare that to Intel's low-cost wonder: the dual-core Pentium G3258 "overclocked" to 4.8GHz. Nicknamed the Pentium K by budget gamers, the chip is the only unlocked dual-core chip in Intel's lineup and cheaper than dirt. It overclocks easily, and just about anyone should be able to push a real one to 4.8GHz.

But look at the benchmark results: Despite running 1.3GHz higher than the simulated Core i3, the Pentium G3258 isn't much better in DX12 draw calls. This isn't anything new. In tests I've performed with CPUs as far back as the 2nd-gen "Sandy Bridge" Core processors in 2011, I've seen a 1GHz overclock on a Core i5 chip hit roughly the same performance as a stock Core i7 with Hyper-Threading in heavy multi-threading tests. 
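The clock-versus-threads trade-off in that comparison is easy to put in numbers. The Python sketch below only restates the specs: the clock speeds come from this article, the Core i3's four threads from its Hyper-Threading, and the Pentium's two threads from the fact that the G3258 has no Hyper-Threading. The dictionary layout and variable names are mine, and it says nothing about the measured draw-call results, which are in the chart.

```python
# Specs of the two dual-core configurations being compared.
core_i3_4330 = {"clock_ghz": 3.5, "cores": 2, "threads": 4}   # Hyper-Threading on
pentium_g3258 = {"clock_ghz": 4.8, "cores": 2, "threads": 2}  # overclocked, no Hyper-Threading

clock_gap_ghz = pentium_g3258["clock_ghz"] - core_i3_4330["clock_ghz"]
clock_advantage_pct = (pentium_g3258["clock_ghz"] / core_i3_4330["clock_ghz"] - 1) * 100
thread_ratio = core_i3_4330["threads"] / pentium_g3258["threads"]

print(f"Pentium clock advantage: {clock_gap_ghz:.1f}GHz ({clock_advantage_pct:.1f}%)")
print(f"Core i3 thread advantage: {thread_ratio:.0f}x")
```

So the Pentium carries a clock advantage of about 37 percent and the Core i3 has twice the threads, which is why a near-tie in DX12 draw calls points to thread count as the bigger lever.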

DX12 also seems to scale nicely with core-count. Again, compare the dual-core, 4-thread "Core i3-4330" with the quad-core, 8-thread Core i7-4770K. It's almost exactly double.

Unfortunately, PCWorld's 8-core Core i7-5960X was working on another secret product so I didn't have time to test it with 3DMark's new feature test. I see nothing to make me believe that the test, at least, won't scale nicely with an 8-core, Hyper-Threaded CPU.

The silicon elephant in the room is also AMD's FX series. Generally AMD CPUs are inferior to their Intel counterparts clock-for-clock. A quad-core Haswell CPU generally lumps up an "eight" core FX chip in the vast majority of performance tests. But that FX chip, though it uses shared cores, runs much closer to, and sometimes faster than, Haswell if the test is heavily multi-threaded. DirectX 12 may make AMD's often ignored FX parts hot again, since you can find an 8-core FX processor priced for as low as $153.

At some point, I'll hopefully spool up an FX box to see if that advantage indeed crops up in DX12. 

But back in reality

Before you look at the results from these tests and assume you're going to see a frickin' 10x free performance boost from DX12 games later this year (zing, zam, zow!), know this: You won't. So ease off the hype engine.

What's realistic? I'd expect anywhere from Microsoft's claims of 50-percent improvement all the way to what we're seeing here in Futuremark's test. This will depend very much on the kind of game and the coding behind it.

The reality is this test is a theoretical test, although one made with advice from Microsoft and the GPU vendors. This test reveals the potential, but translating that potential into an actual game isn't quite as easy. 

We won't know for sure until actual DirectX-12-enabled games ship. Microsoft estimates that will be the end of this year. 
