With the release of its Ryzen 7 series CPUs, AMD came out swinging at Intel’s high-end Core i7 line. As I noted in a previous column, version 4.10 of the Linux kernel corrects an issue that kept Intel CPUs from reaching their turbo speeds, but there’s also something in the new kernel for Team Red.
The top-end Ryzen 7 1800X boasts eight cores and 16 threads, just like Intel’s Broadwell-E Core i7-6900K, but in a 95W package at half the price of Intel’s octo-core offering. And when it comes to multithreaded applications, Ryzen is giving enthusiast Core i7s a fight. As Team Red gets back into the high-end CPU game, that’s good news for consumers.
To take advantage of all those threads, the kernel has to make sure it identifies the cores correctly. Anticipating the release of Ryzen, AMD clearly worked to make sure its new Zen architecture would properly offer up its cores to the Linux kernel.
Why you need to care about multithreading
Multithreading is a key feature of modern desktop processors. Back in the late ’90s, desktop CPUs had only one core. To get dual-core performance, you had to build a PC with two physical CPUs socketed into the motherboard. Needless to say, this was pricey and bulky. Today, most desktop CPUs have at least two cores.
Multithreading really matters in things like video encoding and design applications, where big jobs can be tackled by multiple cores/threads simultaneously. In many situations, a slower CPU with more cores can encode video faster than a faster CPU with fewer cores.
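A back-of-the-envelope calculation shows why. The sketch below assumes an idealized encoder that scales perfectly across cores, which real software never quite does. The workload size and clock speeds are made-up illustration values, not benchmarks of any real chip.

```python
def encode_seconds(work_units: float, cores: int, ghz: float) -> float:
    """Idealized encode time: work is split evenly across cores, and each
    core finishes `ghz` units of work per second (assumes perfect scaling)."""
    return work_units / (cores * ghz)


# Hypothetical workload and clock speeds, chosen only for illustration.
WORK = 10_000.0
fewer_faster = encode_seconds(WORK, cores=4, ghz=4.2)
more_slower = encode_seconds(WORK, cores=8, ghz=3.6)

print(f"4 cores at 4.2GHz: {fewer_faster:.0f} s")
print(f"8 cores at 3.6GHz: {more_slower:.0f} s")
```

Under these assumptions the eight-core chip finishes first despite its lower clock speed. Real-world results depend on how well the software actually spreads its work across threads.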
What got fixed for Zen
The fixes included in kernel 4.10 come from three different commits, or changes to the code. The commits essentially change the way the kernel identifies threads and physical cores to enable proper multithreading in the operating system.
In January, a commit altered some code to fix multithreading in Bulldozer-based CPUs. The Bulldozer fix attempted to give each core its own identifier, whereas earlier code treated each thread (compute unit) as if it were its own core. While the fix got multithreading in Bulldozer CPUs to work, it created latency because thread siblings were not being paired up on physical cores correctly. If a program was using the same data and instructions split between two threads, there was no way to know whether the two threads were going to the same physical core. The threads could still share resources, but doing so introduced latency.
For maximum efficiency the threads had to go to the cores that were most convenient for resource fetching. A fix added in February reverted the Bulldozer code and added extra code to assign threads to the core that had a sibling—that is, a thread that uses the same resources—running on it. This reduced the latency and made multithreading faster.
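On a running Linux system, you can check which logical CPUs the kernel considers siblings. The sketch below assumes the standard sysfs topology files (`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`). The helper names are invented for this example, and it only inspects the kernel's view of the CPU; it does not reproduce the kernel patches themselves.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> frozenset[int]:
    """Parse a kernel CPU list such as '0,8' or '0-1' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def group_siblings(sibling_lists: dict[int, str]) -> list[list[int]]:
    """Collapse per-CPU sibling lists into one sorted group per physical core."""
    groups = {parse_cpu_list(text) for text in sibling_lists.values()}
    return sorted(sorted(group) for group in groups)


def read_sibling_lists(root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Read thread_siblings_list for every cpuN directory that exposes one."""
    result = {}
    for path in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parts[-3][3:])  # directory name "cpu12" -> 12
        result[cpu] = path.read_text()
    return result


if __name__ == "__main__":
    for group in group_siblings(read_sibling_lists()):
        print("core with threads:", group)
```

On a correctly identified 8-core, 16-thread chip, you would expect eight groups of two logical CPUs each. The exact numbering varies from system to system.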
While this fix made Bulldozer work, it broke Zen CPUs, because Zen provides thread ID information slightly differently. The fix for Zen checks whether the CPU is from the Zen line, then divides the number of reported cores (which are really threads) by the number of siblings per compute unit to get the true number of cores, which lets SMT work with the threads in each core.
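The arithmetic at the heart of that fix is simple. Here it is as a minimal Python sketch. The function name is hypothetical and this is not the kernel's C code, just the division described above.

```python
def true_core_count(reported_threads: int, siblings_per_core: int) -> int:
    """Divide the reported thread count by siblings per core to get physical cores."""
    if siblings_per_core <= 0:
        raise ValueError("siblings_per_core must be positive")
    if reported_threads % siblings_per_core:
        raise ValueError("thread count is not a multiple of siblings per core")
    return reported_threads // siblings_per_core


# Ryzen 7 1800X: 16 threads, two SMT siblings per core.
print(true_core_count(16, 2))  # prints 8
```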
Hold on—in plain English, please!
Okay, that can all be a little hard to picture. It might be easier to imagine this as a kitchen with eight workers, each armed with a vegetable peeler.
Each of these workers is a potential thread. The workers are separated into four pairs, and each pair sits at a table (a physical core) with its own discard bucket for peelings. There is only one bucket for each type of peeling (apples, potatoes, carrots, and cucumbers), and their contents can’t be mixed.
With the Bulldozer fix, each worker only gets one type of food at a time, but the baskets of food are dispersed randomly. If a discard bucket for a food that a worker is peeling is at another table, he must get up from the table, peelings in hand, and dump them in the bucket before returning to the task. You still have all eight workers toiling away, but it’s inefficient.
The February fixes introduce someone who directs baskets of food to the tables that have the corresponding discard buckets, so that workers have to get up as few times as possible.
What this means for the rest of us
Ultimately, these fixes should offer better multithreaded performance on Zen CPUs. That’s a good thing, because it’s no good to pay $500 for an 8-core, 16-thread CPU that doesn’t work efficiently. With efficient multithreading, a Ryzen-powered Linux workstation will encode files faster and feel snappier.