Nvidia Chief Scientist: CPUs Slowed by Legacy Design
When it comes to power-efficient computing, CPUs are weighed down by too many legacy features to outperform GPUs (graphics processing units) in executing common tasks in parallel, said the chief scientist for the GPU vendor Nvidia.
CPUs "burn a lot of power" executing tasks that may be unnecessary in today's computing environment, noted Bill Dally, chief scientist and senior vice president of research for Nvidia, during his keynote Wednesday at the Supercomputing 2010 conference in New Orleans.
The GPU "is optimized for throughput," while "the CPU is optimized for low latency, for getting really good thread performance," he said.
Dally pointed to some of the features of most modern CPUs that waste energy in the pursuit of low latency.
"They have branch predictors that predict a branch every cycle whether the program branches or not -- that burns gobs of power. They reorder instructions to hide memory latency. That burns a lot of power. They carry along a [set of] legacy instructions that requires lots of interpretation. That burns a lot of power. They do speculative execution and execute code that they may not need and throw it away. All these things burn a lot of power," he said.
Although the GPU was originally designed for rendering graphics on the screen, vendors such as Nvidia and Advanced Micro Devices are now positioning their GPU cards as general computation engines, at least for workloads that can be broken into multiple parts and run in tandem.
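The kind of workload described here, one that can be broken into independent parts, can be sketched in a few lines. The example below is only an illustration of the decomposition: it uses ordinary CPU threads in Python rather than GPU code, and the function names (`split`, `sum_of_squares`, `parallel_sum_of_squares`) and the choice of four workers are hypothetical, not anything Nvidia or Dally described.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work on one chunk; it needs nothing from the other chunks."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Process each chunk on its own worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, split(data, workers)))
    return sum(partials)


values = list(range(1000))
print(parallel_sum_of_squares(values))
```

Because no chunk depends on another, the same pattern scales from four threads to the thousands of threads a GPU runs at once; work that must proceed strictly step by step does not split this way.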
At least some industries are taking note of this idea, notably the world of high performance computing (HPC). Earlier this week, China's newly built Tianhe-1A system topped the latest iteration of the Top 500 List of the world's most powerful supercomputers. That system includes 7,168 Nvidia Tesla M2050 GPUs in addition to its 14,000 CPUs. Nvidia claims that without the GPUs, the system would need almost four times as many CPUs, twice as much floor space and three times as much electricity to operate.
And although Dally focused his remarks on use in HPC, he said that the general idea will permeate the computing world as a whole.
"HPC is, in many ways, an early adopter, because they run into problems sooner because they operate at a larger scale. But this applies completely to consumer applications as well as to server applications," he said, in an interview following the keynote.
Dally said that while not many current applications are written to run in parallel environments, eventually programmers will move to this model. "I think over time, people will convert applications to parallel, and those parallel segments will be well-suited for GPUs," he said. He even predicted that systems will one day be able to boot off the GPU as well as the CPU, though he said he knows of no work in particular to build a GPU-based operating system.
Energy use is one of the central tenets of Dally's case for GPU superiority. He noted that while the next-generation Nvidia GPU architecture, code-named Fermi, would consume 200 pJ (picojoules) of energy for each instruction executed, a CPU consumes 2 nJ (nanojoules), an order of magnitude more.
This seemingly tiny per-instruction difference becomes a chasm when multiplied across large systems. Dally pointed to the U.S. Defense Advanced Research Projects Agency's efforts to fund development of an exascale computer, a computer that can execute 1 quintillion calculations per second. Such a system built from CPUs alone, he argued, would require a "nuclear power plant built next door" just to supply its power.
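A rough back-of-the-envelope check shows how the per-instruction figures scale. The sketch below multiplies the 200 pJ and 2 nJ numbers by an exascale rate of 10^18 operations per second. It assumes, purely for illustration, that each "calculation" costs one instruction's worth of energy, and it ignores memory, interconnect and cooling; the constant and function names are hypothetical.

```python
# Back-of-the-envelope: power (watts) = energy per operation x operations per second.
# Assumption (not from the keynote): one "calculation" costs one instruction's
# worth of energy; memory, networking and cooling overheads are ignored.

GPU_JOULES_PER_OP = 200e-12   # 200 pJ per instruction, as quoted for the GPU
CPU_JOULES_PER_OP = 2e-9      # 2 nJ per instruction, as quoted for a CPU
EXASCALE_OPS_PER_SEC = 1e18   # 1 quintillion calculations per second


def power_watts(joules_per_op, ops_per_sec):
    """Sustained power draw implied by a per-operation energy cost."""
    return joules_per_op * ops_per_sec


cpu_watts = power_watts(CPU_JOULES_PER_OP, EXASCALE_OPS_PER_SEC)
gpu_watts = power_watts(GPU_JOULES_PER_OP, EXASCALE_OPS_PER_SEC)

print(f"CPU-only: {cpu_watts / 1e9:.1f} GW")
print(f"GPU-only: {gpu_watts / 1e6:.0f} MW")
print(f"Ratio:    {cpu_watts / gpu_watts:.0f}x")
```

Under these simplified assumptions the CPU-only machine would draw about 2 gigawatts, against roughly 200 megawatts for the GPU figure, the same factor of 10 that separates the two per-instruction numbers.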
Not everyone in the HPC community is completely sold on the idea of using GPUs as a substitute for CPUs. One potential problem many point to is that while GPUs may have greater throughput, it is difficult for systems to provide that much data to these processors.
"There is very little amount of memory that is available to each of the GPUs. If you have something really fast, you need to feed it really fast, and if you don't have enough memory to feed that processor, that processor will just sit there and wait," Dave Turek, head of IBM's deep computing division, said last week.
Dally said that this bandwidth problem is not unique to GPUs: CPUs face the same dilemma. "Bandwidth is a big problem for any computing system," he said. He admitted the problem is more acute for GPUs, though. Nvidia's just-released GTX 580 card has a raw bandwidth of 200 gigabytes per second, whereas a "top-of-the-line" CPU has only about 35 gigabytes per second. "Memory systems need to evolve to be more efficient," he said.
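To put the two bandwidth figures side by side, the sketch below computes how long each would take to stream the same block of data from memory. The 1 GB working set is an arbitrary assumption chosen for round numbers, the calculation uses raw peak bandwidth rather than what real applications achieve, and the names are hypothetical.

```python
# Time to stream a working set once at the quoted raw memory bandwidths.
# Assumption (not from the article): a 1 GB working set, read at peak bandwidth.

GPU_BYTES_PER_SEC = 200e9   # GTX 580: 200 gigabytes per second
CPU_BYTES_PER_SEC = 35e9    # "top-of-the-line" CPU: about 35 gigabytes per second
WORKING_SET_BYTES = 1e9     # illustrative 1 GB working set


def stream_time_ms(nbytes, bytes_per_sec):
    """Milliseconds needed to move nbytes at the given bandwidth."""
    return nbytes / bytes_per_sec * 1e3


gpu_ms = stream_time_ms(WORKING_SET_BYTES, GPU_BYTES_PER_SEC)
cpu_ms = stream_time_ms(WORKING_SET_BYTES, CPU_BYTES_PER_SEC)

print(f"GPU: {gpu_ms:.1f} ms")
print(f"CPU: {cpu_ms:.1f} ms")
print(f"Bandwidth ratio: {GPU_BYTES_PER_SEC / CPU_BYTES_PER_SEC:.1f}x")
```

The GPU's memory system moves the data about 5.7 times faster in this comparison. Whether that is enough depends on how much arithmetic the processor can perform per byte delivered, which is the heart of the feeding problem Turek describes.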