Following the company's march from standard processors to dual-core and quad-core designs in 2006, Intel researchers have built an 80-core chip that delivers more than a teraflop (one trillion floating-point operations per second) while using less electricity than a modern desktop PC chip.
First described by Intel executives at a September trade show, the design fits 80 cores onto a 275-square-millimeter, fingernail-size chip and draws only 62 watts of power, less than many modern desktop chips.
The company has no plans to bring this "teraflop research chip" to market, but is using it to test new technologies such as high-bandwidth interconnects, energy management techniques, and a tile design method for building multicore chips, said Jerry Bautista, director of Intel's tera-scale research program. He spoke in a conference call with reporters on Friday before presenting technical details of the research at the ISSCC (International Solid-State Circuits Conference) in San Francisco. Intel has discussed the 'era of tera' before.
Intel engineers are also using the chip to explore new forms of tera-scale computing, in which future users could process terabytes of data on their desktops to perform real-time speech recognition, multimedia data mining, photo-realistic gaming, and artificial intelligence.
Until now, that degree of computing performance has been available only to scientists and academics using machines like ASCI Red, the teraflop supercomputer built by Intel and its partners in 1996 for U.S. government researchers at Sandia National Laboratories, near Albuquerque, New Mexico. That system delivered about the same computing performance as the new chip, but demanded an enormous 500 kilowatts of power, plus another 500 kilowatts for cooling, to run its nearly 10,000 Pentium Pro chips.
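To put that gap in perspective, the short Python sketch below divides ASCI Red's quoted power figures by the 62 watts quoted for the research chip. It is a rough illustration only: the constant and function names are made up for this example, and the chip's figure presumably does not include the rest of a working system (memory, cooling, power supply), so the ratios are best read as orders of magnitude.

```python
# Figures quoted in the article; both machines deliver roughly one teraflop.
ASCI_RED_COMPUTE_KW = 500.0   # power to run ASCI Red
ASCI_RED_COOLING_KW = 500.0   # additional power for cooling
RESEARCH_CHIP_W = 62.0        # power draw of the 80-core chip alone

def power_ratio(include_cooling: bool) -> float:
    """Return how many times more power ASCI Red needs than the chip."""
    asci_kw = ASCI_RED_COMPUTE_KW
    if include_cooling:
        asci_kw += ASCI_RED_COOLING_KW
    return asci_kw * 1000.0 / RESEARCH_CHIP_W

print(f"compute only: {power_ratio(False):,.0f}x")
print(f"with cooling: {power_ratio(True):,.0f}x")
```

On these numbers the supercomputer needed on the order of 8,000 times the chip's power to run, and about 16,000 times once cooling is counted.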
Shrunk onto a single chip, that computing power would allow average consumers to use their PCs in new ways. They could use improved search functions on the vast amounts of digital media stored on home desktops, searching large photo archives for specific attributes, such as all the shots where a certain person is smiling or where that person is posing with a friend, Bautista said.
Running at 3.16 GHz, the new chip achieves 1.01 teraflops, an efficiency of about 16 gigaflops per watt. It can run even faster but loses efficiency at higher speeds, reaching 1.63 teraflops at 5.1 GHz and 1.81 teraflops at 5.7 GHz.
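The arithmetic behind those figures can be checked with a few lines of Python. This is a sketch under one stated assumption: that the 62-watt figure applies to the 3.16 GHz operating point. The function names are illustrative, and Intel's power draw at the two higher clock speeds is not given here, so efficiency at those speeds is not computed.

```python
CORES = 80
POWER_W = 62.0  # assumption: the 62 W figure applies to the 3.16 GHz operating point

# (clock in GHz, throughput in teraflops), as quoted in the article
OPERATING_POINTS = [(3.16, 1.01), (5.1, 1.63), (5.7, 1.81)]

def gflops_per_watt(teraflops: float, watts: float) -> float:
    """Energy efficiency in gigaflops per watt."""
    return teraflops * 1000.0 / watts

def flops_per_cycle_per_core(teraflops: float, ghz: float, cores: int = CORES) -> float:
    """Floating-point operations completed per clock cycle by each core."""
    return (teraflops * 1e12) / (ghz * 1e9) / cores

print("GFLOPS/W at 3.16 GHz:", round(gflops_per_watt(1.01, POWER_W), 1))
for ghz, tflops in OPERATING_POINTS:
    per_core = flops_per_cycle_per_core(tflops, ghz)
    print(f"{ghz} GHz: {per_core:.2f} flops per cycle per core")
```

At all three speeds the result is close to four floating-point operations per cycle per core, so throughput rises roughly in step with clock speed. The loss of efficiency at higher speeds would therefore have to come from power rising faster than the clock.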
The processor saves power by shunting idle cores into sleep mode, then instantly turning them on as they're needed. Each modular tile has its own router built alongside the core, creating a "network on a chip."
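As a rough illustration of how a router-per-tile network moves data, the Python sketch below walks a packet across a grid of tiles one hop at a time. The details are assumptions, not Intel's published design: the 8-by-10 layout is hypothetical (only the count of 80 tiles comes from the description above), and the routing rule shown, travel along one axis and then the other, is a common textbook scheme chosen for simplicity.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (column, row) position of a tile in the grid

# Hypothetical layout: 80 tiles arranged as 8 columns by 10 rows.
COLS, ROWS = 8, 10

def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """Return the tiles a packet visits, moving along x first, then y."""
    for x, y in (src, dst):
        if not (0 <= x < COLS and 0 <= y < ROWS):
            raise ValueError(f"tile {(x, y)} is outside the {COLS}x{ROWS} grid")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # step column by column toward the target
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then step row by row
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (3, 2))
print(route)
print("router hops:", len(route) - 1)
```

Because each tile's router only hands the packet to a neighbor, no central switch is needed, and the number of hops grows with the distance between tiles rather than with the total number of cores.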
Despite using such an efficient grid, the researchers found they could actually hurt performance by adding too many cores. Performance scaled up directly from 2 cores to 4, 8, and 16. But they found that computing performance began to drop with 32 and 64 cores.
"If we simply added more than 16 cores, we would get diminishing returns, because the threads and data traffic would not be used properly, so the cores get in the way of each other. It's like having too many cooks in the kitchen," said Bautista.
To solve the problem on the new chip, they used a hardware-based thread scheduler and faster on-chip memory caches, optimizing the way data flows from memory into each core. To improve the design, Intel researchers plan to add a layer of "3D stacked memory" under the chip to minimize the time and power required to feed the cores with data. Next, they will create a mega-chip that uses general-purpose cores instead of the floating-point units used in the current design.