Supercomputer Race: Tricky to Boost System Speed
University of Tennessee computer science professor Jack Dongarra is one of the developers of the Linpack benchmark and a co-publisher of the Top500 report. He calls Roadrunner a "general-purpose computer" but one that, because of its hybrid architecture, "specializes in what it can do." Exploiting that specialization is not trivial, he admits.
"If you are writing a program for Roadrunner, you essentially have to write three programs -- one for the AMD Opteron processor, one for the Power core that's on the Cell chip and one for the vector units in the Cell chip," he says. "The only way to get to a point where you'd be happy with the performance is to rewrite your old applications. The guys at Los Alamos believe that they can in fact benefit by rewriting their code."
Dongarra says a computer at the top of the Top500 list will typically spend six years on the list before falling off the bottom, and he doesn't expect Roadrunner's hybrid Opteron/Power/Cell architecture to stay on top for long.
"The trend is to large numbers of [processor] cores on a single die," he says. "And it looks like we'll have this one chip with different kinds of cores on it. We might have cores that specialize in floating point, ones that specialize in graphics and those that are more commodity-based." Exploiting that flexibility so the chip is, in essence, tuned for a specific application domain, such as climate modeling, will require software tools that do not yet exist, he says.
Intel Corp. is pursuing the approach Dongarra describes, developing specialized microprocessor cores and the software tools to exploit them. It's also responding to Loft's plea for faster memory access.
Bandwidth aside, memory will have to be more power-efficient if exascale computers are to draw reasonable amounts of power, says Steve Pawlowski, an Intel senior fellow. He says both objectives can be met in part by building bigger on-chip cache memories that act as very fast buffers between processor cores and dynamic RAM.
"If you can cache a significant number of DRAM pages, the machine thinks it's talking to flat DRAM at high speeds, and you can populate behind it much slower and more power-efficient DRAMs," he says. "You want the cache big enough to hide the [memory] latency, and you want to be clever in how you manage the pages by doing page prefetching and things like that."
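Pawlowski's idea of a large page cache that hides slow DRAM can be illustrated with a toy model. The Python sketch below is a hypothetical simulation, not Intel's design: it assumes an LRU-managed cache of whole DRAM pages and a simple "fetch the next N pages" prefetch policy, and the names PageCache, capacity_pages, prefetch_depth and run are invented for illustration. It counts how often a request is served from the fast cache rather than from the slower DRAM behind it.

```python
from collections import OrderedDict


class PageCache:
    """Hypothetical LRU cache of DRAM pages with next-page prefetching.

    Illustrative only: page-level granularity, LRU replacement and the
    "prefetch the next N pages" policy are assumptions, not a real design.
    """

    def __init__(self, capacity_pages: int, prefetch_depth: int = 0):
        self.capacity = capacity_pages
        self.prefetch_depth = prefetch_depth
        self.pages = OrderedDict()  # page number -> None, oldest first
        self.hits = 0
        self.misses = 0

    def _install(self, page: int) -> None:
        # Insert or refresh a page; evict the least recently used one if full.
        if page in self.pages:
            self.pages.move_to_end(page)
            return
        self.pages[page] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

    def access(self, page: int) -> bool:
        hit = page in self.pages
        if hit:
            self.hits += 1
            self.pages.move_to_end(page)
        else:
            # A miss stands in for a trip to the slow, power-efficient DRAM.
            self.misses += 1
            self._install(page)
        # Prefetch the following pages so a sequential stream finds them cached.
        for ahead in range(1, self.prefetch_depth + 1):
            self._install(page + ahead)
        return hit

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(prefetch_depth: int, pages: int = 1000, capacity: int = 64) -> float:
    # Sweep sequentially over `pages` distinct pages, touching each once.
    cache = PageCache(capacity_pages=capacity, prefetch_depth=prefetch_depth)
    for page in range(pages):
        cache.access(page)
    return cache.hit_rate()


if __name__ == "__main__":
    print("no prefetch:      ", run(prefetch_depth=0))
    print("prefetch 4 pages: ", run(prefetch_depth=4))
```

On a purely sequential sweep the cache alone never helps, because each page is touched only once, while prefetching four pages ahead turns every access after the first into a hit. Real workloads are far less regular, which is why Pawlowski stresses being clever about how the pages are managed.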
He says Intel is also working on increasing the communication bandwidth of the individual pins that connect the processor chip to the memory controller. "I'd like to push the memory bandwidth to be 10 times greater than it is today by 2013 or 2014," Pawlowski says. "The engineers working for me say I'm crazy, but it's a goal."
In the meantime, Intel and others are exploring two other possibilities: very high-speed communication via silicon photonics (light) and "3-D die-stacking," which creates a dense sandwich of CPU and DRAM. Both technologies have been proven in the lab but have not yet been shown to be economically viable for manufacturers, Pawlowski says.
Petaflops, peak performance, benchmark results, positions on a list -- "it's a little shell game that everybody plays," says NCAR's Loft. "But all we care about is the number of years of climate we can simulate in one day of wall-clock computer time. That tells you what kinds of experiments you can do." State-of-the-art systems today can simulate about five years per day of computer time, he says, but some climatologists yearn to simulate 100 years in a day.
"The idea," Loft says, "is to get an answer to a question before you forget what the question is."