Supercomputer Race: Tricky to Boost System Speed
Every June and November, with fanfare lacking only in actual drum rolls and trumpet blasts, a new list of the world's fastest supercomputers is revealed. Vendors brag, and the media reach for analogies such as "It would take a patient person with a handheld calculator x number of years (think millennia) to do what this hunk of hardware can spit out in one second."
The latest Top500 list, released in June, was seen as especially noteworthy because it marked the scaling of computing's then-current Mount Everest -- the petaflops barrier. Dubbed "Roadrunner" by its users, a computer built by IBM for Los Alamos National Laboratory in New Mexico topped the list of the 500 fastest computers, burning up the bytes at 1.026 petaflops, or more than 1,000 trillion arithmetic operations per second.
A computer to die for if you are a supercomputer user for whom no machine ever seems fast enough? Maybe not.
Richard Loft, director of supercomputing research at the National Center for Atmospheric Research in Boulder, Colo., says he doubts Roadrunner would operate at more than 2% of its peak rated power on NCAR's ocean and climate models. That would bring it in at 20 to 30 teraflops -- no slouch, to be sure, but so far short of that petaflops goal as to seem more worthy of the nickname "Roadwalker."
"The Top500 list is only useful in telling you the absolute upper bound of the capabilities of the computers," Loft says. "It's not useful in terms of telling you their utility in real scientific calculations."
The problem, he says, is that placement on the Top500 list is determined by performance on a decades-old benchmark called Linpack, which is Fortran code that measures the speed of processors on floating-point math operations -- for example, multiplying two long decimal numbers. It's not meant to rate the overall performance of an application, especially one that does a lot of interprocessor communication or memory access.
Moreover, users and vendors seeking fame high on the list go to elaborate pains to tweak their systems to run Linpack as fast as possible -- a tactic permitted by the list's compilers.
The computer models at NCAR simulate the flow of fluids over time by dividing a big space -- the Pacific Ocean, say -- into huge grids and assigning each cell or group of cells in the grid to a specific processor in a supercomputer.
It's nice to have that processor run very fast, of course, but getting to the end of a 100-year climate simulation requires an enormous number of memory accesses by a processor, something that typically happens much more slowly. In addition, some applications require passing many messages from one processor to another, which can also be relatively slow.
So, for many applications, the bandwidth of the communications network inside the box is far more important than the floating-point performance of its processors. That's even more true for business applications, such as online search or transaction processing.
An even greater bottleneck can crop up in programs that can't easily be broken into uniform, parallel streams of instructions. If a processor gets more than its fair share of work, all the others may wait for it, reducing the overall performance of the machine as seen by the user. Linpack operates on the cells of matrices, and by making the matrices just the right size, users can keep every processor uniformly busy and thereby chalk up impressive performance ratings for the system overall.
"As long as we continue to focus on peak floating-point performance, we are missing the actual hard problem that is holding up a lot of science," Loft says.