If supercomputer speeds continue to increase at their current pace, we will see the first exascale machine by 2020, according to the maintainers of the Top500 list of the world's fastest systems.
System architects of such large computers, however, will face a number of critical issues, a keeper of the list warns.
"The challenges will be substantial for delivering the machine," said Jack Dongarra, a University of Tennessee, Knoxville, researcher who is one of the principals behind the Top500. Dongarra spoke at the SC2012 conference, being held this week in Salt Lake City, during a presentation about the latest edition of the list, released last week.
We still have a way to go before exascale performance is possible. An exascale machine would be capable of one quintillion FLOPS (floating point operations per second), or 10 to the 18th FLOPS. Even today's fastest supercomputers offer less than 2 percent of the capability of an exascale machine.
In the most recent edition of the Top500 list of supercomputers, released Monday, the fastest computer on the list was the Oak Ridge National Laboratory Titan system, a machine capable of executing 17.59 petaflops. A petaflop is a quadrillion floating point calculations per second, or 10 to the 15th FLOPS.
But each new Top500, which is compiled twice a year, shows how quickly the speeds of supercomputers grow. Judging from the list, supercomputers seem to gain a thousandfold in power every 12 years or so. In 1996, the first teraflop computer appeared on the Top500, and in 2008, the first petaflop computer appeared on the list. Extrapolating from this rate of progress, Dongarra estimates that exascale computing should arrive around 2020.
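As a rough check on that extrapolation, the sketch below fits a steady exponential growth rate to the two milestone years cited above and projects it forward. The function names are illustrative, and constant exponential growth is an assumption made here, not something the Top500 data guarantees.

```python
import math

# Milestones as reported in the article: (year, FLOPS).
TERAFLOP_YEAR, TERAFLOP = 1996, 1e12
PETAFLOP_YEAR, PETAFLOP = 2008, 1e15
EXAFLOP = 1e18


def years_per_tenfold(year_a, flops_a, year_b, flops_b):
    """Years needed for a 10x speedup, assuming steady exponential growth."""
    return (year_b - year_a) / math.log10(flops_b / flops_a)


def projected_year(ref_year, ref_flops, target_flops, pace):
    """Year in which target_flops is reached if the pace (years per 10x) holds."""
    return ref_year + pace * math.log10(target_flops / ref_flops)


# Teraflop to petaflop is three factors of ten spread over twelve years.
pace_years_per_tenfold = years_per_tenfold(
    TERAFLOP_YEAR, TERAFLOP, PETAFLOP_YEAR, PETAFLOP
)

# Petaflop to exaflop is three more factors of ten at the same pace.
exascale_year = projected_year(
    PETAFLOP_YEAR, PETAFLOP, EXAFLOP, pace_years_per_tenfold
)

print(f"About {pace_years_per_tenfold:.1f} years per tenfold speedup")
print(f"Projected exascale year: {exascale_year:.0f}")
```

Under that assumption, the two milestones work out to a tenfold gain about every four years, which lands the next thousandfold jump at 2020.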
The High Performance Computing (HPC) community has taken on exascale computing as a major milestone. Intel has created a line of massively multicore processors, called Phi, that the company hopes will serve as the basis of exascale computers running by 2018.
In his talk, Dongarra sketched out the characteristics of an exascale machine. Such a machine will likely have somewhere between 100,000 and 1,000,000 nodes and will be able to execute up to a billion threads at any given time. Individual node performance should be between 1.5 and 15 teraflops and interconnects will need to have throughputs of 200 to 400 gigabytes per second.
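Those ranges hang together arithmetically, as the short sketch below suggests. Pairing the smallest node count with the fastest node, and the largest node count with the slowest, is an assumption made here for illustration. Dongarra did not specify which combinations he had in mind, and the figures are treated as peak rates.

```python
TERAFLOP = 1e12
EXAFLOP = 1e18
TOTAL_THREADS = 1_000_000_000  # "up to a billion threads"


def aggregate_peak(nodes, node_teraflops):
    """Peak FLOPS of a machine built from identical nodes."""
    return nodes * node_teraflops * TERAFLOP


# Two hypothetical designs at opposite ends of the quoted ranges.
few_fast_nodes = aggregate_peak(100_000, 15)
many_slow_nodes = aggregate_peak(1_000_000, 1.5)

# Threads each node would have to keep busy in either design.
threads_per_fast_node = TOTAL_THREADS // 100_000
threads_per_slow_node = TOTAL_THREADS // 1_000_000

print(f"100,000 nodes at 15 teraflops:    {few_fast_nodes / EXAFLOP:.1f} exaflops")
print(f"1,000,000 nodes at 1.5 teraflops: {many_slow_nodes / EXAFLOP:.1f} exaflops")
print(f"Threads per node: {threads_per_slow_node:,} to {threads_per_fast_node:,}")
```

Either way, each node would need to sustain thousands of concurrent threads, which is part of why the software model comes under pressure.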
Supercomputer makers will have to construct their machines so that their cost and power consumption do not increase in a linear fashion along with performance, lest they grow too expensive to purchase and run, Dongarra said. An exascale machine should cost about $200 million, and use only about 20 megawatts, or about 50 gigaflops per watt.
Dongarra expects that half the cost of building such a computer would be earmarked for buying memory for the system. Judging from the roadmaps of memory manufacturers, Dongarra estimated that $100 million would purchase between 32 and 64 petabytes of memory by 2020.
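The power and memory targets can be restated as per-unit figures. The sketch below is only a back-of-the-envelope calculation from the numbers quoted above. It assumes decimal units (one petabyte equals one million gigabytes), and the variable names are illustrative.

```python
EXAFLOP = 1e18            # target performance, FLOPS
POWER_WATTS = 20e6        # 20 megawatts
SYSTEM_COST = 200e6       # dollars
MEMORY_SHARE = 0.5        # half the budget goes to memory
GB_PER_PB = 1e6           # assumption: decimal petabytes

# Energy efficiency the machine would need to hit.
gigaflops_per_watt = EXAFLOP / POWER_WATTS / 1e9

# Implied memory price at each end of the 32 to 64 petabyte range.
memory_budget = SYSTEM_COST * MEMORY_SHARE
dollars_per_gb_high = memory_budget / (32 * GB_PER_PB)
dollars_per_gb_low = memory_budget / (64 * GB_PER_PB)

print(f"Required efficiency: {gigaflops_per_watt:.0f} gigaflops per watt")
print(f"Memory budget: ${memory_budget / 1e6:.0f} million")
print(f"Implied memory price: ${dollars_per_gb_low:.2f} to ${dollars_per_gb_high:.2f} per gigabyte")
```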
In addition to challenges in hardware, designers of exascale supercomputers must also grapple with software issues. One issue will be synchronization, Dongarra said. Today's machines pass tasks among many different nodes, but this approach will need to be streamlined as the number of nodes increases.
"Today, our model for parallel processing is a fork/join model, but you can't do that at [the exascale] level of a parallelism. We have to change our model. We have to be more synchronous," Dongarra said. Along the same lines, algorithms need to be developed that reduce the amount of overall communication among nodes.
Other factors must be considered as well. The software must come with built-in routines for optimization. "We can't rely on the user setting the right knobs and dials to get the software to run anywhere near peak performance," Dongarra said. Fault resilience will be another important feature, as will reproducibility of results, or the guarantee that a complex calculation will produce the exact same answer when run more than once.
Reproducibility may seem like an obvious trait for a computer. But in fact, it can be a challenge for huge calculations on multinode supercomputers.
"From the standpoint of numerical methods, it is hard to guarantee bit-wise reproducibility," Dongarra said. "The primary problem is in doing a reduction -- a summing up of numbers in parallel. If I can't guarantee the order in which those numbers come together, I'll have different round-off errors. That small difference can be magnified in a way that can cause answers to diverge catastrophically," he said.
"We have to come up with a scenario in which we can guarantee the order in which those operations are done, so we can guarantee we have the same results," Dongarra said.