As each new generation of computer processors arrives with a larger number of computing cores, computer scientists grapple with how best to make use of this proliferation of parallel power.
Now researchers at the Massachusetts Institute of Technology have created a data structure that they claim can help large multicore processors churn through their workloads more effectively. Their trick? Do away with the traditional first-come, first-served work queue and assign tasks more randomly.
MIT’s new SprayList algorithm allows processors with many cores to spread out their work so they don’t stumble over one another, creating bottlenecks that hamper performance.
If it pans out in day-to-day operation, something like SprayList might pave the way for more effective use of new, many-core processors coming our way, such as Intel’s new 18-core server chip,the E5 2600v3.
Multicore processors, in which a single processor contains two or more computational cores that operate simultaneously, can present a challenge in programming. The problem is that the work a computer needs to do must be distributed evenly across each of the cores for maximum performance.
When the first commodity two-core and four-core processors came out more than a decade ago, software researchers harnessed a simple and well-known computer science technique to dole out work, called priority queue, in which the task on top of the work queue is assigned to the next available core. The queue can be ordered through a combination of job priority and good old-fashioned first-come, first-served serialization.
Traditional implementations of priority queue work fine for up to eight cores. But performance suffers when additional cores are added, the researchers said.
Like having too many cooks in the kitchen, too many cores working on the top of a single priority queue can slow performance. Multiple cores hitting the queue at the exact same time can cause bottlenecks as each core contends for the top task. And because each core keeps its own copy of the priority queue in a cache, synchronizing the ever-changing queue across these multiple caches can be a headache—if processors could get headaches, that is.
So the researchers devised a new way of implementing priority queues in such a way that they will continue to be effective for up to 80 cores. Instead of each core being assigned the next task in the queue, the core is assigned a random task, reducing the chances of creating a bottleneck from two cores contending for the same task.
Random assignment has traditionally been frowned upon by those who think a lot about computer processors, the researchers noted in a paper explaining the work. A random scheduling algorithm takes longer to jump around the queue than a conventional one does. Caches can’t be used to store upcoming work items. And if a set of tasks needed to perform one job are executed out of order, then the computer needs additional time to reassemble the final results.
But as the number of cores increase, these disadvantages are outweighed by performance gains of using a more “relaxed” style of random assignment, the researchers said.
To test the effectiveness of SprayList, the researchers ran an implementation on a Fujitsu RX600 S6 server with four 10-core Intel Xeon E7-4870 (Westmere EX) processors, which supported 80 hardware threads. In effect, it mimicked an 80-core processor.
When used to juggle fewer than eight processing threads, SprayList was indeed slower than a set of more traditional algorithms. As more threads were introduced, the performance of these established algorithms leveled out, and SprayList’s performance continued to increase linearly, as measured by operations per second.
SprayList does not pick tasks entirely at random, but rather works off a kind of priority queue called a skip list, which bundles tasks into different priority levels, ensuring high-priority items still get processed before low-priority ones.
“Users who specifically choose to use a priority queue require that items with the highest priority are selected before items with low priority. Our work argues that it is OK to relax this somewhat—we can process the tenth-highest priority before the first highest priority without too much problem,” said MIT Graduate student Justin Kopinsky, who led the work with fellow graduate student Jerry Li. Pure randomization might lead the computer to first process tasks with very low priority. “Then you run into trouble,” Kopinsky wrote by email.
For the work, Kopinsky and Li received help from their advisor Nir Shavit, an MIT professor of computer science and engineering, as well as Dan Alistarh, who works at Microsoft Research and is a former student of Shavit’s.
The researchers will present their work next month in San Francisco at the Association for Computing Machinery’s Symposium on Principles and Practice of Parallel Programming.