MIT invention to speed up data centers should cheer developers
A breakthrough by researchers at the Massachusetts Institute of Technology could change the way Web and mobile apps are written and help companies like Facebook keep the cat videos coming.
Their main innovation is a new way to decide when each packet can scurry across a data center to its destination. The software that the MIT team developed, called Fastpass, uses parallel computing to make those decisions almost as quickly as packets arrive at each switch. The researchers think Fastpass may show up in production data centers in about two years.
In today’s networks, packets can spend a lot of their time in big, memory-intensive queues, lined up like tourists at Disney World. That’s because switches mostly decide on their own when each packet can go on to its destination, and they do so with limited information. Fastpass gives that job to a central server, called an arbiter, that can look at a whole segment of the data center and schedule packets in a more efficient way, according to Hari Balakrishnan, MIT’s Fujitsu Professor in Electrical Engineering and Computer Science. He co-wrote a paper that will be presented at an Association for Computing Machinery conference next month. The co-authors included Facebook researcher Hans Fugal.
Centralized decision-making is all the rage in networking as vendors implement various versions of SDN (software-defined networking). In fact, Balakrishnan was one of the authors of a key early paper on SDN. But those systems make higher level decisions, such as how to handle various types of traffic, in seconds or minutes. Fastpass applies the same concept to packet-by-packet forwarding decisions, Balakrishnan said.
Their motivation is not so much to make your Facebook page load faster or your Google search results come up sooner, though that might happen. Instead, the inventors of Fastpass want to simplify both applications and switches, and reduce the amount of bandwidth companies need to provision in their data centers.
In switches, tools for managing queues add complexity that raises costs, Balakrishnan said. He envisions future switches with room for very small queues, “just to be defensive,” and correspondingly lower cost and complexity.
By scheduling packets so they arrive on time, Fastpass can also save network architects from having to overprovision data center links for unpredictable bursts of traffic. As the number of users and the volume of data grow, it should be easier to keep up.
There’s a similar benefit for developers of distributed applications, which split problems up and send them to different servers around a network for answers.
“Developers struggle a lot with the variable latencies that current networks offer,” said co-author Jonathan Perry, an electrical engineering and computer science graduate student at MIT. With that solved, “It’s much easier to develop complex, distributed programs like the one Facebook implements,” he said.
The current decentralized way of forwarding packets allows for vast networks with little oversight. But because traffic is unpredictable, network designers have to either invest in fat enough pipes to carry the highest possible load or put a queue in each switch to hold packets until they can go out. Usually, it’s a balancing act between the two.
“It’s very hard to figure out how big the queues need to be. ... This has been a difficult question since 1960,” Balakrishnan said. Making them too big can slow performance, while making them too small can lead to dropped packets and time-consuming retransmissions.
Fastpass assigns a transmission time and selects a path for each packet, and it can do so more quickly than a typical switch can, according to MIT. It is fast enough that even though every packet's transmission must first be requested from the arbiter over the network, a round trip that may take about 40 microseconds, the system still speeds things up overall.
With that kind of speed, there’s essentially no need for queues. In experiments in a Facebook data center, Fastpass cut the average length of a queue by 99.6 percent, the researchers say. Latency, or the delay between requesting and receiving an item, went from 3.56 microseconds to 0.23 microseconds.
In the test, an arbiter with just eight cores was able to make decisions for a network carrying 2.2 terabits of data per second, which is equal to a 2,000-server data center with gigabit-speed links running at full speed, MIT said. The arbiter was linked to a twin system for redundancy.
Instead of having all eight cores work together on assigning transmissions for one time slot at a time, Balakrishnan's team gave each core its own time slot: one core fills the next slot while another works two slots ahead, another three slots ahead, and so on.
“You want to allocate for many time slots into the future, in parallel,” Balakrishnan said. Each core looks through the full list of transmission requests, assigns one, and modifies the list, and all the cores can work on the problem simultaneously.
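The pipelined allocation Balakrishnan describes can be sketched roughly as follows. This is a simplified illustration, not the actual Fastpass code: the request format, the greedy one-packet-per-endpoint rule, and the sequential loop standing in for parallel cores are all assumptions made for clarity.

```python
from collections import deque

def allocate_timeslot(requests):
    """Fill one time slot: greedily pick (src, dst, count) requests whose
    sources and destinations are all distinct, so no endpoint sends or
    receives more than one packet in the slot."""
    used_src, used_dst = set(), set()
    scheduled, remaining = [], deque()
    for src, dst, n in requests:
        if src not in used_src and dst not in used_dst:
            used_src.add(src)
            used_dst.add(dst)
            scheduled.append((src, dst))
            if n > 1:                        # demand not fully met yet
                remaining.append((src, dst, n - 1))
        else:
            remaining.append((src, dst, n))  # defer to a later slot
    return scheduled, remaining

def schedule(requests, num_slots):
    """Allocate several future time slots in sequence; in the real arbiter,
    each slot would be handled by its own core, working in parallel on
    whatever demand the earlier slots left unsatisfied."""
    timeline, pending = [], deque(requests)
    for _ in range(num_slots):
        scheduled, pending = allocate_timeslot(pending)
        timeline.append(scheduled)
    return timeline

# Two senders contend for destination C, so their packets land in
# consecutive time slots instead of queueing inside a switch.
slots = schedule([("A", "C", 2), ("B", "C", 1), ("D", "E", 1)], num_slots=3)
```

Here `slots[0]` carries A→C and D→E together (no conflict), while B→C is deferred until C is free, which is the sense in which the arbiter replaces in-switch queueing with up-front scheduling.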
Fastpass, or software like it, could be implemented in dedicated server clusters or even built into specialized chips, Balakrishnan said. The researchers plan to release the Fastpass software as open source, though they warned it’s not production-ready code.
“Anyone with a high-speed data center should be interested,” he said.