Microsoft reported Tuesday that a new “Catapult” board, running a specialized type of reprogrammable chip, improved the performance of its Bing search engine dramatically enough that it plans to deploy the technology in a data center in 2015. Baidu, one of the largest Web giants in China, reported similar results.
Both companies presented papers at Hot Chips, an academic conference in Cupertino dedicated to describing improvements in microprocessors and related technologies.
By now, graphics chips are accepted components of PCs and game consoles. Years ago, dedicated audio accelerators were, too. Both kinds of chip perform specific functions very efficiently, over and over. Microprocessors like Intel’s Core, AMD’s A-series APUs, or ARM’s Cortex processors, meanwhile, are known as general-purpose chips, programmed for a variety of tasks.
FPGAs (field-programmable gate arrays) sit somewhere in between, offering limited programmability at lower performance than a fixed-function chip. The idea was to use FPGAs to add life to older machines “with an eye toward flexibility,” said Andrew Putnam, a senior research design engineer at Microsoft. “Once you buy a machine for the data center, it’s pretty much stuck there.”
When CPUs aren’t good enough
Flexibility is critical to accelerating a software algorithm like search, which is constantly being tweaked and improved. Over time, a fixed-function accelerator would simply have become more and more inefficient, wasting space and power, Putnam said.
Putnam’s team tried and rejected a dedicated board (top) using six Xilinx FPGAs, as it complicated the server design, created a single point of failure, and drew a disproportionate amount of power and generated a matching amount of heat, all of which alarmed the administrators overseeing the data centers powering Bing and other Microsoft cloud services. Instead, Microsoft turned to “Catapult,” a board containing an Altera Stratix V G5 D5, 8GB of memory, 32MB of flash, and a PCIe Gen 3 x8 connector.

Microsoft’s “Catapult” board.
Catapult fits within Microsoft’s Open Compute server, a design popularized by Facebook that doesn’t necessarily require the development services of an OEM like HP or Dell. Each of Microsoft’s servers contains a pair of 2.1GHz, 8-core Xeon chips, 64GB of DRAM, and four 2TB hard drives along with a pair of 512GB SSDs. Microsoft inserted one FPGA board into each server, and inserted the servers into a half-rack (48 servers). Cables connected the FPGA boards in a 6×8 torus network.
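To make the 6×8 torus concrete, here is a minimal sketch in Python. It assumes the 48 boards in a half-rack are addressed by (row, column) coordinates and that each board links to four neighbors with wraparound at the edges, which is what a two-dimensional torus implies. The function names and the coordinate scheme are hypothetical illustrations, not details from Microsoft’s paper.

```python
# Illustrative model of a 6x8 torus, the topology the article describes.
# The (row, col) addressing and function names are assumptions for this sketch.
ROWS, COLS = 6, 8  # 6 x 8 = 48 boards, one per server in a half-rack


def torus_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the four neighbors of a node: north, south, west, east.

    The modulo arithmetic provides the wraparound links that turn a
    plain grid into a torus.
    """
    return [
        ((row - 1) % rows, col),
        ((row + 1) % rows, col),
        (row, (col - 1) % cols),
        (row, (col + 1) % cols),
    ]


def torus_hops(a, b, rows=ROWS, cols=COLS):
    """Return the minimum number of links between nodes a and b."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # In each dimension, take the shorter way around the ring.
    return min(dr, rows - dr) + min(dc, cols - dc)


if __name__ == "__main__":
    print(torus_neighbors(0, 0))
    print(torus_hops((0, 0), (5, 7)))
```

Under these assumptions, a corner board such as (0, 0) still has four neighbors, and no two boards are more than seven links apart (three in the six-node dimension plus four in the eight-node dimension).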
Microsoft then put the cards through a production-scale test: 1,632 servers in a data center. What the company found, Putnam said, was that the FPGA cards accelerated Bing’s scoring of documents for relevance against a user’s search query. Microsoft achieved a 2X improvement in search throughput and a 29-percent reduction in the latency of processing a search. The savings allowed Microsoft to cut the number of servers it needed in half.
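The arithmetic behind those figures can be sketched in a few lines of Python. The example assumes that the number of servers needed scales inversely with per-server throughput, which is a simplification, and that the 1,632 test servers were grouped into the 48-server half-racks described earlier. The function names are hypothetical.

```python
import math

SERVERS_PER_HALF_RACK = 48  # from the article: one half-rack holds 48 servers
TEST_SERVERS = 1632         # from the article: size of the production test


def half_racks(servers, per_half_rack=SERVERS_PER_HALF_RACK):
    """Number of full half-racks that a given server count fills."""
    return servers // per_half_rack


def servers_needed(baseline_servers, throughput_gain):
    """Servers required for the same load after a throughput gain.

    Assumes load divides evenly across servers, so the count scales
    as 1 / throughput_gain (a simplification for illustration).
    """
    return math.ceil(baseline_servers / throughput_gain)


if __name__ == "__main__":
    print(half_racks(TEST_SERVERS))
    print(servers_needed(TEST_SERVERS, 2.0))
```

With these assumptions, 1,632 servers fill 34 half-racks, and a 2X throughput gain means the same query load could be handled by half as many machines.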

The test was so successful that the boards will be rolled out to all Bing servers in one data center in 2015, Putnam said. The challenge now is to figure out where to apply the FPGA technology next, he added.
Baidu, which owns tens of thousands of servers in China, used FPGAs to accelerate deep neural networks, algorithms used for everything from traditional search to speech recognition to image search and recognition. Baidu used a board with a Xilinx K7 480t-2l FPGA that could be plugged into any type of 1U or 2U server. Under various workloads, Baidu found that the FPGA boards were several times more efficient than either a CPU or a GPU.
All end users care about is the quality of their search results from Bing or Baidu. Improving the efficiency and performance of the search algorithms is good news for everyone.
This story has been corrected to note that the Catapult boards will be added to all servers within one Microsoft data center, powering Bing, in 2015.