Facebook engineers identify Graph Search's big data challenges

Facebook’s engineers have many challenges ahead of them as they work to scale up Graph Search, the site’s new social search tool. One stumbling block: an overabundance of data to sift through.

Take the example of searching for Japanese restaurants in New York City liked by people from Japan. A search that would seem likely to generate hundreds, if not thousands, of results spits back only two measly businesses.

The search engine, in its current beta form, simply does not have the processing power to sift through the millions of connections among Japanese people on the site to perform the search, Facebook engineers said Thursday during a small media briefing at the company’s headquarters in Menlo Park, California.

Kerry Davis, IDGNS
Mike Curtiss, engineer at Facebook, describes the background of Graph Search, which is powered by a search engine Facebook calls Unicorn.

“There’s still a lot of work we have to do,” said software engineer Michael Curtiss. “A query like this is very difficult computationally,” he said, because it has to start with the 100 million people in Japan and then, in a fraction of a second, sort through all the pages they like.

“This is virtually intractable in the limited amount of time that we have,” said the engineer, who helped to design the site’s Unicorn search engine, which provides Graph Search’s infrastructure. “What we end up having to do is cut out possibly good results.”
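To make the trade-off Curtiss describes concrete, here is a minimal sketch in Python. It is not Facebook's code: the function name, the toy data and the idea of modeling the time limit as a cap on how many people get scanned are all assumptions made for illustration.

```python
from itertools import islice

def liked_restaurants_people_first(people, likes, restaurants, budget):
    """Collect restaurants liked by `people`, scanning at most `budget` people.

    `budget` is a hypothetical stand-in for the fraction of a second
    available to answer the query.
    """
    found = set()
    for person in islice(people, budget):
        # Keep only the liked pages that are restaurants of interest.
        found |= likes.get(person, set()) & restaurants
    return found

# Hypothetical toy data: 1,000 people, and only the last two like a restaurant.
people = [f"user{i}" for i in range(1000)]
likes = {"user998": {"Sushi A"}, "user999": {"Ramen B"}}
restaurants = {"Sushi A", "Ramen B"}

# With a small budget the scan stops before reaching the relevant people.
truncated = liked_restaurants_people_first(people, likes, restaurants, budget=100)
# With an unlimited budget every good result is found.
complete = liked_restaurants_people_first(people, likes, restaurants, budget=len(people))
```

In this toy setup the truncated scan misses both restaurants, which is the kind of loss of "possibly good results" the engineers describe when the starting set is too large to cover in time.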

The infrastructure underpinning Facebook's Graph Search is called Unicorn. Within it, each edge type is assigned a number. Here, the number for friends, which works as a keyword inside the system, is shown on the bottom left of the screen.

Facebook is taking a variety of approaches to solve this and other big data problems associated with Graph Search.

One strategy involves a concept in computer databases known as “query optimization,” to improve the speed and efficiency of certain types of searches.

In the case of the Japanese restaurant search, the technique could be applied to start with the restaurants and the likes they have received, rather than with everyone from Japan, and then filter those likes down to the ones made by people from Japan, Facebook engineers said.
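The reordering the engineers describe can be sketched roughly as follows. This is an illustrative Python toy, not Unicorn's actual implementation: the function names, the data and the assumption that a reverse index from each page to the people who like it is available are all hypothetical.

```python
def people_first(people_from_japan, likes_by_person, restaurants):
    """Start with people from Japan and scan every page each of them likes."""
    results, work = set(), 0
    for person in people_from_japan:
        for page in likes_by_person.get(person, ()):
            work += 1  # one like examined
            if page in restaurants:
                results.add(page)
    return results, work

def restaurants_first(people_from_japan, likers_by_page, restaurants):
    """Start with the restaurants and check who likes each one."""
    results, work = set(), 0
    for page in restaurants:
        for person in likers_by_page.get(page, ()):
            work += 1  # one like examined
            if person in people_from_japan:
                results.add(page)
                break  # one matching person is enough for this restaurant
    return results, work

# Hypothetical toy data: 1,000 people from Japan, each liking 5 unrelated pages.
people_from_japan = {f"p{i}" for i in range(1000)}
likes_by_person = {p: {f"page{j}" for j in range(5)} for p in people_from_japan}
likes_by_person["p1"].add("Sushi A")
likes_by_person["p2"].add("Ramen B")

# Assumed reverse index: restaurant page -> people who like it.
likers_by_page = {
    "Sushi A": ["p1", "x1"],
    "Ramen B": ["x2", "p2"],
    "Izakaya C": ["x3"],
}
restaurants = set(likers_by_page)

results_a, work_a = people_first(people_from_japan, likes_by_person, restaurants)
results_b, work_b = restaurants_first(people_from_japan, likers_by_page, restaurants)
```

Both orderings return the same restaurants in this toy example, but the restaurant-first version examines far fewer likes because it begins from the much smaller set. Whether that holds in practice depends on the real sizes of the sets involved, which the article does not give.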

The company is also addressing the challenges at the hardware level by adding flash memory and other new features to the servers in its data centers, to accommodate the increase in search traffic caused by Graph Search.

“We need to do extra work in data centers, buying new hardware platforms, [with] new types of servers being put up to support the computational needs of Unicorn,” said Soren Lassen, who led the search infrastructure team behind Graph Search.

Facebook began rolling out Graph Search last month to a limited number of users in the U.S. The search tool is designed to let people comb through the social network’s 1 trillion connections among users to search for people, places, photos and interests using phrases in plain English.

In principle, nothing can stop users from typing in a query that is unusually long, such as “Employers of friends of my friends who live in New York and who like Downton Abbey,” engineers said, since Graph Search uses cues such as “Likes” and check-ins to more easily rank the results.

Eventually Graph Search will incorporate other metrics such as user comments and status updates to compile and rank results, but that’s further down the line, the company said.
