Google has given its Web search engine an injection of semantic technology, as the search leader pushes into what many consider the future of search on the Internet.
The new technology will allow Google’s search engine to identify associations and concepts related to a query, improving the list of related search terms Google displays along with its results, the company announced on its official blog on Tuesday.
“For example, if you search for ‘principles of physics,’ our algorithms understand that ‘angular momentum,’ ‘special relativity,’ ‘big bang’ and ‘quantum mechanic’ are related terms that could help you find what you need,” wrote Ori Allon, technical lead of Google’s Search Quality team, and Ken Wilder, an engineer on the company’s Snippets team.
Google has often been criticized for relying on what many consider an aging approach to answering search queries, one based primarily on matching keywords rather than understanding their meaning.
Over the years, Google executives have acknowledged that semantic search technology will be an important component of future search engines.
“Right now, Google is really good with keywords and that’s a limitation we think the search engine should be able to overcome with time,” Google Vice President of Search Products & User Experience Marissa Mayer said in an interview with IDG News Service in October 2007. “People should be able to ask questions and we should understand their meaning, or they should be able to talk about things at a conceptual level. We see a lot of concept-based questions — not about what words will appear on the page but more like ‘what is this about?’. A lot of people will turn to things like the semantic Web as a possible answer to that.”
She cautioned, however, that Google sees semantic search technology as part of the algorithmic mix, not as a replacement for its traditional keyword-analysis approach.
“What we’re seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they’re done through brute force,” she said. “Because we’re processing so much data, we have a lot of context around things like acronyms. Suddenly, the search engine seems smart, like it achieved that semantic understanding, but it hasn’t really. It has to do with brute force. That said, I think the best algorithm for search is a mix of both brute-force computation and sheer comprehensiveness and also the qualitative human component.”
In January of this year, during Google’s fourth-quarter earnings conference call, CEO Eric Schmidt touched briefly on this topic, hinting that the company is getting more serious about semantic search technology. “Wouldn’t it be nice if Google understood the meaning of your phrase, rather than just the words that are in the phrase? We have [done] a lot of discoveries in that area that are going to roll out [soon],” Schmidt said.
An entire field of Google competitors is busy developing and refining semantic search engines, betting that they can deliver on the promise of this technology: letting users type queries in natural language and having the search engine understand their meaning and intent.
Google is also changing its snippets, the short excerpts from each Web page that appear with its search results. Critics have often pointed out that these excerpts don't offer enough context for users to decide whether to click through to the Web site.
Now, when people enter queries that are three words or longer, Google will deliver longer snippets to give users a better view of how their query keywords appear on the Web site.
It remains to be seen whether Web site publishers will cry foul over longer snippets. In the past, publishers have sometimes complained that search engine abstracts that are too long give away too much of their sites' content. This, they say, can discourage potential visitors from clicking through to the page, particularly if the abstract, or snippet, already gives them the information they're looking for.
This is an area where search engines have to strike a delicate balance between fulfilling their mission of giving users the most precise information possible for their query and not violating the copyrights of Web site publishers.