Microsoft Research Develops Map Search for Unstructured Data
Microsoft Research India has developed technology that will allow users to search maps even in countries like India where the addresses are often not in a structured format.
The research project is called Robust Location Search, and a prototype of the technology is ready, said B. Ashok, director of advanced development and prototyping at Microsoft Research India, on Wednesday. Although developed in India, the technology is generic and has been designed to be deployed in any country that has unstructured addresses, he added.
Rather than look for rules in the address, the algorithm uses underlying geospatial data to figure out which locations the terms in the address string match, Ashok said.
In countries like India, an address often describes a location by its spatial relationship to a landmark, such as "near" or "opposite," rather than by a formal, hierarchical structure consisting of a street number, street name, city, state and postal code.
Very often the same location may have a different address or a reference to a different landmark, Ashok said. The local postman knows how to deliver letters based on these unstructured addresses, but such unstructured data poses a challenge for software used for map searches, he added.
Commercial mapping services, including those of Google, Yahoo and Microsoft, were initially designed for countries like the U.S., which have structured addresses, and they may not work as well on unstructured addresses, Ashok said.
The research lab in Bangalore is in discussions to incorporate the new algorithm into Microsoft's Windows Live Local.
Microsoft Research India uses a technology called spatial intersection, which analyzes the terms in the address string to figure out the location of an unstructured address such as "2nd Cross, 10th Main, Sadashivnagar, Bangalore."
The software starts with terms like "2nd Cross" and "10th Main," then uses street intersection data to identify every place on the map where a 2nd Cross meets a 10th Main, Ashok said. The area matching the next term, Sadashivnagar, is then intersected with that set of candidate points to arrive at the location the address refers to, he added.
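The idea Ashok describes can be illustrated with a minimal Python sketch. It is not Microsoft's implementation: the gazetteer, the place IDs and the `locate` function are all invented for illustration, and real geospatial data would use geometries rather than simple ID sets. Each term maps to the set of places it touches, and the answer is whatever survives the intersection of those sets.

```python
# Hypothetical toy gazetteer: each map term maps to the set of place IDs
# (for example, street intersections) that it touches. All IDs are invented.
GAZETTEER = {
    "2nd cross": {"p1", "p2", "p3"},
    "10th main": {"p2", "p3", "p4"},
    "sadashivnagar": {"p3", "p5"},
    "bangalore": {"p1", "p2", "p3", "p4", "p5"},
}


def locate(address, gazetteer=GAZETTEER):
    """Return the place IDs consistent with every recognized term."""
    # Split on commas and normalize case and whitespace.
    terms = [t.strip().lower() for t in address.split(",") if t.strip()]
    # Look up the candidate places for each term the map knows about.
    candidate_sets = [gazetteer[t] for t in terms if t in gazetteer]
    if not candidate_sets:
        return set()
    # Keep only the places common to all terms.
    return set.intersection(*candidate_sets)


if __name__ == "__main__":
    print(locate("2nd Cross, 10th Main, Sadashivnagar, Bangalore"))
    print(locate("Bangalore, Sadashivnagar, 10th Main, 2nd Cross"))
```

Because set intersection does not depend on the order of its operands, both calls in this toy example return the same single place.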
The software will arrive at the location regardless of the order in which the terms are presented in the address, and also when the same location has a number of alias addresses, Ashok said. It can also handle queries in multiple languages. A query in Hindi, an Indian language, would, for example, be transliterated into the language of the map, and the search would then be run on the transliterated terms, according to Ashok.
Terms in the address such as "near" that don't conform to data on the map are also discarded, Ashok said.
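One way to picture this filtering step is the following Python sketch, which is an assumption about how such a filter could work rather than a description of the prototype. The `MAP_TERMS` set, the `extract_map_terms` function and the sample address are hypothetical. The sketch scans the address for the longest phrases that appear in the map data and sets aside every word that matches nothing.

```python
# Hypothetical set of names present in the map data.
MAP_TERMS = {"sankey tank", "sadashivnagar", "bangalore", "10th main"}


def extract_map_terms(address, known_terms=MAP_TERMS, max_words=3):
    """Split an address into phrases found on the map and leftover words."""
    # Treat commas as plain separators; a real system might keep them.
    words = address.lower().replace(",", " ").split()
    kept, dropped = [], []
    i = 0
    while i < len(words):
        # Try the longest phrase first so "sankey tank" beats "sankey".
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in known_terms:
                kept.append(phrase)
                i += n
                break
        else:
            # No phrase starting here is on the map: discard the word.
            dropped.append(words[i])
            i += 1
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = extract_map_terms("Near Sankey Tank, Sadashivnagar, Bangalore")
    print("kept:", kept)
    print("dropped:", dropped)
```

In this example, "near" ends up in the dropped list while the landmark, locality and city names are kept for the location search.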