Opera Software on Wednesday revealed a search engine that indexes structural information about Web pages so Web developers and standards bodies can see what technologies are being used to build Web sites and how they are being used.
The Metadata Analysis and Mining Application search engine — “MAMA” for short — is being tested by the company and should be released in an invitation-only beta by the end of the year, said Snorre Grimsby, vice president of quality assurance at Opera in Oslo, Norway.
MAMA grew out of the tests Opera routinely runs to make sure its browser products work well with existing Web pages built with the most commonly used site-creation technologies, he said.
“We realized internally that we needed to be able to find lots of live sites out there that used certain technologies in certain combinations so we could test our browser on them,” Grimsby said.
The resulting search engine crawls the Web, but instead of indexing the content of Web sites, as most search engines do, it discards the content and indexes the types of technologies being used on sites, such as Cascading Style Sheets (CSS), Hypertext Markup Language (HTML), Extensible HTML (XHTML) and the like, Grimsby said.
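To make the idea concrete, here is a minimal sketch in Python of indexing a page's structure while ignoring its text. It is not Opera's implementation, which has not been published; the class name `StructureExtractor`, the function `extract_features` and the feature labels are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class StructureExtractor(HTMLParser):
    """Collects markup features and ignores the page's text content.

    Hypothetical illustration only; MAMA's real feature set is not public.
    """

    def __init__(self):
        super().__init__()
        self.features = set()

    def handle_decl(self, decl):
        # decl looks like: DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ...
        upper = decl.upper()
        if "XHTML" in upper:
            self.features.add("xhtml")
        elif upper.startswith("DOCTYPE HTML"):
            self.features.add("html")

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # CSS can appear as a <style> element, a style attribute or a linked sheet.
        if tag == "style" or "style" in attrs:
            self.features.add("css")
        if tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower():
            self.features.add("css")
        if tag == "script":
            self.features.add("script")

    # handle_data is deliberately not overridden, so text content is discarded.


def extract_features(html_text):
    """Return the set of technology labels detected in one page."""
    parser = StructureExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.features


sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html><head><link rel="stylesheet" href="site.css" /></head>'
    "<body><p>This text is never stored.</p></body></html>"
)
print(sorted(extract_features(sample)))  # ['css', 'xhtml']
```

The paragraph text in the sample never reaches the result; only the labels for the technologies in use are kept.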
This information is helpful for Web developers, who can use MAMA to identify sites that are using certain kinds of technology and see how other developers have implemented it, he said.
“It’s a known fact that Web developers borrow ideas from each other,” Grimsby said. If developers are working on a Web application that needs, for example, a new menu system, MAMA can help them find sites that already use the technology they are considering, giving them ideas for their own implementation.
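A lookup of this kind, finding sites that use certain technologies in combination, can be sketched as an inverted index from technology to sites. The Python below is an illustrative assumption, not MAMA's actual query interface; the function names and the example.com URLs are invented.

```python
from collections import defaultdict


def build_index(site_features):
    """Map each technology label to the set of URLs that use it."""
    index = defaultdict(set)
    for url, features in site_features.items():
        for feature in features:
            index[feature].add(url)
    return index


def find_sites(index, required):
    """Return the sorted URLs that use every technology in `required`."""
    required = list(required)
    if not required:
        return []
    matches = set(index.get(required[0], set()))
    for feature in required[1:]:
        matches &= index.get(feature, set())
    return sorted(matches)


# Hypothetical crawl results, invented for illustration.
crawled = {
    "http://example.com/a": {"xhtml", "css"},
    "http://example.com/b": {"html", "css", "script"},
    "http://example.com/c": {"xhtml", "css", "script"},
}
index = build_index(crawled)
print(find_sites(index, ["css", "script"]))
# ['http://example.com/b', 'http://example.com/c']
```

Intersecting the per-technology sets is what lets a query ask for a combination rather than a single technology.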
Developers also can use MAMA to see how well sites conform to current World Wide Web Consortium (W3C) specifications for commonly used Web standards, such as CSS, HTML and others. The W3C oversees the creation and maintenance of specs for many of the most prevalent Web-site development technologies.
Grimsby said that in Opera’s own use of MAMA, the company found that the average Web page has 47 discrepancies between how the page implements W3C-maintained technologies and the W3C specifications themselves.
MAMA also can be useful for the W3C and other standards bodies to help them set priorities for developing specifications. For example, if a technology is used a certain way on the majority of Web sites, or not used very much at all, the W3C “can change the spec or take something out of the spec,” Grimsby said.
During an interview Wednesday, Grimsby demonstrated MAMA in real time by using it to crawl an International Data Group Web page, http://www.idg.net/idgns, to find out what technologies the site used.
According to the search engine, the site is running version 2.2.8 of the Apache Web server on a 32-bit Windows server, has 56 hyperlinks and uses XHTML 1.0 and CSS, he said.
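How MAMA gathers these details was not described. One plausible source for the server software and platform is the HTTP `Server` response header, which Apache on Windows typically reports in the form `Apache/2.2.8 (Win32)`, while a hyperlink count can come from the markup itself. The Python sketch below rests on those assumptions; `parse_server_header` and `count_hyperlinks` are hypothetical helper names, and the input is supplied by hand rather than fetched from the site.

```python
import re
from html.parser import HTMLParser

# product, optional /version, optional (platform), e.g. "Apache/2.2.8 (Win32)"
SERVER_RE = re.compile(
    r"(?P<product>[^/\s]+)(?:/(?P<version>\S+))?(?:\s+\((?P<platform>[^)]*)\))?"
)


def parse_server_header(value):
    """Split a Server header value into product, version and platform."""
    match = SERVER_RE.match(value.strip())
    if match is None:
        return {"product": None, "version": None, "platform": None}
    return match.groupdict()


class LinkCounter(HTMLParser):
    """Counts <a> elements that carry an href attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1


def count_hyperlinks(html_text):
    counter = LinkCounter()
    counter.feed(html_text)
    counter.close()
    return counter.count


print(parse_server_header("Apache/2.2.8 (Win32)"))
# {'product': 'Apache', 'version': '2.2.8', 'platform': 'Win32'}
print(count_hyperlinks('<a href="/news">News</a> <a name="top">Top</a>'))
# 1
```

Anchors without an `href` are skipped, since they are link targets and not hyperlinks.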
In the next eight weeks Opera expects to publish a series of articles on its developer Web site about its own internal use of MAMA, noting key findings, statistics and trends the search engine discovers, he said.
By the end of the year, the company will invite key people within standards bodies to test the search engine, with a goal of releasing it publicly to developers sometime in the first or second quarter of next year, Grimsby said.