Archiving the Net 'Wayback' to 1996
If you've ever wondered what books Amazon.com was recommending five years ago, what the first Webcam was focused on, or what was on more than 10 billion other Web pages dating back to 1996, you're in luck. That's exactly what's offered by the Internet Archive's free "Wayback Machine."
The Wayback Machine, unveiled to the public on Wednesday, is an online archive of Web sites, as well as the public face of the San Francisco-based Internet Archive.
Thanks to the archive, we now know there are 1.5 million Hungarian-language pages on the Web. Xerox researchers also found there are 201 other languages represented on the Web and they are thriving in the digital universe.
The sheer size of the Internet Archive almost guarantees there is something for everybody. If you digitized all the books in the U.S. Library of Congress, it would take up about 20 terabytes of storage space, just scratching the surface of more than 100 terabytes of information available using the Wayback Machine.
"Our opportunity is to not have to be selective," Brewster Kahle, director of the Internet Archive, said at Wednesday night's launch of the Wayback Machine at the University of California at Berkeley. "Our opportunity is not only to have it all, but to make it widely available."
However, having it all isn't easy--the Internet Archive is still growing by 12 terabytes a month, meaning that the material archived every two months contains more data than all the books in the Library of Congress.
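The comparison is easy to check by hand, but a few lines of Python make the arithmetic explicit. This is only a sketch built on the figures quoted above; the constant and function names are illustrative, and "more than 100 terabytes" is treated as a floor of exactly 100.

```python
# Figures quoted in the article, in terabytes. Names are illustrative.
LIBRARY_OF_CONGRESS_TB = 20   # estimate for all digitized books
ARCHIVE_TB = 100              # "more than 100 terabytes", used here as a floor
GROWTH_TB_PER_MONTH = 12      # reported monthly growth


def months_to_exceed(target_tb: float, growth_tb_per_month: float) -> int:
    """Return the number of whole months of growth needed to exceed target_tb."""
    months = 0
    added = 0.0
    while added <= target_tb:
        added += growth_tb_per_month
        months += 1
    return months


# Data added in two months, and how many "Libraries of Congress" the archive holds.
two_month_growth_tb = 2 * GROWTH_TB_PER_MONTH
archive_vs_library = ARCHIVE_TB / LIBRARY_OF_CONGRESS_TB
```

At 12 terabytes a month, one month of growth falls short of the 20-terabyte estimate and two months passes it, which is the comparison the article draws.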
To keep up with this growth, the Internet Archive stores all this information on a networked chain of 300 desktop PCs in the basement of a former military building in the Presidio of San Francisco.
These aren't your average desktops, though. Most of them only have a single 1.5-GHz processor, but they also have 640MB of memory and four 80GB hard drives, says Niall O'Driscoll, vice president of engineering for Alexa Internet, one of the companies behind the Internet Archive. Alexa was also cofounded by Kahle, and was purchased by online bookseller Amazon.com in 1999.
"There are basically 20 machines in the front line," O'Driscoll says. "When you query it, it asks all 20 and one says 'I have it,' or 'I know where to find it,' and it redirects to the actual machine with the information."
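The lookup O'Driscoll describes can be pictured with a toy model. The sketch below is not the Internet Archive's code; the class, the function, the machine names and the in-memory index are all hypothetical, and it assumes each front-line machine holds a partial map from archived URLs to the machine that stores them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FrontLineNode:
    """A hypothetical front-line machine holding a partial index."""
    name: str
    # Maps an archived URL to the name of the machine that stores it.
    # A node that holds the data itself would map the URL to its own name.
    index: Dict[str, str] = field(default_factory=dict)

    def locate(self, url: str) -> Optional[str]:
        """Return the storage machine for url, or None if this node doesn't know."""
        return self.index.get(url)


def route(url: str, front_line: List[FrontLineNode]) -> Optional[str]:
    """Ask every front-line node, then redirect to the first machine named."""
    answers = [node.locate(url) for node in front_line]
    for target in answers:
        if target is not None:
            return target
    return None  # no front-line node knows where the page lives


# Twenty front-line nodes, as in the article; only one knows this made-up URL.
front_line = [FrontLineNode(name=f"front-{i:02d}") for i in range(20)]
front_line[7].index["http://example.com/1996/"] = "storage-142"
redirect_target = route("http://example.com/1996/", front_line)
```

A real deployment would query the machines over the network and in parallel; the list comprehension here only stands in for "it asks all 20."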
Perusing the Web of yesteryear is a lot like looking through a huge collection of old newspapers without getting your fingers dirty: You never know what you're going to find.
One page Kahle found while perusing the Web was a 1996 White House statement from then-U.S. President Bill Clinton discussing aviation security following the crash of TWA Flight 800 off the coast of New York. "Shortly, I will submit to Congress a budget request for more than $1 billion to expand our FBI anti-terrorism forces and to put the most sophisticated bomb detection machines in America's airports," Clinton said on September 9, 1996.
"The overwhelming metaphor for the Internet is a library," Stanford University Law professor and author Lawrence Lessig said at the launch. Following that metaphor, the Internet Archive is "that quiet librarian working in a room making sure that you can have access to what you want access to," he said.
The Wayback Machine even houses "special collections" covering the terrorist attacks on September 11, the history of the U.S. government on the Web, and the vote controversy following the 2000 U.S. presidential election. The "Web Pioneers" collection pays tribute to Web sites that "shaped the character of the Net in the early years." And among those influential sites is the University of Cambridge Trojan Room Coffee Machine--the first Webcam, which traces its roots back to 1991.
"We've finally created a library for the world," Lessig said.