KDnuggets : News : 2005 : n16 : item37

Briefs

In Silicon Valley, a Debate Over the Size of the Web

New York Times (08/15/05) P. C6; Markoff, John

A debate over the size of the World Wide Web erupted last week when Yahoo! announced at an Internet search engine conference that its index held more than 19.2 billion documents. That is more than double the 8.1 billion currently reported by Google, which responded by questioning how Yahoo! had counted its documents. "The comprehensiveness of any search engine should be measured by real Web pages that can be returned in response to real search queries and verified to be unique," said Google co-founder Sergey Brin on Aug. 12, suggesting that Yahoo! had inflated its index with duplicate entries.

Jeff Weiner of Yahoo!'s search and marketplace group insisted that the document count in its index was accurate. Both sides of the debate agree, however, that index size is only loosely related to the quality of the results returned, and the relationship may even be inverse. On Aug. 14, researchers at the National Center for Supercomputing Applications ran a random sample of 10,012 queries against both the Yahoo! and Google indices. They found that Google returned, on average, 166.9 percent more results than Yahoo!, and that Yahoo! returned more results than Google in only 3 percent of cases. Both companies closely guard the software behind their collection techniques, and search engine experts say this continued secrecy will make it very difficult to estimate accurately the size of either the Web or the indices. "The whole question of how big indexes are has clearly become extremely political and commercial," laments Stanford University professor Christopher Manning.



Copyright © 2005 KDnuggets.