NewsDate: May 5, 2001
Subject: WhizBang Labs makes search engines that learn

Technology Review Online (04/24/01) reports on WhizBang Labs' new advances in search engines. Tom Mitchell, a Carnegie Mellon University professor and the chief scientist at WhizBang Labs, said the current focus of search-engine development is "entity extraction," the ability to build databases from collections of specific entities--names, addresses, and phone numbers, for example--extracted from Web pages. "This kind of record extraction is where we are driving the evolution of tools for managing the information flood," he told a recent meeting of the Society for Industrial and Applied Mathematics.

Mitchell said three types of algorithms are used to build these databases. The first is the Naive Bayes model, which focuses on topic-word frequencies. Also in use are "maximum entropy" algorithms, which focus on word combinations and how frequently they are associated within specific Web documents. The most promising algorithm, Mitchell explained, is the "co-training" model, which studies the information on a given Web page as well as the pages that link to that page, building a set of correlations from the linked pages. Mitchell says the co-training algorithm has a hit-accuracy of 96 percent, while the other algorithms' accuracy is only 86 percent.

WhizBang's online job site, FlipDog.com, launched last year as a demonstration of its data-mining technology. Since then it has signed up clients such as Dun & Bradstreet and the U.S. Department of Labor, for which it is compiling a directory of continuing and distance education opportunities.

See http://www.technologyreview.com/web/aquino/aquino042401.asp
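
As an illustration of what "topic-word frequencies" means in the Naive Bayes model mentioned above, here is a minimal sketch of a Naive Bayes text classifier. It is not WhizBang's implementation, which the article does not describe. The function names (train_naive_bayes, classify), the add-one smoothing, and the toy training data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count per-topic word frequencies from (topic, text) pairs."""
    word_counts = defaultdict(Counter)  # topic -> word -> count
    doc_counts = Counter()              # topic -> number of documents
    vocabulary = set()
    for topic, text in labeled_docs:
        words = text.lower().split()
        word_counts[topic].update(words)
        doc_counts[topic] += 1
        vocabulary.update(words)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the topic with the highest log-probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_topic, best_score = None, float("-inf")
    for topic in doc_counts:
        # Log prior: fraction of training documents labeled with this topic.
        score = math.log(doc_counts[topic] / total_docs)
        total_words = sum(word_counts[topic].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing, an assumption of this sketch,
            # so that an unseen topic-word pair does not zero out the score.
            count = word_counts[topic][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical toy training data, invented for illustration only.
TRAINING = [
    ("jobs", "software engineer position open apply now"),
    ("jobs", "hiring nurse full time position salary"),
    ("courses", "distance education course enroll online"),
    ("courses", "continuing education evening course tuition"),
]

model = train_naive_bayes(TRAINING)
print(classify("engineer position salary", model))
print(classify("online education course", model))
```

With this toy data the first query is scored higher under "jobs" and the second under "courses", because each query's words occur more often in that topic's training text. A real system would need far more training data and proper tokenization.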
Copyright © 2001 KDnuggets.