KDnuggets News 01:10, item 8, News

KDnuggets : News : 2001 : n10 : item8

News

Date: May 5, 2001
Subject: WhizBang Labs makes search engines that learn

Technology Review Online (04/24/01) reports on
WhizBang Labs' new advances in search engines.
Tom Mitchell, a Carnegie Mellon University professor and the chief
scientist at WhizBang Labs, said the current focus of search-engine
development is "entity extraction," the ability to build databases
from collections of specific entities--names, addresses, and phone
numbers, for example--extracted from Web pages. "This kind of record
extraction is where we are driving the evolution of tools for managing
the information flood," he told a recent meeting of the Society for
Industrial and Applied Mathematics.
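
As a rough illustration of what "building a database of entities from
Web pages" can mean, the sketch below pulls US-style phone numbers out
of page text and stores them as records. It is only a toy under stated
assumptions: the article does not describe WhizBang's extractors, which
are learned rather than hand-written, and the pattern PHONE_RE, the
function extract_phone_records, and the sample URL are all hypothetical.

```python
import re

# Hypothetical hand-written pattern for numbers such as
# (412) 555-0134 or 412.555.0199. A learned extractor would replace this.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-. ]?(\d{3})[-. ](\d{4})\b")


def extract_phone_records(pages):
    """Turn {url: page_text} into a list of {source, phone} records."""
    records = []
    for url, text in pages.items():
        for match in PHONE_RE.finditer(text):
            area, prefix, line = match.groups()
            # Normalize every match to one canonical format.
            records.append({"source": url, "phone": f"{area}-{prefix}-{line}"})
    return records


# Example usage with made-up page text.
sample_pages = {
    "http://example.com/contact": "Call (412) 555-0134 or 412.555.0199 today.",
}
sample_records = extract_phone_records(sample_pages)
```

Each record keeps the source URL next to the extracted value, so the
resulting list can be loaded into a table keyed by page.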

Mitchell described three types of search algorithms used to build
these databases. The Naive Bayes model focuses on topic-word
frequencies. "Maximum entropy" algorithms focus on word combinations
and how frequently they are associated within specific Web documents.
The most promising algorithm, Mitchell explained, is the "co-training"
model, which studies the information on a given Web page as well as
the pages that link to that page, building up correlations from the
linked pages. Mitchell says the co-training algorithm has a hit
accuracy of 96 percent, while the other algorithms reach only
86 percent.
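
To make the "topic-word frequencies" idea behind the Naive Bayes model
concrete, here is a minimal textbook-style sketch. It is not WhizBang's
implementation, which the article does not describe. The function names
train_naive_bayes and classify, the add-one smoothing, and the tiny
training set are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count word frequencies per topic from (topic, text) pairs."""
    word_counts = defaultdict(Counter)  # topic -> word -> count
    doc_counts = Counter()              # topic -> number of documents
    vocab = set()
    for topic, text in labeled_docs:
        words = text.lower().split()
        word_counts[topic].update(words)
        doc_counts[topic] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the topic with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_topic, best_score = None, float("-inf")
    for topic in doc_counts:
        # Start from the log prior: how common the topic is overall.
        score = math.log(doc_counts[topic] / total_docs)
        total_words = sum(word_counts[topic].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[topic][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Made-up training data for two topics.
training = [
    ("jobs", "software engineer position salary"),
    ("jobs", "hiring nurse salary benefits"),
    ("courses", "distance education course enrollment"),
    ("courses", "online course tuition credit"),
]
model = train_naive_bayes(training)
```

With this model, classify("engineer salary", model) picks the "jobs"
topic because those words occur more often in the "jobs" documents.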

WhizBang's online job site, FlipDog.com, launched last year as a
demonstration of its data-mining technology. Since then it has signed
up clients such as Dun & Bradstreet and the U.S. Department of Labor,
for which it is compiling a directory of continuing and distance
education opportunities.

See
http://www.technologyreview.com/web/aquino/aquino042401.asp
