KDnuggets : News : 2005 : n03 : item22 < PREVIOUS | NEXT >

Briefs

Search for Meaning: Normalized Google Distance Between Words

New Scientist (01/28/05); Graham-Rowe, Duncan

Computers can learn the meaning of words simply by plugging into Google. The finding could bring forward the day that true artificial intelligence is developed.

Getting a computer to work out what words mean, for example to distinguish between "rider" and "horse" and to work out how they relate to each other, is a long-standing problem in artificial intelligence research.

The "Google distance"

But Paul Vitanyi and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science in Amsterdam, the Netherlands, realised that a Google search can be used to measure how closely two words relate to each other. For instance, imagine a computer needs to understand what a hat is.

To do this, it needs to build a word tree - a database of how words relate to each other. It might start with any two words to see how they relate to each other. For example, if it googles "hat" and "head" together it gets nearly 9 million hits, compared to, say, fewer than half a million hits for "hat" and "banana". Clearly "hat" and "head" are more closely related than "hat" and "banana".

To gauge just how closely, Vitanyi and Cilibrasi have developed a statistical indicator based on these hit counts that gives a measure of a logical distance separating a pair of words. They call this the normalised Google distance, or NGD. The lower the NGD, the more closely the words are related.
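The brief does not spell out the indicator. As a rough sketch, the Python below uses the formula as it is usually quoted from the Cilibrasi and Vitanyi preprint: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) is the hit count for x alone, f(x, y) the hit count for both terms together, and N the number of pages indexed. Only the two joint counts come from the article, rounded to 9 million and half a million. The single-word counts and the value of N are made-up placeholders for illustration, and the function name `ngd` is likewise hypothetical.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalised Google distance computed from raw hit counts.

    fx, fy -- hits for each term searched alone
    fxy    -- hits for both terms searched together
    n      -- assumed total number of indexed pages
    """
    if min(fx, fy, fxy) <= 0:
        # No usable co-occurrence data: treat the terms as unrelated.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Placeholder values, NOT real Google figures (assumptions for illustration).
N = 8_000_000_000
HAT, HEAD, BANANA = 50_000_000, 400_000_000, 30_000_000

# Joint counts rounded from the article: "nearly 9 million" and
# "fewer than half a million".
hat_head = ngd(HAT, HEAD, 9_000_000, N)
hat_banana = ngd(HAT, BANANA, 500_000, N)

print(f"NGD(hat, head)   = {hat_head:.3f}")
print(f"NGD(hat, banana) = {hat_banana:.3f}")
```

With these placeholder counts the "hat"/"head" pair comes out with the lower distance, matching the article's reading that a lower NGD means more closely related words. Real values depend entirely on the actual hit counts and index size.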

Here is a preprint of "Automatic Meaning Discovery Using Google".



Copyright © 2005 KDnuggets.