KDnuggets : News : 2006 : n15 : item26

Briefs

UCI researchers 'text mine' The New York Times

UCI, Irvine, Calif., July 26, 2006

Performing what a team of dedicated and bleary-eyed newspaper librarians would need months to do, scientists at UC Irvine have used an up-and-coming technology to complete in hours a complex topic analysis of 330,000 stories published primarily by The New York Times.

...

Text mining allows a computer to extract useful information from unstructured text. Until recently, text mining required a great deal of preparation before documents could be analyzed in a meaningful way. A new text-mining technique called "topic modeling," which UCI scientists used in their New York Times experiment, looks for patterns of words that tend to occur together in documents, then automatically groups those words into topics, all with minimal human effort.
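To make the idea of words that "tend to occur together" concrete, here is a minimal sketch of one common topic-modeling approach, latent Dirichlet allocation fitted by collapsed Gibbs sampling. The article does not say which model or software the UCI team used, so the choice of algorithm is an assumption, and the function name `fit_topics`, its parameters, and the four toy "documents" are all hypothetical.

```python
import random
from collections import Counter


def fit_topics(docs, num_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Assign every word token to a topic using collapsed Gibbs sampling (LDA).

    Illustrative only: this is not the UCI team's code, and the
    hyperparameters alpha and beta are arbitrary assumed values.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (document, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assign = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # A topic is likelier if this document already uses it
                # and if it already contains this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed share of each topic within each document.
    doc_mix = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Most frequent words currently assigned to each topic.
    top_words = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(num_topics)
    ]
    return doc_mix, top_words


# Hypothetical toy corpus, invented for illustration.
docs = [
    "election vote senate campaign vote".split(),
    "senate campaign election candidate".split(),
    "game season team coach game".split(),
    "team coach season playoff".split(),
]
doc_mix, top_words = fit_topics(docs)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

No human labels the documents at any point; the topics emerge only from which words share documents. On a corpus this small the result depends heavily on the random seed, so it should be read as a demonstration of the mechanics rather than of output quality.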

The topic model, applied to the collection of news articles published from 2000 to 2002, identified patterns of words that occurred together in the stories. From those words, researchers were able to identify topics. Information associated with those topics was charted over time, allowing the scientists to pinpoint which months of the year certain topics were most in the news and how much ink they received from year to year.
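Charting a topic over time can be sketched as averaging each article's topic proportions within its publication month. The snippet below is one plausible way to do that, not the researchers' actual procedure; the function `monthly_topic_share`, the dates, and the proportions are hypothetical values invented for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_topic_share(articles, num_topics):
    """Average topic proportions per (year, month).

    `articles` is an iterable of (publication_date, topic_proportions) pairs,
    where topic_proportions has one number per topic.
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for pub_date, mix in articles:
        key = (pub_date.year, pub_date.month)
        for k, share in enumerate(mix):
            sums[key][k] += share
        counts[key] += 1
    # Return months in chronological order.
    return {key: [s / counts[key] for s in sums[key]] for key in sorted(sums)}


# Made-up example data: topic 0 might be "elections", topic 1 "sports".
articles = [
    (date(2000, 11, 7), [0.9, 0.1]),
    (date(2000, 11, 20), [0.7, 0.3]),
    (date(2001, 2, 3), [0.2, 0.8]),
]
shares = monthly_topic_share(articles, num_topics=2)
for (year, month), mix in shares.items():
    print(f"{year}-{month:02d}: {[round(x, 2) for x in mix]}")
```

Plotting one topic's monthly average as a line would show when that topic peaked and how its coverage changed from year to year.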


Copyright © 2006 KDnuggets.