KDnuggets : News : 2007 : n07 : item12

Courses


Subject: Webcast on-demand: Towards Web-Scale Information Extraction

Available at www.kdd.org/webcasts.php. Presented March 22, 2007.

By Eugene Agichtein, Assistant Professor, Mathematics & Computer Science, Emory University

Description: Overview of techniques for scaling information extraction to the Web.

Abstract

Data mining applications over text require efficient methods for extracting and structuring the information embedded in millions, or billions, of text documents. This presentation reviews current research on enabling information extraction to operate at Web scale. The dimensions of scalability include corpus size, heterogeneity of the information sources, access to the documents, and the diversity of the extraction domains. This presentation will focus on the first three dimensions. First, I will briefly review common information extraction tasks such as entity, relation, and event extraction, indicating the main scalability bottlenecks associated with each task. I will then review the key algorithmic approaches to improving the efficiency of information extraction, which include applications of randomized algorithms, ideas adapted from information retrieval, and recently developed specialized indexing techniques. I hope that data mining, database, and knowledge management researchers and developers can build on these general ideas to develop more effective tools to manage and discover information in text.

Biography

Eugene Agichtein is an Assistant Professor in the Mathematics & Computer Science Department at Emory University. Previously, Eugene was a Postdoctoral Researcher in the Text Mining, Search, and Navigation group at Microsoft Research, working on data mining for information retrieval. He received a Ph.D. in Computer Science from Columbia University in 2005, and a B.S. in Engineering from The Cooper Union in 1998. Eugene has co-authored several publications on scalable and efficient information extraction, including papers that received the best student paper award at the IEEE ICDE 2003 conference and the best paper award at the SIGMOD 2006 conference.



Copyright © 2007 KDnuggets.