KDnuggets News 07:05, item 3, Features

KDnuggets : News : 2007 : n05 : item3

Features

Subject: KDD Webcast, Mar 22: Towards Web-Scale Information Extraction

Thursday, March 22, 2007, 12:00 pm EDT (9:00 am PT, 16:00 GMT)

Duration: 1 hour

Towards Web-Scale Information Extraction

Eugene Agichtein
http://www.mathcs.emory.edu/~eugene/
Mathematics & Computer Science, Emory University

ABSTRACT: Data mining applications over text require efficient methods for extracting and structuring the information embedded in millions, or even billions, of text documents. This presentation reviews current research on enabling information extraction to operate at Web scale. The dimensions of scalability include corpus size, heterogeneity of the information sources, access to the documents, and diversity of the extraction domains; this presentation will focus on the first three. First, I will briefly review common information extraction tasks, such as entity, relation, and event extraction, and indicate the main scalability bottlenecks associated with each task. I will then review the key algorithmic approaches to improving the efficiency of information extraction, which include applications of randomized algorithms, ideas adapted from information retrieval, and recently developed specialized indexing techniques. I hope that researchers and developers in data mining, databases, and knowledge management can build on these general ideas to develop more effective tools for managing and discovering information in text.

BIOGRAPHY:
Eugene Agichtein is an Assistant Professor in the Mathematics & Computer Science Department at Emory University. Previously, he was a Postdoctoral Researcher in the Text Mining, Search, and Navigation group at Microsoft Research, working on data mining for information retrieval. He received a Ph.D. in Computer Science from Columbia University in 2005 and a B.S. in Engineering from The Cooper Union in 1998. He has co-authored several publications on scalable and efficient information extraction, including papers that received the best student paper award at the IEEE ICDE 2003 conference and the best paper award at the ACM SIGMOD 2006 conference.

https://kdd.webex.com/kdd/onstage/g.php?t=a&d=710420324
