| KDnuggets : News : 2007 : n05 : item3 | |
Subject: KDD Webcast, Mar 22: Towards Web-Scale Information Extraction
Thursday, March 22, 2007, 12:00 pm EDT, 9 am PT, 16:00 GMT
Duration: 1 hour
ABSTRACT: Data mining applications over text require efficient methods for extracting and structuring the information embedded in millions, or billions, of text documents. This presentation reviews current research on enabling information extraction to operate at Web scale. The dimensions of scalability include corpus size, heterogeneity of the information sources, access to the documents, and the diversity of the extraction domains. This presentation will focus on the first three dimensions. First, I will briefly review common information extraction tasks such as entity, relation, and event extraction, indicating the main scalability bottlenecks associated with each task. I will then review the key algorithmic approaches to improving the efficiency of information extraction, which include applications of randomized algorithms, ideas adapted from information retrieval, and recently developed specialized indexing techniques. I hope that data mining, database, and knowledge management researchers and developers can build on these general ideas to develop more effective tools to manage and discover information in text.
BIOGRAPHY:
Register at
Copyright © 2007 KDnuggets.