KDnuggets : News : 2003 : n22 : item14 < PREVIOUS | NEXT >

Publications

From: Martin Rajman
Date: 20 Nov 2003
Subject: European Network of Excellence Text Mining Study

In the framework of the European Network of Excellence NEMIS (http://nemis.cti.gr/), the Swiss Federal Institute of Technology in Lausanne (EPFL) has produced a study on the State of the Art, Evaluation and Recommendations regarding Document Processing and Visualization Techniques in the domain of Text Mining.

As a continuously growing amount of textual resources is available to data analysts, one of the important goals of the study is to give a detailed overview of the main concepts, tools, and problems that researchers and practitioners face in the field of Text Mining, especially in cases where document processing and visualization are involved.

The next step for this work is to circulate the study within the relevant community in order to improve the exhaustiveness and representativeness of the document by integrating external feedback, comments, and suggestions.

Here is the current version of the study (PDF).

You can also find the abstract of the study at the end of this mail.

Any comments are welcome, be it missing references, disputable statements, or missing research topics. The simplest way to send us comments is to reply to this mail or to use the following electronic mail address: gcc@epfl.ch.

The gathered feedback will be used to produce an updated version of the study, which will correspond to one of the deliverables of the NEMIS [...] the market and the potential users.

We would also greatly appreciate it if you could provide us with references to other researchers you think we should contact.

Looking forward to your feedback,

best regards from Lausanne,

Prof. Martin Rajman

---

Abstract:

Report of WG1 - State of the Art, Evaluation and Recommendations regarding "Document Processing and Visualization Techniques".

Several Networks of Excellence have been set up in the framework of the European FP5 research program. Among these Networks of Excellence, the NEMIS project focuses on the field of Text Mining.

Within this field, document processing and visualization was identified as one of the key topics, and the WG1 working group was created in the NEMIS project to carry out a detailed survey of techniques associated with the text mining process and to identify the relevant research topics in related research areas.

In this document we present the results of this comprehensive survey. The report includes a description of the current state of the art and practice, a roadmap for follow-up research in the identified areas, and recommendations for anticipated technological development in the domain of text mining. In the part dedicated to document processing, the discussion focuses on research topics in natural language processing and information retrieval. More precisely, the work covers the tasks related to data selection, filtering and cleaning, morphological normalization and parsing, document representation and similarity computation, and various aspects of data analysis that have all been developed and successfully used in data mining.

In the part dedicated to visualization, the study essentially focuses on the issue of high dimensionality in document representation. Indeed, the high-dimensional representations produced in the various stages of the text mining process are usually not well suited to a simple and easily exploitable presentation of text mining results, which require specific interpretation techniques tightly connected to the task of document summarization. In addition, the study has identified a clear need for the development of a unified methodology in the field of visualization.



Copyright © 2003 KDnuggets.