KDnuggets : News : 2009 : n04 : item5

Features

From: Seth Grimes
Date: Tue, 24 Feb 2009
Subject: KDnuggets Exclusive: March Text Analytics New and Noteworthy from Seth Grimes

I am pleased to introduce a new monthly column on Text Analytics from Seth Grimes, a leading consultant and expert on this topic. Gregory Piatetsky-Shapiro

Welcome to the first monthly text-analytics update, special to KDnuggets. This and subsequent updates will cover software, market trends, conferences, and whatever else is new that will help KDnuggets readers better understand advances in Knowledge Discovery in Text.

Software news

The Nature Publishing Group announced on January 27 that it is no longer actively pursuing the Open Text Mining Interface (OTMI), which had aimed to enable scholarly publishers, among others, to disclose their full text for indexing and text-mining purposes. Timo Hannay, publishing director at Nature.com, says, "If interest returns then we're open to picking up OTMI again. And if anyone else should want to take it forward then we would be delighted, though I haven't yet heard of anyone wanting to do that." Send inquiries to otmi@nature.com. (http://opentextmining.org)

NLTK 0.9.8 has been released, an updated version of the Python open-source Natural Language Toolkit with "a new off-the-shelf tokenizer, POS tagger, and named-entity tagger. A new metrics package includes inter-annotator agreement scores and various distance and word association measures. There's a new collocations package. There are many improvements to the WordNet package and browser and to the semantics and inference packages. The NLTK corpus collection now includes the PE08 Parser Evaluation data, and the CoNLL 2007 Basque and Catalan Dependency Treebanks. We have added an interface for dependency treebanks. Many chapters of the book have been revised in response to feedback from readers. For full details see the ChangeLog." (http://www.nltk.org/)
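For readers unfamiliar with inter-annotator agreement scores, here is a minimal sketch of one common measure, Cohen's kappa, written in plain Python. It is not NLTK's metrics API; the function name cohen_kappa and the sample tag lists are invented for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; not part of NLTK.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented part-of-speech tags from two hypothetical annotators.
annotator_1 = ["NN", "VB", "NN", "JJ", "NN", "VB"]
annotator_2 = ["NN", "VB", "JJ", "JJ", "NN", "NN"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A kappa of 1 means perfect agreement and a value near 0 means agreement no better than chance, which is why such scores are used to judge how reliable a hand-annotated corpus is.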

SAS has released SAS Content Categorization, based on text technologies from Teragram, which SAS acquired in 2008. According to SAS, the software applies natural language processing and advanced linguistic techniques to automatically categorize large volumes of multilingual content that is acquired, generated, or exists in a repository. It correctly parses and analyzes content for entities and events, which are then used to create metadata and trigger business processes. (http://www.sas.com/technologies/analytics/contentcategorization/index.html)

Company news

Infonic, a UK text-analytics and document-management software publisher, merged with US sentiment-analysis specialist Lexalytics on December 1 and was subsequently declared insolvent on February 3. VC firm Lake House Capital bought Infonic out of administration on February 10. Lexalytics has continued operating independently, without disruption, in the interim. (Visit http://intelligententerprise.com/blog/archives/2009/02/infonic_reloade.html for a fuller examination of the story.)

Conferences

The fifth annual Text Analytics Summit has been announced for June 1-2, 2009, in Boston. I will reprise my role as chair of the summit, which is targeted to practitioners, users, solution providers, researchers, and industry observers. (http://www.textanalyticsnews.com/usa/)

SIGIR 2009, the annual conference of the Association for Computing Machinery's Special Interest Group on Information Retrieval, will convene July 19-23, 2009, in Boston. SIGIR focuses on all aspects of information storage, retrieval and dissemination, including research strategies, output schemes and system evaluations. The conference's Industry Track aims to bridge the gap between research and practice across a broad spectrum of topics in information retrieval. (http://www.sigir2009.org/)

The sixth International Workshop on Text-based Information Retrieval (TIR) will be held in conjunction with DEXA 2009 in Linz, Austria, August 31-September 4, 2009. The call for papers is open. (http://beamtenherrschaft.blogspot.com/2009/02/6th-international-workshop-on-text.html)

The 2009 NooJ conference and workshop is slated for June 8-10, 2009, in Tozeur, Tunisia. NooJ is a freeware linguistic-engineering development environment used to formalize various types of textual phenomena using a large gamut of computational devices, from Finite-State Automata to Augmented Recursive Transition Networks. Paper abstracts may be submitted through March 15. (http://www.miracl.rnu.tn/nooj/)


Copyright © 2009 KDnuggets.