KDnuggets : News : 2003 : n17 : item14

Software

From: Normand Peladeau
Date: 8 Sep 2003
Subject: WordNet 2.0 Based Categorization Dictionary

Provalis Research is pleased to announce the release of four WordStat categorization dictionaries based on the latest WordNet 2.0 lexical database. These dictionaries categorize nouns, verbs, adjectives and adverbs into 44 syntactic categories and logical groupings. Four versions are currently available:

  1. 174,268 words and phrases - including overlaps.
  2. 126,869 unambiguous words and phrases.
  3. 109,231 words (no phrases) - including overlaps.
  4. 65,425 unambiguous words (no phrases).

Overlaps are words or phrases that appear in more than one WordNet lexical category, while unambiguous entries are words or phrases that appear in only one category.
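
To make the distinction concrete, here is a minimal Python sketch of how the four variants could be derived from a list of (entry, category) pairs. It is only an illustration: the sample entries, the function names, and the rule that any entry containing whitespace counts as a phrase are assumptions made for this example, not the WordStat dictionary format or the procedure Provalis Research used.

```python
from collections import defaultdict

# Hypothetical toy data: (entry, category) pairs. The category names imitate
# WordNet lexical category names, but the list itself is invented.
ENTRIES = [
    ("bank", "noun.group"),
    ("bank", "noun.object"),
    ("run", "verb.motion"),
    ("run", "noun.act"),
    ("quickly", "adv.all"),
    ("ice cream", "noun.food"),
]


def group_by_entry(pairs):
    """Map each entry to the set of categories it appears in."""
    categories = defaultdict(set)
    for entry, category in pairs:
        categories[entry].add(category)
    return dict(categories)


def is_phrase(entry):
    """Assumption: an entry made of several whitespace-separated tokens is a phrase."""
    return len(entry.split()) > 1


def build_variant(pairs, keep_overlaps, keep_phrases):
    """Return {entry: categories}, optionally dropping overlaps and/or phrases."""
    result = {}
    for entry, cats in group_by_entry(pairs).items():
        if not keep_phrases and is_phrase(entry):
            continue  # "no phrases" variants skip multi-word entries
        if not keep_overlaps and len(cats) > 1:
            continue  # "unambiguous" variants skip entries with several categories
        result[entry] = cats
    return result


if __name__ == "__main__":
    for overlaps in (True, False):
        for phrases in (True, False):
            variant = build_variant(ENTRIES, overlaps, phrases)
            print(f"overlaps={overlaps}, phrases={phrases}: {sorted(variant)}")
```

In this toy data, "bank" and "run" are overlaps because each is listed under two categories, so they are dropped from the unambiguous variants, and "ice cream" is dropped from the variants without phrases.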

We strongly recommend using these dictionaries with the latest release of WordStat, v4.05, since important speed optimizations have been made to support such large categorization dictionaries.

ROGET THESAURUS BASED CATEGORIZATION DICTIONARY

A WordStat categorization dictionary based on the well-known Roget thesaurus can now be downloaded from our web site. This dictionary categorizes 100,685 words and phrases into 1,042 categories (6 broad classes).

Please note that this dictionary was created using the 1911 edition of the thesaurus, so many modern words are missing. For this reason, it may not be entirely suitable for use in natural-language processing tasks. It may nevertheless be useful for experimentation as well as for the development of more comprehensive categorization systems.

You can access these dictionaries from the WordStat web page:

www.simstat.com/wordstat.htm

From there, follow the Dictionaries link.

Normand Peladeau
Provalis Research
www.simstat.com



Copyright © 2003 KDnuggets.