KDnuggets : News : 2004 : n15 : item18

Software

From: Ke Wang
Date: 7 Aug 2004
Subject: Document clustering software - FIHC 1.0 (Free)

A new document clustering package, FIHC 1.0, is available to the academic and research community at http://www.cs.sfu.ca/~ddm. The package includes executable code, source code, and sample data.

FIHC (Frequent Itemset-based Hierarchical Clustering) is a program that constructs a hierarchy of document clusters from a set of unlabeled documents, based on "frequent itemsets": sets of words that occur together in at least a minimum fraction of the documents. As an abstraction of "English sentences", frequent itemsets serve as a natural measure of the cohesiveness of a cluster: documents in the same cluster are expected to share more itemsets than documents in different clusters. FIHC writes the cluster hierarchy to an XML file that can be browsed interactively using the cluster descriptions, which are themselves frequent itemsets.
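To make the cohesiveness idea concrete, here is a minimal Python sketch. It is not FIHC's code or algorithm; it only illustrates, on invented toy data, how documents on the same topic tend to share more frequent itemsets than documents on different topics. The function names, the brute-force enumeration, the sample documents, and the 0.5 support threshold are all assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support, max_size=2):
    """Map each word set found in at least `min_support` (a fraction)
    of the documents to its support. Brute force; toy data only."""
    n = len(docs)
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            items = frozenset(combo)
            support = sum(1 for d in docs if items <= d) / n
            if support >= min_support:
                result[items] = support
    return result

def shared_itemsets(doc_a, doc_b, itemsets):
    """Count the frequent itemsets contained in both documents."""
    return sum(1 for s in itemsets if s <= doc_a and s <= doc_b)

# Hypothetical documents, each reduced to its set of words.
docs = [
    {"apple", "fruit", "juice"},
    {"apple", "fruit", "pie"},
    {"engine", "car", "wheel"},
    {"engine", "car", "road"},
]

itemsets = frequent_itemsets(docs, min_support=0.5)
same_topic = shared_itemsets(docs[0], docs[1], itemsets)
cross_topic = shared_itemsets(docs[0], docs[2], itemsets)
print(same_topic, cross_topic)
```

In this toy data the two fruit documents share the itemsets {apple}, {fruit}, and {apple, fruit}, while a fruit document and a car document share none. A real system would use a proper frequent itemset miner rather than enumerating every word combination.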



Copyright © 2004 KDnuggets.