Date: Jun 8, 2010
I have manually annotated all 139 abstracts from the ACM KDD'09 proceedings. Every data mining concept mention in these abstracts is linked to a data mining ontology. The resulting corpus could make a good benchmark dataset for text mining tasks.
The following paper summarizes both the corpus and the ontology:
www.gabormelli.com/RKB/2010_ConceptMentionsWithinKDD2009Abs
www.lrec-conf.org/proceedings/lrec2010/pdf/889_Paper.pdf
The ICDM'09 abstracts are currently being annotated, and the data mining ontology is being updated.
Contact gmelli AT gabormelli DOT com if you are interested in participating in this knowledge discovery benchmark effort.