LSHTC4: Large Scale Hierarchical Text Classification
We are pleased to announce the fourth edition of the LSHTC challenge. This year's challenge comprises three tracks and is based on two large datasets created from the ODP web directory (DMOZ) and Wikipedia. The datasets are multi-class, multi-label and hierarchical. The number of categories ranges between 13,000 and 325,000 and the number of documents between 380,000 and 2,400,000.
The tracks of the challenge are organized as follows:
1. Very Large Scale Supervised Learning on a large collection from Wikipedia
2. Multi-task learning, based on both DMOZ and Wikipedia category systems
3. Refinement-learning on a subset of the DMOZ category system
To register for the challenge and gain access to the datasets, you must have an account at the challenge website. Please consult lshtc.iit.demokritos.gr/ for more information on this challenge.
Important dates:
- July 17, start of the challenge
- August 5, opening of the evaluation
- January 15, closing of the evaluation
Associated workshop at WSDM 2014 (February 2014, www.wsdm-conference.org/2014/):
The results of the challenge will be presented in a workshop at WSDM 2014 on Web-Scale Classification: Classifying Big Data from the Web.