This year's discovery challenge hosts the third edition of the successful PASCAL challenges on large-scale hierarchical text classification. The challenge comprises three tracks and is based on two large datasets created from the ODP web directory (DMOZ) and Wikipedia. The datasets are multi-class, multi-label and hierarchical. The number of categories ranges from roughly 13,000 to 325,000, and the number of documents from 380,000 to 2,400,000.
The tracks of the challenge are organized as follows:
1. Standard large-scale hierarchical classification
a) On a medium-sized collection from Wikipedia
b) On a large collection from Wikipedia
2. Multi-task learning, based on both DMOZ and Wikipedia category systems
a) Semi-supervised approach
b) Unsupervised approach
To register for the challenge and gain access to the datasets, you must have an account on the challenge website.
- March 30, start of the challenge
- April 20, opening of the evaluation
- June 29, closing of the evaluation
- July 20, paper submission deadline
- August 3, paper notifications