Design an algorithm for multi-label classification of scientific publications in biomedicine. There are 20,000 samples made available for analysis, each comprising 25,640 attributes.
The data mining competition
Topical Classification of Biomedical Research Papers, a special
event of the Joint Rough Sets Symposium (JRS 2012), has just
started. The task is to design the most accurate algorithm for
multi-label classification of scientific publications in
biomedicine. There are 20,000 samples made available for analysis,
each comprising 25,640 attributes. Cash prizes worth $1,500 will be
awarded to the most successful teams. The contest is organized by a
research team from the University of Warsaw, Poland, and sponsored by
Southwest Jiaotong University, China. It is hosted on the
TunedIT Challenges platform.
The development of freely available biomedical databases allows users
to search for documents containing highly specialized biomedical
knowledge. The rapidly increasing size of metadata and text
repositories, such as MEDLINE or PubMed Central, underscores the
growing need for accurate and scalable methods for automatic tagging
and classification of textual data. For example, medical doctors
often search through biomedical documents for information regarding
diagnostics, drug dosage, or possible complications resulting from
specific treatments. In their queries, they use highly sophisticated
terminology that can be properly interpreted only with the use of a
domain ontology, such as Medical Subject Headings (MeSH). In order
to facilitate the search process, documents in a database should be
indexed with concepts from the ontology. Additionally, search
results could be grouped into clusters of documents that correspond
to meaningful topics matching different information needs. Such
clusters need not be disjoint, since one document may
contain information related to several topics. In the JRS12
Competition, we address both problems, i.e., we are interested in
identifying efficient algorithms for topical classification of
biomedical research papers based on information about concepts from
the MeSH ontology that were automatically assigned by our tagging
algorithm.
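To make the task setting concrete, here is a minimal sketch of one
common baseline for multi-label classification, binary relevance: one
independent binary classifier is trained per topic, so a document can
receive several topics at once. This is only an illustration under
stated assumptions, not the organizers' method or a recommended
solution. The perceptron learner, the function names, the feature
names ("heart", "drug", "gene"), and the topic names are all
hypothetical; the actual competition data consists of 25,640
attributes per sample.

```python
def score(weights, bias, doc):
    """Linear score of a sparse document (dict: feature -> value)."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in doc.items())


def train_binary_relevance(docs, label_sets, epochs=10):
    """Train one perceptron per label; each label is learned independently."""
    labels = sorted(set().union(*label_sets))
    weights = {lab: {} for lab in labels}
    bias = {lab: 0.0 for lab in labels}
    for _ in range(epochs):
        for doc, doc_labels in zip(docs, label_sets):
            for lab in labels:
                target = 1 if lab in doc_labels else -1
                # Standard perceptron rule: update only on a mistake.
                if target * score(weights[lab], bias[lab], doc) <= 0:
                    for f, v in doc.items():
                        weights[lab][f] = weights[lab].get(f, 0.0) + target * v
                    bias[lab] += target
    return weights, bias


def predict(model, doc):
    """Return the set of labels whose classifier gives a positive score."""
    weights, bias = model
    return {lab for lab in weights if score(weights[lab], bias[lab], doc) > 0}


# Toy data (hypothetical): sparse concept weights and overlapping topic sets.
docs = [
    {"heart": 1.0, "drug": 1.0},
    {"heart": 1.0},
    {"drug": 1.0},
    {"gene": 1.0},
]
label_sets = [
    {"cardiology", "pharmacology"},
    {"cardiology"},
    {"pharmacology"},
    {"genetics"},
]

model = train_binary_relevance(docs, label_sets)
print(sorted(predict(model, {"heart": 1.0, "drug": 1.0})))
```

Binary relevance ignores correlations between topics, which is why it
is usually treated as a starting point rather than a final solution.
With tens of thousands of attributes, a sparse representation such as
the dictionaries used here keeps each update proportional to the
number of non-zero features in a document.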
This challenge may appeal to all data mining practitioners due
to its strong connections with well-established subjects: generalized
decision rule induction, feature extraction, soft and rough
computing, semantic text mining, and scalable classification methods.
Apart from cash prizes for the top teams, authors of selected
solutions will be invited to prepare papers for presentation at the JRS
2012 special session devoted to the competition and for inclusion in
the conference proceedings.
Important dates:
- Jan 2, 2012: start of the challenge, data sets become available
- Mar 30, 2012: deadline for submitting predictions
- Apr 2, 2012: deadline for submitting reports
- Apr 6, 2012: publication of final results
- May 10, 2012: deadline for submitting camera-ready papers for JRS
Competition web page: http://tunedit.org/challenge/JRS12Contest
Marcin Wojnarski, TunedIT