CHEMDNER competition: Chemical and drug name recognition task in patents
Want to test your data science skills? Get ready for the next text mining and information retrieval challenge by CHEMDNER.
The CHEMDNER-patents task (BioCreative V, http://www.biocreative.org) is a community challenge on the named entity recognition of chemical compounds in patents and on patent text classification.
- Martin Krallinger, Spanish National Cancer Research Centre (CNIO)
- Florian Leitner, Universidad Politécnica de Madrid
- Obdulia Rabal, Center for Applied Medical Research (CIMA), University of Navarra
- Julen Oyarzabal, Center for Applied Medical Research (CIMA), University of Navarra
- Alfonso Valencia, Spanish National Cancer Research Centre (CNIO)
Registration and participation
Teams interested in the CHEMDNER-patents task should register for track 2 of BioCreative V:
This task will address the automatic extraction of chemical and biological data from medicinal chemistry patents. Identifying and integrating all the information contained in these patents (e.g., chemical structures, their synthesis and associated biological data) is currently a very hard task, not only for database curators but also for life sciences researchers and biomedical text mining experts. Although patents contain valuable characterizations of biomedically relevant entities such as chemical compounds, genes and proteins, academic research on text mining and information extraction from patent data has been minimal. Pharmaceutical patents covering chemical compounds provide information on their therapeutic applications and, in most cases, on their primary biological targets.
The task will cover three essential steps for the identification of biomedically relevant descriptions of chemical compounds:
- CEMP (chemical entity mention in patents): the detection of chemical named entity mentions in patents, reported as the start and end indices of every chemical entity mention.
- CPD (chemical passage detection, text classification task): the detection of patent titles and abstracts that mention chemical compounds.
- GPRO (gene and protein related object task): the identification of mentions of gene and protein related objects (GPROs) in patent titles and abstracts.
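To make the span-based CEMP output and its relation to CPD more concrete, here is a minimal Python sketch. It assumes each prediction is a record of (document id, section, start, end) with an exclusive end offset, and it derives a CPD-style label simply by checking whether any chemical mention was found. The `Mention` class, the "T"/"A" section labels, the `PATENT-1` identifier and the naive dictionary lookup are all hypothetical illustrations, not the official CHEMDNER submission format or a competitive recognizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical CEMP-style prediction: one character span in a patent text."""
    doc_id: str   # hypothetical patent identifier
    section: str  # assumed labels: "T" = title, "A" = abstract
    start: int    # start character offset (inclusive)
    end: int      # end character offset (exclusive)


def mention_text(text: str, m: Mention) -> str:
    """Recover the surface string covered by a mention's offsets."""
    if not (0 <= m.start < m.end <= len(text)):
        raise ValueError(f"offsets {m.start}-{m.end} out of range")
    return text[m.start:m.end]


def find_mentions(doc_id: str, section: str, text: str, names: list[str]) -> list[Mention]:
    """Naive dictionary lookup: record offsets of every occurrence of each name."""
    mentions = []
    for name in names:
        if not name:  # skip empty names, which would yield zero-width spans
            continue
        start = text.find(name)
        while start != -1:
            mentions.append(Mention(doc_id, section, start, start + len(name)))
            start = text.find(name, start + 1)
    # Report mentions in reading order.
    return sorted(mentions, key=lambda m: (m.start, m.end))


def has_chemical(mentions: list[Mention]) -> int:
    """Simplified CPD-style label (assumption): 1 if any chemical mention was found."""
    return int(len(mentions) > 0)


title = "Use of aspirin and ibuprofen for the treatment of pain"
found = find_mentions("PATENT-1", "T", title, ["aspirin", "ibuprofen"])
for m in found:
    print(m.start, m.end, mention_text(title, m))
print("CPD label:", has_chemical(found))
```

Running it prints `7 14 aspirin`, `19 28 ibuprofen` and `CPD label: 1`. Plain substring matching also fires inside longer tokens, so it only serves to show the offset representation; actual systems rely on tokenization, chemical name grammars or trained sequence models.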
CHEMDNER session at the BioCreative V workshop
At the BioCreative V Workshop, to be held in Seville, Spain, on September 9-11, 2015, there will be a session devoted to the CHEMDNER-patents task. This session will include an overview talk presenting the datasets used and the results obtained by the participating teams.
Previous CHEMDNER task (BioCreative IV)
The CHEMDNER BioCreative IV special issue was published in the Journal of Cheminformatics as Volume 7 Supplement 1, 'Text mining for chemistry and the CHEMDNER track'. That task focused on the detection of chemical entities in PubMed abstracts. The entire supplement is available from the Journal of Cheminformatics: http://www.jcheminf.com/supplements/7/S1
- Krallinger, M., Leitner, F., Rabal, O., Vazquez, M., Oyarzabal, J., & Valencia, A. CHEMDNER: The drugs and chemical names extraction challenge. Journal of Cheminformatics 2015, 7(Suppl 1):S1
- Krallinger, M. et al. The CHEMDNER corpus of chemicals and drugs and its annotation principles. Journal of Cheminformatics 2015, 7(Suppl 1):S2