CheNER: a tool for the identification of chemical entities and their classes in biomedical literature

dc.contributor.author
Solsona Tehàs, Francesc
dc.contributor.author
Alves, Rui
dc.contributor.author
Usié Chimenos, Anabel
dc.contributor.author
Cruz, Joaquim
dc.contributor.author
Comas, Jorge
dc.date.accessioned
2024-12-05T22:21:42Z
dc.date.available
2024-12-05T22:21:42Z
dc.date.issued
2015-11-02T12:04:09Z
dc.date.issued
2015
dc.identifier
https://doi.org/10.1186/1758-2946-7-S1-S15
dc.identifier
1758-2946
dc.identifier
http://hdl.handle.net/10459.1/48890
dc.identifier.uri
http://hdl.handle.net/10459.1/48890
dc.description.abstract
Background: Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Automatically identifying such molecules in biomedical text is difficult because of both the diverse morphology of chemical names and the alternative types of nomenclature that are used simultaneously to describe them. To address these issues, the last BioCreAtIvE challenge proposed the CHEMDNER task, a Named Entity Recognition (NER) challenge that aims at labelling different types of chemical names in biomedical text. Methods: To address this challenge, we tested various approaches to recognizing chemical entities in biomedical documents. These approaches range from linear Conditional Random Fields (CRFs) to a combination of CRFs with regular expression and dictionary matching, followed by a post-processing step, to tag chemical names in a corpus of Medline abstracts. We named our best-performing systems CheNER. Results: We evaluate the performance of the various approaches using the F-score statistic. Higher F-scores indicate better performance. The highest F-score we obtain in identifying unique chemical entities is 72.88%. The highest F-score we obtain in identifying all chemical entities is 73.07%. We also evaluate the F-score of combining our system with ChemSpot, and find an increase from 72.88% to 73.83%. Conclusions: CheNER presents a valid alternative for automated annotation of chemical entities in biomedical documents. In addition, CheNER may be used to derive new features to train newer methods for tagging chemical entities. CheNER can be downloaded from http://metres.udl.cat and included in text annotation pipelines.
dc.description.abstract
Funding for publication of this article comes from grants BFU2010-17704 and TIN2011-28689-C02-02 from the Spanish Ministry of Economy and Competitiveness.
dc.language
eng
dc.publisher
BioMed Central
dc.relation
info:eu-repo/grantAgreement/MICINN//BFU2010-17704/ES/METRES (METABOLIC RECONSTRUCTION SERVER DESARROLLO Y APLICACION EN EN ESTUDIO DE PRINCIPIOS DE DISEÑO BIOLOGICO/
dc.relation
info:eu-repo/grantAgreement/MICINN//TIN2011-28689-C02-02/ES/EJECUCION EFICIENTE DE APLICACIONES MULTIDISCIPLINARES: NUEVOS DESAFIOS EN LA ERA MULTI%2FMANY CORE/
dc.relation
Reproduction of the document published at: https://doi.org/10.1186/1758-2946-7-S1-S15
dc.relation
Journal of Cheminformatics, 2015, vol. 7 (Suppl 1): S15, p. 1-8.
dc.rights
cc-by (c) Usié Chimenos, Anabel et al., 2015
dc.rights
info:eu-repo/semantics/openAccess
dc.rights
http://creativecommons.org/licenses/by/3.0/es/
dc.title
CheNER: a tool for the identification of chemical entities and their classes in biomedical literature
dc.type
article
dc.type
publishedVersion
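
Note on the F-score reported in the abstract: the F-score is the harmonic mean of precision and recall. The sketch below is only an illustration of how such a score can be computed for entity annotations. It assumes exact matching on (document id, start offset, end offset) tuples, which may differ from the evaluation protocol actually used for CheNER. The function name `f_score` and the sample annotations are hypothetical and are not taken from the article.

```python
def f_score(predicted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of annotations.

    Assumption: an annotation counts as correct only if it matches a gold
    annotation exactly (same document id and same character offsets).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical annotations: (document id, start offset, end offset).
gold = {("d1", 0, 7), ("d1", 20, 27), ("d2", 5, 12), ("d2", 30, 41)}
predicted = {("d1", 0, 7), ("d2", 5, 12), ("d2", 50, 58)}

precision, recall, f1 = f_score(predicted, gold)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

With these made-up annotations, two of the three predictions match the four gold entities, so precision is 2/3, recall is 1/2, and the F-score is 4/7.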

