ChemEx: information extraction system for chemical data curation
Metadata
Document Title
ChemEx: information extraction system for chemical data curation
Author
Tharatipyakul A, Numnark S, Wichadakul D, Ingsriswang S
Affiliations
National Science & Technology Development Agency - Thailand; National Center for Genetic Engineering & Biotechnology (BIOTEC)
Type
Article; Proceedings Paper
Source Title
BMC BIOINFORMATICS
ISSN
1471-2105
Year
2012
Volume
13
Open Access
Green Published, Gold
Publisher
BIOMED CENTRAL LTD
DOI
10.1186/1471-2105-13-S17-S9
Format
Abstract
Background: Manual curation of chemical data from publications is error-prone and time-consuming, and the resulting data sets are hard to keep up to date. Automatic information extraction can reduce these problems. Since chemical structures are usually depicted in images, information extraction needs to combine structure image recognition with text mining. Results: We have developed ChemEx, a chemical information extraction system that processes both text and images in publications. The text annotator extracts compound, organism, and assay entities from text content, while structure image recognition translates raster images of chemical structures into a machine-readable format. A user can view annotated text along with summarized information on compounds, the organisms that produce those compounds, and assay tests. Conclusions: ChemEx facilitates and speeds up chemical data curation by extracting compounds, organisms, and assays from a large collection of publications. The software and corpus can be downloaded from http://www.biotec.or.th/isl/ChemEx.
License
CC BY
Rights
Authors
Publication Source
WOS