A method for automatically extracting infectious disease-related primers and probes from the literature
- Source :
- BMC Bioinformatics, Vol 11, Iss 1, p 410 (2010); Repisalud, Instituto de Salud Carlos III (ISCIII)
- Publication Year :
- 2010
- Publisher :
- BMC, 2010.
Abstract
- BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences, so it is becoming increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problematic sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. RESULTS: We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision and recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences that were assigned a correct organism name. CONCLUSIONS: We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch.
The present work has been funded, in part, by the European Commission through the ACGT integrated project (FP6-2005-IST-026996) and the ACTION-Grid support action (FP7-ICT-2007-2-224176), the Spanish Ministry of Science and Innovation through the OntoMineBase project (ref. TSI2006-13021-C02-01), the ImGraSec project (ref. TIN2007-61768), FIS/AES PS09/00069 and COMBIOMED-RETICS, and the Comunidad de Madrid, Spain.
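Phase (2) of the abstract, detecting candidate sequences with finite state machine-based recognizers, can be illustrated with a minimal sketch. This is not the authors' recognizer: the three states, the symbol set (the four bases plus IUPAC degenerate codes), the `MIN_LEN` cut-off, and the function name `find_candidates` are all assumptions made for illustration, and the sample sequence is an arbitrary string rather than one taken from the paper.

```python
from enum import Enum, auto

# Assumption: plain bases plus common IUPAC degenerate codes count as sequence symbols.
NUCLEOTIDES = frozenset("ACGTURYKMSWBDHVN")
MIN_LEN = 15  # hypothetical minimum primer/probe length


class State(Enum):
    BOUNDARY = auto()  # between tokens; a sequence may start here
    IN_SEQ = auto()    # reading a run of nucleotide symbols
    SKIP = auto()      # inside an ordinary word; wait for the next boundary


def find_candidates(text, min_len=MIN_LEN):
    """Return (start, sequence) pairs for nucleotide runs of at least min_len symbols."""
    candidates = []
    state = State.BOUNDARY
    start = 0
    # The trailing space is a sentinel so a run at the very end of the text is flushed.
    for i, ch in enumerate(text + " "):
        if state is State.BOUNDARY:
            if ch in NUCLEOTIDES:
                state, start = State.IN_SEQ, i
            elif ch.isalnum():
                state = State.SKIP
        elif state is State.IN_SEQ:
            if ch in NUCLEOTIDES:
                continue
            if ch.isalnum():
                # Any other letter or digit means the token is an ordinary word.
                state = State.SKIP
            else:
                if i - start >= min_len:
                    candidates.append((start, text[start:i]))
                state = State.BOUNDARY
        else:  # State.SKIP
            if not ch.isalnum():
                state = State.BOUNDARY
    return candidates


sample = "Forward primer 5'-GTGCCAGCAGCCGCGGTAA-3' was used with DNA."
hits = find_candidates(sample)
```

On the sample sentence, `hits` holds the single 19-symbol run between the 5' and 3' markers; short uppercase tokens such as "DNA" are rejected by the length cut-off. A real recognizer would also need to handle sequences split by spaces or line breaks, which is the kind of problem case the abstract assigns to the rule-based refinement phase.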
- Subjects :
- 0206 medical engineering
02 engineering and technology
Biology
computer.software_genre
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
DNA sequencing
03 medical and health sciences
Structural Biology
Databases, Genetic
Data Mining
Molecular Biology
lcsh:QH301-705.5
030304 developmental biology
DNA Primers
0303 health sciences
Finite-state machine
Base Sequence
business.industry
Applied Mathematics
Hybridization probe
Methodology Article
Pattern recognition
Gene Annotation
Expert system
3. Good health
Computer Science Applications
lcsh:Biology (General)
Test set
lcsh:R858-859.7
Artificial intelligence
Data mining
Primer (molecular biology)
DNA microarray
Periodicals as Topic
business
DNA Probes
computer
020602 bioinformatics
Details
- Language :
- English
- ISSN :
- 1471-2105
- Volume :
- 11
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....9c06220800e00af467d5142cd26f9e3e