Classification of nucleotide sequences using support vector machines
- Source :
- Journal of molecular evolution. 71(4)
- Publication Year :
- 2009
Abstract
- Species identification is one of the most important issues in biological studies. Owing to recent increases in the amount of available genomic information and advances in DNA sequencing technologies, the use of DNA sequences to identify species (commonly referred to as “DNA barcoding”) is being tested in many areas. Several methods have been suggested for identifying species from DNA sequences, including similarity scores, analysis of phylogenetic and population genetic information, and detection of species-specific sequence patterns. Although these methods have performed well under a range of circumstances, they also have limitations: they are subject to loss of information, require intensive computation, are sensitive to model misspecification, and can make the significance of an identification difficult to evaluate. Here, we suggest a new DNA barcoding method that adopts support vector machine (SVM) procedures. Our method is nonparametric and is thus expected to be robust across a wide range of evolutionary scenarios as well as in multilocus analyses. Furthermore, we describe bootstrap procedures that can be used to test the significance of species identifications. We implemented a novel conversion technique for transforming sequence data into real-valued vectors, so bootstrap procedures can be easily combined with our SVM approach. We present the results of simulation studies and empirical data analyses to demonstrate the performance of our method and discuss its properties.
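The pipeline the abstract outlines (sequences converted to real-valued vectors, an SVM classifier, and a bootstrap test of the identification) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-hot encoding stands in for the authors' conversion technique, which the abstract does not describe; the classifier is a plain linear SVM without a bias term, trained by subgradient descent; the bootstrap resamples alignment columns; and the function names, species labels, and toy sequences are hypothetical.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Flatten an aligned DNA sequence into a 0/1 vector, four entries per site.
    (Assumed encoding; the paper's own conversion is not given in the abstract.)"""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]

def train_svm(X, y, lam=0.01, eta=0.1, epochs=50, seed=0):
    """Linear SVM without a bias term, fitted by subgradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the regulariser
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def classify(train_seqs, labels, query):
    """Assign query to one of exactly two species present in labels."""
    species = sorted(set(labels))
    if len(species) != 2:
        raise ValueError("this sketch handles exactly two species")
    y = [1 if lab == species[1] else -1 for lab in labels]
    w = train_svm([one_hot(s) for s in train_seqs], y)
    score = sum(wj * xj for wj, xj in zip(w, one_hot(query)))
    return species[1] if score > 0 else species[0]

def resample_sites(seq, cols):
    """Rebuild a sequence from the chosen alignment columns."""
    return "".join(seq[c] for c in cols)

def bootstrap_support(train_seqs, labels, query, n_boot=100, seed=0):
    """Return the full-data call and the fraction of site-resampled
    replicates that reproduce it."""
    rng = random.Random(seed)
    n_sites = len(query)
    call = classify(train_seqs, labels, query)
    agree = 0
    for _ in range(n_boot):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        rep_train = [resample_sites(s, cols) for s in train_seqs]
        rep_call = classify(rep_train, labels, resample_sites(query, cols))
        agree += rep_call == call
    return call, agree / n_boot

# Hypothetical toy alignment: two species, three reference sequences each.
train_seqs = ["AAGAAGAA", "AAAAAGAA", "AAGAAAAA",
              "CCTCCCTC", "CCCCCCTC", "CCTCCCCC"]
labels = ["sp_A"] * 3 + ["sp_B"] * 3

call, support = bootstrap_support(train_seqs, labels, "CCTCCCTC")
print(call, support)
```

Because every sequence becomes a fixed-length numeric vector, resampling sites only changes which columns are encoded, so the same training routine can be rerun unchanged on each replicate. A multi-species version would need a one-vs-rest or pairwise scheme, which this sketch omits.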
- Subjects :
- Population
Molecular Sequence Data
Computational biology
Biology
DNA barcoding
DNA sequencing
Artificial Intelligence
Resampling
Genetics
DNA Barcoding, Taxonomic
Computer Simulation
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Phylogeny
Phylogenetic tree
Base Sequence
Nucleotides
DNA
DNA, Concatenated
Support vector machine
Identification (information)
Genetic Loci
Pattern recognition (psychology)
Algorithms
Details
- ISSN :
- 1432-1432
- Volume :
- 71
- Issue :
- 4
- Database :
- OpenAIRE
- Journal :
- Journal of molecular evolution
- Accession number :
- edsair.doi.dedup.....9d396ee721b6b7ed412cf24ecc2b20e0