Fast Kernel Methods for SVM Sequence Classifiers.

Authors :
Istrail, Sorin
Pevzner, Pavel
Waterman, Michael S.
Giancarlo, Raffaele
Hannenhalli, Sridhar
Kuksa, Pavel
Pavlovic, Vladimir
Source :
Algorithms in Bioinformatics (9783540741251); 2007, p228-239, 12p
Publication Year :
2007

Abstract

In this work we study string kernel methods for sequence analysis and focus on the problem of species-level identification based on short DNA fragments known as barcodes. We introduce efficient sorting-based algorithms for exact string k-mer kernels and then describe a divide-and-conquer technique for kernels with mismatches. Our algorithms for mismatch kernel matrix computations improve currently known time bounds for these computations. We then consider the mismatch kernel problem with feature selection, and present efficient algorithms for it. Our experimental results show that, for string kernels with mismatches, kernel matrices can be computed 100-200 times faster than traditional approaches. Kernel vector evaluations on new sequences show similar computational improvements. On several DNA barcode datasets, k-mer string kernels considerably improve identification accuracy compared to prior results. String kernels with feature selection demonstrate competitive performance with substantially fewer computations. [ABSTRACT FROM AUTHOR]
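
As an illustration of the sorting-based idea the abstract mentions for exact k-mer kernels, the sketch below computes a spectrum (exact k-mer) kernel value for two sequences by sorting their k-mers together and scanning runs of equal k-mers. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name spectrum_kernel, the use of a general comparison sort, and the count-product definition of the kernel are assumptions made for this example, and the paper's divide-and-conquer mismatch algorithm is not shown.

```python
from itertools import groupby


def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Exact k-mer (spectrum) kernel: sum over k-mers a of count_x(a) * count_y(a).

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Tag each k-mer with its source sequence (0 for x, 1 for y).
    tagged = [(x[i:i + k], 0) for i in range(len(x) - k + 1)]
    tagged += [(y[i:i + k], 1) for i in range(len(y) - k + 1)]

    # Sorting places identical k-mers next to each other.
    tagged.sort()

    total = 0
    # Scan each run of identical k-mers and count occurrences per sequence.
    for _, run in groupby(tagged, key=lambda t: t[0]):
        counts = [0, 0]
        for _, src in run:
            counts[src] += 1
        total += counts[0] * counts[1]
    return total


if __name__ == "__main__":
    print(spectrum_kernel("ACGT", "ACGA", 2))
```

Sorting avoids building an explicit feature vector over all possible k-mers; only the k-mers that actually occur in the two sequences are touched.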

Details

Language :
English
ISBNs :
9783540741251
Database :
Complementary Index
Journal :
Algorithms in Bioinformatics (9783540741251)
Publication Type :
Book
Accession number :
33290249
Full Text :
https://doi.org/10.1007/978-3-540-74126-8_22