Dinucleotide distance histograms for fast detection of rRNA in metatranscriptomic sequences

Authors :
Heiner Klingenberg and Robin Martinjak and Frank Oliver Glöckner and Rolf Daniel and Thomas Lingner and Peter Meinicke
Klingenberg, Heiner
Martinjak, Robin
Glöckner, Frank Oliver
Daniel, Rolf
Lingner, Thomas
Meinicke, Peter
Publication Year :
2013

Abstract

With the advent of metatranscriptomics, it has become possible to study the dynamics of microbial communities. The analysis of environmental RNA-Seq data poses several challenges for the development of efficient bioinformatics tools. One of the first steps in the computational analysis of metatranscriptomic sequencing reads is the separation of rRNA and mRNA fragments, which ensures that only protein-coding sequences are used in subsequent functional analysis. For this rRNA filtering task, it is desirable to have a broad spectrum of methods so that a suitable trade-off between speed and accuracy can be found for a particular dataset. We introduce a machine learning approach for the detection of rRNA in metatranscriptomic sequencing reads that combines support vector machines (SVMs) with dinucleotide distance histograms for feature representation. The results show that our SVM-based approach is at least one order of magnitude faster than any of the existing tools, with only a slight degradation in detection performance compared to state-of-the-art alignment-based methods.
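
The abstract does not define the feature representation in detail, so the following is only an illustrative sketch of what a dinucleotide distance histogram could look like. It assumes that, for each of the 16 dinucleotides, one counts how often two consecutive occurrences start d positions apart (for d up to a cutoff) and normalises the counts. The function name `dinucleotide_distance_histogram`, the `max_distance` parameter, and the normalisation are hypothetical choices, not the authors' published definition.

```python
from itertools import product

ALPHABET = "ACGT"
# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=2)]


def dinucleotide_distance_histogram(read, max_distance=8):
    """Return a flat feature vector of length 16 * max_distance.

    Assumed definition (not taken from the paper): for each dinucleotide,
    count how often two consecutive occurrences start d positions apart
    (1 <= d <= max_distance), then divide all counts by their total.
    """
    read = read.upper()
    counts = {dn: [0] * max_distance for dn in DINUCLEOTIDES}
    last_seen = {}  # start position of the most recent occurrence
    for i in range(len(read) - 1):
        dn = read[i:i + 2]
        if dn not in counts:
            continue  # skip windows containing N or other ambiguity codes
        if dn in last_seen:
            d = i - last_seen[dn]
            if d <= max_distance:
                counts[dn][d - 1] += 1
        last_seen[dn] = i
    flat = [c for dn in DINUCLEOTIDES for c in counts[dn]]
    total = sum(flat)
    # An all-zero vector is returned for reads with no repeated dinucleotide.
    return [c / total for c in flat] if total else [float(c) for c in flat]
```

A fixed-length vector of this kind could then be passed to any SVM implementation to classify a read as rRNA or non-rRNA; because it is computed in a single pass over the read without alignment, it is consistent with the speed advantage the abstract reports.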

Details

Database :
OAIster
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1358720233
Document Type :
Electronic Resource
Full Text :
https://doi.org/10.4230/OASIcs.GCB.2013.80