SpliceVec: Distributed feature representations for splice junction prediction.

Authors :
Dutta, Aparajita
Dubey, Tushar
Singh, Kusum Kumari
Anand, Ashish
Source :
Computational Biology & Chemistry. Jun2018, Vol. 74, p434-441. 8p.
Publication Year :
2018

Abstract

Identification of intron boundaries, called splice junctions, is an important part of delineating gene structure and functions. This also provides valuable insights into the role of alternative splicing in increasing functional diversity of genes. Identification of splice junctions through RNA-seq is by mapping short reads to the reference genome which is prone to errors due to random sequence matches. This encourages identification of splicing junctions through computational methods based on machine learning. Existing models are dependent on feature extraction and selection for capturing splicing signals lying in the vicinity of splice junctions. But such manually extracted features are not exhaustive. We introduce distributed feature representation, SpliceVec, to avoid explicit and biased feature extraction generally adopted for such tasks. SpliceVec is based on two widely used distributed representation models in natural language processing. Learned feature representation in form of SpliceVec is fed to multilayer perceptron for splice junction classification task. An intrinsic evaluation of SpliceVec indicates that it is able to group true and false sites distinctly. Our study on optimal context to be considered for feature extraction indicates inclusion of entire intronic sequence to be better than flanking upstream and downstream region around splice junctions. Further, SpliceVec is invariant to canonical and non-canonical splice junction detection. The proposed model is consistent in its performance even with reduced dataset and class-imbalanced dataset. SpliceVec is computationally efficient and can be trained with user-defined data as well. [ABSTRACT FROM AUTHOR]
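
As a rough illustration of the preprocessing that distributed representation models from natural language processing typically need, the sketch below turns nucleotide sequences into overlapping k-mer "words" and builds a vocabulary over them. The abstract does not name the two models or say how sequences are tokenized, so the k-mer scheme, the choice k=3, the function names kmer_tokens and build_vocab, and the toy sequences are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def kmer_tokens(sequence, k=3):
    """Split a nucleotide sequence into overlapping k-mers ("words").

    Assumption: overlapping k-mers with stride 1; the abstract does not
    specify any tokenization.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(token_lists):
    """Map each distinct k-mer to an integer id, most frequent first."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Sort by descending count, then alphabetically, so ids are deterministic.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: idx for idx, tok in enumerate(ordered)}


# Hypothetical toy sequences, not data from the paper.
sequences = ["GTAAGTCTAG", "GTGAGTTCAG"]
corpus = [kmer_tokens(s) for s in sequences]
vocab = build_vocab(corpus)
```

Each token list in corpus plays the role of a sentence. An embedding model could then learn one vector per k-mer or per sequence, and those vectors would be the input to a classifier such as the multilayer perceptron mentioned in the abstract.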

Details

Language :
English
ISSN :
1476-9271
Volume :
74
Database :
Academic Search Index
Journal :
Computational Biology & Chemistry
Publication Type :
Academic Journal
Accession number :
130076125
Full Text :
https://doi.org/10.1016/j.compbiolchem.2018.03.009