Back to Search Start Over

Characterization and identification of long non-coding RNAs based on feature relationship.

Authors :
Wang, Guangyu
Yin, Hongyan
Li, Boyang
Yu, Chunlei
Wang, Fan
Xu, Xingjian
Cao, Jiabao
Bao, Yiming
Wang, Liguo
Abbasi, Amir A
Bajic, Vladimir B
Ma, Lina
Zhang, Zhang
Source :
Bioinformatics. Sep2019, Vol. 35 Issue 17, p2949-2956. 8p.
Publication Year :
2019

Abstract

Motivation The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has gained intense interests over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but the reality is that only a limited number of species have high-quality sequences and annotations. Results Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between open reading frame length and guanine-cytosine (GC) content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on the feature relationship, accordingly, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy, well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species. Availability and implementation LGC web server is publicly available at http://bigd.big.ac.cn/lgc/calculator. The scripts and data can be downloaded at http://bigd.big.ac.cn/biocode/tools/BT000004. Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
13674803
Volume :
35
Issue :
17
Database :
Academic Search Index
Journal :
Bioinformatics
Publication Type :
Academic Journal
Accession number :
138485798
Full Text :
https://doi.org/10.1093/bioinformatics/btz008