Representation learning applications in biological sequence analysis

Authors :
Hitoshi Iuchi
Taro Matsutani
Keisuke Yamada
Natsuki Iwano
Shunsuke Sumi
Shion Hosoda
Shitao Zhao
Tsukasa Fukunaga
Michiaki Hamada
Source :
Computational and Structural Biotechnology Journal, Vol 19, Pp 3198-3208 (2021)
Publication Year :
2021
Publisher :
Elsevier, 2021.

Abstract

Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing trend for the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis.
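The abstract's sentence/word analogy can be illustrated with a minimal Python sketch. It splits a sequence into overlapping k-mer "words" and maps it to a fixed-length vector of k-mer counts. The function names (`kmer_tokenize`, `kmer_count_vector`), the DNA alphabet, and the choice of k = 3 are assumptions made for illustration and do not come from the article. The count vector is a simple bag-of-words baseline standing in for the learned embeddings the review discusses, not a representation-learning method itself.

```python
from collections import Counter
from itertools import product


def kmer_tokenize(sequence, k=3, stride=1):
    """Split a sequence into k-mer 'words' (overlapping when stride < k)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def kmer_count_vector(sequence, k=3, alphabet="ACGT"):
    """Map a sequence to a fixed-length vector of k-mer counts.

    This is a bag-of-words baseline, not a learned embedding: each
    position in the vector corresponds to one possible k-mer over the
    given alphabet, in lexicographic product order.
    """
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(kmer_tokenize(sequence, k))
    return [counts[word] for word in vocabulary]


# Hypothetical example sequence, chosen only for illustration.
tokens = kmer_tokenize("ATGCGA", k=3)
vector = kmer_count_vector("ATGCGA", k=3)
print(tokens)
print(len(vector), sum(vector))
```

In a representation-learning setting, the token list would typically be fed to a model that learns dense vectors for each k-mer from a large sequence corpus, rather than being counted directly.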

Details

Language :
English
ISSN :
2001-0370
Volume :
19
Pages :
3198-3208
Database :
Directory of Open Access Journals
Journal :
Computational and Structural Biotechnology Journal
Publication Type :
Academic Journal
Accession number :
edsdoj.79a1a11c24f74a719aa4ebd0c70d1fce
Document Type :
article
Full Text :
https://doi.org/10.1016/j.csbj.2021.05.039