1. Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data
- Author
Jonathan Göke, Yuk Kei Wan, Ploy N. Pratanwanich, and Christopher Hendra
- Subjects
High-Throughput Nucleotide Sequencing; Nucleotide Motif; Genomics; Sequence Analysis, DNA; Biology; Machine learning; Genome; Signal; Nanopore sequencing; Nanopores; Nanopore; Genetics; Artificial intelligence; Algorithm; Algorithms; Curse of dimensionality
- Abstract
Nanopore sequencing provides signal data corresponding to the nucleotide motifs sequenced. Through machine learning-based methods, these signals are translated into long-read sequences that overcome the read size limit of short-read sequencing. However, analyzing the raw nanopore signal data provides many more opportunities beyond just sequencing genomes and transcriptomes: algorithms that use machine learning approaches to extract biological information from these signals allow the detection of DNA and RNA modifications, the estimation of poly(A) tail length, and the prediction of RNA secondary structures. In this review, we discuss how developments in machine learning methodologies contributed to more accurate basecalling and lower error rates, and how these methods enable new biological discoveries. We argue that direct nanopore sequencing of DNA and RNA provides a new dimensionality for genomics experiments and highlight challenges and future directions for computational approaches to extract the additional information provided by nanopore signal data.
- Published
- 2022