1. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data.
- Author
- Calvo-Roitberg E, Daniels RF, and Pai AA
- Subjects
- Humans; High-Throughput Nucleotide Sequencing/methods; Transcription Initiation Site; Exons; Computational Biology/methods; RNA, Messenger/genetics; Sequence Analysis, RNA/methods
- Abstract
- Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology through the comprehensive identification and quantification of full-length mRNA isoforms. Despite great promise, challenges remain in the widespread implementation of LRS technologies for RNA-based applications, including low coverage, high sequencing error, and a lack of robust computational pipelines. Although much focus has been placed on defining mRNA exon composition and structure with LRS data, the ability to assess the terminal ends of isoforms, specifically transcription start and end sites, has been characterized less carefully. Such characterization is crucial for completely delineating full mRNA molecules and their regulatory consequences. However, there are substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. Here, we describe the specific challenges of identifying and quantifying mRNA terminal ends with LRS technologies and how these issues influence biological interpretations of LRS data. We then review recent experimental and computational advances designed to alleviate these problems, with ideal use cases for each approach. Finally, we outline anticipated developments and necessary improvements for the characterization of terminal ends from LRS data. (© 2024 Calvo-Roitberg et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2024