Limitations of current high-throughput sequencing technologies lead to biased expression estimates of endogenous retroviral elements.

Authors :
Kitsou K
Katzourakis A
Magiorkinis G
Source :
NAR genomics and bioinformatics [NAR Genom Bioinform] 2024 Jul 09; Vol. 6 (3), pp. lqae081. Date of Electronic Publication: 2024 Jul 09 (Print Publication: 2024).
Publication Year :
2024

Abstract

Human endogenous retroviruses (HERVs), the remnants of ancient germline retroviral integrations, comprise almost 8% of the human genome. The elucidation of their biological roles is hampered by our inability to link HERV mRNA and protein production with specific HERV loci. To solve the riddle of the integration-specific RNA expression of HERVs, several bioinformatics approaches have been proposed; however, no single process seems to yield optimal results due to the repetitiveness of HERV integrations. The performance of existing data-bioinformatics pipelines has been evaluated against real-world datasets whose true expression profile is unknown, thus the accuracy of widely used approaches remains unclear. Here, we simulated mRNA production from specific HERV integrations to evaluate second and third generation sequencing technologies along with widely used bioinformatic approaches to estimate the accuracy in describing integration-specific expression. We demonstrate that, while a HERV-family approach offers accurate results, per-integration analyses of HERV expression suffer from substantial expression bias, which is only partially mitigated by algorithms developed for calculating the per-integration HERV expression, and is more pronounced in recent integrations. Hence, this bias could erroneously result in biologically meaningful inferences. Finally, we demonstrate the merits of accurate long-read high-throughput sequencing technologies in the resolution of per-locus HERV expression.
(© The Author(s) 2024. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.)

Details

Language :
English
ISSN :
2631-9268
Volume :
6
Issue :
3
Database :
MEDLINE
Journal :
NAR genomics and bioinformatics
Publication Type :
Academic Journal
Accession number :
38984066
Full Text :
https://doi.org/10.1093/nargab/lqae081