
Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project.

Authors :
Schloissnig S
Pani S
Rodriguez-Martin B
Ebler J
Hain C
Tsapalou V
Söylev A
Hüther P
Ashraf H
Prodanov T
Asparuhova M
Hunt S
Rausch T
Marschall T
Korbel JO
Source :
BioRxiv : the preprint server for biology [bioRxiv] 2024 Apr 20. Date of Electronic Publication: 2024 Apr 20.
Publication Year :
2024

Abstract

Structural variants (SVs) contribute significantly to human genetic diversity and disease<superscript>1-4</superscript>. Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution<superscript>5-7</superscript>. Here we leveraged nanopore sequencing<superscript>8</superscript> to construct an intermediate-coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies<superscript>3,4</superscript>. Our analysis details diverse SV classes (deletions, duplications, insertions, and inversions) at population scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions<superscript>9,10</superscript> of unique sequences, with both mobile element classes transducing sequences at either the 3'- or 5'-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest that a continuum of homology-mediated rearrangement processes is integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.

Details

Language :
English
ISSN :
2692-8205
Database :
MEDLINE
Journal :
BioRxiv : the preprint server for biology
Accession number :
38659906
Full Text :
https://doi.org/10.1101/2024.04.18.590093