Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

Authors :
Sherman RM
Forman J
Antonescu V
Puiu D
Daya M
Rafaels N
Boorgula MP
Chavan S
Vergara C
Ortega VE
Levin AM
Eng C
Yazdanbakhsh M
Wilson JG
Marrugo J
Lange LA
Williams LK
Watson H
Ware LB
Olopade CO
Olopade O
Oliveira RR
Ober C
Nicolae DL
Meyers DA
Mayorga A
Knight-Madden J
Hartert T
Hansel NN
Foreman MG
Ford JG
Faruque MU
Dunston GM
Caraballo L
Burchard EG
Bleecker ER
Araujo MI
Herrera-Paz EF
Campbell M
Foster C
Taub MA
Beaty TH
Ruczinski I
Mathias RA
Barnes KC
Salzberg SL
Source :
Nature genetics [Nat Genet] 2019 Jan; Vol. 51 (1), pp. 30-35. Date of Electronic Publication: 2018 Nov 19.
Publication Year :
2019

Abstract

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.
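As a rough check on the figures in the abstract, the sketch below recomputes the "~10%" fraction and the mean contig length from the two reported totals. The reference genome size is not given in this record; the value of about 3.1 billion bp for GRCh38 is an assumption, and the variable names are illustrative only.

```python
# Totals reported in the abstract.
NOVEL_BP = 296_485_284   # total novel sequence, in base pairs
N_CONTIGS = 125_715      # number of distinct contigs

# ASSUMPTION: approximate GRCh38 length (~3.1 Gb); not stated in this record.
ASSUMED_REFERENCE_BP = 3_100_000_000

# Novel sequence as a fraction of the assumed reference length.
fraction = NOVEL_BP / ASSUMED_REFERENCE_BP

# Average length of a novel contig.
mean_contig_len = NOVEL_BP / N_CONTIGS

print(f"Novel sequence relative to reference: {fraction:.1%}")
print(f"Mean contig length: {mean_contig_len:,.0f} bp")
```

Under that assumed reference size, the novel sequence comes to a little under 10% of the reference, consistent with the abstract's rounded figure, and the contigs average a few kilobases each.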

Details

Language :
English
ISSN :
1546-1718
Volume :
51
Issue :
1
Database :
MEDLINE
Journal :
Nature genetics
Publication Type :
Academic Journal
Accession number :
30455414
Full Text :
https://doi.org/10.1038/s41588-018-0273-y