Assembly of a pan-genome from deep sequencing of 910 humans of African descent

Authors :
Steven L. Salzberg
Nadia N. Hansel
Albert M. Levin
Candelaria Vergara
Monica Campbell
Kathleen C. Barnes
Valentin Antonescu
Alvaro Mayorga
Victor E. Ortega
Esteban G. Burchard
Edwin Francisco Herrera-Paz
Cassandra Foster
Javier Marrugo
Michelle Daya
Margaret A. Taub
Christopher O. Olopade
Georgia M. Dunston
Marilyn G. Foreman
Mezbah U. Faruque
Carole Ober
Eugene R. Bleecker
Jennifer Knight-Madden
Rasika A. Mathias
Sameer Chavan
Deborah A. Meyers
Dan L. Nicolae
Lorraine B. Ware
Maria Yazdanbakhsh
Ingo Ruczinski
Celeste Eng
Daniela Puiu
Terri H. Beaty
L. Keoki Williams
Harold Watson
Nicholas Rafaels
James G. Wilson
Leslie A. Lange
Tina V. Hartert
Olufunmilayo I. Olopade
Maria Ilma Araujo
Luis Caraballo
Juliet Forman
Rachel M. Sherman
Ricardo Riccio Oliveira
Meher Preethi Boorgula
Jean G. Ford
Source :
Nature Genetics, 51(1), 30
Publication Year :
2017

Abstract

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.

Assembly of a pan-genome from 910 humans of African descent identifies 296.5 Mb of novel DNA mapping to 125,715 distinct contigs. This African pan-genome contains ~10% more DNA than the current human reference genome.
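The headline figures in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: the novel base-pair count, contig count, and genic contig count come from the abstract, while the reference genome length is an assumed round figure of about 3.1 Gb for GRCh38 that the abstract does not state. The variable names are hypothetical.

```python
# Figures quoted in the abstract.
NOVEL_BP = 296_485_284      # novel sequence, in base pairs
NOVEL_CONTIGS = 125_715     # distinct novel contigs
CONTIGS_IN_GENES = 387      # novel contigs falling within protein-coding genes

# Assumption (not from the abstract): approximate total length of GRCh38.
ASSUMED_REFERENCE_BP = 3_100_000_000

# Novel sequence expressed in megabases (1 Mb = 1,000,000 bp).
novel_mb = NOVEL_BP / 1_000_000

# Average length of a novel contig, in base pairs.
mean_contig_bp = NOVEL_BP / NOVEL_CONTIGS

# Novel sequence as a fraction of the assumed reference length.
extra_fraction = NOVEL_BP / ASSUMED_REFERENCE_BP

# Share of novel contigs that overlap protein-coding genes.
genic_fraction = CONTIGS_IN_GENES / NOVEL_CONTIGS

print(f"Novel sequence: {novel_mb:.1f} Mb")
print(f"Mean contig length: {mean_contig_bp:.0f} bp")
print(f"Extra DNA relative to assumed reference: {extra_fraction:.1%}")
print(f"Novel contigs within protein-coding genes: {genic_fraction:.2%}")
```

Under the assumed reference length, the novel sequence comes to a little under a tenth of the reference, consistent with the "~10%" stated in the abstract, and the average novel contig is a few kilobases long. A different choice of reference length (for example, counting only ungapped bases) would shift the percentage slightly.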

Details

ISSN :
1546-1718
Volume :
51
Issue :
1
Database :
OpenAIRE
Journal :
Nature Genetics
Accession number :
edsair.doi.dedup.....51b622aaf04a13b3e284df6ffdbd3e01