Whole genome sequencing data of multiple individuals of Pakistani descent
- Source :
- Scientific Data, Vol 7, Iss 1, Pp 1-9 (2020)
- Publication Year :
- 2020
- Publisher :
- Springer Science and Business Media LLC, 2020.
Abstract
- Here we report whole genome sequencing of four individuals (H3, H4, H5, and H6) from a family of Pakistani descent. Whole genome sequencing yielded 1084.92, 894.73, 1068.62, and 1005.77 million mapped reads corresponding to 162.73, 134.21, 160.29, and 150.86 Gb sequence data and 52.49x, 43.29x, 51.70x, and 48.66x average coverage for H3, H4, H5, and H6, respectively. We identified 3,529,659, 3,478,495, 3,407,895, and 3,426,862 variants in the genomes of H3, H4, H5, and H6, respectively, including 1,668,024 variants common in the four genomes. Further, we identified 42,422, 39,824, 28,599, and 35,206 novel variants in the genomes of H3, H4, H5, and H6, respectively. A major fraction of the variants identified in the four genomes reside within the intergenic regions of the genome. Single nucleotide polymorphism (SNP) genotype based comparative analysis with ethnic populations of 1000 Genomes database linked the ancestry of all four genomes with the South Asian populations, which was further supported by mitochondria based haplogroup analysis. In conclusion, we report whole genome sequencing of four individuals of Pakistani descent.
- Measurement(s): SNV • genome
- Technology Type(s): whole genome sequencing • DNA sequencing
- Factor Type(s): individual
- Sample Characteristic - Organism: Homo sapiens
- Sample Characteristic - Location: Pakistan
- Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12642761
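The read counts, sequence yields, and coverage figures in the abstract are related by simple arithmetic, and the following minimal sketch cross-checks them. Two inputs are assumptions that the abstract does not state: a read length of 150 bp and a reference genome size of 3.1 Gb. Both were chosen only because they reproduce the reported numbers to within rounding; the function names are illustrative.

```python
# Values reported in the abstract, per sample:
# (mapped reads in millions, sequence data in Gb, average coverage in x)
REPORTED = {
    "H3": (1084.92, 162.73, 52.49),
    "H4": (894.73, 134.21, 43.29),
    "H5": (1068.62, 160.29, 51.70),
    "H6": (1005.77, 150.86, 48.66),
}

# ASSUMPTION: not stated in the abstract; inferred from yield / read count.
READ_LENGTH_BP = 150
# ASSUMPTION: not stated in the abstract; inferred from yield / coverage.
GENOME_SIZE_GB = 3.1


def estimate_yield_gb(mapped_reads_millions, read_length_bp=READ_LENGTH_BP):
    """Sequence yield in Gb = number of reads x read length."""
    return mapped_reads_millions * 1e6 * read_length_bp / 1e9


def estimate_coverage(yield_gb, genome_size_gb=GENOME_SIZE_GB):
    """Average coverage = total sequenced bases / genome size."""
    return yield_gb / genome_size_gb


if __name__ == "__main__":
    for sample, (reads_m, gb, cov) in REPORTED.items():
        est_gb = estimate_yield_gb(reads_m)
        est_cov = estimate_coverage(gb)
        print(f"{sample}: yield {est_gb:.2f} Gb (reported {gb}), "
              f"coverage {est_cov:.2f}x (reported {cov}x)")
```

Under these assumptions the estimates agree with the reported values to within about 0.01, which is consistent with rounding in the abstract.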
- Subjects :
- Statistics and Probability
Data Descriptor
Single-nucleotide polymorphism
Library and Information Sciences
Biology
Polymorphism, Single Nucleotide
Genome
Haplogroup
Education
03 medical and health sciences
0302 clinical medicine
Intergenic region
Genotype
Genetics
Humans
SNP
Pakistan
Polymorphism
1000 Genomes Project
lcsh:Science
030304 developmental biology
Whole genome sequencing
0303 health sciences
Genome, Human
Comparative genomics
Human Genome
Single Nucleotide
Computer Science Applications
Next-generation sequencing
lcsh:Q
Statistics, Probability and Uncertainty
030217 neurology & neurosurgery
Human
Information Systems
Details
- ISSN :
- 20524463
- Volume :
- 7
- Database :
- OpenAIRE
- Journal :
- Scientific Data
- Accession number :
- edsair.doi.dedup.....a15e6d1b497701b687881abdcd612418
- Full Text :
- https://doi.org/10.1038/s41597-020-00664-2