1. Online Population Data Resources for Forensic SNP Analysis with Massively Parallel Sequencing: An Overview of Online Population Data for Forensic Purposes
- Author
Dennis McNevin, Elaine Y.Y. Cheung, Jorge Amigo, Christopher Phillips, Maria Victoria Lareu, and Maria de la Puente
- Subjects
Forensic science, Massive parallel sequencing, Computer science, Population data, Data science, SNP array
- Abstract
Recently, extensive catalogs of human variation derived from whole-genome sequencing have been released as openly accessible archives of sequence variants, a resource that is highly suitable for selecting markers for new forensic tests using massively parallel sequencing. In particular, the comparison of population patterns in these databases of variants can help identify markers suitable for the inference of ancestry that can then form informative forensic assays for this purpose. This chapter outlines in detail the ancestry or co-ancestry composition, genotype data access details, and geographic distributions of the samples in the most extensive variant catalogs: 1000 Genomes; the HGDP–CEPH panel; the Simons Foundation Human Genome Diversity Project (SGDP); and the Estonian Biocentre Genome Diversity Panel (EGDP). While 1000 Genomes systematically characterizes a large number of samples from a limited number of carefully selected populations and restricts itself to five to six populations per continent, both SGDP and EGDP have analyzed only 2–4 samples per geographic location, and these are much more widely dispersed geographically. The pros and cons of each approach are discussed, and details are provided of more recently published variant compilation projects with much larger sample sizes, notably the genome aggregation database gnomAD.
- Published
- 2021