A comparison between similarity matrices for principal component analysis to assess population stratification in sequenced genetic data sets

Authors :
Sanghun Lee
Georg Hahn
Julian Hecker
Sharon M. Lutz
Kristina Mullin
Winston Hide
Lars Bertram
Dawn L. DeMeo
Rudolph E. Tanzi
Christoph Lange
Dmitry Prokopenko
Source :
Briefings in Bioinformatics
Publication Year :
2022

Abstract

Genetic similarity matrices are commonly used to assess population substructure (PS) in genetic studies. Through simulation studies and by the application to whole-genome sequencing (WGS) data, we evaluate the performance of three genetic similarity matrices: the unweighted and weighted Jaccard similarity matrices and the genetic relationship matrix. We describe different scenarios that can create numerical pitfalls and lead to incorrect conclusions in some instances. We consider scenarios in which PS is assessed based on loci that are located across the genome (‘globally’) and based on loci from a specific genomic region (‘locally’). We also compare scenarios in which PS is evaluated based on loci from different minor allele frequency bins: common (>5%), low-frequency (5–0.5%) and rare (
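
For readers unfamiliar with the matrices named in the abstract, the following is a minimal sketch of two of them, assuming common textbook definitions rather than the authors' exact formulations: an unweighted Jaccard similarity computed on minor-allele carrier status, and a genetic relationship matrix (GRM) built from genotypes standardized by sample allele frequency. The weighted Jaccard variant is omitted because its weighting scheme is not given in this record. The function names, the 0/1/2 genotype coding, and the toy data are all illustrative assumptions.

```python
import numpy as np


def jaccard_similarity(genotypes):
    """Unweighted Jaccard similarity between individuals (assumed definition).

    genotypes: (n_individuals, n_variants) array of minor-allele counts (0/1/2).
    Each individual is treated as the set of variants at which it carries
    at least one minor allele.
    """
    carrier = (np.asarray(genotypes) > 0).astype(float)
    intersection = carrier @ carrier.T            # shared carried variants
    counts = carrier.sum(axis=1)                  # variants carried per individual
    union = counts[:, None] + counts[None, :] - intersection
    with np.errstate(divide="ignore", invalid="ignore"):
        # Pairs with an empty union (no minor alleles at all) are set to 0.
        return np.where(union > 0, intersection / union, 0.0)


def genetic_relationship_matrix(genotypes):
    """GRM from genotypes standardized by sample allele frequency (assumed definition)."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                      # sample allele frequency per variant
    keep = (p > 0) & (p < 1)                      # drop monomorphic variants
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def top_components(similarity, k=2):
    """Eigenvectors of a symmetric similarity matrix with the k largest eigenvalues."""
    eigenvalues, eigenvectors = np.linalg.eigh(similarity)
    order = np.argsort(eigenvalues)[::-1][:k]
    return eigenvectors[:, order]


# Hypothetical toy data: two groups of 20 individuals with different allele frequencies.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.01, 0.3, size=200)
freq_b = rng.uniform(0.01, 0.3, size=200)
toy = np.vstack([rng.binomial(2, freq_a, size=(20, 200)),
                 rng.binomial(2, freq_b, size=(20, 200))])

pcs_jaccard = top_components(jaccard_similarity(toy))
pcs_grm = top_components(genetic_relationship_matrix(toy))
```

Restricting the columns of `toy` to variants within a given minor allele frequency bin, or within a single genomic region, before calling either function would mimic the binned and 'local' scenarios the abstract describes, again only as an illustration.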

Details

ISSN :
1477-4054
Database :
OpenAIRE
Journal :
Briefings in Bioinformatics
Accession number :
edsair.doi.dedup.....710121f652bb62519789fd08503c3a90