Gusev, A, Shah, MJ, Kenny, EE, Ramachandran, A, Lowe, JK, Salit, J, Lee, CC, Levandowsky, EC, Weaver, TN, Doan, QC, Peckham, HE, McLaughlin, SF, Lyons, MR, Sheth, VN, Stoffel, M, De La Vega, FM, Friedman, JM, Breslow, JL, and Pe'er, I
Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to the statistical methods used in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report the identification of long regions with haplotypes co-inherited between pairs of individuals, and a methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for imputation in up to 60% of the 3,000-person cohort at the average locus. We ascertained a pilot dataset of whole-genome sequences from seven Kosraean individuals, at an average coverage of 5X. This dataset identified 5,735,306 unique sites, of which 1,212,831 were previously unknown. Additionally, these Kosraean variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors. We used the presence of shared haplotypes between the seven individuals to estimate the imputation accuracy of known and novel variants, achieving accuracies of 99.6% and 97.3%, respectively. This study presents the first whole-genome analysis of a homogeneous isolate population with emphasis on rare variant inference.
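The idea of imputing through shared haplotypes can be illustrated with a minimal sketch. It is not the authors' method: the segment representation, the function names, and the toy data below are all hypothetical, and the sketch works at the level of individuals rather than phased haplotypes. It only shows how, given pairwise shared segments, one might count the unsequenced individuals who share a segment with a sequenced individual at a given locus.

```python
from typing import Iterable, NamedTuple, Set


class SharedSegment(NamedTuple):
    """Hypothetical record of a region co-inherited by two individuals."""
    sample_a: str
    sample_b: str
    start: int  # first base-pair position of the segment (inclusive)
    end: int    # last base-pair position of the segment (inclusive)


def imputable_samples(segments: Iterable[SharedSegment],
                      sequenced: Set[str],
                      position: int) -> Set[str]:
    """Unsequenced samples that share a segment covering `position`
    with at least one sequenced sample."""
    covered = set()
    for seg in segments:
        if not (seg.start <= position <= seg.end):
            continue  # segment does not span the locus
        if seg.sample_a in sequenced and seg.sample_b not in sequenced:
            covered.add(seg.sample_b)
        elif seg.sample_b in sequenced and seg.sample_a not in sequenced:
            covered.add(seg.sample_a)
    return covered


def imputable_fraction(segments: Iterable[SharedSegment],
                       sequenced: Set[str],
                       cohort: Iterable[str],
                       position: int) -> float:
    """Fraction of the unsequenced cohort reachable at `position`."""
    unsequenced = set(cohort) - sequenced
    if not unsequenced:
        return 0.0
    reachable = imputable_samples(segments, sequenced, position) & unsequenced
    return len(reachable) / len(unsequenced)


# Toy data (invented for illustration only).
cohort = ["s1", "s2", "s3", "s4", "s5"]
sequenced = {"s1"}
segments = [
    SharedSegment("s1", "s2", 100, 500),
    SharedSegment("s1", "s3", 300, 900),
    SharedSegment("s4", "s5", 100, 900),  # neither sample is sequenced
]

print(imputable_samples(segments, sequenced, 400))
print(imputable_fraction(segments, sequenced, cohort, 400))
```

Averaging such a per-locus fraction over many loci, for a chosen set of sequenced individuals, gives one simple way to think about a statement of the form "imputation in up to 60% of the cohort at the average locus"; the actual estimation procedure is not specified here.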