1. Profiling and Leveraging Relatedness in a Precision Medicine Cohort of 92,455 Exomes
- Author
Shane McCarthy, David H. Ledbetter, Frederick E. Dewey, Lukas Habegger, John Penn, David J. Carey, George D. Yancopoulos, Cristopher V. Van Hout, H. Lester Kirchner, Suganthi Balasubramanian, Joseph B. Leader, Tanya M. Teslovich, Xiaodong Bai, Colm O'Dushlaine, Aris Baras, John D. Overton, Michael F. Murray, Nehal Gosalia, Jeffrey G. Reid, Evan Maxwell, Alan R. Shuldiner, Alexander Lopez, Claudia Gonzaga-Jauregui, Christopher Snyder, Jeffrey Staples, Ricardo Ulloa, and Alicia Hawes
- Subjects
Male; 0301 basic medicine; Heterozygote; Population; Genomics; Pedigree chart; Biology; Compound heterozygosity; Article; Cohort Studies; 03 medical and health sciences; Genetics; Electronic Health Records; Humans; Computer Simulation; Exome; Family; Precision Medicine; education; Nuclear family; Genetics (clinical); Exome sequencing; education.field_of_study; Geography; Reproducibility of Results; Exons; Human genetics; Pedigree; Genetics, Population; Phenotype; 030104 developmental biology; Evolutionary biology; Mutation; Cohort; Female; Tandem exon duplication
- Abstract
Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications for downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, we identified ∼66,000 close (first- and second-degree) relationships within DiscovEHR involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships as DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ∼1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations through reconstructed pedigrees, including a tandem duplication in LDLR causing familial hypercholesterolemia. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.
- Published
- 2018