1. Considerations for using population frequency data in germline variant interpretation: Cancer syndrome genes as a model
- Author
- John V. Pearson, Georgina E. Hollway, Nicola Waddell, Felicity Newell, Aimee L. Davidson, Lambros T. Koufariotis, Michael T. Parsons, Conrad Leonard, and Amanda B. Spurdle
- Subjects
Population; Computational biology; Biology; Germline; Cancer syndrome; Population genomics; 03 medical and health sciences; Gene Frequency; Neoplasms; Genetics; medicine; Humans; Genetic Testing; education; Gene; Allele frequency; Genetics (clinical); 030304 developmental biology; Whole genome sequencing; 0303 health sciences; education.field_of_study; Genome, Human; 030305 genetics & heredity; Genetic Variation; Genomics; medicine.disease; Germ Cells; Precision and recall
- Abstract
Aggregate population genomics data from large cohorts are vital for assessing germline variant pathogenicity. However, there are no specifications on how sequencing quality metrics should be considered, and whether exome-derived and genome-derived allele frequencies should be considered in isolation. Germline genome sequence data were simulated for nine read-depths to identify a minimum acceptable read-depth for detecting variants. gnomAD exome-derived and genome-derived datasets were assessed for read-depth, for six key cancer genes selected for variant curation by ClinGen expert panels. Non-Finnish European allele frequency (AF) or filter AF of coding variants in these genes, assigned into frequency bins using modified ACMG-AMP criteria, was compared between exome-derived and genome-derived datasets. A 30X read-depth achieved acceptable precision and recall for detection of substitutions, but poor recall for small insertions/deletions. Exome-derived and genome-derived datasets exhibited low read-depth for different gene exons. Individual variants were mostly assigned to non-divergent AF bins (>95%) or filter AF bins (>97%). Two major bin divergences were resolved by applying the minimal acceptable read-depth threshold. These findings show the importance of assessing read-depth separately for population datasets sourced from different short-read sequencing technologies before assigning a frequency-based ACMG-AMP classification code for variant interpretation.
- Published
- 2021