
Machine learning models for accurate prioritization of variants of uncertain significance.

Authors :
Mahecha D
Nuñez H
Lattig MC
Duitama J
Source :
Human mutation [Hum Mutat] 2022 Apr; Vol. 43 (4), pp. 449-460. Date of Electronic Publication: 2022 Feb 19.
Publication Year :
2022

Abstract

The growing use of next-generation sequencing technologies in genetic diagnosis has produced an exponential increase in the number of variants of uncertain significance (VUS). In this manuscript, we compare three machine learning methods for classifying VUS as Pathogenic or Non-pathogenic: a Random Forest (RF), a Support Vector Machine (SVM), and a Multilayer Perceptron (MLP). To train the models, we extracted high-quality variants from ClinVar that had previously been classified as VUS. For each variant, we retrieved nine conservation scores, the loss-of-function tool score, and allele frequencies. For the RF and SVM models, hyperparameters were tuned by cross-validated grid search. The three models were tested on a nonoverlapping set of variants that had been classified as VUS over the last 3 years but had been reclassified in August 2020. On this set, all three models achieved higher accuracy than the benchmarked tools. The RF-based model performed best across different variant types and was used to create VusPrize, an open-source software tool for prioritizing VUS. We believe that our model can improve the process of genetic diagnosis in research and clinical settings. (© 2022 Wiley Periodicals LLC.)
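The abstract states that RF and SVM hyperparameters were tuned by cross-validated grid search. The following is a minimal, dependency-free sketch of that general procedure, not the authors' VusPrize code. It rests on several assumptions: a tiny k-nearest-neighbours classifier stands in for the RF and SVM, the two features (a conservation-like score and an allele frequency) and the ten labelled points are hypothetical toy data, and folds are assigned by index modulo the fold count rather than by stratified random splitting. All function and variable names are illustrative.

```python
"""Minimal k-fold grid-search sketch (hypothetical toy model and data)."""
import itertools
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def knn_predict(train_x: List[Vector], train_y: List[str], x: Vector,
                k: int, weighted: bool) -> str:
    """Classify x by its k nearest training points (stand-in for RF/SVM)."""
    neighbours = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))
    votes: Counter = Counter()
    for dist, label in neighbours[:k]:
        # Inverse-distance weighting when weighted=True, plain majority vote otherwise.
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


def cross_val_accuracy(xs: List[Vector], ys: List[str],
                       params: dict, n_folds: int) -> float:
    """Pooled accuracy over n_folds folds; fold f holds indices i with i % n_folds == f."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in range(fold, len(xs), n_folds):  # held-out indices for this fold
            if knn_predict(train_x, train_y, xs[i], **params) == ys[i]:
                correct += 1
    return correct / len(xs)


def grid_search(xs: List[Vector], ys: List[str], grid: Dict[str, list],
                n_folds: int = 5) -> Tuple[dict, float]:
    """Score every hyperparameter combination; keep the first best-scoring one."""
    names = sorted(grid)
    best_params: dict = {}
    best_score = -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(xs, ys, params, n_folds)
        if score > best_score:  # strict '>' keeps the earliest combination on ties
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy features: (conservation-like score, allele frequency).
TOY_X = [(0.90, 0.001), (0.85, 0.002), (0.95, 0.0005), (0.80, 0.001), (0.92, 0.003),
         (0.20, 0.100), (0.30, 0.050), (0.10, 0.200), (0.25, 0.080), (0.15, 0.120)]
TOY_Y = ["Pathogenic"] * 5 + ["Non-pathogenic"] * 5
PARAM_GRID = {"k": [1, 3, 5], "weighted": [False, True]}

if __name__ == "__main__":
    best_params, best_score = grid_search(TOY_X, TOY_Y, PARAM_GRID, n_folds=5)
    print(best_params, best_score)
```

Every combination in the grid is scored on held-out folds only, and the best-scoring combination is kept. In practice, scikit-learn's `GridSearchCV` wraps this same loop around real estimators such as `RandomForestClassifier` or `SVC`.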

Details

Language :
English
ISSN :
1098-1004
Volume :
43
Issue :
4
Database :
MEDLINE
Journal :
Human mutation
Publication Type :
Academic Journal
Accession number :
35143088
Full Text :
https://doi.org/10.1002/humu.24339