LYRUS: a machine learning model for predicting the pathogenicity of missense variants

Authors :
Gamsiz Uzun Ed
Indra Neil Sarkar
Brenda M. Rubenstein
Jordan Yang
Lai J
Source :
Bioinformatics Advances
Publication Year :
2021
Publisher :
Oxford University Press (OUP), 2021.

Abstract

Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can aid in the diagnosis and understanding of the genetic architecture of complex diseases, such as cancer. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. Nevertheless, previous analyses have shown that methods that depend on only sequence or structural information may have limited accuracy. Recently, researchers have attempted to increase the accuracy of their predictions by incorporating protein dynamics into pathogenicity predictions. This study presents <Lai Yang Rubenstein Uzun Sarkar> (LYRUS), a machine learning method that uses an XGBoost classifier selected by TPOT to predict the pathogenicity of SAVs. LYRUS incorporates five sequence-based features, six structure-based features, and four dynamics-based features. Uniquely, LYRUS includes a newly proposed sequence co-evolution feature called variation number. LYRUS’s performance was evaluated using a dataset that contains 4,363 protein structures corresponding to 20,307 SAVs based on human genetic variant data from the ClinVar database. Based on our dataset, the LYRUS classifier has higher accuracy, specificity, F-measure, and Matthews correlation coefficient (MCC) than alternative methods including PolyPhen2, PROVEAN, SIFT, Rhapsody, EVMutation, MutationAssessor, SuSPect, FATHMM, and MVP. Variation numbers used within LYRUS differ greatly between pathogenic and neutral SAVs, and have a high feature weight in the XGBoost classifier employed by this method. Applications of the method to PTEN and TP53 further corroborate LYRUS’s strong performance. LYRUS is freely available and the source code can be found at https://github.com/jiaying2508/LYRUS.
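
The four evaluation measures named in the abstract (accuracy, specificity, F-measure, and MCC) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not code from the LYRUS repository. The function name `binary_metrics`, the label encoding (1 = pathogenic, 0 = neutral), and the toy labels are assumptions made for illustration only.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, specificity, F-measure, and MCC for binary labels.

    Assumed encoding (illustrative, not taken from the paper's code):
    1 = pathogenic, 0 = neutral.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
    # reported as 0.0 here when the denominator is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Toy labels, invented for illustration (not ClinVar data).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_result = binary_metrics(toy_true, toy_pred)
```

On these toy labels the counts are TP = 3, TN = 3, FP = 1, FN = 1, so accuracy, specificity, and F-measure are each 0.75 and MCC is 0.5. MCC uses all four cells of the confusion matrix, which is generally why it is reported alongside accuracy when the pathogenic and neutral classes may be unbalanced.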

Details

ISSN :
2635-0041
Volume :
2
Database :
OpenAIRE
Journal :
Bioinformatics Advances
Accession number :
edsair.doi.dedup.....6bd40388ea7a297e9908b57220a19957