Identifying digenic disease genes using machine learning in the undiagnosed diseases network
- Publication Year :
- 2020
- Publisher :
- Cold Spring Harbor Laboratory, 2020.
Abstract
- Rare diseases affect hundreds of millions of people worldwide, and diagnosing their genetic causes is challenging. The Undiagnosed Diseases Network (UDN) was formed in 2014 to identify and treat novel rare genetic diseases, and despite many successes, more than half of UDN patients remain undiagnosed. The central hypothesis of this work is that many unsolved rare genetic disorders are caused by multiple variants in more than one gene. However, given the large number of variants in each individual genome, experimentally evaluating even just pairs of variants for their potential to cause disease is currently infeasible. To address this challenge, we developed DiGePred, a random forest classifier for identifying candidate digenic disease gene pairs using features derived from biological networks, genomics, evolutionary history, and functional annotations. We trained the DiGePred classifier using DIDA, the largest available database of known digenic disease-causing gene pairs, and several sets of non-digenic gene pairs, including variant pairs derived from unaffected relatives of UDN patients. DiGePred achieved high precision and recall in cross-validation and on a held-out test set (area under the precision-recall curve >77%), and we further demonstrate its utility using novel digenic pairs from the recent literature. In contrast to other approaches, DiGePred also appropriately controls the number of false positives when applied in realistic clinical settings like the UDN. Finally, to facilitate the rapid screening of variant gene pairs for digenic disease potential, we freely provide the predictions of DiGePred on all human gene pairs. Our work facilitates the discovery of genetic causes for rare non-monogenic diseases by providing a means to rapidly evaluate variant gene pairs for the potential to cause digenic disease.
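The screening step described in the abstract, looking up precomputed scores for every pair of genes that carry variants in one individual, might be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function name `screen_gene_pairs`, the placeholder gene names, the scores, and the 0.5 threshold are all assumptions made for the example, and the actual format of the released DiGePred predictions is not stated in this record.

```python
from itertools import combinations
from math import comb


def screen_gene_pairs(variant_genes, pair_scores, threshold=0.5):
    """Return ((gene_a, gene_b), score) for each pair scoring >= threshold.

    variant_genes: genes carrying candidate variants in one individual.
    pair_scores:   precomputed scores keyed by frozenset({gene_a, gene_b}),
                   so lookups do not depend on gene order.
    threshold:     hypothetical cutoff, chosen only for illustration.
    """
    hits = []
    # Deduplicate and sort so each unordered pair is visited exactly once.
    for a, b in combinations(sorted(set(variant_genes)), 2):
        score = pair_scores.get(frozenset((a, b)))
        if score is not None and score >= threshold:
            hits.append(((a, b), score))
    # Highest-scoring candidate pairs first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical precomputed scores; placeholder names, not real predictions.
pair_scores = {
    frozenset(("GENE_A", "GENE_B")): 0.91,
    frozenset(("GENE_A", "GENE_C")): 0.12,
    frozenset(("GENE_B", "GENE_D")): 0.64,
}
variant_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

n_pairs = comb(len(set(variant_genes)), 2)  # pairs grow as n * (n - 1) / 2
hits = screen_gene_pairs(variant_genes, pair_scores)
print(n_pairs, hits)
```

Because the number of pairs grows quadratically with the number of variant genes, a table lookup of this kind is far cheaper than evaluating each pair experimentally, which is the motivation the abstract gives for releasing predictions on all human gene pairs.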
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi...........3cc42b3f72183fc3474252ac5c7b9cce