1. A Machine Learning Approach to Identifying Causal Monogenic Variants in Inflammatory Bowel Disease
- Author
Anne M. Griffiths, Claudia Gonzaga-Jauregui, Aleixo M. Muise, Peter C Church, Arun K. Ramani, Daniel J. Mulder, Sam Khalouei, T Walters, Michael Li, Amanda Ricciuto, Eric I Benchimol, and Neil Warner
- Subjects
Prioritization, Tertiary care hospital, Machine learning, Monogenic disease, Inflammatory bowel disease, Patient care, Medicine, Artificial intelligence, Allele frequency, Exome sequencing
- Abstract
Background: Diagnosis of monogenic disease is increasingly important for patient care and personalizing therapy. However, the current process is non-standardized, expensive, and time-consuming. There is currently no accepted strategy to help identify disease-causing variants in monogenic inflammatory bowel disease (IBD). Aims: To develop a prioritization strategy for monogenic IBD variant discovery through detailed analysis of a whole exome sequencing (WES) data set. Methods: All consenting pediatric patients with IBD presenting to our tertiary care hospital during the study period were enrolled and underwent WES (n = 1005). Available family members also underwent WES. Variants were analyzed en masse using the GEMINI framework and were further annotated using data from dbNSFP, CADD, and gnomAD. Known disease-causing variants (n = 36) were used as positive controls. Machine learning algorithms were optimized and then compared to help identify the characteristics of monogenic IBD cases. Results: Initial gene-level analysis identified 11 genes not previously linked to IBD that could potentially harbour IBD-causing variants. Machine learning algorithms identified four primary variant characteristics (CADD score, dbNSFP score, relationship with a known immunodeficiency gene, and alternate allele frequency), and optimal threshold values for each were determined to assist with identifying monogenic IBD variants. Based on these characteristics, an automated variant prioritization pipeline was then created that filters and prioritizes variants, reducing >100,000 variants per patient to a mean of 15. This pipeline is available online for all to use. Conclusion: Leveraging a large WES data set, we demonstrate a statistically rigorous strategy for prioritization of variants for monogenic IBD diagnosis.
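The four variant characteristics named in the Results could be combined into a filter-then-rank step along the lines of the sketch below. This is not the authors' pipeline: the abstract does not report the threshold values, so `CADD_MIN`, `DBNSFP_MIN`, and `ALT_AF_MAX` are hypothetical placeholders, the `Variant` fields are assumed names, and using the immunodeficiency-gene relationship as a ranking boost rather than a hard filter is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoffs for illustration only; the abstract does not state the real values.
CADD_MIN = 20.0      # assumed minimum CADD score
DBNSFP_MIN = 0.5     # assumed minimum dbNSFP-derived score (treated here as a 0-1 value)
ALT_AF_MAX = 0.001   # assumed maximum alternate allele frequency


@dataclass
class Variant:
    gene: str
    cadd: float
    dbnsfp: float
    alt_af: float
    immunodeficiency_gene: bool  # related to a known immunodeficiency gene


def passes_filters(v: Variant) -> bool:
    """Keep variants that are predicted deleterious and rare."""
    return v.cadd >= CADD_MIN and v.dbnsfp >= DBNSFP_MIN and v.alt_af <= ALT_AF_MAX


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Filter, then rank: immunodeficiency-related genes first,
    then higher CADD score, then lower allele frequency."""
    kept = [v for v in variants if passes_filters(v)]
    return sorted(kept, key=lambda v: (not v.immunodeficiency_gene, -v.cadd, v.alt_af))
```

Applied per patient, a step of this shape is one way a list of more than 100,000 annotated variants could be cut down to a short candidate list for manual review.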
- Published
- 2022