1. Machine Learning-Driven Prediction of CRISPR-Cas9 Off-Target Effects and Mechanistic Insights.
- Author
Bhardwaj, Anuradha, Tomar, Pradeep, and Nain, Vikrant
- Subjects
MACHINE learning, ARTIFICIAL neural networks, GENOME editing, SUPPORT vector machines, K-nearest neighbor classification, NAIVE Bayes classification
- Abstract
Precise prediction of off-target effects in CRISPR-Cas9 genome editing is critical for ensuring the safety and efficacy of this tool. This study uses machine learning to predict off-target cleavage sites and to investigate the mechanisms that affect cleavage efficiency. We analyzed GUIDE-seq datasets from Tsai et al. and Kleinstiver et al., standardizing cleavage efficiencies to align with Tsai et al.'s comprehensive dataset. We identified a range of sequence and genomic features, including PAM sequence type, nucleotide composition, GC content, chromatin structure, CpG islands, and gene expression levels. Several machine learning models were developed and evaluated: Artificial Neural Networks, Support Vector Machines, Naïve Bayes, k-Nearest Neighbors, Logistic Regression, and Extra Trees Classifiers. The Extra Trees Classifier, particularly with class weighting, performed robustly, achieving high accuracy, precision, recall, and F1 scores. SHAP analysis provided insight into feature importance, highlighting the factors that contribute most to model predictions. Applying machine learning to the prediction of CRISPR-Cas9 off-target effects shows significant potential for improving the precision of genome editing. Our findings underscore the importance of considering a diverse range of sequence and genomic features when building prediction models. The insights gained from this study can inform the development of safer and more effective CRISPR-based applications in medicine, agriculture, and biotechnology. Future work will focus on refining these models and exploring their applicability across different genomic contexts. [ABSTRACT FROM AUTHOR]
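The sequence-level features named in the abstract (PAM sequence type, nucleotide composition, GC content) can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes a 23-nt candidate site written 5' to 3' as a 20-nt protospacer followed by a 3-nt PAM, and the function names, the PAM categories, and the example sequence are all hypothetical choices made for illustration.

```python
from collections import Counter

PROTOSPACER_LEN = 20  # assumption: 20-nt protospacer followed by a 3-nt PAM


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def pam_type(site: str) -> str:
    """Bucket the last three bases of the site into a coarse PAM category.

    The categories (NGG, NAG, other) are an illustrative assumption,
    not the encoding used in the study.
    """
    pam = site.upper()[-3:]
    if pam[1:] == "GG":
        return "NGG"
    if pam[1:] == "AG":
        return "NAG"
    return "other"


def site_features(site: str) -> dict:
    """Build a small feature dictionary for one candidate cleavage site."""
    site = site.upper()
    protospacer = site[:PROTOSPACER_LEN]
    counts = Counter(protospacer)
    features = {
        "gc_content": gc_content(protospacer),
        "pam_type": pam_type(site),
    }
    # Nucleotide composition: one count per base in the protospacer.
    for base in "ACGT":
        features[f"count_{base}"] = counts[base]
    return features


# Illustrative 23-nt site: 20-nt protospacer + "GGG" PAM.
example = site_features("GAGTCCGAGCAGAAGAAGAAGGG")
print(example)
```

A dictionary like this would form one row of a feature table. The genomic-context features in the abstract (chromatin structure, CpG islands, gene expression) need external annotation and are outside the scope of this sketch.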
- Published
- 2024