
Comparative analysis of models in predicting the effects of SNPs on TF-DNA binding using large-scale in vitro and in vivo data.

Authors :
Han D
Li Y
Wang L
Liang X
Miao Y
Li W
Wang S
Wang Z
Source :
Briefings in bioinformatics [Brief Bioinform] 2024 Jan 22; Vol. 25 (2).
Publication Year :
2024

Abstract

Non-coding variants associated with complex traits can alter the motifs of transcription factor (TF)-deoxyribonucleic acid binding. Although many computational models have been developed to predict the effects of non-coding variants on TF binding, their predictive power lacks systematic evaluation. Here we have evaluated 14 different models built on position weight matrices (PWMs), support vector machines, ordinary least squares and deep neural networks (DNNs), using large-scale in vitro (i.e. SNP-SELEX) and in vivo (i.e. allele-specific binding, ASB) TF binding data. Our results show that the accuracy of each model in predicting SNP effects in vitro significantly exceeds that achieved in vivo. For in vitro variant impact prediction, kmer/gkm-based machine learning methods (deltaSVM_HT-SELEX, QBiC-Pred) trained on in vitro datasets exhibit the best performance. For in vivo ASB variant prediction, DNN-based multitask models (DeepSEA, Sei, Enformer) trained on the ChIP-seq dataset exhibit relatively superior performance. Among the PWM-based methods, tRap demonstrates better performance in both in vitro and in vivo evaluations. In addition, we find that TF classes such as basic leucine zipper factors could be predicted more accurately, whereas those such as C2H2 zinc finger factors are predicted less accurately, aligning with the evolutionary conservation of these TF classes. We also underscore the significance of non-sequence factors such as cis-regulatory element type, TF expression, interactions and post-translational modifications in influencing the in vivo predictive performance of TFs. Our research provides valuable insights into selecting prioritization methods for non-coding variants and further optimizing such models. (© The Author(s) 2024. Published by Oxford University Press.)
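
For readers unfamiliar with the PWM-based family of models mentioned in the abstract, the following is a minimal sketch of one generic way a PWM can be used to score the effect of a SNP: score the best motif match overlapping the variant for each allele and take the difference. It is an illustration only and is not the scoring used by tRap or any other tool evaluated in the paper. The function names, the toy count matrix, the pseudocount and the uniform background frequency are all assumptions introduced here.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds PWM.

    `counts` is a list of dicts, one per motif position, mapping each base
    to its count. The pseudocount and uniform background are assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_window_score(pwm, seq, snp_index):
    """Return the highest PWM score among windows that overlap snp_index.

    Only the given strand is scanned. If the sequence is shorter than the
    motif, no window exists and the result is -inf.
    """
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    best = -math.inf
    for start in range(first, last + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


def pwm_delta(pwm, ref_seq, snp_index, alt_base):
    """Score difference (alt minus ref) for a single-base substitution."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    return (best_window_score(pwm, alt_seq, snp_index)
            - best_window_score(pwm, ref_seq, snp_index))


# Hypothetical 4-bp motif "TGAC" built from made-up counts.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)

# The G>C substitution at index 3 disrupts the motif, so the delta is negative.
disrupting_delta = pwm_delta(toy_pwm, "AATGACAA", 3, "C")
print(f"delta score (alt - ref): {disrupting_delta:.2f}")
```

A negative delta suggests the alternate allele weakens the predicted binding site and a positive delta suggests it strengthens it. A practical scorer would typically also scan the reverse strand and calibrate scores against a background distribution, both of which this sketch omits.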

Details

Language :
English
ISSN :
1477-4054
Volume :
25
Issue :
2
Database :
MEDLINE
Journal :
Briefings in bioinformatics
Publication Type :
Academic Journal
Accession number :
38517697
Full Text :
https://doi.org/10.1093/bib/bbae110