Employing bimodal representations to predict DNA bendability within a self-supervised pre-trained framework.

Authors :
Yang M
Zhang S
Zheng Z
Zhang P
Liang Y
Tang S
Source :
Nucleic acids research [Nucleic Acids Res] 2024 Apr 12; Vol. 52 (6), pp. e33.
Publication Year :
2024

Abstract

The bendability of genomic DNA, which measures the DNA looping rate, is crucial for numerous biological processes. Recently, a high-throughput technique known as 'loop-seq' has made it possible to measure the inherent cyclizability of DNA fragments. However, quantifying DNA bendability at large scale is costly, laborious, and time-consuming. To close the gap between rapidly evolving large language models and expanding genomic sequence information, and to elucidate the impact of DNA bendability on critical regulatory sequence motifs such as super-enhancers in the human genome, we introduce a computational model, named MIXBend, that predicts DNA bendability from both nucleotide sequences and physicochemical properties. In MIXBend, the pre-trained language model DNABERT and a convolutional neural network with an attention mechanism are used to construct sequence-based and physicochemical-based extractors that refine DNA sequence representations. These bimodal DNA representations are then fed to a k-mer sequence-physicochemistry matching module to minimize the semantic gap between the two modalities. Lastly, a self-attention fusion layer is employed to predict DNA bendability. The experimental results validate MIXBend's superior performance relative to other state-of-the-art methods. Additionally, MIXBend reveals both novel and known motifs in yeast. Moreover, MIXBend discovers significant bendability fluctuations within super-enhancer regions and transcription factor binding sites in the human genome. (© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.)
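
The abstract refers to k-mer level representations but does not give the tokenization details. As a minimal sketch, assuming overlapping k-mers with a stride of 1 (the scheme DNABERT-style models are generally described as using), a sequence could be split as follows. The function name `kmers` and the default `k=6` are hypothetical choices for illustration, not taken from the paper.

```python
def kmers(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Assumption: stride-1 overlapping tokens; the abstract does not
    state the value of k or the stride used by MIXBend.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


tokens = kmers("ATGCGTAC", k=3)
print(tokens)  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']
```

A sequence of length n yields n - k + 1 tokens, so neighbouring tokens share k - 1 nucleotides.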

Details

Language :
English
ISSN :
1362-4962
Volume :
52
Issue :
6
Database :
MEDLINE
Journal :
Nucleic acids research
Publication Type :
Academic Journal
Accession number :
38375921
Full Text :
https://doi.org/10.1093/nar/gkae099