Employing bimodal representations to predict DNA bendability within a self-supervised pre-trained framework.
- Source :
- Nucleic acids research [Nucleic Acids Res] 2024 Apr 12; Vol. 52 (6), pp. e33.
- Publication Year :
- 2024
Abstract
- The bendability of genomic DNA, which measures the DNA looping rate, is crucial for numerous DNA-related biological processes. Recently, an advanced high-throughput technique known as 'loop-seq' has made it possible to measure the intrinsic cyclizability of DNA fragments. However, quantifying the bendability of large-scale DNA is costly, laborious and time-consuming. To close the gap between rapidly evolving large language models and the expanding body of genomic sequence information, and to elucidate the impact of DNA bendability on critical regulatory sequence motifs such as super-enhancers in the human genome, we introduce an innovative computational model, named MIXBend, that predicts DNA bendability from both nucleotide sequences and physicochemical properties. In MIXBend, a pre-trained language model (DNABERT) and a convolutional neural network with an attention mechanism are used to construct sequence-based and physicochemical-based feature extractors that refine DNA sequence representations. These bimodal DNA representations are then fed to a k-mer sequence-physicochemistry matching module to minimize the semantic gap between the two modalities. Finally, a self-attention fusion layer is employed to predict DNA bendability. The experimental results demonstrate MIXBend's superior performance relative to other state-of-the-art methods. Additionally, MIXBend reveals both novel and known motifs in yeast. Moreover, MIXBend uncovers significant bendability fluctuations within super-enhancer regions and transcription factor binding sites in the human genome. (© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.)
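The abstract mentions a k-mer sequence-physicochemistry matching module. As a minimal sketch of the k-mer view only, assuming DNABERT-style overlapping k-mers with stride 1, the hypothetical helpers below split a sequence into k-mers and pair each k-mer with its GC fraction. GC fraction is used here purely as a stand-in numeric feature; the physicochemical properties and matching procedure actually used by MIXBend are not described in this record.

```python
def kmer_tokenize(seq: str, k: int = 6) -> list[str]:
    """Hypothetical helper: split a DNA sequence into overlapping k-mers (stride 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def gc_fraction(kmer: str) -> float:
    """Fraction of G/C bases in a k-mer (stand-in feature, not MIXBend's)."""
    return sum(base in "GC" for base in kmer) / len(kmer)


def kmer_features(seq: str, k: int = 6) -> list[tuple[str, float]]:
    """Pair each k-mer token with a per-k-mer numeric feature (assumed design)."""
    return [(kmer, gc_fraction(kmer)) for kmer in kmer_tokenize(seq, k)]


if __name__ == "__main__":
    print(kmer_tokenize("ACGTACGT", k=3))
    # ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT']
    print(kmer_features("ACGTACGT", k=3)[:2])
```

Keeping the token list and the feature list aligned position by position is what would let a downstream module compare the two modalities k-mer by k-mer; how MIXBend itself performs that comparison is not stated here.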
Details
- Language :
- English
- ISSN :
- 1362-4962
- Volume :
- 52
- Issue :
- 6
- Database :
- MEDLINE
- Journal :
- Nucleic acids research
- Publication Type :
- Academic Journal
- Accession number :
- 38375921
- Full Text :
- https://doi.org/10.1093/nar/gkae099