TExCNN: Leveraging Pre-Trained Models to Predict Gene Expression from Genomic Sequences.
- Source :
- Genes. Dec2024, Vol. 15 Issue 12, p1593. 14p.
- Publication Year :
- 2024
Abstract
- Background/Objectives: Understanding the relationship between DNA sequences and gene expression levels is of significant biological importance. Recent advancements have demonstrated the ability of deep learning to predict gene expression levels directly from genomic data. However, traditional methods are limited by basic word encoding techniques, which fail to capture the inherent features and patterns of DNA sequences. Methods: We introduce TExCNN, a novel framework that integrates the pre-trained models DNABERT and DNABERT-2 to generate word embeddings for DNA sequences. We partitioned the DNA sequences into manageable segments and computed their respective embeddings using the pre-trained models. These embeddings were then used as inputs to our deep learning framework, which was based on a convolutional neural network. Results: TExCNN outperformed current state-of-the-art models, achieving an average R² score of 0.622, compared to the 0.596 score achieved by the DeepLncLoc model, which is based on the Word2Vec model and a text convolutional neural network. Furthermore, when the sequence length was extended from 10,500 bp to 50,000 bp, TExCNN achieved an even higher average R² score of 0.639. The prediction accuracy improved further when additional biological features were incorporated. Conclusions: Our experimental results demonstrate that the use of pre-trained models for word embedding generation significantly improves the accuracy of predicting gene expression. The proposed TExCNN pipeline performs optimally with longer DNA sequences and is adaptable for both cell-type-independent and cell-type-dependent predictions. [ABSTRACT FROM AUTHOR]
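The partitioning step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the non-overlapping split, and the toy segment length are assumptions made for illustration, since the abstract does not state them. The overlapping k-mer tokens mirror the input style of the original DNABERT; DNABERT-2 uses a different (byte-pair) tokenizer, and the embedding and CNN stages are omitted here.

```python
from typing import List


def partition_sequence(seq: str, segment_len: int) -> List[str]:
    """Split a DNA sequence into consecutive, non-overlapping segments.

    The last segment may be shorter than segment_len.
    segment_len is a hypothetical parameter, not a value from the paper.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [seq[i:i + segment_len] for i in range(0, len(seq), segment_len)]


def kmer_tokens(segment: str, k: int = 6) -> List[str]:
    """Return overlapping k-mers (stride 1) of a segment.

    Segments shorter than k yield an empty list.
    """
    return [segment[i:i + k] for i in range(len(segment) - k + 1)]


if __name__ == "__main__":
    toy_sequence = "ACGT" * 5  # 20 bp toy input, not real genomic data
    for segment in partition_sequence(toy_sequence, segment_len=8):
        print(segment, kmer_tokens(segment, k=6))
```

In a full pipeline, each segment's tokens would be passed to the pre-trained model to obtain an embedding, and the per-segment embeddings would then form the input to the convolutional network.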
Details
- Language :
- English
- ISSN :
- 20734425
- Volume :
- 15
- Issue :
- 12
- Database :
- Academic Search Index
- Journal :
- Genes
- Publication Type :
- Academic Journal
- Accession number :
- 181911019
- Full Text :
- https://doi.org/10.3390/genes15121593