
PromGER: Promoter Prediction Based on Graph Embedding and Ensemble Learning for Eukaryotic Sequence

Authors :
Yan Wang
Shiwen Tai
Shuangquan Zhang
Nan Sheng
Xuping Xie
Source :
Genes, Vol 14, Iss 7, p 1441 (2023)
Publication Year :
2023
Publisher :
MDPI AG, 2023.

Abstract

Promoters are non-coding DNA regions around the transcription start site that regulate gene transcription. Because of their key role in gene function and transcriptional activity, the accurate prediction of promoter sequences and their core elements is a crucial research area in bioinformatics. Models based on machine learning and deep learning have been developed for promoter prediction. However, these models neither mine the deeper biological information of promoter sequences nor consider the complex relationships among promoter sequences. In this work, we propose a novel prediction model called PromGER to predict eukaryotic promoter sequences. First, for each promoter sequence, PromGER uses four types of feature-encoding methods to extract local information within the sequence. Second, according to the potential relationships among promoter sequences, the full set of promoter sequences is constructed as a graph. Third, graph-embedding methods at three different scales are applied to obtain global feature information from the graph more comprehensively. Finally, combining the local and global features of the sequences, PromGER analyzes and predicts promoter sequences through a tree-based ensemble-learning framework. Compared with seven existing methods, PromGER improved average specificity by 13%, accuracy by 10%, Matthews correlation coefficient by 16%, precision by 4%, F1 score by 6%, and AUC by 9%. In addition, this study interprets PromGER using the t-distributed stochastic neighbor embedding (t-SNE) method and SHapley Additive exPlanations (SHAP) value analysis, which demonstrates the interpretability of the model.
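
The idea of combining local and global features can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the four encodings, the graph-construction rule, or the three embedding methods, so everything below is an assumption chosen for illustration. Here k-mer frequencies stand in for a local encoding, a cosine-similarity threshold stands in for the graph construction, and normalized node degree stands in for a graph embedding. The function names (`kmer_frequencies`, `build_graph`, `combined_features`) and the parameters `k` and `threshold` are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=2):
    """Local feature (assumed encoding): normalized counts of every DNA k-mer."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for short input
    return [counts[m] / total for m in kmers]


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(features, threshold=0.8):
    """Assumed rule: link two sequences when their local features are similar."""
    n = len(features)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(features[i], features[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def combined_features(seqs, k=2, threshold=0.8):
    """Concatenate each sequence's local vector with one graph-derived value."""
    local = [kmer_frequencies(s, k) for s in seqs]
    graph = build_graph(local, threshold)
    denom = max(len(seqs) - 1, 1)
    # Normalized degree is a crude stand-in for a learned graph embedding.
    return [local[i] + [len(graph[i]) / denom] for i in range(len(seqs))]


# Toy sequences, invented for illustration only.
seqs = ["TATAAAGGCGCGTATAAA", "TATAAAGGCGCCTATAAA", "GGGGCCCCGGGGCCCCGG"]
features = combined_features(seqs)
```

Each row of `features` holds 16 dinucleotide frequencies followed by one graph-derived value. In a full pipeline, rows like these would be passed to a tree-based ensemble classifier, which this sketch leaves out.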

Details

Language :
English
ISSN :
20734425
Volume :
14
Issue :
7
Database :
Directory of Open Access Journals
Journal :
Genes
Publication Type :
Academic Journal
Accession number :
edsdoj.2a4cdbe8db5b499188ddf3de43fccc9c
Document Type :
article
Full Text :
https://doi.org/10.3390/genes14071441