
dPromoter-XGBoost: Detecting promoters and strength by combining multiple descriptors and feature selection using XGBoost

Authors :
Hongfei, Li
Lei, Shi
Wentao, Gao
Zixiao, Zhang
Lichao, Zhang
Yuming, Zhao
Guohua, Wang
Source :
Methods. 204:215-222
Publication Year :
2022
Publisher :
Elsevier BV, 2022.

Abstract

Promoters, which are responsible for stimulating the transcription and expression of specific genes, play an irreplaceable role in biological processes and genetics. Promoter abnormalities have been found in some diseases, and the level of promoter-binding transcription factors can be used as a marker before a disease occurs. Hence, detecting promoters from DNA sequences has important biological significance; in particular, distinguishing strong promoters can help to elucidate differences in gene expression and the mechanisms of specific diseases. With the introduction of third-generation sequencing, experimental labeling of promoters can no longer keep pace with the speed of sequencing. Many computational models have been designed to fill this gap and identify unlabeled DNA. However, their feature representations rely on a single type of descriptor and therefore cannot fully reflect the information contained in the original samples. To avoid this information loss, we propose a computational model that represents samples jointly through multiple descriptors and feature selection. Notably, a new feature descriptor called the K-mer word vector is defined. The promoter model, built on multiple feature descriptors dominated by the K-mer word vector, achieves performance similar to existing methods, and its sensitivity of 85.72% indicates that it identifies promoters more effectively than other methods. Furthermore, the promoter strength model surpasses published methods, and its accuracy of 77.00% greatly improves the ability to distinguish between strong and weak promoters.
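
The abstract does not define the K-mer word vector or list the other descriptors, so the following is only a minimal sketch of the general idea of combining several sequence descriptors into one feature vector. It uses plain normalized k-mer frequencies, which is an assumption standing in for the paper's descriptors; the function names `kmer_frequencies` and `combine_descriptors` and the choice of k = 1, 2, 3 are hypothetical and not taken from the paper.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in fixed lexicographic order.

    Assumption: plain k-mer frequency, not the paper's K-mer word vector.
    """
    sequence = sequence.upper()
    # Enumerate all len(alphabet)**k possible k-mers so every sequence
    # maps to a vector of the same length and ordering.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[w] / windows for w in kmers]


def combine_descriptors(sequence, ks=(1, 2, 3)):
    """Concatenate k-mer frequency vectors for several values of k."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(sequence, k))
    return features


if __name__ == "__main__":
    vector = combine_descriptors("TTGACATATAATGCGC")
    print(len(vector))  # 4 + 16 + 64 features
```

A vector of this kind could then be passed to a gradient-boosted classifier, whose feature importances are one common basis for feature selection; whether the paper proceeds exactly this way is not stated in the abstract.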

Details

ISSN :
1046-2023
Volume :
204
Database :
OpenAIRE
Journal :
Methods
Accession number :
edsair.doi.dedup.....d0dbe98670745d4df933aa8445158647