Back to Search Start Over

iDHS-RGME: Identification of DNase I hypersensitive sites by integrating information on nucleotide composition and physicochemical properties.

Authors :
Jin J
Feng J
Source :
Biochemical and biophysical research communications [Biochem Biophys Res Commun] 2024 Nov 19; Vol. 734, pp. 150618. Date of Electronic Publication: 2024 Aug 29.
Publication Year :
2024

Abstract

As pivotal markers of chromatin accessibility, DNase I hypersensitive sites (DHSs) intimately link to fundamental biological processes encompassing gene expression regulation and disease pathogenesis. Developing efficient and precise algorithms for DHSs identification holds paramount importance for unraveling genome functionality and elucidating disease mechanisms. This study innovatively presents iDHS-RGME, an Extremely Randomized Trees (Extra-Trees)-based algorithm that integrates unique feature extraction techniques for enhanced DHSs prediction. Specifically, iDHS-RGME utilizes two feature extraction approaches: Reverse Complementary Kmer (RCKmer) and Geary Spatial Autocorrelation (GSA), which comprehensively capture sequence attributes from diverse angles, bolstering information richness and accuracy. To address data imbalance, Borderline-SMOTE is employed, followed by Maximum Information Coefficient (MIC) for meticulous feature selection. Comparative evaluations underscored the superiority of the Extra-Trees classifier, which was subsequently adopted for model prediction. Through rigorous five-fold cross-validation, iDHS-RGME achieved remarkable accuracies of 94.71 % and 95.07 % on two independent datasets, outperforming previous models in terms of both precision and effectiveness.<br />Competing Interests: Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.<br /> (Copyright © 2024 Elsevier Inc. All rights reserved.)

Details

Language :
English
ISSN :
1090-2104
Volume :
734
Database :
MEDLINE
Journal :
Biochemical and biophysical research communications
Publication Type :
Academic Journal
Accession number :
39222575
Full Text :
https://doi.org/10.1016/j.bbrc.2024.150618