
OriC-ENS: A sequence-based ensemble classifier for predicting origin of replication in S. cerevisiae.

Authors :
Azim SM
Haque MR
Shatabda S
Source :
Computational biology and chemistry [Comput Biol Chem] 2021 Jun; Vol. 92, pp. 107502. Date of Electronic Publication: 2021 Apr 26.
Publication Year :
2021

Abstract

DNA replication plays a crucial role in biological inheritance, ensuring that genetic information is passed on from parent to offspring. The site at which DNA replication begins, called the origin of replication (ORI), is important for understanding the molecular mechanisms of replication and for genomic analysis of DNA. Accurately identifying origins of replication is therefore key to understanding the biochemical and genomic properties of DNA. In this paper, we propose a new approach, OriC-ENS, that combines three sequence-based feature extraction techniques (K-mer, K-gapped Mono-Di, and K-gapped Di-Mono) with an ensemble classifier that uses majority voting to identify origins of replication. We train three SVM classifiers: one on the K-mer features, one on the K-gapped Mono-Di features, and one on the K-gapped Di-Mono features. Their predictions are then combined by majority voting. Experimental results on the S. cerevisiae dataset show that our method achieves an accuracy of 91.62%, outperforming other state-of-the-art methods by a significant margin. We also evaluated our method using the Matthews correlation coefficient (MCC), area under the curve (AUC), sensitivity, and specificity, on which it scored 0.83, 0.98, 0.90, and 0.92, respectively. We further evaluated our model on an independent test set of Schizosaccharomyces pombe sequences collected from OriDB, where it predicted origins of replication efficiently and with high precision. Our Python source code is available at https://github.com/MehediAzim/OriC-ENS.
(Copyright © 2021 Elsevier Ltd. All rights reserved.)
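The abstract names the feature types and the voting scheme but not their exact definitions. The following minimal, dependency-free sketch shows one plausible reading. It assumes that "K-gapped Mono-Di" counts patterns of a nucleotide X, then g arbitrary positions, then a dinucleotide YZ (for g = 1..K), and that "Di-Mono" is the mirror pattern XY, g gaps, Z. This convention is used by feature tools such as PyFeat, but the paper's exact gap range and normalisation may differ. All function names (`kmer_features`, `gapped_features`, `majority_vote`, `confusion_metrics`) are hypothetical. In OriC-ENS each feature vector trains its own SVM. That training step is left out here so the sketch runs on the standard library alone. Using scikit-learn's `SVC` for it would be a typical choice, but the abstract does not confirm this.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def words(n: int) -> list[str]:
    """All n-length nucleotide words in lexicographic order (AA, AC, ...)."""
    return ["".join(p) for p in product(ALPHABET, repeat=n)]


def kmer_features(seq: str, k: int) -> list[float]:
    """Frequency of each of the 4**k k-mers, normalised by the number of windows."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)
    return [counts[w] / total for w in words(k)]


def gapped_features(seq: str, max_gap: int, left: int, right: int) -> list[float]:
    """Assumed K-gapped pattern counts: a `left`-mer, g skipped positions, a `right`-mer.

    left=1, right=2 gives "Mono-Di"; left=2, right=1 gives "Di-Mono".
    Produces 4**(left+right) values per gap size g = 1..max_gap.
    """
    feats: list[float] = []
    for g in range(1, max_gap + 1):
        span = left + g + right
        n = max(len(seq) - span + 1, 0)
        counts = Counter(
            (seq[i:i + left], seq[i + left + g:i + span]) for i in range(n)
        )
        total = max(n, 1)
        feats.extend(counts[(a, b)] / total for a in words(left) for b in words(right))
    return feats


def majority_vote(*predictions: list[int]) -> list[int]:
    """Combine binary (0/1) labels from several classifiers by simple majority.

    With three voters, as in OriC-ENS, a tie cannot occur.
    """
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*predictions)]


def confusion_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC from binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / max(tp + tn + fp + fn, 1),
        "sensitivity": tp / max(tp + fn, 1),
        "specificity": tn / max(tn + fp, 1),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    seq = "ACGTTGCAACGTAGCT"
    print(len(kmer_features(seq, 3)))               # 64 values for k = 3
    print(len(gapped_features(seq, 2, 1, 2)))       # 2 gaps x 64 Mono-Di patterns
    votes = majority_vote([1, 0, 1], [1, 1, 0], [0, 0, 1])
    print(votes, confusion_metrics([1, 0, 1], votes))
```

Each feature extractor yields a fixed-length vector regardless of sequence length, which is what lets a separate classifier be trained per feature family before the three outputs are voted on.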

Details

Language :
English
ISSN :
1476-928X
Volume :
92
Database :
MEDLINE
Journal :
Computational biology and chemistry
Publication Type :
Academic Journal
Accession number :
33962169
Full Text :
https://doi.org/10.1016/j.compbiolchem.2021.107502