1. A Convolutional Neural Network-Based Approach to Identify the Origins of Replication in Saccharomyces Cerevisiae
- Author
Chengjin Zhang, Chen Jingui, Feng Wu, and Runtao Yang
- Subjects
0209 industrial biotechnology ,Artificial neural network ,biology ,Computer science ,viruses ,Feature vector ,Saccharomyces cerevisiae ,Inheritance (genetic algorithm) ,DNA replication ,02 engineering and technology ,Computational biology ,biochemical phenomena, metabolism, and nutrition ,Origin of replication ,biology.organism_classification ,Convolutional neural network ,Replication (computing) ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Gene
- Abstract
DNA replication is key to the inheritance of genetic information. Accurate, efficient, and rapid identification of the origins of replication (ORIs) is crucial for understanding the mechanism of DNA replication. In eukaryotes in particular, each gene sequence contains multiple ORIs, which allows more efficient replication. Although many predictors have been designed to identify eukaryotic ORIs, many of them target only gene sequences of a fixed length. In addition, their prediction accuracies are not satisfactory and leave considerable room for improvement. In view of these limitations, this study develops a convolutional neural network-based approach to identify ORIs of different lengths in Saccharomyces cerevisiae (S. cerevisiae). Borrowing from the field of Natural Language Processing (NLP), trinucleotide feature vectors are constructed with Word2vec to represent each trinucleotide, and these vectors are then used for ORI identification with a Text-Convolutional Neural Network. The method achieved an overall success rate of 88.3%, which demonstrates its effectiveness in identifying ORIs of any length.
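The trinucleotide representation mentioned in the abstract can be illustrated with a minimal sketch. It only shows how a DNA sequence of arbitrary length might be turned into a sequence of trinucleotide "words" and integer ids, which is the kind of input an embedding model such as Word2vec and a text CNN would consume. The overlapping window (stride of 1), the function names `trinucleotides` and `encode`, and the rule of skipping 3-mers with non-ACGT characters are assumptions made for illustration; the abstract does not specify them.

```python
from itertools import product


def trinucleotides(seq, stride=1):
    """Split a DNA sequence into 3-mers (overlapping when stride=1).

    The stride is an illustrative assumption, not taken from the paper.
    """
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, stride)]


# Vocabulary of all 64 possible trinucleotides over A, C, G, T.
VOCAB = {"".join(p): idx for idx, p in enumerate(product("ACGT", repeat=3))}


def encode(seq, stride=1):
    """Map a sequence to integer token ids.

    3-mers containing characters outside ACGT (e.g. N) are skipped;
    this handling is a hypothetical choice for the sketch.
    """
    return [VOCAB[t] for t in trinucleotides(seq, stride) if t in VOCAB]


if __name__ == "__main__":
    print(trinucleotides("ATGCGT"))
    print(encode("ATGCGT"))
```

Because the tokens are produced by a sliding window, sequences of different lengths simply yield token lists of different lengths, which is compatible with the abstract's goal of handling ORIs of any length.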
- Published
- 2020