GBDTCDA: Predicting circRNA-disease Associations Based on Gradient Boosting Decision Tree with Multiple Biological Data Fusion
- Source :
- International Journal of Biological Sciences
- Publication Year :
- 2019
Abstract
- Circular RNA (circRNA) is a closed-loop non-coding RNA molecule that plays a significant role in gene regulation. Many previous studies have shown that circRNAs can act as miRNA sponges, which makes circRNA a key factor in disease diagnosis, treatment and inference. However, traditional experimental approaches to verifying associations between circRNAs and diseases are time-consuming and costly, and few computational models exist for predicting potential circRNA-disease associations. This motivated us to propose a new computational model. In this study, we propose a machine learning based computational model, named GBDTCDA, that uses a Gradient Boosting Decision Tree with multiple biological data to predict circRNA-disease associations. The known circRNA-disease association data are downloaded from the CircR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). The feature vector of each circRNA-disease association pair is composed of four parts: the statistical information of different biological networks, the graph theory information of different biological networks, the network information of circRNA-disease associations, and circRNA nucleotide sequence information. We use these feature vectors to train the gradient boosting decision tree regression model, and adopt leave-one-out cross validation (LOOCV) to evaluate the performance of the model. GBDTCDA also obtains strong results when predicting circRNAs related to some common diseases: the Area Under the ROC Curve (AUC) values for basal cell carcinoma, non-small cell lung cancer and cervical cancer are 95.8%, 88.3% and 93.5%, respectively. To further illustrate the performance of GBDTCDA, a case study of breast cancer is also included. Thus, based on the experimental results and analyses, our proposed method GBDTCDA is a powerful tool for predicting potential circRNA-disease associations.
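The evaluation loop described in the abstract (gradient-boosted regression trees scored with LOOCV and AUC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the toy features, the use of depth-1 trees (stumps), the squared-error loss, and every function name and hyperparameter below are assumptions chosen only to keep the example short and self-contained.

```python
# Minimal sketch (NOT the GBDTCDA code): gradient boosting with regression
# stumps, evaluated by leave-one-out cross validation (LOOCV) and AUC.
# All data, names, and hyperparameters here are illustrative assumptions.

def fit_stump(X, r):
    """Best single-feature threshold split (least squared error) on targets r."""
    n, best = len(X), None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return None if best is None else best[1:]

def fit_gbdt(X, y, n_trees=20, lr=0.3):
    """Fit an additive model of stumps to the residuals of squared loss."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature can be split any further
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + lr * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]
    return base, lr, stumps

def predict(model, row):
    """Regression score for one feature vector."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)

def loocv_scores(X, y, **kwargs):
    """Score each sample with a model trained on all the other samples."""
    scores = []
    for i in range(len(X)):
        model = fit_gbdt(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **kwargs)
        scores.append(predict(model, X[i]))
    return scores

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 30 "association pairs" with 3 made-up features;
# a pair is labelled 1 when the first two features sum to more than 1.
X = [[(i * 7 % 30) / 30, (i * 11 % 30) / 30, (i * 13 % 30) / 30] for i in range(30)]
y = [1 if row[0] + row[1] > 1 else 0 for row in X]

scores = loocv_scores(X, y)
print("LOOCV AUC on toy data:", round(auc(y, scores), 3))
```

In practice a library implementation of gradient boosting would replace the hand-written stump learner, and the toy features would be replaced by the four feature groups listed in the abstract.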
- Subjects :
- circRNA-disease associations
Computer science
Feature vector
Machine learning
Applied Microbiology and Biotechnology
Cross-validation
03 medical and health sciences
Circular RNA
Humans
Molecular Biology
Ecology, Evolution, Behavior and Systematics
030304 developmental biology
0303 health sciences
Computational model
Biological data
Decision Trees
Computational Biology
Regression analysis
Cell Biology
RNA, Circular
Gradient Boosting
Artificial intelligence
multiple biological data
Biological network
Developmental Biology
Research Paper
Details
- ISSN :
- 14492288
- Volume :
- 15
- Issue :
- 13
- Database :
- OpenAIRE
- Journal :
- International Journal of Biological Sciences
- Accession number :
- edsair.doi.dedup.....1fbe9245ab635cc4770d903c780f8bc6