Back to Search Start Over

Machine Learning-Based Prediction of Orphan Genes and Analysis of Different Hybrid Features of Monocot and Eudicot Plants.

Authors :
Gao, Qijuan
Zhang, Xiaodan
Yan, Hanwei
Jin, Xiu
Source :
Electronics (2079-9292); Mar2023, Vol. 12 Issue 6, p1433, 14p
Publication Year :
2023

Abstract

Orphan genes (OGs) may evolve from noncoding sequences or be derived from older coding material. Some shares of OGs are present in all sequenced genomes, participating in the biochemical and physiological pathways of many species, while many of them may be associated with the response to environmental stresses and species-specific traits or regulatory patterns. However, identifying OGs is a laborious and time-consuming task. This paper presents an automated predictor, XGBoost-A2OGs (identification of OGs for angiosperm based on XGBoost), used to identify OGs for seven angiosperm species based on hybrid features and XGBoost. The precision and accuracy of the proposed model based on fivefold cross-validation and independent testing reached 0.90 and 0.91, respectively, outperforming other classifiers in cross-species validation via other models, namely, Random Forest, AdaBoost, GBDT, and SVM. Furthermore, by analyzing and subdividing the hybrid features into five sets, it was proven that different hybrid feature sets influenced the prediction performance of OGs involving eudicot and monocot groups. Finally, testing of small-scale empirical datasets of each species separately based on optimal hybrid features revealed that the proposed model performed better for eudicot groups than for monocot groups. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
20799292
Volume :
12
Issue :
6
Database :
Complementary Index
Journal :
Electronics (2079-9292)
Publication Type :
Academic Journal
Accession number :
162803885
Full Text :
https://doi.org/10.3390/electronics12061433