Ensemble Machine Learning Techniques Using Computer Simulation Data for Wild Blueberry Yield Prediction

Authors :
Hayam R. Seireg
Yasser M. K. Omar
Fathi E. Abd El-Samie
Adel S. El-Fishawy
Ahmed Elmahalawy
Source :
IEEE Access, Vol 10, Pp 64671-64687 (2022)
Publication Year :
2022
Publisher :
IEEE, 2022.

Abstract

Precision agriculture is a challenging task to achieve. Several studies have been conducted to forecast agricultural yields using machine learning algorithms (MLA), but few studies have used ensemble machine learning algorithms (EMLA). In the current study, we use a dataset generated by a computer simulation program and meteorological data obtained over 30 years from Maine, USA. The primary goal of this research is to increase forecast accuracy using the most informative features, in support of efforts to overcome hunger challenges. We adopted stacking regression (SR) and cascading regression (CR) with a novel combination of MLA based on the wild blueberry dataset. We used features that indicated the best regulation for wild blueberry agroecosystems. Four feature selection techniques are applied, namely variance inflation factor (VIF), sequential forward feature selection (SFFS), sequential backward elimination feature selection (SBEFS), and extreme gradient boosting based on feature importance (XFI). We applied Bayesian optimization to popular MLA to obtain the best hyperparameters for accurate wild blueberry yield prediction. The SR used a two-layer structure: level-0 containing light gradient boosting machine (LGBM), gradient boost regression (GBR), and extreme gradient boosting (XGBoost), and level-1 providing the output prediction using a Ridge regressor. The CR uses the same MLA as the SR, but arranged in series: each stage receives the newest prediction as an additional input, and the prediction from the previous stage is removed. We assessed the CR and SR according to the root mean square error (RMSE) and coefficient of determination ($R^{2}$). In the results, the proposed SR showed the best performance, with $R^{2}$ of 0.984 and RMSE of 179.898, compared with another study that reported $R^{2}$ of 0.938 and RMSE of 343.026 on the seven features selected by XFI.
The SR achieved the highest $R^{2}$ of 0.985 both on all features and on the features selected by the SBEFS. Our SR outperformed both the CR and another study on wild blueberry yield prediction.

Details

Language :
English
ISSN :
21693536
Volume :
10
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.16a6c5be8ac442e919aab796d4117f2
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2022.3181970