1. Predicting China's Maize Yield Using Multi-Source Datasets and Machine Learning Algorithms
- Author
Miao, L, Zou, Y, Cui, X, Kattel, GR, Shang, Y, and Zhu, J
- Abstract
Timely and accurate prediction of grain yield can help ensure regional and global food security. The scientific community is gradually advancing the prediction of regional-scale maize yield. However, studies that combine multiple datasets with simple and accurate methods to predict regional-scale maize yield remain relatively rare. Here, we used multi-source datasets (climate, satellite, and soil), the lasso algorithm, and machine learning methods (random forest, support vector machine, extreme gradient boosting, BP neural network, long short-term memory network, and K-nearest neighbor regression) to predict China’s county-level maize yield. Using multi-source datasets significantly improved the accuracy of maize yield prediction compared with using a single-source dataset. We found that the machine learning methods were superior to the lasso algorithm, and that random forest, extreme gradient boosting, and support vector machine were the most preferable methods for maize yield prediction in China (R² ≥ 0.75, RMSE = 824–875 kg/ha, MAE = 626–651 kg/ha). The climate dataset contributed more to the prediction of maize yield, while the satellite dataset contributed to tracking the maize growth process. However, the methods’ accuracies and the dominant variables affecting maize growth varied across agricultural regions in different geographic locations. Our research serves as an important effort to examine the feasibility of multi-source datasets and machine learning techniques for regional-scale maize yield prediction. In addition, the methodology proposed here provides guidance for reliable yield prediction of other crops.
- Published
- 2024