1. Enhancing XGBoost's accuracy in soil organic matter prediction through feature fusion.
- Author
He, Shaofang, Zhou, Li, Xie, Hongxia, and Tan, Siqiao
- Abstract
Soil organic matter (SOM) content serves as a crucial indicator for assessing soil fertility and quality, making accurate and efficient prediction methods paramount. Visible near-infrared reflectance (vis–NIR) spectroscopy has been pivotal in predicting SOM content. However, soil profile data obtained during soil sample collection can provide additional insights into organic matter, suggesting that using either data source alone may not be optimal. This study investigated whether fusing vis–NIR and soil profile properties could enhance the performance of the extreme gradient boosting (XGBoost) algorithm in predicting SOM content. The sample set was sourced from paddy soils in Changsha and Zhuzhou, China. Three modeling approaches (XGBoost built on LASSO-selected vis–NIR spectral features (LF-XGBoost), on profile features (PF-XGBoost), and on fused features (FF-XGBoost)) were compared using a randomly split sample set and fivefold cross-validation (fivefold CV), and evaluated with the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). Compared with LF-XGBoost and PF-XGBoost, FF-XGBoost predicted SOM content better, indicating that the fused features improved SOM prediction. On the randomly split dataset, FF-XGBoost achieved an R² of 0.897, RMSE of 3.746, and MAE of 2.935, with R² improvements of 31% and 24% over LF-XGBoost and PF-XGBoost, respectively. In fivefold CV, FF-XGBoost achieved an R²CV of 0.806, RMSECV of 5.136, and MAECV of 1.913, with R²CV improvements of 11% and 51% over LF-XGBoost and PF-XGBoost, respectively. According to Shapley additive explanations (SHAP), variations in 'Color_class', 'Profile_level', and wavelength '767' within the fused features had the greatest impact on FF-XGBoost's output. FF-XGBoost also achieved higher prediction accuracy than other commonly used regression algorithms. This study focused only on paddy soils in Changsha and Zhuzhou and employed well-established modeling methods. These results can serve as a catalyst for further research into new feature fusion techniques, advanced modeling methods, and the transferability of findings to other soil landscapes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
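The abstract names the stages of the comparison (LASSO selection of vis–NIR bands, profile features, their fusion, a gradient-boosted regressor, fivefold CV, and R², RMSE, MAE) but not how they were implemented. The following is a minimal sketch of how such an LF/PF/FF comparison could be wired up, assuming scikit-learn. Everything specific here is an illustrative assumption rather than the authors' setup: the synthetic data, the column layout, the integer-coded stand-ins for 'Color_class' and 'Profile_level', the `LassoCV`-based selector, one-hot encoding of profile classes, and scikit-learn's `GradientBoostingRegressor` used as a stand-in for XGBoost. Metrics are computed on pooled out-of-fold predictions, which may differ from how the paper aggregates fold scores.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

N_BANDS = 50  # hypothetical number of vis-NIR bands after preprocessing
SPEC_COLS = list(range(N_BANDS))   # spectral columns
PROF_COLS = [N_BANDS, N_BANDS + 1]  # profile columns (class codes)


def make_synthetic_data(n=200, seed=0):
    """Synthetic stand-in data: spectra plus two integer-coded profile classes."""
    rng = np.random.default_rng(seed)
    spectra = rng.normal(size=(n, N_BANDS))
    color_class = rng.integers(0, 4, size=n)    # hypothetical 'Color_class' codes
    profile_level = rng.integers(0, 3, size=n)  # hypothetical 'Profile_level' codes
    som = (20 + 4 * spectra[:, 10] - 3 * spectra[:, 30]
           + 4 * color_class + 3 * profile_level + rng.normal(size=n))
    X = np.column_stack([spectra, color_class, profile_level])
    return X, som


def spectral_branch():
    # Standardize bands, then keep only bands with non-zero LASSO coefficients.
    return make_pipeline(StandardScaler(), SelectFromModel(LassoCV(cv=5)))


def profile_branch():
    # One-hot encode categorical profile properties; ignore classes unseen in a fold.
    return OneHotEncoder(handle_unknown="ignore")


def build_model(kind):
    """kind: 'LF' (spectral only), 'PF' (profile only) or 'FF' (fusion of both)."""
    branches = []
    if kind in ("LF", "FF"):
        branches.append(("spectral", spectral_branch(), SPEC_COLS))
    if kind in ("PF", "FF"):
        branches.append(("profile", profile_branch(), PROF_COLS))
    # Fusion = column-wise concatenation of the branch outputs (dense output).
    features = ColumnTransformer(branches, remainder="drop", sparse_threshold=0)
    # Stand-in for XGBoost; xgboost.XGBRegressor could be dropped in here.
    return make_pipeline(features, GradientBoostingRegressor(random_state=0))


def evaluate(kind, X, y):
    """Fivefold CV on pooled out-of-fold predictions: R2, RMSE, MAE."""
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    y_pred = cross_val_predict(build_model(kind), X, y, cv=cv)
    return {
        "R2": float(r2_score(y, y_pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y, y_pred))),
        "MAE": float(mean_absolute_error(y, y_pred)),
    }


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for kind in ("LF", "PF", "FF"):
        scores = evaluate(kind, X, y)
        print(kind, {k: round(v, 3) for k, v in scores.items()})
```

Because the selector and encoder sit inside the pipeline, they are refit on each training fold, so no information from the held-out fold leaks into feature selection. Swapping `GradientBoostingRegressor` for `xgboost.XGBRegressor`, which follows the scikit-learn estimator interface, would bring the sketch closer to the setup described; the printed numbers reflect only the synthetic data and say nothing about the study's results.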