Hasan, Mahmudul, Marjan, Md Abu, Uddin, Md Palash, Ibn Afjal, Masud, Kardy, Seifedine, Shaoqi Ma, and Yunyoung Nam
Agriculture is the most critical sector for food supply on the earth, and it is also responsible for supplying raw materials for other industrial productions. Currently, the growth in agricultural production is not sufficient to keep up with the growing population, which may result in a food shortfall for the world’s inhabitants. As a result, increasing food production is crucial for developing nations with limited land and resources. It is essential to select a suitable crop for a specific region to increase its production rate. Effective crop production forecasting in that area based on historical data, including environmental and cultivation areas, and crop production amount, is required. However, the data for such forecasting are not publicly available. As such, in this paper, we take a case study of a developing country, Bangladesh, whose economy relies on agriculture. We first gather and preprocess the data from the relevant research institutions of Bangladesh and then propose an ensemble machine learning approach, called K-nearest Neighbor Random Forest Ridge Regression (KRR), to effectively predict the production of the major crops (three different kinds of rice, potato, and wheat). KRR is designed after investigating five existing traditional machine learning (Support Vector Regression, Naïve Bayes, and Ridge Regression) and ensemble learning (Random Forest and CatBoost) algorithms. We consider four classical evaluation metrics, i.e., mean absolute error, mean square error (MSE), root MSE, and R², to evaluate the performance of the proposed KRR over the other machine learning models. It shows 0.009 MSE, 99% R² for Aus; 0.92 MSE, 90% R² for Aman; 0.246 MSE, 99% R² for Boro; 0.062 MSE, 99% R² for wheat; and 0.016 MSE, 99% R² for potato production prediction. The Diebold–Mariano test is conducted to check the robustness of the proposed ensemble model, KRR. In most cases, it shows 1% and 5% significance compared to the benchmark ML models. Lastly, we design a recommender system that suggests suitable crops for a specific land area for cultivation in the next season. We believe that the proposed paradigm will help the farmers and personnel in the agricultural sector leverage proper crop cultivation and production. [ABSTRACT FROM AUTHOR]