Witjes, Martijn, Parente, Leandro, J van Diemen, Chris, Hengl, Tomislav, Landa, Martin, Brodský, Lukáš, Halounova, Lena, Križan, Josip, Antonić, Luka, Ilie, Codrina Maria, Craciunescu, Vasile, Kilibarda, Milan, Antonijević, Ognjen, and Glušica, Luka
A spatiotemporal machine learning framework for automated prediction and analysis of long-term Land Use/Land Cover (LULC) dynamics is presented. The framework includes: (1) harmonization and preprocessing of spatial and spatiotemporal input datasets (GLAD Landsat, NPP/VIIRS), including five million harmonized LUCAS and CORINE Land Cover-derived training samples; (2) model building based on spatial k-fold cross-validation and hyper-parameter optimization; (3) prediction of the most probable class, class probabilities, and model variance of predicted probabilities per pixel; and (4) LULC change analysis on time-series of produced maps. The spatiotemporal ensemble model consists of a random forest, a gradient boosted tree classifier, and an artificial neural network, with a logistic regressor as meta-learner. The results show that the most important variables for mapping LULC in Europe are seasonal aggregates of Landsat green and near-infrared bands, multiple Landsat-derived spectral indices, long-term surface water probability, and elevation. Spatial cross-validation of the model indicates consistent performance across multiple years, with overall accuracy (a weighted F1-score) of 0.49, 0.63, and 0.83 when predicting 43 (level-3), 14 (level-2), and five (level-1) classes, respectively. Additional experiments show that spatiotemporal models generalize better to unknown years, outperforming single-year models on known-year classification by 2.7% and on unknown-year classification by 3.5%. The accuracy assessment using 48,365 independent test samples shows an 87% match with the validation points. Results of time-series analysis (time-series of LULC probabilities and NDVI images) suggest forest loss in large parts of Sweden, the Alps, and Scotland. Positive and negative trends in NDVI generally match the land degradation and land restoration classes, with “urbanization” showing the most negative NDVI trend. An advantage of using spatiotemporal ML is that the fitted model can be used to predict