Pierre Defourny, Guerric Le Maire, Nataliia Kussul, Nicolas Bellemans, Sergii Skakun, Sergey Bartalev, Diego de Abelleyra, Mykola Lavreniuk, François Waldner, Terrence Newby, Margareth Simoes, Zvi Hochman, Santiago R. Verón, Commonwealth Scientific and Industrial Research Organisation [Canberra] (CSIRO), Université Catholique de Louvain = Catholic University of Louvain (UCL), Agricultural Research Council (ARC), Instituto Nacional de Tecnología Agropecuaria (INTA), Russian Academy of Sciences [Moscow] (RAS), Space Research Institute, Partenaires INRAE, Ecologie fonctionnelle et biogéochimie des sols et des agro-écosystèmes (UMR Eco&Sols), Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut de Recherche pour le Développement (IRD)-Institut National de la Recherche Agronomique (INRA), Universidade Estadual de Campinas (UNICAMP), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Universidade do Estado do Rio de Janeiro [Rio de Janeiro] (UERJ), Dept Geog Sci., University of Maryland [College Park], University of Maryland System, CSIRO Future Science Platform 'GrainCast', CAPES, project 'Characterizing And Predicting Biomass Production In Sugarcane And Eucalyptus Plantations In Brazil' (FAPESP-Microsoft Research) : (2014/50715-9), CESOSO project (TOSCA program Grant of the French Space Agency, CNES), European Project: 603719,EC:FP7:ENV,FP7-ENV-2013-two-stage,SIGMA(2013), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut de Recherche pour le Développement (IRD)-Institut National de la Recherche Agronomique (INRA)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro)-Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), Universidade Estadual de Campinas = University of Campinas (UNICAMP), Centro Nacional de Pesquisa em Energia e Materiais = Brazilian Center for Research in Energy and Materials (CNPEM), Département Performances des systèmes de production et de transformation tropicaux (Cirad-PERSYST), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad), and UCL - SST/ELI/ELIE - Environmental Sciences
Cropland maps derived from satellite imagery have become a common source of information to estimate food production, support land use policies, and measure the environmental impacts of agriculture. Cropland classification models are typically calibrated with data collected from roadside surveys, which enable the sampling of large areas at a relatively low cost. However, such surveys risk providing biased data because environmental and management gradients may not be fully captured from road networks, thereby violating the assumption that the calibration data are representative. Although roadside sampling is widely adopted, its potential biases have so far not been thoroughly addressed. In this study, we looked for evidence of these biases by comparing three sampling strategies: Random sampling, Roadside sampling, and Transect sampling, a spatially constrained variant of Roadside sampling. In all three strategies, non-cropland data are randomly distributed because they can be photo-interpreted. Based on reference maps at 30 m in four study sites, we followed a Monte Carlo approach to generate multiple realizations of each sampling strategy for ten sample sizes. The effect of the sampling strategy was then assessed in terms of the representativeness of the data set collected and the accuracy of the resulting maps. Results showed that data sets obtained from Roadside sampling were significantly less representative than those obtained from Random sampling, but the resulting maps were only marginally less accurate (2% difference). Transect sampling captured systematically less variability than Random or Roadside sampling, which led to differences in accuracy as large as 15%. The effect of sample size on accuracy varied across sites but generally leveled off after reaching 3000 pixels. Increasing the size of Transect samples improved the classification accuracy but not sufficiently to match the performance of the other sampling strategies. 
Finally, we found that Random and Roadside training sets with similar representativeness yielded comparable accuracy. Therefore, we conclude that roadside sampling can be a viable source of training data for cropland mapping if the range of environmental and management gradients is surveyed. This underlines the importance of survey planning to identify the routes that capture the most variability.
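
The Monte Carlo comparison of sampling strategies described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the toy grid, the road layout, the buffer width, the single north-south gradient, and the decile-coverage score used as a stand-in for representativeness are all assumptions made for illustration, and every name in the code is hypothetical.

```python
import random

# All values below are hypothetical and chosen only for illustration.
N_ROWS, N_COLS = 100, 100      # toy raster, not the 30 m reference maps
ROAD_ROWS = (10, 30, 50)       # assumed east-west roads
BUFFER = 2                     # rows observable on either side of a road
N_BINS = 10                    # gradient split into deciles


def gradient(row):
    """Assumed environmental gradient that increases from north to south."""
    return row / (N_ROWS - 1)


def candidate_pixels(strategy):
    """Return the pixels a given strategy is allowed to sample from."""
    all_pixels = [(r, c) for r in range(N_ROWS) for c in range(N_COLS)]
    if strategy == "random":
        return all_pixels
    # "transect" is modeled here as roadside sampling limited to one road.
    roads = ROAD_ROWS if strategy == "roadside" else ROAD_ROWS[:1]
    return [(r, c) for r, c in all_pixels
            if any(abs(r - road) <= BUFFER for road in roads)]


def coverage(sample):
    """Fraction of gradient deciles containing at least one sampled pixel."""
    bins = {min(int(gradient(r) * N_BINS), N_BINS - 1) for r, _ in sample}
    return len(bins) / N_BINS


def monte_carlo(strategy, sample_size, n_runs=50, seed=0):
    """Average decile coverage over repeated realizations of one strategy."""
    rng = random.Random(seed)
    pool = candidate_pixels(strategy)
    scores = [coverage(rng.sample(pool, sample_size)) for _ in range(n_runs)]
    return sum(scores) / n_runs


if __name__ == "__main__":
    for name in ("random", "roadside", "transect"):
        print(name, monte_carlo(name, sample_size=200))
```

In this toy setting the roads cross only part of the gradient, so roadside samples cannot reach every decile and the single-road transect reaches even fewer. This mirrors the abstract's qualitative ordering, but it says nothing about the magnitudes reported in the study.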