
The GeoLifeCLEF 2023 Dataset to evaluate plant species distribution models at high spatial resolution across Europe

Authors :
Botella, Christophe
Deneu, Benjamin
Marcos Gonzalez, Diego
Servajean, Maximilien
Larcher, Théo
Estopinan, Joaquim
Leblanc, César
Bonnet, Pierre
Joly, Alexis
Scientific Data Management (ZENITH)
Inria Sophia Antipolis - Méditerranée (CRISAM)
Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
ADVanced Analytics for data SciencE (ADVANSE)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Université Paul-Valéry - Montpellier 3 (UPVM)
Botanique et Modélisation de l'Architecture des Plantes et des Végétations (UMR AMAP)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche pour le Développement (IRD [France-Sud])-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Université de Montpellier (UM)
Département Systèmes Biologiques (Cirad-BIOS)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)
European Project: 101060693,GUARDEN
European Project: 101060639,MAMBO
Publication Year :
2023
Publisher :
HAL CCSD, 2023.

Abstract

The full dataset is freely available for academic or other non-commercial use at the following link (perennial repository): https://lab.plantnet.org/seafile/d/936fe4298a5a4f4c8dbd/

The difficulty of measuring or predicting species community composition at fine spatio-temporal resolution and over large spatial scales severely hampers our ability to understand species assemblages and take appropriate conservation measures. Despite the progress in species distribution modeling (SDM) over the past decades, SDMs are only beginning to integrate high-resolution remote sensing data, and their predictions are still impaired by the many biases and heterogeneous quality of the available biodiversity observations, most often opportunistic presence-only data. We designed a European-scale dataset covering around 10K plant species to calibrate and evaluate SDM predictions of species composition in space and time at high spatial resolution (~10 m), and to study their spatial transferability. For model training, we extracted and harmonized 5 million heterogeneous presence-only observations from selected GBIF datasets and 5 thousand exhaustive presence-absence surveys, both sampled during the 2017-2021 period. We associated them with various environmental rasters classically used in SDMs, as well as with 10 x 10 m resolution RGB and near-infrared satellite images and 20-year time series of climatic variables and satellite point values. The evaluation dataset is based on 20K standardized presence-absence surveys separated from the training set with a spatial block hold-out procedure. The GeoLifeCLEF 2023 dataset is open access and is the first benchmark for researchers aiming to improve the prediction of plant species composition at a very fine spatial grain and at continental scale. It is also a space to explore new ways of combining massive and diverse species observations with environmental information at various scales.
Innovative AI-based approaches, in particular, should be among the most interesting methods to experiment with on the GeoLifeCLEF 2023 dataset.
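
The spatial block hold-out mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function name spatial_block_split, the square longitude/latitude grid, the 0.5-degree block size, and the 20% test fraction are all assumptions chosen for illustration. The idea shown is only that whole grid blocks, rather than individual points, are assigned to the held-out set, so that evaluation sites are spatially separated from training sites.

```python
import random


def spatial_block_split(points, block_deg=0.5, test_fraction=0.2, seed=0):
    """Split (lon, lat) points into train/test sets by whole grid blocks.

    Hypothetical helper: block_deg and test_fraction are illustrative
    assumptions, not values taken from the GeoLifeCLEF 2023 dataset.
    """

    def block_of(lon, lat):
        # Index of the square grid cell containing the point.
        return (int(lon // block_deg), int(lat // block_deg))

    # Shuffle the distinct blocks reproducibly and hold out a fraction of them.
    blocks = sorted({block_of(lon, lat) for lon, lat in points})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    n_test = max(1, round(len(blocks) * test_fraction))
    test_blocks = set(blocks[:n_test])

    # Every point follows its block, so no block appears in both sets.
    train, test = [], []
    for lon, lat in points:
        target = test if block_of(lon, lat) in test_blocks else train
        target.append((lon, lat))
    return train, test
```

Calling spatial_block_split on a list of (longitude, latitude) pairs returns two lists whose points never share a grid block. A real benchmark split would likely also need to handle projection, block size relative to spatial autocorrelation, and balance of species across blocks, none of which this sketch attempts.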

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.od.......165..da9d1c82c96f289b308e04f2069aecb4