Herrero-Langreo, Ana, Gorretta, Nathalie, Tisseyre, Bruno, Gowen, Aoife, Xu, Jun-Li, Chaix, Gilles, and Roger, Jean-Michel
Hyperspectral (HS) images contain both spectral and spatial information about a sample. Typically, spectral information can be related to chemical and physical properties through multivariate regression models. Applying these models to HS images yields prediction maps, which provide an estimate of the modelled chemical information for each pixel of the image. This approach has wide applications in the food processing industry for online monitoring of product quality and process control. One of the main difficulties of an imaging setup is that the pixels are usually much smaller than the area required to obtain a wet chemistry reference value. This means that, unlike in point spectroscopy, the performance of the estimates cannot be evaluated by directly comparing observed and estimated values for each pixel. Moreover, the selection of regression model parameters, such as the number of latent variables (LV) in a partial least squares (PLS) model, cannot be assessed on a pixel basis either. Nonetheless, compared to point spectroscopy, HS imaging does provide information on the spatial distribution of the predicted values. The objective of this work is to propose a quantitative approach that uses the spatial information in prediction maps to support the evaluation of regression models applied to HS images. This approach is based on geostatistical indices, which decompose the total variance of a prediction map into two components: non-spatially structured and spatially structured variance, represented respectively by the nugget effect (C0) and the partial sill (C1). This strategy was tested on a simulated dataset and two real case studies. Geostatistical indices of the prediction maps were compared to model performance metrics for PLS models with an increasing number of LV.
As a result, this work establishes a connection between performance estimates of linear regression models and the spatial decomposition of variance in prediction maps when the ground truth to be estimated is spatially structured. The presented study [1] made it possible to evaluate HS imaging models not only from average estimates, which can be compared to reference values, but also from the spatial structure of the prediction maps. This approach does not require ground truth values and could serve as a source of information to support the choice of optimal calibration options, such as the number of LV or the pre-treatments, complementing the traditional visual inspection of prediction maps with quantitative and objective metrics. Further work could explore the effect of spatial structure at different scales in the data. The approach could also be applied to evaluate variations in other model parameters, such as different spectral pre-treatment methods.
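
To make the decomposition into nugget effect (C0) and partial sill (C1) concrete, the following Python sketch estimates both from the empirical semivariogram of a prediction map. It is an illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, pixel pairs are taken along rows and columns only (assuming a roughly isotropic map with no masked pixels), C0 is approximated by extrapolating the first few lags linearly to lag zero, and the sill is approximated by the mean semivariance over the larger lags instead of by fitting a variogram model. The prediction map itself is synthetic: a smoothed random field plus white noise.

```python
import numpy as np


def empirical_semivariogram(pred_map, max_lag):
    """Semivariance of a 2-D prediction map for pixel lags 1..max_lag.

    Pairs are formed along rows and columns only and pooled, which
    assumes the map is roughly isotropic and contains no masked pixels.
    """
    lags = np.arange(1, max_lag + 1)
    gamma = np.empty(max_lag)
    for i, h in enumerate(lags):
        d_horiz = pred_map[:, h:] - pred_map[:, :-h]  # pairs h pixels apart within a row
        d_vert = pred_map[h:, :] - pred_map[:-h, :]   # pairs h pixels apart within a column
        sq = np.concatenate([d_horiz.ravel() ** 2, d_vert.ravel() ** 2])
        gamma[i] = 0.5 * sq.mean()
    return lags, gamma


def nugget_and_partial_sill(lags, gamma, n_fit=3):
    """Crude estimates of C0 and C1 (assumption: no variogram model is fitted).

    C0: intercept of a straight line through the first n_fit lags.
    Sill: mean semivariance over the upper half of the lags, which is
    only meaningful if max_lag exceeds the range of the spatial structure.
    """
    _, intercept = np.polyfit(lags[:n_fit], gamma[:n_fit], 1)
    c0 = max(intercept, 0.0)
    sill = gamma[len(lags) // 2:].mean()
    c1 = max(sill - c0, 0.0)
    return c0, c1


# Synthetic prediction map (illustrative data only, not from the study).
rng = np.random.default_rng(0)
w = 15  # side of the moving-average window, in pixels
white = rng.normal(size=(300 + w - 1, 300 + w - 1))
kernel = np.ones(w) / w
smooth = np.apply_along_axis(np.convolve, 0, white, kernel, mode="valid")
smooth = np.apply_along_axis(np.convolve, 1, smooth, kernel, mode="valid")
structured = w * smooth                                # spatially structured part, variance close to 1
noise = rng.normal(scale=0.5, size=structured.shape)   # unstructured part, variance 0.25
pred_map = structured + noise

lags, gamma = empirical_semivariogram(pred_map, max_lag=40)
c0, c1 = nugget_and_partial_sill(lags, gamma)
print(f"C0 = {c0:.3f}, C1 = {c1:.3f}, C0 / (C0 + C1) = {c0 / (c0 + c1):.3f}")
```

In this spirit, the same two functions could be applied to the prediction maps obtained with each number of LV, and the resulting C0 and C1 values tracked alongside the usual model performance metrics. For real data, a dedicated geostatistics package with a fitted variogram model would likely be a more robust choice than the linear extrapolation used here.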