PROSAC as a selection tool for SO-PLS regression: A strategy for multi-block data fusion.

Authors :
Diaz-Olivares, Jose A.
Bendoula, Ryad
Saeys, Wouter
Ryckewaert, Maxime
Adriaens, Ines
Fu, Xinyue
Pastell, Matti
Roger, Jean-Michel
Aernouts, Ben
Source :
Analytica Chimica Acta. Aug2024, Vol. 1319, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Spectral data from multiple sources can be integrated into multi-block fusion chemometric models, such as sequentially orthogonalized partial least squares (SO-PLS), to improve the prediction of sample quality features. Pre-processing techniques are often applied to mitigate extraneous variability unrelated to the response variables. However, selecting suitable pre-processing methods and identifying informative data blocks become increasingly complex and time-consuming when dealing with a large number of blocks. The problem addressed in this work is the efficient pre-processing, selection, and ordering of data blocks for targeted applications in SO-PLS. We introduce the PROSAC–SO–PLS methodology, which employs pre-processing ensembles with response-oriented sequential alternation calibration (PROSAC). This approach identifies the best pre-processed data blocks and their sequential order for specific SO-PLS applications. The method uses a stepwise forward selection strategy, facilitated by the rapid Gram-Schmidt process, to prioritize the blocks that yield the lowest prediction residuals. To validate the approach, we present results for three empirical near-infrared (NIR) datasets. Comparative analyses were performed against partial least squares (PLS) regressions on single-block pre-processed datasets and against a methodology relying solely on PROSAC. The PROSAC–SO–PLS approach consistently outperformed these methods, yielding significantly lower prediction errors, as evidenced by a reduction in the root-mean-squared error of prediction (RMSEP) of 5 to 25 % for seven of the eight response variables analyzed. The PROSAC–SO–PLS methodology offers a versatile and efficient technique for ensemble pre-processing in NIR data modeling.
It enables the use of SO-PLS with minimal concern about pre-processing sequence or block order, and it effectively manages a large number of data blocks. This innovation significantly streamlines the data pre-processing and model-building processes, enhancing the accuracy and efficiency of chemometric models.

Summary of the PROSAC–SO–PLS methodology applied to one of the response variables (protein) of the miniS-milk dataset, which is one of the three datasets presented in this study. In this specific case, multiple blocks of spectral data, derived from multiple miniature spectrometers measuring the same milk samples simultaneously, are subjected to an ensemble of pre-processing techniques within the PROSAC framework. The five most variance-explaining blocks identified by PROSAC are subsequently used as input variables for SO-PLS, facilitating the construction of a model that optimizes block utilization for predicting the protein content in milk, with protein content reference values obtained from laboratory analyses.

[Display omitted]

• PROSAC and SO-PLS are combined for improved multi-block data fusion.
• PROSAC streamlines SO-PLS by easing pre-processing and block order concerns.
• PROSAC–SO–PLS effectively manages a large number of input data blocks.
• PROSAC–SO–PLS is tested on three NIR datasets, surpassing the existing state of the art.
• PROSAC–SO–PLS reduces prediction errors by 5–25 % for the evaluated NIR datasets.

[ABSTRACT FROM AUTHOR]
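
The stepwise forward selection with orthogonalization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions: principal-component scores stand in for PLS latent variables, blocks are ranked by their in-sample residual rather than a cross-validated prediction error, and the data are synthetic. The names `rmse`, `block_basis`, `forward_select_blocks`, the parameter `n_comp`, and the blocks `A`, `B`, `C` are all hypothetical.

```python
import numpy as np


def rmse(reference, predicted):
    """Root-mean-squared error between reference and predicted values."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((reference - predicted) ** 2)))


def block_basis(X, n_comp, tol=1e-10):
    """Orthonormal scores for the leading n_comp components of a block.

    Assumption: principal-component scores (via SVD) stand in for the PLS
    latent variables that SO-PLS would extract from each block.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    keep = s[:n_comp] > tol  # drop numerically empty directions
    return U[:, :n_comp][:, keep]


def forward_select_blocks(blocks, y, n_select, n_comp=3):
    """Greedy forward selection of data blocks.

    At each step the block leaving the smallest residual in y is chosen;
    y and every remaining block are then orthogonalized against it
    (a block-wise Gram-Schmidt deflation).
    """
    resid = np.asarray(y, dtype=float) - np.mean(y)
    work = {name: np.asarray(X, dtype=float) - np.mean(X, axis=0)
            for name, X in blocks.items()}
    order = []
    for _ in range(min(n_select, len(work))):
        best_name, best_Q, best_err = None, None, np.inf
        for name, X in work.items():
            Q = block_basis(X, n_comp)
            err = np.linalg.norm(resid - Q @ (Q.T @ resid))
            if err < best_err:
                best_name, best_Q, best_err = name, Q, err
        order.append(best_name)
        del work[best_name]
        # Remove everything the selected block explains from y ...
        resid = resid - best_Q @ (best_Q.T @ resid)
        # ... and from the blocks that are still candidates.
        for name in work:
            work[name] = work[name] - best_Q @ (best_Q.T @ work[name])
    return order, resid


# Synthetic example (assumed data): y depends strongly on block A,
# weakly on block B, and not at all on block C.
rng = np.random.default_rng(0)
n = 60
blocks = {name: rng.normal(size=(n, 3)) for name in ("A", "B", "C")}
y = 2.0 * blocks["A"][:, 0] + 0.5 * blocks["B"][:, 1] + 0.01 * rng.normal(size=n)

order, resid = forward_select_blocks(blocks, y, n_select=2, n_comp=3)
print(order, rmse(resid, np.zeros(n)))
```

In a realistic setting each pre-processed spectrum would be a separate candidate block, and the ranking criterion would more plausibly be a cross-validated error than the calibration residual used here, since high-dimensional spectral blocks can fit the calibration data almost perfectly.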

Details

Language :
English
ISSN :
0003-2670
Volume :
1319
Database :
Academic Search Index
Journal :
Analytica Chimica Acta
Publication Type :
Academic Journal
Accession number :
178884259
Full Text :
https://doi.org/10.1016/j.aca.2024.342965