1. Orthogonality constrained inverse regression to improve model selectivity and analyte predictions from vibrational spectroscopic measurements
- Author
Jahan B. Ghasemi, Carl Emil Eskildsen, Peter B. Skou, Age K. Smilde, Ensie Hosseini, Epidemiology and Data Science, APH - Methodology, APH - Personalized Medicine, Amsterdam Gastroenterology Endocrinology Metabolism, Biosystems Data Analysis (SILS, FNWI), and Freshwater and Marine Ecology (IBED, FNWI)
- Subjects
Analyte, Mean squared error, Calibration (statistics), Chemistry, Spectrum Analysis, NIPALS, PLS, Biochemistry, Regression, Analytical Chemistry, Orthogonality, Test set, Calibration, Ordinary least squares, Partial least squares regression, Inverse regression, Environmental Chemistry, Selectivity, Least-Squares Analysis, Orthogonality constraint, Maltose, Biological system, Infrared spectroscopy, Algorithms, Spectroscopy
- Abstract
In analytical chemistry, spectroscopy is attractive for high-throughput quantification, which often relies on inverse regression such as partial least squares regression. Owing to the multivariate nature of spectroscopic measurements, an analyte can be quantified in the presence of interferences. However, if the model is not fully selective against interferences, analyte predictions may be biased. The degree of model selectivity against an interferent is defined by the inner relation between the regression vector and the pure interfering signal. If the regression vector is orthogonal to the signal, this inner relation equals zero and the model is fully selective. The degree of model selectivity largely depends on calibration data quality. Strong correlations may deteriorate calibration data, resulting in poorly selective models. We show this using a fructose-maltose model system. Furthermore, we modify the NIPALS algorithm to improve model selectivity when calibration data are deteriorated. The modification incorporates a projection matrix into the algorithm, which constrains regression vector estimation to the null-space of known interfering signals. In this way, known interfering signals are handled explicitly, while unknown signals are accounted for by latent variables. We test the modified algorithm and compare it to the conventional NIPALS algorithm using both simulated and industrial process data. The industrial process data consist of mid-infrared measurements obtained on mixtures of beta-lactoglobulin (analyte of interest) with alpha-lactalbumin and caseinoglycomacropeptide (interfering species). The root mean squared error of beta-lactoglobulin predictions (% w/w) for a test set was 0.92 with the conventional NIPALS algorithm and 0.33 with the modified algorithm. Our modification of the algorithm returns simpler models with improved selectivity and analyte predictions.
This paper shows how known interfering signals may be utilized in a direct fashion, while benefitting from a latent variable approach. The modified algorithm can be viewed as a fusion between ordinary least squares regression and partial least squares regression and may be very useful when knowledge of some (but not all) interfering species is available.
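The idea of constraining the regression vector to the null-space of known interfering signals can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the constraint can be imposed by projecting the mean-centred spectra with P = I - K K⁺ before a standard PLS1 NIPALS loop, where K holds the known interferent signals as columns. The function names, the Gaussian peak shapes, and the simulated concentrations are all hypothetical and chosen only for illustration.

```python
import numpy as np

def null_space_projector(K):
    """P = I - K K^+ projects onto the orthogonal complement of the columns of K.

    K has shape (n_variables, n_interferents).
    """
    return np.eye(K.shape[0]) - K @ np.linalg.pinv(K)

def constrained_pls1(X, y, K, n_components):
    """PLS1 (NIPALS) on projected spectra, so that K.T @ b is numerically zero.

    Illustrative sketch only; assumes the constraint is applied by projecting X.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = (X - x_mean) @ null_space_projector(K)  # rows now lie in the null-space of K.T
    f = y - y_mean
    W, L, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)      # weight vector
        t = E @ w                   # scores
        tt = t @ t
        p = E.T @ t / tt            # X loadings
        c = f @ t / tt              # y loading
        E = E - np.outer(t, p)      # deflate X
        f = f - c * t               # deflate y
        W.append(w); L.append(p); q.append(c)
    W, L, q = np.array(W).T, np.array(L).T, np.array(q)
    b = W @ np.linalg.solve(L.T @ W, q)  # regression vector
    b0 = y_mean - x_mean @ b             # intercept
    return b, b0

# Simulated example (hypothetical data): one analyte, two known interferents.
rng = np.random.default_rng(0)
axis = np.linspace(0, 1, 60)

def peak(center):
    return np.exp(-0.5 * ((axis - center) / 0.08) ** 2)

s_analyte = peak(0.45)
K = np.column_stack([peak(0.5), peak(0.3)])      # known interferent signals
c = rng.uniform(0, 1, size=(40, 3))
c[:, 1] = c[:, 0] + 0.05 * rng.normal(size=40)   # interferent correlated with analyte
X = np.outer(c[:, 0], s_analyte) + c[:, 1:] @ K.T + 0.001 * rng.normal(size=(40, 60))

b, b0 = constrained_pls1(X, c[:, 0], K, n_components=1)
print("largest |K.T @ b|:", np.abs(K.T @ b).max())
```

Because every weight vector is built from projected data, the regression vector stays orthogonal to each column of K by construction, while any additional latent variables remain free to model interferents that are not listed in K.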
- Published
- 2021