30 results for "Sven Serneels"
Search Results
2. direpack: A Python 3 package for state-of-the-art statistical dimensionality reduction methods
- Author
- Emmanuel Jordy Menvouta, Sven Serneels, and Tim Verdonck
- Subjects
Dimensionality reduction, Projection pursuit, Sufficient dimension reduction, Robust statistics, Energy statistics, Statistical learning, Computer software, QA76.75-76.765
- Abstract
The direpack package brings a set of modern statistical dimensionality reduction techniques to the Python universe as a single, consistent package. Several of the methods included are only available as open source through direpack, whereas the package also offers competitive Python implementations of methods previously only available in other programming languages. In its present version, the package is structured in three subpackages for different approaches to dimensionality reduction: projection pursuit, sufficient dimension reduction and robust M estimators. As a corollary, the package also provides access to regularized regression estimators based on these reduced dimension spaces, as well as a set of classical and robust preprocessing utilities, including very recent developments such as generalized spatial signs. Finally, direpack has been written to be consistent with the scikit-learn API, so that the estimators can be included seamlessly into (statistical and/or machine) learning pipelines in that framework.
- Published
- 2023
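The abstract above stresses scikit-learn API consistency, so a minimal usage sketch may help. The top-level import and the sprm constructor arguments below reflect the direpack documentation as best recalled and should be treated as assumptions; the data are synthetic.

```python
# Minimal sketch: a direpack estimator inside a scikit-learn pipeline.
# Assumptions: `pip install direpack` and a top-level `sprm` export with
# an (n_components, eta) signature -- check the package docs to confirm.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from direpack import sprm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("sprm", sprm(n_components=2, eta=0.5)),  # eta tunes sparsity
])
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```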
3. Non-Fungible Token Transactions: Data and Challenges
- Author
- Jason B. Cho, Sven Serneels, and David S. Matteson
- Subjects
FOS: Economics and business, FOS: Computer and information sciences, Statistical Finance (q-fin.ST), Quantitative Finance - Statistical Finance, Applications (stat.AP), Statistics - Applications
- Abstract
Non-fungible tokens (NFT) have recently emerged as a novel blockchain-hosted financial asset class that has attracted major transaction volumes. Investment decisions rely on data, and on adequate preprocessing and analytics applied to them. Both owing to the non-fungible nature of the tokens and to a blockchain being the primary data source, NFT transaction data pose several challenges not commonly encountered in traditional financial data. Using data that consist of the transaction history of eight highly valued NFT collections, a selection of such challenges is illustrated: price differentiation by token traits, the possible existence of lateral swaps and wash trades in the transaction history, and, finally, severe volatility. While this paper merely scratches the surface of how data analytics can be applied in this context, the data and challenges laid out here may present opportunities for future research on the topic.
- Published
- 2023
4. Robust Multivariate Methods: The Projection Pursuit Approach.
- Author
- Peter Filzmoser, Sven Serneels, Christophe Croux, and Pierre J. Van Espen
- Published
- 2005
5. The Partial Robust M-approach.
- Author
- Sven Serneels, Christophe Croux, Peter Filzmoser, and Pierre J. Van Espen
- Published
- 2005
6. Practicable optimization for portfolios that contain nonfungible tokens
- Author
- Emmanuel Jordy Menvouta, Sven Serneels, and Tim Verdonck
- Subjects
Finance
- Published
- 2023
7. Detecting wash trading for nonfungible tokens
- Author
- Sven Serneels
- Subjects
Economics, Finance
- Abstract
Nonfungible tokens have recently developed into a new financial market segment notorious for wash trading activity. Wash trades are made when users swap tokens between two or more of their own wallet addresses, most often to boost asset prices and volumes artificially, or to harvest marketplace rewards. The presence of wash trades can entirely distort the apparent fair value of tokens in a collection. They should therefore be detected and excluded from further financial assessments such as token valuations and appraisals. This letter introduces three novel strategies to flag transactions for suspicious wash activity, tailored to the NFT markets.
- Published
- 2023
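The letter's three strategies are not spelled out in the abstract, so no attempt is made to reproduce them here. Purely as an illustration of the simplest signal the abstract describes (a token moving back to a wallet that already held it), a hedged pandas sketch with hypothetical column names:

```python
# Illustrative only -- NOT the letter's actual strategies. Flags sales in
# which the buyer already appears in the token's earlier trading history,
# the most basic round-trip (wash-trade) signal.
# Hypothetical columns: token_id, seller, buyer, timestamp.
import pandas as pd

def flag_roundtrips(trades: pd.DataFrame) -> pd.Series:
    """True where the buyer was already seen (as buyer or seller)
    in the same token's transaction history."""
    trades = trades.sort_values("timestamp")
    seen: dict = {}
    flags = []
    for row in trades.itertuples():
        wallets = seen.setdefault(row.token_id, set())
        flags.append(row.buyer in wallets)
        wallets.update((row.buyer, row.seller))
    return pd.Series(flags, index=trades.index)
```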
8. Review for 'Distribution‐Free Predictive Inference for PLS Regression with Applications to Molecular Descriptors Datasets'
- Author
- Sven Serneels
- Published
- 2022
9. Hybrid AI Models in Chemical Engineering – A Purpose-driven Perspective
- Author
- Arijit Chakraborty, Sven Serneels, Heiko Claussen, and Venkat Venkatasubramanian
- Published
- 2022
11. Sparse dimension reduction based on energy and ball statistics
- Author
- Sven Serneels, Emmanuel Jordy Menvouta, and Tim Verdonck
- Subjects
FOS: Computer and information sciences, Statistics and Probability, Variables, Computer science, Applied Mathematics, Dimensionality reduction, Nonparametric statistics, Estimator, Sufficient dimension reduction, Feature selection, Covariance, Computer Science Applications, Methodology (stat.ME), 62G05, 62H12, Divergence (statistics), Algorithm, Mathematics, Statistics - Methodology
- Abstract
As its name suggests, sufficient dimension reduction (SDR) aims to estimate a subspace from data that contains all information sufficient to explain a dependent variable. Ample approaches to SDR exist, some of the most recent of which rely on minimal to no model assumptions. These are defined according to an optimization criterion that maximizes a nonparametric measure of association. The original estimators are nonsparse, which means that all variables contribute to the model. However, in many practical applications, an SDR technique may be called for that is sparse and, as such, intrinsically performs sufficient variable selection (SVS). This paper examines how such a sparse SDR estimator can be constructed. Three variants are investigated, depending on different measures of association: distance covariance, martingale difference divergence and ball covariance. A simulation study shows that each of these estimators can achieve correct variable selection in highly nonlinear contexts, yet they are sensitive to outliers and computationally intensive. The study sheds light on the subtle differences between the methods. Two examples illustrate how these new estimators can be applied in practice, with a slight preference for the option based on martingale difference divergence in the bioinformatics example.
- Published
- 2020
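A schematic form of the optimization behind such estimators may clarify the construction; the display below is an editor's paraphrase of the distance-covariance variant, not a formula quoted from the paper.

```latex
% Schematic sparse SDR criterion (distance-covariance variant):
% maximize dependence between projected predictors and the response,
% with an l1 penalty inducing sufficient variable selection.
\hat{B} = \operatorname*{arg\,max}_{B :\; B^{\top}\hat{\Sigma}_x B = I_h}
  \mathcal{V}^{2}\!\left(XB,\, y\right) \;-\; \lambda \sum_{j,k} \lvert B_{jk} \rvert
```

Here \(\mathcal{V}^2\) denotes distance covariance (replaceable by martingale difference divergence or ball covariance) and \(\lambda\) controls the degree of sparsity.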
12. Outlyingness: Which variables contribute most?
- Author
- Tim Verdonck, Sebastiaan Höppner, Sven Serneels, and Michiel Debruyne
- Subjects
Statistics and Probability, Clustering high-dimensional data, Computer science, Perspective (graphical), Univariate, Least squares, Regression, Theoretical Computer Science, Computational Theory and Mathematics, Partial least squares regression, Outlier, Anomaly detection, Data mining, Statistics, Probability and Uncertainty
- Abstract
Outlier detection is an inevitable step in most statistical data analyses. However, the mere detection of an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, will typically flag the entire case as outlying, or attribute a specific case weight to the entire case. In practice, particularly in high dimensional data, the outlier will most likely not be outlying along all of its variables, but just along a subset of them. If so, the scientific question why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier's outlyingness. Thereby, it helps the analyst understand in which way an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that the problem of estimating that direction can be rewritten as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by estimating the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably by the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed method is demonstrated to perform well both on simulated data and real life examples.
- Published
- 2018
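For a classical (nonsparse, non-robust) analogue of the direction the abstract refers to: maximizing (a'(x0 - mu))^2 / (a' Sigma a) gives a proportional to Sigma^{-1}(x0 - mu) by Cauchy-Schwarz. The sketch below uses that closed form as a simplified stand-in for the paper's sparse SPLS/SNIPLS route and ranks variables by their contribution along the direction.

```python
# Simplified stand-in for the paper's method: use the closed-form
# direction of maximal (Mahalanobis-type) outlyingness and rank the
# per-variable contributions. The paper instead solves the equivalent
# least squares problem sparsely via SPLS/SNIPLS.
import numpy as np

def outlyingness_direction(X: np.ndarray, x0: np.ndarray) -> np.ndarray:
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    a = np.linalg.solve(Sigma, x0 - mu)
    return a / np.linalg.norm(a)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
x0 = np.array([0.0, 6.0, 0.0, 0.0, 0.0])      # outlying along variable 1
a = outlyingness_direction(X, x0)
contrib = np.abs(a * (x0 - X.mean(axis=0)))    # per-variable contribution
print(np.argsort(contrib)[::-1])               # variable 1 ranked first
```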
13. Multivariate constrained robust M-regression for shaping forward curves in electricity markets
- Author
- Pieter Segaert, Tim Verdonck, Peter Leoni, and Sven Serneels
- Subjects
Economics and Econometrics, Multivariate statistics, Efficient algorithm, M-estimator, General Business, Management and Accounting, Regression, Accounting, Outlier, Economics, Econometrics, Electricity market, Arbitrage, Electricity, Finance
- Abstract
In this paper, a multivariate constrained robust M‐regression method is developed to estimate shaping coefficients for electricity forward prices. An important benefit of the new method is that model arbitrage can be ruled out at an elementary level, as all shaping coefficients are treated simultaneously. Moreover, the new method is robust to outliers, such that the provided results are stable and not sensitive to isolated sparks or dips in the market. An efficient algorithm is presented to estimate all shaping coefficients at a low computational cost. To illustrate its good performance, the method is applied to German electricity prices.
- Published
- 2018
14. Sparse and robust PLS for binary classification
- Author
- Irene Hoffmann, Peter Filzmoser, Sven Serneels, and Kurt Varmuza
- Subjects
Applied Mathematics, Dimensionality reduction, Pattern recognition, Feature selection, Linear discriminant analysis, Analytical Chemistry, Robustness (computer science), Optimal discriminant analysis, Partial least squares regression, Outlier, Leverage (statistics), Artificial intelligence, Mathematics
- Abstract
Partial robust M regression (PRM), as well as its sparse counterpart sparse PRM, have been reported to be regression methods that foster a partial least squares-alike interpretation while having good robustness and efficiency properties, as well as a low computational cost. In this paper, the partial robust M discriminant analysis classifier is introduced, which consists of dimension reduction through an algorithm closely related to PRM and a consecutive robust discriminant analysis in the latent variable space. The method is further generalized to sparse partial robust M discriminant analysis by introducing a sparsity penalty on the estimated direction vectors. Thereby, an intrinsic variable selection is achieved, which yields a better graphical interpretation of the results, as well as more precise coefficient estimates, in case the data contain uninformative variables. Both methods are robust against leverage points within each class, as well as against adherence outliers (points that have been assigned a wrong class label). A simulation study investigates the effect of outliers, wrong class labels, and uninformative variables on the proposed methods and their classical PLS counterparts, and corroborates the robustness and sparsity claims. The utility of the methods is demonstrated on data from mass spectrometry analysis (time-of-flight secondary ion mass spectrometry) of meteorite samples.
- Published
- 2016
15. Case specific prediction intervals for tri-PLS1: The full local linearisation
- Author
- Klaas Faber, Tim Verdonck, Pierre J. Van Espen, and Sven Serneels
- Subjects
Computer. Automation, Propagation of uncertainty, Mathematical optimization, Process Chemistry and Technology, Univariate, Prediction interval, Generalized least squares, Regression, Computer Science Applications, Analytical Chemistry, Chemistry, Non-linear least squares, Partial least squares regression, Applied mathematics, Total least squares, Spectroscopy, Software, Mathematics
- Abstract
A new method to estimate case specific prediction uncertainty for univariate trilinear partial least squares (tri-PLS1) regression is introduced. This method is, from a theoretical point of view, the most exact finite sample approximation to true prediction uncertainty reported to date. Using the new method, different error sources can be propagated, an advantage that data driven approaches such as the bootstrap cannot offer. A concise example illustrates how the method can be applied. In the Appendix, efficient algorithms are presented to compute the required estimates.
- Published
- 2011
16. Robustified least squares support vector classification
- Author
- Tim Verdonck, Sven Serneels, and Michiel Debruyne
- Subjects
Computer. Automation, Structured support vector machine, Applied Mathematics, Pattern recognition, Linear discriminant analysis, Machine learning, Regression, Analytical Chemistry, Support vector machine, Chemistry, Robustness (computer science), Principal component analysis, Least squares support vector machine, Outlier, Artificial intelligence, Mathematics
- Abstract
Support vector machine (SVM) algorithms are a popular class of techniques to perform classification. However, outliers in the data can result in poor global misclassification rates. In this paper, we propose a method to identify such outliers in the SVM framework. A specific robust classification algorithm is proposed that adjusts the least squares SVM (LS-SVM). This yields better classification performance for heavy-tailed data and data containing outliers.
- Published
- 2009
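For context, the plain weighted LS-SVM classifier that such a robustification builds on can be written in a few lines: downweighting a case (v_i < 1) shrinks its influence on the fit. The weighting scheme actually proposed in the paper is not reproduced here; uniform weights give the ordinary LS-SVM.

```python
# Generic weighted LS-SVM classifier (RBF kernel), labels y in {-1, +1}.
# Dual system: [[0, y'], [y, Omega + diag(1/(gamma*v))]] [b; alpha] = [0; 1].
# Downweighting cases (v_i < 1) robustifies the fit; the paper's specific
# outlier-based weights are not reproduced here.
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, v=None, gamma=1.0, sigma=1.0):
    n = len(y)
    v = np.ones(n) if v is None else v
    Omega = np.outer(y, y) * rbf(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:], A[1:, 0] = y, y
    A[1:, 1:] = Omega + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(n))))
    b, alpha = sol[0], sol[1:]
    return lambda Xt: np.sign(rbf(Xt, X, sigma) @ (alpha * y) + b)
```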
17. TOMCAT: A MATLAB toolbox for multivariate calibration techniques
- Author
- Beata Walczak, Christophe Croux, Sven Serneels, K. Kaczmarek, Michal Daszykowski, and Piet Van Espen
- Subjects
Computer science, Calibration (statistics), Process Chemistry and Technology, Machine learning, Plot (graphics), Numbering, Toolbox, Computer Science Applications, Analytical Chemistry, Partial least squares regression, Principal component regression, Artificial intelligence, MATLAB, Algorithm, Spectroscopy, Software, Graphical user interface
- Abstract
We have developed a new user-friendly graphical interface for robust calibration with a collection of m-files, called TOMCAT (TOolbox for Multivariate CAlibration Techniques). The graphical interface and its routines are freely available and programmed in MATLAB 6.5, probably one of the most popular programming environments in the chemometrics community. The graphical interface allows a user to apply the implemented methods in an easy way and gives a straightforward possibility to visualize the obtained results. Several useful features are also implemented, such as interactive numbering of the displayed objects on a plot, viewing the content of the data, and easy transfer of the data between the toolbox and the MATLAB workspace. Among the implemented methods are Principal Component Analysis and its robust variant, Partial Least Squares, Continuum Power Regression, Partial Robust M-Regression, Robust Continuum Regression and Radial Basis Functions Partial Least Squares.
- Published
- 2007
18. Sparse Partial Robust M Regression
- Author
- Peter Filzmoser, Sven Serneels, Irene Hoffmann, and Christophe Croux
- Subjects
Biplot, Least trimmed squares, Analytical Chemistry, Robust regression, Partial least squares, Robustness (computer science), Partial least squares regression, Leverage (statistics), Total least squares, Robustness, Spectroscopy, Interpretability, Mathematics, Process Chemistry and Technology, Dimensionality reduction, Local regression, Pattern recognition, Sparse approximation, Regression, Computer Science Applications, Outlier, Artificial intelligence, Sparse estimation, Software
- Abstract
Sparse partial robust M regression is introduced as a new regression method. It is the first dimension reduction and regression algorithm that yields estimates with a partial least squares like interpretability that are sparse and robust with respect to both vertical outliers and leverage points. A simulation study underpins these claims. Real data examples illustrate the validity of the approach. (Chemometrics and Intelligent Laboratory Systems, 149, 50-59. http://dx.doi.org/10.1016/j.chemolab.2015.09.019)
- Published
- 2015
19. Bootstrap confidence intervals for trilinear partial least squares regression
- Author
- Sven Serneels and Pierre J. Van Espen
- Subjects
Chemistry, Estimator, Expression (computer science), Biochemistry, Confidence interval, Robust confidence intervals, Analytical Chemistry, Parameter identification problem, Partial least squares regression, Statistics, Confidence distribution, Environmental Chemistry, Spectroscopy, CDF-based nonparametric confidence interval
- Abstract
The bootstrap is a successful technique to obtain confidence limits for estimates where an exact expression is theoretically impossible to establish. Trilinear partial least squares regression (tri-PLS) is such an estimator; in the current paper we thus propose to apply the bootstrap in order to obtain confidence intervals for the predictions made by tri-PLS. By dint of an extensive simulation study, we show that bootstrap confidence intervals have desirable coverage. Finally, we apply the method to an identification problem of micro-organisms and show that, from the bootstrap confidence intervals, the organisms can be correctly identified up to a misclassification probability of 3.5%.
- Published
- 2005
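The bootstrap scheme described is generic enough to sketch. Below, a plain pairs-bootstrap percentile interval around scikit-learn's two-way PLSRegression, standing in for the paper's trilinear tri-PLS (which operates on three-way arrays); everything here is synthetic.

```python
# Pairs-bootstrap percentile interval for a single prediction.
# PLSRegression is a two-way stand-in for the paper's trilinear tri-PLS.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.2, size=60)
x_new = rng.normal(size=(1, 8))

preds = []
for _ in range(500):
    idx = rng.integers(0, len(y), size=len(y))        # resample cases
    model = PLSRegression(n_components=3).fit(X[idx], y[idx])
    preds.append(model.predict(x_new).item())
lo, hi = np.percentile(preds, [2.5, 97.5])            # 95% percentile CI
print(f"95% bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```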
20. Principal component analysis for data containing outliers and missing elements
- Author
- Sven Serneels and Tim Verdonck
- Subjects
Statistics and Probability, Computer. Automation, Covariance matrix, Applied Mathematics, Pattern recognition, Missing data, Computational Mathematics, Computational Theory and Mathematics, Robustness (computer science), Expectation–maximization algorithm, Principal component analysis, Outlier, Projection pursuit, Principal component regression, Artificial intelligence, Mathematics
- Abstract
Two approaches are presented to perform principal component analysis (PCA) on data which contain both outlying cases and missing elements. First, an eigendecomposition of a covariance matrix which can deal with such data is proposed, but this approach is not fit for data where the number of variables exceeds the number of cases. Alternatively, an expectation robust (ER) algorithm is proposed so as to adapt the existing methodology for robust PCA to data containing missing elements. According to an extensive simulation study, the ER approach performs well for all data sizes concerned. Using simulations and an example, it is shown that, by virtue of the ER algorithm, the properties of the existing methods for robust PCA carry through to data with missing elements.
- Published
- 2008
21. How to construct a multiple regression model for data with missing elements and outlying objects
- Author
- Ivana Stanimirova, Pierre J. Van Espen, Sven Serneels, and Beata Walczak
- Subjects
Chemistry, Least trimmed squares, Regression analysis, Missing data, Biochemistry, Analytical Chemistry, Robust regression, Partial least squares regression, Outlier, Expectation–maximization algorithm, Statistics, Linear regression, Environmental Chemistry, Data mining, Spectroscopy
- Abstract
The aim of this study is to show the usefulness of robust multiple regression techniques implemented in the expectation maximization framework for successfully modelling data containing missing elements and outlying objects. In particular, results from a comparative study of partial least squares and partial robust M-regression models implemented in the expectation maximization algorithm are presented. The performance of the proposed approaches is illustrated on simulated data with and without outliers, containing different percentages of missing elements, and on a real data set. The obtained results suggest that the proposed methodology can be used for constructing satisfactory regression models in terms of their trimmed root mean squared errors.
- Published
- 2006
22. The Partial Robust M-approach
- Author
- Sven Serneels, Pierre J. Van Espen, Christophe Croux, and Peter Filzmoser
- Subjects
Computer. Automation, Current (mathematics), Computer science, Outlier, Path (graph theory), Population, Type (model theory), Education, Algorithm, Normality
- Abstract
The PLS approach is a widely used technique to estimate path models relating various blocks of variables measured from the same population. It is frequently applied in the social sciences and in economics. In this type of application, deviations from normality and outliers may occur, leading to an efficiency loss or even biased results. In the current paper, a robust path model estimation technique is proposed: the partial robust M (PRM) approach. Its benefits are illustrated in an example.
- Published
- 2006
23. Robust Multivariate Methods: The Projection Pursuit Approach
- Author
- Peter Filzmoser, Pierre J. Van Espen, Christophe Croux, and Sven Serneels
- Subjects
Computer. Automation, Multivariate statistics, Multivariate analysis, Multivariate analysis of variance, Computer science, Robustness (computer science), Projection pursuit, Statistics, Robust statistics, Estimator, Algorithm, Subspace topology
- Abstract
Projection pursuit was originally introduced to identify structures in multivariate data clouds (Huber, 1985). The idea of projecting data to a low-dimensional subspace can also be applied to multivariate statistical methods. The robustness of the methods can be achieved by applying robust estimators to the lower-dimensional space. Robust estimation in high dimensions can thus be avoided, which usually results in faster computation. Moreover, flat data sets, where the number of variables is much higher than the number of observations, can more easily be analyzed in a robust way. We will focus on the projection pursuit approach for robust continuum regression (Serneels et al., 2005). A new algorithm is introduced and compared with the reference algorithm as well as with classical continuum regression.
- Published
- 2006
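To make the projection pursuit idea concrete: a first robust "principal component" can be found by scanning candidate directions and keeping the one that maximizes a robust spread measure of the projected data, here the MAD. Random-direction scanning below is a toy simplification, not one of the algorithms compared in the paper.

```python
# Toy projection pursuit: search for the direction maximizing a robust
# scale (MAD) of the projections -- a robust first principal component.
import numpy as np

def robust_first_pc(X: np.ndarray, n_dirs: int = 2000, seed: int = 0):
    rng = np.random.default_rng(seed)
    Xc = X - np.median(X, axis=0)                 # robust centring
    best_dir, best_scale = None, -np.inf
    for _ in range(n_dirs):
        a = rng.normal(size=X.shape[1])
        a /= np.linalg.norm(a)
        proj = Xc @ a
        mad = np.median(np.abs(proj - np.median(proj)))
        if mad > best_scale:
            best_dir, best_scale = a, mad
    return best_dir, best_scale
```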
24. Spatial sign preprocessing: a simple way to impart moderate robustness to multivariate estimators
- Author
- Sven Serneels, Evert De Nolf, and Pierre J. Van Espen
- Subjects
General Chemical Engineering, Linear model, Estimator, Context (language use), General Chemistry, Library and Information Sciences, Covariance, Computer Science Applications, Transformation (function), Robustness (computer science), Partial least squares regression, Statistics, Algorithm, Sign (mathematics), Mathematics
- Abstract
The spatial sign is a multivariate extension of the concept of sign. Recently, multivariate estimators of covariance structures based on spatial signs have been examined by various authors. These new estimators are found to be robust to outlying observations. From a computational point of view, estimators based on spatial signs are very easy to implement, as they boil down to a transformation of the data to their spatial signs, from which the classical estimator is then computed. Hence, one can also consider the transformation to spatial signs to be a preprocessing technique, which ensures that the calibration procedure as a whole is robust. In this paper, we examine the special case of spatial sign preprocessing in combination with partial least squares regression, as the latter technique is frequently applied in the context of chemical data analysis. In a simulation study, we compare the performance of the spatial sign transformation to nontransformed data as well as to two robust counterparts of partial least squares regression. It turns out that the spatial sign transform is fairly efficient but has some undesirable bias properties. The method is applied to a recently published data set in the field of quantitative structure-activity relationships, where it is seen to perform as well as the previously described best linear model for these data.
- Published
- 2006
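The transformation itself is one line per case: centre, then scale to unit norm. A minimal sketch follows, with the column-wise median as a simple robust centre (the abstract does not fix the location estimate, so that choice is an assumption) and scikit-learn's PLSRegression as the downstream classical estimator.

```python
# Spatial sign preprocessing: centre the data, then project every case
# onto the unit sphere; a classical estimator fitted afterwards inherits
# moderate robustness. Median centring is an assumption.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def spatial_sign(X: np.ndarray) -> np.ndarray:
    Xc = X - np.median(X, axis=0)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # leave all-zero rows unchanged
    return Xc / norms

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 10))
X[:4] += 50                          # a few gross outliers
y = X[:, 0] + rng.normal(scale=0.1, size=80)
pls = PLSRegression(n_components=2).fit(spatial_sign(X), y)
```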
25. Influence properties of trilinear partial least squares regression
- Author
- Maarten Moens, Frank Blockhuys, Sven Serneels, Pierre J. Van Espen, and Paul Geladi
- Subjects
Chemometrics, Calibration (statistics), Applied Mathematics, Statistics, Partial least squares regression, Influence function, Regression, Plot (graphics), Analytical Chemistry, Mathematics, Prediction variance
- Abstract
In this article we derive an algorithm to compute the influence function for tri-PLS1 regression. Based on the influence function, we propose the squared influence diagnostic plot to assess the influence of individual samples on calibration and prediction. We illustrate the applicability of the squared influence diagnostic plot for tri-PLS1 on two different data sets which have previously been reported in the literature. Finally, we note that from the influence function, a new estimate of prediction variance can be obtained.
- Published
- 2005
26. Robust continuum regression
- Author
- Sven Serneels, Peter Filzmoser, Christophe Croux, and Pierre J. Van Espen
- Subjects
Yield, Squares, Efficiency, Distribution, Estimator, Analytical Chemistry, Robust regression, Partial least squares, Partial least squares regression, Statistics, Studies, Outliers, Total least squares, Robustness, Spectroscopy, Mathematics, Least-squares, Polynomial regression, Ordinary least squares, Data, Process Chemistry and Technology, Local regression, Regression, Computer Science Applications, Applications, Principal component regression, Simple linear regression, Projection-pursuit, Nonlinear regression, Software, Simulation
- Abstract
Several applications of continuum regression (CR) to non-contaminated data have shown that a significant improvement in predictive power can be obtained compared to the three standard techniques which it encompasses (ordinary least squares (OLS), principal component regression (PCR) and partial least squares (PLS)). For contaminated data, continuum regression may yield aberrant estimates due to its non-robustness with respect to outliers. Also, for data originating from a distribution which significantly differs from the normal distribution, continuum regression may yield very inefficient estimates. In the current paper, robust continuum regression (RCR) is proposed. To construct the estimator, an algorithm based on projection pursuit (PP) is proposed. The robustness and good efficiency properties of RCR are shown by means of a simulation study. An application to an X-ray fluorescence analysis of hydrometallurgical samples illustrates the method's applicability in practice. (Chemometrics and Intelligent Laboratory Systems, 76(2), 197-204.)
- Published
- 2005
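For reference, a common parametrization of the continuum regression criterion that RCR robustifies (editor's recollection of Stone and Brooks's form, not a formula quoted from the paper):

```latex
% Continuum regression direction for tuning parameter alpha in [0, 1):
w_\alpha = \operatorname*{arg\,max}_{\lVert w \rVert = 1}
  \operatorname{cov}^{2}(Xw, y)\,
  \operatorname{var}(Xw)^{\frac{\alpha}{1-\alpha} - 1}
```

so that alpha = 0 recovers the OLS-related direction (cov²/var), alpha = 1/2 gives PLS (cov²), and alpha → 1 approaches PCR (variance dominates). RCR replaces covariance and variance by robust counterparts and maximizes the criterion via projection pursuit.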
27. Partial robust M-regression
- Author
- Pierre J. Van Espen, Peter Filzmoser, Christophe Croux, and Sven Serneels
- Subjects
Yield, Squares, Robust statistics, Least trimmed squares, Efficiency, Distribution, M-estimators, Estimator, Least squares, Analytical Chemistry, Robust regression, Partial least squares, Spectrometric quantization, Variables, Partial least squares regression, Statistics, Methods, Studies, Outliers, Robustness, Spectroscopy, Least-squares, Mathematics, Ordinary least squares, Data, Process Chemistry and Technology, Optimal, Precision, Regression, Computer Science Applications, Chemistry, Multicollinearity, Applications, Calibration, Advantages, Prediction, Projection-pursuit, Simulation, Software, Model
- Abstract
Partial Least Squares (PLS) is a standard statistical method in chemometrics. It can be considered as an incomplete, or 'partial', version of the Least Squares estimator of regression, applicable when high or perfect multicollinearity is present in the predictor variables. The Least Squares estimator is well-known to be an optimal estimator for regression, but only when the error terms are normally distributed. In the absence of normality, and in particular when outliers are in the data set, other more robust regression estimators have better properties. In this paper, a 'partial' version of M-regression estimators is defined. If an appropriate weighting scheme is chosen, partial M-estimators become entirely robust to any type of outlying points, and are called Partial Robust M-estimators. It is shown that partial robust M-regression outperforms existing methods for robust PLS regression in terms of statistical precision and computational speed, while keeping good robustness properties. The method is applied to a data set consisting of EPXMA spectra of archaeological glass vessels. This data set contains several outliers, and the advantages of partial robust M-regression are illustrated: applying partial robust M-regression yields much smaller prediction errors for noisy calibration samples than PLS. On the other hand, if the data follow a normal model perfectly well, the loss in efficiency to be paid is very small. (Chemometrics and Intelligent Laboratory Systems, 79(1), 55-64.)
- Published
- 2005
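The "appropriate weighting scheme" mentioned above suggests an iteratively reweighted construction. Below is a bare-bones sketch of that flavour, with Huber-type residual weights around scikit-learn's PLSRegression; the published PRM algorithm also downweights leverage in the score space and uses specific weight functions, so treat this strictly as schematic.

```python
# Schematic PRM-flavoured iteration: fit PLS on sqrt(w)-scaled, weighted-
# centred data, update case weights from robust residuals, repeat.
# The published algorithm also handles leverage -- omitted here.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def prm_sketch(X, y, n_components=2, n_iter=20, c=1.345):
    w = np.ones(len(y))
    for _ in range(n_iter):
        mx = np.average(X, axis=0, weights=w)     # weighted centres
        my = np.average(y, weights=w)
        sw = np.sqrt(w)[:, None]
        pls = PLSRegression(n_components=n_components, scale=False)
        pls.fit((X - mx) * sw, (y - my) * sw.ravel())
        beta = pls.coef_.ravel()                  # weights cancel in beta
        r = y - my - (X - mx) @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12 # robust residual scale
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / u)          # Huber-type weights
    return beta, mx, my, w
```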
28. Calculation of PLS prediction intervals using efficient recursive relations for the Jacobian matrix
- Author
- Pierre J. Van Espen, Sven Serneels, and P. Lemberge
- Subjects
Applied Mathematics, Structure (category theory), Prediction interval, Estimator, Analytical Chemistry, Chemometrics, Linearization, Linear regression, Jacobian matrix and determinant, Partial least squares regression, Calculus, Applied mathematics, Mathematics
- Abstract
Several algorithms to calculate the vector of regression coefficients and the Jacobian matrix for partial least squares regression have been published. Whereas many efficient algorithms to calculate the regression coefficients exist, algorithms to calculate the Jacobian matrix are inefficient. Here we introduce a new, efficient algorithm for the Jacobian matrix, thus making the calculation of prediction intervals via a local linearization of the PLS estimator more practicable.
- Published
- 2004
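The role of the Jacobian is the standard first-order (delta-method) linearization: write the prediction as a function of all perturbed inputs and propagate their covariance through the derivative. Schematically, in the editor's notation:

```latex
% First-order variance approximation behind linearization-based
% PLS prediction intervals:
\hat{y}_0 = f(x_0; Z), \qquad
\operatorname{var}(\hat{y}_0) \approx J\, \Sigma_Z\, J^{\top},
\qquad J = \left.\frac{\partial f}{\partial Z}\right|_{\hat{Z}}
```

where Z collects the perturbed quantities (calibration data and the new measurement), Sigma_Z their error covariance, and J the Jacobian whose efficient recursive computation is the paper's contribution.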
29. Identification of micro-organisms by dint of the electronic nose and trilinear partial least squares regression
- Author
- Sven Serneels, Pierre J. Van Espen, Maarten Moens, and Frank Blockhuys
- Subjects
Electronic nose, Chemistry, Calibration (statistics), Prediction interval, Pattern recognition, Biochemistry, Regression, Analytical Chemistry, Chemometrics, Identification (information), Intensive care, Statistics, Partial least squares regression, Environmental Chemistry, Artificial intelligence, Spectroscopy
- Abstract
Ventilator-associated pneumonia is one of the most lethal infections occurring in intensive care units of hospitals. In order to obtain a faster method of diagnosis, we proposed to apply the electronic nose to cultures of the relevant micro-organisms. This allowed the analysis time to be halved. In the current paper, we focus on the application of some chemometrical tools which enhance the performance of the method. Trilinear partial least squares (tri-PLS) regression is used to perform calibration and is shown to produce satisfactory predictions. Sample specific prediction intervals are produced for each predicted value, which allows us to eliminate erroneous predictions. The method is applied to an external validation set and it is shown that only a single observation out of 22 is wrongly classified, so that the method is acceptable for inclusion in the clinical routine.
- Published
- 2004
30. Erratum to 'Influence properties of partial least squares regression' [Chemometr. Intell. Lab. Syst. 71 (2004) 13–20]
- Author
- Pierre J. Van Espen, Christophe Croux, and Sven Serneels
- Subjects
Process Chemistry and Technology, Partial least squares regression, Statistics, Spectroscopy, Software, Computer Science Applications, Analytical Chemistry, Mathematics
- Published
- 2004