1. A new strategy to prevent over-fitting in partial least squares models based on model population analysis.
- Author
- Deng, Bai-Chuan; Yun, Yong-Huan; Liang, Yi-Zeng; Cao, Dong-Sheng; Xu, Qing-Song; Yi, Lun-Zhao; Huang, Xin
- Subjects
- *
LEAST squares , *CHEMICAL models , *LATENT variables , *PREDICTION theory , *MATHEMATICAL models - Abstract
- Partial least squares (PLS) is one of the most widely used methods for chemical modeling. However, like many other methods with tunable parameters, it has a strong tendency to over-fit. Thus, a crucial step in PLS model building is to select the optimal number of latent variables (nLVs). Cross-validation (CV) is the most popular method for PLS model selection because it selects a model from the perspective of prediction ability. However, CV may not yield a clear minimum of prediction error, which makes model selection difficult. To solve this problem, we propose a new strategy for PLS model selection that combines the cross-validated coefficient of determination ($Q^2_{cv}$) and model stability (S). S is defined as the stability of the PLS regression vectors, which is obtained using model population analysis (MPA). The results show that, when a clear maximum of $Q^2_{cv}$ is not obtained, S can provide additional information about over-fitting and helps in finding the optimal nLVs. Compared with other regression-vector-based indicators such as the Euclidean 2-norm (B2), the Durbin-Watson statistic (DW) and the jaggedness (J), S is more sensitive to over-fitting. The model selected by our method has both good prediction ability and stability. [ABSTRACT FROM AUTHOR]
- Published
- 2015