1. Sparse regression with Multi-type Regularized Feature modeling
- Author
Katrien Antonio, Roel Verbelen, Sander Devriendt, and Tom Reynkens
- Subjects
Statistics and Probability, Generalized linear model, FOS: Computer and information sciences, Economics and Econometrics, Mathematical optimization, Level fusion, Optimization problem, Estimation theory, business.industry, Computer science, Regular polygon, 010103 numerical & computational mathematics, 01 natural sciences, Regularization (mathematics), Statistics - Computation, Methodology (stat.ME), 010104 statistics & probability, Analytics, 0101 mathematics, Statistics, Probability and Uncertainty, business, Statistics - Methodology, Computation (stat.CO), Sparse regression
- Abstract
Within the statistical and machine learning literature, regularization techniques are often used to construct sparse (predictive) models. Most regularization strategies only work for data where all predictors are treated identically, such as Lasso regression for (continuous) predictors treated as linear effects. However, many predictive problems involve different types of predictors and require a tailored regularization term. We propose a multi-type Lasso penalty that acts on the objective function as a sum of subpenalties, one for each type of predictor. As such, we allow for predictor selection and level fusion within a predictor in a data-driven way, simultaneously with the parameter estimation process. We develop a new estimation strategy for convex predictive models with this multi-type penalty. Our estimation procedure uses the theory of proximal operators and is computationally efficient: it partitions the overall optimization problem into easier-to-solve subproblems, each specific to a predictor type and its associated penalty. Earlier research applies approximations to non-differentiable penalties to solve the optimization problem. The proposed SMuRF algorithm removes the need for approximations and achieves higher accuracy and computational efficiency. This is demonstrated with an extensive simulation study and the analysis of a case study in insurance pricing analytics.
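The idea of splitting a penalized problem into one proximal subproblem per predictor type can be illustrated with a minimal sketch. This is not the SMuRF algorithm itself: it assumes a least-squares loss, a plain proximal gradient loop with a fixed step size, and only two illustrative subpenalties (Lasso and Group Lasso). All function names are hypothetical and chosen for this example.

```python
import numpy as np

def prox_lasso(v, t):
    """Soft-thresholding: proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_group_lasso(v, t):
    """Block soft-thresholding: proximal operator of t * ||v||_2."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def proximal_gradient(X, y, blocks, lam, n_iter=500):
    """Minimise 0.5/n * ||y - X b||^2 + lam * sum_j P_j(b_j).

    blocks is a list of (index_array, prox_function) pairs that
    partition the coefficients (an assumption of this sketch). Because
    the penalty is a sum over disjoint blocks, its proximal operator
    can be applied block by block.
    """
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad          # gradient step on the smooth loss
        for idx, prox in blocks:        # one small subproblem per block
            beta[idx] = prox(z[idx], step * lam)
    return beta

# Usage: one coefficient with a Lasso penalty, two with a Group Lasso penalty.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, 0.0]) + 0.1 * rng.normal(size=50)
blocks = [(np.array([0]), prox_lasso), (np.array([1, 2]), prox_group_lasso)]
beta_hat = proximal_gradient(X, y, blocks, lam=0.1)
```

Each block only needs its own proximal operator, so supporting another predictor type in this sketch amounts to supplying one more function of the form `prox(v, t)`.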
- Published
- 2021