Deterministic subsampling for logistic regression with massive data.
- Source :
- Computational Statistics. Apr 2024, Vol. 39, Issue 2, p709-732. 24p.
- Publication Year :
- 2024
- Abstract :
- For logistic regression with massive data, subsampling is an effective way to alleviate the computational challenge. In contrast to most existing methods in the literature, which select subsamples randomly, we propose to obtain subsamples deterministically. Specifically, we use leverage scores to measure the influence of each sample on model fitting and deterministically select the samples with the highest scores. We also propose a faster alternative that mimics the leverage scores with a simple and intuitive form. Our methods select subsamples suited to constructing a linear classification boundary and are therefore more efficient when the subsample size is small. We derive non-asymptotic properties of the two methods regarding the observed information, prediction accuracy, and parameter estimation accuracy. Extensive simulation studies and two real applications validate the theoretical results and demonstrate the superiority of our methods. [ABSTRACT FROM AUTHOR]
- Subjects :
- *PARAMETER estimation
- Language :
- English
- ISSN :
- 0943-4062
- Volume :
- 39
- Issue :
- 2
- Database :
- Academic Search Index
- Journal :
- Computational Statistics
- Publication Type :
- Academic Journal
- Accession number :
- 176079414
- Full Text :
- https://doi.org/10.1007/s00180-022-01319-z
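The abstract describes the selection rule only at a high level. A minimal NumPy sketch of leverage-based deterministic subsampling follows, under stated assumptions: the leverage score of row i is taken as the i-th diagonal entry of the hat matrix of W^{1/2}X, with W = diag(p_i(1 - p_i)) evaluated at a pilot estimate fitted on a small uniform sample. The pilot step and the helper names (`fit_logistic_newton`, `leverage_scores`, `deterministic_subsample_fit`) are hypothetical illustrations, not the authors' code, and the paper's faster approximate score is not reproduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_newton(X, y, n_iter=50, ridge=1e-6, tol=1e-10):
    """Newton-Raphson for logistic regression with a tiny ridge term for stability.

    X is expected to already contain an intercept column.
    """
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)                                   # weights p_i(1 - p_i)
        grad = X.T @ (y - p) - ridge * beta
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(d)   # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def leverage_scores(X, beta):
    """Assumed score: diagonal of the hat matrix of W^{1/2} X, W = diag(p(1 - p)) at beta."""
    p = sigmoid(X @ beta)
    Xw = X * np.sqrt(p * (1.0 - p))[:, None]
    Q, _ = np.linalg.qr(Xw)                # reduced QR, Q has shape (n, d)
    return np.sum(Q * Q, axis=1)           # h_i = squared norm of row i of Q; sums to d


def deterministic_subsample_fit(X, y, k, n_pilot=500, seed=0):
    """Hypothetical helper: pilot fit, score every row, keep the k top-scoring rows, refit."""
    rng = np.random.default_rng(seed)
    pilot = rng.choice(X.shape[0], size=n_pilot, replace=False)
    beta_pilot = fit_logistic_newton(X[pilot], y[pilot])
    h = leverage_scores(X, beta_pilot)
    idx = np.argsort(h)[-k:]               # deterministic: indices of the k largest scores
    return idx, h, fit_logistic_newton(X[idx], y[idx])


# Usage on synthetic data (illustrative only, not the paper's experiments).
rng = np.random.default_rng(42)
n, beta_true = 20_000, np.array([0.5, 1.0, -1.0, 0.5])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
idx, h, beta_hat = deterministic_subsample_fit(X, y, k=1_000)
```

Only the pilot draw is random here; once the scores are computed, the subsample is fixed by sorting rather than by sampling with probabilities proportional to the scores, which is the contrast with random subsampling that the abstract draws.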