
Deterministic subsampling for logistic regression with massive data.

Authors :
Song, Yan
Dai, Wenlin
Source :
Computational Statistics. Apr 2024, Vol. 39, Issue 2, pp. 709-732. 24 pp.
Publication Year :
2024

Abstract

For logistic regression with massive data, subsampling is an effective way to alleviate the computational challenge. In contrast to most existing methods in the literature, which select subsamples randomly, we propose to obtain subsamples in a deterministic way. More specifically, we use leverage scores to measure the influence of each sample on model fitting and deterministically select the samples with the highest scores. We also propose a faster alternative method that mimics the leverage scores with a simple and intuitive form. Our methods pick subsamples suited to constructing a linear classification boundary and hence are more efficient when the subsample size is small. We derive non-asymptotic properties of the two methods regarding the observed information, prediction, and parameter estimation accuracy. Extensive simulation studies and two real applications validate the theoretical results and demonstrate the superiority of our methods. [ABSTRACT FROM AUTHOR]
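The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact score formula, so the sketch assumes the standard GLM hat-matrix leverage, h_i = w_i x_i' (X'WX)^{-1} x_i with w_i = p_i(1 - p_i), evaluated at a pilot estimate fitted on a small uniform sample. The function names, the pilot size, and the simulated data are all hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton-Raphson fit of logistic regression; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)      # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        # Observed information X'WX, with a tiny ridge for numerical stability
        H = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta

def leverage_scores(X, beta):
    """Assumed score: GLM hat-matrix diagonal w_i * x_i' (X'WX)^{-1} x_i."""
    eta = np.clip(X @ beta, -30.0, 30.0)
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    M_inv = np.linalg.inv(X.T @ (X * w[:, None]))
    return w * np.einsum("ij,jk,ik->i", X, M_inv, X)

def deterministic_subsample(X, y, k, pilot_size=500, seed=0):
    """Return indices of the k rows with the highest scores, plus all scores."""
    rng = np.random.default_rng(seed)
    pilot = rng.choice(len(y), size=min(pilot_size, len(y)), replace=False)
    beta_pilot = fit_logistic(X[pilot], y[pilot])  # hypothetical pilot step
    scores = leverage_scores(X, beta_pilot)
    idx = np.argsort(scores)[-k:]                  # top-k, no random draw
    return idx, scores

# Simulated data (illustrative only): intercept plus two Gaussian covariates
rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 1.0, -1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

idx, scores = deterministic_subsample(X, y, k=200)
beta_sub = fit_logistic(X[idx], y[idx])            # fit on the subsample only
```

Given the pilot estimate, the selected indices are fully determined by the data, which is the contrast with random subsampling that the abstract draws. The faster alternative mentioned in the abstract is not sketched here because its form is not stated.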

Subjects

Subjects :
*PARAMETER estimation

Details

Language :
English
ISSN :
0943-4062
Volume :
39
Issue :
2
Database :
Academic Search Index
Journal :
Computational Statistics
Publication Type :
Academic Journal
Accession number :
176079414
Full Text :
https://doi.org/10.1007/s00180-022-01319-z