Online estimation of individual-level effects using streaming shrinkage factors

Authors :
Lianne Ippel
Jeroen K. Vermunt
Maurits Kaptein
Department of Methodology and Statistics
Institute of Data Science
RS: FSE DACS IDS
RS: FSE Studio Europa Maastricht
Source :
Computational Statistics & Data Analysis, 137, 16-32. Elsevier
Publication Year :
2019

Abstract

It has become increasingly easy to collect data from individuals over long periods of time. Examples include smartphone applications that track movements with GPS, web-log data that track individuals' browsing behavior, and longitudinal (cohort) studies in which many individuals are monitored over an extended period. All these datasets cover a large number of individuals and collect data on the same individuals repeatedly, which gives the data a nested structure. Moreover, the data collection is never 'finished', as new data keep streaming in. It is well known that predictions of an individual-level effect that combine the individual's own data with the data of all other individuals are better in terms of squared error than predictions that use only the individual mean. However, when the data are both nested and streaming and the outcome variable is binary, computing these individual-level predictions can be computationally challenging. Five computationally efficient estimation methods are developed and evaluated; they do not revisit "old" data but do account for the nested data structure. The methods are based on existing shrinkage factors. A shrinkage factor is used to predict an individual-level effect (i.e., the probability of scoring a 1) by weighting the individual mean against the mean over all data points. The performance of the existing and newly developed shrinkage factors is compared in a simulation study. While the existing methods differ in their prediction accuracy, the differences in accuracy between the novel shrinkage factors and the existing methods are extremely small. The novel methods are, however, computationally much more appealing. © 2019 Elsevier B.V. All rights reserved.
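The general idea of a streaming shrinkage prediction can be illustrated with a minimal Python sketch. It is not one of the five methods from the paper: the class name `StreamingShrinkage`, the constant `prior_weight`, and the shrinkage factor `prior_weight / (prior_weight + n_j)` are assumptions chosen only for illustration. The sketch keeps running counts per individual and overall, so each new binary observation updates the summaries without revisiting earlier data, and the prediction weights the individual mean against the mean over all data points.

```python
from collections import defaultdict


class StreamingShrinkage:
    """Running shrinkage prediction for a binary outcome, updated one point at a time."""

    def __init__(self, prior_weight=5.0):
        # prior_weight is a hypothetical tuning constant, not a quantity from the paper.
        self.prior_weight = prior_weight
        self.n_total = 0              # number of observations seen so far
        self.sum_total = 0            # number of 1s seen so far
        self.n = defaultdict(int)     # observations per individual
        self.s = defaultdict(int)     # number of 1s per individual

    def update(self, individual, y):
        """Absorb one new observation y (0 or 1); old data points are never stored."""
        self.n_total += 1
        self.sum_total += y
        self.n[individual] += 1
        self.s[individual] += y

    def predict(self, individual):
        """Predicted probability of scoring a 1 for the given individual."""
        if self.n_total == 0:
            raise ValueError("no data observed yet")
        grand_mean = self.sum_total / self.n_total
        n_j = self.n.get(individual, 0)
        if n_j == 0:
            # No data for this individual yet: fall back on the overall mean.
            return grand_mean
        individual_mean = self.s[individual] / n_j
        # Assumed shrinkage factor: close to 1 for few observations, close to 0 for many.
        shrinkage = self.prior_weight / (self.prior_weight + n_j)
        return shrinkage * grand_mean + (1 - shrinkage) * individual_mean


model = StreamingShrinkage(prior_weight=2.0)
for person, outcome in [("a", 1), ("a", 1), ("b", 0), ("b", 0), ("b", 0), ("b", 1)]:
    model.update(person, outcome)
```

In this toy stream, individual "a" has an individual mean of 1.0 from two observations and the overall mean is 0.5, so `model.predict("a")` is pulled halfway toward the overall mean and returns 0.75. An individual with more observations is shrunk less.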

Details

Language :
English
ISSN :
0167-9473
Volume :
137
Database :
OpenAIRE
Journal :
Computational Statistics & Data Analysis
Accession number :
edsair.doi.dedup.....27d9c530d8fa2254eea45c2d5a3763dd