1. Modeling Preferential Recruitment for Respondent-Driven Sampling
- Author
- McLaughlin, Katherine Rumjahn
- Subjects
- Statistics, Markov chain Monte Carlo, matching model, network sampling, peer referral, respondent-driven sampling, social network analysis
- Abstract
Respondent-driven sampling (RDS) is a network sampling methodology used worldwide to sample key populations at high risk for HIV/AIDS who often practice stigmatized/illegal behaviors and are not typically reachable by conventional sampling techniques. In RDS, study participants recruit their peers to enroll, resulting in a sampling mechanism that is unknown to researchers. Current estimators for RDS data require many assumptions about the sampling process, including that recruiters choose people from their network uniformly at random to participate in the study. However, this is likely not true in practice. We believe that people recruit based on observable covariates, such as age, frequency of interaction, geography, socioeconomic status, or social capital.
To model preferential recruitment, I develop a sequential two-sided rational-choice framework, referred to as the RCPR model. At each wave of recruitment, each recruiter has a utility for selecting each peer, and symmetrically each peer has a utility for being recruited by each recruiter. Each person also has a utility for selecting themself (that is, for not recruiting or not participating). People in the network behave in a way that maximizes their utility given the constraints of the network and the restrictions on recruitment. Although a person's utility is not observed, it can be modeled as a linear combination of observable nodal or dyadic covariates plus unobserved pair-specific heterogeneities. This framework allows generative probabilistic network models to be created for the RDS recruitment process. The models can incorporate observable characteristics of the population and have interpretable parameters, which greatly increases the sophistication with which the RDS sampling mechanism can be modeled. Inference can be made about the preference coefficients by maximizing the likelihood of the observed recruitment chain given the observed covariates.
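As an illustration of the utility structure described above, one possible way to write it down is sketched here. The notation is an assumption made for this sketch and is not taken from the thesis: $U_{ij}$ is recruiter $i$'s utility for selecting peer $j$, $V_{ji}$ is peer $j$'s utility for being recruited by $i$, $X_{ij}$ and $Z_{ji}$ are vectors of observed nodal or dyadic covariates, $\beta$ and $\gamma$ are the preference coefficients, and $\epsilon_{ij}$, $\eta_{ji}$ are the unobserved pair-specific heterogeneities.

```latex
\begin{align*}
U_{ij} &= \beta^{\top} X_{ij} + \epsilon_{ij}
  && \text{(recruiter $i$'s utility for selecting peer $j$)} \\
V_{ji} &= \gamma^{\top} Z_{ji} + \eta_{ji}
  && \text{(peer $j$'s utility for being recruited by $i$)} \\
U_{ii},\; V_{jj} &
  && \text{(utilities for self-selection: not recruiting, not participating)}
\end{align*}
```

Under this reading, a recruitment from $i$ to $j$ is consistent with utility maximization only if neither party would do better with another available partner or with self-selection, subject to the network ties and the limits on recruitment.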
As the likelihood is computationally intractable, I develop a Bayesian framework where inference is made feasible by approximating the posterior distribution of the preference coefficients via a Markov chain Monte Carlo algorithm. Each update step samples new values of the preference coefficients and utilities via Metropolis-Hastings, subject to constraints. New prevalence estimates can be calculated by generating many recruitment chains from the population using the RCPR coefficients, then directly obtaining the first-order and second-order inclusion probabilities. This framework allows the incorporation of covariates we think affect recruitment into the sample weights.
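The simulation step described above, generating many recruitment chains and reading off inclusion probabilities, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: `simulate_chain` is a hypothetical toy one-sided recruitment rule (recruiters favor peers with similar covariate values) standing in for the RCPR model, and the network, covariate `x`, coefficient `beta`, and coupon limit are all invented for the example.

```python
import numpy as np

def simulate_chain(adj, x, beta, seed_node, n_sample, rng, coupons=2):
    """Toy recruitment process (an assumption, NOT the RCPR model).

    Each recruiter hands out up to `coupons` coupons, choosing unsampled
    neighbors with probability proportional to exp(-beta * |x_j - x_i|).
    Returns a boolean vector marking which nodes were sampled.
    """
    sampled = np.zeros(adj.shape[0], dtype=bool)
    sampled[seed_node] = True
    queue = [seed_node]
    while queue and sampled.sum() < n_sample:
        recruiter = queue.pop(0)
        for _ in range(coupons):
            candidates = np.flatnonzero(adj[recruiter] & ~sampled)
            if candidates.size == 0 or sampled.sum() >= n_sample:
                break
            score = -beta * np.abs(x[candidates] - x[recruiter])
            weights = np.exp(score - score.max())  # numerically stable softmax
            recruit = rng.choice(candidates, p=weights / weights.sum())
            sampled[recruit] = True
            queue.append(recruit)
    return sampled

def inclusion_probabilities(adj, x, beta, n_sample, n_sims, seed=0):
    """Monte Carlo estimates of first- and second-order inclusion probabilities."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    first = np.zeros(n)
    second = np.zeros((n, n))
    for _ in range(n_sims):
        seed_node = rng.integers(n)  # seed chosen uniformly, for illustration
        s = simulate_chain(adj, x, beta, seed_node, n_sample, rng).astype(float)
        first += s                   # counts samples containing node i
        second += np.outer(s, s)     # counts samples containing both i and j
    return first / n_sims, second / n_sims

# Hypothetical population: 8 people on a ring, covariate equal to position.
n = 8
adj = np.zeros((n, n), dtype=bool)
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = True
x = np.arange(n, dtype=float)

pi1, pi2 = inclusion_probabilities(adj, x, beta=0.5, n_sample=4, n_sims=2000)
```

Here `pi1[i]` estimates the probability that person `i` appears in a sample, and `pi2[i, j]` the probability that `i` and `j` appear together. Inverse inclusion probabilities are one common way to form sample weights, though the weighting scheme used in the thesis may differ.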
- Published
- 2016