Statistical Methods for Multiple Size Measurements in Respondent-Driven Sampling
- Author
Wang, Yibo
- Subjects
- respondent-driven sampling, sample size estimation, population size estimation, individual network size estimation
- Abstract
Respondent-driven sampling (RDS) is a chain-referral network-sampling method designed for studying rare, marginalized, or otherwise "hidden" populations that are difficult to sample with traditional probability-sampling methods. Despite its wide application, RDS poses challenges for both data collection and analysis. This dissertation presents improvements in implementing RDS as a sampling technique and as an analytical tool, enabling more reliable and insightful inferences about the characteristics of hidden populations that lack easily constructed sampling frames.

Unlike in standard probability sampling, sample size is not easily controlled in RDS studies. RDS generates chains of respondents, each invited to participate by a previous respondent through coupon-based referrals. Because recruitment chains start from a small set of initial seeds and proceed beyond researchers' control, the resulting sample size is a random variable. Chapter II explores the distribution of RDS sample size as a function of the number of seeds, the number of coupons issued to each respondent, and the probability of responding to an invitation.

Because RDS is usually used where traditional sampling frames are absent, the size of the target population is typically unknown. Chapter III therefore investigates population size estimation using RDS data. We develop a population size estimator within a Bayesian inferential framework that leverages duplicate occurrences (when an individual is recruited more than once), participants' recruitment sequences, and their network sizes. Through simulation and case studies, we demonstrate that our method yields credible lower-bound population size estimates that are more reasonable than those of existing competing methods, especially in the common setting of small sampling fractions.

Chapter IV examines individual network size (degree) measures in RDS studies. RDS relies on the degrees reported by participants to adjust for their unequal sampling probabilities when generalizing findings to the population. However, these self-reports often contain substantial measurement error, evident in the unusually high frequency of multiples of five and in implausibly large values. To address this, we propose a novel individual degree estimator based on a latent variable model of the true degree. The model accounts for response errors through a reporting mechanism and incorporates recruitment information and external demographic profiles, resulting in improved inference. Case studies and simulations show that our method delivers more accurate and reliable degree estimates than competing methods.
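The abstract describes RDS sample size as a random variable driven by the number of seeds, the coupons per respondent, and the response probability. A minimal way to build intuition is to treat recruitment as a simple branching process. The sketch below is a hypothetical simplification, not the model studied in Chapter II: it assumes an infinite population (no depletion or duplicate recruitment), that each coupon is redeemed independently with the same probability, and a hypothetical stopping cap `max_size`. The function names are illustrative only.

```python
import math
import random
import statistics


def simulate_rds_sample_size(n_seeds, n_coupons, p_response, max_size=10_000, rng=None):
    """Draw one RDS sample size under a simplified branching-process model.

    Hypothetical simplification (not the dissertation's model): every
    participant gets `n_coupons` coupons, each coupon is redeemed independently
    with probability `p_response`, the population is infinite (no depletion,
    no duplicates), and recruitment stops when a wave produces no recruits or
    the sample reaches `max_size`.
    """
    if rng is None:
        rng = random.Random()
    sample_size = 0
    wave = n_seeds  # participants in the current recruitment wave
    while wave > 0 and sample_size < max_size:
        sample_size += wave
        # Each coupon held by this wave is redeemed with probability p_response.
        wave = sum(rng.random() < p_response for _ in range(wave * n_coupons))
    return min(sample_size, max_size)


def expected_sample_size(n_seeds, n_coupons, p_response):
    """Mean total size of the uncapped model; finite only if n_coupons * p_response < 1."""
    m = n_coupons * p_response  # mean number of recruits per participant
    return n_seeds / (1 - m) if m < 1 else math.inf


rng = random.Random(2023)
sizes = [simulate_rds_sample_size(10, 3, 0.25, rng=rng) for _ in range(5000)]
print(statistics.mean(sizes), expected_sample_size(10, 3, 0.25))
```

With 10 seeds, 3 coupons, and a response probability of 0.25, each participant recruits 0.75 people on average, so the uncapped mean sample size is 10 / (1 - 0.75) = 40 and the simulated mean should land close to it. When the mean number of recruits per participant is 1 or less, chains die out with probability one; above 1, this toy model can grow without bound, which is why a cap stands in for a study's target sample size.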
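The abstract notes that reported degrees are used to adjust for unequal sampling probabilities. One standard way this is done is the Volz-Heckathorn (RDS-II) estimator, which weights each respondent by the inverse of their reported degree; it is shown here only to illustrate why degree errors matter, and it is not the latent-variable degree estimator proposed in Chapter IV. The sketch also includes a crude heaping check (the share of reported degrees that are multiples of five). The function names and the toy inputs are hypothetical.

```python
from typing import Sequence


def rds_ii_proportion(degrees: Sequence[float], has_trait: Sequence[int]) -> float:
    """Volz-Heckathorn (RDS-II) estimate of a population proportion.

    Each respondent is weighted by 1/degree, reflecting the usual RDS
    assumption that inclusion probability is proportional to degree.
    """
    if len(degrees) != len(has_trait):
        raise ValueError("degrees and has_trait must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("reported degrees must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_trait = sum(w * y for w, y in zip(weights, has_trait))
    return weighted_trait / sum(weights)


def share_multiple_of_five(degrees: Sequence[int]) -> float:
    """Fraction of reported degrees that are multiples of five (a crude heaping check)."""
    if not degrees:
        raise ValueError("degrees must be non-empty")
    return sum(1 for d in degrees if d % 5 == 0) / len(degrees)


# Hypothetical toy data: five respondents' reported degrees and a binary trait.
reported = [5, 10, 2, 20, 4]
trait = [1, 0, 1, 0, 1]
print(rds_ii_proportion(reported, trait), share_multiple_of_five(reported))
```

Here the unweighted share with the trait is 0.6, while inverse-degree weighting gives 0.95 / 1.1, roughly 0.86, because low-degree respondents count more. If a true degree of 12 is reported as 10, or an implausibly large value is reported, the weights and therefore the estimate shift, which is the sensitivity that motivates modeling the reporting process.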
- Published
- 2023