1. The design and precision of data-fusion studies.
- Author
- Sharot, Trevor
- Subjects
ANALYSIS of covariance, STATISTICAL sampling, CONFIDENCE intervals, STATISTICAL matching, DISTRIBUTION (Probability theory), STATISTICAL hypothesis testing, CROSS references (Information retrieval), RESEARCH methodology
- Abstract
Fusion is the linking of two survey datasets by pairing up similar respondents and joining their data records, in order to be able to cross-analyse outputs from one survey with those from the other. Invariably, the two surveys are pre-existing rather than being designed specifically for the fusion, and their samples of respondents differ both in design and size. Depending on the particular method of fusion used, the size of the fused dataset may be the same as one of the surveys or different to both. An unresolved issue is: what is the effective sample size of the fused dataset -- that is, the size of a hypothetical single-source sample that would deliver equal variances and standard errors to the fusion? This paper addresses this question and provides three main findings. First, it is shown that the assumption of conditional independence, crucial for good fusion, also facilitates analysis and comparison of effective sample sizes and variances. Second, across the range of fusion methods and outputs examined, the effective sample size is shown to be a weighted geometric mean of the two source sample sizes and therefore lies between them; and for designers of fusion the simple (unweighted) geometric mean may be taken as a representative figure. Third, while limited validation of the geometric mean result has been performed so far, the generality of the conditions under which it was derived implies that it should have wide validity across different fusion methodologies. Knowledge of the effective sample size in turn provides several benefits: it is a tool for designers of fusion to deliver outputs of required precision, and a tool for users to compute the standard error of outputs; this in turn permits calculation of confidence intervals and significance tests. [ABSTRACT FROM AUTHOR]
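The arithmetic the abstract describes can be illustrated with a minimal sketch. It assumes the weighted geometric mean takes the form n1^w * n2^(1-w), with w = 0.5 giving the simple (unweighted) geometric mean, and it assumes a normal-approximation confidence interval for a proportion. The function names, the weight parameter `w`, and the example sample sizes are hypothetical and are not taken from the paper, which may define the weights differently.

```python
import math


def effective_sample_size(n1, n2, w=0.5):
    """Weighted geometric mean of two sample sizes.

    Assumed form: n1**w * n2**(1 - w). With w = 0.5 this is the simple
    geometric mean sqrt(n1 * n2). The weighting scheme is an illustrative
    assumption, not the paper's own definition.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return n1 ** w * n2 ** (1.0 - w)


def proportion_standard_error(p, n_eff):
    """Standard error of a proportion p under simple random sampling of size n_eff."""
    return math.sqrt(p * (1.0 - p) / n_eff)


def proportion_confidence_interval(p, n_eff, z=1.96):
    """Normal-approximation interval; z = 1.96 gives roughly 95% coverage."""
    se = proportion_standard_error(p, n_eff)
    return p - z * se, p + z * se


# Hypothetical example: surveys of 1,000 and 4,000 respondents.
n_eff = effective_sample_size(1000, 4000)
low, high = proportion_confidence_interval(0.30, n_eff)
print(f"effective n = {n_eff:.0f}, 95% CI = ({low:.3f}, {high:.3f})")
```

For any weight between 0 and 1 the result lies between the two source sample sizes, which matches the property stated in the abstract.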
- Published
- 2007