1. Inconsistency in UK Biobank Event Definitions From Different Data Sources and Its Impact on Bias and Generalizability: A Case Study of Venous Thromboembolism.
- Author
- Bassett, Emily; Broadbent, James; Gill, Dipender; Burgess, Stephen; and Mason, Amy M
- Subjects
- *SELF-evaluation, *PULMONARY embolism, *RESEARCH funding, *VEINS, *QUESTIONNAIRES, *INTERVIEWING, *VENOUS thrombosis, *PRIMARY health care, *INFORMATION resources, *LONGITUDINAL method, *THROMBOEMBOLISM, *ELECTRONIC health records, *SOCIODEMOGRAPHIC factors, *COMPARATIVE studies
- Abstract
The UK Biobank study contains several sources of diagnostic data, including hospital inpatient data and data on self-reported conditions for approximately 500,000 participants and primary-care data for approximately 177,000 participants (35%). Epidemiologic investigations require a primary disease definition, but whether to combine data sources to maximize statistical power or focus on only 1 source to ensure a consistent outcome is not clear. The consistency of disease definitions was investigated for venous thromboembolism (VTE) by evaluating overlap when defining cases from 3 sources: hospital inpatient data, primary-care reports, and self-reported questionnaires. VTE cases showed little overlap between data sources, with only 6% of reported events for persons with primary-care data being identified by all 3 sources (hospital, primary-care, and self-reports), while 71% appeared in only 1 source. Deep vein thrombosis–only events represented 68% of self-reported VTE cases and 36% of hospital-reported VTE cases, while pulmonary embolism–only events represented 20% of self-reported VTE cases and 50% of hospital-reported VTE cases. Additionally, different distributions of sociodemographic characteristics were observed; for example, patients in 46% of hospital-reported VTE cases were female, compared with 58% of self-reported VTE cases. These results illustrate how seemingly neutral decisions taken to improve data quality can affect the representativeness of a data set. [ABSTRACT FROM AUTHOR]
- Published
- 2024