Augmenting cancer registry data with health survey data with no cases in common: the relationship between pre-diagnosis health behaviour and post-diagnosis survival in oesophageal cancer
- Source :
- BMC Cancer, Vol 20, Iss 1, Pp 1-11 (2020)
- Publication Year :
- 2020
Abstract
- Background: For epidemiological research, cancer registry datasets often need to be augmented with additional data. Data linkage is not feasible when there are no cases in common between data sets. We present a novel approach to augmenting cancer registry data by imputing pre-diagnosis health behaviour and estimating its relationship with post-diagnosis survival time.
Methods: Six measures of pre-diagnosis health behaviours (focussing on tobacco smoking, ‘at risk’ alcohol consumption, overweight and exercise) were imputed for 28,000 cancer registry data records of US oesophageal cancers using cold deck imputation from an unrelated health behaviour dataset. Each data point was imputed twice. This calibration allowed us to estimate the misclassification rate. We applied statistical correction for the misclassification to estimate the relative risk of dying within 1 year of diagnosis for each of the imputed behaviour variables. Subgroup analyses were conducted for adenocarcinoma and squamous cell carcinoma separately.
Results: Simulated survival data confirmed that accurate estimates of true relative risks could be retrieved for health behaviours with greater than 5% prevalence, although confidence intervals were wide. Applied to real datasets, the estimated relative risks were largely consistent with current knowledge. For example, tobacco smoking status 5 years prior to diagnosis was associated with an increased age-adjusted risk of all-cause death within 1 year of diagnosis for oesophageal squamous cell carcinoma (RR = 1.99, 95% CI 1.24, 3.12) but not oesophageal adenocarcinoma (RR = 1.61, 95% CI 0.79, 2.57).
Conclusions: We have demonstrated a novel imputation-based algorithm for augmenting cancer registry data for epidemiological research which can be used when there are no cases in common between data sets. The algorithm allows investigation of research questions which could not be addressed through direct data linkage.
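The double cold deck imputation described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which matching variables were used or how the misclassification correction was computed, so the stratification keys (`age_group`, `sex`), the toy records, and the helper names `cold_deck_impute` and `disagreement_rate` are all assumptions made for illustration. The disagreement between the two draws is shown only as a crude indicator of imputation noise, not as the paper's misclassification estimator.

```python
import random
from collections import defaultdict


def cold_deck_impute(recipients, donors, keys, target, rng):
    """Copy `target` from a random donor in the same stratum (hypothetical helper).

    Donors come from an external dataset, which is what makes this
    "cold deck" rather than "hot deck" imputation.
    """
    pools = defaultdict(list)
    for d in donors:
        pools[tuple(d[k] for k in keys)].append(d[target])
    imputed = []
    for r in recipients:
        pool = pools.get(tuple(r[k] for k in keys))
        # No donor in the stratum: leave the value missing.
        imputed.append(rng.choice(pool) if pool else None)
    return imputed


def disagreement_rate(first, second):
    """Share of records whose two imputed values differ (hypothetical helper)."""
    pairs = [(a, b) for a, b in zip(first, second)
             if a is not None and b is not None]
    if not pairs:
        return float("nan")
    return sum(a != b for a, b in pairs) / len(pairs)


# Toy survey (donor) and registry (recipient) records; all values are invented.
survey = [
    {"age_group": "60-69", "sex": "M", "smoker": 1},
    {"age_group": "60-69", "sex": "M", "smoker": 0},
    {"age_group": "60-69", "sex": "M", "smoker": 1},
    {"age_group": "70-79", "sex": "F", "smoker": 0},
    {"age_group": "70-79", "sex": "F", "smoker": 0},
]
registry = [
    {"age_group": "60-69", "sex": "M"},
    {"age_group": "70-79", "sex": "F"},
    {"age_group": "60-69", "sex": "M"},
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
keys = ["age_group", "sex"]  # assumed matching variables
draw1 = cold_deck_impute(registry, survey, keys, "smoker", rng)
draw2 = cold_deck_impute(registry, survey, keys, "smoker", rng)
rate = disagreement_rate(draw1, draw2)
print(draw1, draw2, rate)
```

Drawing each value twice from the same donor pool gives two independent noisy copies of the unobserved behaviour, which is the kind of replicate information a misclassification correction can draw on.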
- Subjects :
- Adult
Male
Cancer Research
medicine.medical_specialty
Alcohol Drinking
Esophageal Neoplasms
Health Behavior
Datasets as Topic
Overweight
lcsh:RC254-282
03 medical and health sciences
0302 clinical medicine
Risk Factors
Epidemiology
Genetics
medicine
Tobacco Smoking
Humans
Cancer registries
030212 general & internal medicine
Imputation (statistics)
Registries
Obesity
Exercise
Aged
Aged, 80 and over
business.industry
Middle Aged
medicine.disease
lcsh:Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Health Surveys
Survival Analysis
Confidence interval
United States
Cancer registry
Oesophageal neoplasms
Oncology
030220 oncology & carcinogenesis
Relative risk
Case-Control Studies
Adenocarcinoma
Female
medicine.symptom
business
Algorithms
Demography
Research Article
Details
- ISSN :
- 1471-2407
- Volume :
- 20
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- BMC cancer
- Accession number :
- edsair.doi.dedup.....fb6c26defc3fe4439d76e71aafa6ebe3