1. Analysis of missing data in electronic health records of people with diabetes in primary care in Spain: A population-based cohort study.
- Author
- Quesada JA and Orozco-Beltran D
- Subjects
- Humans, Spain epidemiology, Female, Male, Middle Aged, Retrospective Studies, Aged, Cohort Studies, Adult, Risk Factors, Cardiovascular Diseases epidemiology, Electronic Health Records statistics & numerical data, Primary Health Care statistics & numerical data, Diabetes Mellitus epidemiology
- Abstract
Introduction: Researchers conducting studies based on electronic health records (EHRs) often have to deal with missing data. We aimed to analyze patterns of missing data in lipid profile, sociodemographic variables and risk factors contained in the EHRs of the CARDIABETES project and compare different strategies for addressing the issue.
Methods: We conducted a retrospective cohort study of people with diabetes, based on EHRs in the Spanish Pharmacoepidemiological Research Database for Public Health Systems (BIFAP). Our response variable was major adverse cardiovascular events (MACE), including all-cause death and hospital admission for cerebrovascular disease or ischemic heart disease. We analyzed patterns of missing data, associations between missingness and MACE, and the effect of eliminating cases with missing data or imputing missing data.
Results: Our total sample included 309,556 people with diabetes. The proportion of individuals with at least one missing value was 76.0%. Regarding diabetes control measures, 10.8% of records had missing glycated hemoglobin values, and 21.4% had missing basal blood glucose values. We observed a non-random pattern of association between missingness and MACE. The strategy of eliminating records with missing data greatly reduced the number of cases and statistical power, and altered the average participant characteristics and cumulative incidence of MACE. By imputing missing data, we were able to circumvent these problems.
Conclusion: A considerable proportion of missing data was observed for variables such as fasting blood glucose and glycated hemoglobin, and also for other variables such as blood test parameters, BMI, and tobacco and alcohol use. The missing data show a non-random pattern and are associated with a higher incidence of MACE. The strategy of eliminating records with missing data greatly reduced the number of cases and statistical power. The recommended solution is to impute missing data with methods that take all the variables into account, such as MICE with PPM.
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.
- Published
- 2025