1. Privacy-protecting, reliable response data discovery using COVID-19 patient observations
- Author
- Douglas S. Bell, Jason N. Doctor, Xiaoqian Jiang, Lisa M. Schilling, Michael Aratow, Mark J. Pletcher, Hua Xu, Kai Zheng, Paulina Paul, Larissa Neumann, Jihoon Kim, Ludwig Christian Hinske, Daniella Meeker, Katherine K. Kim, Spencer L. SooHoo, Michele E. Day, Michael E. Matheny, and Lucila Ohno-Machado
- Subjects
Male, Multivariate statistics, AcademicSubjects/SCI01060, Hospitalized patients, Computer science, Ethnic group, Information Storage and Retrieval, Logistic regression, Medical and Health Sciences, 01 natural sciences, regression analysis, Engineering, 0302 clinical medicine, Medicine, Electronic Health Records, Registries, 030212 general & internal medicine, Common Data Elements, Clinical course, Regression analysis, electronic health record, Distributed algorithm, Female, Confidentiality, Algorithms, Natural language, medicine.medical_specialty, 2019-20 coronavirus outbreak, Coronavirus disease 2019 (COVID-19), MEDLINE, Health Informatics, Research and Applications, Health outcomes, Article, Computer Communication Networks, 03 medical and health sciences, Information and Computing Sciences, Humans, ddc:610, 0101 mathematics, AcademicSubjects/MED00580, Natural Language Processing, Descriptive statistics, business.industry, 010102 general mathematics, COVID-19, Data discovery, Data science, Good Health and Well Being, Logistic Models, R2D2 Consortium, Emergency medicine, observational study, Observational study, Generic health relevance, AcademicSubjects/SCI01530, business, Medical Informatics
- Abstract
Objective: To utilize, in an individual and institutional privacy-preserving manner, electronic health record (EHR) data from 202 hospitals by analyzing answers to COVID-19-related questions and posting these answers online.
Materials and Methods: We developed a distributed, federated network of 12 health systems that harmonized their EHRs and submitted aggregate answers to consortia questions posted at https://www.covid19questions.org. Our consortium developed processes and implemented distributed algorithms to produce answers to a variety of questions. We were able to generate counts and descriptive statistics and to build a multivariate, iterative regression model without centralizing individual-level data.
Results: Our public website contains answers to various clinical questions, a web form for users to ask questions in natural language, and a list of items that are currently pending responses. The results show, for example, that patients who were taking angiotensin-converting enzyme inhibitors and angiotensin II receptor blockers, within the year before admission, had lower unadjusted in-hospital mortality rates. We also showed that, when adjusted for, age, sex, and ethnicity were not significantly associated with mortality. We demonstrated that it is possible to answer questions about COVID-19 using EHR data from systems that have different policies and must follow various regulations, without moving data out of their health systems.
Discussion and Conclusions: We present an alternative or a complement to centralized COVID-19 registries of EHR data. We can use multivariate distributed logistic regression on observations recorded in the process of care to generate results without transferring individual-level data outside the health systems.
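The abstract does not say which distributed algorithm the consortium used. As an illustration only, the idea of fitting a logistic regression iteratively while sharing nothing but site-level aggregates can be sketched as below. Everything in the sketch is an assumption made for the example: the plain gradient-ascent update, the synthetic data, and the names `site_gradient`, `federated_fit`, and `make_site` are hypothetical and are not taken from the paper.

```python
import math
import random


def site_gradient(rows, beta):
    """Runs inside one health system; returns only summed quantities, never rows."""
    grad = [0.0] * len(beta)
    for x, y in rows:
        z = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of the outcome
        for j, xi in enumerate(x):
            grad[j] += (y - p) * xi     # log-likelihood gradient contribution
    return grad, len(rows)


def federated_fit(sites, n_features, steps=500, lr=0.5):
    """Coordinator loop: broadcast beta, collect site aggregates, update beta."""
    beta = [0.0] * n_features
    for _ in range(steps):
        total = [0.0] * n_features
        n = 0
        for rows in sites:              # in practice each call happens at a remote site
            g, m = site_gradient(rows, beta)
            total = [t + gi for t, gi in zip(total, g)]
            n += m
        beta = [b + lr * t / n for b, t in zip(beta, total)]
    return beta


def make_site(seed, n=200):
    """Synthetic patients (hypothetical data): intercept plus one covariate."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        z = -0.5 + 1.0 * x1
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0
        rows.append(((1.0, x1), y))
    return rows


sites = [make_site(seed) for seed in (1, 2, 3)]
beta_federated = federated_fit(sites, n_features=2)

# Reference fit on pooled rows, shown only to check the sketch; a real
# deployment would never assemble this pooled list.
pooled = [row for rows in sites for row in rows]
beta_pooled = federated_fit([pooled], n_features=2)
```

Because the log-likelihood gradient is a sum over patients, adding up per-site gradients gives the same update as computing it on pooled data, so the federated and pooled coefficients agree up to floating-point rounding. A production system would also need secure transport, protection for small cell counts, and usually a faster-converging update such as Newton-Raphson.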
- Published
- 2021