Talvik HA, Oja M, Tamm S, Mooses K, Särg D, Lõo M, Renata Siimon Õ, Šuvalov H, Kolde R, Vilo J, Reisberg S, and Laur S
Objective: This study addresses a gap in the literature on converting real-world Clinical Document Architecture (CDA) data into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), focusing on the initial steps that precede the mapping phase. We highlight the importance of a repeatable Extract-Transform-Load (ETL) pipeline for extracting health data from HL7 CDA documents in Estonia for research purposes.

Methods: We developed a repeatable ETL pipeline that extracts, cleans, and restructures health data from CDA documents into OMOP CDM, producing high-quality, structured data. The pipeline was designed to adapt to continuous changes in the data exchange format and to handle various CDA document subsets for different scientific studies.

Results: Using selected use cases, we demonstrated that the pipeline successfully transformed a significant portion of diagnosis codes, body weight and eGFR measurements, and PAP test results from CDA documents into OMOP CDM, showing the ease of extracting structured data. However, we encountered challenges such as harmonising diverse coding systems and extracting lab results from free-text sections. The iterative development of the pipeline facilitated swift error detection and correction, making the process more efficient.

Conclusion: After a decade of focused work, our research has led to an ETL pipeline that effectively transforms HL7 CDA documents into OMOP CDM in Estonia, addressing key data extraction and transformation challenges. The pipeline's repeatability and adaptability to various data subsets make it a valuable resource for researchers dealing with health data. Although it was tested on Estonian data, the principles outlined are broadly applicable and may help in handling health data standards that vary by country. Although newer health data standards are emerging, CDA remains relevant for retrospective health studies, which ensures the continuing importance of this work.

Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Copyright © 2024 The Author(s). Published by Elsevier Inc. All rights reserved.