Clinical Annotations for Prostate Cancer Research: Defining Data Elements, Creating a Reproducible Analytical Pipeline, and Assessing Data Quality

Authors :
Daniel C. Danila
Ethan Barnett
Samantha E. Vasselman
Philip W. Kantoff
Susan F. Slovin
Michael J. Morris
Konrad H. Stopsack
Alexander Blum
Emily Carbone
Barbara Nweji
Wassim Abida
Niamh M. Keegan
Dana E. Rathkopf
Karen A. Autio
Publication Year :
2021
Publisher :
Cold Spring Harbor Laboratory, 2021.

Abstract

Background: Routine clinical data from clinical charts are indispensable for retrospective and prospective observational studies and clinical trials. Their reproducibility is often not assessed.

Objective: To develop a prostate cancer-specific database with a defined source hierarchy for clinical annotations in conjunction with molecular profiling and to evaluate data reproducibility.

Design, setting, and participants: For men with prostate cancer and clinical-grade paired tumor–normal sequencing, we performed team-based retrospective data collection from the electronic medical record at a comprehensive cancer center. We developed an open-source R package for data processing. We assessed reproducibility using blinded repeat annotation by a reference medical oncologist.

Outcome measurements and statistical analysis: We evaluated completeness of data elements, reproducibility of team-based annotation compared to the reference, and impact of measurement error on bias in survival analyses.

Results and limitations: Data elements on demographics, diagnosis and staging, disease state at the time of procuring a genomically characterized sample, and clinical outcomes were piloted and then abstracted for 2,261 patients (with 2,631 samples). Completeness of data elements was generally high. Comparing to the repeat annotation by a medical oncologist blinded to the database (100 patients/samples), reproducibility of annotations was high to very high; T stage, metastasis date, and presence and date of castration resistance had lower reproducibility. Impact of measurement error on estimates for strong prognostic factors was modest.

Conclusions: With a prostate cancer-specific data dictionary and quality control measures, manual clinical annotations by a multidisciplinary team can be scalable and reproducible. The data dictionary and the R package for reproducible data processing are freely available to increase data quality in clinical prostate cancer research.

Patient summary: Information in the medical record is the backbone for clinical research on prostate cancer. The tools provided in this study can increase quality and efficiency of this research.

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........e3ea555e94b0c39193b0ee4f2eb817be
Full Text :
https://doi.org/10.1101/2021.09.20.21263842