Curation of myeloma observational study MALIMAR using XNAT: solving the challenges posed by real-world data

Authors :
Simon J. Doran
Theo Barfoot
Linda Wedlake
Jessica M. Winfield
James Petts
Ben Glocker
Xingfeng Li
Martin Leach
Martin Kaiser
Tara D. Barwick
Aristeidis Chaidos
Laura Satchwell
Neil Soneji
Khalil Elgendy
Alexander Sheeka
Kathryn Wallitt
Dow-Mu Koh
Christina Messiou
Andrea Rockall
Source :
Insights into Imaging, Vol 15, Iss 1, Pp 1-13 (2024)
Publication Year :
2024
Publisher :
SpringerOpen, 2024.

Abstract

Objectives: MAchine Learning In MyelomA Response (MALIMAR) is an observational clinical study combining “real-world” and clinical trial data, both retrospective and prospective. Images were acquired on three MRI scanners over a 10-year window at two institutions, leading to a need for extensive curation.

Methods: Curation involved image aggregation, pseudonymisation, allocation between project phases, data cleaning, upload to an XNAT repository visible from multiple sites, annotation, incorporation of machine learning research outputs and quality assurance using programmatic methods.

Results: A total of 796 whole-body MR imaging sessions from 462 subjects were curated. A major change in scan protocol part way through the retrospective window meant that approximately 30% of available imaging sessions had properties that differed significantly from the remainder of the data. Issues were found with a vendor-supplied clinical algorithm for “composing” whole-body images from multiple imaging stations. Historic weaknesses in a digital video disk (DVD) research archive (already addressed by the mid-2010s) were highlighted by incomplete datasets, some of which could not be completely recovered. The final dataset contained 736 imaging sessions for 432 subjects. Software was written to clean and harmonise data. Implications for the subsequent machine learning activity are considered.

Conclusions: MALIMAR exemplifies the vital role that curation plays in machine learning studies that use real-world data. A research repository such as XNAT facilitates day-to-day management, ensures robustness and consistency and enhances the value of the final dataset. The types of process described here will be vital for future large-scale multi-institutional and multi-national imaging projects.
Critical relevance statement: This article showcases innovative data curation methods using a state-of-the-art image repository platform; such tools will be vital for managing the large multi-institutional datasets required to train and validate generalisable ML algorithms and future foundation models in medical imaging.

Key points:
• Heterogeneous data in the MALIMAR study required the development of novel curation strategies.
• Correction of multiple problems affecting the real-world data was successful, but implications for machine learning are still being evaluated.
• Modern image repositories have rich application programming interfaces enabling data enrichment and programmatic QA, making them much more than simple “image marts”.
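
As an illustration of the kind of programmatic QA the abstract refers to, the following is a minimal sketch of a completeness check that flags imaging sessions with missing series. It is not the MALIMAR software: the series names, the session identifiers and the function `find_incomplete_sessions` are hypothetical, and the session listing is assumed to have already been retrieved from the repository as a plain mapping.

```python
from typing import Dict, Iterable, List

# Hypothetical set of series every session is expected to contain.
# The real MALIMAR protocol is not given in the abstract.
REQUIRED_SERIES = frozenset({"DIXON", "DWI"})


def find_incomplete_sessions(
    sessions: Dict[str, Iterable[str]],
    required: Iterable[str] = REQUIRED_SERIES,
) -> Dict[str, List[str]]:
    """Return {session_id: sorted list of missing series} for incomplete sessions.

    `sessions` maps a session identifier to the series descriptions found
    for that session (assumed to be fetched from the repository beforehand).
    """
    required_set = set(required)
    report: Dict[str, List[str]] = {}
    for session_id, series in sessions.items():
        missing = sorted(required_set - set(series))
        if missing:
            report[session_id] = missing
    return report


if __name__ == "__main__":
    # Hypothetical example data, not taken from the study.
    example = {
        "session_001": ["DIXON", "DWI"],
        "session_002": ["DIXON"],
    }
    for session_id, missing in find_incomplete_sessions(example).items():
        print(f"{session_id}: missing {', '.join(missing)}")
```

A check of this shape could be rerun after each upload, so that incomplete datasets are caught before annotation or model training begins.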

Details

Language :
English
ISSN :
1869-4101
Volume :
15
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Insights into Imaging
Publication Type :
Academic Journal
Accession number :
edsdoj.3e19c56ad064f76a98af16a0519cd4d
Document Type :
article
Full Text :
https://doi.org/10.1186/s13244-023-01591-7