A pipeline to further enhance quality, integrity and reusability of the NCCID clinical data

Authors :
Anna Breger
Ian Selby
Michael Roberts
Judith Babar
Effrossyni Gkrania-Klotsas
Jacobus Preller
Lorena Escudero Sánchez
AIX-COVNET Collaboration
James H. F. Rudd
John A. D. Aston
Jonathan R. Weir-McCall
Evis Sala
Carola-Bibiane Schönlieb
Source :
Scientific Data, Vol 10, Iss 1, Pp 1-16 (2023)
Publication Year :
2023
Publisher :
Nature Portfolio, 2023.

Abstract

The National COVID-19 Chest Imaging Database (NCCID) is a centralized UK database of thoracic imaging and corresponding clinical data. It is made available by the National Health Service Artificial Intelligence (NHS AI) Lab to support the development of machine learning tools focused on Coronavirus Disease 2019 (COVID-19). A bespoke cleaning pipeline for NCCID, developed by the NHSx, was introduced in 2021. We present an extension to the original cleaning pipeline for the clinical data of the database. It has been adjusted to correct additional systematic inconsistencies in the raw data, such as patient sex, oxygen levels and date values. The most important changes are discussed in this paper, whilst the code and further explanations are made publicly available on GitLab. The suggested cleaning will allow global users to work with more consistent data for the development of machine learning tools without needing to be experts. In addition, it highlights some of the challenges of working with clinical multi-center data and includes recommendations for similar future initiatives.

Subjects

Subjects :
Science

Details

Language :
English
ISSN :
2052-4463
Volume :
10
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Scientific Data
Publication Type :
Academic Journal
Accession number :
edsdoj.9878da5de9d34045a08b088c7c1813d9
Document Type :
article
Full Text :
https://doi.org/10.1038/s41597-023-02340-7