Author: "Jan Stanczuk" / Search Limiters: Available in Library Collection - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Jan Stanczuk" returned 2 results.

1. The impact of imputation quality on machine learning classifiers for datasets with missing values

Author: Tolou Shadbahr, Michael Roberts, Jan Stanczuk, Julian Gilbey, Philip Teare, Sören Dittmer, Matthew Thorpe, Ramon Viñas Torné, Evis Sala, Pietro Lió, Mishal Patel, Jacobus Preller, AIX-COVNET Collaboration, James H. F. Rudd, Tuomas Mirtti, Antti Sakari Rannikko, John A. D. Aston, Jing Tang, and Carola-Bibiane Schönlieb
Subjects: Medicine
Abstract: Background: Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets, and these missing values are typically imputed using established methods, followed by classification of the now-complete samples. The focus of the machine learning researcher is to optimise the classifier’s performance. Methods: We utilise three simulated and three real-world clinical datasets with different feature types and missingness patterns. Initially, we evaluate how the downstream classifier performance depends on the choice of classifier and imputation methods. We employ ANOVA to quantitatively evaluate how the choice of missingness rate, imputation method, and classifier method influences the performance. Additionally, we compare commonly used methods for assessing imputation quality and introduce a class of discrepancy scores based on the sliced Wasserstein distance. We also assess the stability of the imputations and the interpretability of models built on the imputed data. Results: The performance of the classifier is most affected by the percentage of missingness in the test data, with a considerable performance decline observed as the test missingness rate increases. We also show that the commonly used measures for assessing imputation quality tend to lead to imputed data which poorly matches the underlying data distribution, whereas our new class of discrepancy scores performs much better on this measure. Furthermore, we show that the interpretability of classifier models trained using poorly imputed data is compromised. Conclusions: It is imperative to consider the quality of the imputation when performing downstream classification, as the effects on the classifier can be considerable.
Published: 2023

2. A pipeline to further enhance quality, integrity and reusability of the NCCID clinical data.

Author: Breger, Anna, Selby, Ian, Roberts, Michael, Babar, Judith, Gkrania-Klotsas, Effrossyni, Preller, Jacobus, Escudero Sánchez, Lorena, AIX-COVNET Collaboration, Dittmer, Sören, Thorpe, Matthew, Gilbey, Julian, Korhonen, Anna, Jefferson, Emily, Langs, Georg, Yang, Guang, Xing, Xiaodan, Nan, Yang, Li, Ming, Prosch, Helmut, and Stanczuk, Jan
Subjects: Machine learning, COVID-19, Artificial intelligence, Databases, Image databases, Computer software reusability, Data integrity
Abstract: The National COVID-19 Chest Imaging Database (NCCID) is a centralized UK database of thoracic imaging and corresponding clinical data. It is made available by the National Health Service Artificial Intelligence (NHS AI) Lab to support the development of machine learning tools focused on Coronavirus Disease 2019 (COVID-19). A bespoke cleaning pipeline for NCCID, developed by the NHSx, was introduced in 2021. We present an extension to the original cleaning pipeline for the clinical data of the database. It has been adjusted to correct additional systematic inconsistencies in the raw data such as patient sex, oxygen levels and date values. The most important changes will be discussed in this paper, whilst the code and further explanations are made publicly available on GitLab. The suggested cleaning will allow global users to work with more consistent data for the development of machine learning tools without being an expert. In addition, it highlights some of the challenges when working with clinical multi-center data and includes recommendations for similar future initiatives. [ABSTRACT FROM AUTHOR]
Published: 2023