
A Bayesian Network Approach to Lung Cancer Screening: Assessing the Impact of Data Quantity, Quality, and the Combination of Data from Danish Electronic Health Records.

Authors :: Daalen, Florian van
Henriksen, Margrethe Høstgaard Bang
Hansen, Torben Frøstrup
Jensen, Lars Henrik
Brasen, Claus Lohman
Hilberg, Ole
Andersen, Martin Ask Klausholt
Humerfelt, Elise
Wee, Leonard
Bermejo, Inigo
Source :: Cancers. Dec2024, Vol. 16 Issue 23, p3989. 16p.
Publication Year :: 2024
Abstract: Simple Summary: This study developed and evaluated Bayesian Network models for lung cancer risk prediction using a decade of data from 38,944 high-risk individuals in Denmark. The models were trained and validated on datasets with varying sizes and levels of missing data to reflect real-world screening scenarios. The results showed that a model trained on a small, complete dataset (AUC 0.78) performed similarly on a larger dataset with 21% missing data (AUC 0.78), but performance decreased when 39% of data were missing (AUC 0.67). The laboratory results and smoking data were the most informative variables, significantly outperforming models based only on age and smoking status (AUC 0.70). These findings suggest that BN models can maintain strong predictive performance despite incomplete data and highlight the value of including standard laboratory results in future LC screening programs.

Background/Objectives: Lung cancer (LC) is the leading cause of cancer mortality, making early diagnosis essential. While LC screening trials are underway globally, optimal prediction models and inclusion criteria are still lacking. This study aimed to develop and evaluate Bayesian Network (BN) models for LC risk prediction using a decade of data from Denmark. The primary goal was to assess BN performance on datasets varying in size and completeness, simulate real-world screening scenarios, and identify the most valuable data sources for LC screening.

Methods: The study included 38,944 patients evaluated for LC, with 11,284 (29%) diagnosed. Data on comorbidities, medications, and general practice were available for the entire cohort, while laboratory results, smoking habits, and other variables were only available for subsets. The cohort was divided into four subsets based on data availability, and BNs were trained and validated across these subsets using cross-validation and external validation. To determine the optimal combination of variables, all possible data combinations were evaluated on the samples that contained all the variables (n = 5587).

Results: A model trained on the small, complete dataset (AUC 0.78) performed similarly on a larger dataset with 21% missing data (AUC 0.78). Performance dropped when 39% of data were missing (AUC 0.67), resulting in informative variables missing completely in the dataset. Laboratory results and smoking data were the most informative, significantly outperforming models based only on age and smoking status (AUC 0.70).

Conclusions: BN models demonstrated moderate to strong predictive performance, even with incomplete data, highlighting the potential value of incorporating laboratory results in LC screening programs. [ABSTRACT FROM AUTHOR]
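The exhaustive comparison of data combinations described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the source-group names in `SOURCES` are assumptions taken loosely from the abstract, the helper names `all_combinations` and `auc` are hypothetical, and the AUC is computed with the standard rank-based (Mann-Whitney) definition rather than whatever tooling the study used. Training a Bayesian Network on each subset is left out.

```python
from itertools import combinations

# Hypothetical data-source groups, named after the abstract.
# The study's actual grouping of variables may differ.
SOURCES = ["comorbidities", "medications", "general_practice",
           "laboratory", "smoking"]


def all_combinations(sources):
    """Yield every non-empty subset of the data sources."""
    for size in range(1, len(sources) + 1):
        yield from combinations(sources, size)


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen case
    scores higher than a randomly chosen non-case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Every candidate feature set that a model could be trained on.
subsets = list(all_combinations(SOURCES))
```

With five source groups there are 2^5 - 1 = 31 non-empty subsets; in a full pipeline each one would be used to fit a model, and `auc` would be applied to that model's predicted risks on held-out samples.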

Subjects :: *RISK assessment
*PREDICTION models
*RESEARCH funding
*EARLY detection of cancer
*SMOKING
*RETROSPECTIVE studies
*AGE distribution
*DESCRIPTIVE statistics
*SIMULATION methods in education
*LUNG tumors
*MEDICAL records
*ACQUISITION of data
*DATA quality
*COMORBIDITY
*DISEASE risk factors

Details

Language :: English
ISSN :: 20726694
Volume :: 16
Issue :: 23
Database :: Academic Search Index
Journal :: Cancers
Publication Type :: Academic Journal
Accession number :: 181660988
Full Text :: https://doi.org/10.3390/cancers16233989

