Incorporating label uncertainty during the training of convolutional neural networks improves performance for the discrimination between certain and inconclusive cases in dopamine transporter SPECT.
- Source :
- European journal of nuclear medicine and molecular imaging [Eur J Nucl Med Mol Imaging] 2025 Mar; Vol. 52 (4), pp. 1535-1548. Date of Electronic Publication: 2024 Nov 27.
- Publication Year :
- 2025
Abstract
- Purpose: Deep convolutional neural networks (CNN) hold promise for assisting the interpretation of dopamine transporter (DAT)-SPECT. For improved communication of uncertainty to the user it is crucial to reliably discriminate certain from inconclusive cases that might be misclassified by strict application of a predefined decision threshold on the CNN output. This study tested two methods to incorporate existing label uncertainty during the training to improve the utility of the CNN sigmoid output for this task.
- Methods: Three datasets were used retrospectively: a "development" dataset (n = 1740) for CNN training, validation and testing, two independent out-of-distribution datasets (n = 640, 645) for testing only. In the development dataset, binary classification based on visual inspection was performed carefully by three well-trained readers. A ResNet-18 architecture was trained for binary classification of DAT-SPECT using either a randomly selected vote ("random vote training", RVT), the proportion of "reduced" votes ("average vote training", AVT) or the majority vote (MVT) across the three readers as reference standard. Balanced accuracy was computed separately for "inconclusive" sigmoid outputs (within a predefined interval around the 0.5 decision threshold) and for "certain" (non-inconclusive) sigmoid outputs.
- Results: The proportion of "inconclusive" test cases that had to be accepted to achieve a given balanced accuracy in the "certain" test case was lower with RVT and AVT than with MVT in all datasets (e.g., 1.9% and 1.2% versus 2.8% for 98% balanced accuracy in "certain" test cases from the development dataset). In addition, RVT and AVT resulted in slightly higher balanced accuracy in all test cases independent of their certainty (97.3% and 97.5% versus 97.0% in the development dataset).
- Conclusion: Making between-readers-discrepancy known to CNN during the training improves the utility of their sigmoid output to discriminate certain from inconclusive cases that might be misclassified by the CNN when the predefined decision threshold is strictly applied. This does not compromise on overall accuracy.
- Competing Interests: Declarations. Ethics approval and consent to participate: Waiver of informed consent for the retrospective analysis of the clinical samples (development dataset, MPH dataset) was obtained from the ethics review board of the general medical council of the state of Hamburg, Germany. All procedures performed in this study were in accordance with the ethical standards of the ethics review board of the general medical council of the state of Hamburg, Germany, and with the 1964 Helsinki declaration and its later amendments. Competing interests: The authors have no relevant financial or non-financial interests to disclose.
- (© 2024. The Author(s).)
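As an illustration of the Methods summarized above, the following is a minimal sketch of how the three reference standards (MVT, AVT, RVT) could be derived from per-reader votes, and how test cases could be split into "certain" and "inconclusive" by their sigmoid output. It is not the authors' implementation. The function names (`make_targets`, `split_by_certainty`, `balanced_accuracy`), the coding of a "reduced" vote as 1, and the symmetric interval parameterized by `half_width` are all assumptions introduced here; the abstract does not state the interval width, nor whether the random vote in RVT is redrawn during training.

```python
import numpy as np


def make_targets(votes, scheme, rng=None):
    """Build training targets from reader votes.

    votes: array of shape (n_cases, n_readers), 1 = "reduced", 0 = "normal"
           (this coding is an assumption for the sketch).
    scheme: "MVT" (majority vote), "AVT" (proportion of "reduced" votes)
            or "RVT" (one randomly selected vote per case).
    """
    votes = np.asarray(votes, dtype=float)
    if scheme == "MVT":
        # Hard label; with three readers there are no ties.
        return (votes.mean(axis=1) > 0.5).astype(float)
    if scheme == "AVT":
        # Soft label in [0, 1], usable with binary cross-entropy.
        return votes.mean(axis=1)
    if scheme == "RVT":
        rng = np.random.default_rng() if rng is None else rng
        # Pick one reader index per case and take that reader's vote.
        picked = rng.integers(0, votes.shape[1], size=votes.shape[0])
        return votes[np.arange(votes.shape[0]), picked]
    raise ValueError(f"unknown scheme: {scheme}")


def split_by_certainty(sigmoid, half_width):
    """Return boolean masks (certain, inconclusive).

    A case is "inconclusive" if its sigmoid output lies within
    0.5 +/- half_width (symmetric interval assumed for this sketch).
    """
    sigmoid = np.asarray(sigmoid, dtype=float)
    inconclusive = np.abs(sigmoid - 0.5) <= half_width
    return ~inconclusive, inconclusive


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Returns NaN if one of the two classes is absent from y_true.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    if y_true.all() or not y_true.any():
        return float("nan")
    sensitivity = y_pred[y_true].mean()
    specificity = (~y_pred[~y_true]).mean()
    return 0.5 * (sensitivity + specificity)
```

In this sketch, balanced accuracy would be evaluated twice per model, once on the cases selected by the `certain` mask and once on those selected by the `inconclusive` mask, with predictions obtained by thresholding the sigmoid output at 0.5. Widening `half_width` trades a larger share of inconclusive cases for higher balanced accuracy among the certain ones, which is the trade-off the Results compare across MVT, AVT and RVT.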
Details
- Language :
- English
- ISSN :
- 1619-7089
- Volume :
- 52
- Issue :
- 4
- Database :
- MEDLINE
- Journal :
- European journal of nuclear medicine and molecular imaging
- Publication Type :
- Academic Journal
- Accession number :
- 39592475
- Full Text :
- https://doi.org/10.1007/s00259-024-06988-0