Artificially augmenting data or adding more samples? A study on a 3D CNN for lung nodule classification
- Authors
Panagiotis Gonidakis, Jef Vandemeulebroucke, Bart Jansen
- Subjects
Computer science, Deep learning, Pattern recognition, CAD, Artificial intelligence, Convolutional neural network
- Abstract
Convolutional neural networks are known to require large amounts of data to achieve optimal performance. In addition, data is commonly computationally augmented using a variety of geometric and intensity transformations to further extend the set of training samples. In medical imaging, annotated data is often scarce or costly to obtain, and there is considerable interest in methods to reduce the amount of data needed. In this work, we investigate the relative benefit of increasing the amount of original data, with respect to computationally augmenting the number of training samples, for the case of false positive reduction of lung nodule candidates. To this end, we implemented a previously published topology for classification, shown to achieve state-of-the-art results on the publicly available LUNA16 dataset. Numerous models were trained using different amounts of unique training samples and different degrees of data augmentation involving rotations and translations, and their performance was compared. Results indicate that, in general, better performance is achieved when increasing the amount of data or augmenting the data more extensively, as expected. Surprisingly, however, we observed that after reaching a certain amount of unique training samples, data augmentation leads to significantly better performance compared to adding the same number of new samples to the training dataset. We hypothesize that the augmentation has aided in learning more general rotation- and translation-invariant features, leading to improved performance on unseen data. Future experiments include a more detailed characterization of this behavior, and relating it to the topology and the number of parameters to be trained.
- Published
- 2020
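The rotation-and-translation augmentation described in the abstract can be sketched as follows. This is a minimal illustration only: the function name, the restriction to axial 90-degree rotations, and the 2-voxel shift range are assumptions for the sketch, not the settings used in the paper.

```python
import numpy as np

def augment_patch(patch, rng, max_shift=2):
    """Apply one random rotation and translation to a 3D nodule patch.

    Illustrative sketch: rotations are limited to multiples of 90 degrees
    in the axial plane, and translations to integer voxel shifts, so the
    patch shape is preserved without interpolation.
    """
    # Random rotation by 90, 180, or 270 degrees in the axial (y, x) plane.
    k = int(rng.integers(1, 4))
    out = np.rot90(patch, k=k, axes=(1, 2))
    # Random integer translation of up to max_shift voxels along each axis.
    dz, dy, dx = (int(s) for s in rng.integers(-max_shift, max_shift + 1, size=3))
    out = np.roll(out, shift=(dz, dy, dx), axis=(0, 1, 2))
    return out

# Example: augment a synthetic 32x32x32 patch.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32, 32)).astype(np.float32)
aug = augment_patch(patch, rng)
```

In the study, each unique candidate could yield several such augmented copies, which is what allows the number of training samples to be varied independently of the number of unique samples.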