Quantifying image naturalness using transfer learning and fusion model.

Authors :: P, Shabari Nath; Chouhan, Rajlaxmi
Source :: Multimedia Tools & Applications; Jun2024, Vol. 83 Issue 19, p56303-56320, 18p
Publication Year :: 2024
Abstract: Distinguishing a natural scene from artwork is a simple task for the human visual system, but a challenging one for machines because of the wide range of psychovisual features, illumination gamuts, and varied interpretations of glossiness. While state-of-the-art image quality metrics quantify overall visual quality to a remarkable degree of accuracy, quantifying the 'glossiness' of a scene as a measure of its naturalness is still an emerging area. The rapid growth of deep learning methods and CNN-based architectures inspired us to explore, in this paper, the fusion of the best-performing CNN architectures on an image dataset created specifically to replicate the unnaturalness of artwork or portrait-like images. The performance of four CNNs was tested using transfer learning, selective retraining, and optimization of initial learning rates on a dataset of about 8.5k images created to represent various degrees of glossiness. A fusion framework was then proposed using the top two architectures. The fusion frameworks were evaluated both quantitatively and qualitatively on eleven levels of naturalness (0 to 10). The framework resulting from the fusion of GoogleNet and VGG16, referred to as GoogleVGG Fusion in this paper, is found to reach accuracies comparable to those of the individual networks at nearly half the computational cost. The proposed GoogleVGG Fusion model achieved an accuracy of 87.86% against the labelled scores and a Spearman's rank-order correlation coefficient (SROCC) of 0.9794. As expected, the agreement of the proposed framework with the subjective scores is markedly better than that of both non-deep-learning and deep-learning (DL) based methods. [ABSTRACT FROM AUTHOR]
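
Note: The abstract reports two figures of merit, classification accuracy against the labelled naturalness levels and SROCC. The sketch below shows one common way such figures can be computed; it is not taken from the paper. The function names and the sample scores are hypothetical, and accuracy is assumed here to mean the fraction of images whose predicted level (0 to 10) exactly matches the label.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def accuracy(predicted, labelled):
    """Fraction of exact matches (assumed definition, see note above)."""
    return sum(p == t for p, t in zip(predicted, labelled)) / len(labelled)


# Hypothetical naturalness levels on the 0-10 scale, for illustration only.
labelled = [0, 2, 5, 7, 10, 3]
predicted = [0, 2, 5, 8, 10, 3]

print(f"accuracy = {accuracy(predicted, labelled):.4f}")
print(f"SROCC    = {srocc(predicted, labelled):.4f}")
```

Because SROCC depends only on rank order, a prediction that is off by one level can lower accuracy while leaving the correlation unchanged, which is one reason the two figures are reported together.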