
Domestic Cat Sound Classification Using Learned Features from Deep Neural Nets.

Authors :
Pandeya, Yagya Raj
Kim, Dongwhoon
Lee, Joonwhoan
Source :
Applied Sciences (2076-3417); Oct2018, Vol. 8 Issue 10, p1949, 17p
Publication Year :
2018

Abstract

Featured Application: The domestic cat is an ancient human pet that communicates by producing sounds. Automatic classification of animal sounds can support better human-animal communication and help owners understand their pets' needs more clearly. Pets can sometimes also protect people during natural disasters or criminal activity by raising alerts. In this work, our main contribution is the classification of domestic cat sounds using a limited amount of available data. We address the lack of data with well-known techniques such as transfer learning, ensembles, and cross-validation. We propose a frequency division average pooling (FDAP) technique, used instead of global average pooling (GAP), to make robust predictions from features in different frequency bands. To better understand our results, we visualize the features learned by the networks, the confusion matrix, and the receiver operating characteristic (ROC) curve. The proposed techniques for dealing with the lack of data in domestic cat sound classification may be useful to other researchers working in similar domains.

The domestic cat (Felis catus) is one of the most attractive pets in the world, and it produces many mysterious kinds of sound depending on its mood and situation. In this paper, we address the automatic classification of cat sounds using machine learning. Because a machine learning approach to classification requires class-labeled data, our work begins by building a small dataset, named CatSound, covering 10 categories. In addition to the original dataset, we enlarge the amount of data with various audio data augmentation methods to aid the classification task. In this study, we use two types of learned features from deep neural networks: one from a convolutional neural network (CNN) pre-trained on music data and reused through transfer learning, and the other from an unsupervised convolutional deep belief network (CDBN) trained solely on the collected set of cat sounds.
In addition to conventional GAP, we propose an effective pooling method called FDAP to obtain a larger set of meaningful features. In FDAP, the frequency dimension is roughly divided into bands, and average pooling is then applied within each band. For classification, we used five different machine learning algorithms and an ensemble of them. We compare classification performance with respect to the following factors: the amount of data added by augmentation, learned features from the pre-trained CNN versus the unsupervised CDBN, conventional GAP versus FDAP, and the machine learning algorithm used for classification. As expected, the proposed FDAP features, combined with the larger augmented dataset and the ensemble approach, produced the best accuracy. Moreover, the learned features from both the pre-trained CNN and the unsupervised CDBN produce good results in the experiments. By combining all of these positive factors, we obtained the best result: 91.13% accuracy, an F1-score of 0.91, and an area under the curve (AUC) score of 0.995. [ABSTRACT FROM AUTHOR]
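The abstract describes FDAP only in outline: divide the frequency axis, then average within each division. The following is a minimal Python sketch of that contrast with GAP. It is not the authors' implementation. The feature-map layout (channel × frequency bin × time frame), the rule of splitting the frequency axis into contiguous, nearly equal bands, and the function names are all illustrative assumptions.

```python
from typing import List

# Hypothetical layout: fmap[channel][freq_bin][time_frame], e.g. the output of one CNN layer.
FeatureMap = List[List[List[float]]]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def global_average_pooling(fmap: FeatureMap) -> List[float]:
    """GAP: one value per channel, averaged over every frequency bin and time frame."""
    return [_mean([v for row in channel for v in row]) for channel in fmap]


def frequency_division_average_pooling(fmap: FeatureMap, num_bands: int) -> List[float]:
    """FDAP sketch (assumed band rule): split each channel's frequency axis into
    num_bands contiguous bands and average each band separately, which gives
    num_bands values per channel instead of GAP's single value."""
    features: List[float] = []
    for channel in fmap:
        n_freq = len(channel)
        if not 1 <= num_bands <= n_freq:
            raise ValueError("num_bands must be between 1 and the number of frequency bins")
        # Integer band edges; because n_freq >= num_bands, every band is non-empty.
        edges = [i * n_freq // num_bands for i in range(num_bands + 1)]
        for lo, hi in zip(edges[:-1], edges[1:]):
            band = [v for row in channel[lo:hi] for v in row]
            features.append(_mean(band))
    return features


if __name__ == "__main__":
    # One channel with 4 frequency bins x 2 time frames, plus an all-zero channel.
    fmap = [[[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]], [[0.0, 0.0]] * 4]
    print(global_average_pooling(fmap))                 # [4.0, 0.0]
    print(frequency_division_average_pooling(fmap, 2))  # [2.0, 6.0, 0.0, 0.0]
```

In this toy example, GAP collapses the first channel to 4.0. FDAP keeps the separate low-band (2.0) and high-band (6.0) averages, so a downstream classifier can see where in frequency the activation occurs. With equal-sized bands, the mean of a channel's FDAP values equals its GAP value.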

Details

Language :
English
ISSN :
20763417
Volume :
8
Issue :
10
Database :
Complementary Index
Journal :
Applied Sciences (2076-3417)
Publication Type :
Academic Journal
Accession number :
132686771
Full Text :
https://doi.org/10.3390/app8101949