1. A Deep Learning Framework for Audio Deepfake Detection.
- Author
-
Khochare, Janavi, Joshi, Chaitali, Yenarkar, Bakul, Suratkar, Shraddha, and Kazi, Faruk
- Subjects
- *
CONVOLUTIONAL neural networks , *DEEP learning , *SPEECH synthesis , *MACHINE learning , *DEEPFAKES , *CLASSIFICATION algorithms - Abstract
Audio deepfakes have been increasingly emerging as a potential source of deceit, with the development of avant-garde methods of synthetic speech generation. Hence, differentiating fake audio from the real one is becoming even more difficult owing to the increasing accuracy of text-to-speech models, posing a serious threat to speaker verification systems. Within the domain of audio deepfake detection, a majority of experiments have been based on the ASVSpoof or the AVSpoof dataset using various machine learning and deep learning approaches. In this work, experiments were performed on a more recent dataset, the Fake or Real (FoR) dataset which contains data generated using some of the best text to speech models. Two approaches have been adopted to the solve problem: feature-based approach and image-based approach. The feature-based approach involves converting audio data into a dataset consisting of various spectral features of the audio samples, which are fed to the machine learning algorithms for the classification of audio as fake or real. While in the image-based approach audio samples are converted into melspectrograms which are input into deep learning algorithms, namely Temporal Convolutional Network (TCN) and Spatial Transformer Network (STN). TCN has been implemented because it is a sequential model and has been shown to give good results on sequential data. A comparison between the performances of both the approaches has been made, and it is observed that deep learning algorithms, particularly TCN, outperforms the machine learning algorithms by a significant margin, with a 92 percent test accuracy. This solution presents a model for audio deepfake classification which has an accuracy comparable to the traditional CNN models like VGG16, XceptionNet, etc. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF