226 results for "Audio forensics"
Search Results
2. Digital audio tampering detection based on spatio-temporal representation learning of electrical network frequency.
- Author
- Zeng, Chunyan, Kong, Shuai, Wang, Zhifeng, Li, Kun, Zhao, Yuhao, Wan, Xiangkui, and Chen, Yunfan
- Subjects
- ARTIFICIAL neural networks, CONVOLUTIONAL neural networks, DIGITAL audio, DISCRETE Fourier transforms, FEATURE extraction
- Abstract
The majority of Digital Audio Tampering Detection (DATD) methods, which are based on Electrical Network Frequency (ENF), predominantly concentrate on the static spatial information of ENF. Unfortunately, this focus neglects the temporal variation present in the ENF time series. This limitation significantly hampers the ENF feature representation capability, consequently diminishing the overall accuracy of tampering detection. To address this gap, our paper introduces an innovative digital audio tampering detection method founded on ENF spatio-temporal feature representation learning. To enhance the feature representation capability and subsequently improve tampering detection accuracy, we propose the construction of a parallel spatio-temporal network model. This model incorporates both Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) network architectures. Through this hybrid model, we aim to deeply extract both ENF spatial and temporal feature information. In the process of extracting spatial and temporal features of ENF, we utilize high-precision Discrete Fourier Transform (DFT) analysis on digital audio. This analysis allows us to extract ENF phase sequences, which are then adaptively divided into frames through frame shifting. The result is feature matrices of uniform size, effectively representing the spatial features of ENF. Concurrently, phase sequences are segmented into frames based on ENF time changes to capture the temporal features of ENF. Subsequently, deep spatial and temporal features are extracted using CNN and BiLSTM, respectively. To further enhance the representation capability of the spatio-temporal features, we introduce an attention mechanism. This mechanism dynamically assigns weights to the deep spatial and temporal features, providing a nuanced and refined representation. Finally, a deep neural network is employed to discern whether the audio has undergone tampering. 
Our experimental results validate the effectiveness of our approach, showcasing superior performance compared to six state-of-the-art methods across three public databases for digital audio tampering detection. This comprehensive methodology, focusing on both spatial and temporal aspects of ENF, establishes a robust foundation for advancing the field of DATD and contributes significantly to improving detection accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
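The ENF phase-sequence extraction step described in this abstract (high-precision DFT analysis of the mains component, then framing) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a synthetic 50 Hz hum at a low sample rate and uses a single-bin DFT per frame, which is the basic idea behind ENF phase features.

```python
import numpy as np

def enf_phase_sequence(x, fs, f_nominal=50.0, frame_len=1.0, hop=0.5):
    """Estimate the ENF phase per frame via a single-bin DFT at the
    nominal mains frequency.

    x: 1-D audio signal; fs: sample rate in Hz.
    Returns one phase estimate (radians) per analysis frame.
    """
    n = int(frame_len * fs)
    step = int(hop * fs)
    phases = []
    for start in range(0, len(x) - n + 1, step):
        frame = x[start:start + n]
        # Goertzel-style single-bin DFT at the nominal ENF frequency.
        t = np.arange(n) / fs
        bin_val = np.sum(frame * np.exp(-2j * np.pi * f_nominal * t))
        phases.append(np.angle(bin_val))
    return np.array(phases)

# Synthetic example: a clean 50 Hz "hum" with a constant initial phase.
fs = 1000
t = np.arange(0, 4, 1 / fs)
hum = np.cos(2 * np.pi * 50 * t + 0.3)
phases = enf_phase_sequence(hum, fs)
```

On real recordings the signal would first be band-pass filtered around the nominal mains frequency (50 or 60 Hz), and tampering shows up as discontinuities in the resulting phase sequence; the paper's contribution is learning spatio-temporal representations of such sequences rather than this extraction step itself.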
3. Discriminative Component Analysis Enhanced Feature Fusion of Electrical Network Frequency for Digital Audio Tampering Detection.
- Author
- Zeng, Chunyan, Kong, Shuai, Wang, Zhifeng, Li, Kun, Zhao, Yuhao, Wan, Xiangkui, and Chen, Yunfan
- Subjects
- *DIGITAL audio, *CLASSIFICATION algorithms, *CRIME prevention, *FREQUENCY spectra, *CURVE fitting
- Abstract
Research in the domain of digital audio tampering detection has advanced significantly with the use of Electrical Network Frequency (ENF) analysis, presenting notable benefits for crime prevention and the enhancement of judicial integrity. However, the existing methodologies, particularly those analyzing ENF phase and frequency, are impeded by data clutter, redundancy, and incompatibilities with standard classification algorithms, leading to decreased detection efficacy. This study proposes a novel methodology employing Discriminant Component Analysis (DCA) for the fusion of ENF features, aiming to address these issues directly. By analyzing the distinct characteristics of ENF phase and frequency spectra, our approach uses DCA to merge these features effectively. This fusion not only amplifies the correlation between the features of phase and frequency but also simplifies the feature space through efficient dimensionality reduction. Additionally, to bridge the gap with traditional classification methods, we introduce a cascaded deep random forest algorithm, designed for intricate representational learning of the fused features. This sequential processing enhances the precision of our classification model significantly. Experimental results on both the Carioca and New Spanish public datasets demonstrate that our approach surpasses current state-of-the-art methods in terms of accuracy and robustness, establishing its superiority in the field of digital audio tampering detection. By integrating the DCA algorithm to accentuate feature uniqueness and maximize inter-feature correlation, alongside advanced representational learning via the deep random forest algorithm, our methodology markedly improves the accuracy of digital audio tampering detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Factors affecting forensic electric network frequency matching – A comprehensive study
- Author
- Guang Hua, Qingyi Wang, Dengpan Ye, Haijian Zhang, Guoyin Wang, and Shuyin Xia
- Subjects
- Digital forensics, Audio forensics, Data authentication, Timestamp verification, Electric network frequency criterion, Information technology, T58.5-58.64
- Abstract
Power system frequency fluctuations can be captured in digital recordings and extracted for comparison against a reference database for forensic timestamp verification. This is known as the Electric Network Frequency (ENF) criterion, enabled by the properties of random fluctuations and intra-grid consistency. In essence, this is a task of matching a short random sequence within a long reference, whose accuracy is mainly concerned with whether this match could be uniquely correct. In this paper, we comprehensively analyze the factors affecting the reliability of ENF matching, including the length of test recording, length of reference, temporal resolution, and Signal-to-Noise Ratio (SNR). For synthetic analysis, we incorporate the first-order AutoRegressive (AR) ENF model and propose an efficient Time-Frequency Domain noisy ENF synthesis method. Then, the reliability analysis schemes for both synthetic and real-world data are respectively proposed. Through a comprehensive study, we quantitatively reveal that while the SNR is an important external factor to determine whether timestamp verification is viable, the length of test recording is the most important inherent factor, followed by the length of reference. However, the temporal resolution has little impact on performance. Finally, a practical workflow of the ENF-based audio timestamp verification system is proposed, incorporating the discovered results.
- Published
- 2024
- Full Text
- View/download PDF
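The core task this paper analyzes, matching a short random ENF sequence within a long reference, can be sketched as a sliding-window comparison. The random-walk reference below is a stand-in for a real grid-frequency database, not the authors' AR-based synthesis method:

```python
import numpy as np

def match_enf(test, reference):
    """Slide a short ENF test sequence along a long reference and return
    the offset minimizing the mean squared error, plus the error curve."""
    m, n = len(test), len(reference)
    errors = np.array([
        np.mean((reference[i:i + m] - test) ** 2)
        for i in range(n - m + 1)
    ])
    return int(np.argmin(errors)), errors

# Synthetic example: a random-walk "reference ENF" around 50 Hz, with the
# test clip cut from a known position and lightly perturbed by noise.
rng = np.random.default_rng(0)
reference = 50.0 + np.cumsum(rng.normal(0, 0.005, size=3600))  # 1 h at 1 Hz
true_offset = 1200
test = reference[true_offset:true_offset + 120] + rng.normal(0, 0.001, 120)
best, errs = match_enf(test, reference)
```

Uniqueness of the match, rather than just the location of the minimum, is what the paper's reliability analysis is concerned with: longer test clips make a coincidental second minimum less likely, which is consistent with the paper's finding that test-recording length is the dominant inherent factor.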
5. Audio Deep Fake Detection with Sonic Sleuth Model.
- Author
- Alshehri, Anfal, Almalki, Danah, Alharbi, Eaman, and Albaradei, Somayah
- Subjects
- ARTIFICIAL intelligence, DEEP learning, MACHINE learning, DEEPFAKES, INFORMATION dissemination
- Abstract
Information dissemination and preservation are crucial for societal progress, especially in the technological age. While technology fosters knowledge sharing, it also risks spreading misinformation. Audio deepfakes—convincingly fabricated audio created using artificial intelligence (AI)—exacerbate this issue. We present Sonic Sleuth, a novel AI model designed specifically for detecting audio deepfakes. Our approach utilizes advanced deep learning (DL) techniques, including a custom CNN model, to enhance detection accuracy in audio misinformation, with practical applications in journalism and social media. Through meticulous data preprocessing and rigorous experimentation, we achieved a remarkable 98.27% accuracy and a 0.016 equal error rate (EER) on a substantial dataset of real and synthetic audio. Additionally, Sonic Sleuth demonstrated 84.92% accuracy and a 0.085 EER on an external dataset. The novelty of this research lies in its integration of datasets that closely simulate real-world conditions, including noise and linguistic diversity, enabling the model to generalize across a wide array of audio inputs. These results underscore Sonic Sleuth's potential as a powerful tool for combating misinformation and enhancing integrity in digital communications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
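The two headline metrics above are accuracy and equal error rate (EER). As a reference point, here is a minimal EER computation on detector scores; this is the generic definition of the metric, not code from the paper:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Equal error rate: the operating point where the false-acceptance
    and false-rejection rates are (closest to) equal.

    scores: higher = more likely fake; labels: 1 = fake, 0 = real.
    """
    best_gap, eer = np.inf, 1.0
    for thr in np.unique(scores):
        predicted_fake = scores >= thr
        far = np.mean(predicted_fake[labels == 0])   # real accepted as fake
        frr = np.mean(~predicted_fake[labels == 1])  # fake missed
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy example: perfectly separable scores give an EER of zero.
scores = np.array([0.1, 0.2, 0.15, 0.9, 0.8, 0.85])
labels = np.array([0, 0, 0, 1, 1, 1])
eer = equal_error_rate(scores, labels)
```

An EER of 0.016, as reported above, means the threshold can be set so that both error rates are about 1.6%.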
6. Recurrent neural network and long short-term memory models for audio copy-move forgery detection: a comprehensive study.
- Author
- Akdeniz, Fulya and Becerikli, Yaşar
- Subjects
- *LONG short-term memory, *FORGERY, *DIGITAL forensics, *DEEP learning
- Abstract
One of the most pressing challenges in audio forgery detection—a major topic of signal analysis and digital forensics research—is detecting copy-move forgery in audio data. Because audio data are used in numerous sectors, including security, but increasingly tampered with and manipulated, studies dedicated to detecting forgery and verifying voice data have intensified in recent years. In our study, 2189 fake audio files were produced from 2189 audio recordings on the TIMIT corpus, for a total of 4378 audio files. After the 4378 files were preprocessed to detect silent and non-silent regions in the signals, a Mel-frequency-based hybrid feature data set was obtained from the 4378 files. Next, RNN and LSTM deep learning models were applied to detect audio forgery in the data set in four experimental setups—two with RNN and two with LSTM—using the AdaGrad and AdaDelta optimizer algorithms to identify the optimum solution in these nonlinear systems and minimize the loss rate. When the experimental results were compared, the accuracy rate of detecting forgery in the hybrid feature data was 76.03%, and the hybrid model, in which the features are used together, demonstrated high accuracy even with small batch sizes. This article thus reports the first-ever use of RNN and LSTM deep learning models to detect audio copy-move forgery. Moreover, because the proposed method does not require adjusting threshold values, the resulting system is more robust than other systems described in the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
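The preprocessing step mentioned above, separating silent from non-silent regions before feature extraction, is commonly done with a short-time energy threshold. A minimal sketch of that heuristic (an illustration of the idea, not the paper's exact procedure):

```python
import numpy as np

def energy_segments(x, fs, frame_ms=25, threshold_ratio=0.1):
    """Label each frame as silent (False) or non-silent (True) by comparing
    its short-time energy to a fraction of the maximum frame energy."""
    n = max(1, int(fs * frame_ms / 1000))
    n_frames = len(x) // n
    frames = x[:n_frames * n].reshape(n_frames, n)
    energy = np.sum(frames ** 2, axis=1)
    return energy > threshold_ratio * energy.max()

# Toy signal: 0.5 s of near-silence, 0.5 s of a loud tone, 0.5 s of near-silence.
fs = 8000
quiet = 0.001 * np.ones(fs // 2)
loud = np.sin(2 * np.pi * 440 * np.arange(fs // 2) / fs)
x = np.concatenate([quiet, loud, quiet])
voiced = energy_segments(x, fs)
```

Feature extraction (MFCCs and their deltas, in this paper's case) is then restricted to the frames flagged as non-silent, which keeps near-silent frames from diluting the feature statistics.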
7. Double Compressed Wideband AMR Speech Detection Using Deep Neural Networks.
- Author
- Büker, Aykut and Hanilçi, Cemal
- Subjects
- *ARTIFICIAL neural networks, *CONVOLUTIONAL neural networks, *AUTOMATIC speech recognition, *SPEECH processing systems, *SIGNAL detection, *SMARTPHONES, *DATABASES
- Abstract
Detecting double compressed (DC) speech signals is an important audio forensics task since it is highly related to the integrity and the authenticity of the recording. The adaptive multi-rate (AMR) speech codec is a popular audio compression technique specifically optimized for speech signals, and it is a standard audio recording format in the vast majority of smartphones. All of the previous studies addressing the detection of DC AMR signals report their findings for speech signals compressed using the narrowband AMR codec (AMR-NB). Meanwhile, the wideband AMR codec (AMR-WB) has been used by several mobile phone manufacturers, but DC AMR-WB speech signal detection performance remains unknown. To the best of our knowledge, this is the first study focusing on detecting DC signals compressed using the AMR-WB speech codec. To this end, we propose three different deep neural network-based DC AMR-WB signal detection systems where the spectrogram representations of the speech signals are used as the input features. Experiments conducted on the TIMIT database provide several important findings regarding DC AMR-WB speech detection. Firstly, DC AMR-WB detection is found to be a more challenging task than detecting AMR-NB signals. For example, the convolutional neural network (CNN)-based system yields 74.83% and 99.93% detection rates on AMR-WB and AMR-NB coded signals, respectively. Secondly, capturing the temporal information using a long short-term memory (LSTM) network, with a DC AMR-WB signal detection accuracy of 86.25%, is found to be superior to the CNN system. Thirdly, combining the deep feature representations learned by the CNN and LSTM networks further improves the performance. Fourthly, the detection rates deteriorate when the signals are first encoded using different audio codecs prior to AMR-WB compression. Finally, applying score-level or decision-level fusion to the proposed three systems generally improves the detection rates. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
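All three detection systems in this paper take spectrogram representations of the speech signal as input features. A minimal sketch of producing a log-magnitude spectrogram with SciPy (generic parameters, not the authors' exact configuration):

```python
import numpy as np
from scipy.signal import stft

def log_spectrogram(x, fs, nperseg=256, noverlap=128):
    """Log-magnitude spectrogram: the kind of 2-D time-frequency input a
    CNN- or LSTM-based double-compression detector consumes."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return np.log(np.abs(Z) + 1e-10)  # small floor avoids log(0)

# Toy input: a 1 kHz tone sampled at 8 kHz for one second.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
S = log_spectrogram(x, fs)
# Frequency bins run 0..fs/2 in steps of fs/nperseg = 31.25 Hz, so the
# tone should dominate bin 32 (= 1000 / 31.25) in each frame.
```

Double compression leaves subtle traces in this representation (quantization and band-limiting artifacts of the second encoding pass), which is what the networks learn to pick up.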
8. Exploring the Effectiveness of the Phase Features on Double Compressed AMR Speech Detection.
- Author
- Büker, Aykut and Hanilçi, Cemal
- Subjects
- SPEECH synthesis, SPECTROGRAMS, VIDEO coding, DEAF children
- Abstract
Determining whether an audio signal is single compressed (SC) or double compressed (DC) is a crucial task in audio forensics, as it is closely linked to the integrity of the recording. In this paper, we propose the utilization of phase spectrum-based features for detecting DC narrowband and wideband adaptive multi-rate (AMR-NB and AMR-WB) speech. To the best of our knowledge, phase spectrum features have not been previously explored for DC audio detection. In addition to introducing phase spectrum features, we propose a novel parallel LSTM system that simultaneously learns the most representative features from both the magnitude and phase spectrum of the speech signal and integrates both sets of information to further enhance its performance. Analyses demonstrate significant differences between the phase spectra of SC and DC speech signals, suggesting their potential as representative features for DC AMR speech detection. The proposed phase spectrum features are found to perform as well as magnitude spectrum features for the AMR-NB codec, while outperforming the magnitude spectrum in detecting AMR-WB speech. The proposed phase spectrum features yield 8% performance improvement in terms of true positive rate over the magnitude spectrogram features. The proposed parallel LSTM system further improves DC AMR-WB speech detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
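The phase-spectrum features proposed here are, at base, the angles of the STFT bins rather than their magnitudes. A generic sketch of extracting them (the paper's parallel LSTM then learns representations from both the magnitude and phase spectra; this is only the feature-extraction idea):

```python
import numpy as np
from scipy.signal import stft

def phase_spectrum(x, fs, nperseg=256, noverlap=128):
    """Frame-wise phase spectrum of a signal: the angle of each STFT bin,
    the raw material behind phase-based double-compression features."""
    _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return np.angle(Z)  # radians in (-pi, pi]

# Toy input: a 1 kHz tone sampled at 8 kHz for one second.
fs = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)
P = phase_spectrum(x, fs)
```

The appeal of the phase spectrum for this task is that a second encoding pass perturbs phase in ways the magnitude spectrum may not reveal, which matches the reported gain on AMR-WB detection.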
9. Detecting audio copy-move forgery with an artificial neural network.
- Author
- Akdeniz, Fulya and Becerikli, Yaşar
- Abstract
Given how easily audio data can be obtained, audio recordings are subject to both malicious and benign tampering and manipulation that can compromise the integrity and reliability of audio data. Because audio recordings can be used in many strategic areas, detecting such tampering and manipulation of audio data is critical. Although the literature demonstrates the lack of any accurate, integrated system for detecting copy-move forgery, the field shows great promise for research. Thus, our proposed method seeks to support the detection of the passive technique of audio copy-move forgery. For our study, forgery audio data were obtained from the TIMIT dataset, and 4378 audio recordings were used: 2189 of original audio and 2189 of audio created by copy-move forgery. After the voiced and unvoiced regions in the audio signal were determined by the YAAPT (Yet Another Algorithm for Pitch Tracking) algorithm, the features were obtained from the signals using Mel frequency cepstrum coefficients (MFCCs), delta (Δ) MFCCs, and ΔΔMFCCs coefficients together, along with linear prediction coefficients (LPCs). In turn, those features were classified using artificial neural networks. Our experimental results demonstrate that the best results were 75.34% detection with the MFCC method, 73.97% detection with the ΔMFCC method, 72.37% detection with the ΔΔMFCC method, 76.48% detection with the MFCC + ΔMFCC + ΔΔMFCC method, and 74.77% detection with the LPC method. Using the MFCC + ΔMFCC + ΔΔMFCC method, in which the features are used together, we determined that the models give far superior results even with relatively few epochs. The proposed method is also more robust than other methods in the literature because it does not use threshold values. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
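The ΔMFCC and ΔΔMFCC features used above are first- and second-order regression deltas of the MFCC matrix (applying the delta operation twice yields ΔΔ). A minimal sketch of the standard delta formula; this is the textbook recipe, not the authors' code:

```python
import numpy as np

def delta(features, N=2):
    """Delta features computed with the standard regression formula over a
    +/-N frame window: d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2 * sum_n n^2).

    features: (n_frames, n_coeffs) matrix, e.g. MFCCs; returns same shape.
    """
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:len(features) + N + n]
                    - padded[N - n:len(features) + N - n])
    return out / denom

# Sanity check: a linear ramp has constant slope, so its delta is 1.0 on
# interior frames (edge frames are damped by the edge padding).
ramp = np.arange(10, dtype=float).reshape(-1, 1)
d = delta(ramp)
```

Stacking MFCC, Δ, and ΔΔ columns gives the combined feature set that scored best (76.48%) in the experiments above.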
10. Audio Deep Fake Detection with Sonic Sleuth Model
- Author
- Anfal Alshehri, Danah Almalki, Eaman Alharbi, and Somayah Albaradei
- Subjects
- artificial intelligence, deepfake detection, machine learning, deep learning, deepfake audio, audio forensics, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Information dissemination and preservation are crucial for societal progress, especially in the technological age. While technology fosters knowledge sharing, it also risks spreading misinformation. Audio deepfakes—convincingly fabricated audio created using artificial intelligence (AI)—exacerbate this issue. We present Sonic Sleuth, a novel AI model designed specifically for detecting audio deepfakes. Our approach utilizes advanced deep learning (DL) techniques, including a custom CNN model, to enhance detection accuracy in audio misinformation, with practical applications in journalism and social media. Through meticulous data preprocessing and rigorous experimentation, we achieved a remarkable 98.27% accuracy and a 0.016 equal error rate (EER) on a substantial dataset of real and synthetic audio. Additionally, Sonic Sleuth demonstrated 84.92% accuracy and a 0.085 EER on an external dataset. The novelty of this research lies in its integration of datasets that closely simulate real-world conditions, including noise and linguistic diversity, enabling the model to generalize across a wide array of audio inputs. These results underscore Sonic Sleuth’s potential as a powerful tool for combating misinformation and enhancing integrity in digital communications.
- Published
- 2024
- Full Text
- View/download PDF
11. Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection
- Author
- Attorresi, Luigi, Salvi, Davide, Borrelli, Clara, Bestagini, Paolo, and Tubaro, Stefano (Rousseau, Jean-Jacques and Kapralos, Bill, editors)
- Published
- 2023
- Full Text
- View/download PDF
12. Exploring the Effectiveness of the Phase Features on Double Compressed AMR Speech Detection
- Author
- Aykut Büker and Cemal Hanilçi
- Subjects
- DC AMR speech detection, AMR-NB, AMR-WB, audio forensics, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Determining whether an audio signal is single compressed (SC) or double compressed (DC) is a crucial task in audio forensics, as it is closely linked to the integrity of the recording. In this paper, we propose the utilization of phase spectrum-based features for detecting DC narrowband and wideband adaptive multi-rate (AMR-NB and AMR-WB) speech. To the best of our knowledge, phase spectrum features have not been previously explored for DC audio detection. In addition to introducing phase spectrum features, we propose a novel parallel LSTM system that simultaneously learns the most representative features from both the magnitude and phase spectrum of the speech signal and integrates both sets of information to further enhance its performance. Analyses demonstrate significant differences between the phase spectra of SC and DC speech signals, suggesting their potential as representative features for DC AMR speech detection. The proposed phase spectrum features are found to perform as well as magnitude spectrum features for the AMR-NB codec, while outperforming the magnitude spectrum in detecting AMR-WB speech. The proposed phase spectrum features yield 8% performance improvement in terms of true positive rate over the magnitude spectrogram features. The proposed parallel LSTM system further improves DC AMR-WB speech detection.
- Published
- 2024
- Full Text
- View/download PDF
13. Audio forensics behind the Iron Curtain: from raw sounds to expert testimony.
- Author
- Kvicalova, Anna
- Abstract
This essay investigates the construction of forensic audio expertise in the legal and security system of Communist Czechoslovakia and shows that the contested nature of speaker identification and sound-based objectivity contributed to the formulation of probabilistic claims in forensics. It explores the practices of the Department of Fonoscopy: a unique research laboratory of audio forensics that systematically examined the spectrographic, linguistic, and auditory means of sound analysis for the purpose of identifying unknown voices and environments in audio recordings. Bringing together the notions of "forensic cultures" and "sonic skills", this article addresses the scientific, cultural, and political underpinning of the nascent field of audio expertise as well as the changing status of sound-based knowledge and forms of representation in forensics. In establishing fonoscopic expertise before the court and in the broader praxis of police investigation, the idea of vocal fingerprints and the use of sound visualisation technologies became instrumental. This essay pays special attention to the dynamics of the intricate process in which acoustic "raw material" (from anonymous calls, wiretapped phone lines, recorded conversations, or police interrogation rooms) was transformed into different kinds of legal and criminalistic evidence in the service of the totalitarian surveillance state. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. Deep Learning-Based CNN Multi-Modal Camera Model Identification for Video Source Identification.
- Author
- Singh, Surjeet and Sehgal, Vivek Kumar
- Subjects
- DEEP learning, SOCIAL media, CRIMINAL evidence, CRIMINAL procedure, CRIMINAL investigation, FORENSIC sciences
- Abstract
Copyright of Informatica (03505596) is the property of Slovene Society Informatika and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
15. A Robust Approach to Multimodal Deepfake Detection.
- Author
- Salvi, Davide, Liu, Honggu, Mandelli, Sara, Bestagini, Paolo, Zhou, Wenbo, Zhang, Weiming, and Tubaro, Stefano
- Subjects
- DEEPFAKES, DEEP learning, TIME management
- Abstract
The widespread use of deep learning techniques for creating realistic synthetic media, commonly known as deepfakes, poses a significant threat to individuals, organizations, and society. As the malicious use of these data could lead to unpleasant situations, it is becoming crucial to distinguish between authentic and fake media. Nonetheless, though deepfake generation systems can create convincing images and audio, they may struggle to maintain consistency across different data modalities, such as producing a realistic video sequence where both visual frames and speech are fake yet consistent with each other. Moreover, these systems may not accurately reproduce semantically and temporally accurate aspects. All these elements can be exploited to perform a robust detection of fake content. In this paper, we propose a novel approach for detecting deepfake video sequences by leveraging data multimodality. Our method extracts audio-visual features from the input video over time and analyzes them using time-aware neural networks. We exploit both the video and audio modalities to leverage the inconsistencies between and within them, enhancing the final detection performance. The peculiarity of the proposed method is that we never train on multimodal deepfake data, but on disjoint monomodal datasets which contain visual-only or audio-only deepfakes. This frees us from leveraging multimodal datasets during training, which is desirable given their scarcity in the literature. Moreover, at test time, it allows us to evaluate the robustness of our proposed detector on unseen multimodal deepfakes. We test different fusion techniques between data modalities and investigate which one leads to more robust predictions by the developed detectors. Our results indicate that a multimodal approach is more effective than a monomodal one, even if trained on disjoint monomodal datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. A Deep Learning Approach for Splicing Detection in Digital Audios
- Author
- Chuchra, Akanksha, Kaur, Mandeep, and Gupta, Savita (Saraswat, Mukesh, Sharma, Harish, Balachandran, K., Kim, Joong Hoon, and Bansal, Jagdish Chand, editors)
- Published
- 2022
- Full Text
- View/download PDF
17. Faked speech detection with zero prior knowledge
- Author
- Al Ajmi, Sahar Abdullah, Hayat, Khizar, Al Obaidi, Alaa Mohammed, Kumar, Naresh, Najim AL-Din, Munaf Salim, and Magnier, Baptiste
- Published
- 2024
- Full Text
- View/download PDF
18. Shallow and deep feature fusion for digital audio tampering detection
- Author
- Zhifeng Wang, Yao Yang, Chunyan Zeng, Shuai Kong, Shixiong Feng, and Nan Zhao
- Subjects
- Electronic network frequency, Audio forensics, Deep learning, Feature fusion, Telecommunication, TK5101-6720, Electronics, TK7800-8360
- Abstract
Digital audio tampering detection can be used to verify the authenticity of digital audio. However, most current methods use standard electronic network frequency (ENF) databases for visual comparison analysis of ENF continuity of digital audio or perform feature extraction for classification by machine learning methods. ENF databases are usually tricky to obtain, visual methods have weak feature representation, and machine learning methods have more information loss in features, resulting in low detection accuracy. This paper proposes a fusion method of shallow and deep features to fully use ENF information by exploiting the complementary nature of features at different levels to more accurately describe the changes in inconsistency produced by tampering operations to raw digital audio. Firstly, the audio signal is band-pass filtered to obtain the ENF component. Then, the discrete Fourier transform (DFT) and Hilbert transform are performed to obtain the phase and instantaneous frequency of the ENF component. Secondly, the mean value of the sequence variation is used as the shallow feature; the feature matrix obtained by framing and reshaping of the ENF sequence is used as the input of the convolutional neural network; the characteristics of the fitted coefficients are obtained by curve fitting. Then, the local details of ENF are obtained from the feature matrix by the convolutional neural network, and the global information of ENF is obtained by fitting coefficient features through deep neural network (DNN). The depth features of ENF are composed of ENF global information and local information together. The shallow and deep features are fused using an attention mechanism to give greater weights to features useful for classification and suppress invalid features. Finally, the tampered audio is detected by downscaling and fitting with a DNN containing two fully connected layers, and classification is performed using a Softmax layer.
The method achieves 97.03% accuracy on three classic databases: Carioca 1, Carioca 2, and New Spanish. In addition, we have achieved an accuracy of 88.31% on the newly constructed database GAUDI-DI. Experimental results show that the proposed method is superior to the state-of-the-art method.
- Published
- 2022
- Full Text
- View/download PDF
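Among the steps described above, the Hilbert transform yields the instantaneous frequency of the band-passed ENF component. A minimal sketch using SciPy's analytic-signal routine, with a clean synthetic 50 Hz tone standing in for a real ENF component (not the authors' implementation):

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    """Instantaneous frequency (Hz) of a narrowband signal via the analytic
    signal: unwrap the Hilbert phase, then differentiate."""
    analytic = hilbert(x)
    phase = np.unwrap(np.angle(analytic))
    return np.diff(phase) * fs / (2 * np.pi)

# A clean 50 Hz component should yield ~50 Hz everywhere.
fs = 1000
t = np.arange(0, 2, 1 / fs)
x = np.cos(2 * np.pi * 50 * t)
inst_f = instantaneous_frequency(x, fs)
```

On tampered audio, splices show up as jumps in this instantaneous-frequency sequence, which is why the paper feeds both the phase and frequency representations into the fusion network.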
19. A multi-firearm, multi-orientation audio dataset of gunshots
- Author
- Ruksana Kabealo, Steven Wyatt, Akshay Aravamudan, Xi Zhang, David N. Acaron, Mawaba P. Dao, David Elliott, Anthony O. Smith, Carlos E. Otero, Luis D. Otero, Georgios C. Anagnostopoulos, Adrian M. Peter, Wesley Jones, and Eric Lam
- Subjects
- Gunshot audio classification, Audio forensics, Machine learning, Acoustic situational awareness, Multiple sensor orchestration, Internet of Battlefield Things (IoBT), Computer applications to medicine. Medical informatics, R858-859.7, Science (General), Q1-390
- Abstract
Early detection of firearm discharge has become increasingly critical for situational awareness in both civilian and military domains. The ability to determine the location and model of a discharged firearm is vital, as this can inform effective response plans. To this end, several gunshot audio datasets have been released that aim to facilitate gunshot detection and classification of a discharged firearm based on acoustic signatures. However, these datasets often suffer from a lack of variety in the orientations of recording devices around the source of the gunshot. Additionally, these datasets often suffer from the absence of proper time synchronization, which prevents the usage of these datasets for determining the Direction of Arrival (DoA) of the sound. In this paper, we present a multi-firearm, multi-orientation time-synchronized audio dataset collected in a semi-controlled real-world setting – providing us a degree of supervision – using several edge devices positioned in and around an outdoor firing range.
- Published
- 2023
- Full Text
- View/download PDF
20. Sound on the Quiet: Speaker Identification and Auditory Objectivity in Czechoslovak Fonoscopy, 1975–90.
- Author
- Kvicalova, Anna
- Subjects
- *OBJECTIVITY, *SECRET police, *FORENSIC sciences, *LEGAL evidence, *COLD War, 1945-1991, *VISUALIZATION
- Abstract
Audio technologies that allowed eavesdropping on private conversations were a key tool in Cold War–era surveillance practices. In 1975, in the midst of the Cold War, a criminal police agency called the Fonoscopy Department was established in Czechoslovakia's capital, Prague, to explore the forensic potential of sound analysis for speaker identification. This article reveals for the first time that, aside from the well-known Czechoslovak secret police's wiretapping and eavesdropping activities, an independent government agency engaged in forensic fonoscopy, developing sound-based expertise. Examining the department's practices challenges the notion of mechanical and visually grounded objectivity to show how forensic science negotiated objective knowledge at the intersection of aural analysis and visualization technologies. More generally, the article contributes to debates on utilizing "sonic skills" to produce knowledge and evidence for security and legal purposes. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
21. Source Acquisition Device Identification from Recorded Audio Based on Spatiotemporal Representation Learning with Multi-Attention Mechanisms.
- Author
- Zeng, Chunyan, Feng, Shixiong, Zhu, Dongliang, and Wang, Zhifeng
- Subjects
- *DISTRIBUTION (Probability theory), *CONVOLUTIONAL neural networks, *TIME-varying networks, *IDENTIFICATION, *FEATURE extraction
- Abstract
Source acquisition device identification from recorded audio aims to identify the source recording device by analyzing the intrinsic characteristics of audio, which is a challenging problem in audio forensics. In this paper, we propose a spatiotemporal representation learning framework with multi-attention mechanisms to tackle this problem. In the deep feature extraction stage of recording devices, a two-branch network based on residual dense temporal convolution networks (RD-TCNs) and convolutional neural networks (CNNs) is constructed. The spatial probability distribution features of audio signals are employed as inputs to the branch of the CNN for spatial representation learning, and the temporal spectral features of audio signals are fed into the branch of the RD-TCN network for temporal representation learning. This achieves simultaneous learning of long-term and short-term features to obtain an accurate representation of device-related information. In the spatiotemporal feature fusion stage, three attention mechanisms—temporal, spatial, and branch attention mechanisms—are designed to capture spatiotemporal weights and achieve effective deep feature fusion. The proposed framework achieves state-of-the-art performance on the benchmark CCNU_Mobile dataset, reaching an accuracy of 97.6% for the identification of 45 recording devices, with a significant reduction in training time compared to other models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
22. Word-for-Word Transcription of Conversations as a Task of Forensic Audio Analysis
- Author
-
O. O. Vlasov, V. O. Kuznetshov, T. N. Svirava, and S. B. Shavykina
- Subjects
audio forensics ,word-for-word transcription of recorded conversations ,intelligibility of speech ,attribution of utterances ,diagnostic expert task ,classification expert task ,Social pathology. Social and public welfare. Criminology ,HV1-9960 - Abstract
The article aims to develop a unified methodological approach to determining the word-for-word contents of conversations on phonograms when conducting analysis of audio recordings in forensic institutions of the Russian Ministry of Justice. The authors summarize the existing experience; address the institutional, legal, and theoretical aspects of the expert task in question and its legal significance; and describe its type, object, subject, stages, and the range of applied methods. Drawing on their classification of phonograms and practical examples, the authors analyze approaches enabling experts to solve not only typical but also particular tasks. The article also clarifies the competence of forensic experts working in the system of the Russian Ministry of Justice who can determine the word-for-word contents of recorded conversations in the course of forensic audio and video analysis.
- Published
- 2022
- Full Text
- View/download PDF
23. Forensic authentication method for audio recordings generated by Voice Recorder application on Samsung Galaxy Watch4 series.
- Author
-
Park, Nam In, Lim, Seong Ho, Byun, Jun Seok, Kim, Jin‐Hwan, Lee, Ji Woo, Chun, Chanjun, Kim, Yonggang, and Jeon, Oc‐Yeub
- Subjects
- *
SOUND recordings , *WEARABLE technology , *SMARTWATCHES , *FILES (Records) , *SMARTPHONES , *TIMESTAMPS , *HUMAN voice - Abstract
The number of smartwatch users has been rapidly increasing in recent years. A smartwatch is a wearable device that collects various types of data using sensors and provides basic functions, such as healthcare-related measurements and audio recording. In this study, we propose a forensic authentication method for audio recordings from the Voice Recorder application in the Samsung Galaxy Watch4 series. First, a total of 240 audio recordings from each of the four different models, paired with four different smartphones for synchronization via Bluetooth, were collected and verified. To analyze the characteristics of smartwatch audio recordings, we examined the transition of the audio latency, writable audio bandwidth, timestamps, and file structure between recordings generated on the smartwatches and those edited using the Voice Recorder application of the paired smartphones. In addition, the devices holding the audio recordings were examined via the Android Debug Bridge (ADB) tool and compared with the timestamps stored in the file system. The experimental results showed that the audio latency, writable audio bandwidth, and file structure of audio recordings generated by smartwatches differed from those generated by smartphones. Additionally, by analyzing the file structure, audio recordings can be classified as unmanipulated, manipulation-attempted, or manipulated. Finally, we can forensically authenticate audio recordings generated by the Voice Recorder application in the Samsung Galaxy Watch4 series by accessing the smartwatches and analyzing the timestamps related to the recordings in the file system. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
24. How robust is the United Kingdom justice system against the advance of deepfake audio and video?
- Author
-
Jones, Karl O. and Jones, Bethan S.
- Subjects
DEEPFAKES ,JUSTICE administration ,ELECTRONIC evidence ,RECOMMENDER systems ,LEGAL remedies - Abstract
A recent development is the application of AI to alter or create video and audio files, known as deepfakes. The paper examines the issues arising from deepfakes to determine how robust the UK justice system is against them. The work analyses deepfake technology with respect to an evaluation of professional knowledge, evidential standards, and current legislation. The paper discusses the difficulties presented by deepfakes, highlighting the need for methods to authenticate digital evidence, and considers what UK legal remedies can protect the justice system and the public from digitally falsified evidence. The paper concludes with potential recommendations for the justice system. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. A Large-Scale Benchmark Dataset for Anomaly Detection and Rare Event Classification for Audio Forensics
- Author
-
Ahmed Abbasi, Abdul Rehman Rehman Javed, Amanullah Yasin, Zunera Jalil, Natalia Kryvinska, and Usman Tariq
- Subjects
Audio forensics ,audio analysis ,anomaly detection ,key feature extraction ,feature selection ,machine learning ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
With the emergence of new digital technologies, a significant surge has been seen in the volume of multimedia data generated from various smart devices. Several challenges have emerged for extracting useful information from multimedia data. One such challenge is the early and accurate detection of anomalies. This study proposes an efficient technique for anomaly detection and classification of rare events in audio data. We develop a large audio dataset containing seven different rare events (anomalies) with 15 different background environmental settings (e.g., beach, restaurant, and train) to support both the detection of anomalous audio and the classification of rare sound events (e.g., baby cries, gunshots, breaking glass, footsteps) for audio forensics. The proposed approach extracts mel-frequency cepstral coefficients (MFCCs) from the audio signals of the newly created dataset and selects the minimum number of best-performing features using principal component analysis (PCA). These features are input to state-of-the-art machine learning algorithms for performance analysis. We also apply the machine learning algorithms to a state-of-the-art dataset and obtain good results. Experimental results reveal that the proposed approach effectively detects all anomalies and achieves superior performance over existing approaches across all environments and cases.
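The feature-extraction-plus-PCA pipeline described in the abstract can be sketched in a few lines of numpy. This is an illustrative stand-in, not the authors' implementation: a per-frame log power spectrum is used in place of full MFCCs, the "audio" is synthetic noise, and the component count `k=8` is an arbitrary choice.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Slice a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def log_power_features(x, frame_len=512, hop=256):
    """Per-frame log power spectrum (a simplified stand-in for MFCCs)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(spec + 1e-10)

def pca_reduce(feats, k=8):
    """Project features onto the top-k principal components."""
    centered = feats - feats.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:k]]   # top-k eigenvectors
    return centered @ top

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)              # 1 s of surrogate "audio" at 16 kHz
feats = log_power_features(audio)
reduced = pca_reduce(feats, k=8)
print(reduced.shape)                            # (n_frames, 8)
```

In practice the reduced features would then be fed to a downstream classifier (e.g., an SVM or random forest) for anomaly detection.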
- Published
- 2022
- Full Text
- View/download PDF
26. Shallow and deep feature fusion for digital audio tampering detection.
- Author
-
Wang, Zhifeng, Yang, Yao, Zeng, Chunyan, Kong, Shuai, Feng, Shixiong, and Zhao, Nan
- Subjects
ARTIFICIAL neural networks ,HILBERT transform ,DISCRETE Fourier transforms ,CONVOLUTIONAL neural networks ,DIGITAL audio ,IMAGE fusion ,MACHINE learning ,FEATURE extraction - Abstract
Digital audio tampering detection can be used to verify the authenticity of digital audio. However, most current methods either visually compare the continuity of the audio's electric network frequency (ENF) against standard ENF databases, or extract features for classification by machine learning. ENF databases are difficult to obtain, visual methods have weak feature representation, and machine learning methods lose feature information, resulting in low detection accuracy. This paper proposes a fusion of shallow and deep features that makes full use of ENF information by exploiting the complementary nature of features at different levels, more accurately describing the inconsistencies that tampering operations introduce into raw digital audio. First, the audio signal is band-pass filtered to obtain the ENF component, and the discrete Fourier transform (DFT) and Hilbert transform are applied to obtain the phase and instantaneous frequency of that component. Second, the mean value of the sequence variation is used as the shallow feature; the feature matrix obtained by framing and reshaping the ENF sequence is used as input to a convolutional neural network (CNN); and fitting-coefficient features are obtained by curve fitting. The CNN extracts the local details of the ENF from the feature matrix, while a deep neural network (DNN) derives global ENF information from the fitting-coefficient features; together, the global and local information form the deep ENF features. The shallow and deep features are fused using an attention mechanism, which gives greater weight to features useful for classification and suppresses invalid ones. Finally, tampering is detected by a DNN containing two fully connected layers for dimensionality reduction and fitting, with classification performed by a Softmax layer.
The method achieves 97.03% accuracy on three classic databases: Carioca 1, Carioca 2, and New Spanish. In addition, we have achieved an accuracy of 88.31% on the newly constructed database GAUDI-DI. Experimental results show that the proposed method is superior to the state-of-the-art method. [ABSTRACT FROM AUTHOR]
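The band-pass/DFT/Hilbert step described in the abstract can be sketched with numpy alone. This is a hedged illustration, not the paper's implementation: the band-pass filter is an ideal FFT-domain mask, and a synthetic 50.2 Hz tone stands in for a real ENF trace.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Ideal band-pass: zero out FFT bins outside [lo, hi] Hz."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def analytic_signal(x):
    """Analytic signal via the FFT (discrete Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(X * h)

fs = 1000.0
t = np.arange(0, 4.0, 1.0 / fs)
enf = np.sin(2 * np.pi * 50.2 * t)                 # surrogate ENF component
noisy = enf + 0.3 * np.random.default_rng(1).standard_normal(len(t))

band = bandpass_fft(noisy, fs, 49.0, 51.0)         # isolate the ENF band
z = analytic_signal(band)
phase = np.unwrap(np.angle(z))                     # ENF phase sequence
inst_freq = np.diff(phase) * fs / (2 * np.pi)      # instantaneous frequency (Hz)
print(np.median(inst_freq))
```

The phase sequence would then be framed into the fixed-size feature matrices the paper describes, and the instantaneous-frequency track used for the temporal features.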
- Published
- 2022
- Full Text
- View/download PDF
27. A Robust Approach to Multimodal Deepfake Detection
- Author
-
Davide Salvi, Honggu Liu, Sara Mandelli, Paolo Bestagini, Wenbo Zhou, Weiming Zhang, and Stefano Tubaro
- Subjects
deepfake detection ,video forensics ,audio forensics ,multimodality ,Photography ,TR1-1050 ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The widespread use of deep learning techniques for creating realistic synthetic media, commonly known as deepfakes, poses a significant threat to individuals, organizations, and society. As the malicious use of these data could lead to unpleasant situations, it is becoming crucial to distinguish between authentic and fake media. Nonetheless, though deepfake generation systems can create convincing images and audio, they may struggle to maintain consistency across different data modalities, such as producing a realistic video sequence in which both the visual frames and the speech are fake yet consistent with each other. Moreover, these systems may not accurately reproduce semantic and temporally accurate aspects. All these elements can be exploited to perform robust detection of fake content. In this paper, we propose a novel approach for detecting deepfake video sequences by leveraging data multimodality. Our method extracts audio-visual features from the input video over time and analyzes them using time-aware neural networks. We exploit both the video and audio modalities to leverage the inconsistencies between and within them, enhancing the final detection performance. The peculiarity of the proposed method is that we never train on multimodal deepfake data, but on disjoint monomodal datasets which contain visual-only or audio-only deepfakes. This frees us from needing multimodal datasets during training, which is desirable given their scarcity in the literature. Moreover, at test time, it allows us to evaluate the robustness of the proposed detector on unseen multimodal deepfakes. We test different fusion techniques between data modalities and investigate which one leads to more robust predictions by the developed detectors. Our results indicate that a multimodal approach is more effective than a monomodal one, even if trained on disjoint monomodal datasets.
- Published
- 2023
- Full Text
- View/download PDF
28. Target speaker filtration by mask estimation for source speaker traceability in voice conversion.
- Author
-
Zhang, Junfei, Zhang, Xiongwei, Sun, Meng, Zou, Xia, Jia, Chong, and Li, Yihao
- Subjects
- *
ORTHOGONAL decompositions , *FEATURE extraction , *SIGNAL filtering , *SYSTEM identification , *SPEECH - Abstract
Voice Conversion (VC) can manipulate the source speaker's identity in a speech signal to make it sound like a specific target speaker, which makes it harder for a human being or a speaker verification/identification system to trace the real identity of the source speaker. However, extracting features of the source speaker from converted audio is challenging, since the features of the target speaker are dominant in the converted audio and hinder the extraction of the source speaker's features. In this paper, to extract features of the source speaker from audio processed by VC methods, a speaker filtration block is designed, which uses mask estimation to identify source speakers from manipulated speech signals by filtering out the features of the target speaker in the converted audio. Extensive experiments are conducted to evaluate the effectiveness of the proposed model in tracing the source speakers of audio converted by ADAIN-VC, AGAIN-VC, VQMIVC, and FREEVC. Experimental results demonstrate the effectiveness of the proposed model in comparison with competitive baselines in speaker verification/identification scenarios. Notably, it performs well even when applied to unknown VC methods. Furthermore, the experiments also show that training on audio generated by multiple VC methods can improve source speaker traceability. [ABSTRACT FROM AUTHOR]
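Time-frequency masking, the general mechanism behind a speaker filtration block, can be illustrated with a small numpy sketch. Everything here is an illustrative assumption: the "speakers" are pure tones in separate bands, and the mask is a fixed frequency cutoff, whereas in the paper the mask would be estimated by a trained network.

```python
import numpy as np

def stft(x, n=256, hop=128):
    """Hann-windowed STFT of a 1-D signal."""
    win = np.hanning(n)
    frames = np.stack([x[i:i + n] for i in range(0, len(x) - n + 1, hop)])
    return np.fft.rfft(frames * win, axis=1)

def istft(X, n=256, hop=128):
    """Weighted overlap-add inverse STFT."""
    frames = np.fft.irfft(X, n=n, axis=1)
    win = np.hanning(n)
    out = np.zeros(hop * (len(frames) - 1) + n)
    norm = np.zeros_like(out)
    for i, fr in enumerate(frames):
        out[i * hop:i * hop + n] += fr * win
        norm[i * hop:i * hop + n] += win ** 2
    return out / np.maximum(norm, 1e-8)

fs = 8000
t = np.arange(0, 1.0, 1.0 / fs)
target = np.sin(2 * np.pi * 1500 * t)   # stand-in "target speaker" energy
source = np.sin(2 * np.pi * 300 * t)    # stand-in "source speaker" energy
mix = target + source                   # "converted" audio: both present

X = stft(mix)
freqs = np.fft.rfftfreq(256, d=1.0 / fs)
mask = (freqs < 1000).astype(float)     # a trained model would estimate this mask
filtered = istft(X * mask)              # target band suppressed, source retained
```

After masking, the interior of `filtered` tracks the low-band "source" component while the high-band "target" component is removed, which is the filtering effect the block relies on before source-speaker feature extraction.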
- Published
- 2024
- Full Text
- View/download PDF
29. Electric Network Frequency Based Audio Forensics Using Convolutional Neural Networks
- Author
-
Mao, Maoyu, Xiao, Zhongcheng, Kang, Xiangui, Li, Xiang, Xiao, Liang, Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Goedicke, Michael, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Tröltzsch, Fredi, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Reis, Ricardo, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Peterson, Gilbert, editor, and Shenoi, Sujeet, editor
- Published
- 2020
- Full Text
- View/download PDF
30. Advanced forensic procedure for the authentication of audio recordings generated by Voice Memos application of iOS14.
- Author
-
Park, Nam In, Shim, Kyu‐Sun, Lee, Ji Woo, Kim, Jin‐Hwan, Lim, Seong Ho, Byun, Jun Seok, Kim, Yong Jin, and Jeon, Oc‐Yeub
- Subjects
- *
SOUND recordings , *AUDIO codec , *MEMORANDUMS , *FILES (Records) , *SYSTEM analysis - Abstract
In this study, we propose an advanced forensic examination procedure for audio recordings generated by the Voice Memos application of iPhone Operating System (iOS) 14, to verify that these are original recordings and have not been manipulated. The proposed examination procedure consists of an analysis of the characteristics of the audio recordings and of the file system of the device storing them. To analyze the characteristics of the audio recordings, we compare the encoding parameters (bitrate, sampling rate, timestamps, etc.) and the file structure to determine whether the recordings were manipulated. Next, in the device examination step, we analyze the media-log history and temporary files of the file system obtained by mobile forensic tools. For comparative analysis, a total of 100 audio recording samples were obtained through the Voice Memos application from five iPhone handsets of different models with iOS 14 installed, using Advanced Audio Coding (AAC) or the Apple Lossless Audio Codec (ALAC). Analyzing the encoding parameters of the original and manipulated audio recordings, as well as the temporary files contained in the device file system, confirmed differences in the encoding parameters and distinctive traces of the original recordings in the temporary files when the recordings were manipulated. In particular, the primary advantage of our proposed method is its potential ability to recover original audio recordings that were subsequently manipulated, via the temporary files examined in the device file system analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Audio Forensics on Smartphone with Digital Forensics Research Workshop (DFRWS) Method
- Author
-
Sunardi Sunardi, Imam Riadi, Rusydi Umar, and Muhammad Fauzan Gustafi
- Subjects
audio forensics ,smartphone ,digital forensics research workshop (dfrws) ,Telecommunication ,TK5101-6720 ,Information technology ,T58.5-58.64 - Abstract
Audio is one of the digital items that can reveal what happened in a case. However, audio evidence can also be manipulated and altered to hide information. Audio forensics is a technique to identify a sound's owner from audio using pitch, formant, and spectrogram parameters. This research examines the similarity between an original recording and a manipulated recording to determine the owner of the sound, analyzing how similar or identical the sounds are using spectrogram analysis with the Digital Forensics Research Workshop (DFRWS) method. The research objects are an original file and a manipulated file, both in mp3 format and transcoded to WAV format. A live forensics method is then used to acquire the data from a smartphone, with the help of several applications. The results show that the research successfully obtained digital evidence from a smartphone using the Oxygen Forensic application, extracting two audio files and two video files. Through hashing, the four obtained files were proven to be authentic. Around 90% of the data is identical to the original voice recording; only 10% is not.
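The hashing step used to prove that extracted files are authentic can be sketched with Python's standard hashlib. The file names and contents below are hypothetical surrogates, not actual evidence files (the walrus operator requires Python 3.8+).

```python
import hashlib

def sha256_of(path, chunk=8192):
    """Stream a file through SHA-256, suitable for large media files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Hypothetical evidence files: write two identical surrogate "recordings".
payload = b"RIFF....WAVEdata" + bytes(64)
for name in ("original.wav", "acquired_copy.wav"):
    with open(name, "wb") as f:
        f.write(payload)

# Matching digests indicate the acquired copy is bit-for-bit identical
# to the original, which is how hashing establishes authenticity.
print(sha256_of("original.wav") == sha256_of("acquired_copy.wav"))
```

Any single-bit modification of the copy would change its digest, so a mismatch immediately flags tampering or a corrupted acquisition.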
- Published
- 2021
- Full Text
- View/download PDF
32. Toward Realigning Automatic Speaker Verification in the Era of COVID-19.
- Author
-
Khan, Awais, Javed, Ali, Malik, Khalid Mahmood, Raza, Muhammad Anas, Ryan, James, Saudagar, Abdul Khader Jilani, and Malik, Hafiz
- Subjects
- *
SARS-CoV-2 Omicron variant , *SARS-CoV-2 Delta variant , *MEDICAL masks , *BREAKTHROUGH infections , *SPEECH perception , *COVID-19 , *TEXT messages - Abstract
The use of face masks has increased dramatically since the COVID-19 pandemic started, in order to curb the spread of the disease. Additionally, breakthrough infections caused by the Delta and Omicron variants have further increased the importance of wearing a face mask, even for vaccinated individuals. However, the use of face masks also attenuates speech signals, and this change may impact speech processing technologies, e.g., automatic speaker verification (ASV) and speech-to-text conversion. In this paper we examine ASV systems against speech samples recorded in the presence of three different types of face mask: surgical, cloth, and filtered N95, and analyze the impact on acoustics and other factors. In addition, we explore the effect of different microphones, the distance from the microphone, and the impact of face masks when speakers use ASV systems in real-world scenarios. Our analysis shows a significant deterioration in performance when an ASV system encounters different face masks, microphones, and variable distance between the subject and microphone. To address this problem, this paper proposes a novel framework that overcomes the performance degradation in these scenarios by realigning the ASV system. The novelty of the proposed ASV framework is as follows: first, we propose a fused feature descriptor by concatenating the novel Ternary Deviated overlapping Patterns (TDoP), Mel Frequency Cepstral Coefficients (MFCC), and Gammatone Cepstral Coefficients (GTCC), which are used by both the ensemble learning-based ASV and the anomaly detection system in the proposed architecture. Second, this paper proposes an anomaly detection model for identifying vocal samples produced in the presence of face masks. Next, it presents a Peak Norm (PN) filter to approximate the signal of a speaker without a face mask, in order to boost the accuracy of ASV systems.
Finally, the features of filtered samples utilizing the PN filter and samples without face masks are passed to the proposed ASV to test for improved accuracy. The proposed ASV system achieved an accuracy of 0.99 and 0.92, respectively, on samples recorded without a face mask and with different face masks. Although the use of face masks affects the ASV system, the PN filtering solution overcomes this deficiency up to 4%. Similarly, when exposed to different microphones and distances, the PN approach enhanced system accuracy by up to 7% and 9%, respectively. The results demonstrate the effectiveness of the presented framework against an in-house prepared, diverse Multi Speaker Face Masks (MSFM) dataset, (IRB No. FY2021-83), consisting of samples of subjects taken with a variety of face masks and microphones, and from different distances. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. VPCID—A VoIP Phone Call Identification Database
- Author
-
Huang, Yuankun, Tan, Shunquan, Li, Bin, Huang, Jiwu, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Yoo, Chang D., editor, Shi, Yun-Qing, editor, Kim, Hyoung Joong, editor, Piva, Alessandro, editor, and Kim, Gwangsu, editor
- Published
- 2019
- Full Text
- View/download PDF
34. Learning to Fool the Speaker Recognition.
- Author
-
JIGUO LI, XINFENG ZHANG, JIZHENG XU, SIWEI MA, and WEN GAO
- Abstract
Due to the widespread deployment of fingerprint/face/speaker recognition systems, the risks in these systems, especially adversarial attacks, have drawn increasing attention in recent years. Previous research mainly studied adversarial attacks on vision-based systems, such as fingerprint and face recognition; attacks on speech-based systems have not been well studied yet, although such systems are widely used in daily life. In this article, we attempt to fool a state-of-the-art speaker recognition model and present the speaker recognition attacker, a lightweight multi-layer convolutional neural network that fools the well-trained model by adding imperceptible perturbations to the raw speech waveform. We find that the speaker recognition system is vulnerable to adversarial attack, and achieve a high success rate on both non-targeted and targeted attacks. Besides, we present an effective method that leverages a pretrained phoneme recognition model to optimize the speaker recognition attacker and obtain a tradeoff between attack success rate and perceptual quality. Experimental results on the TIMIT and LibriSpeech datasets demonstrate the effectiveness and efficiency of our proposed model. The frequency analysis experiments indicate that high-frequency attacks are more effective than low-frequency attacks, which differs from the conclusion drawn in previous image-based works. Additionally, the ablation study gives more insights into our model. [ABSTRACT FROM AUTHOR]
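The core idea of adding an imperceptible, sign-based perturbation to a raw waveform can be illustrated on a toy model. The paper's attacker is a trained CNN against a real speaker recognition network; the sketch below only shows the FGSM-style mechanism under stated assumptions (random linear "scorer" weights and a surrogate waveform).

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(16000)            # toy linear "speaker scorer" weights
x = 0.1 * rng.standard_normal(16000)      # surrogate raw speech waveform (1 s)

score = float(w @ x)                      # sign of score = predicted speaker
eps = 5e-3                                # per-sample perturbation budget
# FGSM-style step: move each sample by +/-eps so the score shifts
# toward the opposite class.
delta = -np.sign(score) * eps * np.sign(w)
adv = x + delta                           # perturbed waveform
adv_score = float(w @ adv)

# The per-sample change is tiny relative to the signal amplitude,
# yet the accumulated effect flips the classifier's decision.
print(np.max(np.abs(delta)), score > 0, adv_score > 0)
```

A real attack replaces the closed-form sign step with gradients of the target network's loss, but the geometry (many tiny coordinated per-sample shifts) is the same.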
- Published
- 2021
- Full Text
- View/download PDF
35. CNN-Based Multi-Modal Camera Model Identification on Video Sequences.
- Author
-
Cortivo, Davide Dal, Mandelli, Sara, Bestagini, Paolo, and Tubaro, Stefano
- Subjects
CONVOLUTIONAL neural networks ,COLOR filter arrays ,DIGITAL image processing ,DIGITAL media ,COPYRIGHT infringement ,FEATURE extraction - Abstract
Identifying the source camera of images and videos has gained significant importance in multimedia forensics. It allows tracing data back to their creator, thus making it possible to solve copyright infringement cases and to expose the authors of hideous crimes. In this paper, we focus on the problem of camera model identification for video sequences, that is, given a video under analysis, detecting the camera model used for its acquisition. To this purpose, we develop two different CNN-based camera model identification methods, working in a novel multi-modal scenario. Differently from mono-modal methods, which use only the visual or audio information from the investigated video to tackle the identification task, the proposed multi-modal methods jointly exploit audio and visual information. We test our proposed methodologies on the well-known Vision dataset, which collects almost 2000 video sequences belonging to different devices. Experiments are performed, considering native videos directly acquired by their acquisition devices and videos uploaded on social media platforms, such as YouTube and WhatsApp. The achieved results show that the proposed multi-modal approaches significantly outperform their mono-modal counterparts, representing a valuable strategy for the tackled problem and opening future research to even more challenging scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
36. AN ELECTRIC NETWORK FREQUENCY ANALYSIS TECHNOLOGY DEMONSTRATOR FOR EDUCATIONAL PURPOSES.
- Author
-
Jones, Karl O., Hamilton, Lewis, Ellis, David L., Robinson, Colin, Reed-Jones, Jago T., and Morrison, Kay
- Subjects
- *
ELECTRIC network analysis , *FREQUENCIES of oscillating systems , *EDUCATIONAL objectives , *DIGITAL audio , *FORENSIC sciences - Abstract
Authenticating digital audio is a crucial task for audio forensic technicians (AFTs) owing to the increased use of digital multimedia in litigation. Analysing the electric network frequency (ENF), which can be unintentionally embedded when recording digital audio, is a critical authentication tool. The increasing use of digital media in court means growing demand for AFTs and a corresponding demand for education; thus, ENF analysis should be a key element in the education of future audio forensic technicians. A device for educational purposes has been designed to demonstrate the technology and procedures involved in ENF analysis, providing future AFTs practical experience in a key authentication technique. [ABSTRACT FROM AUTHOR]
- Published
- 2021
37. Speaker-independent source cell-phone identification for re-compressed and noisy audio recordings.
- Author
-
Verma, Vinay and Khanna, Nitin
- Subjects
ADDITIVE white Gaussian noise ,SOUND recordings ,DISCRETE Fourier transforms ,AUTOMATIC speech recognition ,CELL phones ,USER-generated content - Abstract
With the rapid increase in user-generated multimedia content, its extensive outreach over social media, and its potential in critical applications such as law enforcement, source identification from re-compressed and noisy multimedia is of great importance. This paper proposes a system for speaker-independent cell-phone identification from recorded audio. This system is capable of dealing with test audio whose speech content and speaker differ from those of the training audio. Each recorded audio has the device fingerprint implicitly embedded in it, which encourages us to design a CNN-based system for learning device-specific signatures directly from the magnitude of the discrete Fourier transform of the audio. This paper also addresses the scenario where the recorded audio is re-compressed due to efficient storage and network transmission requirements, a common phenomenon in this age of social media. The scenario of cell-phone classification from audio recordings in the presence of additive white Gaussian noise is addressed as well. We show that our proposed system performs as well as state-of-the-art systems in the speaker-dependent case with clean audio recordings, and exhibits much higher robustness in the speaker-independent case with clean, re-compressed, and noisy audio recordings. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
38. A Semi-supervised Speaker Identification Method for Audio Forensics Using Cochleagrams
- Author
-
Camacho, Steven, Renza, Diego, Ballesteros L., Dora M., Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Figueroa-García, Juan Carlos, editor, López-Santana, Eduyn Ramiro, editor, Villa-Ramírez, José Luis, editor, and Ferro-Escobar, Roberto, editor
- Published
- 2017
- Full Text
- View/download PDF
39. Source Cell-Phone Identification Using Spectral Features of Device Self-noise
- Author
-
Jin, Chao, Wang, Rangding, Yan, Diqun, Tao, Biaoli, Chen, Yanan, Pei, Anshan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Shi, Yun Qing, editor, Kim, Hyoung Joong, editor, Perez-Gonzalez, Fernando, editor, and Liu, Feng, editor
- Published
- 2017
- Full Text
- View/download PDF
40. Identification of VoIP Speech With Multiple Domain Deep Features.
- Author
-
Huang, Yuankun, Li, Bin, Barni, Mauro, and Huang, Jiwu
- Abstract
Identifying whether a phone call comes from VoIP (Voice over Internet Protocol) is a challenging but less-investigated audio forensic issue. As shown in a previous study, existing feature based methods do not work well. In this paper, we propose a robust data-driven approach, called CNN-MLS (convolutional neural network based multi-domain learning scheme), to distinguish VoIP calls from mobile phone calls. To better explore the differences between VoIP and mobile phone calls, we first process data with high-pass filtering, and then extract deep features from both temporal domain and spectral domain. Two CNN architectures are designed for accepting data from respective domains, and some tricks such as auxiliary classifiers and individual subnet training are used for accelerating network convergence. The deep features are finally fused in a classification module for identifying the phone call type. The proposed method is evaluated on VPCID (VoIP Phone Call Identification Database) dataset, under various testing conditions. We pay particular attention to tests on data belonging to a source mismatched with the training sources. Experimental results show that, compared with existing methods, our method can achieve satisfactory and better accuracy on two-second-long inputs, implying that an alert may be activated shortly after a VoIP call is made. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
41. ENF Signal Enhancement in Audio Recordings.
- Author
-
Hua, Guang and Zhang, Haijian
- Abstract
In electric network frequency (ENF) based audio forensics, the ENF signal captured in a questioned audio recording is estimated and analyzed for authentication purposes. However, the captured ENF signal is usually contaminated by very strong noise and interference. In this paper, we propose a robust filtering algorithm (RFA) for ENF signal enhancement in audio recordings, which could effectively suppress the additive noise and facilitate subsequent ENF estimation, especially in practical low signal-to-noise ratio (SNR) situations. The proposed algorithm encodes the time domain expression of the preprocessed audio signal (ENF signal plus noise) as the instantaneous frequencies (IFs) of an analytical sinusoidal frequency modulated (SFM) signal. Then, a kernel function is utilized to generate a sinusoidal time-frequency distribution (STFD) whose peaks correspond to the IFs of the analytical signal, i.e., the denoised ENF signal. It is then proven that finding the STFD peaks is equivalent to finding the averaged phases of the kernel function if the additive noise is a zero mean wide sense stationary (WSS) process. The RFA serves as a noise reduction mechanism yielding improved SNR at the filter output. Combined with generic frequency estimation methods, ENF extraction accuracy could be substantially improved with the use of the proposed RFA than without using it. Both synthetic and experimental results are provided to illustrate the effectiveness of our proposal. Reliable ENF extraction could be achieved under noise level down to −20 dB SNR, and the RFA is suitable for a wide range of ENF-based forensic applications. [ABSTRACT FROM AUTHOR]
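As a baseline for the kind of ENF estimation the RFA improves upon, a generic windowed-FFT peak estimator can be sketched in numpy. This is not the proposed STFD/RFA algorithm; the signal is a synthetic 50.05 Hz tone plus noise standing in for captured mains hum, and the window length and padding factor are arbitrary choices.

```python
import numpy as np

def enf_track(x, fs, nominal=50.0, win_s=2.0, band=1.0, pad=16):
    """Per-window ENF estimate: FFT-peak search near the nominal frequency."""
    n = int(win_s * fs)
    win = np.hanning(n)
    est = []
    for start in range(0, len(x) - n + 1, n):
        seg = x[start:start + n] * win
        spec = np.abs(np.fft.rfft(seg, n=pad * n))   # zero-pad for finer bins
        freqs = np.fft.rfftfreq(pad * n, d=1.0 / fs)
        sel = np.abs(freqs - nominal) <= band        # search only near nominal
        est.append(freqs[sel][np.argmax(spec[sel])])
    return np.array(est)

fs = 800.0
t = np.arange(0, 10.0, 1.0 / fs)
x = np.sin(2 * np.pi * 50.05 * t)                    # surrogate mains hum
x += 0.3 * np.random.default_rng(3).standard_normal(len(t))
track = enf_track(x, fs)
print(track)
```

Simple peak picking like this degrades quickly at low SNR, which is precisely the regime (down to -20 dB) where a noise-suppression front end such as the proposed RFA becomes necessary before frequency estimation.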
- Published
- 2020
- Full Text
- View/download PDF
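As a point of reference for what the RFA improves upon, a generic narrowband ENF frequency estimate can be obtained by locating a zero-padded DFT peak near the nominal grid frequency. This is a baseline sketch, not the paper's algorithm; the 50 Hz nominal frequency and the search band are assumptions.

```python
import numpy as np

def enf_peak_frequency(x, sr, nominal=50.0, search=1.0, nfft=2**18):
    """Estimate the dominant frequency near the nominal ENF by locating
    the zero-padded DFT peak in a narrow search band (a generic baseline,
    not the paper's RFA)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)), nfft))
    freqs = np.fft.rfftfreq(nfft, 1.0 / sr)
    band = (freqs >= nominal - search) & (freqs <= nominal + search)
    return freqs[band][np.argmax(spec[band])]

# Synthetic check: a noisy 50.2 Hz tone should be located accurately.
sr, dur = 1000, 10
rng = np.random.default_rng(0)
t = np.arange(sr * dur) / sr
x = np.sin(2 * np.pi * 50.2 * t) + 0.5 * rng.standard_normal(len(t))
print(round(enf_peak_frequency(x, sr), 2))  # → 50.2
```

Such peak-picking estimators degrade quickly as the SNR drops; a denoising front end like the RFA is what makes them usable at the very low SNRs quoted in the abstract.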
42. CNN-Based Multi-Modal Camera Model Identification on Video Sequences
- Author
-
Davide Dal Cortivo, Sara Mandelli, Paolo Bestagini, and Stefano Tubaro
- Subjects
camera model identification ,video forensics ,audio forensics ,convolutional neural networks ,Photography ,TR1-1050 ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Identifying the source camera of images and videos has gained significant importance in multimedia forensics. It allows tracing data back to their creator, thus helping to solve copyright infringement cases and expose the authors of heinous crimes. In this paper, we focus on the problem of camera model identification for video sequences, that is, given a video under analysis, detecting the camera model used for its acquisition. To this purpose, we develop two different CNN-based camera model identification methods, working in a novel multi-modal scenario. Unlike mono-modal methods, which use only the visual or audio information from the investigated video to tackle the identification task, the proposed multi-modal methods jointly exploit audio and visual information. We test the proposed methodologies on the well-known Vision dataset, which collects almost 2000 video sequences from different devices. Experiments are performed on native videos directly acquired by their acquisition devices and on videos uploaded to social media platforms, such as YouTube and WhatsApp. The achieved results show that the proposed multi-modal approaches significantly outperform their mono-modal counterparts, representing a valuable strategy for the tackled problem and opening future research to even more challenging scenarios.
- Published
- 2021
- Full Text
- View/download PDF
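A minimal illustration of jointly exploiting audio and visual information is late fusion of per-modality class scores; the weighted-average rule and the equal weight used here are assumptions for illustration, not the paper's fusion design.

```python
import numpy as np

def late_fusion(p_visual, p_audio, w=0.5):
    """Fuse per-modality class posteriors by weighted averaging and return
    the winning class index. The weight w=0.5 is an assumed value."""
    p = w * np.asarray(p_visual) + (1 - w) * np.asarray(p_audio)
    return int(np.argmax(p))

# Toy case: the visual scores are ambiguous, the audio disambiguates class 2.
print(late_fusion([0.40, 0.35, 0.25], [0.10, 0.20, 0.70]))  # → 2
```

The appeal of a multi-modal design is exactly this complementarity: when one modality is degraded (e.g., by social-media transcoding), the other can still carry the decision.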
43. Source cell-phone identification from recorded speech using non-speech segments
- Author
-
Anshan PEI, Rangding WANG, and Diqun YAN
- Subjects
audio forensics ,source cell-phone identification ,silent segment ,Mel frequency coefficient ,Telecommunication ,TK5101-6720 ,Technology - Abstract
Source cell-phone identification has become a hot topic in multimedia forensics. A novel cell-phone identification method is proposed based on the silent segments of recorded speech. First, the silent segments are obtained using an adaptive endpoint detection algorithm. Then, the mean of the Mel frequency coefficients (MFC) is extracted as the feature for device identification. Finally, the CfsSubsetEval evaluation function of the WEKA platform is applied with best-first (BestFirst) search, and a support vector machine (SVM) is used for classification. Twenty-three popular cell-phone models were evaluated in the experiment. Experimental results show that the proposed method is feasible, with average recognition rates of 99.23% and 99.00% on the TIMIT database and the CKC-SD database, respectively. The proposed feature was also demonstrated to perform better than the MFC features and the Mel frequency cepstrum coefficient (MFCC) features of the speech segments.
- Published
- 2017
- Full Text
- View/download PDF
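The first stage above, isolating the silent segments, can be approximated with short-time energy thresholding; the frame sizes and the fixed threshold here are assumptions, whereas the paper uses an adaptive endpoint detection algorithm.

```python
import numpy as np

def silent_frames(x, frame_len=400, hop=200, thresh_db=-40.0):
    """Mark frames whose short-time energy falls below a fixed dB threshold
    relative to the loudest frame. A fixed-threshold stand-in for the
    paper's adaptive endpoint detection."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    energy = np.array([np.mean(f ** 2) for f in frames])
    return 10 * np.log10(energy / energy.max() + 1e-12) < thresh_db

# Synthetic check: silence, then a tone standing in for speech, then silence.
sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 200 * t)
sig = np.concatenate([np.zeros(sr // 2), speech, np.zeros(sr // 2)])
mask = silent_frames(sig)
print(bool(mask[0]), bool(mask[len(mask) // 2]))  # → True False
```

The rationale for working on silence is that, with no speech content present, what remains in those frames is dominated by the device's own noise floor and channel response — the device fingerprint.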
44. Identification of Fake Stereo Audio Using SVM and CNN
- Author
-
Tianyun Liu, Diqun Yan, Rangding Wang, Nan Yan, and Gang Chen
- Subjects
stereo faking audio ,audio forensics ,MFCC ,SVM ,CNN ,Information technology ,T58.5-58.64 - Abstract
The number of channels is one of the important criteria of digital audio quality. Generally, stereo audio with two channels provides better perceptual quality than mono audio. To seek illegal commercial benefit, one might convert mono audio to stereo of fake quality. Identifying stereo-faking audio is a lesser-investigated audio forensic issue. In this paper, a stereo-faking corpus is first presented, created using the Haas effect technique. Two identification algorithms for fake stereo audio are proposed. One is based on Mel-frequency cepstral coefficient features and support vector machines. The other is based on a specially designed five-layer convolutional neural network. The experimental results on two datasets with five different cut-off frequencies show that the proposed algorithms can effectively detect stereo-faking audio with good robustness.
- Published
- 2021
- Full Text
- View/download PDF
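The Haas-effect faking described above amounts to duplicating a mono track into two channels with a small inter-channel delay; the 15 ms delay is an assumed value within the typical Haas range, and the alignment check at the end hints at one trace a detector can exploit.

```python
import numpy as np

def haas_stereo(mono, sr, delay_ms=15.0):
    """Fake a stereo track from mono by delaying one channel (Haas effect).
    The delay value is an assumption, not taken from the paper's corpus."""
    d = int(sr * delay_ms / 1000)
    left = np.concatenate([mono, np.zeros(d)])
    right = np.concatenate([np.zeros(d), mono])
    return np.stack([left, right])

sr = 8000
mono = np.random.default_rng(1).standard_normal(sr)
st = haas_stereo(mono, sr)
d = int(sr * 15.0 / 1000)
# After undoing the delay, the two channels are identical — unlike a genuine
# stereo recording, whose channels differ in more than a pure time shift.
print(np.allclose(st[0][:len(mono)], st[1][d:d + len(mono)]))  # → True
```
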
45. Robust Speech Hashing for Digital Audio Forensics.
- Author
-
Renza, Diego, Vargas, Jaisson, and Ballesteros, Dora M.
- Subjects
SOUND recordings ,DATA integrity ,PRINCIPAL components analysis ,ADMISSIBLE evidence ,FINITE impulse response filters ,ELECTRONIC evidence ,FORENSIC sciences - Abstract
The verification of the integrity and authenticity of multimedia content is an essential task in the forensic field, in order to make digital evidence admissible. The main objective is to establish whether the multimedia content has been manipulated with significant changes to its content, such as the removal of noise (e.g., a gunshot) that could clarify the facts of a crime. In this project we propose a method to generate a summary value, known as a hash, for audio recordings. Our method is robust, which means that if the audio has been modified slightly (without changing its significant content) by perceptual manipulations such as MPEG-4 AAC, the hash value of the new audio is very similar to that of the original audio; on the contrary, if the audio is altered so that its content changes, for example with a low-pass filter, the new hash value moves away from the original value. The method starts with the application of MFCC (Mel-frequency cepstrum coefficients) and dimensionality reduction through principal component analysis (PCA). The reduced data is encrypted using two input values from a binarization system based on the Collatz conjecture. Finally, a robust 96-bit code is obtained, which varies little when perceptual modifications such as compression or amplitude modification are applied to the signal. According to experimental tests, the BER (bit error rate) between the hash value of the original audio recording and that of the manipulated audio recording is low for perceptual manipulations: 0% for FLAC and re-quantization, 1% on average for volume changes (−6 dB gain), and less than 5% on average for MPEG-4 and resampling (using the FIR anti-aliasing filter); but it is more than 25% for non-perceptual manipulations such as low-pass filtering (3 kHz, fifth order), additive noise, cutting, and copy-move. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
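The BER figures reported above compare hash bit strings. A minimal sketch of the comparison, with a toy 96-bit string standing in for the paper's Collatz-based code:

```python
def bit_error_rate(h1, h2):
    """Fraction of differing bits between two equal-length hash bit strings,
    the metric used to separate perceptual from content-changing edits."""
    assert len(h1) == len(h2)
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)

orig = "1011" * 24                                     # toy 96-bit hash
perc = orig[:-1] + ("0" if orig[-1] == "1" else "1")   # one flipped bit
print(round(bit_error_rate(orig, orig), 3))  # → 0.0
print(round(bit_error_rate(orig, perc), 3))  # → 0.01
```

A decision threshold between the two regimes reported in the abstract (under ~5% for perceptual edits, over ~25% for content changes) then turns the BER into an authentic/tampered verdict.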
46. Robust Copy–Move Detection of Speech Recording Using Similarities of Pitch and Formant.
- Author
-
Yan, Qi, Yang, Rui, and Huang, Jiwu
- Abstract
Copy–move forgery on very short speech segments, followed by post-processing operations to eliminate traces of the forgery, presents a great challenge to forensic detection. In this paper, we propose a robust method for detecting and locating a speech copy–move forgery. We found that pitch and formants can be used as features representing a voiced speech segment, and that these two features are very robust against commonly used post-processing operations. In the proposed algorithm, we first divide the speech recording into voiced and unvoiced speech segments. We then extract the pitch sequence and the first two formant sequences as the feature set of each voiced speech segment. Dynamic time warping is applied to compute the similarities between feature sets. By comparing the similarities with a threshold, we can detect and locate copy–move forgeries in a speech recording. Extensive experiments show that the proposed method is very effective in detecting and locating copy–move forgeries, even on a forged speech segment as short as one voiced speech segment. The proposed method is also robust against several kinds of commonly used post-processing operations and background noise, which highlights its promising potential as a speech copy–move forgery localization tool in practical forensic applications. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
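The similarity computation above can be illustrated with a textbook dynamic time warping distance on a 1-D pitch contour; the feature values below are made up for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D feature
    sequences (e.g. pitch contours); a minimal textbook version."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

pitch = [120, 122, 125, 124, 121]                 # hypothetical pitch contour (Hz)
copied = [120, 122, 122, 125, 124, 121]           # same contour, time-warped
other = [180, 185, 190, 188, 186]                 # unrelated segment
print(dtw_distance(pitch, copied) < dtw_distance(pitch, other))  # → True
```

DTW matters here because a copied segment may be slightly stretched or shifted by post-processing; a warped alignment still reveals the match where a rigid sample-wise comparison would not.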
47. A Mobile-Oriented System for Integrity Preserving in Audio Forensics.
- Author
-
Renza, Diego, Arango, Jaime Andres, and Ballesteros, Dora Maria
- Subjects
DATA integrity ,INTEGRITY - Abstract
This paper addresses a problem in the field of audio forensics. With the aim of providing a solution that supports Chain of Custody (CoC) processes, we propose an integrity verification system that includes capture (mobile based), hash code calculation, and cloud storage. When the audio is recorded, a hash code is generated in situ by the capture module (an application) and sent immediately to the cloud. Later, the integrity of an audio recording given as evidence can be verified against the information stored in the cloud. To validate the properties of the proposed scheme, we conducted several tests to evaluate whether two different inputs could generate the same hash code (collision resistance) and how much the hash code changes when small changes occur in the input (sensitivity analysis). According to the results, all selected audio signals produce different hash codes, and these values are very sensitive to small changes in the recorded audio. In terms of computational cost, less than 2 s per minute of recording is required to calculate the hash code. With the above results, our system is useful for verifying the integrity of audio recordings that may be relied on as digital evidence. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
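The capture-then-verify flow can be sketched with a standard cryptographic hash; SHA-256 and the raw-bytes input are assumptions here, since the abstract does not name the exact primitive.

```python
import hashlib

def audio_hash(pcm_bytes: bytes) -> str:
    """Hash the captured audio bytes; SHA-256 is an assumed choice,
    not necessarily the paper's."""
    return hashlib.sha256(pcm_bytes).hexdigest()

captured = b"\x00\x01\x02\x03" * 1000      # stand-in for recorded PCM data
stored_in_cloud = audio_hash(captured)     # computed in situ, sent to the cloud

# Later, integrity verification of the evidence file against the cloud record:
evidence_ok = audio_hash(captured) == stored_in_cloud
tampered = captured[:-1] + b"\xff"
print(evidence_ok, audio_hash(tampered) == stored_in_cloud)  # → True False
```

Computing the hash at capture time and storing it out of reach of the evidence holder is what makes the later comparison meaningful for Chain of Custody.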
48. Source smartphone identification by exploiting encoding characteristics of recorded speech.
- Author
-
Jin, Chao, Wang, Rangding, and Yan, Diqun
- Subjects
IDENTIFICATION ,AUTOMATIC speech recognition ,FEATURE selection - Abstract
Source device identification has recently become a hot topic in multimedia forensics. In this paper, a novel method is proposed for source smartphone identification that uses encoding characteristics as the intrinsic fingerprint of recording devices. The encoding characteristics of smartphones of 24 popular models from 7 mainstream brands are investigated, and statistical features of some important parameters are extracted as discriminative features for smartphone identification. To balance a reasonable feature dimension against a high classification rate, a two-step feature selection strategy consisting of Variance Threshold and SVM-RFE is designed to choose the optimal features. Experimental results show that the proposed method achieves high identification rates of 97.89% and 98.04% on the live recorded database (CKC-SD) and the TIMIT recaptured database (TIMIT-RSD), respectively; furthermore, our scheme performs better than two typical source identification approaches based on recorded speech. In addition, the robustness of the proposed features is evaluated under double-compression attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
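The first step of the two-step selection, the variance threshold, can be sketched in a few lines; the threshold value is an assumption, and SVM-RFE would subsequently rank the surviving features.

```python
import numpy as np

def variance_threshold(X, thresh=1e-3):
    """Drop features whose variance over the training set falls below a
    threshold (the value here is an assumption). This is only the first
    of the paper's two selection steps; SVM-RFE follows."""
    keep = X.var(axis=0) > thresh
    return X[:, keep], keep

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))   # 50 samples, 4 hypothetical features
X[:, 2] = 0.5                      # a near-constant, uninformative feature
Xs, keep = variance_threshold(X)
print(Xs.shape[1], keep.tolist())  # → 3 [True, True, False, True]
```

Discarding near-constant features cheaply before the expensive recursive SVM ranking is what keeps the overall selection tractable at a high classification rate.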
49. An Automatic Digital Audio Authentication/Forensics System
- Author
-
Zulfiqar Ali, Muhammad Imran, and Mansour Alsulaiman
- Subjects
Digital audio authentication ,audio forensics ,forgery ,machine learning algorithm ,human psychoacoustic principles ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
With the continuous rise in ingenious forgery, a wide range of digital audio authentication applications are emerging as preventive and detective controls in real-world circumstances, such as forged evidence, breach of copyright protection, and unauthorized data access. To investigate and verify, this paper presents a novel automatic authentication system that differentiates between forged and original audio. The design philosophy of the proposed system is primarily based on three psychoacoustic principles of hearing, which are implemented to simulate the human sound perception system. Moreover, the proposed system is able to distinguish between audio from different environments recorded with the same microphone. For audio authentication and environment classification, the features computed from the psychoacoustic principles of hearing are fed to a Gaussian mixture model to make automatic decisions. It is worth mentioning that the proposed system authenticates an unknown speaker irrespective of the audio content, i.e., independent of narrator and text. To evaluate the performance of the proposed system, audio recordings in multiple environments were forged in such a way that a human cannot recognize them. A subjective evaluation by three human evaluators was performed to verify the quality of the generated forged audio. The proposed system provides a classification accuracy of 99.2% ± 2.6%. Furthermore, the accuracy obtained for the other scenarios, such as text-dependent and text-independent audio authentication, is 100%.
- Published
- 2017
- Full Text
- View/download PDF
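The likelihood-based decision can be illustrated with a one-component diagonal Gaussian per class, a deliberately simplified stand-in for the paper's Gaussian mixture models; the synthetic features below are made up.

```python
import numpy as np

def fit_gaussian(X):
    """Fit a single diagonal Gaussian per class — a one-component stand-in
    for the paper's Gaussian mixture models."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mu, var):
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)))

rng = np.random.default_rng(0)
orig = rng.normal(0.0, 1.0, (200, 3))     # hypothetical features of original audio
forged = rng.normal(3.0, 1.0, (200, 3))   # hypothetical features of forged audio
models = {"original": fit_gaussian(orig), "forged": fit_gaussian(forged)}

probe = np.array([2.9, 3.1, 3.0])         # unseen forged-like sample
decision = max(models, key=lambda k: log_likelihood(probe, *models[k]))
print(decision)  # → forged
```

The decision rule is the same as with a full GMM: score the probe under each class model and pick the class with the higher likelihood.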
50. On Practical Issues of Electric Network Frequency Based Audio Forensics
- Author
-
Guang Hua, Guoan Bi, and Vrizlynn L. L. Thing
- Subjects
Electric network frequency (ENF) ,audio forensics ,audio timestamp verification ,audio tampering detection ,audio authentication ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The transmission frequency of power grids, i.e., the electric network frequency (ENF), has become a common criterion for authenticating audio recordings during the past decade, drawing much attention from both academic researchers and law enforcement agencies worldwide. The properties of the ENF enable forensic applications such as audio evidence timestamp verification and tampering detection. In this paper, based on a general review of existing works, we discuss several important practical problems and facts that have drawn less research attention or have not been formally studied, including ENF detection problems, limitations of ENF-based tampering detection systems, and difficulties in ENF analysis. During ENF detection, the challenges come not only from noise and interference, but also from the fact that audio recordings without a captured ENF can still contain signal components in the frequency band of interest (false positives). For ENF-based tampering detection systems, the weakness of commonly used assumptions and the limitations of several existing solutions are discussed. In addition, we reveal that in the most intensively studied application, ENF-based audio evidence timestamp verification, many works aiming at improving ENF estimation produce only marginal performance improvements, while the main problems due to noise and interference remain open. All these analyses and discussions are tied together in a proposed big picture of ENF-based audio authentication systems. After that, we also investigate strategies for designing more reliable ENF-based audio authentication systems, which involve a series of research and investigation efforts.
- Published
- 2017
- Full Text
- View/download PDF
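The ENF detection problem raised above can be illustrated with a naive band-energy detector; the thresholds and band widths here are assumptions, and, as the paper warns, such detectors can false-positive when ordinary signal content occupies the same band.

```python
import numpy as np

def enf_present(x, sr, nominal=50.0, halfband=0.5, guard=5.0, ratio_db=10.0):
    """Flag ENF presence when energy in a narrow band around the nominal
    frequency exceeds the surrounding guard band by a margin. All
    thresholds are illustrative assumptions."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    inband = spec[(freqs > nominal - halfband) & (freqs < nominal + halfband)].sum()
    guardband = spec[(freqs > nominal - guard) & (freqs < nominal + guard)].sum() - inband
    return 10 * np.log10(inband / (guardband + 1e-12)) > ratio_db

# Synthetic check: a weak 50 Hz component in noise vs. noise alone.
sr = 400
t = np.arange(sr * 20) / sr
rng = np.random.default_rng(2)
with_enf = 0.1 * np.sin(2 * np.pi * 50.0 * t) + 0.01 * rng.standard_normal(len(t))
without = 0.01 * rng.standard_normal(len(t))
print(enf_present(with_enf, sr), enf_present(without, sr))  # → True False
```

The hard cases the paper highlights are recordings where music or hum-like content places genuine energy near 50/60 Hz without any captured grid signal, defeating energy-ratio heuristics like this one.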