341 results for "audio analysis"
Search Results
2. SecureVision: Advanced Cybersecurity Deepfake Detection with Big Data Analytics.
- Author
-
Kumar, Naresh and Kundu, Ankit
- Subjects
-
MACHINE learning, DATA analytics, DIGITAL technology, BIG data, DEEPFAKES, DEEP learning, DATA privacy
- Abstract
SecureVision is an advanced and trustworthy deepfake detection system created to tackle the growing threat of 'deepfake' movies that tamper with media, undermine public trust, and jeopardize cybersecurity. We present a novel approach that combines big data analytics with state-of-the-art deep learning algorithms to detect altered information in both audio and visual domains. One of SecureVision's primary innovations is the use of multi-modal analysis, which improves detection capabilities by concurrently analyzing many media forms and strengthening resistance against advanced deepfake techniques. The system's efficacy is further enhanced by its capacity to manage large datasets and integrate self-supervised learning, which guarantees its flexibility in the ever-changing field of digital deception. In the end, this study helps to protect digital integrity by providing a proactive, scalable, and efficient defense against the ubiquitous threat of deepfakes, thereby establishing a new benchmark for privacy and security measures in the digital era. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Sonic Strategies: Unveiling the Impact of Sound Features in Short Video Ads on Enterprise Market Entry Performance.
- Author
-
Yang, Qiang, Wang, Yudan, Song, Mingrui, Jiang, Yushi, and Li, Qianwen
- Subjects
-
MACHINE learning, MARKET entry, ACOUSTICS, SOCIAL marketing, ENVIRONMENTAL music, SOCIAL enterprises
- Abstract
Purpose: This study investigates the sound features of short video advertisements (ads) released by enterprises entering the social e-commerce market and tests whether they are potential drivers of enterprise performance. Based on the findings, this study provides practical references for the design of enterprise ads and short-video social e-commerce platforms. Design/methodology/approach: An integrated approach comprising machine learning algorithms and various statistical analysis techniques was employed to quantify sound features. Subsequently, we conducted correlation tests, regression analyses, and robustness tests to further verify our measurements and hypotheses. Findings: Sound features in advertisements directly influence an enterprise's market entry performance. We found that speech rate and voice quality positively affected market entry performance, while loudness and voice pitch negatively impacted performance. In addition, the tempo of the background music positively influences performance. Research implications: Operating within a cohesive research framework, this study extensively examines the effects of key sound features on enterprise performance. It also contributes to broadening the academic literature related to the heuristic-systematic model theory and extends audio analysis methods to the social e-commerce market entry field, providing empirical evidence of the impact of sound features on the market entry performance of enterprises. Practical implications: The findings emphasize that enterprises should comprehend the role of sound features and utilize them appropriately when designing ads to be released upon entering the short video social e-commerce market. Furthermore, by considering the findings and audio analysis technologies, platforms can identify user preferences for sound more precisely and provide enterprises with more guidelines for sound design. Originality: This is the first empirical study to discuss the impact of sound features in short video ads on enterprises' market-entry performance. Additionally, it combines the social e-commerce platform context and expands enterprise market entry to the most popular short video market. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
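The abstract above describes regressing market entry performance on quantified sound features. A minimal OLS sketch of that kind of analysis on purely synthetic data; the feature names follow the abstract, but the dataset and coefficients are illustrative, not the study's:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300  # synthetic "ads"

# Standardized sound features per ad; names follow the abstract,
# the data and true coefficients are made up for illustration.
speech_rate, loudness, pitch, tempo = rng.normal(size=(4, n))
performance = (0.4 * speech_rate - 0.3 * loudness - 0.2 * pitch
               + 0.25 * tempo + rng.normal(scale=0.1, size=n))

# OLS via least squares: intercept plus the four sound features.
X = np.column_stack([np.ones(n), speech_rate, loudness, pitch, tempo])
beta, *_ = np.linalg.lstsq(X, performance, rcond=None)
print(beta[1:])  # estimated effects of the four features
```

With enough observations the estimated signs reproduce the pattern the study reports: positive for speech rate and music tempo, negative for loudness and pitch.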
4. THE REVOLUTION OF DUPLICATED MUSIC: SONIC MARKERS TO IDENTIFY EARLY PHONOGRAPH CYLINDER COPIES IN ARCHIVE COLLECTIONS.
- Author
-
Bårdsen, Thomas
- Subjects
-
MUSICAL analysis, MUSIC history, SOUND recording industry, EARLY music, SIGNAL-to-noise ratio
- Abstract
This article explores the evolution of early commercial music production and the shift from selling original recordings to duplicated copies. It focuses on the introduction of the pantographic duplication technique, which allowed for successful mass production of phonograph cylinders. The pantograph copied cylinders mechanically, using the same blank cylinders as original recordings. This made the two products difficult to distinguish. The music industry kept this process of duplication a secret, selling duplicated copies labelled "original" and "master" quality. Recent research reveals that mechanical duplication through the pantographic method was more extensive than previously acknowledged, with millions of copies produced in a short timeframe. The author commissioned the production of a contemporary pantograph copy of a cylinder recording. Its analysis uncovered characteristic sounds and defects, such as additional mechanical noise, deteriorated signal-to-noise ratio, errors in the time axis, and excess harmonic distortion. These signatures can help differentiate between original recordings and pantographic copies in archive collections. Understanding the implications of early duplication techniques also contributes to a better understanding of the development of the music industry and its recording practices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Spiking Neural Network with Quantized Weights for Audio Stream Classification
- Author
-
Rybka, R. B., Dyakova, E. O., Serenko, A. V., Sboev, A. G., Kryzhanovsky, Boris, editor, Dunin-Barkowski, Witali, editor, Redko, Vladimir, editor, Tiumentsev, Yury, editor, and Yudin, Dmitry, editor
- Published
- 2024
- Full Text
- View/download PDF
6. Detecting Deepfake Voices Using a Novel Method for Authenticity Verification in Voice-Based Communication
- Author
-
Kansara, Aditya, Kumari, Priya, Prathap, Boppuru Rudra, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Kahraman, Cengiz, editor, Cevik Onar, Sezi, editor, Cebi, Selcuk, editor, Oztaysi, Basar, editor, Tolga, A. Cagrı, editor, and Ucal Sari, Irem, editor
- Published
- 2024
- Full Text
- View/download PDF
7. Harmonizing Nature: Bird Species Classification Through Machine Learning-Based Vocal Analysis
- Author
-
Shah, Himanshu, Borole, Tejasvi, Dhagude, Amruta, Sable, Nilesh P., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Senjyu, Tomonobu, editor, So–In, Chakchai, editor, and Joshi, Amit, editor
- Published
- 2024
- Full Text
- View/download PDF
8. Audio-Based Detection of Anxiety and Depression via Vocal Biomarkers
- Author
-
Brueckner, Raymond, Kwon, Namhee, Subramanian, Vinod, Blaylock, Nate, O’Connell, Henry, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2024
- Full Text
- View/download PDF
9. Next Generation Technical Interview Process Automation with Multi-level Interactive Chatbot Based on Intelligent Techniques
- Author
-
Rathnayake, Devin I., Mahendra, Damitha N., Amarasinghe, Bhathiya C., Premaratne, Saminda C., Buhari, Mufitha M., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Iglesias, Andres, editor, Shin, Jungpil, editor, Patel, Bharat, editor, and Joshi, Amit, editor
- Published
- 2024
- Full Text
- View/download PDF
10. Respiratory Condition Detection Using Audio Analysis and Convolutional Neural Networks Optimized by Modified Metaheuristics.
- Author
-
Bacanin, Nebojsa, Jovanovic, Luka, Stoean, Ruxandra, Stoean, Catalin, Zivkovic, Miodrag, Antonijevic, Milos, and Dobrojevic, Milos
- Subjects
-
CONVOLUTIONAL neural networks, LUNGS, METAHEURISTIC algorithms, NETWORK performance, IDENTIFICATION
- Abstract
Respiratory conditions have been a focal point in recent medical studies. Early detection and timely treatment are crucial factors in improving patient outcomes for any medical condition. Traditionally, doctors diagnose respiratory conditions through an investigation process that involves listening to the patient's lungs. This study explores the potential of combining audio analysis with convolutional neural networks to detect respiratory conditions in patients. Given the significant impact of proper hyperparameter selection on network performance, contemporary optimizers are employed to enhance efficiency. Moreover, a modified algorithm is introduced that is tailored to the specific demands of this study. The proposed approach is validated using a real-world medical dataset and has demonstrated promising results. Two experiments are conducted: the first tasked models with respiratory condition detection when observing mel spectrograms of patients' breathing patterns, while the second experiment considered the same data format for multiclass classification. Contemporary optimizers are employed to optimize the architecture selection and training parameters of models in both cases. Under identical test conditions, the best models are optimized by the introduced modified metaheuristic, with an accuracy of 0.93 demonstrated for condition detection, and a slightly reduced accuracy of 0.75 for specific condition identification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
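The abstract above pairs a CNN audio classifier with metaheuristic hyperparameter optimization. A minimal stand-in sketch of the search loop, using plain random search over a hypothetical search space and a toy score function in place of the paper's modified metaheuristic and trained network:

```python
import random

# Hypothetical hyperparameter space for an audio CNN; the names and
# values are illustrative, not taken from the paper.
SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "n_filters": [16, 32, 64],
    "dropout": [0.0, 0.25, 0.5],
}

def sample(rng):
    """Draw one configuration uniformly from the space."""
    return {k: rng.choice(v) for k, v in SPACE.items()}

def random_search(score_fn, n_trials=50, seed=0):
    """Baseline optimizer; the paper swaps in a modified metaheuristic here."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample(rng)
        score = score_fn(cfg)  # in practice: train the CNN, return val. accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_score(cfg):
    """Toy stand-in for the validation accuracy of a trained model."""
    return (1.0 - abs(cfg["learning_rate"] - 1e-3)
            - 0.1 * cfg["dropout"] + cfg["n_filters"] / 1000)

best, best_score = random_search(toy_score)
```

A metaheuristic such as the paper's modified algorithm replaces the uniform sampling with guided moves through the space, but the evaluate-and-keep-best loop is the same.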
11. Mel-Frequency-based Feature Analysis of Audio Signals in the Context of Holy Quran Recitation.
- Author
-
Faizan, Muhammad, Arif, Muhammad Sameer, Chattha, Jawwad Nasar, and Butt, Faran Awais
- Subjects
-
EMOTIONAL state, MACHINE learning, HEALING
- Abstract
Different sounds have various effects on human health, and by introducing the ones that are therapeutic, a healing environment can be created. This paper describes the process to train and test a machine learning algorithm to describe and explore the therapeutic nature of Quranic verse. Using a dataset containing four emotional states namely happy, sad, angry, and relaxed, we trained a model and classified different recitations of the Quran into one of these states. This paper proposes the use of Mel-frequency cepstral coefficients (MFCC) to extract features from Quranic audio and classify it with respect to a known dataset. Based on the experiments conducted on Quranic verses, we summarize our results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
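A minimal sketch of the MFCC extraction step the abstract above relies on (framing, power spectrum, mel filterbank, log compression, DCT-II), written from the textbook definition rather than the paper's implementation; the parameters are common defaults, not the study's:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, centre):
            fb[i - 1, j] = (j - left) / max(centre - left, 1)
        for j in range(centre, right):
            fb[i - 1, j] = (right - j) / max(right - centre, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    # Frame the signal with a Hann window and take the power spectrum.
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Mel filterbank energies, then log compression.
    logmel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log energies; keep the first n_coeffs.
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * k + 1) / (2 * n_filters))
    return logmel @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz
feats = mfcc(sig)  # one 13-coefficient row per frame
```

Each row of `feats` is the per-frame feature vector that would be fed, possibly averaged over a recitation, to the emotion classifier.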
12. Exploring AI Techniques for Generalizable Teaching Practice Identification
- Author
-
Federico Pardo Garcia, Oscar Canovas, and Felix J. Garcia Clemente
- Subjects
Audio analysis, deep learning, machine learning, multi-modal learning analytics, speaker diarization, teaching practices, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Using automated models to analyze classroom discourse is a valuable tool for educators to improve their teaching methods. In this paper, we focus on exploring alternatives to ensure the generalizability of models for identifying teaching practices across diverse teaching contexts. Our proposal utilizes artificial intelligence to analyze audio recordings of classroom activities. By leveraging deep learning for speaker diarization and traditional machine learning algorithms for classifying teaching practices, we extract features from the audio diarization using a processing pipeline to provide detailed insights into teaching dynamics. These features enable the classification of three distinct teaching practices: lectures, group discussions, and the use of audience response systems. Our findings demonstrate that these features effectively capture the nuances of teacher-student interactions, allowing for a refined analysis of teaching styles. To enhance the robustness and generalizability of our model, we explore various pipelines for audio processing, evaluating the model’s performance across diverse contexts involving different teachers and students. By comparing these practices and their associated features, we illustrate how AI-driven tools can support teachers in reflecting on and improving their teaching strategies.
- Published
- 2024
- Full Text
- View/download PDF
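The abstract above derives classroom-interaction features from speaker diarization output. A minimal sketch of that feature step, assuming diarization has already produced labeled (speaker, start, end) segments; the segment data and feature names here are illustrative, not the paper's:

```python
# Hypothetical diarization output: (speaker, start_s, end_s) per segment.
segments = [("teacher", 0.0, 120.0), ("student_1", 120.0, 135.0),
            ("teacher", 135.0, 200.0), ("student_2", 200.0, 230.0)]

def discourse_features(segments):
    """Simple discourse features computed from diarized segments."""
    total = sum(end - start for _, start, end in segments)
    teacher = sum(end - start for who, start, end in segments if who == "teacher")
    changes = sum(1 for i in range(1, len(segments))
                  if segments[i][0] != segments[i - 1][0])
    return {
        "teacher_talk_ratio": teacher / total,  # lecture-heavy sessions score high
        "speaker_changes": changes,             # discussions score high
        "mean_segment_len": total / len(segments),
    }

features = discourse_features(segments)
```

Feature vectors like this, computed per session, could then feed a traditional classifier to separate lectures, group discussions, and audience-response-system use.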
13. Hyperparameter Optimization for Impulsive Sound Classifiers.
- Author
-
Mendes, André, Trigo, Paulo, and Paulo, Joel
- Subjects
SPORTS events, SPORTS competitions, AUTOMATIC identification, SPORTS films, SOUNDS
- Abstract
The detection and identification of impulsive sounds in a specific context, particularly sports events, enables the analysis and synthesis of various metrics and statistics associated with the game or even a player's performance. In this context, the automatic identification of an impulsive sound, such as a ball being hit by a player, is a major contribution to building a data source on which game-specific analysis can be performed. Considering all the characteristics (features) of a particular type of impulsive sound across various conditions/environments involves numerous variables, making it equally challenging to efficiently find the hyperparameter values that yield the best configuration for a given algorithm in a machine learning process. The contribution of this work is to explore the hyperparameter space in search of values that optimize the performance of the entire automatic impulsive sound classification process. This process begins with the generation of the dataset to be processed, continues with the training of classification models, and ends with the evaluation of the learned models. The experiments consider a binary classification problem, where a distinction must be made between the intended event and noise. The validation of the process uses audio extracted from videos of sports competitions, specifically tennis and padel, where the goal is to identify the sounds of racket hits on the ball. This work is currently in progress, but the preliminary results already demonstrate the impact of hyperparameter optimization on the accuracy of the overall learning process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
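The abstract above searches hyperparameter space for an event-vs-noise classifier. One common way to do this is an exhaustive grid search with cross-validation; a sketch on synthetic two-feature data (the feature semantics in the comments are assumptions, not the paper's):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in features: racket "hits" get high peak energy and a
# short decay time; noise does not. Real features would come from audio frames.
hits = rng.normal([5.0, 0.2], 0.5, size=(100, 2))
noise = rng.normal([1.0, 1.0], 0.5, size=(100, 2))
X = np.vstack([hits, noise])
y = np.array([1] * 100 + [0] * 100)  # 1 = impulsive event, 0 = noise

# Exhaustive grid search with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Grid search scales poorly as the number of hyperparameters grows, which is exactly the difficulty the abstract highlights; randomized or metaheuristic search is the usual escape.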
14. An auditory data analysis framework for tourism and hospitality research.
- Author
-
Wang, Ying, Ruan, Qian, Yang, Yang, Qiu, Yiqi, and Zhou, Daohua
- Subjects
TOURISM research, DIGITAL audio, HOSPITALITY, TOURISM
- Abstract
Auditory data has become ubiquitous in tourism and hospitality, especially with digitalization. However, a lack of feasible frameworks for analysing auditory data creates barriers to tourism and hospitality research using auditory data. This study proposed a framework to address the methodological challenges of analysing auditory data in tourism and hospitality. The framework is applied in a digital audio interpretation setting to demonstrate its feasibility and the value of audio features. The proposed framework offers new directions for advancing tourism and hospitality research using unstructured auditory data in various contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Respiratory Diseases Diagnosis Using Audio Analysis and Artificial Intelligence: A Systematic Review.
- Author
-
Kapetanidis, Panagiotis, Kalioras, Fotios, Tsakonas, Constantinos, Tzamalis, Pantelis, Kontogiannis, George, Karamanidou, Theodora, Stavropoulos, Thanos G., and Nikoletseas, Sotiris
- Subjects
-
ARTIFICIAL intelligence, RESPIRATORY diseases, DIAGNOSIS, VOICE analysis, COVID-19 pandemic, AUTOMATIC speech recognition
- Abstract
Respiratory diseases represent a significant global burden, necessitating efficient diagnostic methods for timely intervention. Digital biomarkers based on audio, acoustics, and sound from the upper and lower respiratory system, as well as the voice, have emerged as valuable indicators of respiratory functionality. Recent advancements in machine learning (ML) algorithms offer promising avenues for the identification and diagnosis of respiratory diseases through the analysis and processing of such audio-based biomarkers. An ever-increasing number of studies employ ML techniques to extract meaningful information from audio biomarkers. Beyond disease identification, these studies explore diverse aspects such as the recognition of cough sounds amidst environmental noise, the analysis of respiratory sounds to detect respiratory symptoms like wheezes and crackles, as well as the analysis of the voice/speech for the evaluation of human voice abnormalities. To provide a more in-depth analysis, this review examines 75 relevant audio analysis studies across three distinct areas of concern based on respiratory diseases' symptoms: (a) cough detection, (b) lower respiratory symptoms identification, and (c) diagnostics from the voice and speech. Furthermore, publicly available datasets commonly utilized in this domain are presented. It is observed that research trends are influenced by the pandemic, with a surge in studies on COVID-19 diagnosis, mobile data acquisition, and remote diagnosis systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. SecureVision: Advanced Cybersecurity Deepfake Detection with Big Data Analytics
- Author
-
Naresh Kumar and Ankit Kundu
- Subjects
deepfake detection, cybersecurity, deep learning, multimedia analysis, digital deception, audio analysis, Chemical technology, TP1-1185
- Abstract
SecureVision is an advanced and trustworthy deepfake detection system created to tackle the growing threat of ‘deepfake’ movies that tamper with media, undermine public trust, and jeopardize cybersecurity. We present a novel approach that combines big data analytics with state-of-the-art deep learning algorithms to detect altered information in both audio and visual domains. One of SecureVision’s primary innovations is the use of multi-modal analysis, which improves detection capabilities by concurrently analyzing many media forms and strengthening resistance against advanced deepfake techniques. The system’s efficacy is further enhanced by its capacity to manage large datasets and integrate self-supervised learning, which guarantees its flexibility in the ever-changing field of digital deception. In the end, this study helps to protect digital integrity by providing a proactive, scalable, and efficient defense against the ubiquitous threat of deepfakes, thereby establishing a new benchmark for privacy and security measures in the digital era.
- Published
- 2024
- Full Text
- View/download PDF
17. BD-Transformer: A Transformer-Based Approach for Bipolar Disorder Classification Using Audio
- Author
-
Ramadan, Mohamed, Abdelkawy, Hazem, Mustaqueem, Othmani, Alice, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Hirche, Sandra, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Möller, Sebastian, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Su, Ruidan, editor, Zhang, Yudong, editor, Liu, Han, editor, and F Frangi, Alejandro, editor
- Published
- 2023
- Full Text
- View/download PDF
18. A Novel Intelligent Assessment Based on Audio-Visual Data for Chinese Zither Fingerings
- Author
-
Zhao, Wenting, Wang, Shigang, Zhao, Yan, Wei, Jian, Li, Tianshu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lu, Huchuan, editor, Ouyang, Wanli, editor, Huang, Hui, editor, Lu, Jiwen, editor, Liu, Risheng, editor, Dong, Jing, editor, and Xu, Min, editor
- Published
- 2023
- Full Text
- View/download PDF
19. Evaluating the Depression Level Based on Facial Image Analyzing and Patient Voice
- Author
-
Ramos-Cuadros, Alexander, Santillan, Luis Palomino, Ugarte, Willy, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Maciaszek, Leszek A., editor, Mulvenna, Maurice D., editor, and Ziefle, Martina, editor
- Published
- 2023
- Full Text
- View/download PDF
20. Analysis of Classroom Interaction Using Speaker Diarization and Discourse Features from Audio Recordings
- Author
-
Canovas, Oscar, Garcia, Felix J., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Auer, Michael E., editor, Pachatz, Wolfgang, editor, and Rüütmann, Tiia, editor
- Published
- 2023
- Full Text
- View/download PDF
21. Proposed Experimental Design of a Portable COVID-19 Screening Device Using Cough Audio Samples
- Author
-
Mehta, Kavish Rupesh, Natesan, Punid Ramesh, Jindal, Sumit Kumar, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Saraswat, Mukesh, editor, Chowdhury, Chandreyee, editor, Kumar Mandal, Chintan, editor, and Gandomi, Amir H., editor
- Published
- 2023
- Full Text
- View/download PDF
22. A Quantitative Study of Chineseized Musical Styles in the Piano Composition of the Yellow River Concerto Based on Audio Analysis
- Author
-
Dai Liping
- Subjects
ANN algorithm, PSOLA algorithm, audio analysis, playing speed, 03d78, Mathematics, QA1-939
- Abstract
The piano concerto “Yellow River” is one of the most influential works among Chinese piano concertos. This paper designs an ANN beat classification model based on audio analysis, calculates the beat cycle of the piano piece, and synthesizes its beats using the PSOLA algorithm, presenting an audio analysis of the piano concerto “Yellow River”. It collates the concerto's compositional techniques, layout elements, musical characteristics, and nationalistic performance styles, and verifies the universality of the audio analysis technique by identifying beats from different musical styles. The validity of the ANN-based audio analysis is established using the piano keys and audio selections from the concerto. Classic performance versions of the “Yellow River” piano concerto are selected and the total duration of each version is counted. Combining the designed audio analysis technique for piano compositions, the performance speeds and average speeds of each version are visually organized, and the creative effect of performance speed on the emotion and style of the concerto is analyzed. Across the whole work, the solo piano melody has 187 beats in total. Yin Chengzong, Lang Lang, and Wan Jieni have average speeds of 70.3, 72.4, and 59.3 beats per minute, respectively. The overall tempo designs of Yin Chengzong and Lang Lang show obvious peaks, with Yin Chengzong's playing speed varying more widely; the music has a more fluid feel, and the playing speed is used to demonstrate its immense momentum.
- Published
- 2024
- Full Text
- View/download PDF
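The abstract above computes beat cycles and playing speeds from audio. A rough sketch of tempo estimation by autocorrelation of an onset-strength envelope, demonstrated on a synthetic click track; this is a generic technique, not the paper's ANN/PSOLA pipeline:

```python
import numpy as np

def estimate_tempo(onset_env, frame_rate, bpm_range=(40, 200)):
    """Estimate tempo in BPM from an onset-strength envelope by autocorrelation."""
    env = onset_env - onset_env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    lo = int(frame_rate * 60 / bpm_range[1])  # shortest plausible beat period
    hi = int(frame_rate * 60 / bpm_range[0])  # longest plausible beat period
    lag = lo + int(np.argmax(ac[lo:hi]))      # dominant beat period in frames
    return 60.0 * frame_rate / lag

# Synthetic envelope: one onset pulse per beat at 72 BPM, 100 frames/second.
frame_rate = 100
period = int(round(frame_rate * 60 / 72))
env = np.zeros(3000)
env[::period] = 1.0
tempo = estimate_tempo(env, frame_rate)
```

Dividing a performance into sections and running an estimator like this per section yields the kind of per-version tempo curves the abstract compares.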
23. Is Someone There or Is That the TV? Detecting Social Presence Using Sound.
- Author
-
Georgiou, Nicholas C., Ramnauth, Rebecca, Adeniran, Emmanuel, Lee, Michael, Selin, Lila, and Scassellati, Brian
- Subjects
SOCIAL robots, SOCIETAL reaction, HUMAN-robot interaction, TELEVISION programs, DECISION making, PROBLEM solving, MACHINE learning, CHATBOTS
- Abstract
Social robots in the home will need to solve audio identification problems to better interact with their users. This article focuses on the classification between (a) natural conversation that includes at least one co-located user and (b) media that is playing from electronic sources and does not require a social response, such as television shows. This classification can help social robots detect a user's social presence using sound. Social robots that are able to solve this problem can apply this information to assist them in making decisions, such as determining when and how to appropriately engage human users. We compiled a dataset from a variety of acoustic environments that contained either natural or media audio, including audio that we recorded in our own homes. Using this dataset, we performed an experimental evaluation on a range of traditional machine learning classifiers and assessed the classifiers' abilities to generalize to new recordings, acoustic conditions, and environments. We conclude that a C-Support Vector Classification (SVC) algorithm outperformed other classifiers. Finally, we present a classification pipeline that in-home robots can utilize, and we discuss the timing and size of the trained classifiers as well as privacy and ethics considerations. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
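The article above reports that a C-Support Vector Classification model performed best for separating co-located conversation from media audio. A minimal sklearn sketch of such a classifier on synthetic feature vectors; the three feature names in the comments are hypothetical stand-ins, not the study's actual features:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Hypothetical per-clip features (e.g. spectral flatness, dynamic range,
# bandwidth); the values are synthetic, not from the article's dataset.
natural = rng.normal([0.3, 30.0, 7.8], [0.05, 4.0, 0.4], size=(200, 3))
media = rng.normal([0.5, 18.0, 6.0], [0.05, 4.0, 0.4], size=(200, 3))
X = np.vstack([natural, media])
y = np.array([0] * 200 + [1] * 200)  # 0 = co-located conversation, 1 = media

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = make_pipeline(StandardScaler(), SVC())  # C-SVC, as in the article
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

The article's harder question, generalization to unseen rooms and recordings, would correspond to holding out entire acoustic environments rather than random clips.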
24. Gender and age-evolution detection based on audio forensic analysis using light deep neural network.
- Author
-
AL-Shakarchy, Noor D., Rageb, Huda, and Safoq, Mais Saad
- Abstract
Forensic audio analysis is a foundation stone of many crime investigations. In forensic evidence, the audio file of the human voice is analyzed to extract much information in addition to the content of the speech, such as the speaker's identity, emotions, gender, and origin. Accurately grouping individuals by age-development stage and gender is often an early investigative step used to differentiate them and determine the legal rights and responsibilities associated with them. This work introduces a light CNN model with a new architecture to detect a human being's age-evolution stage (kid or adult) and, for adults, the gender (male or female) based on the individual's voice characteristics, offering a balance between computational efficiency and model accuracy. The temporal information in the audio file is prepared by scaling and normalization. This information is then exploited to extract and track the unique and salient audio features that make up the feature-map pattern for each target class through several convolutional layers followed by max-pooling layers. Finally, the decision is made from these feature maps by fully connected layers. Successful and promising results are achieved, with accuracy and loss reaching 0.99 and 0.017, respectively, on the enriched VoxCeleb2 dataset. The proposed model underscores the importance of leveraging light DNNs for gender and age-evolution detection, offering a robust and ethically sound solution for real-world audio forensics applications such as speaker identification, victim profiling, and deception detection, contributing to the advancement of audio forensic analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. A Multimodal Late Fusion Framework for Physiological Sensor and Audio-Signal-Based Stress Detection: An Experimental Study and Public Dataset.
- Author
-
Xefteris, Vasileios-Rafail, Dominguez, Monica, Grivolla, Jens, Tsanousa, Athina, Zaffanela, Francesco, Monego, Martina, Symeonidis, Spyridon, Diplaris, Sotiris, Wanner, Leo, Vrochidis, Stefanos, and Kompatsiaris, Ioannis
- Subjects
MACHINE learning, FEATURE selection, FEATURE extraction, MULTIMODAL user interfaces, INTRUSION detection systems (Computer security), INTELLIGENT sensors, EMERGENCY management
- Abstract
Stress can be considered a mental/physiological reaction in conditions of high discomfort and challenging situations. The levels of stress can be reflected in both the physiological responses and speech signals of a person. Therefore the study of the fusion of the two modalities is of great interest. For this cause, public datasets are necessary so that the different proposed solutions can be comparable. In this work, a publicly available multimodal dataset for stress detection is introduced, including physiological signals and speech cues data. The physiological signals include electrocardiograph (ECG), respiration (RSP), and inertial measurement unit (IMU) sensors equipped in a smart vest. A data collection protocol was introduced to receive physiological and audio data based on alterations between well-known stressors and relaxation moments. Five subjects participated in the data collection, where both their physiological and audio signals were recorded by utilizing the developed smart vest and audio recording application. In addition, an analysis of the data and a decision-level fusion scheme is proposed. The analysis of physiological signals includes a massive feature extraction along with various fusion and feature selection methods. The audio analysis comprises a state-of-the-art feature extraction fed to a classifier to predict stress levels. Results from the analysis of audio and physiological signals are fused at a decision level for the final stress level detection, utilizing a machine learning algorithm. The whole framework was also tested in a real-life pilot scenario of disaster management, where users were acting as first responders while their stress was monitored in real time. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
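The abstract above fuses audio and physiological predictions at the decision level. A minimal sketch using a weighted average of per-modality stress probabilities; the paper trains a machine learning model on the decisions instead, and these probabilities are synthetic:

```python
import numpy as np

def late_fusion(p_physio, p_audio, w=0.5, threshold=0.5):
    """Decision-level fusion of per-window stress probabilities.

    A weighted average is the simplest scheme; the paper instead trains a
    machine learning model on the two modalities' outputs.
    """
    fused = w * np.asarray(p_physio) + (1 - w) * np.asarray(p_audio)
    return (fused >= threshold).astype(int)  # 1 = stressed, 0 = relaxed

# Synthetic per-window probabilities from each modality's classifier.
p_physio = [0.9, 0.2, 0.6, 0.4]
p_audio = [0.8, 0.1, 0.3, 0.7]
decisions = late_fusion(p_physio, p_audio)
```

Because fusion happens after each modality has produced its own prediction, either stream can drop out (e.g. a sensor fault) and the other still yields a usable decision.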
26. Intelligent invigilator system based on target detection.
- Author
-
Xue, Jing, Wu, Wen, and Cheng, Qingkai
- Abstract
Affected by the COVID-19 epidemic, the final examinations at many universities and the recruitment interviews of enterprises were forced to move to online remote video invigilation, which undoubtedly increases the scope and likelihood of cheating. To address this, this paper proposes an intelligent invigilation system based on the EfficientDet target detection network model combined with a centroid tracking algorithm. Experiments show that the cheating behavior detection model proposed in this paper achieves good detection, tracking, and recognition performance in remote testing scenarios. With the EfficientDet network as the detector, the average detection accuracy is 81%. Experiments with real online test videos show that cheating behavior detection accuracy can reach 83.1%. In addition, to compensate for the limitations of image detection, we also design an audio detection module for auxiliary detection and forensics: it continuously monitors the environmental sound of the examination room, saves suspicious sounds, and provides evidence for judging cheating behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
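Centroid tracking, as used in the entry above, can be sketched as a greedy nearest-centroid assignment between frames; this is an illustrative minimal version, not the paper's implementation:

```python
import numpy as np

class CentroidTracker:
    """Associate detections across frames by nearest previous centroid."""

    def __init__(self):
        self.next_id = 0
        self.objects = {}  # object id -> (x, y) centroid

    def update(self, centroids):
        assigned = {}
        free = dict(self.objects)  # tracks not yet matched this frame
        for c in centroids:
            if free:
                # nearest existing track wins the detection
                oid = min(free, key=lambda i: np.hypot(*np.subtract(free[i], c)))
                del free[oid]
            else:
                # no track left to match: register a new object id
                oid, self.next_id = self.next_id, self.next_id + 1
            assigned[oid] = tuple(c)
        self.objects = assigned
        return assigned

tracker = CentroidTracker()
first = tracker.update([(0, 0), (10, 10)])   # two new ids: 0 and 1
second = tracker.update([(1, 1), (11, 9)])   # ids persist across the frame
```

A production tracker would also handle disappearing objects and cap the matching distance; this sketch keeps only the id-persistence idea.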
27. The sound of surgery-development of an acoustic trocar system enabling laparoscopic sound analysis
- Author
-
Ostler-Mildner, Daniel, Wegener, Luca, Fuchtmann, Jonas, Feussner, Hubertus, Wilhelm, Dirk, and Navab, Nassir
- Published
- 2024
- Full Text
- View/download PDF
28. The Way EU Make Me Feel: Measuring Anxiety in the Brexit Negotiations Using Text and Audio
- Author
-
Kappos, Cybele
- Subjects
Political science ,Communication ,Statistics ,Anxiety ,Audio analysis ,Brexit ,Text analysis - Abstract
Politics make us anxious. People experience anxiety – defined here as a fearful uncertainty about the future course of events – when thinking about political issues such as climate change and the economy, ahead of important elections, and even when hearing the voices of certain political figures. Yet, scholars have devoted little attention to the study of this critical emotion. We know that politics make us feel anxious, but what does the language of political anxiety look and sound like? By understanding the language of anxiety, we may someday be able to understand how political elites use anxiety as a tool to persuade colleagues and constituents alike. Using the Brexit negotiations (2016-2020) as a case study, I develop a methodological tool that measures anxiety and emotional intensity in elite political rhetoric using text and audio data. I develop a dictionary of anxiety that scores speeches based on the semantic similarity to a sample of highly anxious words. With this dictionary, I am able to study how anxiety varies with party affiliation and the topic of the speech. I then examine a large subset of the data in audio format using pitch as a proxy for emotional intensity. This novel approach combines the measure of anxiety in text and emotional intensity in audio, constructing a fuller picture of the speaker’s emotional state. The composite measure reveals how expressions of emotion differ according to the role of the speaker in a meeting and their party affiliation.
- Published
- 2024
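Dictionary scoring by semantic similarity, as described above, can be illustrated with toy word vectors; both the embeddings and the seed words below are hypothetical, standing in for the thesis' actual lexicon and vector space:

```python
import numpy as np

EMB = {  # stand-in 2-D word vectors (hypothetical)
    "fear":      np.array([0.9, 0.1]),
    "crisis":    np.array([0.8, 0.2]),
    "uncertain": np.array([0.7, 0.3]),
    "trade":     np.array([0.1, 0.9]),
    "budget":    np.array([0.2, 0.8]),
}
SEEDS = ["fear", "crisis"]  # hypothetical highly anxious seed words

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def anxiety_score(words):
    """Mean best-seed similarity over a speech's in-vocabulary words."""
    sims = [max(cos(EMB[w], EMB[s]) for s in SEEDS)
            for w in words if w in EMB]
    return sum(sims) / len(sims) if sims else 0.0

# a speech about crisis should score higher than one about budgets
anxious = anxiety_score(["uncertain", "crisis"])
calm = anxiety_score(["trade", "budget"])
```

The real tool would use trained embeddings over the full Hansard-style transcript vocabulary; only the scoring mechanic is shown here.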
29. Respiratory Condition Detection Using Audio Analysis and Convolutional Neural Networks Optimized by Modified Metaheuristics
- Author
-
Nebojsa Bacanin, Luka Jovanovic, Ruxandra Stoean, Catalin Stoean, Miodrag Zivkovic, Milos Antonijevic, and Milos Dobrojevic
- Subjects
respiratory condition ,medical data ,audio analysis ,convolutional neural network ,metaheuristic optimization ,Mathematics ,QA1-939 - Abstract
Respiratory conditions have been a focal point in recent medical studies. Early detection and timely treatment are crucial factors in improving patient outcomes for any medical condition. Traditionally, doctors diagnose respiratory conditions through an investigation process that involves listening to the patient’s lungs. This study explores the potential of combining audio analysis with convolutional neural networks to detect respiratory conditions in patients. Given the significant impact of proper hyperparameter selection on network performance, contemporary optimizers are employed to enhance efficiency. Moreover, a modified algorithm is introduced that is tailored to the specific demands of this study. The proposed approach is validated using a real-world medical dataset and has demonstrated promising results. Two experiments are conducted: the first tasked models with respiratory condition detection when observing mel spectrograms of patients’ breathing patterns, while the second experiment considered the same data format for multiclass classification. Contemporary optimizers are employed to optimize the architecture selection and training parameters of models in both cases. Under identical test conditions, the best models are optimized by the introduced modified metaheuristic, with an accuracy of 0.93 demonstrated for condition detection, and a slightly reduced accuracy of 0.75 for specific condition identification.
- Published
- 2024
- Full Text
- View/download PDF
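The mel-spectrogram input described in the entry above can be computed from scratch; a minimal numpy sketch follows, where the window, hop, and band counts are illustrative choices rather than the paper's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(x, sr=8000, n_fft=256, hop=128, n_mels=16):
    """Power spectrogram mapped onto a triangular mel filterbank."""
    # frame the signal and take the magnitude-squared FFT of each frame
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)

    # triangular filters with peaks evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return power @ fb.T  # (n_frames, n_mels)

# one second of a 1 kHz tone as a smoke-test signal
sr = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)
S = mel_spectrogram(tone, sr=sr)
```

In the study such spectrograms (typically log-scaled) are the image-like inputs to the CNNs whose hyperparameters the metaheuristic tunes.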
30. Dataset of audio signals from brushless DC motors for predictive maintenance
- Author
-
Rommel Stiward Prieto Estacio, Diego Alberto Bravo Montenegro, and Carlos Felipe Rengifo Rodas
- Subjects
Machine learning ,Audio analysis ,BLDC motors ,Predictive maintenance ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Science (General) ,Q1-390 - Abstract
Predictive Maintenance (PdM) plays a central role in the Fourth Industrial Revolution; its goal is to design models that can reliably detect failures in systems before they occur, aiming to reduce financial, environmental, and operational costs. Brushless DC (BLDC) electric motors have become increasingly popular in industrial applications, so their analysis for PdM applications is a natural progression; audio analysis proves to be a useful method for this and arises as a very pragmatic case study of the characteristics of these motors. The main goal of this paper is to showcase the sound-based behavior of BLDC motors in different failure modes, as the result of an experiment led by researchers at Universidad del Cauca in Colombia. This dataset may provide researchers with useful information regarding signal processing and the development of Machine Learning applications that would advance Predictive Maintenance and I4.0.
- Published
- 2023
- Full Text
- View/download PDF
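Typical sound features for this kind of predictive-maintenance analysis can be computed directly with numpy; the feature choices below are illustrative and are not prescribed by the dataset paper:

```python
import numpy as np

def audio_features(x, sr):
    """A few per-recording descriptors commonly used for fault detection."""
    x = np.asarray(x, dtype=float)
    rms = float(np.sqrt(np.mean(x ** 2)))                   # overall energy
    zcr = float(np.mean(np.abs(np.diff(np.sign(x)))) / 2.0)  # zero-crossing rate
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    centroid = float(np.sum(freqs * spec) / np.sum(spec))   # spectral centroid (Hz)
    return {"rms": rms, "zcr": zcr, "centroid_hz": centroid}

# sanity check on a pure 100 Hz tone sampled for one second
sr = 8000
tone = np.sin(2 * np.pi * 100 * np.arange(sr) / sr)
feats = audio_features(tone, sr)
```

Feature vectors like this, computed per recording, are what a downstream classifier would consume to separate healthy from faulty motor conditions.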
31. AI Sound Recognition on Asthma Medication Adherence: Evaluation With the RDA Benchmark Suite
- Author
-
Dimitris Nikos Fakotakis, Stavros Nousias, Gerasimos Arvanitis, Evangelia I. Zacharaki, and Konstantinos Moustakas
- Subjects
Artificial intelligence ,asthma ,audio analysis ,deep learning ,feature extraction ,inhaled medication adherence ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Asthma is a common, usually long-term respiratory disease with negative impact on global society and economy. Treatment involves using medical devices (inhalers) that distribute medication to the airways, and its efficiency depends on the precision of the inhalation technique. There is a clinical need for objective methods to assess the inhalation technique during clinical consultation. Health monitoring systems equipped with sensors and embedded sound-signal detection, analysis, and identification enable the recognition of drug actuation and could be used for effective audio content analysis. This paper revisits sound pattern recognition with machine learning techniques for asthma medication adherence assessment and presents the Respiratory and Drug Actuation (RDA) Suite (https://gitlab.com/vvr/monitoring-medication-adherence/rda-benchmark) for benchmarking and further research. The RDA Suite includes a set of tools for audio processing, feature extraction, and classification procedures and is provided along with a dataset consisting of respiratory and drug actuation sounds. The classification models in RDA are implemented based on conventional and advanced machine learning and deep network architectures. This study provides a comparative evaluation of the implemented approaches, examines potential improvements, and discusses challenges and future tendencies.
- Published
- 2023
- Full Text
- View/download PDF
32. Emotion Recognition of Speech by Audio Analysis using Machine Learning and Deep Learning Techniques
- Author
-
Jain, Ati, Sah, Hare Ram, Kothari, Abhay, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Shukla, Samiksha, editor, Gao, Xiao-Zhi, editor, Kureethara, Joseph Varghese, editor, and Mishra, Durgesh, editor
- Published
- 2022
- Full Text
- View/download PDF
33. Comparative Study on Sentiment Analysis of Human Speech using DNN and CNN
- Author
-
Ghosal, Sayak, Roy, Saumya, Basak, Rituparna, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Mandal, Jyotsna Kumar, editor, Hsiung, Pao-Ann, editor, and Sankar Dhar, Rudra, editor
- Published
- 2022
- Full Text
- View/download PDF
34. Multi-lingual Emotion Classification Using Convolutional Neural Networks
- Author
-
Iliev, Alexander, Mote, Ameya, Manoharan, Arjun, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lirkov, Ivan, editor, and Margenov, Svetozar, editor
- Published
- 2022
- Full Text
- View/download PDF
35. Gaussian process modelling for audio signals
- Author
-
Wilkinson, William J.
- Subjects
621.382 ,Electronic Engineering & Computer Science ,Gaussian Models ,Time-Frequency Analysis ,Audio analysis - Abstract
Audio signals are characterised and perceived based on how their spectral make-up changes with time. Uncovering the behaviour of latent spectral components is at the heart of many real-world applications involving sound, but is a highly ill-posed task given the infinite number of ways any signal can be decomposed. This motivates the use of prior knowledge and a probabilistic modelling paradigm that can characterise uncertainty. This thesis studies the application of Gaussian processes to audio, which offer a principled non-parametric way to specify probability distributions over functions whilst also encoding prior knowledge. Along the way we consider what prior knowledge we have about sound, the way it behaves, and the way it is perceived, and write down these assumptions in the form of probabilistic models. We show how Bayesian time-frequency analysis can be reformulated as a spectral mixture Gaussian process, and utilise modern day inference methods to carry out joint time-frequency analysis and nonnegative matrix factorisation. Our reformulation results in increased modelling flexibility, allowing more sophisticated prior knowledge to be encoded, which improves performance on a missing data synthesis task. We demonstrate the generality of this paradigm by showing how the joint model can additionally be applied to both denoising and source separation tasks without modification. We propose a hybrid statistical-physical model for audio spectrograms based on observations about the way amplitude envelopes decay over time, as well as a nonlinear model based on deep Gaussian processes. We examine the benefits of these methods, all of which are generative in the sense that novel signals can be sampled from the underlying models, allowing us to consider the extent to which they encode the important perceptual characteristics of sound.
- Published
- 2019
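For reference, the spectral mixture kernel that the abstract's reformulation builds on is commonly written in the Wilson-Adams form (the thesis' exact parameterization may differ slightly):

```latex
k(\tau) \;=\; \sum_{q=1}^{Q} w_q \, \exp\!\left(-2\pi^{2}\tau^{2} v_q\right) \cos\!\left(2\pi \tau \mu_q\right)
```

Each component $q$ contributes a Gaussian spectral peak centred at frequency $\mu_q$, with bandwidth governed by $v_q$ and weight $w_q$, which is what lets a Gaussian process with this kernel represent the time-varying spectral make-up of an audio signal.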
36. Source Microphone Identification Using Swin Transformer.
- Author
-
Qamhan, Mustafa, Alotaibi, Yousef A., and Selouani, Sid-Ahmed
- Subjects
MICROPHONES ,DIGITAL audio ,PATTERN recognition systems ,CRIMINAL investigation ,CRIME analysis - Abstract
Microphone identification is a crucial challenge in the field of digital audio forensics. The ability to accurately identify the type of microphone used to record a piece of audio can provide important information for forensic analysis and crime investigations. In recent years, transformer-based deep-learning models have been shown to be effective in many different tasks. This paper proposes a system based on a transformer for microphone identification based on recorded audio. Two types of experiments were conducted: one to identify the model of the microphones and another in which identical microphones were identified within the same model. Furthermore, extensive experiments were performed to study the effects of different input types and sub-band frequencies on system accuracy. The proposed system is evaluated on the Audio Forensic Dataset for Digital Multimedia Forensics (AF-DB). The experimental results demonstrate that our model achieves state-of-the-art accuracy for inter-model and intra-model microphone classification with 5-fold cross-validation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
37. Bone whistle modeling method based on robust scan point tracking
- Author
-
Kuntharrgyal Khysru and Jianguo Wei
- Subjects
bone whistle ,3D print ,3D scan ,audio analysis ,3D image ,Evolution ,QH359-425 ,Ecology ,QH540-549.5 - Abstract
Several excavations at the Hemudu site in Zhejiang Province have led to the discovery of more than 100 pieces of bird bones that were likely used to create a kind of musical instrument. However, in contrast to the Jiahu bone flute, it is not clear for what the Hemudu bone whistle was used, although it was probably used for musical purposes. In this paper, the 7000-year-old Hemudu bone whistle is reproduced with 3D scanning and printing technology. To this end, we propose a robust scan point detection and tracking method that includes three main stages. First, we conduct robust point detection and selection, removing noise and irrelevant points. Then, we track the key points and perform 3D model optimization. With the above efforts, we obtained the 3D model of the bone whistle, and the audio of the bone whistle is replicated. The purpose of the bone whistles is inferred and evaluated through audio analyses. The experimental results show that the method is feasible.
- Published
- 2023
- Full Text
- View/download PDF
38. Respiratory Diseases Diagnosis Using Audio Analysis and Artificial Intelligence: A Systematic Review
- Author
-
Panagiotis Kapetanidis, Fotios Kalioras, Constantinos Tsakonas, Pantelis Tzamalis, George Kontogiannis, Theodora Karamanidou, Thanos G. Stavropoulos, and Sotiris Nikoletseas
- Subjects
respiratory symptoms ,respiratory disease ,audio analysis ,signal processing ,machine learning ,digital biomarkers ,Chemical technology ,TP1-1185 - Abstract
Respiratory diseases represent a significant global burden, necessitating efficient diagnostic methods for timely intervention. Digital biomarkers based on audio, acoustics, and sound from the upper and lower respiratory system, as well as the voice, have emerged as valuable indicators of respiratory functionality. Recent advancements in machine learning (ML) algorithms offer promising avenues for the identification and diagnosis of respiratory diseases through the analysis and processing of such audio-based biomarkers. An ever-increasing number of studies employ ML techniques to extract meaningful information from audio biomarkers. Beyond disease identification, these studies explore diverse aspects such as the recognition of cough sounds amidst environmental noise, the analysis of respiratory sounds to detect respiratory symptoms like wheezes and crackles, as well as the analysis of the voice/speech for the evaluation of human voice abnormalities. To provide a more in-depth analysis, this review examines 75 relevant audio analysis studies across three distinct areas of concern based on respiratory diseases’ symptoms: (a) cough detection, (b) lower respiratory symptoms identification, and (c) diagnostics from the voice and speech. Furthermore, publicly available datasets commonly utilized in this domain are presented. It is observed that research trends are influenced by the pandemic, with a surge in studies on COVID-19 diagnosis, mobile data acquisition, and remote diagnosis systems.
- Published
- 2024
- Full Text
- View/download PDF
39. The Effect of Different Preparatory Conducting Gestures on Breathing Behavior and Voice Quality of Choral Singers.
- Author
-
Platte, Sarah Lisette, Gollhofer, Albert, Gehring, Dominic, Willimann, Joseph, Schuldt-Jensen, Morten, and Lauber, Benedikt
- Abstract
The breathing technique is a determining factor for the singer's sound quality and consequently crucial for the choral sound. However, very little is known about possible influences of the conductor's preparatory gesture on the way choral singers inhale before the beginning of a piece (respectively every subsequent phrase). The conducting literature does not discriminate between out- and inward preparatory gestures and even describes them as equivalent, but previous studies suggest that singers assign different types of inhalation to different preparatory gestures. It may therefore be assumed that the type of preparatory gesture has a direct influence on the singer's inhalation and tone production, and the aim of this study is hence to examine possible effects of two contrasting preparatory gestures on the singer's inhalation type and the resulting tone quality. In our within-subjects study design, 18 healthy choral singers (9 male / 9 female) were recruited to participate in a laboratory experiment. The participants were asked to sing a tone suitable for their voice register in response to different video stimuli. These consisted of two conducting videos, each showing a different preparatory gesture, and two control conditions with an animated bar and an arrow indicating the desired breathing type. The singers reacted to 10 sets of videos, each set consisting of the four stimuli in randomized order. For evaluation of the breathing behavior and vocal output during the different experimental conditions, chest wall kinematics of the upper rib cage, abdominal rib cage, and abdomen were measured via 3D motion capture, and voice samples were recorded. The obtained data were filtered and compared using repeated measures analysis of variance and the post hoc Tukey test for significant results. The level of significance was set at P < 0.05. The results of the study show significant differences in volume of the abdomen between the two different gestures (F(1,17) = 24.04, η² = 0.59, P = 0.0001), which can be validated by the two control measurements (F(1,17) = 21.12, η² = 0.55, P = 0.0002). An outward preparatory gesture evoked an abdominal breathing type, while an inward-upward movement led to an inhalation with a higher portion of clavicular breathing. Furthermore, significant differences in timbre and loudness of the produced tone could be observed. The maximum sound pressure level of the outward preparatory gesture was significantly higher than in case of the inward-upward movement (F(1,17) = 20.4, η² = 0.56, P = 0.0004). In contrast to the existing conducting literature, which does not discriminate between out- and inward preparatory gestures, the results of this study show that the conductor's choice of trajectory direction and form of the preparatory gesture elicits spontaneous, gesture-specific reactions in singers' breathing behavior as well as the corresponding loudness and sound quality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. An investigation into the use of artificial intelligence techniques for the analysis and control of instrumental timbre and timbral combinations
- Author
-
Antoine, Aurélien
- Subjects
781.2 ,Timbre ,Machine Learning ,Instrument Combination ,Psychoacoustics ,Timbral Combination ,Audio Analysis ,Auditory Perception ,Computer Music - Abstract
Researchers have investigated harnessing computers as a tool to aid in the composition of music for over 70 years. In major part, such research has focused on creating algorithms to work with pitches and rhythm, which has resulted in a selection of sophisticated systems. Although the musical possibilities of these systems are vast, they are not directly considering another important characteristic of sound. Timbre can be defined as all the sound attributes, except pitch, loudness and duration, which allow us to distinguish and recognize that two sounds are dissimilar. This feature plays an essential role in combining instruments as it involves mixing instrumental properties to create unique textures conveying specific sonic qualities. Within this thesis, we explore harnessing techniques for the analysis and control of instrumental timbre and timbral combinations. This thesis begins with investigating the link between musical timbre, auditory perception and psychoacoustics for sounds emerging from instrument mixtures. It resulted in choosing to use verbal descriptors of timbral qualities to represent auditory perception of instrument combination sounds. Therefore, this thesis reports on the developments of methods and tools designed to automatically retrieve and identify perceptual qualities of timbre within audio files, using specific musical acoustic features and artificial intelligence algorithms. Different perceptual experiments have been conducted to evaluate the correlation between selected acoustics cues and humans' perception. Results of these evaluations confirmed the potential and suitability of the presented approaches. Finally, these developments have helped to design a perceptually-orientated generative system harnessing aspects of artificial intelligence to combine sampled instrument notes. 
The findings of this exploration demonstrate that an artificial intelligence approach can help to harness the perceptual aspect of instrumental timbre and timbral combinations. This investigation suggests that established methods of measuring timbral qualities, based on a diverse selection of sounds, also work for sounds created by combining instrument notes. The development of tools designed to automatically retrieve and identify perceptual qualities of timbre also helped in designing a comparative scale that goes towards standardising metrics for comparing timbral attributes. Finally, this research demonstrates that perceptual characteristics of timbral qualities, using verbal descriptors as a representation, can be implemented in an intelligent computing system designed to combine sampled instrument notes conveying specific perceptual qualities.
- Published
- 2018
41. Audio Features, Precomputed for Podcast Retrieval and Information Access Experiments
- Author
-
Alexander, Abigail, Mars, Matthijs, Tingey, Josh C., Yu, Haoyue, Backhouse, Chris, Reddy, Sravana, Karlgren, Jussi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Candan, K. Selçuk, editor, Ionescu, Bogdan, editor, Goeuriot, Lorraine, editor, Larsen, Birger, editor, Müller, Henning, editor, Joly, Alexis, editor, Maistro, Maria, editor, Piroi, Florina, editor, Faggioli, Guglielmo, editor, and Ferro, Nicola, editor
- Published
- 2021
- Full Text
- View/download PDF
42. A Large-Scale Benchmark Dataset for Anomaly Detection and Rare Event Classification for Audio Forensics
- Author
-
Ahmed Abbasi, Abdul Rehman Rehman Javed, Amanullah Yasin, Zunera Jalil, Natalia Kryvinska, and Usman Tariq
- Subjects
Audio forensics ,audio analysis ,anomaly detection ,key feature extraction ,feature selection ,machine learning ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
With the emergence of new digital technologies, a significant surge has been seen in the volume of multimedia data generated from various smart devices. Several challenges for data analysis have emerged to extract useful information from multimedia data. One such challenge is the early and accurate detection of anomalies in multimedia data. This study proposes an efficient technique for anomaly detection and classification of rare events in audio data. In this paper, we develop a vast audio dataset containing seven different rare events (anomalies) with 15 different background environmental settings (e.g., beach, restaurant, and train) to focus on both detection of anomalous audio and classification of rare sound events (e.g., baby cry, gunshots, broken glasses, footsteps) for audio forensics. The proposed approach extracts mel-frequency cepstral coefficient (MFCC) features from the audio signals of the newly created dataset and selects the minimum number of best-performing features for optimum performance using principal component analysis (PCA). These features are input to state-of-the-art machine learning algorithms for performance analysis. We also apply machine learning algorithms to a state-of-the-art dataset and achieve good results. Experimental results reveal that the proposed approach effectively detects all anomalies and delivers performance superior to existing approaches in all environments and cases.
- Published
- 2022
- Full Text
- View/download PDF
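One standard way to realize the PCA reduction step described above is a plain SVD projection of the feature matrix; the sketch below uses illustrative sizes (13 MFCC-style features per clip), not the study's actual configuration:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Center X (samples x features) and keep the leading principal components."""
    Xc = X - X.mean(axis=0)
    # right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 13))   # e.g. 13 MFCCs per audio clip
reduced = pca_reduce(feats, 5)
```

The projected columns come out ordered by explained variance, which is what lets the study keep only the minimum number of best-performing components before classification.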
43. Speech Emotion Recognition And Analysis Using Mfcc & Cnn.
- Author
-
Aggarwal, Saurabh and Kumar, Saurav
- Abstract
Recently, research attention on emotional speech signals in human-machine interfaces has grown due to the availability of high computational capability. Selection of suitable feature sets, design of proper classification methods, and preparation of an appropriate dataset are the key issues of speech emotion recognition systems. Detecting emotions is one of the most important marketing strategies in today’s world: different things could be personalized for an individual specifically to suit their interests. Once machines have the capability of understanding the emotions of a person, it will greatly enhance the user experience. In this report, we classify the emotion of a voice (audio clip) using different algorithms of Artificial Intelligence (AI). Based on the accuracy rates, a suitable choice could be made to build such applications in the future. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
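MFCC extraction, which this entry relies on, ends with a discrete cosine transform (DCT-II) over log mel-band energies; a small numpy sketch of that final step follows, with illustrative sizes:

```python
import numpy as np

def mfcc_from_mel(log_mel, n_coeffs=13):
    """DCT-II along the mel axis, keeping the first n_coeffs coefficients."""
    n_mels = log_mel.shape[1]
    n = np.arange(n_mels)
    # basis[c, n] = cos(pi * c * (2n + 1) / (2 * n_mels))
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_mels))
    return log_mel @ basis.T

rng = np.random.default_rng(1)
log_mel = rng.normal(size=(4, 20))   # 4 frames x 20 mel bands (illustrative)
mfccs = mfcc_from_mel(log_mel)
```

The resulting low-order coefficients summarize spectral shape compactly, which is why both the ML and CNN pipelines in such studies commonly start from them. Note that coefficient 0 is simply the sum of the log energies with this unnormalized basis.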
44. Objective Assessment of Loose Gravel Condition using Machine Learning with Audio-visual Observation
- Author
-
Saeed, Nausheen
- Abstract
A well-maintained road network is essential for sustainable economic development, providing vital transportation routes for goods and services while connecting communities. Sweden's public road network includes a significant portion of gravel roads, which are particularly cost-effective for less populated areas with lower traffic volumes. However, gravel roads deteriorate quickly, leading to accidents, environmental pollution, and vehicle tire wear when not adequately maintained. The Swedish Road Administration Authority (Trafikverket) assesses gravel road conditions using subjective methods, analysing images taken during snow-free periods. Due to cost constraints, this labour-intensive process is prone to errors and lacks advanced techniques like road profilometers. This thesis explores the field of assessing gravel road conditions. It commences with a comprehensive review of manual gravel road assessment methods employed globally and existing data-driven smart methods. Subsequently, it harnesses machine hearing and machine vision techniques, primarily focusing on enhancing road condition classification by integrating sound and image data. The research examines sound data collected from gravel roads, exploring machine learning algorithms for loose gravel condition classification with potential road maintenance and monitoring implications. Another crucial aspect involves applying machine vision to categorise image data from gravel roads. The study introduces an innovative approach using publicly available resources like Google Street View for image data collection, demonstrating machine vision's adaptability in assessing road conditions. The research also compares machine learning methods with manual human classification, specifically regarding sound data. Automated approaches consistently outperform manual methods, providing more reliable results. Furthermore, the thesis investigates combining audio and image data to classify road conditions, particularly loose gravel scenarios.
- Published
- 2024
45. Footstep Analysis for the Military Parade
- Author
-
Okugawa, Yohei, Kubo, Masao, Sato, Hiroshi, Sakai, Shun, Lim, Meng-Hiot, Series Editor, Ong, Yew Soon, Series Editor, Sato, Hiroshi, editor, Iwanaga, Saori, editor, and Ishii, Akira, editor
- Published
- 2020
- Full Text
- View/download PDF
46. RideSafe: Detecting Sexual Harassment in Rideshares
- Author
-
Sakhuja, Shikhar, Cohen, Robin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Goutte, Cyril, editor, and Zhu, Xiaodan, editor
- Published
- 2020
- Full Text
- View/download PDF
47. Infant Attachment Prediction Using Vision and Audio Features in Mother-Infant Interaction
- Author
-
Li, Honggai, Cui, Jinshi, Wang, Li, Zha, Hongbin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Palaiahnakote, Shivakumara, editor, Sanniti di Baja, Gabriella, editor, Wang, Liang, editor, and Yan, Wei Qi, editor
- Published
- 2020
- Full Text
- View/download PDF
48. An Analysis of Automated Parkinson’s Diagnosis Using Voice: Methodology and Future Directions
- Author
-
Wroge, Timothy J., Ghomi, Reza Hosseini, Obeid, Iyad, editor, Selesnick, Ivan, editor, and Picone, Joseph, editor
- Published
- 2020
- Full Text
- View/download PDF
49. The Role of Orchestration in Shaping Musical Form
- Author
-
Charles de Paiva Santana and Didier Guigue
- Subjects
musical analysis ,orchestration ,musical texture ,computational musicology ,audio analysis ,Music ,M1-5000 - Abstract
We introduce a method for computer-assisted analysis of orchestration. We also look into the role that texture and orchestration have in structuring musical form. The method comprises a numerical representation, a hierarchy of ‘textural situations’ and measures for heterogeneity, diversity and complexity of orchestral-textural configurations.
- Published
- 2022
- Full Text
- View/download PDF
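A measure like the "diversity" of orchestral-textural configurations mentioned above can be illustrated with a normalized Shannon entropy over configuration labels; this is an assumption for illustration, as the abstract does not reproduce the authors' actual measures:

```python
import math
from collections import Counter

def config_diversity(configs):
    """Normalized Shannon entropy of a sequence of configuration labels:
    0 = one configuration throughout, 1 = all configurations equally frequent."""
    counts = Counter(configs)
    total = len(configs)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    max_h = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return h / max_h

# hypothetical labels for four successive textural situations in a score
score = config_diversity(["tutti", "solo", "tutti", "strings"])
```

Tracking such a measure bar by bar is one way orchestration-driven changes in texture could be related to the articulation of musical form.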
50. Exploring robust computer-aided diagnosis of Parkinson’s disease based on various voice signals
- Author
-
Jiu-Cheng Xie, Yanyan Gan, Ping Liang, Rushi Lan, and Hao Gao
- Subjects
Parkinson’s disease (PD) ,audio analysis ,particle swarm optimization (PSO) ,convolutional neural network (CNN) ,classification ,Physics ,QC1-999 - Abstract
As voice disorder is a typical early symptom of Parkinson’s disease, some researchers attempt to diagnose this disease based on voice data collected from suspected patients. Although existing methods can provide acceptable results, they only work in limited scenarios; in other words, they are not sufficiently generalizable and robust. To this end, we present a Parkinson’s auxiliary diagnosis system based on human speech, which can adaptively build a suitable deep neural network based on sound features. The system includes two modules: hybrid feature extraction and adaptive network construction. We extract several kinds of information from the voice data to form a new compound feature. Furthermore, the particle swarm optimization (PSO) algorithm is employed to build the corresponding 1D convolutional network for feature classification. Extensive experiments on two datasets, consisting of English and Italian speech, are conducted for evaluation purposes. Experimental results show that our method improves the accuracy of voice-based Parkinson’s disease detection to some extent.
- Published
- 2022
- Full Text
- View/download PDF
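The PSO-driven network construction described above can be illustrated with a generic particle swarm optimizer over a small continuous search space; the objective below is a toy stand-in for the validation error of a 1D CNN, since the paper's actual search space and fitness are not given in the abstract:

```python
import numpy as np

def pso(objective, bounds, n_particles=20, iters=50, seed=0):
    """Minimal particle swarm optimizer: minimize objective within bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # inertia + cognitive pull (personal best) + social pull (global best)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, float(pbest_f.min())

# toy "validation error" with a known minimum at (3, 0.01),
# e.g. a layer-count-like and a learning-rate-like hyperparameter
best, err = pso(lambda p: (p[0] - 3) ** 2 + (p[1] - 0.01) ** 2,
                bounds=[(1, 8), (0.0001, 0.1)])
```

In the paper's setting each particle would encode network hyperparameters and the objective would be an actual training-and-validation run, making evaluations far more expensive than this toy function.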