Descriptor: "Audio analyzer" / Topic: computer science - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Audio analyzer"' showing total 548 results

1. Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition

Author: Zhenyu Wang and John H. L. Hansen
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Acoustics and Ultrasonics, Artificial neural network, Computer Science - Artificial Intelligence, Computer science, Speech recognition, Speaker recognition, Audio forensics, Computer Science - Sound, Domain (software engineering), Computational Mathematics, Noise, Artificial Intelligence (cs.AI), Discriminative model, Audio and Speech Processing (eess.AS), Audio analyzer, FOS: Electrical engineering, electronic engineering, information engineering, Computer Science (miscellaneous), Electrical and Electronic Engineering, Adaptation (computer science), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Adapting speaker recognition systems to new environments is a widely used technique for improving a well-performing model learned from large-scale data on task-specific, small-scale data scenarios. However, previous studies focus on single-domain adaptation, which neglects a more practical scenario where training data are collected from multiple acoustic domains, as needed in forensic scenarios. Audio analysis for forensic speaker recognition offers unique challenges in model training with multi-domain training data due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings. It is also difficult to directly employ small-scale domain-specific data to train complex neural network architectures due to domain mismatch and performance loss. Fine-tuning is a commonly used adaptation method that retrains the model with weights initialized from a well-trained model. Alternatively, in this study, three novel adaptation methods based on domain adversarial training, discrepancy minimization, and moment-matching approaches are proposed to further promote adaptation performance across multiple acoustic domains. A comprehensive set of experiments is conducted to demonstrate that: 1) diverse acoustic environments do impact speaker recognition performance, which could advance research in audio forensics, 2) domain adversarial training learns discriminative features that are also invariant to shifts between domains, 3) discrepancy-minimizing adaptation achieves effective performance simultaneously across multiple acoustic domains, and 4) moment-matching adaptation along with dynamic distribution alignment also significantly promotes speaker recognition performance on each domain, especially for the noisy LENA-field domain, compared to all other systems. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Published: 2022
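
As a rough illustration of the moment-matching idea named in the abstract above, the sketch below measures how far apart two sets of feature vectors are in their first and second moments. It is a toy under stated assumptions: the helper name `moment_discrepancy`, the use of per-dimension mean and variance only, and the sample data are all hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance

def moment_discrepancy(source, target):
    """Sum of squared gaps between per-dimension means and variances.

    `source` and `target` are lists of equal-length feature vectors.
    Hypothetical helper for illustration only, not the authors' loss.
    """
    dims = len(source[0])
    total = 0.0
    for d in range(dims):
        s_col = [row[d] for row in source]
        t_col = [row[d] for row in target]
        total += (fmean(s_col) - fmean(t_col)) ** 2          # first moment
        total += (pvariance(s_col) - pvariance(t_col)) ** 2  # second central moment
    return total

# Two made-up "domains": the second is the first shifted by 1 in dimension 0.
clean = [[0.0, 1.0], [2.0, 3.0]]
noisy = [[1.0, 1.0], [3.0, 3.0]]
loss = moment_discrepancy(clean, noisy)
```

In an adaptation setting, a quantity of this kind would typically be added to the training objective so that features from different acoustic domains are pushed toward similar statistics.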

2. Audio-Based Machine Learning Model for Traffic Congestion Detection

Author: Carlos Henrique Quartucci Forster and Rubens Cruz Gatto
Subjects: SIMPLE (military communications protocol), Computer science, business.industry, Mechanical Engineering, Detector, External noise, Machine learning, computer.software_genre, Computer Science Applications, Traffic congestion, Robustness (computer science), Automotive Engineering, Audio analyzer, Global Positioning System, Mel-frequency cepstrum, Artificial intelligence, business, computer
Abstract: The present work approaches intelligent traffic evaluation and congestion detection using sound sensors and machine learning. For this, two important problems are addressed: traffic condition assessment from audio data, and analysis of audio in uncontrolled environments. By modeling the traffic parameters and the sound generation from passing vehicles and using the produced audio as a source of data for learning the traffic audio patterns, we provide a solution that copes with the time, the cost and the constraints inherent to the activity of traffic monitoring. External noise sources were introduced to produce more realistic acoustic scenes and to verify the robustness of the methods presented. Audio-based monitoring becomes a simple and low-cost option compared to other methods based on detector loops or GPS, and is as good as camera-based solutions, without some of the common problems of image-based monitoring, such as occlusions and light conditions. The approach is evaluated with data from audio analysis of traffic recorded in locations around the city of Sao Jose dos Campos, Brazil, and audio files from places around the world, downloaded from YouTube. Its validation shows the feasibility of automatic audio-based traffic monitoring, as well as of using machine learning algorithms to recognize audio patterns in noisy environments.
Published: 2021

3. Medidas en oído real mediante sonda microfónica. Definición y aplicaciones [Real-ear measurements with a probe microphone: definition and applications]

Author: Franz Zenker
Subjects: Hearing aid, Microphone, Computer science, medicine.medical_treatment, Speech recognition, Audio analyzer, medicine, General Medicine, Audiologist, Adaptation (computer science)
Abstract: Real-ear measurements have given the hearing aid professional a reliable and valid criterion for assessing hearing aid fittings. Using these measurements to estimate the quality of a fitting allows us, among other advantages, to take individual differences into account, because they provide parameters that describe the performance of the hearing aid in the real ear. In this article we review the main measurements that can be recorded in the real ear with an audio analyzer and a probe microphone, and their main applications.
Published: 2021

4. Modern possibilities of diagnostic research in the field of forensic video and audio analysis

Author: O. Brendel
Subjects: Field (physics), Multimedia, Computer science, Audio analyzer, computer.software_genre, computer
Abstract: The article highlights issues that arise increasingly often in criminal proceedings, namely the use of data obtained from diagnostic examinations of video and sound recordings, which helps ensure a higher level of completeness, objectivity, and comprehensiveness of the investigation. The purpose of the article is to analyze the possibilities of forensic diagnostic examination of video and sound recordings of both oral speech and the sound environment of an offense. Various modern possibilities for examining video and sound recordings are considered. Attention is paid both to the specifics of diagnostic speech examination aimed at obtaining information about the personality of an unknown speaker and to information that can be obtained by examining and diagnosing non-speech content. The article analyzes the possibilities (within the framework of the examination of video and sound recordings) for diagnostic examination of speech information itself, including determining: the form of oral speech, the nature of relations between the interlocutors, the meaning of the conversation (identifying the meaningful situation: consent vs. disagreement; permission vs. prohibition; understanding vs. misunderstanding; request; advice; promise; assurance; gratitude; threat; clarification; order; question; message, etc.), the rate of speech and the emotional state of the interlocutors, as well as expert diagnostics of the speakers' biological parameters. The possibilities of technical diagnostic examination of recording media and recording equipment, which can contain information about the technology used to obtain, fix, or save video and sound recordings and about the properties and features of the media itself, are also considered. Non-speech sounds are classified according to their sources. The article highlights the effectiveness of using information obtained through forensic examination in the practice of investigating crimes.
Published: 2021

5. Research on a voice changed by distortion

Author: V. Kocharyan, S. Grigoryan, A. Roman, and Kh. Lutsenko
Subjects: Identification (information), Software, business.industry, Computer science, Human–computer interaction, Distortion, Perspective (graphical), Audio analyzer, SIGNAL (programming language), Criminal law, business, Field (computer science)
Abstract: The concept and essence of distortion are considered from a technical standpoint and a criminal law perspective. The most common ways of distorting voice and speech are described, as well as certain methods of detecting an intentional change in voice and speech using computer tools and linguistic analysis. Some software and hardware tools that change a speech signal, both in real time and in a pre-prepared recording, are analyzed. The article studies a pressing issue in forensic video and audio analysis that forensic experts in this field are increasingly asked to address when solving diagnostic and identification tasks.
Published: 2021

6. Audio style conversion using deep learning

Author: Aakash Ezhilan, R. Dheekksha, and S. Shridevi
Subjects: Numerical Analysis, Control and Optimization, Artificial neural network, business.industry, Computer science, General Chemical Engineering, Deep learning, General Engineering, Convolution (computer science), computer.software_genre, Style (sociolinguistics), Identification (information), Audio analyzer, Spectrogram, Artificial intelligence, General Agricultural and Biological Sciences, business, computer, Natural language processing, Generative grammar
Abstract: Style transfer is one of the most popular uses of neural networks. It has been thoroughly researched, for example by extracting the style from famous paintings and applying it to other images, thus creating synthetic paintings. Generative adversarial networks (GANs) are used to achieve this. This paper explores the ways in which similar results can be achieved in audio-related tasks, for which a plethora of new applications can be found. Different techniques for transferring audio style are analyzed and implemented, specifically changing the gender of the speaker's voice. The Crowdsourced high-quality UK and Ireland English Dialect speech data set was used. In this paper, the input is a male or female waveform, and the network synthesizes the opposite gender’s waveform while the spoken content remains the same. Different architectures are explored, from naive techniques that train convolutional neural networks (CNNs) directly on audio waveforms to more extensive algorithms originally researched for image style conversion, in which spectrograms are generated (using GANs) and used to train CNNs. This research has broader scope in converting music from one genre to another, identifying synthetic voices, curating voices for AIs based on preference, etc.
Published: 2021

7. Audio and Music Analysis on the Web using Essentia.js

Author: Pablo Alonso-Jiménez, Jorge Marcos-Fernández, Dmitry Bogdanov, Albin Andrew Correya, Luis Joglar-Ongay, and Xavier Serra
Subjects: music signal processing, software, web audio, business.industry, Interface (Java), Computer science, deep learning, audio analysis, Information technology, T58.5-58.64, JavaScript, computer.software_genre, World Wide Web, Documentation, Software, Audio analyzer, Music information retrieval, M1-5000, music audio classification, Android (operating system), business, Audio signal processing, computer, Music, computer.programming_language
Abstract: Open-source software libraries have a significant impact on the development of Audio Signal Processing and Music Information Retrieval (MIR) systems. Despite the abundance of such tools, there is a lack of an extensive and easy-to-use reference library for audio feature extraction on Web clients. In this article, we present Essentia.js, an open-source JavaScript (JS) library for audio and music analysis on both web clients and JS engines. Along with the Web Audio API, it can be used for both offline and real-time audio feature extraction on web browsers. Essentia.js is modular, lightweight, and easy-to-use, deploy, maintain, and integrate into the existing plethora of JS libraries and web technologies. It is powered by a WebAssembly back end cross-compiled from the Essentia C++ library, which facilitates a JS interface to a wide range of low-level and high-level audio features, including signal processing MIR algorithms as well as pre-trained TensorFlow.js machine learning models. It also provides a higher-level JS API and add-on MIR utility modules along with extensive documentation, usage examples, and tutorials. We benchmark the proposed library on two popular web browsers and the Node.js engine, and four devices, including mobile Android and iOS, comparing it to the native performance of Essentia and the Meyda JS library. The work on Essentia.js has been partially funded by the Ministry of Science and Innovation of the Spanish Government under the grant agreement PID2019-111403GB-I00 (Musical AI).
Published: 2021

8. PROBLEMS OF FORENSIC PHONOSCOPIC EXAMINATION IN THE LIGHT OF THE DEVELOPMENT OF DIGITAL TECHNOLOGIES

Subjects: Software, Complementary and alternative medicine, Multimedia, Computer science, business.industry, Human life, Audio analyzer, Pharmaceutical Science, Pharmacology (medical), Speech synthesis, computer.software_genre, business, computer
Abstract: The article deals with some problems related to conducting forensic speech and audio analysis, taking into account the global digitalization of all spheres of human life. Voice changers and voice synthesis technology significantly complicate forensic phonoscopic examinations and present forensic experts with new and interesting challenges. We consider an algorithm for changing the voice using modern software, as well as features of voice synthesis technologies. Some approaches to studying such phonograms are proposed.
Published: 2020

9. Discrete Fourier transform-based method for analysis of a vibrato tone

Author: Hee Suk Pang, Jun-Seok Lim, and Seokjin Lee
Subjects: Visual Arts and Performing Arts, Computer science, Speech recognition, 05 social sciences, Musical instrument, 06 humanities and the arts, Musical, 050105 experimental psychology, 060404 music, Vibrato, Tone (musical instrument), Discrete Fourier transform (general), Music theory, Audio analyzer, Intonation (music), 0501 psychology and cognitive sciences, 0604 arts, Music
Abstract: Vibrato is one of the most common musical techniques used for the enrichment of vocal and musical instrument sounds. We propose a method that can analyse the intonation, vibrato rate, and vibrato e...
Published: 2020

10. Acoustic signal analysis of instrument–tissue interaction for minimally invasive interventions

Author: Jonas Fuchtmann, Nassir Navab, Nicole Samm, Dirk Wilhelm, Matthias Seibold, Hubertus Feussner, and Daniel Ostler
Subjects: Visual perception, Audio analysis, Computer science, Swine, Feature extraction, Biomedical Engineering, Health Informatics, 030218 nuclear medicine & medical imaging, Feedback, 03 medical and health sciences, 0302 clinical medicine, Robotic Surgical Procedures, Minimally invasive surgery, Audio perception, Animals, Minimally Invasive Surgical Procedures, Radiology, Nuclear Medicine and imaging, Computer vision, ddc:610, Muscle, Skeletal, Spectrogram, Haptic technology, Audio signal, business.industry, Deep learning, General Medicine, Acoustics, Visceral surgery, Computer Graphics and Computer-Aided Design, ddc, Computer Science Applications, Liver, Feature (computer vision), Audio analyzer, Surgery, Audio feedback, Original Article, Computer Vision and Pattern Recognition, Artificial intelligence, Neural Networks, Computer, Haptic perception, business, 030217 neurology & neurosurgery
Abstract: Purpose: Minimally invasive surgery (MIS) has become the standard for many surgical procedures as it minimizes trauma, reduces infection rates and shortens hospitalization. However, the manipulation of objects in the surgical workspace can be difficult due to the unintuitive handling of instruments and limited range of motion. Apart from the advantages of robot-assisted systems such as augmented view or improved dexterity, both robotic and MIS techniques introduce drawbacks such as limited haptic perception and their major reliance on visual perception. Methods: In order to address the above-mentioned limitations, a perception study was conducted to investigate whether the transmission of intra-abdominal acoustic signals can potentially improve the perception during MIS. To investigate whether these acoustic signals can be used as a basis for further automated analysis, a large audio data set capturing the application of electrosurgery on different types of porcine tissue was acquired. A sliding window technique was applied to compute log-mel-spectrograms, which were fed to a pre-trained convolutional neural network for feature extraction. A fully connected layer was trained on the intermediate feature representation to classify instrument–tissue interaction. Results: The perception study revealed that acoustic feedback has potential to improve the perception during MIS and to serve as a basis for further automated analysis. The proposed classification pipeline yielded excellent performance for four types of instrument–tissue interaction (muscle, fascia, liver and fatty tissue) and achieved top-1 accuracies of up to 89.9%. Moreover, our model is able to distinguish electrosurgical operation modes with an overall classification accuracy of 86.40%. Conclusion: Our proof-of-principle indicates great application potential for guidance systems in MIS, such as controlled tissue resection. Supported by a pilot perception study with surgeons, we believe that utilizing audio signals as an additional information channel has great potential to improve the surgical performance and to partly compensate the loss of haptic feedback.
Published: 2020

11. Influence of adaptive thresholding on peaks detection in audio data

Author: Tomasz Maka
Subjects: Audio signal, Computer Networks and Communications, Heuristic (computer science), business.industry, Computer science, Pattern recognition, 02 engineering and technology, Function (mathematics), Thresholding, Signal, 030507 speech-language pathology & audiology, 03 medical and health sciences, Hardware and Architecture, Audio analyzer, Genetic algorithm, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, 020201 artificial intelligence & image processing, Artificial intelligence, 0305 other medical science, business, Software
Abstract: Many audio analysis systems employ a peak-picking procedure to produce the final decision. A typical scheme uses a thresholding function to minimise detection errors, where its form depends on the structure of the input signal. The paper addresses the problem of estimating an adaptive thresholding function. Using a genetic algorithm to optimise the components of the thresholding function, we have determined the importance of individual local statistics for the final function representation. The proposed method has been used to tune the peak detection procedure to identify change points in an audio signal. With the heuristic configuration, the best accuracy of segment boundaries was obtained for a thresholding function built on two local statistics of the detection function and a constant value. Finally, as an example, a comparison with a state-of-the-art scheme for audio segmentation was performed.
Published: 2020
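
The thresholding scheme described in the abstract above (two local statistics of the detection function plus a constant) can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the choice of local mean and local median as the two statistics, the fixed weights, the window size, and the function name `pick_peaks` are hypothetical, and the weights are set by hand rather than tuned by a genetic algorithm as in the paper.

```python
from statistics import fmean, median

def pick_peaks(detection, half_window=2, w_mean=0.5, w_median=0.5, offset=0.1):
    """Return indices of local maxima that exceed an adaptive threshold.

    threshold[i] = offset + w_mean * mean(window) + w_median * median(window)
    where the window spans `half_window` samples on each side of i.
    """
    peaks = []
    n = len(detection)
    for i in range(1, n - 1):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        local = detection[lo:hi]
        threshold = offset + w_mean * fmean(local) + w_median * median(local)
        is_local_max = detection[i] > detection[i - 1] and detection[i] >= detection[i + 1]
        if is_local_max and detection[i] > threshold:
            peaks.append(i)
    return peaks

# Made-up detection function with two clear peaks and one small bump.
curve = [0.0, 0.1, 0.9, 0.1, 0.0, 0.05, 0.0, 0.8, 0.1, 0.0]
peaks = pick_peaks(curve)  # the small bump at index 5 falls below its threshold
```

Because the threshold follows the local level of the detection function, the small bump is rejected while both prominent peaks are kept; an optimiser could search over `w_mean`, `w_median`, and `offset` in place of the hand-picked values.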

12. Audio Example Recognition and Retrieval Based on Geometric Incremental Learning Support Vector Machine System

Author: Linyuan Fan
Subjects: Discrete wavelet transform, General Computer Science, Computer science, 02 engineering and technology, Signal, 03 medical and health sciences, symbols.namesake, 0302 clinical medicine, Wavelet, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, 030223 otorhinolaryngology, wavelet transform, Signal processing, Audio signal, business.industry, General Engineering, Pattern recognition, audio processing, Fourier transform, Frequency domain, Audio analyzer, symbols, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Mel-frequency cepstrum, Artificial intelligence, business, audio feature, lcsh:TK1-9971, Content audio
Abstract: With the fast development of computer and information technology, multimedia data has become the most important form of information media. Auditory information plays an important role in locating information, because useful information can be difficult to find. Audio classification is therefore increasingly important in audio analysis, as it prepares the ground for content-based audio retrieval. There is a considerable body of research on audio classification methods and on audio feature analysis and extraction for audio classification. Many published works extract features of audio signals in the time domain or the Fourier-transform frequency domain. The emergence of wavelet theory provides a time-frequency analysis tool for signal analysis. The wavelet transform is a local transformation of the signal in time and frequency that can effectively extract information from the signal and perform multi-scale refinement analysis on functions or signals through operations such as stretching and translation, instead of the traditional Fourier transform. In the time-frequency analysis of a signal, wavelet analysis captures the local time and frequency characteristics of the signal, which can improve the ability to analyze it. It can also change certain local parts of the signal without affecting the rest of it. In this paper, frequency domain features are combined with wavelet domain features. While the MFCC features are extracted, the discrete wavelet transform is used to extract the wavelet domain features. Statistical features are then extracted for each audio example, and an SVM model is used to classify and identify the different forms of audio.
Published: 2020
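
To make the wavelet-domain step in the abstract above more concrete, here is a minimal sketch of a discrete wavelet transform followed by simple statistical features. The Haar wavelet, the two decomposition levels, the mean and standard deviation as statistics, and the names `haar_dwt` and `wavelet_stats` are assumptions chosen for brevity; the abstract does not say which wavelet or statistics the author used.

```python
import math
from statistics import fmean, pstdev

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_stats(signal, levels=2):
    """Mean and standard deviation of the detail coefficients at each level."""
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        features += [fmean(detail), pstdev(detail)]
    return features

# Made-up frame: a staircase, so level-1 details vanish and level-2 details do not.
frame = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
approx, detail = haar_dwt(frame)
features = wavelet_stats(frame)
```

A feature vector of this kind could be concatenated with MFCC-based features before being passed to a classifier such as an SVM.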

13. Learning CNN features from DE features for EEG-based emotion recognition

Author: Hyeran Byun, Sunhee Hwang, Guiyoung Son, and Kibeom Hong
Subjects: medicine.diagnostic_test, business.industry, Computer science, 020207 software engineering, Pattern recognition, 02 engineering and technology, Electroencephalography, Convolutional neural network, Visualization, Differential entropy, ComputingMethodologies_PATTERNRECOGNITION, Signal classification, Artificial Intelligence, Audio analyzer, 0202 electrical engineering, electronic engineering, information engineering, medicine, Deep neural networks, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Emotion recognition, business
Abstract: Recently, deep neural networks (DNNs) have shown remarkable success in learning feature representations for computer vision, audio analysis, and natural language processing. Furthermore, DNNs have been used for electroencephalography (EEG) signal classification in recent studies on brain–computer interfaces. However, most works use one-dimensional EEG features to train DNNs, which ignores the local information within multiple channels or multiple frequency bands in the EEG signals. In this paper, we propose a novel emotion recognition method using a convolutional neural network (CNN) while preventing the loss of local information. The proposed method consists of two parts. The first part generates topology-preserving differential entropy features while keeping the distance from the center electrode to other electrodes. The second part trains the proposed CNN to estimate three-class emotional states (positive, neutral, negative). We evaluate our work on the SEED dataset, which includes 62-channel EEG signals recorded from 15 subjects. Our experimental results demonstrate that the proposed method achieves superior performance on the SEED dataset, with an average accuracy of 90.41%. A t-SNE visualization of the features extracted by the proposed CNN shows that our representation outperforms other representations based on standard features for EEG analysis. In addition, an experiment on the VIG dataset, which estimates vigilance from EEG, shows the off-the-shelf availability of the proposed method.
Published: 2019
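
For readers unfamiliar with the differential entropy (DE) features mentioned in the abstract above, the sketch below computes DE for a single signal segment. It assumes the segment is approximately Gaussian, in which case DE has the closed form 0.5 * ln(2 * pi * e * variance); that assumption, the function name `differential_entropy`, and the sample values are illustrative and are not details confirmed by the abstract.

```python
import math
from statistics import pvariance

def differential_entropy(samples):
    """DE under a Gaussian assumption: 0.5 * ln(2 * pi * e * variance)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * pvariance(samples))

# Made-up segment with population variance 1.
segment = [0.0, 2.0, 0.0, 2.0]
de = differential_entropy(segment)

# Doubling the amplitude multiplies the variance by 4 and adds ln(2) to the DE.
de_scaled = differential_entropy([2.0 * x for x in segment])
```

In an EEG pipeline, a value like this would usually be computed per channel and per frequency band, which is what makes a spatial (topology-preserving) arrangement of the features possible.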

14. A Deep Audiovisual Approach for Human Confidence Classification

Author: Sushovan Chanda, Kedar Fitwe, Gauri Deshpande, Björn W. Schuller, and Sachin Patel
Subjects: data collection, Computer science, Bi-LSTM, computer.software_genre, Session (web analytics), confidence classification, video analysis, General Environmental Science, Data collection, Recall, business.industry, Deep learning, General Engineering, deep neural network, Multiple applications, audio analysis, QA75.5-76.95, Electronic computers. Computer science, Audio analyzer, ddc:000, General Earth and Planetary Sciences, Artificial intelligence, Communication skills, business, computer, Natural language processing
Abstract: Research on self-efficacy and confidence has spread across several subfields of psychology and neuroscience. One’s confidence plays a crucial role in the formation of attitude and communication skills, and the importance of differentiating levels of confidence is clearly visible in this domain. With recent advances in extracting behavioral insight from signals in multiple applications, detecting confidence has become highly important. One prominent application is detecting confidence in interview conversations. We have collected an audiovisual data set of interview conversations with 34 candidates. Every response from each candidate in this data set is labeled with one of three levels of confidence: high, medium, or low. Furthermore, we have developed algorithms to efficiently compute such behavioral confidence from speech and video. A deep learning architecture is proposed for detecting confidence levels (high, medium, and low) from an audiovisual clip recorded during an interview. The achieved unweighted average recall (UAR) reaches 85.9% on audio data and 73.6% on video data captured from an interview session.
Published: 2021
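
The unweighted average recall (UAR) reported in the abstract above is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. A minimal sketch follows; the function name and the toy labels are made up for illustration and do not come from the paper's data.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, with each class weighted equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == guess:
            hits[truth] += 1
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)

# Imbalanced toy labels: the single "medium" sample is misclassified.
truth = ["high", "high", "high", "high", "medium", "low"]
preds = ["high", "high", "high", "high", "low", "low"]
uar = unweighted_average_recall(truth, preds)
accuracy = sum(t == p for t, p in zip(truth, preds)) / len(truth)
```

In this toy case plain accuracy is 5/6 while UAR is only 2/3, which is why UAR is often preferred when confidence classes are unevenly represented.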

15. Summing Node and False Summing Node Methods: Accurate Operational Amplifier AC Characteristics Testing without Audio Analyzer

Author: Toshiyuki Okamoto, Daisuke Iimori, Takashi Ishida, Akemi Hatta, Jianglin Wei, Shogo Katayama, Haruo Kobayashi, Keno Sato, Kazumi Hatayama, Anna Kuwana, Gaku Ogihara, Takayuki Nakatani, Tamotsu Ichikawa, Minh Tri Tran, and Yujie Zhao
Subjects: business.industry, Computer science, law, Node (networking), Audio analyzer, Operational amplifier, business, Computer hardware, law.invention
Published: 2021

16. A TMR Angle Sensor for Gesture Acquisition and Disambiguation on the Electric Guitar

Author: Andrew P. McPherson and Adan L. Benito Temprano
Subjects: Electric guitar, business.industry, Transcription (music), Computer science, String (computer science), Audio analyzer, Computer vision, Artificial intelligence, Guitar, business, Displacement (vector), Gesture, String bending
Abstract: This paper presents a novel approach to the acquisition of musical gestures on guitar based on Tunneling Magnetoresistance (TMR) sensing. With this minimally invasive setup, tracking of the horizontal displacement of the strings is used to capture gestures related to left and right-hand techniques. A pitch-based calibration is suggested to map the sensed displacement to pitch shifts so that the acquired signals can be directly used to estimate pitch produced by string bending in real-time. Some of the performer’s gestures, despite corresponding to different physical interactions, might produce a similar sonic output, as is the case of upward and downward string bends on the guitar. The proposed technology can be used to disambiguate between these gestures whether that is for automatic transcription purposes or for crafting instrument augmentations that build upon the performer’s existing expertise.
Published: 2021

17. Integrated Cloud-based System for Endangered Language Documentation and Application

Author: Jignasha Borad, Min Chen, James Randall, and Mizuki Miyashita
Subjects: Computer science, business.industry, Cloud computing, Language documentation, computer.software_genre, World Wide Web, Documentation, Endangered language, Transcription (linguistics), Audio analyzer, Web service, business, computer, Host (network)
Abstract: Nearly half of the world's languages are considered endangered and need to be documented, analyzed, and revitalized. However, existing linguistics tools lack the accessibility needed to effectively analyze languages such as Blackfoot, in which relative pitch movement is significant, e.g., words with the same sound sequence convey different meanings when their pitch changes. To address this issue, we present a novel form of audio analysis with a perceptual scale, and develop a consolidated and interactive toolset called MeTILDA (Melodic Transcription in Language Documentation and Analysis) to effectively capture perceived changes in pitch movement and to host other existing desktop-based linguistic tools on the cloud, enabling collaboration, data sharing, and data reuse among multiple linguistic tools.
Published: 2021

18. Adaptive sinusoidal models for speech with applications in speech modifications and audio analysis

Author: George P. Kafentzis
Subjects: Audio signal, Computer science, Speech recognition, Audio analyzer, Speech coding, Acoustic model, Musical instrument, Linear predictive coding, Audio signal processing, computer.software_genre, Speech processing, computer
Abstract: Sinusoidal modeling is one of the most widely used parametric methods for speech and audio signal processing. Accurate estimation of the sinusoidal parameters (amplitudes, frequencies, and phases) is critical for the accurate representation of the signals being analyzed. In this work, building on recent advances in sinusoidal analysis, we propose high-resolution, adaptive sinusoidal models for speech analysis, synthesis, and modification systems. Our goal is to provide systems that represent speech signals in a highly accurate and compact way. Inspired by recently proposed models, such as the adaptive Quasi-Harmonic Model (aQHM) and the adaptive Harmonic Model (aHM), we formulate the theory of adaptive sinusoidal modeling and propose a model called the extended adaptive Quasi-Harmonic Model (eaQHM), a non-parametric model able to adapt the instantaneous amplitudes and phases of its basis functions to the local time-varying characteristics of the speech signal, thus relaxing the well-known assumption of local stationarity. The eaQHM is shown to outperform the aQHM in the analysis and resynthesis of voiced speech segments. Based on the eaQHM, a hybrid speech analysis/synthesis system (eaQHNM) is presented, along with a hybrid version of the aHM (aHNM). In addition, we present the motivation for a full-band representation of the speech signal over its entire duration using the eaQHM, thus representing all parts of the speech signal with high-resolution AM-FM sinusoids. The evaluation shows that adaptivity and quasi-harmonicity are sufficient to produce very high quality in the resynthesis of unvoiced speech segments. Next, the full-band analysis and synthesis system based on the eaQHM is presented, which outperforms state-of-the-art systems, hybrid or full-band, in speech analysis and resynthesis. Its superiority in resynthesis quality was confirmed by objective and subjective evaluations. Regarding applications, the eaQHM and the aHM are applied to speech transformations (time scaling and fundamental frequency scaling). The resulting transformations are of high quality and follow very simple rules compared to other state-of-the-art systems. The notions of relative phase and relative phase delay are crucial for producing a transformed signal that is shape-invariant, free of artifacts, and of high quality. The results show that harmonicity-based systems are preferred over quasi-harmonic ones because of the simplicity of the representation. Furthermore, the eaQHM is applied to the problem of modeling audio signals, specifically musical instrument sounds. The eaQHM is evaluated and compared with state-of-the-art systems and performs well in resynthesis quality, successfully representing the attack, transition, and steady-state stages of a musical instrument sound. Finally, another proposed application lies in the analysis and classification of expressive speech. The eaQHM is applied to the analysis of expressive speech, providing its instantaneous parameters as features that can be used for vector-quantizer-based recognition and classification of expressive speech. Although sinusoidal models are not commonly used in such applications, the results are promising.
Published: 2021
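
As a point of reference for the abstract above, the classical stationary sinusoidal model represents a frame as a sum of sinusoids with fixed amplitudes, frequencies, and phases. The sketch below is only that baseline, not the adaptive eaQHM itself (which lets amplitudes and phases vary within the frame); the function name and the example values are hypothetical.

```python
import math

def synthesize_sinusoids(amplitudes, frequencies_hz, phases, sample_rate, num_samples):
    """Stationary model: s[n] = sum_k A_k * cos(2*pi*f_k*n/fs + phi_k)."""
    signal = []
    for n in range(num_samples):
        t = n / sample_rate  # time of sample n in seconds
        sample = sum(
            a * math.cos(2.0 * math.pi * f * t + p)
            for a, f, p in zip(amplitudes, frequencies_hz, phases)
        )
        signal.append(sample)
    return signal

# Hypothetical example: three harmonics of a 100 Hz fundamental, 10 ms at 8 kHz.
frame = synthesize_sinusoids(
    amplitudes=[1.0, 0.5, 0.25],
    frequencies_hz=[100.0, 200.0, 300.0],
    phases=[0.0, 0.0, 0.0],
    sample_rate=8000,
    num_samples=80,
)
```

Because the parameters are constant over the frame, this baseline relies on the local stationarity assumption that the abstract says the adaptive models relax.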

19. Implementation of a Supervised Learning Model for Raga Identification in Carnatic Music

Author: Snehlata Barde and Veena Kaimal
Subjects: Audio signal, Computer science, business.industry, Supervised learning, Feature extraction, Python (programming language), computer.software_genre, Support vector machine, Identification (information), Audio analyzer, Artificial intelligence, Audio signal processing, business, computer, Natural language processing, computer.programming_language
Abstract: Music is one of the most pious cultures that exist throughout the world. Almost every country has its own rich musical culture, most with strong classical backgrounds. India is one such land, rich in various types of music. This research paper studies the identification of ragas. A ‘raga’ is said to be the melody of a song. We have used Audio Signal Processing (ASP) techniques to extract features of two very similar ragas from Carnatic Music (CM) by analyzing the ‘RagaChhaya’ swaras, commonly known as ‘RagaLakshna’ swaras, of these ragas using Essentia and SMS-tools. Essentia is an open-source library based on C++ with Python and JavaScript wrappers for extensive audio analysis and MIR studies. ‘SMS-tools’ (‘Spectral Modelling and Synthesis’) is an application used to analyze sound and music. We propose a supervised machine learning model, trained and tested with features extracted from the two ragas using SMS-tools, that compares a given input piece of the audio signal against the trained model to identify its raga.
Published: 2021

20. MeTILDA: Platform for Melodic Transcription in Language Documentation and Application

Author: Mitchell B. Lee, Praveena Avula, and Min Chen
Subjects: Melody, Transcription (linguistics), Pitch accent, Endangered language, Computer science, Audio analyzer, Language documentation, Pronunciation, Variety (linguistics), Linguistics
Abstract: Blackfoot is an endangered language that needs to be documented, analyzed, and preserved. Blackfoot is challenging to learn and teach because it is a pitch accent language in which words with the same characters can take on different meanings when the pitch changes. Linguistics researchers are working to create visual aids, called Pitch Art, to teach the nuance in pitch changes. However, the existing techniques used to create Pitch Art fail to accurately indicate changes in pitch and require time-consuming work across multiple applications. To address this issue, this project proposes a system called MeTILDA (Melodic Transcription in Language Documentation and Application) to provide new forms of audio analysis and to automate the process of creating Pitch Art. MeTILDA provides value to a variety of stakeholders. Linguistics researchers are provided with tools to analyze and compare Blackfoot speech. Teachers are given collections of words and recordings from native speakers to teach students. Students are given the ability to compare their own pronunciation of Blackfoot words to that of native speakers. We also present a new form of audio analysis, called perceptual scale, to provide more effective visuals of perceived changes in pitch movement. By collaborating with domain experts in this field, we have validated the effectiveness of MeTILDA in creating Pitch Art using the perceptual scale.
Published: 2021

21. Securing Audio Using AES-based Authenticated Encryption with Python

Author: Jessy Ayala
Subjects: Authenticated encryption, business.industry, Computer science, Data_MISCELLANEOUS, Cryptography, Data_CODINGANDINFORMATIONTHEORY, Python (programming language), computer.software_genre, mathematics_computer_science_other, Audio analyzer, Operating system, Hardware_ARITHMETICANDLOGICSTRUCTURES, business, computer, computer.programming_language
Abstract: The focus of this research is to analyze the results of encrypting audio using various authenticated encryption algorithms implemented in the Python cryptography library for ensuring authenticity and confidentiality of the original contents. The Advanced Encryption Standard (AES) is used as the underlying cryptographic primitive in conjunction with various modes including Galois Counter Mode (GCM), Counter with Cipher Block Chaining Message Authentication Code (CCM), and Cipher Block Chaining (CBC) with Keyed-Hashing for encrypting a relatively small audio file. The resulting encrypted audio shows similarity in the variance when encrypting using AES-GCM and AES-CCM. There is a noticeable reduction in variance of the performed encodings and an increase in the amount of time it takes to encrypt and decrypt the same audio file using AES-CBC with Keyed-Hashing. In addition, the audio encrypted using this mode spans a longer duration. As a result, AES should be used with either GCM or CCM for an efficient and reliable authenticated encryption integration within a workflow.
Published: 2021

22. A Cross-platform Interface for Automatic Speaker Identification and Verification

Author: Thipe Isaiah Modipa, Tshephisho Joseph Sefara, Madimetja Jonas Manamela, and Tumisho Billson Mokgonyane
Subjects: Identification (information), Computer science, Interface (Java), Speech recognition, Multilayer perceptron, Feature extraction, Audio analyzer, Language technology, Classifier (linguistics), Speaker recognition
Abstract: The task of automatically identifying and/or verifying the identity of a speaker from a recording of a speech sample, known as automatic speaker recognition, has been studied for many years. Automatic speaker recognition technologies have improved recently and are becoming inexpensive and reliable methods for identifying and verifying people. Although automatic speaker recognition research has now spanned over 50 years, there has not been adequate research on low-resourced South African indigenous languages. In this paper, a multi-layer perceptron (MLP) classifier model is trained and deployed on a graphical user interface for real-time identification and verification of Sepedi native speakers. Sepedi is a low-resourced language spoken by the majority of residents in the Limpopo province of South Africa. The data used to train the speaker recognition system is obtained from the NCHLT (National Centre for Human Language Technology) project. A total of 34 short-term acoustic features of speech are extracted using the pyAudioAnalysis library, and Sklearn is used to train the MLP classifier model, which performs well with an accuracy of 95%. The GUI is developed with Qt Creator and PyQt4, and it has obtained a true acceptance rate (TAR) of 66.67% and a true rejection rate (TRR) of 13.33%.
Published: 2021
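
The TAR and TRR figures quoted in the abstract above are ratios over verification trials. A minimal sketch of how such rates are commonly computed follows, assuming each trial is recorded as a pair of booleans; the function name and the sample trials are hypothetical and are not the paper's data.

```python
def acceptance_rates(trials):
    """Compute (TAR, TRR) from (is_genuine, accepted) boolean pairs.

    TAR: fraction of genuine-speaker trials that were accepted.
    TRR: fraction of impostor trials that were rejected.
    """
    genuine = [accepted for is_genuine, accepted in trials if is_genuine]
    impostor = [accepted for is_genuine, accepted in trials if not is_genuine]
    tar = sum(genuine) / len(genuine)
    trr = sum(not accepted for accepted in impostor) / len(impostor)
    return tar, trr

# Hypothetical trials: three genuine attempts, two impostor attempts.
trials = [(True, True), (True, True), (True, False), (False, False), (False, True)]
tar, trr = acceptance_rates(trials)
```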

23. Deep Learning with hyper-parameter tuning for COVID-19 Cough Detection

Author: Michael Esposito, Andreas Spanias, Jayaraman J. Thiagarajan, Sunil Rao, and Vivek Sivaraman Narayanaswamy
Subjects: Hyperparameter, Coronavirus disease 2019 (COVID-19), Computer science, business.industry, Deep learning, Feature extraction, Machine learning, computer.software_genre, Cross entropy, Audio analyzer, Spectrogram, Sensitivity (control systems), Artificial intelligence, business, computer
Abstract: As the COVID-19 pandemic continues, rapid non-invasive testing has become essential. Recent studies and benchmarks motivate the use of modern artificial intelligence (AI) tools that utilize audio waveform spectral features of coughing for COVID-19 diagnosis. In this paper, we describe the system we developed for COVID-19 cough detection. We utilize features directly extracted from the coughing audio and use deep learning algorithms to develop automated diagnostic tools for COVID-19. In particular, we develop a unique modification of the VGG13 deep learning architecture for audio analysis that uses log-mel spectrograms and a combination of binary cross entropy and focal losses. This unique modification enabled the model to achieve highly robust classification of the DiCOVA 2021 COVID-19 data. We also explore the use of data augmentation and an ensembling strategy to further improve the performance on the validation and the blind test datasets. Our model achieved an average validation AUROC of 82.23% and a test AUROC of 78.3% at a sensitivity of 80.49%.
Published: 2021
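
The abstract above mentions a combination of binary cross entropy and focal losses without giving the weighting. The sketch below assumes a simple weighted sum; the `mix` weight, the `gamma` value, and the omission of the focal loss class-balancing factor are all assumptions for illustration, not the authors' configuration.

```python
import math

def binary_cross_entropy(p, y, eps=1e-7):
    """BCE for one predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Focal loss without the alpha factor: -(1 - p_t)^gamma * log(p_t)."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def combined_loss(probs, labels, mix=0.5, gamma=2.0):
    """Mean over samples of mix * BCE + (1 - mix) * focal (assumed weighting)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += mix * binary_cross_entropy(p, y) + (1.0 - mix) * focal_loss(p, y, gamma)
    return total / len(probs)
```

With `gamma=0` the focal term reduces to plain BCE; larger values down-weight examples the model already classifies confidently.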

24. A Novel Music Genre Classification Using Convolutional Neural Network

Author: Lakshman Kumar Puppala, P. Selvi Rajendran, Sudarshan Reddy Chinige, and Siva Sankar Reddy Muvva
Subjects: Artificial neural network, Computer science, business.industry, Deep learning, Feature vector, Speech recognition, Feature extraction, Convolutional neural network, Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, Audio analyzer, Artificial intelligence, Mel-frequency cepstrum, business
Abstract: Because of the growing volume of music, both online and offline, genre classification of music is increasingly important in today's environment. This increases the need to catalog music properly and make it more accessible. Music classification is important when searching for music in a large collection. The bulk of modern genre classification techniques use machine learning methods such as KNN and SVM. This article uses the GTZAN music dataset, which has ten different musical genres. A deep learning technique is used for training and classification: a convolutional neural network. The most important step in audio analysis is feature extraction. Mel Frequency Cepstral Coefficients (MFCC) are used as the feature vector for each sound sample. By extracting the feature vector, the proposed framework categorizes music into varied genres. Our study found that our project has an accuracy of about 97% for training and 74% for testing, which will significantly boost and encourage the classification of music genres.
Published: 2021

25. Speech Emotion Recognition System With Librosa

Author: V. Siva Nagaraju, P. Ashok Babu, and Rajeev Ratna Vallabhuni
Subjects: Computer science, business.industry, Speech recognition, Deep learning, SIGNAL (programming language), Python (programming language), Communications system, Visualization, Audio analyzer, Spectrogram, The Internet, Artificial intelligence, business, computer, computer.programming_language
Abstract: In this paper, we propose a system that analyzes speech signals and infers the emotion they carry. The system serves solely to identify emotions present in the signal or speech using concepts of deep learning and algorithms of machine learning (ML). Using these, the system determines which of eight emotions is present in the speech signal: anger, sad, happy, neutral, calm, fearful, disgust, and surprised. The system is built in Python with the librosa and soundfile libraries, which are part of the more extensive scikit library used for specific applications of audio analysis. The system receives the sound files from a dataset available on the internet called RAVDESS. It then analyzes the spectrograms of the audio files, which are in WAV format, and returns the efficiency of the system, which is the intended outcome. We have achieved an efficiency rate of 81.82%.
Published: 2021

26. Learning representations of sound using trainable COPE feature extractors

Author: Nicolai Petkov, Mario Vento, Nicola Strisciuglio, and Intelligent Systems
Subjects: Trainable feature extractors, Audio analysis, Computer science, Feature vector, 02 engineering and technology, 01 natural sciences, Representation learning, CLASSIFICATION, Background noise, Artificial Intelligence, Robustness (computer science), 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, VESSEL DELINEATION, 010306 general physics, Audio signal, business.industry, RECOGNITION, Peaks of energy, Pattern recognition, TIME, MODEL, Signal Processing, Audio analyzer, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Event detection, business, Feature learning, Classifier (UML), Software
Abstract: Sound analysis research has mainly been focused on speech and music processing. The deployed methodologies are not suitable for analysis of sounds with varying background noise, in many cases with very low signal-to-noise ratio (SNR). In this paper, we present a method for the detection of patterns of interest in audio signals. We propose novel trainable feature extractors, which we call COPE (Combination of Peaks of Energy). The structure of a COPE feature extractor is determined using a single prototype sound pattern in an automatic configuration process, which is a type of representation learning. We construct a set of COPE feature extractors, configured on a number of training patterns. Then we take their responses to build feature vectors that we use in combination with a classifier to detect and classify patterns of interest in audio signals. We carried out experiments on four public data sets: MIVIA audio events, MIVIA road events, ESC-10 and TU Dortmund data sets. The results that we achieved (recognition rate equal to 91.71% on the MIVIA audio events, 94% on the MIVIA road events, 81.25% on the ESC-10 and 94.27% on the TU Dortmund) demonstrate the effectiveness of the proposed method and are higher than the ones obtained by other existing approaches. The COPE feature extractors have high robustness to variations of SNR. Real-time performance is achieved even when the value of a large number of features is computed.
Published: 2019

27. Audio Analysis and Classification: A Review

Author: Aas Mohammad and Manish Madhava Tripathi
Subjects: Computer science, Speech recognition, Audio analyzer
Published: 2019

28. A framework for computer-assisted sound design systems supported by modelling affective and perceptual properties of soundscape

Author: Miles Thorogood, Jianyu Fan, and Philippe Pasquier
Subjects: Soundscape, Visual Arts and Performing Arts, Computer science, Metaphor, media_common.quotation_subject, Sound design, 05 social sciences, 06 humanities and the arts, Virtual reality, 050105 experimental psychology, 060404 music, Sound art, Human–computer interaction, Perception, Audio analyzer, 0501 psychology and cognitive sciences, Composition (language), 0604 arts, Music, media_common
Abstract: Autonomously generating artificial soundscapes for video games, virtual reality, and sound art presents several non-trivial challenges. We outline a system called Audio Metaphor that is built upon ...
Published: 2019

29. Research on sound classification based on SVM

Author: Pengcheng Wei, Fangcheng He, Jing Li, and Li Li
Subjects: 0209 industrial biotechnology, Artificial neural network, Computer science, business.industry, 02 engineering and technology, Machine learning, computer.software_genre, Field (computer science), Support vector machine, 020901 industrial engineering & automation, Artificial Intelligence, Audio analyzer, 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Software, Digital audio
Abstract: Sound is a ubiquitous natural phenomenon that contains a wealth of information that constantly enhances our understanding of the objective world. With the continuous development of computer network technology and communication technology, audio information has become a very important part. Audio is a non-semantic symbolic representation and an unstructured binary stream. Because the audio itself lacks the description of content semantics and structured organization, it brings great difficulty to the audio classification work. The research of digital audio classification will become more and more important with the increasing number of digital audio resources in the network. Digital audio classification technology is the key technology to solve this problem. It is the key to solve the problem of audio structure and extract audio structured information and content semantics. It is a research hot spot in the field of audio analysis. It has important application value in many fields, such as audio retrieval, video summary and auxiliary video analysis. This paper studies the structure of audio, the analysis and extraction of audio features, the digital audio classifier based on support vector machines (SVM) and the audio segmentation technology based on BCI. SVM is an important achievement of machine learning research in recent years. As a new machine learning method, SVM can solve practical problems such as small sample, nonlinearity and high dimension, so it has become a new research hot spot after the study of neural network. Experiments show that the SVM-based audio classification algorithm has good classification effect, and the smoothed audio segmentation results are more accurate. With the further development of the research, the research results will be well applied in practice.
Published: 2019

30. On the automatic audio analysis and classification of cry for infant pain assessment

Author: E. Baccaglini, Emilia Parodi, Riccardo Scopigno, D. Ricossa, and E. Di Nardo
Subjects: endocrine system, Linguistics and Language, Scoring system, Computer science, business.industry, Spectral entropy, Pain management, Infant pain, computer.software_genre, Language and Linguistics, Human-Computer Interaction, 030507 speech-language pathology & audiology, 03 medical and health sciences, Distress, Pain assessment, Infant cry analysis · Machine-based infant pain assessment tool · Spectral entropy analysis, Audio analyzer, Observational study, Computer Vision and Pattern Recognition, Artificial intelligence, 0305 other medical science, business, computer, Software, Natural language processing
Abstract: The effectiveness of pain management relies on the choice and the correct use of suitable pain assessment tools. In the case of newborns, some of the most common tools are human-based and observational, thus affected by subjectivity and methodological problems. Therefore, in recent years there has been an increasing interest in developing an automatic machine-based pain assessment tool. This research is a preliminary investigation towards the inclusion of a scoring system for the vocal expression of the infant into an automatic tool. To this aim we present a method to compute three correlated indicators which measure three distress-related features of the cry: duration, dysphonation and fundamental frequency of the first cry. In particular, we propose a new method to measure the dysphonation of the cry via spectral entropy analysis, resulting in an indicator that identifies three well separated levels of distress in the vocal expression. These levels provide a classification that is highly correlated with the human-based assessment of the cry.
Published: 2019
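
Spectral entropy, which the abstract above uses as the basis of its dysphonation indicator, is commonly defined as the Shannon entropy of a frame's normalized power spectrum. The sketch below shows that generic definition with a naive DFT; it is not the authors' indicator, and the function names and the normalization to the range 0 to 1 are assumptions.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum over the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0 or len(power) < 2:
        return 0.0  # silent or degenerate frame: define entropy as zero
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return entropy / math.log2(len(power))

# A pure tone concentrates energy in one bin (low entropy);
# an impulse spreads energy evenly across bins (maximal entropy).
tone = [math.cos(2.0 * math.pi * 4 * i / 64) for i in range(64)]
impulse = [1.0] + [0.0] * 63
```

Under this definition, harmonic (tonal) frames score low and noisy frames score high, which is the property a dysphonation measure would exploit.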

31. Adaptive Mid-Term Representations for Robust Audio Event Classification

Author: Francesc J. Ferri, Irene Martin-Morato, and Maximo Cobos
Subjects: Audio signal, Acoustics and Ultrasonics, Computer science, business.industry, Feature vector, Pattern recognition, 01 natural sciences, 030507 speech-language pathology & audiology, 03 medical and health sciences, Computational Mathematics, Nonlinear system, Framing (construction), Acoustic event detection, 0103 physical sciences, Audio analyzer, Computer Science (miscellaneous), Segmentation, Artificial intelligence, Electrical and Electronic Engineering, 0305 other medical science, business, 010301 acoustics, Temporal information
Abstract: Low-level audio features are commonly used in many audio analysis tasks, such as audio scene classification or acoustic event detection. Due to the variable length of audio signals, it is a common approach to create fixed-length feature vectors consisting of a set of statistics that summarize the temporal variability of such short-term features. To avoid the loss of temporal information, the audio event can be divided into a set of mid-term segments or texture windows. However, such an approach requires accurately estimating the onset and offset times of the audio events in order to obtain a robust mid-term statistical description of their temporal evolution. This paper proposes the use of an alternative event representation based on nonlinear time normalization prior to the extraction of mid-term statistics. The short-term features are transformed into a new fixed-length representation that considers uniform distance subsampling over a defined feature space in contrast to the classical short-term temporal framing. The results show that the use of distance-based texture windows provides an improved statistical description of the event robust to errors in the event segmentation stage under noisy conditions.
Published: 2018
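
The classical temporal framing that the abstract above contrasts with its proposal can be sketched as follows: short-term feature frames are grouped into fixed-length texture windows, and each window is summarized by per-feature statistics. This is only the baseline, not the paper's distance-based nonlinear normalization; the function name, the choice of mean and standard deviation, and the window sizes are assumptions.

```python
import statistics

def mid_term_statistics(short_term_features, window, hop):
    """Summarize each texture window by per-feature mean and standard deviation.

    short_term_features: list of frames, each a list of feature values.
    Returns one vector [means..., stds...] per window.
    """
    summaries = []
    for start in range(0, len(short_term_features) - window + 1, hop):
        segment = short_term_features[start:start + window]
        columns = list(zip(*segment))  # one tuple of values per feature
        means = [statistics.fmean(column) for column in columns]
        stds = [statistics.pstdev(column) for column in columns]
        summaries.append(means + stds)
    return summaries

# Hypothetical input: four frames with two features each.
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0], [7.0, 10.0]]
summary = mid_term_statistics(frames, window=2, hop=2)
```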

32. A Unified Audio Analysis Framework For Movie Genre Classification Using Movie Trailers

Author: Aditya Sharma, Mayank Jindal, Ayush Mittal, and Dinesh Kumar Vishwakarma
Subjects: Artificial neural network, Computer science, business.industry, Supervised learning, k-means clustering, computer.software_genre, Film genre, Robustness (computer science), Audio analyzer, Unsupervised learning, Artificial intelligence, business, Cluster analysis, computer, Natural language processing
Abstract: The audio content of the movie trailers carries various prominent characteristics that can be exploited to predict the genres of the movie. None of the previous approaches have focused on extracting the audio features from the trailers using a clustering-based unsupervised learning approach followed by distance-based supervised learning. Hence, in this paper, we propose a novel framework for movie genre classification using audio features of movie trailers. Movie trailers belonging to the five most generic and popular genres (i.e., Action, Romance, Horror, Science Fiction, and Comedy) are considered in the work. The proposed AFAnet architecture is trained on a total of 78 features including 68 audio features and 10 distance-based features extracted after the k-means clustering on the audio chunks. The cross-dataset validation is performed using the standard LMTD dataset to validate the performance of our proposed framework. The results obtained depict that our model has performed excellently and show the robustness of the proposed approach.
Published: 2021

33. Deep learning based respiratory sound analysis for detection of chronic obstructive pulmonary disease

Author: Sharnil Pandya, Ryan Miranda, Arpan Srivastava, Sonakshi Jain, Shruti Patil, and Ketan Kotecha
Subjects: General Computer Science, Helping hand, Computer science, Data Mining and Machine Learning, Medical-assistive technology, 02 engineering and technology, Disease, Machine learning, computer.software_genre, Convolutional neural network, lcsh:QA75.5-76.95, CNN based classification, 03 medical and health sciences, 0302 clinical medicine, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Medical imaging, medicine, Respiratory sounds, medicine.diagnostic_test, business.industry, Deep learning, Respiratory sound analysis, Natural Language and Speech, Audio analyzer, 020201 artificial intelligence & image processing, Mel-frequency cepstrum, Artificial intelligence, lcsh:Electronic computers. Computer science, business, computer, 030217 neurology & neurosurgery
Abstract: In recent times, technologies such as machine learning and deep learning have played a vital role in providing assistive solutions to the medical domain’s challenges. They also improve predictive accuracy for early and timely disease detection using medical imaging and audio analysis. Due to the scarcity of trained human resources, medical practitioners are welcoming such technology assistance as it provides a helping hand to them in coping with more patients. Apart from critical health diseases such as cancer and diabetes, the impact of respiratory diseases is also gradually on the rise and is becoming life-threatening for society. Early diagnosis and immediate treatment are crucial in respiratory diseases, and hence the audio of the respiratory sounds is proving very beneficial along with chest X-rays. The presented research work aims to apply Convolutional Neural Network based deep learning methodologies to assist medical experts by providing a detailed and rigorous analysis of the medical respiratory audio data for Chronic Obstructive Pulmonary Disease detection. In the conducted experiments, we have used features from the Librosa library such as MFCC, Mel-Spectrogram, Chroma, Chroma (Constant-Q) and Chroma CENS. The presented system could also interpret the severity of the disease identified, such as mild, moderate, or acute. The investigation results validate the success of the proposed deep learning approach. The system classification accuracy has been enhanced to an ICBHI score of 93%. Furthermore, in the conducted experiments, we have applied K-fold Cross-Validation with ten splits to optimize the performance of the presented deep learning approach.
Published: 2021
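
The K-fold cross-validation with ten splits mentioned in the abstract above partitions the data so that each sample is used for validation exactly once. A minimal index-splitting sketch follows; it uses contiguous, unshuffled folds, which is an assumption, and the function name is hypothetical.

```python
def k_fold_indices(num_samples, k=10):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, num_samples))
        yield train, validation
        start += size

# Hypothetical usage: 25 recordings split into ten folds.
folds = list(k_fold_indices(25, k=10))
```

In practice the sample order is usually shuffled (and often stratified by class) before splitting.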

34. Intelligent audio analysis techniques for identification of music in smart devices

Author: Pragun Mangla, Shefali Arora, and Mohinder Pal Singh Bhatia
Subjects: Identification (information), Spoofing attack, Human–computer interaction, Computer science, business.industry, Deep learning, Audio analyzer, Artificial intelligence, business
Published: 2021

35. Speech Emotion Recognition using Time Distributed CNN and LSTM

Author: Smita Bharne, Omkar Narvade, Beenaa Salian, and Rujuta Tambewagh
Subjects: Hybrid neural network, Noise, Long short term memory, Artificial neural network, Computer science, Speech recognition, Audio analyzer, Spectrogram, Emotion recognition, Information technology, Layer (object-oriented design), T58.5-58.64
Abstract: Speech has several distinguishing characteristic features which have remained a state-of-the-art tool for extracting valuable information from audio samples. Our aim is to develop an emotion recognition system using these speech features, which would be able to accurately and efficiently recognize emotions through audio analysis. In this article, we have employed a hybrid neural network comprising four blocks of time distributed convolutional layers followed by a layer of Long Short Term Memory to achieve the same. The audio samples for the speech dataset are collectively assembled from RAVDESS, TESS and SAVEE audio datasets and are further augmented by injecting noise. Mel Spectrograms are computed from audio samples and are used to train the neural network. We have been able to achieve a testing accuracy of about 89.26%.
Published: 2021

36. The WASABI Dataset: Cultural, Lyrics and Audio Analysis Metadata About 2 Million Popular Commercially Released Songs

Author: Michel Buffa, Franck Michel, Michael Fell, Alain Giboin, Maroua Tikat, Marco Winckler, Guillaume Pellerin, Johan Pauwels, Romain Hennequin, Elena Cabrio, Fabien Gandon, Web-Instrumented Man-Machine Interactions, Communities and Semantics (WIMMICS), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria), Scalable and Pervasive softwARe and Knowledge Systems (Laboratoire I3S - SPARKS), Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), Université Nice Sophia Antipolis (1965 - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA), Centre National de la Recherche Scientifique (CNRS), Université Côte d'Azur (UCA), Deezer Research, Queen Mary University of London (QMUL), and Institut de Recherche et Coordination Acoustique/Musique (IRCAM)
Subjects: [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], Audio signal, Linked data, Computer science, 05 social sciences, Named entites, 02 engineering and technology, computer.file_format, Lyrics, World Wide Web, Metadata, Popular music, Music metadata, Lyrics analysis, Audio analyzer, 0202 electrical engineering, electronic engineering, information engineering, SPARQL, 020201 artificial intelligence & image processing, 0509 other social sciences, 050904 information & library sciences, Semantic Web, computer
Abstract: Since 2017, the goal of the two-million song WASABI database has been to build a knowledge graph linking collected metadata (artists, discography, producers, dates, etc.) with metadata generated by the analysis of both the songs’ lyrics (topics, places, emotions, structure, etc.) and audio signal (chords, sound, etc.). It relies on natural language processing and machine learning methods for extraction, and semantic Web frameworks for representation and integration. It describes more than 2 million commercial songs, 200K albums and 77K artists. It can be exploited by music search engines, music professionals (e.g. journalists, radio presenters, music teachers) or scientists willing to analyze popular music published since 1950. It is available under an open license, in multiple formats and with online and open source services including an interactive navigator, a REST API and a SPARQL endpoint.
Published: 2021

37. Assessing the QoME of NMP via Audio Analysis Tools

Author: George Xylomenos and Konstantinos Tsioutas
Subjects: Multimedia, Computer science, Audio analyzer, computer.software_genre, computer
Published: 2021

38. Mastering audio analysis

Author: John Paul Braddock
Subjects: Multimedia, Computer science, Audio analyzer, computer.software_genre, computer
Published: 2020

39. Speaker Recognition for Digital Forensic Audio Analysis using Support Vector Machine

Author: Burhanuddin Dirgantoro, Casi Setianingsih, and Rinda Mardhotillah
Subjects: Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, Dimension (vector space), Computer science, Speech recognition, Data classification, Pattern recognition (psychology), Audio analyzer, Mel-frequency cepstrum, Speaker recognition, Sentence
Abstract: Speaker recognition is a form of pattern recognition, where one of the most critical parts is the process of data classification. In classification, the system must assign data to the class closest to the training set. Speaker recognition here aims to identify evidence from speech recorded by a handheld telephone, which involves comparing one or more unidentified sound samples with one or more known sound samples. In this research, the data consist of evidence recordings of telephone conversations and comparison recordings. The Support Vector Machine (SVM) classification method is used to recognize the speaker. Using the SVM method, the accuracy of speaker recognition is excellent. In testing, the SVM method achieved an accuracy of 86.67% for tests with the same sentence and up to 67% for different sentences, with C = 0.01 and $\gamma$ (gamma) = 0.0001.
Published: 2020

40. HOW TO SPOT COVID-19 PATIENTS: SPEECH & SOUND AUDIO ANALYSIS FOR PRELIMINARY DIAGNOSIS OF SARS-COV-2 CORONA PATIENTS

Author: Dinesh Kumar Sharma, Ashish Baldi, and Amit Sharma
Subjects: Therapy Area: Other, Coronavirus disease 2019 (COVID-19), Computer science, Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Speech recognition, India, Sample (statistics), How To…, 030204 cardiovascular system & hematology, Loudness, 03 medical and health sciences, COVID-19 Testing, 0302 clinical medicine, Artificial Intelligence, Phone, Humans, Speech, Medicine, 030212 general & internal medicine, SARS-CoV-2, business.industry, Respiration, COVID-19, Workload, General Medicine, Sound, Hotspot (Wi-Fi), Audio analyzer, Voice, The Internet, business, Timbre
Abstract: Background: Global cases of Covid-19 are increasing day by day. On Nov. 25, 2020, a total of 59,850,910 cases had been reported globally, with 1,411,216 deaths. In India, total cases stood at 91,77,841, including 86,04,955 recoveries and 4,38,667 active cases, as of Nov. 24, 2020, as per data issued by ICMR. A new generation of voice/audio analysis applications could tell whether a person is suffering from COVID-19 or not. Aims: To describe how to establish a new generation of voice/audio analysis applications to identify suspected hidden covid-19 cases in hotspot areas with the help of audio samples from the general public. Materials & Methods: The different patents and data available as literature on the internet are evaluated to design a new generation of voice/audio analysis application that uses audio samples from the general public. Results: Audio samples will be collected from patients who have already suffered from covid-19 (.Wave files), in person or through phone calls. Audio samples such as the sound of the cough, the pattern of breathing, respiration rate, and way of speech will be recorded. The parameters will be evaluated for loudness, articulation, tempo, rhythm, melody, and timbre. The analysis and interpretation of the parameters can be made through machine learning and artificial intelligence to detect corona cases from an audio sample. Discussion: The current voice/audio application project can be merged with a mobile app called “Aarogya Setu” by the Govt. of India. The project can be implemented in high-risk Covid-19 areas in the country. Conclusion: This new method of detecting cases will decrease the workload in covid-19 laboratories.
Published: 2020

41. Towards Real-Time Illegal Logging Monitoring: Gas-Powered Chainsaw Logging Detection System using K-Nearest Neighbors

Author: Pauline C. Calica, Dylan Josh D. Lopez, Bernadette Andree D. R. Celestino, Yolanda D. Austria, John Daniel C. Arevalo, and Katami A. Dimapunong
Subjects: ALARM, Computer science, business.industry, Microcomputer, Audio analyzer, Real-time computing, Logging, Process (computing), Illegal logging, Modular design, business, Graphical user interface
Abstract: Deforestation is exponentially depleting the planet's biodiversity and natural ecosystems at an alarming rate. This research aims to address illegal logging through real-time alerting and monitoring of suspected gas-fueled chainsaw sounds in the forest. Features were extracted from a collated nature sound dataset and used to train a supervised machine learning algorithm. The model is deployed on a microcomputer to process the chainsaw sounds through radio frequency transmission. The system has a desktop application that triggers an alarm and visualizes relevant information about the location of the detected illegal logging activity. The device prototype is easily replaceable, modular, and portable, and can be reconfigured for large-scale domains such as rainforests. The main contributions of this research are the improvement of alerting and monitoring of illegal logging through (1) real-time and online audio analysis and detection of gas-powered chainsaw sounds using k-nearest neighbors; (2) a deployable prototype capable of listening to chainsaw sounds in the forest while buried; and (3) development of a graphical user interface for monitoring module feedback and responses. The experimental results show that our system has an accuracy of 96.00% and an F1-score of 94.34%.
Published: 2020

42. Unconventional Mechanisms for Biometric Data Acquisition via Side-Channels

Author: Max Smith-Creasey and Jonathan Francis Roscoe
Subjects: Spoofing attack, 020205 medical informatics, Biometrics, Computer science, Smart device, 020207 software engineering, Eavesdropping, 02 engineering and technology, Computer security, computer.software_genre, law.invention, law, Audio analyzer, 0202 electrical engineering, electronic engineering, information engineering, Side channel attack, computer, Hacker, Haptic technology
Abstract: In this paper, we discuss the proliferation of household smart devices and review the literature to explore whether the implementation characteristics of such systems may provide avenues of attack to obtain private biometric data. Examples include the use of mechanical hard drives as audio microphones and interception of soft-keyboard input through audio analysis of haptic feedback. As the use of biometric data increases in casual environments, the opportunity for it to be stolen in unexpected ways is also increasing. There are many examples of the technology being utilised by hackers to enable unexpected use such as spoofing. We examine the importance and sanctity of biometric data in the modern world and posit that manufacturers must avoid complacency and advocate secure design, to ensure security and privacy of users.
Published: 2020

43. Ar-DAD: Arabic diversified audio dataset

Author: Mohammed Lataifeh and Ashraf Elnagar
Subjects: Quran recitations, Arabic, Computer science, lcsh:Computer applications to medicine. Medical informatics, 03 medical and health sciences, 0302 clinical medicine, Machine learning, CLIPS, lcsh:Science (General), 030304 developmental biology, computer.programming_language, Data Article, Structure (mathematical logic), Imitators, 0303 health sciences, Multidisciplinary, Information retrieval, business.industry, Deep learning, Arabic audio clips, Cantillations, Speaker recognition, Popularity, language.human_language, Identification (information), Audio analyzer, language, lcsh:R858-859.7, Artificial intelligence, Speaker identification, business, computer, 030217 neurology & neurosurgery, lcsh:Q1-390
Abstract: The automatic identification and verification of speakers through representative audio continue to gain the attention of many researchers with diverse domains of applications. Despite this diversity, the availability of classified and categorized multi-purpose Arabic audio libraries is scarce. Therefore, we introduce a large Arabic-based audio clips dataset (15810 clips) of 30 popular reciters cantillating 37 chapters from the Holy Quran. These chapters have a variable number of verses saved to different subsequent folders, where each verse is allocated one folder containing 30 audio clips for the declared reciters covering the same textual content. An additional 397 audio clips for 12 competent imitators of the top reciters are collected based on popularity and number of views/downloads to allow for cross-comparison of text, reciters, and authenticity. Based on the volume, quality, and rich diversity of this dataset we anticipate a wide range of deployments for speaker identification, in addition to setting a new direction for the structure and organization of similar large audio clips dataset.
Published: 2020

44. Measurements of microphone array phase and amplitude behavior towards controllable beamforming

Author: Bonnie L. Gray, Yiqi Jia, and Rodney G. Vaughan
Subjects: Beamforming, Frequency response, Microphone array, Amplitude, Computer science, Phased array, Microphone, Acoustics, Audio analyzer, Phase (waves), Omnidirectional antenna, Linear phase
Abstract: Beamforming of miniature and MEMS microphone arrays often relies on only the array factor, with the microphone elements assumed to be ideal, i.e., identical, with linear phase and constant omnidirectional magnitude response over all frequencies. However, real-world element responses are not ideal and there is a shortfall of measurements of array-embedded microphone directional responses. This is a significant gap for estimating a real-world acoustic microphone array response. This paper presents a procedure for physically evaluating the phase and amplitude of miniature microphones using a two-port audio analyzer. While the phase is often the essence of array beamforming (hence the term phased array) the amplitudes are also weighted in general beamforming. Directional responses, or patterns, are perhaps the most difficult sensor parameter to measure, requiring specialized equipment; currently there are no accurate commercial systems readily available for 3D patterns. Our method estimates embedded microphone directional response samples, and shows that while the relative phase can behave similarly to that of isolated, electrically small sensors, the amplitudes differ significantly, which compromises accurate beamforming. The phase differences between a pair of embedded microphones can be modelled using basic geometric spacing. Comparisons between the measured and modelled signals of the microphone elements indicate that the individual embedded element responses should be included for accurate estimation of the array response, rather than just using the array factor.
Published: 2020

45. Audio-Based Event Detection at Different SNR Settings Using Two-Dimensional Spectrogram Magnitude Representations

Author: Ioannis Papadimitriou, Antonios Lalas, Dimitrios Tzovaras, Konstantinos Votis, and Anastasios Vafeiadis
Subjects: Computer Networks and Communications, Microphone, Computer science, Speech recognition, Concatenation, Ambient noise level, audio surveillance, lcsh:TK7800-8360, 02 engineering and technology, SNR, symbols.namesake, Raw audio format, multichannel, 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, spectrograms, Electrical and Electronic Engineering, 050210 logistics & transportation, lcsh:Electronics, 05 social sciences, Short-time Fourier transform, Fourier transform, Hardware and Architecture, Control and Systems Engineering, Feature (computer vision), Signal Processing, Audio analyzer, symbols, Spectrogram, 020201 artificial intelligence & image processing, Mel-frequency cepstrum, CNN
Abstract: Audio-based event detection poses a number of different challenges that are not encountered in other fields, such as image detection. Challenges such as ambient noise, low Signal-to-Noise Ratio (SNR) and microphone distance are not yet fully understood. If multimodal approaches are to become better in a range of fields of interest, audio analysis will have to play an integral part. Event recognition in autonomous vehicles (AVs) is one such field at a nascent stage that can rely solely on audio or use it as part of a multimodal approach. In this manuscript, an extensive analysis focused on the comparison of different magnitude representations of the raw audio is presented. The data on which the analysis is carried out is part of the publicly available MIVIA Audio Events dataset. Single channel Short-Time Fourier Transform (STFT), mel-scale and Mel-Frequency Cepstral Coefficients (MFCCs) spectrogram representations are used. Furthermore, aggregation methods for these spectrogram representations are examined, comparing feature concatenation with the stacking of features as separate channels. The effect of the SNR on recognition accuracy and the generalization of the proposed methods on datasets that were both seen and not seen during training are studied and reported.
Published: 2020

46. Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data

Author: Andreas Grammenos, Jing Han, Cecilia Mascolo, Dimitris Spathis, Tong Xia, Jagmohan Chauhan, Chloë Brown, Apinan Hasthanasombat, and Pietro Cicuta
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), medicine.medical_specialty, Stethoscope, Computer science, 02 engineering and technology, Audiology, Computer Science - Sound, Machine Learning (cs.LG), law.invention, Respiratory examination, Audio and Speech Processing (eess.AS), law, 020204 information systems, FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, medicine, Respiratory sounds, Respiratory system, Asthma, Audio signal, medicine.diagnostic_test, Auscultation, medicine.disease, 3. Good health, Audio analyzer, Breathing, 020201 artificial intelligence & image processing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Audio signals generated by the human body (e.g., sighs, breathing, heart, digestion, vibration sounds) have routinely been used by clinicians as indicators to diagnose disease or assess disease progression. Until recently, such signals were usually collected through manual auscultation at scheduled visits. Research has now started to use digital technology to gather bodily sounds (e.g., from digital stethoscopes) for cardiovascular or respiratory examination, which could then be used for automatic analysis. Some initial work shows promise in detecting diagnostic signals of COVID-19 from voice and coughs. In this paper we describe our data analysis over a large-scale crowdsourced dataset of respiratory sounds collected to aid diagnosis of COVID-19. We use coughs and breathing to understand how discernible COVID-19 sounds are from those in asthma or healthy controls. Our results show that even a simple binary machine learning classifier is able to classify correctly healthy and COVID-19 sounds. We also show how we distinguish a user who tested positive for COVID-19 and has a cough from a healthy user with a cough, and users who tested positive for COVID-19 and have a cough from users with asthma and a cough. Our models achieve an AUC of above 80% across all tasks. These results are preliminary and only scratch the surface of the potential of this type of data and audio-based machine learning. This work opens the door to further investigation of how automatically analysed respiratory patterns could be used as pre-screening signals to aid COVID-19 diagnosis., Comment: 9 pages, 6 figures, 2 tables, Accepted for publication at KDD'20 (Health Day)
Published: 2020

47. Software-assisted phase locking technique for Programmable Josephson Voltage Standard

Author: Patrick G. Reuvekamp, Jonathan M. Williams, Helge Malmbekk, Jane Ireland, and Eric Breakenridge
Subjects: Spectrum analyzer, Sampling (signal processing), Computer science, Interface (computing), Audio analyzer, Josephson voltage standard, Electronic engineering, Master clock, Allan variance, Voltage
Abstract: In this paper, a novel phase-locking technique is used to interface a Programmable Josephson Voltage Standard (PJVS) with a Keysight U8903B Performance Audio Analyzer. This technique uses both hardware and software to phase-lock the PJVS to the analyzer via a master clock. Both long-term stability and nulling measurements were performed on the analyzer. The long-term stability results indicated that the lowest Allan deviation was achieved after 500 s of sampling. The long-term stability provides a springboard for the nulling measurements. The nulling measurements open the possibility of using the U8903B as a reliable transfer of the ac quantum voltage standard.
Published: 2020

48. Detecting Adversarial Audio via Activation Quantization Error

Author: Gregory Ditzler and Heng Liu
Subjects: Adversarial system, Artificial neural network, Robustness (computer science), Computer science, Speech recognition, Quantization (signal processing), Audio analyzer, 0202 electrical engineering, electronic engineering, information engineering, 020206 networking & telecommunications, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, 0105 earth and related environmental sciences
Abstract: The robustness and vulnerability of Deep Neural Networks (DNN) are quickly becoming a critical area of interest since these models are in widespread use across real-world applications (e.g., image and audio analysis, recommendation systems, natural language analysis, etc.). A DNN’s vulnerability is exploited by an adversary to generate data to attack the model; however, the majority of adversarial data generators have focused on image domains, with far less work on audio domains. More recently, audio analysis models were shown to be vulnerable to adversarial audio examples (e.g., speech command classification, automatic speech recognition, etc.). Thus, one urgent open problem is to detect adversarial audio reliably. In this contribution, we incorporate a separate and yet related DNN technique to detect adversarial audio, namely model quantization. Then we propose an algorithm to detect adversarial audio by using a DNN’s quantization error. Specifically, we demonstrate that adversarial audio typically exhibits a larger activation quantization error than benign audio. The quantization error is measured using character error rates. We use the difference in errors to discriminate adversarial audio. Experiments with three state-of-the-art audio attack algorithms against the DeepSpeech model show that our detection algorithm achieved high accuracy on the Mozilla dataset.
Published: 2020

49. A novel deep feature transfer-based OSA detection method using sleep sound signals

Author: Zhu Xiaobei, Xing Gao, Haiqin Liu, Bin Wang, Yewen Shi, Xiaoyong Ren, Jing Luo, and Hei Xinhong
Subjects: Physiology, Computer science, Polysomnography, 0206 medical engineering, Biomedical Engineering, Biophysics, 02 engineering and technology, Convolutional neural network, 03 medical and health sciences, 0302 clinical medicine, Physiology (medical), medicine, Humans, Respiratory Sounds, Sleep Apnea, Obstructive, medicine.diagnostic_test, Receiver operating characteristic, business.industry, Deep learning, Pattern recognition, Signal Processing, Computer-Assisted, medicine.disease, 020601 biomedical engineering, respiratory tract diseases, Data set, Obstructive sleep apnea, Audio analyzer, Artificial intelligence, Neural Networks, Computer, Transfer of learning, business, Sleep, 030217 neurology & neurosurgery
Abstract: Objective: Polysomnography is typically used to evaluate the severity of obstructive sleep apnea (OSA) but the inconvenience of application and high cost considerably affect the diagnostics. In this study, sleep sound signals are used to detect OSA in patients. Approach: A deep feature transfer-based OSA detection approach is proposed. First, a deep convolutional neural network is trained on large-scale labeled audio data sets to distinguish respiration sounds from environmental noise. Second, the trained model is transferred to recognize respiration sounds in sleep sound signals. Third, the deep features of the detected respiration sounds are used to train a logistic regression classifier to identify OSA patients from potential patients. Polysomnography-based diagnosis is used as a reference. Main results: A self-collected data set of 132 potential OSA patients is applied in OSA detection experiments. The OSA detection performances are tested on four models for different apnea-hypopnea index thresholds and sexes resulting in accuracies of 80.17%, 80.21%, 81.63% and 77.22%. The corresponding areas under the receiver operating characteristic curves are 0.82, 0.80, 0.81 and 0.79. In addition, the proposed method presented a significant performance improvement compared with the state-of-the-art methods. Significance: Big data, deep learning and transfer learning can be successfully applied to improve diagnostic accuracy in OSA detection. The performance of the proposed approach is superior to that of traditional audio analysis technology. The proposed method significantly reduces difficulties in OSA detection and diagnosis, such that potential OSA patients can perform initial inspections by themselves at home.
Published: 2020

50. Seecology: Um Framework de Visualização de Dados para Aplicações em Ecologia Acústica

Author: Clausius Duque Gonçalves Reis, Maria Cristina Ferreira de Oliveira, Rosane Minghim, Hélio Pedrini, and Jose Galizia Tundisi
Subjects: Soundscape, Data visualization, Computer science, business.industry, Human–computer interaction, Soundscape ecology, Feature extraction, Audio analyzer, Acoustic ecology, business, Visualization
Abstract: The field of Soundscape Ecology studies the sounds produced in natural environments and how they can provide important information about the state of the environment, as well as about the potential impacts of changes caused by external influences. The analysis and visualization of large amounts of ecological recordings, as well as the development of appropriate tools for audio analysis, constitute a major challenge. Mechanisms for extracting audio features and characterizing acoustic events of interest, so as to produce datasets that capture the frequency variations and the occurrence of acoustic events in the recordings, remain a problem because available solutions are not adequate for data analysis in acoustic ecology research, which involves domain-specific issues and voluminous audio records collected over long periods of time. Methods and tools are needed to extract and represent the large amount of data produced by ecology studies. This work addresses problems related to the extraction of audio features, using visualization to assist in selecting the most significant ones, those that can represent the subtle variations in ecological recordings; it also assists specialists in generating annotated datasets by characterizing acoustic events through exploratory visualizations, and provides methods for detecting vessels in underwater recordings. A framework named Seecology is presented, encompassing suitable methods and tools to support specialists and scholars of environmental analysis. Case studies were carried out with the framework on terrestrial and underwater recordings provided by acoustic ecology researchers, producing datasets with the custom feature extractor included in the framework. For the method developed for detecting boats in underwater recordings, a comparative study against another method was conducted to determine its accuracy, in addition to a case study to determine its effectiveness.
The presented methods for feature extraction, characterization of acoustic events through exploratory visualization, and boat detection demonstrated their effectiveness for applications in acoustic ecology. The framework can produce multidimensional datasets without excessive computational cost, allowing the user to easily annotate the data through the included visualizations. The boat detection method performed better than the method it was compared with, in both speed and accuracy, and was able to detect weak signals from boats even under extreme noise.
Published: 2020
