10,124 results on '"spectrogram"'
Search Results
2. Publicly available datasets analysis and spectrogram-ResNet41 based improved features extraction for audio spoof attack detection.
- Author
- Chakravarty, Nidhi and Dua, Mohit
- Abstract
The rapid expansion of voice-based technologies across diverse applications underscores the critical need for robust security measures against audio spoofing attacks. This paper comprehensively examines publicly available datasets that have been developed to detect audio spoof attacks. The research encompasses a compilation of datasets, including the ASVspoof dataset series (2019, 2021), the Voice Spoofing Detection Corpus (VSDC), the Voice Impersonation Corpus in Hindi Language (VIHL), and the DEepfake CROss-lingual evaluation dataset (DECRO), covering various spoofing attack scenarios in English, Hindi, and Chinese. In the first part of the paper, a baseline for the proposed research is developed by comparing the performance of state-of-the-art Linear frequency cepstral coefficient (LFCC) features with four different machine learning classifiers at the backend, Random Forest (RF), K-nearest neighbor (KNN), eXtreme gradient boosting (XGBoost), and Naïve Bayes (NB), over these datasets. In the second part, the novel feature combinations Mel Spectrogram-Residual Network41 (ResNet41)-Linear discriminant analysis (LDA) and Gammatone Spectrogram-ResNet41-LDA are used, one by one, with the same set of machine learning classifiers at the backend. The combination of Gammatone Spectrogram-ResNet41-LDA with the XGBoost classifier achieves an Equal Error Rate (EER) of 1.7%, 1.28%, 0.5%, 0.36%, 0.03%, 0.07%, and 0.9% for the ASVspoof 2019 Logical Access (LA), ASVspoof 2019 Physical Access (PA), ASVspoof 2021 Deepfake, VSDC, DECRO English, DECRO Chinese, and VIHL datasets, respectively. Hence, the proposed research achieves its objective of assessing the feasibility and utility of publicly available state-of-the-art datasets for training and testing advanced algorithms that identify manipulated audio. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
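The frontend in entry 2 builds Mel (or Gammatone) spectrograms before the ResNet41 and LDA feature stages. As a rough illustration of that first step only, here is a minimal numpy sketch of a Mel spectrogram; libraries such as librosa provide this ready-made, and all function names and parameter values below are illustrative, not taken from the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters mapping an FFT power spectrum to mel bands."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):          # rising slope of the triangle
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling slope of the triangle
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Frame the signal, take |STFT|^2, then project onto the mel filters."""
    window = np.hanning(n_fft)
    frames = [x[s:s + n_fft] * window
              for s in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (T, n_fft//2 + 1)
    return mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, T)

# toy input: one second of a 1 kHz tone
sr = 16000
t = np.arange(sr) / sr
mel = mel_spectrogram(np.sin(2 * np.pi * 1000 * t), sr=sr)
```

For the 1 kHz tone, the energy concentrates in the mel band whose triangular filter covers 1 kHz, which is the behavior a spoof-detection frontend relies on.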
3. Toeplitz operators associated with the Whittaker Gabor transform and applications.
- Author
- Mejjaoli, Hatem
- Subjects
- GABOR transforms, TOEPLITZ operators, SPECTROGRAMS
- Abstract
The Whittaker Gabor transform (WGT) is a novel addition to the class of Gabor transforms, which has gained a respectable status in the realm of time-frequency signal analysis within a short span of time. Since time-frequency analysis is both theoretically interesting and practically useful, the aim of this article is to explore two further aspects of the time-frequency analysis associated with the WGT: the spectral analysis associated with the concentration operators, and the scalogram. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Recognition of Sheep Feeding Behavior in Sheepfolds Using Fusion Spectrogram Depth Features and Acoustic Features.
- Author
- Yu, Youxin, Zhu, Wenbo, Ma, Xiaoli, Du, Jialei, Liu, Yu, Gan, Linhui, An, Xiaoping, Li, Honghui, Wang, Buyu, and Fu, Xueliang
- Subjects
- CONVOLUTIONAL neural networks, SHEEP feeding, ANIMAL breeding, SUPPORT vector machines, ANIMAL welfare
- Abstract
Simple Summary: Precision feeding requires reliable methods to monitor the feeding behavior of sheep. It is especially critical to achieve high-accuracy classification of sheep feeding behavior in complex environments through acoustic sensors. This study collected data from production environments and thoroughly considered noise interference. The feature fusion technique significantly improved the recognition performance. The results show that combining multiple features raises the classification accuracy to 96.47%. This technique can automatically monitor the feeding behavior of sheep, which helps to improve breeding efficiency and animal welfare and significantly promotes the development of intelligent sheep farming. In precision feeding, non-contact and pressure-free monitoring of sheep feeding behavior is crucial for health monitoring and optimizing production management. The experimental conditions and real-world environments differ when using acoustic sensors to identify sheep feeding behaviors, leading to discrepancies and consequently posing challenges for achieving high-accuracy classification in complex production environments. This study enhances classification performance by integrating the deep spectrogram features and acoustic characteristics associated with feeding behavior. We collected sound data in actual production environments, accounting for noise and complex surroundings. The method included evaluating and filtering the optimal acoustic features, utilizing a customized convolutional neural network (SheepVGG-Lite) to extract deep features from Short-Time Fourier Transform (STFT) and Constant Q Transform (CQT) spectrograms, employing cross-spectrogram feature fusion, and assessing classification performance with a support vector machine (SVM). Results indicate that the fusion of cross-spectral features significantly improved classification performance, achieving a classification accuracy of 96.47%. These findings highlight the value of integrating acoustic features with spectrogram deep features for accurately recognizing sheep feeding behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
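The sheep-feeding study above fuses deep features extracted from STFT and CQT spectrograms before SVM classification. The sketch below shows only the cross-spectrogram fusion idea, with a fixed random projection standing in for the trained SheepVGG-Lite embeddings; everything here is an illustrative assumption, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def deep_features(spec, dim=64):
    """Stand-in for SheepVGG-Lite embeddings: a random projection plus
    ReLU. A real system would use a trained CNN here."""
    w = rng.standard_normal((spec.size, dim)) / np.sqrt(spec.size)
    return np.maximum(spec.ravel() @ w, 0.0)

def fuse(stft_spec, cqt_spec):
    """Cross-spectrogram fusion: concatenate the per-spectrogram
    embeddings into one joint feature vector for the classifier."""
    return np.concatenate([deep_features(stft_spec), deep_features(cqt_spec)])

# toy spectrograms with different frequency resolutions, same time axis
fused = fuse(rng.random((128, 44)), rng.random((84, 44)))
```

The fused vector would then be fed to an SVM (e.g. scikit-learn's `SVC`), which is omitted here to keep the sketch dependency-free.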
5. A computationally efficient method for induction motor bearing fault detection based on parallel convolutions and semi-supervised GAN.
- Author
- Irfan, Muhammad, Khan, Nabeel A., Mushtaq, Zohaib, Kareri, Tareq, Faraj Mursal, Salim Nasar, Shaheen, Ateeq-Ur-Rehman, Alghaffari, Shadi, Alghanmi, Ayman, Althobiani, Faial, and Attar, H. M.
- Subjects
- FAULT-tolerant control systems, GENERATIVE adversarial networks, INDUCTION motors, FEATURE extraction, PARALLEL processing
- Abstract
Accurate and timely bearing fault detection is imperative for optimal system functioning and the implementation of preventative maintenance measures. Deep learning models provide viable solutions to these malfunctions; however, the lack of labelled data makes training both expensive and cumbersome. To remedy this, various semi-supervised approaches have surfaced in the last decade, significantly mitigating the need for extensive labelled data but at added computational cost. This study proposes one such approach by leveraging generative adversarial networks (GANs) trained on a time-frequency based representation. The proposed Parallel Convolutions Semi-Supervised GAN (PC-SSGAN) uses bottleneck parallel convolution blocks to capture multi-scale features in both local and global contexts, equipping both the generator and discriminator with enhanced feature extraction capabilities while simultaneously reducing the parameters and training time. The proposed framework is evaluated on two distinct open-source datasets. The classification accuracy for both models exceeded 99.50%. Moreover, the proposed parallel convolutions-based architecture spent approximately 33% less time on training than normal convolutional layers. It is envisioned that the proposed fault detection system can be integrated into a motor fault-tolerant control system to produce a unified framework that makes informed decisions to handle bearing faults effectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. An Ensemble Approach for Speaker Identification from Audio Files in Noisy Environments.
- Author
- Zarin, Syed Shahab, Mustafa, Ehzaz, Zaman, Sardar Khaliq uz, Namoun, Abdallah, and Alanazi, Meshari Huwaytim
- Abstract
Automatic noise-robust speaker identification is essential in various applications, including forensic analysis, e-commerce, smartphones, and security systems. Audio files containing suspect speech often include background noise, as they are typically not recorded in soundproof environments. To this end, we address the challenges of noise robustness and accuracy in speaker identification systems. An ensemble approach is proposed that combines two different neural network architectures, an RNN and a DNN, using softmax. This approach enhances the system's ability to accurately identify speakers even in noisy environments. Using softmax, we combine voice activity detection (VAD) with a multilayer perceptron (MLP). The VAD component removes noisy frames from the recording. The softmax function addresses the residual noise traces by assigning a higher probability to the speaker's voice than to the noise. We tested our proposed solution on the Kaggle speaker recognition dataset and compared it to two baseline systems. Experimental results show that our approach outperforms the baseline systems, achieving 3.6% and 5.8% increases in test accuracy. Additionally, we compared the proposed MLP system with Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) classifiers. The results demonstrate that the MLP with VAD and softmax outperforms the LSTM by 23.2% and the BiLSTM by 6.6% in test accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
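The ensemble in entry 6 combines two classifiers' outputs through softmax after VAD has dropped noisy frames. A small numpy sketch of both steps under simple assumptions the abstract does not fully specify: an energy-based VAD and plain averaging of the two models' softmax posteriors:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def energy_vad(frames, ratio=0.1):
    """Keep only frames whose energy exceeds a fraction of the maximum
    frame energy (a crude stand-in for a real VAD)."""
    energy = (frames ** 2).sum(axis=1)
    return frames[energy > ratio * energy.max()]

def ensemble_predict(logits_a, logits_b):
    """Average the softmax posteriors of two models (e.g. an RNN and a
    DNN) and pick the speaker with the highest combined probability."""
    p = 0.5 * (softmax(logits_a) + softmax(logits_b))
    return int(np.argmax(p))

# toy example: two models' logits over three enrolled speakers
pred = ensemble_predict(np.array([2.0, 1.0, 0.0]), np.array([0.0, 3.0, 0.0]))
```

Averaging posteriors lets a confident model outvote an uncertain one, which is one way the softmax combination can suppress residual noise frames the VAD missed.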
7. A hybrid model for the classification of Autism Spectrum Disorder using Mu rhythm in EEG.
- Author
- Radhakrishnan, Menaka, Ramamurthy, Karthik, Shanmugam, Saranya, Prasanna, Gaurav, S, Vignesh, Y, Surya, and Won, Daehan
- Abstract
BACKGROUND: Autism Spectrum Disorder (ASD) is a condition involving social interaction, communication, and behavioral difficulties. Diagnostic methods mostly rely on subjective evaluations and can lack objectivity. In this research, machine learning (ML) and deep learning (DL) techniques are used to enhance ASD classification. OBJECTIVE: This study focuses on improving ASD and TD classification accuracy with a minimal number of EEG channels. ML and DL models are used with EEG data, including Mu rhythm from the Sensory Motor Cortex (SMC), for classification. METHODS: Non-linear features in the time and frequency domains are extracted, and ML models are applied for classification. The 1D EEG data is transformed into images using Independent Component Analysis-Second Order Blind Identification (ICA-SOBI), spectrograms, and the Continuous Wavelet Transform (CWT). RESULTS: A stacking classifier employed with non-linear features yields precision, recall, F1-score, and accuracy of 78%, 79%, 78%, and 78%, respectively. Including entropy and fuzzy entropy features further improves accuracy to 81.4%. In addition, DL models employing SOBI, CWT, and spectrogram plots achieve precision, recall, F1-score, and accuracy of 75%, 75%, 74%, and 75%, respectively. The hybrid model, which combines deep learning features from the spectrogram and CWT with machine learning, exhibits a prominent improvement, attaining precision, recall, F1-score, and accuracy of 94% each. Incorporating entropy and fuzzy entropy features further improves the accuracy to 96.9%. CONCLUSIONS: This study underscores the potential of ML and DL techniques in improving the classification of ASD and TD individuals, particularly when utilizing a minimal set of EEG channels. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Gabor Transform Associated with the Dunkl–Bessel Transform and Spectrograms.
- Author
- Ghobber, Saifallah, Mejjaoli, Hatem, and Sraieb, Nadia
- Abstract
Time–frequency (or space–phase) analysis plays a key role in signal analysis. In particular, signals that have a very concentrated time–frequency content are of great importance. However, the uncertainty principle sets a limitation to the possible simultaneous concentration of a function and its Dunkl–Bessel transform. For this purpose, we introduce and study a new transformation called Dunkl–Bessel Gabor transform. For this transformation, we define the Toeplitz-type (or time–frequency localization) operators, in order to localize signals on the time–frequency plane. We study these operators; in particular, we give criteria for their boundedness and Schatten class properties. Then, using the special class of concentration operators, which are compact and self-adjoint, we show that their eigenfunctions are maximally time–frequency-concentrated in the region of interest. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. MelCochleaGram-DeepCNN: Sequentially Fused Spectrogram and the DeepCNN Classifiers-based Audio Spoof Detection System.
- Author
- Dua, Mohit, Chakravarty, Nidhi, Priya Reddy, Sanivarapu Ganga, Bansal, Anshika, Pawar, Sushmita, and Dua, Shelza
- Subjects
- FRAUD investigation, ERROR rates, SPECTROGRAMS, SECURITY systems, POPULARITY
- Abstract
Automatic Speaker Verification (ASV) systems are crucial in various fields, enabling speaker identification for authentication, fraud detection, and forensic applications. While the simplicity and effectiveness of speech biometrics are driving the demand for ASV systems, their increasing popularity raises concerns about vulnerability to speech attacks. To enhance the security of these systems, the work in this paper proposes a spectrogram-based solution that leverages the robustness of spectrograms in audio analysis and feature extraction. The proposed model consists of two main components: frontend and backend. In the frontend, it introduces a novel spectrogram MelCochleaGram (MCG) by fusing Mel Spectrogram and Cochleagram, sequentially. For the backend implementation, pre-trained deep learning models, including ResNet50, ResNet50V2, and InceptionV3, are employed using the Keras framework. These models are individually paired with MCG to detect deepfake and replay attacks. To validate the effectiveness of the proposed system, thorough experimentation is conducted on two datasets: the DEepfake CROss-lingual (DECRO) evaluation dataset and the Voice Spoofing Detection Corpus (VSDC). The proposed combination of MCG with ResNet50 has achieved an Equal Error Rate (EER) of 0.2%, and 1.2% for deepfake detection over DECRO English and Chinese datasets, respectively. Also, for replay attack detection, the proposed combination has produced an EER of 1.4%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
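Entry 9's frontend fuses a Mel spectrogram and a cochleagram "sequentially" into the MelCochleaGram; the abstract does not give the exact fusion rule, so the sketch below assumes concatenation along the frequency axis as one plausible reading (the function name is illustrative):

```python
import numpy as np

def mel_cochlea_gram(mel_spec, cochleagram):
    """Fuse two time-frequency images into one input image for a CNN
    backend. Concatenation along the frequency axis is an assumption;
    the paper may fuse differently."""
    assert mel_spec.shape[1] == cochleagram.shape[1], "time axes must match"
    return np.vstack([mel_spec, cochleagram])

# toy 64-band Mel spectrogram and 64-channel cochleagram, 100 frames each
mcg = mel_cochlea_gram(np.random.rand(64, 100), np.random.rand(64, 100))
```

The fused image would then be resized to the input shape expected by the pre-trained backends the paper lists (ResNet50, ResNet50V2, InceptionV3).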
10. Machine Learning Recognizes Frequency-Following Responses in American Adults: Effects of Reference Spectrogram and Stimulus Token.
- Author
- Bauer, Sydney W., Jeng, Fuh-Cherng, and Carriero, Amanda
- Subjects
- REPEATED measures design, SOUND spectrography, ELECTROENCEPHALOGRAPHY, DESCRIPTIVE statistics, BRAIN stem, ANALYSIS of variance, MACHINE learning, AUDITORY perception, ACOUSTIC stimulation, SPEECH perception, AUDITORY evoked response, ELECTROPHYSIOLOGY, ALGORITHMS, ADULTS, PHYSIOLOGICAL aspects of speech
- Abstract
Electrophysiological research has been widely utilized to study brain responses to acoustic stimuli. The frequency-following response (FFR), a non-invasive reflection of how the brain encodes acoustic stimuli, is a particularly propitious electrophysiologic measure. While the FFR has been studied extensively, there are limitations in obtaining and analyzing FFR recordings that recent machine learning algorithms may address. In this study, we aimed to investigate whether FFRs can be enhanced using an "improved" source-separation machine learning algorithm. For this study, we recruited 28 native speakers of American English with normal hearing. We obtained two separate FFRs from each participant while they listened to two stimulus tokens /i/ and /da/. Electroencephalographic signals were pre-processed and analyzed using a source-separation non-negative matrix factorization (SSNMF) machine learning algorithm. The algorithm was trained using individual, grand-averaged, or stimulus token spectrograms as a reference. A repeated measures analysis of variance revealed that FFRs were significantly enhanced (p <.001) when the "improved" SSNMF algorithm was trained using both individual and grand-averaged spectrograms, but not when utilizing the stimulus token spectrogram. Similar results were observed when extracting FFRs elicited by using either stimulus token, /i/ or /da/. This demonstration shows how the SSNMF machine learning algorithm, using individual and grand-averaged spectrograms as references in training the algorithm, significantly enhanced FFRs. This improvement has important implications for the obtainment and analytical processes of FFR, which may lead to advancements in clinical applications of FFR testing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
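The FFR study above uses a source-separation NMF (SSNMF) trained against reference spectrograms. The supervised reference term is beyond this sketch, but the unsupervised NMF core with multiplicative updates can be written in a few lines of numpy (toy data; not the study's implementation):

```python
import numpy as np

def nmf(V, rank=2, iters=500, seed=0):
    """Basic NMF minimizing ||V - WH||_F with multiplicative updates.
    The paper's SSNMF adds a supervised reference-spectrogram term;
    only the unsupervised core is shown here."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-6
    H = rng.random((rank, m)) + 1e-6
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)   # update basis spectra
    return W, H

# toy "spectrogram" built from two non-negative spectral templates
V = np.outer([1.0, 0, 2, 0], [1, 1, 0, 0]) + np.outer([0, 3.0, 0, 1], [0, 0, 1, 1])
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H)
```

Because V is exactly rank 2 and non-negative, the multiplicative updates recover a near-exact factorization; in the FFR setting the columns of W would correspond to the spectral components being separated.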
11. Deep learning and feature fusion-based lung sound recognition model to diagnose respiratory diseases.
- Author
- Shehab, Sara A., Mohammed, Kamel K., Darwish, Ashraf, and Hassanien, Aboul Ella
- Subjects
- LUNG diseases, MEDICAL personnel, RESPIRATORY diseases, IMAGE representation, SPECTROGRAMS, LUNGS, DEEP learning
- Abstract
This paper proposes a novel approach for detecting lung sound disorders using deep learning feature fusion. The lung sound dataset is oversampled and converted into spectrogram images. Deep features are then extracted from CNN architectures pre-trained on large-scale image datasets. These deep features capture rich representations of the spectrogram images, allowing for a comprehensive analysis of lung disorders. Next, a fusion technique combines the features extracted from multiple CNN architectures, totaling 8064 features. This fusion process enhances the discriminative power of the features, facilitating more accurate and robust detection of lung disorders. To further improve detection performance, an improved CNN architecture is employed. To evaluate the effectiveness of the proposed approach, experiments were conducted on a large dataset of lung disorder signals. The results demonstrate that deep feature fusion from different CNN architectures, combined with different CNN layers, achieves superior performance in lung disorder detection. Compared to individual CNN architectures, the proposed approach achieves higher accuracy, sensitivity, and specificity, effectively reducing false negatives and false positives. The proposed model achieves 96.03% accuracy, 96.53% sensitivity, 99.424% specificity, 96.52% precision, and a 96.50% F1 score when predicting lung diseases from sound files. This approach has the potential to assist healthcare professionals in the early detection and diagnosis of lung disorders, ultimately leading to improved patient outcomes and enhanced healthcare practices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Fused Audio Instance and Representation for Respiratory Disease Detection.
- Author
- Truong, Tuan, Lenga, Matthias, Serrurier, Antoine, and Mohammadi, Sadegh
- Subjects
- RECEIVER operating characteristic curves, COVID-19, RESPIRATORY diseases, COVID-19 pandemic, DEEP learning
- Abstract
Audio-based classification techniques for body sounds have long been studied to aid in the diagnosis of respiratory diseases. While most research is centered on the use of coughs as the main acoustic biomarker, other body sounds also have the potential to detect respiratory diseases. Recent studies on the coronavirus disease 2019 (COVID-19) have suggested that breath and speech sounds, in addition to cough, correlate with the disease. Our study proposes fused audio instance and representation (FAIR) as a method for respiratory disease detection. FAIR relies on constructing a joint feature vector from various body sounds represented in waveform and spectrogram form. We conduct experiments on the use case of COVID-19 detection by combining waveform and spectrogram representation of body sounds. Our findings show that the use of self-attention to combine extracted features from cough, breath, and speech sounds leads to the best performance with an area under the receiver operating characteristic curve (AUC) score of 0.8658, a sensitivity of 0.8057, and a specificity of 0.7958. Compared to models trained solely on spectrograms or waveforms, the use of both representations results in an improved AUC score, demonstrating that combining spectrogram and waveform representation helps to enrich the extracted features and outperforms the models that use only one representation. While this study focuses on COVID-19, FAIR's flexibility allows it to combine various multi-modal and multi-instance features in many other diagnostic applications, potentially leading to more accurate diagnoses across a wider range of diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
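FAIR fuses cough, breath, and speech features with self-attention. Below is a minimal numpy sketch of scaled dot-product self-attention over per-instance feature vectors followed by mean pooling, a simplification of whatever fusion head the paper actually uses (names and shapes are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(instances):
    """Scaled dot-product self-attention across the audio instances
    (cough, breath, speech), then mean-pool into one joint vector.
    A learned model would add query/key/value projections."""
    X = np.stack(instances)                       # (n_instances, d)
    d = X.shape[1]
    A = softmax(X @ X.T / np.sqrt(d), axis=-1)    # (n, n) attention weights
    return (A @ X).mean(axis=0)                   # fused (d,) vector

# toy per-instance embeddings of dimension 8
fused = attention_fuse([np.ones(8), np.zeros(8), np.full(8, 0.5)])
```

Each row of the attention matrix is a convex combination over instances, so the fused vector stays within the range of the input features while letting correlated instances reinforce each other.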
13. Deep Hybrid Fusion Network for Inverse Synthetic Aperture Radar Ship Target Recognition Using Multi-Domain High-Resolution Range Profile Data.
- Author
- Deng, Jie and Su, Fulin
- Subjects
- RADAR targets, INVERSE synthetic aperture radar, SPECTROGRAMS
- Abstract
Most existing target recognition methods based on high-resolution range profiles (HRRPs) use data from only one domain. However, the information contained in HRRP data from different domains is not exactly the same. Therefore, in the context of inverse synthetic aperture radar (ISAR), this paper proposes an advanced deep hybrid fusion network to utilize HRRP data from different domains for ship target recognition. First, the proposed network simultaneously processes time-domain HRRP and its corresponding time–frequency (TF) spectrogram through two branches to obtain initial features from the two HRRP domains. Next, a feature alignment module is used to make the fused features more discriminative regarding the target. Finally, a decision fusion module is designed to further improve the model's prediction performance. We evaluated our approach using both simulated and measured data, encompassing ten different ship target types. Our experimental results on the simulated and measured datasets showed an improvement in recognition accuracy of at least 4.22% and 2.82%, respectively, compared to using single-domain data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Modeling Source and System Features Through Multi-channel Convolutional Neural Network for Improving Intelligibility Assessment of Dysarthric Speech.
- Author
- Ahmad, Md. Talib, Pradhan, Gayadhar, and Singh, Jyoti Prakash
- Subjects
- CONVOLUTIONAL neural networks, DISCRETE wavelet transforms, FOURIER transforms, SPEECH, DYSARTHRIA
- Abstract
This paper investigates the nuanced characteristics of the spectral envelope attributes due to vocal-tract resonance structure and fine-level excitation source features within short-term Fourier transform (STFT) magnitude spectra for the assessment of dysarthria. The single-channel convolutional neural network (CNN) employing time-frequency representations such as STFT spectrogram (STFT-SPEC) and Mel-spectrogram (MEL-SPEC) does not ensure capture of the source and system information simultaneously due to the filtering operation using a fixed-size filter. Building upon this observation, this study first explores the significance of convolution filter size in the context of the CNN-based automated dysarthric assessment system. An approach is then introduced to effectively capture resonance structure and fine-level features through a multi-channel CNN. In the proposed approach, the STFT-SPEC is decomposed using a one-level discrete wavelet transform (DWT) to separate the slow-varying spectral structure and fine-level features. The resulting decomposed coefficients in four directions are taken as the inputs to multi-channel CNN to capture the source and system features by employing different sizes of convolution filters. The experimental results conducted on the UA-speech corpus validate the efficacy of the proposed approach utilizing multi-channel CNN. The proposed approach demonstrates the notable enhancement in accuracy and F1 score (60.86% and 48.52%) compared to a single-channel CNN using STFT-SPEC (46.45% and 40.97%), MEL-SPEC (48.86% and 38.20%), and MEL-SPEC appended with delta and delta-delta coefficients (52.40% and 42.84%) for assessment of dysarthria in a speaker-independent and text-independent mode. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
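The dysarthria study decomposes the STFT spectrogram with a one-level DWT into four directional subbands that feed the multi-channel CNN. A numpy sketch of such a decomposition using the Haar wavelet (the abstract does not name the wavelet; Haar is assumed here for brevity):

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: split a spectrogram into an approximation
    (LL) and three detail subbands (LH, HL, HH), i.e. the four inputs to
    the multi-channel CNN in this kind of pipeline."""
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]  # even dims
    a = img[0::2, :] + img[1::2, :]   # row-wise sums (lowpass)
    d = img[0::2, :] - img[1::2, :]   # row-wise diffs (highpass)
    LL = (a[:, 0::2] + a[:, 1::2]) / 2
    LH = (a[:, 0::2] - a[:, 1::2]) / 2
    HL = (d[:, 0::2] + d[:, 1::2]) / 2
    HH = (d[:, 0::2] - d[:, 1::2]) / 2
    return LL, LH, HL, HH

# toy spectrogram: 128 frequency bins x 64 frames
LL, LH, HL, HH = haar_dwt2(np.random.default_rng(0).random((128, 64)))
```

LL keeps the slow-varying spectral envelope (vocal-tract resonance structure), while LH, HL, and HH retain the fine-level detail, which matches the source/system separation the paper is after.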
15. Automatic Extraction of VLF Constant‐Frequency Electromagnetic Wave Frequency Based on an Improved Vgg16‐Unet.
- Author
- Han, Ying, Liu, Qingjie, Huang, Jianping, Li, Zhong, Yan, Rui, Yuan, Jing, Shen, Xuhui, Xing, Lili, and Pang, Guoli
- Subjects
- MACHINE learning, ELECTROMAGNETIC waves, GLOBAL Positioning System, IONOSPHERIC disturbances, TRACKING radar
- Abstract
Constant Frequency Electromagnetic Waves (CFEWs) are electromagnetic waves with a constant frequency. Man-made CFEWs are mainly used in wireless communication, scientific research, global navigation and positioning systems, and military radar. CFEWs appear as horizontal lines above the background on spectrograms. In this study, we focus on Very Low Frequency (VLF) waveform data and power spectral data collected by the China Seismo-Electromagnetic Satellite (CSES) Electromagnetic Field Detector (EFD). We use deep learning techniques to construct an improved Vgg16-Unet model that automatically detects horizontal lines on time-frequency spectrograms and extracts their frequencies. First, we transform the waveform data into time-frequency spectrograms with a duration of 2 s using the Short-Time Fourier Transform. Then, we manually label horizontal lines on the time-frequency spectrograms using the Labelme tool to establish the dataset. Next, we build and improve the Vgg16-Unet deep learning model. Finally, we train and test the model on the dataset. Statistical experiments show that the line detection error rate is 0, indicating high reliability of the model, with few parameters and a computation speed fast enough for practical applications. The model not only detects lines but also obtains their frequencies. Additionally, in batch-generated power spectrograms of CFEWs, we discover unstable phenomena such as frequency shifts and fluctuations, which contribute to understanding the propagation mechanism of CFEWs in the ionosphere and to improving the accuracy of related systems. Plain Language Summary: Since its launch in 2018, China's first seismo-electromagnetic satellite (CSES) has recorded a large amount of ionospheric disturbance data during its more than 5 years in orbit, including a significant number of CFEWs.
Based on the characteristic spectral line features of CFEWs, which appear above the background in the spectrogram, this study primarily employs deep learning methods to generate spectrograms from the data collected by CSES, detect CFEWs, and extract their frequencies. Furthermore, based on the extracted frequencies, it was discovered during the bulk generation of their power spectrograms that some of these CFEWs exhibit very stable signals, while others show frequency fluctuation and frequency drift phenomena. These instabilities can significantly affect system performance, leading to signal distortion or loss in communication systems, difficulties in target detection and tracking in radar systems, and inaccurate positioning in navigation systems. Therefore, this study contributes to understanding the characteristics of CFEWs in electromagnetic propagation and helps improve the accuracy of related systems. Key Points: Constant-frequency electromagnetic waves appear as horizontal straight lines on the spectrogram. A deep learning algorithm is used to detect horizontal lines on the time-frequency spectrogram. The frequencies of the constant-frequency electromagnetic waves generating these straight lines are extracted. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. AClassiHonk: a system framework to annotate and classify vehicular honk from road traffic.
- Author
- Maity, Biswajit, Alim, Abdul, Rama Charan, Popuri Sree, Nandi, Subrata, and Bhattacharjee, Sanghita
- Subjects
- NOISE pollution, URBAN pollution, POLLUTION, CITIES & towns, RESEARCH personnel
- Abstract
Some recent studies highlight that vehicular traffic and honking contribute more than 50% of noise pollution in urban and sub-urban areas of developing countries, including Indian cities. Frequent honking adversely affects health and hampers road safety, the environment, etc. Therefore, recognizing the various vehicle honks and classifying the honks of different vehicles can provide good insights into environmental noise pollution. Moreover, classifying honks by vehicle type allows for the inference of contextual information about a location, area, or traffic. So far, researchers have performed outdoor sound classification and honk detection with vehicular honks collected in controlled environments or in the absence of ambient noise. Such classification models fail to classify honks by vehicle type. It therefore becomes imperative to design a system that can detect and classify the honks of different types of vehicles to infer contextual information. This paper presents a novel framework, AClassiHonk, that performs raw vehicular honk sensing and data labeling, and classifies honks into three major groups: light-weight vehicles, medium-weight vehicles, and heavy-weight vehicles. Raw audio samples of different vehicular honks are collected based on spatio-temporal characteristics and converted into spectrogram images. A deep learning-based multi-label autoencoder model (MAE) is proposed for automated labeling of the unlabeled data samples, which provides 97.64% accuracy, in contrast to existing deep learning-based data labeling methods. Further, various pre-trained models, namely Inception V3, ResNet50, MobileNet, and ShuffleNet, are used, and an Ensembled Transfer Learning model (EnTL) is proposed for vehicle honk classification, with a comparative analysis performed. Results reveal that EnTL exhibits the best performance compared to the pre-trained models, achieving 96.72% accuracy on our dataset. In addition, the context of a location in a city is identified based on these classified honk signatures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. An Algorithm for Extracting the Trace of an Artificial Whistler Signal in a Spectrogram Using the PyCharm Integrated Development Environment.
- Author
- Marchenko, L.S.
- Subjects
- artificial whistler, spectrogram, trace, filter, mask, python, pycharm, Science
- Abstract
This paper proposes an algorithm for extracting the trace of an artificial whistling atmospheric (whistler) signal in a spectrogram, implemented in Python in the PyCharm 2024.1 integrated development environment. The algorithm isolates the whistler trace by applying a threshold (filter). The filter takes into account the signal intensity in the spectrum, the standard deviation of values from the mean, and a multiplier that suppresses noise so that only the more significant peaks in the signal are retained. Using a mask built from this filter, the algorithm produces an array of frequencies for the artificial whistler trace. The computer program can save the resulting array to a text file for further analysis in spreadsheet applications, and can plot the whistler trace for visual inspection. The adequacy of the algorithm was verified by computing the dispersion coefficient; the algorithm was shown to give good results.
- Published
- 2024
- Full Text
- View/download PDF
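Entry 17 describes its filter as a threshold built from the spectrum's intensity, the standard deviation of values from the mean, and a multiplier, with a mask yielding an array of trace frequencies. A minimal numpy sketch of such a mask-based trace extractor (function name, parameter `k`, and the toy data are illustrative, not from the paper):

```python
import numpy as np

def whistler_trace(S, freqs, k=2.0):
    """Keep time-frequency cells whose intensity exceeds mean + k*std
    (the thresholding filter described in the abstract), then take the
    strongest surviving frequency in each time column."""
    mask = S > S.mean() + k * S.std()
    trace = np.full(S.shape[1], np.nan)          # NaN where nothing survives
    for t in range(S.shape[1]):
        col = np.where(mask[:, t], S[:, t], -np.inf)
        if np.isfinite(col.max()):
            trace[t] = freqs[np.argmax(col)]
    return trace

# toy spectrogram: two bright cells forming a short descending trace
S = np.zeros((5, 4))
S[3, 1] = 10.0
S[2, 2] = 10.0
trace = whistler_trace(S, freqs=np.arange(5) * 100.0)
```

The resulting frequency array could then be written to a text file (e.g. with `np.savetxt`) for the spreadsheet-based analysis the abstract mentions.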
18. ecoSound-web: an open-source, online platform for ecoacoustics [version 3; peer review: 3 approved]
- Author
-
Kevin F.A. Darras, Noemí Pérez, Liu Dilong, Tara Hanf-Dressler, Matthias Markolf, Thomas C Wanger, and Anna F. Cord
- Subjects
Software Tool Article ,Articles ,Soundscape ,sound analysis ,ecoacoustics ,passive acoustic monitoring ,automated sound recording ,autonomous recording units ,spectrogram ,audio annotation - Abstract
Passive acoustic monitoring of soundscapes and biodiversity produces vast amounts of audio recordings, but the management and analyses of these raw data present technical challenges. A multitude of software solutions exist, but none can fulfil all purposes required for the management, processing, navigation, and analysis of acoustic data. The field of ecoacoustics needs a software tool that is free, evolving, and accessible. We take a step in that direction and present ecoSound-web: an open-source, online platform for ecoacoustics designed and built by ecologists and software engineers. ecoSound-web can be used for storing, re-sampling, organising, analysing, and sharing soundscape recording or metadata projects. Specifically, it allows manual annotation of soniferous animals and soundscape components, automatic annotation with deep-learning models for all birds and for UK bat species, peer-reviewing annotations, analysing audio in time and frequency dimensions, computing alpha acoustic indices, and providing reference sound libraries for different taxa. We present ecoSound-web’s structure and features, and describe its operation for typical use cases such as sampling bird and bat communities, using a primate call library, and the analysis of soundscape components and acoustic indices. ecoSound-web is available from: https://github.com/ecomontec/ecoSound-web
- Published
- 2024
- Full Text
- View/download PDF
19. Electroretinogram Analysis Using a Short-Time Fourier Transform and Machine Learning Techniques.
- Author
-
Albasu, Faisal, Kulyabin, Mikhail, Zhdanov, Aleksei, Dolganov, Anton, Ronkin, Mikhail, Borisov, Vasilii, Dorosinsky, Leonid, Constable, Paul A., Al-masni, Mohammed A., and Maier, Andreas
- Subjects
- *
MACHINE learning , *BIOMEDICAL signal processing , *FEATURE extraction , *SIGNAL classification , *CLASSIFICATION algorithms , *DEEP learning - Abstract
Electroretinography (ERG) is a non-invasive method of assessing retinal function by recording the retina's response to a brief flash of light. This study focused on optimizing ERG waveform signal classification by utilizing Short-Time Fourier Transform (STFT) spectrogram preprocessing with a machine learning (ML) decision system. Several window functions of different sizes and overlaps were compared to enhance feature extraction for specific ML algorithms. The obtained spectrograms were employed to train deep learning models, alongside manual feature extraction for more classical ML models. Our findings demonstrated the superiority of the Visual Transformer architecture with a Hamming window function, showcasing its advantage in ERG signal classification. We also recommend the random forest (RF) algorithm for scenarios necessitating manual feature extraction, particularly with the Boxcar (rectangular) or Bartlett window functions. By elucidating the optimal methodologies for feature extraction and classification, this study contributes to advancing the diagnostic capabilities of ERG analysis in clinical settings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
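The window-function comparison at the heart of this study can be sketched with scipy; the ERG-like test signal and the STFT sizes below are illustrative assumptions:

```python
import numpy as np
from scipy import signal

# An ERG-like stand-in: a short transient burst in a 0.5 s, 1 kHz recording.
fs = 1000
t = np.arange(int(0.5 * fs)) / fs
erg = np.exp(-((t - 0.1) / 0.02) ** 2) * np.sin(2 * np.pi * 30 * t)

# The same STFT computed with three of the windows compared in the study.
spectrograms = {}
for win in ("hamming", "boxcar", "bartlett"):
    f, seg_t, sxx = signal.spectrogram(erg, fs=fs, window=win,
                                       nperseg=64, noverlap=48)
    spectrograms[win] = sxx

# Window choice changes spectral leakage, hence the feature values a
# downstream classifier sees, while the time-frequency grid stays identical.
for win, sxx in spectrograms.items():
    print(win, sxx.shape, f"{sxx.sum():.4e}")
```

Sweeping `window`, `nperseg`, and `noverlap` in this loop is one simple way to reproduce the kind of preprocessing comparison the abstract describes.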
20. Utility of Quantitative EEG in Neurological Emergencies and ICU Clinical Practice.
- Author
-
Veciana de las Heras, Misericordia, Sala-Padro, Jacint, Pedro-Perez, Jordi, García-Parra, Beliu, Hernández-Pérez, Guillermo, and Falip, Merce
- Subjects
- *
NEUROLOGICAL emergencies , *TIME-domain analysis , *FUNCTIONAL assessment , *CRITICAL care medicine , *ELECTROENCEPHALOGRAPHY - Abstract
The electroencephalogram (EEG) is a cornerstone tool for the diagnosis, management, and prognosis of selected patient populations. EEGs offer significant advantages such as high temporal resolution, real-time cortical function assessment, and bedside usability. Quantitative EEG (qEEG) added the possibility of processing long recordings in a compressed manner, making EEG review more efficient for experienced users and more approachable for new ones. Recent advancements in commercially available software, such as Persyst, have significantly expanded and facilitated the use of qEEGs, marking the beginning of a new era in their application. As a result, there has been a notable increase in the practical, real-world utilization of qEEGs in recent years. This paper aims to provide an overview of the current applications of qEEGs in daily neurological emergencies and ICU practice, along with some elementary principles of qEEGs using Persyst software in clinical settings. This article illustrates basic qEEG patterns encountered in critical care and adopts the new terminology proposed for spectrogram reporting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. A hybrid approach to detecting Parkinson's disease using spectrogram and deep learning CNN-LSTM network.
- Author
-
Shibina, V. and Thasleema, T. M.
- Subjects
PARKINSON'S disease ,DEEP learning ,MACHINE learning ,SUBTHALAMIC nucleus ,CONVOLUTIONAL neural networks ,SPECTROGRAMS - Abstract
Parkinson's disease (PD) is a common illness that affects brain neurons. Medical practitioners and caregivers face challenges in detecting Parkinson's disease promptly, whether in its early or late stages. There is an urgent need for non-invasive PD diagnostic technologies because timely diagnosis substantially impacts patient outcomes. This research aims to provide an efficient way of identifying Parkinson's disease by transforming voice inputs into spectrograms using the Short-Time Fourier Transform and applying deep learning algorithms, leveraging architectures such as Convolutional Neural Networks and Long Short-Term Memory networks. The experiment produced positive findings, with 95.67% accuracy, 97.62% precision, 94.67% recall, and an F1-score of 95.91%. The outcomes indicate that the suggested deep learning method is more successful in PD identification, surpassing the results of traditional classification methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Audio recognition of live pig states based on underdetermined blind source separation and deep learning.
- Author
-
潘伟豪, 盛卉子, 王春宇, 闫顺丕, 周小波, 辜丽川, and 焦 俊
- Abstract
Copyright of Journal of South China Agricultural University is the property of Gai Kan Bian Wei Hui and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
23. DIFFCRNN: A NOVEL APPROACH FOR DETECTING SOUND EVENTS IN SMART HOME SYSTEMS USING DIFFUSION-BASED CONVOLUTIONAL RECURRENT NEURAL NETWORK.
- Author
-
AL DABEL, MARYAM M.
- Subjects
CONVOLUTIONAL neural networks ,ARTIFICIAL neural networks ,RECURRENT neural networks ,SMART homes ,SPECTROGRAMS - Abstract
This paper presents a latent diffusion model combined with a convolutional recurrent neural network for detecting sound events, fusing the advantages of different networks to advance security applications and smart home systems. The proposed approach underwent initial training on extensive datasets and subsequently applied transfer learning to adapt to the target task, effectively mitigating the challenge of limited data availability. It employs the latent diffusion model to obtain a discrete representation compressed from the mel-spectrogram of the audio. A convolutional neural network (CNN) then serves as the front-end of a recurrent neural network (RNN), producing a feature map from which an attention module predicts attention maps in the temporal and spectral dimensions. The input spectrogram is multiplied with the generated attention maps for adaptive feature refinement. Finally, trainable scalar weights aggregate the fine-tuned features from the back-end RNN. Experimental findings show that the proposed method outperforms the state of the art on three datasets: DCASE2016-SED, DCASE2017-SED, and URBAN-SED. On the first dataset, DCASE2016-SED, the approach peaked at an F1 of 66.2% and an ER of 0.42. On the second, DCASE2017-SED, the F1 and ER reached 68.1% and 0.40, respectively. Further investigation on the third dataset, URBAN-SED, demonstrates that the proposed approach significantly outperforms existing alternatives, with an F1 of 74.3% and an ER of 0.44. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
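The adaptive feature refinement step (the input spectrogram multiplied elementwise by predicted attention maps) can be sketched in numpy; the sigmoid squashing and the random stand-ins for the network outputs are assumptions, since the paper's networks are not reproduced here:

```python
import numpy as np

def refine(feature_map, attention_logits):
    """Squash predicted temporal-spectral attention logits to (0, 1) and
    multiply them elementwise with the input spectrogram features."""
    attention = 1.0 / (1.0 + np.exp(-attention_logits))  # sigmoid gate
    return feature_map * attention

rng = np.random.default_rng(1)
spec = rng.random((64, 128))            # (frequency bins, time frames)
logits = rng.normal(size=(64, 128))     # stand-in for the attention module's output
refined = refine(spec, logits)
print(refined.shape)
```

Because the gate lies in (0, 1), refinement can only attenuate features, which is what lets the attention maps suppress irrelevant time-frequency regions.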
24. Exploring fish choruses: patterns revealed through PCA computed from daily spectrograms.
- Author
-
Sánchez-Gendriz, Ignacio, Luna-Naranjo, D., Guedes, Luiz Affonso, López, José D., and Padovese, L. R.
- Subjects
PRINCIPAL components analysis ,MARINE ecology ,ENVIRONMENTAL monitoring ,DATA mining ,SPECTROGRAMS - Abstract
Soundscape analysis has become integral to environmental monitoring, particularly in marine and terrestrial settings. Fish choruses within marine ecosystems provide essential descriptors for environmental characterization. This study employed a month-long sequence of continuous underwater recordings to generate 24-h spectrograms, utilizing Principal Component Analysis (PCA) specifically adapted to analyze fish choruses. The spectrograms were constructed using a frequency range from 0 to 5 kHz, represented by 1,025 spectral points (frequency binwidth 5 Hz) on a linear scale. A preliminary spectral subsampling reduced the frequency components to 205 spectral points. PCA was then applied to this subsampled data, selecting 7 principal components (PCs) that explained 95% of the variance. To enhance visualization and interpretation, we introduced "acoustic maps" portrayed as heatmaps. This methodology proved valuable in characterizing the structure of the observed environment and capturing pertinent diel patterns of fish choruses. Additionally, these PCA components can be analyzed using acoustic maps to reveal hidden dynamics within the marine acoustic environment. The dimensionality reduction achieved by PCA not only streamlined data handling but also enabled the extraction of spectral information pertinent to fish choruses and the temporal dynamics of the soundscape. In conclusion, our methodology presents a versatile framework extendable to diverse biological choruses and ecoacoustic studies. The straightforward, easily interpretable analysis leverages computations derived from 24-h spectrograms, offering novel insights into the daily dynamics of biological choruses and contributing to future advancements in ecoacoustic research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
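The PCA step (reducing subsampled daily spectra to the few components that explain 95% of the variance) can be sketched via SVD; the synthetic day-by-frequency matrix below is an illustrative stand-in for the study's 24-h spectrogram data:

```python
import numpy as np

def pca_95(data, target=0.95):
    """PCA via SVD on a (days x spectral points) matrix: return the smallest
    number of components whose cumulative explained variance reaches
    `target`, plus the projected scores."""
    centred = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = (s ** 2) / (s ** 2).sum()
    k = int(np.searchsorted(np.cumsum(explained), target) + 1)
    return k, centred @ vt[:k].T

# Stand-in for 30 daily spectra subsampled to 205 spectral points,
# driven by a few latent "chorus" patterns plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 205))
days = latent + 0.05 * rng.normal(size=(30, 205))
k, scores = pca_95(days)
print(k, scores.shape)
```

The per-day scores are what would be rendered as the "acoustic map" heatmaps the abstract describes.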
25. PREDICTING CARDIAC HEALTH USING SUB-COMPONENT OF A PHONOCARDIOGRAM.
- Author
-
ARORA, SHRUTI, JAIN, SUSHMA, and CHANA, INDERVEER
- Subjects
- *
SOUND waves , *HEART sounds , *ELECTRONIC surveillance , *ARTIFICIAL intelligence , *HEART diseases - Abstract
There has been a steady rise in the number of deaths throughout the world due to heart diseases. This can be mitigated, to a large extent, if cardiovascular disorders are detected timely and efficiently. Electrocardiograms (ECGs) and phonocardiograms (PCGs) are the two most popular diagnostic tools for detecting cardiac problems. Auscultation is another simple and efficient method for quickly identifying cardiovascular illness. In this work, the cardiac sound signal has been transformed into its equivalent spectrogram representation for detecting cardiac problems. The novelty of the proposed approach is the deployment of customized transfer learning (TL) models on a sub-component of the spectrogram called the harmonic spectrogram, instead of the full spectrogram. Experiments have been conducted using PhysioNet 2016, which is considered a benchmark dataset. TL models, viz. MobileNet, DenseNet121, InceptionResnetV2, VGG16, and InceptionV3, have been put to use for categorizing cardiac sound waves as normal or pathological. The results show that MobileNet achieved higher accuracy (93.45%), recall (92.46%), precision (97.82%), and F1 score (95.06%) than many of its peers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. MI-CSBO: a hybrid system for myocardial infarction classification using deep learning and Bayesian optimization.
- Author
-
Gül, Evrim, Diker, Aykut, Avcı, Engin, and Doğantekin, Akif
- Subjects
- *
HYBRID systems , *MYOCARDIAL infarction , *DEEP learning , *MACHINE learning , *CONVOLUTIONAL neural networks , *FEATURE selection , *HEART - Abstract
Myocardial Infarction (MI) refers to damage to the heart tissue caused by an inadequate blood supply to the heart muscle due to a sudden blockage in the coronary arteries. This blockage is often a result of the accumulation of fat (cholesterol) forming plaques (atherosclerosis) in the arteries. Over time, these plaques can crack, leading to the formation of a clot (thrombus), which can block the artery and cause a heart attack. Risk factors for a heart attack include smoking, hypertension, diabetes, high cholesterol, metabolic syndrome, and genetic predisposition. Early diagnosis of MI is crucial. Thus, detecting and classifying MI is essential. This paper introduces a new hybrid approach for MI Classification using Spectrogram and Bayesian Optimization (MI-CSBO) for Electrocardiogram (ECG). First, ECG signals from the PTB Database (PTBDB) were converted from the time domain to the frequency domain using the spectrogram method. Then, a deep residual CNN was applied to the test and train datasets of ECG imaging data. The ECG dataset trained using the Deep Residual model was then acquired. Finally, the Bayesian approach, NCA feature selection, and various machine learning algorithms (k-NN, SVM, Tree, Bagged, Naïve Bayes, Ensemble) were used to derive performance measures. The MI-CSBO method achieved a 100% correct diagnosis rate, as detailed in the Experimental Results section. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. E-BDL: Enhanced Band-Dependent Learning Framework for Augmented Radar Sensing.
- Author
-
Cai, Fulin, Wu, Teresa, and Lure, Fleming Y. M.
- Subjects
- *
DOPPLER effect , *RADAR , *ALZHEIMER'S disease , *MOTION capture (Human mechanics) , *DEEP learning , *ADAPTIVE filters , *MULTISPECTRAL imaging - Abstract
Radar sensors, leveraging the Doppler effect, enable the nonintrusive capture of kinetic and physiological motions while preserving privacy. Deep learning (DL) facilitates radar sensing for healthcare applications such as gait recognition and vital-sign measurement. However, band-dependent patterns, indicating variations in patterns and power scales associated with frequencies in time–frequency representation (TFR), challenge radar sensing applications using DL. Frequency-dependent characteristics and features with lower power scales may be overlooked during representation learning. This paper proposes an Enhanced Band-Dependent Learning framework (E-BDL) comprising an adaptive sub-band filtering module, a representation learning module, and a sub-view contrastive module to fully detect band-dependent features in sub-frequency bands and leverage them for classification. Experimental validation is conducted on two radar datasets, including gait abnormality recognition for Alzheimer's disease (AD) and AD-related dementia (ADRD) risk evaluation and vital-sign monitoring for hemodynamics scenario classification. For hemodynamics scenario classification, E-BDL-ResNet achieves competitive performance in overall accuracy and class-wise evaluations compared to recent methods. For ADRD risk evaluation, the results demonstrate E-BDL-ResNet's superior performance across all candidate models, highlighting its potential as a clinical tool. E-BDL effectively detects salient sub-bands in TFRs, enhancing representation learning and improving the performance and interpretability of DL-based models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. A Proposed Approach to Utilizing Esp32 Microcontroller for Data Acquisition.
- Author
-
Vy-Khang Tran, Bao-Toan Thai, Hai Pham, Van-Khan Nguyen, and Van-Khanh Nguyen
- Subjects
- *
FAST Fourier transforms , *ANALOG-to-digital converters , *DIGITAL-to-analog converters , *HIGHPASS electric filters , *BANDPASS filters - Abstract
Accurate data acquisition is crucial in embedded systems. This study aimed to evaluate the data acquisition ability of the ESP32 Analog to Digital Converter (ADC) module when combined with the I2S module to collect high-frequency data. Sine waves at various frequencies and white noise were recorded in this mode. The recorded data were analyzed by the fast Fourier transform (FFT) to assess their accuracy and to evaluate the generated noise. Digital filters are proposed to improve the quality of the collected signals. A 2D spectrogram imaging algorithm is proposed to convert the data to time-frequency domain images. The results showed that the ADC module could effectively collect signals at frequencies up to 96 kHz; frequency errors were proportional to the sampling rate, and the maximum was 79.6 Hz, equivalent to 0.38%. The execution time of the lowpass and highpass filters was about 6.83 ms and for the bandpass filter about 5.97 ms; the spectrogram imaging time was 40 ms; while the calculation time for an FFT was approximately 1.14 ms, which is appropriate for real-time running. These results are significant for data collection systems based on microcontrollers and are a premise for deploying TinyML networks on embedded systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
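The FFT-based accuracy check described above (locating a recorded test tone's spectral peak and measuring the frequency error) can be sketched offline in Python; the tone frequency and block size below are illustrative assumptions, not the paper's exact test setup:

```python
import numpy as np

def peak_frequency(samples, fs):
    """Locate the dominant frequency of a recorded block with an FFT,
    as done when validating ADC captures against a known test tone."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / fs)
    return freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin

# A 20.8 kHz test tone sampled at 96 kHz, analysed in a 4096-sample block.
fs, f0, n = 96_000, 20_800, 4096
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * f0 * t)
f_est = peak_frequency(tone, fs)
error_pct = 100 * abs(f_est - f0) / f0
print(f"estimated {f_est:.1f} Hz, error {error_pct:.2f}%")
```

The error is bounded by half an FFT bin (fs / n), which is why the paper's observed error scales with the sampling rate.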
29. BASSA: New software tool reveals hidden details in visualisation of low‐frequency animal sounds.
- Author
-
Jancovich, Benjamin A. and Rogers, Tracey L.
- Subjects
- *
ANIMAL sounds , *WHALE sounds , *ACOUSTICS , *BLUE whale , *ANIMAL communication - Abstract
The study of animal sounds in biology and ecology relies heavily upon time–frequency (TF) visualisation, most commonly using the short‐time Fourier transform (STFT) spectrogram. This method, however, has inherent bias towards either temporal or spectral details that can lead to misinterpretation of complex animal sounds. An ideal TF visualisation should accurately convey the structure of the sound in terms of both frequency and time, however, the STFT often cannot meet this requirement. We evaluate the accuracy of four TF visualisation methods (superlet transform [SLT], continuous wavelet transform [CWT] and two STFTs) using a synthetic test signal. We then apply these methods to visualise sounds of the Chagos blue whale, Asian elephant, southern cassowary, eastern whipbird, mulloway fish and the American crocodile. We show that the SLT visualises the test signal with 18.48%–28.08% less error than the other methods. A comparison between our visualisations of animal sounds and their literature descriptions indicates that the STFT's bias may have caused misinterpretations in describing pygmy blue whale songs and elephant rumbles. We suggest that use of the SLT to visualise low‐frequency animal sounds may prevent such misinterpretations. Finally, we employ the SLT to develop 'BASSA', an open‐source, GUI software application that offers a no‐code, user‐friendly tool for analysing short‐duration recordings of low‐frequency animal sounds for the Windows platform. The SLT visualises low‐frequency animal sounds with improved accuracy, in a user‐friendly format, minimising the risk of misinterpretation while requiring less technical expertise than the STFT. Using this method could propel advances in acoustics‐driven studies of animal communication, vocal production methods, phonation and species identification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. MusicNeXt: Addressing category bias in fused music using musical features and genre-sensitive adjustment layer.
- Author
-
Meng, Shiting, Hao, Qingbo, Xiao, Yingyuan, and Zheng, Wenguang
- Subjects
- *
CONVOLUTIONAL neural networks , *POPULAR music genres , *MUSICAL meter & rhythm , *DEEP learning , *FEATURE extraction - Abstract
Convolutional neural networks (CNNs) have been successfully applied to music genre classification tasks. With the development of diverse music, genre fusion has become common. Fused music exhibits multiple similar musical features such as rhythm, timbre, and structure, which typically arise from the temporal information in the spectrum. However, traditional CNNs cannot effectively capture temporal information, leading to difficulties in distinguishing fused music. To address this issue, this study proposes a CNN model called MusicNeXt for music genre classification. Its goal is to enhance the feature extraction method to increase focus on musical features, and increase the distinctiveness between different genres, thereby reducing classification result bias. Specifically, we construct the feature extraction module which can fully utilize temporal information, thereby enhancing its focus on music features. It exhibits an improved understanding of the complexity of fused music. Additionally, we introduce a genre-sensitive adjustment layer that strengthens the learning of differences between different genres through within-class angle constraints. This leads to increased distinctiveness between genres and provides interpretability for the classification results. Experimental results demonstrate that our proposed MusicNeXt model outperforms baseline networks and other state-of-the-art methods in music genre classification tasks, without generating category bias in the classification results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Human micro-doppler detection and classification studies at Mersin University using real outdoor experiments via C-band FMCW radar.
- Author
-
Tekir, Onur and Özdemir, Caner
- Subjects
MOVEMENT disorders ,DOPPLER effect ,GAIT disorders ,RADAR ,SPECTROGRAMS - Abstract
In this work, a unique radar hardware is introduced for human-gait micro-Doppler studies. The developed radar sensor operates at C-band microwave frequencies. We share several outdoor experiments at Mersin University facilities to detect and characterize human walking and running movements. In these experiments, various walking and running movements were performed by different people. To examine the Doppler properties of human motion, the raw data gathered are transformed onto the 2D joint time-frequency plane. The generation of micro-Doppler signatures in the transformed data is the first step in extracting features of the walking/running human motion. It is shown that the directions, durations, range distances, as well as torso and limb velocities of walking and running human movements in each experiment are successfully obtained from these micro-Doppler signatures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. RESEARCH ON STOCHASTIC PROPERTIES OF TIME SERIES DATA ON CHEMICAL ANALYSIS OF CAST IRON.
- Author
-
Sidanchenko, V. V. and Gusev, O. Yu.
- Subjects
TIME series analysis ,CAST-iron ,DYNAMICAL systems ,BLAST furnaces ,CHAOS theory - Abstract
Purpose. To provide a procedure for identifying chaotic processes in a dynamic system and to examine time series, describing the chemical composition of cast iron at the blast furnace output with the purpose of identifying the nonlinearity of the investigated system and detecting the presence of chaotic processes in it. Methodology. The determination of the unique characteristics of the attractor of a dynamic chaotic system based on the time series of cast iron’s chemical composition values was carried out using methods of nonlinear dynamics and dynamic chaos theory, such as the autocorrelation function method, correlation and fractal dimensions. Findings. The methods of nonlinear dynamics and dynamic chaos theory were used to study the behavior of time series data on the chemical composition of cast iron at the blast furnace output. The presence was identified of chaotic processes with a fractal structure in the studied dynamic system, leading to the inefficiency of traditional analysis methods based on the Gaussian properties of stochastic processes. Originality. For the first time, the possibility and feasibility of applying chaos theory methods for the analysis and prediction of time series data on the chemical composition of cast iron at the blast furnace output were substantiated. For the first time, the nonlinearity of the studied dynamic system was identified, and chaotic processes were discovered within it by determining the unique characteristics of the strange attractor of the system using the analyzed time series, such as embedding dimension, time delay, and the largest Lyapunov exponent. Practical value. The obtained results open up the possibility for more effective and qualitative analysis of the behavior of the studied dynamic system by developing new tools for assessment and prediction that are adequate to the nature of the ongoing processes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Evaluating Convolutional Neural Networks and Vision Transformers for Baby Cry Sound Analysis.
- Author
-
Younis, Samir A., Sobhy, Dalia, and Tawfik, Noha S.
- Subjects
TRANSFORMER models ,CONVOLUTIONAL neural networks ,INFANT care ,DEEP learning ,PERFORMANCE theory - Abstract
Crying is a newborn's main way of communicating. Despite their apparent similarity, newborn cries are physically generated and have distinct characteristics. Experienced medical professionals, nurses, and parents are able to recognize these variations based on their prior interactions. Nonetheless, interpreting a baby's cries can be challenging for carers, first-time parents, and inexperienced paediatricians. This paper uses advanced deep learning techniques to propose a novel approach for baby cry classification. This study aims to accurately classify different cry types associated with everyday infant needs, including hunger, discomfort, pain, tiredness, and the need for burping. The proposed model achieves an accuracy of 98.33%, surpassing the performance of existing studies in the field. IoT-enabled sensors are utilized to capture cry signals in real time, ensuring continuous and reliable monitoring of the infant's acoustic environment. This integration of IoT technology with deep learning enhances the system's responsiveness and accuracy. Our study highlights the significance of accurate cry classification in understanding and meeting the needs of infants and its potential impact on improving infant care practices. The methodology, including the dataset, preprocessing techniques, and architecture of the deep learning model, is described. The results demonstrate the performance of the proposed model, and the discussion analyzes the factors contributing to its high accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Sequential Patch Analysis Framework for Lung Disease Classification
- Author
-
Le, Kim-Ngoc T., Le, Duc-Tai, Choo, Hyunseung, Li, Gang, Series Editor, Filipe, Joaquim, Series Editor, Xu, Zhiwei, Series Editor, Dang, Tran Khanh, editor, Küng, Josef, editor, and Chung, Tai M., editor
- Published
- 2024
- Full Text
- View/download PDF
35. Enhancing Bird Migration Studies: Detecting Birdsong in Audio Files Using Convolutional Neural Networks
- Author
-
Honsor, Oksana, Gonsor, Yuriy, Xhafa, Fatos, Series Editor, Hu, Zhengbing, editor, Zhang, Qingying, editor, and He, Matthew, editor
- Published
- 2024
- Full Text
- View/download PDF
36. Music Genre Classification System Using Deep Learning Algorithm
- Author
-
Chatterjee, Ritam, Agarwal, Kushal, Bajari, Hrithik, Ghosh, Ritesh Kumar, Pramanik, Sabyasachi, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Goar, Vishal, editor, Sharma, Aditi, editor, Shin, Jungpil, editor, and Mridha, M. Firoz, editor
- Published
- 2024
- Full Text
- View/download PDF
37. Diffusion-Based Convolutional Recurrent Neural Network for Improving Sound Event Detection
- Author
-
M. Al Dabel, Maryam, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Yang, Xin-She, editor, Sherratt, Simon, editor, Dey, Nilanjan, editor, and Joshi, Amit, editor
- Published
- 2024
- Full Text
- View/download PDF
38. Gender and Age Extraction from Audio Signal Using Convolutional Neural Network, MFCC and Spectrogram
- Author
-
Karaoui, Fazia, Djeradi, Rachida, Djeradi, Amar, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Daimi, Kevin, editor, and Al Sadoon, Abeer, editor
- Published
- 2024
- Full Text
- View/download PDF
39. Audio Data Feature Extraction for Speaker Diarization
- Author
-
Pande, Vinod K., Kale, Vijay K., Tharewal, Sumegh, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Woungang, Isaac, editor, Dhurandher, Sanjay Kumar, editor, and Singh, Yumnam Jayanta, editor
- Published
- 2024
- Full Text
- View/download PDF
40. A Comparative Study of Audio Source Separation Techniques on Indian Classical Music: Performance Evaluation and Analysis
- Author
-
Pandith, Yajna, Pavan, H., Abhishek, T. H., Ashwini, B., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Senjyu, Tomonobu, editor, So–In, Chakchai, editor, and Joshi, Amit, editor
- Published
- 2024
- Full Text
- View/download PDF
41. Study of Relationships Between Time Series by Co-spectral Analysis
- Author
-
d’Ovidio, Francesco Domenico, Firza, Najada, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Goos, Gerhard, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Gervasi, Osvaldo, editor, Murgante, Beniamino, editor, Garau, Chiara, editor, Taniar, David, editor, C. Rocha, Ana Maria A., editor, and Faginas Lago, Maria Noelia, editor
- Published
- 2024
- Full Text
- View/download PDF
42. Cloud-Based Anomaly Detection for Broken Rail Track Using LSTM Autoencoders and Cross-modal Audio Analysis
- Author
-
Rath, Smita, Upadhyay, Hans, Prakash, Somya, Raja, Harsh, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Nanda, Umakanta, editor, Tripathy, Asis Kumar, editor, Sahoo, Jyoti Prakash, editor, Sarkar, Mahasweta, editor, and Li, Kuan-Ching, editor
- Published
- 2024
- Full Text
- View/download PDF
43. Machine Learning Assessment of Battery State-of-Health
- Author
-
Rizanov, Stefan, Stoynova, Anna, Kafadarova, Nadezhda, Sotirov, Sotir, Bonev, Borislav, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Nagar, Atulya K., editor, Jat, Dharm Singh, editor, Mishra, Durgesh Kumar, editor, and Joshi, Amit, editor
- Published
- 2024
- Full Text
- View/download PDF
44. Enhanced Sound Recognition and Classification Through Spectrogram Analysis, MEMS Sensors, and PyTorch: A Comprehensive Approach
- Author
-
Spournias, Alexandros, Nanos, Nikolaos, Faliagka, Evanthia, Antonopoulos, Christos, Voros, Nikolaos, Keramidas, Giorgos, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Gao, Honghao, editor, Wang, Xinheng, editor, and Voros, Nikolaos, editor
- Published
- 2024
- Full Text
- View/download PDF
45. Audio Event Detection Based on Cross Correlation in Selected Frequency Bands of Spectrogram
- Author
-
Hajihashemi, Vahid, Gharahbagh, Abdorreza Alavi, Machado, J. J. M., Tavares, João Manuel R. S., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Rocha, Alvaro, editor, Adeli, Hojjat, editor, Dzemyda, Gintautas, editor, Moreira, Fernando, editor, and Colla, Valentina, editor
- Published
- 2024
- Full Text
- View/download PDF
46. Fusion Spectrogram for Sound Classification Using 2D Convolutional Neural Network
- Author
-
Presannakumar, Krishna, Mohamed, Anuj, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Tan, Kay Chen, Series Editor, Gabbouj, Moncef, editor, Pandey, Shyam Sudhir, editor, Garg, Hari Krishna, editor, and Hazra, Ranjay, editor
- Published
- 2024
- Full Text
- View/download PDF
47. Fourier Chromagrams for Fingerprinting, Verification and Authentication of Digital Audio Recordings
- Author
-
Lependin, Andrey, Ladygin, Pavel, Karev, Valentin, Mansurov, Alexander, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Jordan, Vladimir, editor, Tarasov, Ilya, editor, Shurina, Ella, editor, Filimonov, Nikolay, editor, and Faerman, Vladimir A., editor
- Published
- 2024
- Full Text
- View/download PDF
48. 'Seeing Sound': Audio Classification Using the Wigner-Ville Distribution and Convolutional Neural Networks
- Author
-
Marios, Christonasis Antonios, van Eijndhoven, Stef, Duin, Peter, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2024
- Full Text
- View/download PDF
49. Schizophrenia Identification Through Deep Learning on Spectrogram Images
- Author
-
Prabhakara Rao, Amarana, Prasanna Kumar, G., Ranjan, Rakesh, Venkata Subba Rao, M., Srinivasulu, M., Sravya, E., Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Pareek, Prakash, editor, Gupta, Nishu, editor, and Reis, M. J. C. S., editor
- Published
- 2024
- Full Text
- View/download PDF
50. Detection and Classification of Categories of Dysphonia Using Convolutional Neural Network
- Author
-
da Silva Moura, Ronaldo, Maia, Joaquim Miguel, Dajer, María Eugenia, Magjarević, Ratko, Series Editor, Ładyżyński, Piotr, Associate Editor, Ibrahim, Fatimah, Associate Editor, Lackovic, Igor, Associate Editor, Rock, Emilio Sacristan, Associate Editor, Marques, Jefferson Luiz Brum, editor, Rodrigues, Cesar Ramos, editor, Suzuki, Daniela Ota Hisayasu, editor, Marino Neto, José, editor, and García Ojeda, Renato, editor
- Published
- 2024
- Full Text
- View/download PDF