Author: "Schuller A" - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Schuller A"' returned a total of 29,410 results.

1. Does the Definition of Difficulty Matter? Scoring Functions and their Role for Curriculum Learning

Author: Rampp, Simon, Milling, Manuel, Triantafyllopoulos, Andreas, and Schuller, Björn W.
Subjects: Computer Science - Machine Learning
Abstract: Curriculum learning (CL) describes a machine learning training strategy in which samples are gradually introduced into the training process based on their difficulty. Despite a partially contradictory body of evidence in the literature, CL finds popularity in deep learning research due to its promise of leveraging human-inspired curricula to achieve higher model performance. Yet, the subjectivity and biases that follow any necessary definition of difficulty, especially for those found in orderings derived from models or training statistics, have rarely been investigated. To shed more light on the underlying unanswered questions, we conduct an extensive study on the robustness and similarity of the most common scoring functions for sample difficulty estimation, as well as their potential benefits in CL, using the popular benchmark dataset CIFAR-10 and the acoustic scene classification task from the DCASE2020 challenge as representatives of computer vision and computer audition, respectively. We report a strong dependence of scoring functions on the training setting, including randomness, which can partly be mitigated through ensemble scoring. While we do not find a general advantage of CL over uniform sampling, we observe that the ordering in which data is presented for CL-based training plays an important role in model performance. Furthermore, we find that the robustness of scoring functions across random seeds positively correlates with CL performance. Finally, we uncover that models trained with different CL strategies complement each other by boosting predictive power through late fusion, likely due to differences in the learnt concepts. Alongside our findings, we release the aucurriculum toolkit (https://github.com/autrainer/aucurriculum), implementing sample difficulty and CL-based training in a modular fashion.
Published: 2024
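The abstract describes scoring sample difficulty, stabilising scores by ensembling over random seeds, and presenting data in an easy-to-hard order. The sketch below illustrates only that general idea under stated assumptions: the function names are hypothetical and are not the aucurriculum API, scores are assumed to be "lower = easier", and the linear pacing schedule is one common choice rather than the paper's own.

```python
import numpy as np

def ensemble_difficulty(scores_per_seed: np.ndarray) -> np.ndarray:
    """Average per-sample difficulty scores over several runs (rows = seeds, columns = samples)."""
    return scores_per_seed.mean(axis=0)

def curriculum_order(difficulty: np.ndarray) -> np.ndarray:
    """Indices of samples sorted from easiest (lowest score) to hardest (assumed convention)."""
    return np.argsort(difficulty, kind="stable")

def linear_pacing(step: int, total_steps: int, n_samples: int, start_frac: float = 0.2) -> int:
    """Hypothetical linear schedule: how many of the easiest samples are available at `step`."""
    frac = start_frac + (1.0 - start_frac) * min(step / total_steps, 1.0)
    return max(1, int(round(frac * n_samples)))

# Two seeds scoring four samples; averaging smooths out seed-dependent fluctuations.
scores = np.array([[0.1, 0.9, 0.5, 0.3],
                   [0.2, 0.7, 0.6, 0.1]])
order = curriculum_order(ensemble_difficulty(scores))
visible = order[: linear_pacing(step=50, total_steps=100, n_samples=len(order))]
```

Sampling mini-batches only from `visible` and widening it as training proceeds gives a basic curriculum; uniform sampling corresponds to ignoring `order` altogether.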

2. Audio-based Kinship Verification Using Age Domain Conversion

Author: Sun, Qiyang, Akman, Alican, Jing, Xin, Milling, Manuel, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing, 68T10, I.5.4, I.2.6
Abstract: Audio-based kinship verification (AKV) is important in many domains, such as home security monitoring, forensic identification, and social network analysis. A key challenge in the task arises from differences in age across samples from different individuals, which can be interpreted as a domain bias in a cross-domain verification task. To address this issue, we design the notion of an "age-standardised domain" wherein we utilise the optimised CycleGAN-VC3 network to perform age-audio conversion to generate the in-domain audio. The generated audio dataset is employed to extract a range of features, which are then fed into a metric learning architecture to verify kinship. Experiments are conducted on the KAN_AV audio dataset, which contains age and kinship labels. The results demonstrate that the method markedly enhances the accuracy of kinship verification, while also offering novel insights for future kinship verification research.
Comment: 4 pages, 2 figures, submitted to IEEE Signal Processing Letters
Published: 2024

3. Audio Explanation Synthesis with Generative Foundation Models

Author: Akman, Alican, Sun, Qiyang, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The increasing success of audio foundation models across various tasks has led to a growing need for improved interpretability to understand their intricate decision-making processes better. Existing methods primarily focus on explaining these models by attributing importance to elements within the input space based on their influence on the final decision. In this paper, we introduce a novel audio explanation method that capitalises on the generative capacity of audio foundation models. Our method leverages the intrinsic representational power of the embedding space within these models by integrating established feature attribution techniques to identify significant features in this space. The method then generates listenable audio explanations by prioritising the most important features. Through rigorous benchmarking against standard datasets, including keyword spotting and speech emotion recognition, our model demonstrates its efficacy in producing audio explanations.
Published: 2024

4. PerCo (SD): Open Perceptual Compression

Author: Körber, Nikolai, Kromer, Eduard, Siebert, Andreas, Hauke, Sascha, Mueller-Gritschneder, Daniel, and Schuller, Björn
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce PerCo (SD), a perceptual image compression method based on Stable Diffusion v2.1, targeting the ultra-low bit range. PerCo (SD) serves as an open and competitive alternative to the state-of-the-art method PerCo, which relies on a proprietary variant of GLIDE and remains closed to the public. In this work, we review the theoretical foundations, discuss key engineering decisions in adapting PerCo to the Stable Diffusion ecosystem, and provide a comprehensive comparison, both quantitatively and qualitatively. On the MSCOCO-30k dataset, PerCo (SD) demonstrates improved perceptual characteristics at the cost of higher distortion. We partly attribute this gap to the different model capacities being used (866M vs. 1.4B). We hope our work contributes to a deeper understanding of the underlying mechanisms and paves the way for future advancements in the field. Code and trained models will be released at https://github.com/Nikolai10/PerCo.
Published: 2024

5. Trading through Earnings Seasons using Self-Supervised Contrastive Representation Learning

Author: Ye, Zhengxin Joseph and Schuller, Bjoern
Subjects: Computer Science - Machine Learning, Quantitative Finance - Trading and Market Microstructure
Abstract: Earnings release is a key economic event in the financial markets and crucial for predicting stock movements. Earnings data gives a glimpse into how a company is doing financially and can hint at where its stock might go next. However, the irregularity of its release cycle makes it a challenge to incorporate this data in a medium-frequency algorithmic trading model and the usefulness of this data fades fast after it is released, making it tough for models to stay accurate over time. Addressing this challenge, we introduce the Contrastive Earnings Transformer (CET) model, a self-supervised learning approach rooted in Contrastive Predictive Coding (CPC), aiming to optimise the utilisation of earnings data. To ascertain its effectiveness, we conduct a comparative study of CET against benchmark models across diverse sectors. Our research delves deep into the intricacies of stock data, evaluating how various models, and notably CET, handle the rapidly changing relevance of earnings data over time and over different sectors. The research outcomes shed light on CET's distinct advantage in extrapolating the inherent value of earnings data over time. Its foundation on CPC allows for a nuanced understanding, facilitating consistent stock predictions even as the earnings data ages. This finding about CET presents a fresh approach to better use earnings data in algorithmic trading for predicting stock price trends.
Published: 2024

6. Angular Divergent Component of Motion: A step towards planning Spatial DCM Objectives for Legged Robots

Author: Herron, Connor W., Schuller, Robert, Beiter, Benjamin C., Griffin, Robert J., Leonessa, Alexander, and Englsberger, Johannes
Subjects: Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control
Abstract: In this work, the Divergent Component of Motion (DCM) method is expanded to include angular coordinates for the first time. This work introduces the idea of spatial DCM, which adds an angular objective to the existing linear DCM theory. To incorporate the angular component into the framework, a discussion is provided on extending beyond the linear motion of the Linear Inverted Pendulum model (LIPM) towards the Single Rigid Body model (SRBM) for DCM. This work presents the angular DCM theory for a 1D rotation, simplifying the SRBM rotational dynamics to a flywheel to satisfy necessary linearity constraints. The 1D angular DCM is mathematically identical to the linear DCM and defined as an angle which is ahead of the current body rotation based on the angular velocity. This theory is combined into a 3D linear and 1D angular DCM framework, with discussion on the feasibility of simultaneously achieving both sets of objectives. A simulation in MATLAB and hardware results on the TORO humanoid are presented to validate the framework's performance.
Published: 2024
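For reference, the linear DCM that entry 6 extends is usually written in Linear Inverted Pendulum notation, with $\mathbf{x}$ the centre-of-mass position, $\mathbf{p}$ the ZMP/CoP, $g$ gravity and $\Delta z$ the constant CoM height. The last line is only an assumed angular analogue suggested by the abstract's statement that the 1D angular DCM is "mathematically identical" to the linear one and lies ahead of the current rotation by an amount set by the angular velocity; $\omega_\theta$ is a placeholder time constant that the abstract does not specify.

```latex
\begin{aligned}
\omega &= \sqrt{g/\Delta z}, \qquad
\boldsymbol{\xi} = \mathbf{x} + \tfrac{1}{\omega}\,\dot{\mathbf{x}}, \\
\ddot{\mathbf{x}} &= \omega^{2}\,(\mathbf{x} - \mathbf{p})
\;\Longrightarrow\;
\dot{\boldsymbol{\xi}} = \omega\,(\boldsymbol{\xi} - \mathbf{p}), \\
\xi_{\theta} &= \theta + \tfrac{1}{\omega_{\theta}}\,\dot{\theta}
\quad \text{(assumed 1D angular analogue)}
\end{aligned}
```

The second line follows by differentiating $\boldsymbol{\xi}$ and substituting the LIPM dynamics: $\dot{\boldsymbol{\xi}} = \dot{\mathbf{x}} + \omega(\mathbf{x} - \mathbf{p}) = \omega(\boldsymbol{\xi} - \mathbf{p})$.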

7. Affective Computing Has Changed: The Foundation Model Disruption

Author: Schuller, Björn, Mallol-Ragolta, Adria, Almansa, Alejandro Peña, Tsangko, Iosif, Amin, Mostafa M., Semertzidou, Anastasia, Christ, Lukas, and Amiriparian, Shahin
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: The dawn of Foundation Models has on the one hand revolutionised a wide range of research problems, and, on the other hand, democratised the access and use of AI-based tools by the general public. We even observe an incursion of these models into disciplines related to human psychology, such as the Affective Computing domain, suggesting their affective, emerging capabilities. In this work, we aim to raise awareness of the power of Foundation Models in the field of Affective Computing by synthetically generating and analysing multimodal affective data, focusing on vision, linguistics, and speech (acoustics). We also discuss some fundamental problems, such as ethical issues and regulatory aspects, related to the use of Foundation Models in this research area.
Published: 2024

8. Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models

Author: Jing, Xin, Zhou, Kun, Triantafyllopoulos, Andreas, and Schuller, Björn W.
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: While current emotional text-to-speech (TTS) systems can generate highly intelligible emotional speech, achieving fine control over emotion rendering of the output speech still remains a significant challenge. In this paper, we introduce ParaEVITS, a novel emotional TTS framework that leverages the compositionality of natural language to enhance control over emotional rendering. By incorporating a text-audio encoder inspired by ParaCLAP, a contrastive language-audio pretraining (CLAP) model for computational paralinguistics, the diffusion model is trained to generate emotional embeddings based on textual emotional style descriptions. Our framework first trains on reference audio using the audio encoder, then fine-tunes a diffusion model to process textual inputs from ParaCLAP's text encoder. During inference, speech attributes such as pitch, jitter, and loudness are manipulated using only textual conditioning. Our experiments demonstrate that ParaEVITS effectively controls emotion rendering without compromising speech quality. Speech demos are publicly available.
Published: 2024

9. Negation Blindness in Large Language Models: Unveiling the NO Syndrome in Image Generation

Author: Nadeem, Mohammad, Sohail, Shahab Saquib, Cambria, Erik, Schuller, Björn W., and Hussain, Amir
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Foundational Large Language Models (LLMs) have changed the way we perceive technology. They have been shown to excel in tasks ranging from poem writing and coding to essay generation and puzzle solving. With the incorporation of image generation capability, they have become more comprehensive and versatile AI tools. At the same time, researchers are striving to identify the limitations of these tools to improve them further. Currently identified flaws include hallucination, biases, and bypassing restricted commands to generate harmful content. In the present work, we have identified a fundamental limitation related to the image generation ability of LLMs, and termed it The NO Syndrome. This negation blindness refers to LLMs' inability to correctly comprehend NO-related natural language prompts to generate the desired images. Interestingly, all tested LLMs including GPT-4, Gemini, and Copilot were found to be suffering from this syndrome. To demonstrate the generalization of this limitation, we carried out simulation experiments and conducted entropy-based and benchmark statistical analysis tests on various LLMs in multiple languages, including English, Hindi, and French. We conclude that the NO syndrome is a significant flaw in current LLMs that needs to be addressed. A related finding of this study showed a consistent discrepancy between image and textual responses as a result of this NO syndrome. We posit that the introduction of a negation context-aware reinforcement learning based feedback loop between the LLM's textual response and generated image could help ensure the generated text is based on both the LLM's correct contextual understanding of the negation query and the generated visual output.
Comment: 15 pages, 7 figures
Published: 2024

10. Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition

Author: Kounadis-Bastian, Dionyssos, Schrüfer, Oliver, Derington, Anna, Wierstorf, Hagen, Eyben, Florian, Burkhardt, Felix, and Schuller, Björn
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech Emotion Recognition (SER) needs high computational resources to overcome the challenge of substantial annotator disagreement. Today SER is shifting towards dimensional annotations of arousal, dominance, and valence (A/D/V). Universal metrics such as the L2 distance prove unsuitable for evaluating A/D/V accuracy due to the non-converging consensus of annotator opinions. However, the Concordance Correlation Coefficient (CCC) arose as an alternative metric for A/D/V, where a model's output is evaluated to match a whole dataset's CCC rather than L2 distances of individual audios. Recent studies have shown that wav2vec2 / wavLM architectures outputting a float value for each A/D/V dimension achieve today's State-of-the-art (Sota) CCC on A/D/V. The Wav2Vec2.0 / WavLM family has a high computational footprint, but training small models using human annotations has been unsuccessful. In this paper, we use a large Transformer Sota A/D/V model as Teacher/Annotator to train 5 student models: 4 MobileNets and our proposed Wav2Small, using only the Teacher's A/D/V outputs instead of human annotations. The Teacher model we propose also sets a new Sota on the MSP Podcast dataset of valence CCC=0.676. We choose MobileNetV4 / MobileNet-V3 as students, as MobileNet has been designed for fast execution times. We also propose Wav2Small, an architecture designed for minimal parameters and RAM consumption. Wav2Small with an .onnx (quantised) of only 120KB is a potential solution for A/D/V on hardware with low resources, having only 72K parameters vs 3.12M parameters for MobileNet-V4-Small.
Comment: Nomenclature
Published: 2024
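The CCC mentioned in entry 10 is Lin's concordance correlation coefficient, $\mathrm{CCC} = 2\,\mathrm{cov}(x, y) / (\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2)$. A minimal sketch, assuming population (ddof = 0) statistics throughout; the function name is illustrative:

```python
import numpy as np

def concordance_cc(y_true, y_pred) -> float:
    """Lin's concordance correlation coefficient between two 1-D sequences."""
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                 # population variances (ddof=0)
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()   # population covariance
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

labels = [1.0, 2.0, 3.0, 4.0]
shifted = [v + 1.0 for v in labels]  # perfectly correlated but offset by +1
ccc_shifted = concordance_cc(labels, shifted)
```

Unlike a per-sample L2 distance, CCC is a statistic over the whole set of predictions, and it penalises a constant offset that Pearson's correlation would ignore: `shifted` has a Pearson correlation of 1 with `labels`, but a CCC of only 2.5 / 3.5 ≈ 0.71.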

11. Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance

Author: Milling, Manuel, Liu, Shuo, Triantafyllopoulos, Andreas, Aslan, Ilhan, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination for real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front-end of the target audio applications. In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. To guide the optimisation of the AE module towards a target application, and especially to overcome difficult samples, we make use of the sample-wise performance measure as an indication of sample importance. In experiments, we consider four representative applications to evaluate our training paradigm, i.e., ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These applications are associated with speech and non-speech tasks concerning semantic and non-semantic features, transient and global information, and the experimental results indicate that our proposed approach can considerably boost the noise robustness of the models, especially at low signal-to-noise ratios (SNRs), for a wide range of computer audition tasks in everyday-life noisy environments.
Published: 2024

12. Abusive Speech Detection in Indic Languages Using Acoustic Features

Author: Spiesberger, Anika A., Triantafyllopoulos, Andreas, Tsangko, Iosif, and Schuller, Björn W.
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Abusive content in online social networks is a well-known problem that can cause serious psychological harm and incite hatred. The ability to upload audio data increases the importance of developing methods to detect abusive content in speech recordings. However, simply transferring the mechanisms from written abuse detection would ignore relevant information such as emotion and tone. In addition, many current algorithms require training in the specific language for which they are being used. This paper proposes to use acoustic and prosodic features to classify abusive content. We used the ADIMA data set, which contains recordings from ten Indic languages, and trained different models in multilingual and cross-lingual settings. Our results show that it is possible to classify abusive and non-abusive content using only acoustic and prosodic features. The most important and influential features are discussed.
Published: 2024

13. Computer Audition: From Task-Specific Machine Learning to Foundation Models

Author: Triantafyllopoulos, Andreas, Tsangko, Iosif, Gebhard, Alexander, Mesaros, Annamaria, Virtanen, Tuomas, and Schuller, Björn
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Foundation models (FMs) are increasingly spearheading recent advances on a variety of tasks that fall under the purview of computer audition -- the use of machines to understand sounds. They feature several advantages over traditional pipelines: among others, the ability to consolidate multiple tasks in a single model, the option to leverage knowledge from other modalities, and the readily-available interaction with human users. Naturally, these promises have created substantial excitement in the audio community, and have led to a wave of early attempts to build new, general-purpose foundation models for audio. In the present contribution, we give an overview of computational audio analysis as it transitions from traditional pipelines towards auditory foundation models. Our work highlights the key operating principles that underpin those models, and showcases how they can accommodate multiple tasks that the audio community previously tackled separately.
Published: 2024

14. Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset

Author: Liu, Rui, Zuo, Haolin, Lian, Zheng, Xing, Xiaofen, Schuller, Björn W., and Li, Haizhou
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Emotion and Intent Joint Understanding in Multimodal Conversation (MC-EIU) aims to decode the semantic information manifested in a multimodal conversational history, while inferring the emotions and intents simultaneously for the current utterance. MC-EIU is an enabling technology for many human-computer interfaces. However, there is a lack of available datasets in terms of annotation, modality, language diversity, and accessibility. In this work, we propose an MC-EIU dataset, which features 7 emotion categories, 9 intent categories, 3 modalities, i.e., textual, acoustic, and visual content, and two languages, i.e., English and Mandarin. Furthermore, it is completely open-source for free access. To our knowledge, MC-EIU is the first comprehensive and rich emotion and intent joint understanding dataset for multimodal conversation. Together with the release of the dataset, we also develop an Emotion and Intent Interaction (EI²) network as a reference system by modeling the deep correlation between emotion and intent in the multimodal conversation. With comparative experiments and ablation studies, we demonstrate the effectiveness of the proposed EI² method on the MC-EIU dataset. The dataset and code will be made available at: https://github.com/MC-EIU/MC-EIU.
Comment: 26 pages, 8 figures, 12 tables, NeurIPS 2024 Dataset and Benchmark Track
Published: 2024

15. Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition

Author: Schrüfer, Oliver, Milling, Manuel, Burkhardt, Felix, Eyben, Florian, and Schuller, Björn
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Uncertainty Quantification (UQ) is an important building block for the reliable use of neural networks in real-world scenarios, as it can be a useful tool in identifying faulty predictions. Speech emotion recognition (SER) models can suffer from particularly many sources of uncertainty, such as the ambiguity of emotions, Out-of-Distribution (OOD) data or, in general, poor recording conditions. Reliable UQ methods are thus of particular interest as in many SER applications no prediction is better than a faulty prediction. While the effects of label ambiguity on uncertainty are well documented in the literature, we focus our work on an evaluation of UQ methods for SER under common challenges in real-world application, such as corrupted signals, and the absence of speech. We show that simple UQ methods can already give an indication of the uncertainty of a prediction and that training with additional OOD data can greatly improve the identification of such signals.
Comment: accepted for Interspeech 2024, 5 pages
Published: 2024
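The abstract does not name the "simple UQ methods" it evaluates. As an illustration only, one widely used simple method is the entropy of a classifier's softmax output, combined with an abstention threshold to reflect the idea that no prediction can be better than a faulty one. Function names and the threshold value are hypothetical:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predictive_entropy(logits) -> np.ndarray:
    """Entropy of the softmax distribution; higher means less certain."""
    p = softmax(np.asarray(logits, dtype=float))
    return -(p * np.log(np.clip(p, 1e-12, 1.0))).sum(axis=-1)

def predict_or_abstain(logits, max_entropy: float) -> np.ndarray:
    """Return the argmax class, or -1 (abstain) where the entropy exceeds the threshold."""
    logits = np.asarray(logits, dtype=float)
    preds = logits.argmax(axis=-1)
    return np.where(predictive_entropy(logits) > max_entropy, -1, preds)

# A confident prediction and a maximally uncertain one over four emotion classes.
batch = np.array([[10.0, 0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0, 1.0]])
decisions = predict_or_abstain(batch, max_entropy=1.0)
```

The uniform row has the maximum possible entropy, ln 4 ≈ 1.39, so it is rejected, while the confident row is kept.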

16. Exploring Gender-Specific Speech Patterns in Automatic Suicide Risk Assessment

Author: Gerczuk, Maurice, Amiriparian, Shahin, Lutz, Justina, Strube, Wolfgang, Papazova, Irina, Hasan, Alkomiet, and Schuller, Björn W.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, 68T10, J.3
Abstract: In emergency medicine, timely intervention for patients at risk of suicide is often hindered by delayed access to specialised psychiatric care. To bridge this gap, we introduce a speech-based approach for automatic suicide risk assessment. Our study involves a novel dataset comprising speech recordings of 20 patients who read neutral texts. We extract four speech representations encompassing interpretable and deep features. Further, we explore the impact of gender-based modelling and phrase-level normalisation. By applying gender-exclusive modelling, features extracted from an emotion fine-tuned wav2vec2.0 model can be utilised to discriminate high from low suicide risk with a balanced accuracy of 81%. Finally, our analysis reveals a discrepancy in the relationship of speech characteristics and suicide risk between female and male subjects. For men in our dataset, suicide risk increases together with agitation while voice characteristics of female subjects point the other way.
Comment: accepted at INTERSPEECH 2024
Published: 2024

17. This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach

Author: Christ, Lukas, Amiriparian, Shahin, Hawighorst, Friederike, Schill, Ann-Kathrin, Boutalikakis, Angelo, Graf-Vlachy, Lorenz, König, Andreas, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Flattery is an important aspect of human communication that facilitates social bonding, shapes perceptions, and influences behavior through strategic compliments and praise, leveraging the power of speech to build rapport effectively. Its automatic detection can thus enhance the naturalness of human-AI interactions. To meet this need, we present a novel audio-textual dataset comprising 20 hours of speech and train machine learning models for automatic flattery detection. In particular, we employ pretrained AST, Wav2Vec2, and Whisper models for the speech modality, and Whisper automatic speech recognition (ASR) models combined with a RoBERTa text classifier for the textual modality. Subsequently, we build a multimodal classifier by combining text and audio representations. Evaluation on unseen test data demonstrates promising results, with Unweighted Average Recall scores reaching 82.46% in audio-only experiments, 85.97% in text-only experiments, and 87.16% using a multimodal approach.
Comment: Interspeech 2024
Published: 2024
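Unweighted Average Recall (UAR), reported in entry 17, is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has; it is the same quantity as the balanced accuracy reported in entry 16. A minimal sketch with an illustrative function name:

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred) -> float:
    """Mean of per-class recalls over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Imbalanced toy example: 4 non-flattery (0) and 2 flattery (1) samples.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
uar = unweighted_average_recall(y_true, y_pred)
```

Here plain accuracy is 5/6 ≈ 0.83, but UAR is (1.0 + 0.5) / 2 = 0.75, because the minority class is recognised only half the time.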

18. Speech Emotion Recognition under Resource Constraints with Data Distillation

Author: Chang, Yi, Ren, Zhao, Zhao, Zhonghao, Nguyen, Thanh Tam, Qian, Kun, Schultz, Tanja, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework to facilitate efficient development of SER models in IoT applications using a synthesised, smaller, and distilled dataset. Our experiments demonstrate that the distilled dataset can be effectively utilised to train SER models with fixed initialisation, achieving performances comparable to those developed using the original full emotional speech dataset.
Published: 2024

19. Voltage control of spin resonance in phase change materials

Author: Chen, Tian-Yue, Ren, Haowen, Ghazikhanian, Nareg, Hage, Ralph El, Sasaki, Dayne Y., Salev, Pavel, Takamura, Yayoi, Schuller, Ivan K., and Kent, Andrew D.
Subjects: Physics - Applied Physics, Condensed Matter - Materials Science
Abstract: Metal-insulator transitions (MITs) in resistive switching materials can be triggered by an electric stimulus that produces significant changes in the electrical response. When these phases have distinct magnetic characteristics, dramatic changes in spin excitations are also expected. The transition metal oxide La0.7Sr0.3MnO3 (LSMO) is a ferromagnetic metal at low temperatures and a paramagnetic insulator above room temperature. When LSMO is in its metallic phase, a critical electrical bias has been shown to lead to an MIT that results in the formation of a paramagnetic resistive barrier transverse to the applied electric field. Using spin-transfer ferromagnetic resonance spectroscopy, we show that even for electrical biases less than the critical value that triggers the MIT, there is magnetic phase separation with the spin-excitation resonances varying systematically with applied bias. Thus, applied voltages provide a means to alter spin resonance characteristics of interest for neuromorphic circuits.
Published: 2024

20. ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets

Author: Amiriparian, Shahin, Packań, Filip, Gerczuk, Maurice, and Schuller, Björn W.
Subjects: Computer Science - Computation and Language, 68T10, I.2
Abstract: Foundation models have shown great promise in speech emotion recognition (SER) by leveraging their pre-trained representations to capture emotion patterns in speech signals. To further enhance SER performance across various languages and domains, we propose a novel twofold approach. First, we gather EmoSet++, a comprehensive multi-lingual, multi-cultural speech emotion corpus with 37 datasets, 150,907 samples, and a total duration of 119.5 hours. Second, we introduce ExHuBERT, an enhanced version of HuBERT achieved by backbone extension and fine-tuning on EmoSet++. We duplicate each encoder layer and its weights, then freeze the first duplicate, integrating an extra zero-initialized linear layer and skip connections to preserve functionality and ensure its adaptability for subsequent fine-tuning. Our evaluation on unseen datasets shows the efficacy of ExHuBERT, setting a new benchmark for various SER tasks. Model and details on EmoSet++: https://huggingface.co/amiriparian/ExHuBERT.
Comment: accepted at INTERSPEECH 2024
Published: 2024
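Entry 20 states that the zero-initialised linear layer and skip connections "preserve functionality". The toy sketch below shows why such a construction leaves the network's output unchanged at initialisation. The arrangement (frozen copy, then the trainable copy routed through a zero-initialised projection and added back via a skip connection) is an assumption for illustration, not ExHuBERT's actual code, and the "layer" is a small tanh stand-in for a Transformer encoder layer:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size (assumption)

def make_layer(dim: int, rng: np.random.Generator):
    """Toy stand-in for an encoder layer: h -> tanh(h W^T + b)."""
    W = rng.normal(scale=0.1, size=(dim, dim))
    b = rng.normal(scale=0.1, size=dim)
    return W, b

def apply_layer(params, h: np.ndarray) -> np.ndarray:
    W, b = params
    return np.tanh(h @ W.T + b)

original = make_layer(d, rng)
frozen = tuple(p.copy() for p in original)     # first duplicate: weights kept frozen
trainable = tuple(p.copy() for p in original)  # second duplicate: fine-tuned later
W_zero = np.zeros((d, d))                      # extra zero-initialised linear layer

def extended_block(h: np.ndarray) -> np.ndarray:
    """Frozen copy, plus the trainable copy's output projected by W_zero (skip connection)."""
    h1 = apply_layer(frozen, h)
    return h1 + apply_layer(trainable, h1) @ W_zero.T

x = rng.normal(size=(3, d))
```

Because `W_zero` starts at zero, `extended_block(x)` equals the original layer's output; gradients can still flow into `W_zero`, so the added branch begins contributing only once fine-tuning moves it away from zero.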

21. The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition

Author: Amiriparian, Shahin, Christ, Lukas, Kathan, Alexander, Gerczuk, Maurice, Müller, Niklas, Klug, Steffen, Stappen, Lukas, König, Andreas, Cambria, Erik, Schuller, Björn, and Eulitz, Simone
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, 68T10, I.2
Abstract: The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems: In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals such as assertiveness, dominance, likability, and sincerity based on the provided audio-visual data. The Cross-Cultural Humor Detection Sub-Challenge (MuSe-Humor) dataset expands upon the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset, focusing on the detection of spontaneous humor in a cross-lingual and cross-cultural setting. The main objective of MuSe 2024 is to unite a broad audience from various research domains, including multimodal sentiment analysis, audio-visual affective computing, continuous signal processing, and natural language processing. By fostering collaboration and exchange among experts in these fields, MuSe 2024 endeavors to advance the understanding and application of sentiment analysis and affective computing across multiple modalities. This baseline paper provides details on each sub-challenge and its corresponding dataset, extracted features from each data modality, and discusses challenge baselines. For our baseline system, we make use of a range of Transformers and expert-designed features and train Gated Recurrent Unit (GRU)-Recurrent Neural Network (RNN) models on them, resulting in a competitive baseline system. On the unseen test datasets of the respective sub-challenges, it achieves a mean Pearson's Correlation Coefficient (ρ) of 0.3573 for MuSe-Perception and an Area Under the Curve (AUC) value of 0.8682 for MuSe-Humor.
Published: 2024

22. DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition

Author: Jing, Xin, Zhang, Luyang, Xie, Jiangjian, Gebhard, Alexander, Baird, Alice, and Schuller, Bjoern
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: In ornithology, it is widely acknowledged that bird species display diverse dialects in their calls across different regions. Consequently, computational methods that identify bird species solely through their calls face significant challenges. There is growing interest in understanding the impact of species-specific dialects on the effectiveness of bird species recognition methods. Despite potential mitigation through the expansion of dialect datasets, the absence of publicly available testing data currently impedes robust benchmarking efforts. This paper presents the Dialect Dominated Dataset of Bird Vocalisation (DB3V), the first cross-corpus dataset that focuses on dialects in bird vocalisations. DB3V comprises more than 25 hours of audio recordings from 10 bird species distributed across three distinct regions in the contiguous United States (CONUS). In addition to presenting the dataset, we conduct analyses and establish baseline models for cross-corpus bird recognition. The data and code are publicly available online: https://zenodo.org/records/11544734, Comment: accepted by Interspeech 2024
Published: 2024

23. ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks

Author: Jing, Xin, Triantafyllopoulos, Andreas, and Schuller, Björn
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Contrastive language-audio pretraining (CLAP) has recently emerged as a method for making audio analysis more generalisable. Specifically, CLAP-style models are able to "answer" a diverse set of language queries, extending the capabilities of audio models beyond a closed set of labels. However, CLAP relies on a large set of (audio, query) pairs for pretraining. While such sets are available for general audio tasks, like captioning or sound event detection, there are no datasets with matched audio and text queries for computational paralinguistic (CP) tasks. As a result, the community relies on generic CLAP models trained for general audio with limited success. In the present study, we explore training considerations for ParaCLAP, a CLAP-style model suited to CP, including a novel process for creating audio-language queries. We demonstrate its effectiveness on a set of computational paralinguistic tasks, where it is shown to surpass the performance of open-source state-of-the-art models., Comment: Accepted by Interspeech 2024
Published: 2024

24. Enrolment-based personalisation for improving individual-level fairness in speech emotion recognition

Author: Triantafyllopoulos, Andreas and Schuller, Björn
Subjects: Computer Science - Computation and Language
Abstract: The expression of emotion is highly individualistic. However, contemporary speech emotion recognition (SER) systems typically rely on population-level models that adopt a "one-size-fits-all" approach for predicting emotion. Moreover, standard evaluation practices also measure performance at the population level, thus failing to characterise how models work across different speakers. In the present contribution, we present a new method for capitalising on individual differences to adapt an SER model to each new speaker using a minimal set of enrolment utterances. In addition, we present novel evaluation schemes for measuring fairness across different speakers. Our findings show that aggregated evaluation metrics may obfuscate fairness issues at the individual level, which are uncovered by our evaluation, and that our proposed method can improve performance in both aggregated and disaggregated terms., Comment: Accepted to INTERSPEECH 2024
Published: 2024

25. INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition

Author: Triantafyllopoulos, Andreas, Batliner, Anton, Rampp, Simon, Milling, Manuel, and Schuller, Björn
Subjects: Computer Science - Computation and Language
Abstract: We revisit the INTERSPEECH 2009 Emotion Challenge -- the first ever speech emotion recognition (SER) challenge -- and evaluate a series of deep learning models that are representative of the major advances in SER research in the time since then. We start by training each model using a fixed set of hyperparameters, and further fine-tune the best-performing models of that initial setup with a grid search. Results are always reported on the official test set with a separate validation set only used for early stopping. Most models score below or close to the official baseline, while they marginally outperform the original challenge winners after hyperparameter tuning. Our work illustrates that, despite recent progress, FAU-AIBO remains a very challenging benchmark. An interesting corollary is that newer methods do not consistently outperform older ones, showing that progress towards "solving" SER is not necessarily monotonic., Comment: Accepted to INTERSPEECH 2024
Published: 2024

26. Sustained Vowels for Pre- vs Post-Treatment COPD Classification

Author: Triantafyllopoulos, Andreas, Batliner, Anton, Mayr, Wolfgang, Fendler, Markus, Pokorny, Florian, Gerczuk, Maurice, Amiriparian, Shahin, Berghaus, Thomas, and Schuller, Björn
Subjects: Computer Science - Computation and Language
Abstract: Chronic obstructive pulmonary disease (COPD) is a serious inflammatory lung disease affecting millions of people around the world. Due to an obstructed airflow from the lungs, it also becomes manifest in patients' vocal behaviour. Of particular importance is the detection of an exacerbation episode, which marks an acute phase and often requires hospitalisation and treatment. Previous work has shown that it is possible to distinguish between a pre- and a post-treatment state using automatic analysis of read speech. In this contribution, we examine whether sustained vowels can provide a complementary lens for telling apart these two states. Using a cohort of 50 patients, we show that the inclusion of sustained vowels can improve performance to up to 79% unweighted average recall, from a 71% baseline using read speech. We further identify and interpret the most important acoustic features that characterise the manifestation of COPD in sustained vowels., Comment: Accepted to INTERSPEECH 2024
Published: 2024

27. Audio-based Step-count Estimation for Running -- Windowing and Neural Network Baselines

Author: Wagner, Philipp, Triantafyllopoulos, Andreas, Gebhard, Alexander, and Schuller, Björn
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In recent decades, running has become an increasingly popular pastime activity due to its accessibility, ease of practice, and anticipated health benefits. However, the risk of running-related injuries is substantial for runners of different experience levels. Several common forms of injuries result from overuse -- extending beyond the recommended running time and intensity. Recently, audio-based tracking has emerged as yet another modality for monitoring running behaviour and performance, with previous studies largely concentrating on predicting runner fatigue. In this work, we investigate audio-based step count estimation during outdoor running, achieving a mean absolute error of 1.098 in window-based step-count differences and a Pearson correlation coefficient of 0.479 when predicting the number of steps in a 5-second window of audio. Our work thus showcases the feasibility of audio-based monitoring for estimating important physiological variables and lays the foundations for further utilising audio sensors for a more thorough characterisation of runner behaviour., Comment: Accepted at EUSIPCO 2024
Published: 2024

28. An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats

Author: Triantafyllopoulos, Andreas, Gebhard, Alexander, Milling, Manuel, Rampp, Simon, and Schuller, Björn
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Prior work in computational bioacoustics has mostly focused on the detection of animal presence in a particular habitat. However, animal sounds contain much richer information than mere presence; among others, they encapsulate the interactions of those animals with other members of their species. Studying these interactions is almost impossible in a naturalistic setting, as the ground truth is often lacking. The use of animals in captivity instead offers a viable alternative pathway. However, most prior works follow a traditional, statistics-based approach to analysing interactions. In the present work, we go beyond this standard framework by attempting to predict the underlying context in interactions between captive Rousettus aegyptiacus using deep neural networks. We reach an unweighted average recall of over 30% -- more than thrice the chance level -- and show error patterns that differ from our statistical analysis. This work thus represents an important step towards the automatic analysis of states in animals from sound., Comment: Accepted at EUSIPCO 2024
Published: 2024

29. Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning

Author: Christ, Lukas, Amiriparian, Shahin, Milling, Manuel, Aslan, Ilhan, and Schuller, Björn W.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by introducing continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories. We collect additional annotations for this data and map the categorical labels to the continuous valence and arousal space. For predicting the thus obtained emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .8221 for valence and .7125 for arousal on the test set, demonstrating the efficacy of our proposed approach. A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story. In addition, we uncover the weaknesses of our approach by investigating examples that prove to be difficult to predict., Comment: Accepted to ACL 2024 Findings. arXiv admin note: text overlap with arXiv:2212.11382
Published: 2024

30. Local strain inhomogeneities during electrical triggering of a metal-insulator transition revealed by X-ray microscopy.

Author: Salev, Pavel, Kisiel, Elliot, Sasaki, Dayne, Gunn, Brandon, He, Wei, Feng, Mingzhen, Li, Junjie, Tamura, Nobumichi, Poudyal, Ishwor, Islam, Zahirul, Takamura, Yayoi, Frano, Alex, and Schuller, Ivan
Subjects: X-ray microdiffraction, dark-field X-ray microscopy, in operando microscopy, metal–insulator transition, resistive switching
Abstract: Electrical triggering of a metal-insulator transition (MIT) often results in the formation of characteristic spatial patterns such as a metallic filament percolating through an insulating matrix or an insulating barrier splitting a conducting matrix. When MIT triggering is driven by electrothermal effects, the temperature of the filament or barrier can be substantially higher than the rest of the material. Using X-ray microdiffraction and dark-field X-ray microscopy, we show that electrothermal MIT triggering leads to the development of an inhomogeneous strain profile across the switching device, even when the material does not undergo a pronounced, discontinuous structural transition coinciding with the MIT. Diffraction measurements further reveal evidence of unique features associated with MIT triggering including lattice distortions, tilting, and twinning, which indicate structural nonuniformity of both low- and high-resistance regions inside the switching device. Such lattice deformations do not occur under equilibrium, zero-voltage conditions, highlighting the qualitative difference between states achieved through increasing temperature and applying voltage in nonlinear electrothermal materials. Electrically induced strain, lattice distortions, and twinning could have important contributions in the MIT triggering process and drive the material into nonequilibrium states, providing an unconventional pathway to explore the phase space in strongly correlated electronic systems.
Published: 2024

31. Collective dynamics and long-range order in thermal neuristor networks.

Author: Zhang, Yuan-Hang, Sipling, Chesson, Qiu, Erbin, Schuller, Ivan, and Di Ventra, Massimiliano
Abstract: In the pursuit of scalable and energy-efficient neuromorphic devices, recent research has unveiled a novel category of spiking oscillators, termed thermal neuristors. These devices function via thermal interactions among neighboring vanadium dioxide resistive memories, emulating biological neuronal behavior. Here, we show that the collective dynamical behavior of networks of these neurons showcases a rich phase structure, tunable by adjusting the thermal coupling and input voltage. Notably, we identify phases exhibiting long-range order that, however, does not arise from criticality, but rather from the time non-local response of the system. In addition, we show that these thermal neuristor arrays achieve high accuracy in image recognition and time series prediction through reservoir computing, without leveraging long-range order. Our findings highlight a crucial aspect of neuromorphic computing with possible implications on the functioning of the brain: criticality may not be necessary for the efficient performance of neuromorphic systems in certain computational tasks.
Published: 2024

32. Understanding the Star Formation Efficiency in Dense Gas: Initial Results from the CAFFEINE Survey with ArTéMiS

Author: Mattern, M., André, Ph., Zavagno, A., Russeil, D., Roussel, H., Peretto, N., Schuller, F., Shimajiri, Y., Di Francesco, J., Arzoumanian, D., Revéret, V., and De Breuck, C.
Subjects: Astrophysics - Astrophysics of Galaxies
Abstract: Despite recent progress, the question of what regulates the star formation efficiency (SFE) in galaxies remains one of the most debated problems in astrophysics. According to the dominant picture, star formation (SF) is regulated by turbulence and feedback, and the SFE is 1-2% per local free-fall time. In an alternate scenario, the SF rate in galactic disks is linearly proportional to the mass of dense gas above a critical density threshold. We aim to discriminate between these two pictures using high-resolution observations tracing dense gas and young stellar objects (YSOs) for a comprehensive sample of 49 nearby massive SF complexes out to d < 3 kpc in the Galactic disk. We use data from CAFFEINE, a 350/450 μm survey with APEX/ArTéMiS of the densest portions of all southern molecular clouds, in combination with Herschel data to produce column density maps at 8" resolution. Our maps are free of saturation and resolve the structure of dense gas and the typical 0.1 pc width of molecular filaments at 3 kpc, which is impossible with Herschel data alone. Coupled with SFR estimates derived from Spitzer observations of the YSO content of the same clouds, this allows us to study the dependence of the SFE on density in the CAFFEINE clouds. We also combine our findings with existing SFE measurements in nearby clouds to extend our analysis down to lower column densities. Our results suggest that the SFE does not increase with density above the critical threshold and support a scenario in which the SFE in dense gas is approximately constant. However, the SFE measurements traced by Class I YSOs in nearby clouds are more inconclusive, since they are consistent with both the presence of a density threshold and a dependence on density above the threshold. Overall, we suggest that the SFE in dense gas is primarily governed by the physics of filament fragmentation into protostellar cores., Comment: In press; accepted: 18/05/2024
Published: 2024

33. Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding

Author: Gao, Rong, Liu, Xin, Xing, Bohao, Yu, Zitong, Schuller, Bjorn W., and Kälviäinen, Heikki
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we focus on a special group of human body language -- the micro-gesture (MG), which differs from the range of ordinary illustrative gestures in that they are not intentional behaviors performed to convey information to others, but rather unintentional behaviors driven by inner feelings. This characteristic introduces two novel challenges regarding micro-gestures that are worth rethinking. The first is whether strategies designed for other action recognition are entirely applicable to micro-gestures. The second is whether micro-gestures, as supplementary data, can provide additional insights for emotional understanding. In recognizing micro-gestures, we explored various augmentation strategies that take into account the subtle spatial and brief temporal characteristics of micro-gestures, often accompanied by repetitiveness, to determine more suitable augmentation methods. Considering the significance of temporal domain information for micro-gestures, we introduce a simple and efficient plug-and-play spatiotemporal balancing fusion method. We not only studied our method on the considered micro-gesture dataset but also conducted experiments on mainstream action datasets. The results show that our approach performs well in micro-gesture recognition and on other datasets, achieving state-of-the-art performance compared to previous micro-gesture recognition methods. For emotional understanding based on micro-gestures, we construct complex emotional reasoning scenarios. Our evaluation, conducted with large language models, shows that micro-gestures play a significant and positive role in enhancing comprehensive emotional understanding. The scenarios we developed can be extended to other micro-gesture-based tasks such as deception detection and interviews. We confirm that our new insights contribute to advancing research in micro-gesture and emotional artificial intelligence.
Published: 2024

34. Intelligent Cardiac Auscultation for Murmur Detection via Parallel-Attentive Models with Uncertainty Estimation

Author: Zhang, Zixing, Pang, Tao, Han, Jing, and Schuller, Björn W.
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Heart murmurs are a common manifestation of cardiovascular diseases and can provide crucial clues to early cardiac abnormalities. While most current research methods primarily focus on the accuracy of models, they often overlook other important aspects such as the interpretability of machine learning algorithms and the uncertainty of predictions. This paper introduces a heart murmur detection method based on a parallel-attentive model, which consists of two branches: One is based on a self-attention module and the other one is based on a convolutional network. Unlike traditional approaches, this structure is better equipped to handle long-term dependencies in sequential data, and thus effectively captures the local and global features of heart murmurs. Additionally, we acknowledge the significance of understanding the uncertainty of model predictions in the medical field for clinical decision-making. Therefore, we have incorporated an effective uncertainty estimation method based on Monte Carlo Dropout into our model. Furthermore, we have employed temperature scaling to calibrate the predictions of our probabilistic model, enhancing its reliability. In experiments conducted on the CirCor Digiscope dataset for heart murmur detection, our proposed method achieves a weighted accuracy of 79.8% and an F1 of 65.1%, representing state-of-the-art results.
Published: 2024

35. HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech

Author: Dong, Zhongren, Zhang, Zixing, Xu, Weixiang, Han, Jing, Ou, Jianjun, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches rely heavily on Transformer architectures due to their efficiency in modelling long-range context dependencies. However, the computational complexity of self-attention grows quadratically with the length of the audio, which poses a challenge when deploying such models on edge devices. In this context, we construct a novel framework, namely the Hierarchical Attention-Free Transformer (HAFFormer), to better deal with long speech for AD detection. Specifically, we employ an attention-free module of Multi-Scale Depthwise Convolution to replace the self-attention and thus avoid the expensive computation, and a GELU-based Gated Linear Unit to replace the feedforward layer, aiming to automatically filter out the redundant information. Moreover, we design a hierarchical structure to force it to learn a variety of information grains, from the frame level to the dialogue level. In extensive experiments on the ADReSS-M dataset, HAFFormer achieves results competitive with other recent work (82.6% accuracy), but with a significant reduction in computational complexity and model size compared to the standard Transformer. This shows the efficiency of HAFFormer in dealing with long audio for AD detection.
Published: 2024

36. Expressivity and Speech Synthesis

Author: Triantafyllopoulos, Andreas and Schuller, Björn W.
Subjects: Computer Science - Computation and Language
Abstract: Imbuing machines with the ability to talk has been a longtime pursuit of artificial intelligence (AI) research. From the very beginning, the community has not only aimed to synthesise high-fidelity speech that accurately conveys the semantic meaning of an utterance, but also to colour it with inflections that cover the same range of affective expressions that humans are capable of. After many years of research, it appears that we are on the cusp of achieving this when it comes to single, isolated utterances. This unveils an abundance of potential avenues to explore when it comes to combining these single utterances with the aim of synthesising more complex, longer-term behaviours. In the present chapter, we outline the methodological advances that brought us so far and sketch out the ongoing efforts to reach that coveted next level of artificial expressivity. We also discuss the societal implications coupled with rapidly advancing expressive speech synthesis (ESS) technology and highlight ways to mitigate those risks and ensure the alignment of ESS capabilities with ethical norms., Comment: Invited contribution. Under review
Published: 2024

37. MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

Author: Lian, Zheng, Sun, Haiyang, Sun, Licai, Wen, Zhuofan, Zhang, Siyuan, Chen, Shun, Gu, Hao, Zhao, Jinming, Ma, Ziyang, Chen, Xie, Yi, Jiangyan, Liu, Rui, Xu, Kele, Liu, Bin, Cambria, Erik, Zhao, Guoying, Schuller, Björn W., and Tao, Jianhua
Subjects: Computer Science - Machine Learning, Computer Science - Human-Computer Interaction
Abstract: Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing the dataset size and building more effective algorithms. However, due to problems such as complex environments and inaccurate annotations, current systems struggle to meet the demands of practical applications. Therefore, we organize the MER series of competitions to promote the development of this field. Last year, we launched MER2023, focusing on three interesting topics: multi-label learning, noise robustness, and semi-supervised learning. In this year's MER2024, besides expanding the dataset size, we further introduce a new track around open-vocabulary emotion recognition. This track is motivated by the fact that existing datasets usually fix the label space and use majority voting to enhance annotator consistency. However, this process may lead to inaccurate annotations, such as ignoring non-majority or non-candidate labels. In this track, we encourage participants to generate any number of labels in any category, aiming to describe emotional states as accurately as possible. Our baseline code relies on MERTools and is available at: https://github.com/zeroQiaoba/MERTools/tree/master/MER2024.
Published: 2024

38. Non-Invasive Suicide Risk Prediction Through Speech Analysis

Author: Amiriparian, Shahin, Gerczuk, Maurice, Lutz, Justina, Strube, Wolfgang, Papazova, Irina, Hasan, Alkomiet, Kathan, Alexander, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing, I.2
Abstract: The delayed access to specialized psychiatric assessments and care for patients at risk of suicidal tendencies in emergency departments creates a notable gap in timely intervention, hindering the provision of adequate mental health support during critical situations. To address this, we present a non-invasive, speech-based approach for automatic suicide risk assessment. For our study, we collected a novel speech recording dataset from 20 patients. We extract three sets of features, namely wav2vec, interpretable speech and acoustic features, and deep learning-based spectral representations. We proceed by conducting a binary classification to assess suicide risk in a leave-one-subject-out fashion. Our most effective speech model achieves a balanced accuracy of 66.2%. Moreover, we show that integrating our speech model with a series of patients' metadata, such as the history of suicide attempts or access to firearms, improves the overall result. The metadata integration yields a balanced accuracy of 94.4%, marking an absolute improvement of 28.2 percentage points, demonstrating the efficacy of our proposed approaches for automatic suicide risk assessment in emergency medicine.
Published: 2024

39. Generation of relativistic electrons at the termination shock in the solar flare region

Author: Mann, G., Veronig, A. M., and Schuller, F.
Subjects: Astrophysics - Solar and Stellar Astrophysics, Astrophysics - High Energy Astrophysical Phenomena
Abstract: Solar flares are accompanied by an enhanced emission of electromagnetic waves from the radio up to the gamma-ray range. The associated hard X-ray (HXR) and microwave radiation is generated by energetic electrons, which carry a substantial part of the energy released during a flare. The flare is generally understood as a manifestation of magnetic reconnection in the corona. The so-called standard CSHKP model is one of the most widely accepted models for eruptive flares. The solar flare on September 10, 2017 offers a unique opportunity to study this model. The observations from the Expanded Owens Valley Solar Array (EOVSA) show that 1.6 × 10^4 electrons with energies >300 keV were generated in the flare region. There are signatures in solar radio and extreme ultraviolet observations as well as numerical simulations that a termination shock (TS) appears in the magnetic reconnection outflow region. Electrons accelerated at the TS can be considered to generate the loop-top HXR sources. In contrast to previous studies, we investigate whether the heating of the plasma at the TS provides enough relativistic electrons needed for the HXR and microwave emission observed during the X8.2 solar flare on September 10, 2017. We studied the heating of the plasma at the TS by evaluating the jump in the temperature across the shock by means of the Rankine-Hugoniot relationships under the coronal conditions measured during that event. The fraction of relativistic electrons was calculated in the heated downstream region. In the magnetic reconnection outflow region, the plasma is strongly heated at the TS. Thus, there are enough energetic electrons in the tail of the electron distribution function needed for the microwave and HXR emission observed during that event. The generation of relativistic electrons at the TS is a possible mechanism to explain the enhanced microwave and HXR radiation emitted during flares., Comment: Accepted for publication in A&A
Published: 2024

40. The sounds of science: a symphony for many instruments and voices, part II

Author: 't Hooft, Gerard, Phillips, William D, Zeilinger, Anton, Allen, Roland, Baggott, Jim, Bouchet, Francois R, Cantanhede, Solange M G, Castanedo, Lazaro A M, Cetto, Ana Maria, Coley, Alan A, Dalton, Bryan J, Fahimi, Peyman, Franks, Sharon, Frano, Alex, Fry, Edward S, Goldfarb, Steven, Langanke, Karlheinz, Matta, Cherif F, Nanopoulos, Dimitri, Orzel, Chad, Patrick, Sam, Sanghai, Viraj A A, Schuller, Ivan K, Shpyrko, Oleg, and Lidstrom, Suzy
Subjects: Physics - Physics and Society
Abstract: Despite its amazing quantitative successes and contributions to revolutionary technologies, physics currently faces many unsolved mysteries ranging from the meaning of quantum mechanics to the nature of the dark energy that will determine the future of the Universe. It is clearly prohibitive for the general reader, and even the best informed physicists, to follow the vast number of technical papers published in the thousands of specialized journals. For this reason, we have asked the leading experts across many of the most important areas of physics to summarise their global assessment of some of the most important issues. In lieu of an extremely long abstract summarising the contents, we invite the reader to look at the section headings and their authors, and then to indulge in a feast of stimulating topics spanning the current frontiers of fundamental physics, from "The Future of Physics" by William D Phillips and "What characterises topological effects in physics?" by Gerard 't Hooft through the contributions of the widest imaginable range of world leaders in their respective areas. This paper is presented as a preface to exciting developments by senior and young scientists in the years that lie ahead, and a complement to the less authoritative popular accounts by journalists., Comment: 54 pages, 13 figures
Published: 2024

41. emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition

Author: Rajapakshe, Thejan, Rana, Rajib, Khalifa, Sara, Sisman, Berrak, Schuller, Bjorn W., and Busso, Carlos
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designing an optimal DL architecture requires specialised knowledge and experimental assessments. Fortunately, Neural Architecture Search (NAS) provides a potential solution for automatically determining the best DL model. The Differentiable Architecture Search (DARTS) is a particularly efficient method for discovering optimal models. This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance. The literature supports the selection of CNN and LSTM coupling to improve performance. While DARTS has previously been used to choose CNN and LSTM operations independently, our technique adds a novel mechanism for selecting CNN and SeqNN operations in conjunction using DARTS. Unlike earlier work, we do not impose limits on the layer order of the CNN. Instead, we let DARTS choose the best layer order inside the DARTS cell. We demonstrate that emoDARTS outperforms conventionally designed CNN-LSTM models and surpasses the best-reported SER results achieved through DARTS on CNN-LSTM by evaluating our approach on the IEMOCAP, MSP-IMPROV, and MSP-Podcast datasets., Comment: Submitted to IEEE Transactions on Affective Computing on February 19, 2024. arXiv admin note: text overlap with arXiv:2305.14402
Published: 2024
Full Text: View/download PDF

42. On Prompt Sensitivity of ChatGPT in Affective Computing

Author: Amin, Mostafa M. and Schuller, Björn W.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent studies have demonstrated the emerging capabilities of foundation models like ChatGPT in several fields, including affective computing. However, accessing these emerging capabilities is facilitated through prompt engineering. Despite the existence of some prompting techniques, the field is still rapidly evolving and many prompting ideas still require investigation. In this work, we introduce a method to evaluate and investigate the sensitivity of the performance of foundation models based on different prompts or generation parameters. We perform our evaluation on ChatGPT within the scope of affective computing on three major problems, namely sentiment analysis, toxicity detection, and sarcasm detection. First, we carry out a sensitivity analysis on pivotal parameters in auto-regressive text generation, specifically the temperature parameter $T$ and the top-$p$ parameter in Nucleus sampling, which dictate how conservative or creative the model should be during generation. Furthermore, we explore the efficacy of several prompting ideas, examining how giving different incentives or structures affects performance. Our evaluation takes into consideration performance measures on the affective computing tasks and the model's effectiveness in following the stated instructions, which yields easy-to-parse responses that can be used smoothly in downstream applications., Comment: 2 Tables, 1 Figure, preprint submission to ACII 2024
Published: 2024

43. The solar cycle 25 multi-spacecraft solar energetic particle event catalog of the SERPENTINE project

Author: Dresing, N., Yli-Laurila, A., Valkila, S., Gieseler, J., Morosan, D. E., Farwa, G. U., Kartavykh, Y., Palmroos, C., Jebaraj, I., Jensen, S., Kühl, P., Heber, B., Espinosa, F., Gómez-Herrero, R., Kilpua, E., Linho, V. -V., Oleynik, P., Hayes, L. A., Warmuth, A., Schuller, F., Collier, H., Xiao, H., Asvestari, E., Trotta, D., Mitchell, J. G., Cohen, C. M. S., Labrador, A. W., Hill, M. E., and Vainio, R.
Subjects: Physics - Space Physics, Astrophysics - Solar and Stellar Astrophysics
Abstract: The Solar energetic particle analysis platform for the inner heliosphere (SERPENTINE) project presents its new multi-spacecraft SEP event catalog for events observed in solar cycle 25. Observations from five different viewpoints are utilized, provided by Solar Orbiter, Parker Solar Probe, STEREO A, BepiColombo, and the near-Earth spacecraft Wind and SOHO. The catalog contains key SEP parameters for 25-40 MeV protons, 1 MeV electrons, and 100 keV electrons. Furthermore, basic parameters of the associated flare and type-II radio burst are listed, as well as the coordinates of the observer and solar source locations. SEP onset times are determined using the Poisson-CUSUM method. SEP peak times and intensities refer to the global intensity maximum. If different viewing directions are available, we use the one with the earliest onset for the onset determination and the one with the highest peak intensity for the peak identification. Associated flares are identified using observations from near Earth and Solar Orbiter. Associated type II radio bursts are determined from ground-based observations in the metric frequency range and from spacecraft observations in the decametric range. The current version of the catalog contains 45 multi-spacecraft events observed in the period from Nov 2020 until May 2023, of which 13 were widespread events and four were classified as narrow-spread events. Using X-ray observations by GOES/XRS and Solar Orbiter/STIX, we were able to identify the associated flare in all but four events. Using ground-based and space-borne radio observations, we found an associated type-II radio burst for 40 events. In total, the catalog contains 142 single event observations, of which 20 (45) have been observed at radial distances below 0.6 AU (0.8 AU).
Published: 2024
Full Text: View/download PDF

44. Challenges in Observing the Emotions of Children with Autism Interacting with a Social Robot

Author: Erol Barkana, Duygun, Bartl-Pokorny, Katrin D., Kose, Hatice, Landowska, Agnieszka, Milling, Manuel, Robins, Ben, Schuller, Björn W., Uluer, Pinar, Wrobel, Michal R., and Zorcec, Tatjana
Published: 2024
Full Text: View/download PDF

45. Near-Field Mixing in a Coaxial Dual Swirled Injector

Author: Marragou, Sylvain, Guiberti, Thibault Frédéric, Poinsot, Thierry, and Schuller, Thierry
Published: 2024
Full Text: View/download PDF

46. Systematic perturbation screens identify regulators of inflammatory macrophage states and a role for TNF mRNA m6A modification

Author: Haag, Simone M., Xie, Shiqi, Eidenschenk, Celine, Fortin, Jean-Philippe, Callow, Marinella, Costa, Mike, Lun, Aaron, Cox, Chris, Wu, Sunny Z., Pradhan, Rachana N., Lock, Jaclyn, Kuhn, Julia A., Holokai, Loryn, Thai, Minh, Freund, Emily, Nissenbaum, Ariane, Keir, Mary, Bohlen, Christopher J., Martin, Scott, Geiger-Schuller, Kathryn, Hejase, Hussein A., Yaspan, Brian L., Melo Carlos, Sandra, Turley, Shannon J., and Murthy, Aditya
Published: 2024
Full Text: View/download PDF

47. Paediatric e-scooter riders at high risk of life-threatening traffic accidents

Author: Schuller, Andrea, Hohensteiner, Anna, Sator, Thomas, Pichler, Lorenz, Jaindl, Manuela, Schwendenwein, Elisabeth, Tiefenboeck, Thomas Manfred, and Payr, Stephan
Published: 2024
Full Text: View/download PDF

48. Towards Equitable Agile Research and Development of AI and Robotics

Author: Hundt, Andrew, Schuller, Julia, and Kacianka, Severin
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Software Engineering
Abstract: Machine Learning (ML) and 'Artificial Intelligence' ('AI') methods tend to replicate and amplify existing biases and prejudices, as do Robots with AI. For example, robots with facial recognition have failed to identify Black Women as human, while others have categorized people, such as Black Men, as criminals based on appearance alone. A 'culture of modularity' means harms are perceived as 'out of scope', or someone else's responsibility, throughout employment positions in the 'AI supply chain'. Incidents are routine enough (incidentdatabase.ai lists over 2000 examples) to indicate that few organizations are capable of completely respecting people's rights; meeting claimed equity, diversity, and inclusion (EDI or DEI) goals; or recognizing and then addressing such failures in their organizations and artifacts. We propose a framework for adapting widely practiced Research and Development (R&D) project management methodologies to build organizational equity capabilities and better integrate known evidence-based best practices. We describe how project teams can organize and operationalize the most promising practices, skill sets, organizational cultures, and methods to detect and address rights-based fairness, equity, accountability, and ethical problems as early as possible, when they are often less harmful and easier to mitigate, and then monitor for unforeseen incidents to adaptively and constructively address them. Our primary example adapts an Agile development process based on Scrum, one of the most widely adopted approaches to organizing R&D teams. We also discuss limitations of our proposed framework and future research directions., Comment: 15 pages (32 with refs + appendix), 2 figures, 1 table (7 with appendix), incorporates changes based on WeRobot 2023 Draft feedback
Published: 2024

49. Energetic particle contamination in STIX during Solar Orbiter's passage through Earth's radiation belts and an interplanetary shock

Author: Collier, Hannah, Limousin, Olivier, Xiao, Hualin, Claret, Arnaud, Schuller, Frederic, Dresing, Nina, Valkila, Saku, Lara, Francisco Espinosa, Fedeli, Annamaria, Foucambert, Simon, and Krucker, Säm
Subjects: Astrophysics - Solar and Stellar Astrophysics, Nuclear Experiment, Physics - Space Physics
Abstract: The Spectrometer/Telescope for Imaging X-rays (STIX) is a hard X-ray imaging spectrometer on board the ESA and NASA heliospheric mission Solar Orbiter. STIX has been operational for three years and has observed X-ray emission from ~35,000 solar flares. Throughout its lifetime, Solar Orbiter has been frequently struck by a high flux of energetic particles usually of flare origin, or from coronal mass ejection shocks. These Solar Energetic Particles (SEPs) are detected on board by the purpose-built energetic particle detector instrument suite. During SEP events, the X-ray signal is also contaminated in STIX. This work investigates the effect of these particles on the STIX instrument for two events. The first event occurred during an interplanetary shock crossing and the second event occurred when Solar Orbiter passed through Earth's radiation belts while performing a gravity assist maneuver. The induced spectra consist of tungsten fluorescence emission lines and secondary Bremsstrahlung emission produced by incident particles interacting with spacecraft components. For these two events, we identify > 100 keV electrons as significant contributors to the contamination via Bremsstrahlung emission and tungsten fluorescence., Comment: 8 pages, 11 figures, accepted by IEEE TNS
Published: 2024
Full Text: View/download PDF

50. STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

Author: Chang, Yi, Ren, Zhao, Zhang, Zixing, Jing, Xin, Qian, Kun, Shao, Xi, Hu, Bin, Schultz, Tanja, and Schuller, Björn W.
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has become a popular area of research. However, prior works on adversarial attacks in the audio domain primarily rely on iterative gradient-based techniques, which are time-consuming and prone to overfitting the specific threat model. Furthermore, the exploration of sparse perturbations, which have the potential for better stealthiness, remains limited in the audio domain. To address these challenges, we propose a generator-based attack method to generate sparse and transferable adversarial examples to deceive SER models in an end-to-end and efficient manner. We evaluate our method on two widely-used SER datasets, Database of Elicited Mood in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP), and demonstrate its ability to generate successful sparse adversarial examples in an efficient manner. Moreover, our generated adversarial examples exhibit model-agnostic transferability, enabling effective adversarial attacks on advanced victim models.
Published: 2024