152 results on '"Volker Hohmann"'
Search Results
2. The future of hearing aid technology
- Author
-
Volker Hohmann
- Subjects
Issues, ethics and legal aspects ,Health (social science) ,Geriatrics and Gerontology ,Gerontology - Published
- 2023
3. Making sense of periodicity glimpses in a prediction-update-loop—A computational model of attentive voice tracking
- Author
-
Joanna Luberadzka, Hendrik Kayser, and Volker Hohmann
- Subjects
Periodicity ,Acoustics and Ultrasonics ,Arts and Humanities (miscellaneous) ,Speech Perception ,Voice ,Humans ,Bayes Theorem ,Computer Simulation ,Acoustics ,Psychological and Physiological Acoustics ,Speech Acoustics - Abstract
Humans are able to follow a speaker even in challenging acoustic conditions. The perceptual mechanisms underlying this ability remain unclear. A computational model of attentive voice tracking, consisting of four computational blocks: (1) sparse periodicity-based auditory features (sPAF) extraction, (2) foreground-background segregation, (3) state estimation, and (4) top-down knowledge, is presented. The model connects the theories about auditory glimpses, foreground-background segregation, and Bayesian inference. It is implemented with the sPAF, sequential Monte Carlo sampling, and probabilistic voice models. The model is evaluated by comparing it with the human data obtained in the study by Woods and McDermott [Curr. Biol. 25(17), 2238–2246 (2015)], which measured the ability to track one of two competing voices with time-varying parameters [fundamental frequency (F0) and formants (F1, F2)]. Three model versions were tested, which differ in the type of information used for the segregation: version (a) uses the oracle F0, version (b) uses the estimated F0, and version (c) uses the spectral shape derived from the estimated F0 and oracle F1 and F2. Version (a) simulates the optimal human performance in conditions with the largest separation between the voices, version (b) simulates the conditions in which the separation is not sufficient to follow the voices, and version (c) is closest to the human performance for moderate voice separation.
- Published
- 2022
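The prediction-update loop in the model above rests on sequential Monte Carlo (particle filter) sampling. Below is a minimal sketch of that general mechanism for tracking a slowly varying F0 from noisy periodicity glimpses; the random-walk transition model, the Gaussian likelihood, and all parameter values are illustrative assumptions, not the published implementation.

```python
import numpy as np

def track_f0(observations, n_particles=500, f0_range=(80.0, 400.0),
             step_hz=5.0, obs_sigma=10.0, rng=np.random.default_rng(0)):
    """Toy sequential Monte Carlo tracker for a single voice's F0.

    observations: per-frame noisy F0 estimates in Hz (np.nan = no glimpse).
    Returns the per-frame posterior-mean F0 track.
    """
    particles = rng.uniform(*f0_range, size=n_particles)   # initial hypotheses
    weights = np.full(n_particles, 1.0 / n_particles)
    track = []
    for obs in observations:
        # Prediction step: random-walk transition model (assumed).
        particles += rng.normal(0.0, step_hz, size=n_particles)
        particles = np.clip(particles, *f0_range)
        # Update step: weight particles by a Gaussian likelihood of the glimpse.
        if not np.isnan(obs):
            weights *= np.exp(-0.5 * ((obs - particles) / obs_sigma) ** 2)
            weights += 1e-300                      # guard against underflow
            weights /= weights.sum()
        track.append(np.sum(weights * particles))  # posterior-mean estimate
        # Resample when the effective sample size collapses.
        if 1.0 / np.sum(weights ** 2) < n_particles / 2:
            idx = rng.choice(n_particles, size=n_particles, p=weights)
            particles = particles[idx]
            weights = np.full(n_particles, 1.0 / n_particles)
    return np.array(track)
```

In the actual model the observations are multidimensional sPAF and the state also includes formants; the sketch only conveys the prediction-update structure.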
4. A Step towards Neuro-Steered Hearing Aids: Integrated Portable Setup for Time- Synchronized Acoustic Stimuli Presentation and EEG Recording
- Author
-
Steffen Dasenbrock, Sarah Blum, Stefan Debener, Volker Hohmann, and Hendrik Kayser
- Subjects
around-theear eeg ,portable setup ,openmha ,ceegrid ,Biomedical Engineering ,open master hearing aid ,auditory oddball ,Medicine ,portable hearing laboratory ,eeg ,hearing aids - Abstract
Aiming to provide a portable research platform to develop algorithms for neuro-steered hearing aids, a joint hearing aid - EEG measurement setup was implemented in this work. The setup combines the miniaturized electroencephalography sensor technology cEEGrid with a portable hearing aid research platform - the Portable Hearing Laboratory. The different components of the system are connected wirelessly, using the lab streaming layer framework for synchronization of audio and EEG data streams. Our setup was shown to be suitable for simultaneous recording of audio and EEG signals used in a pilot study (n=5) to perform an auditory Oddball experiment. The analysis showed that the setup can reliably capture typical event-related potential responses. Furthermore, linear discriminant analysis was successfully applied for single-trial classification of P300 responses. The study showed that time-synchronized audio and EEG data acquisition is possible with the Portable Hearing Laboratory research platform.
- Published
- 2021
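The audio-EEG synchronization in this setup relies on the Lab Streaming Layer (LSL). The sketch below shows how time-stamped marker and EEG streams can be exchanged with the pylsl Python bindings; the stream names and the workflow are assumptions for illustration, not the setup described in the paper.

```python
from pylsl import StreamInfo, StreamOutlet, StreamInlet, resolve_byprop, local_clock

# Outlet for event markers (e.g., stimulus onsets sent by the audio side).
marker_info = StreamInfo(name='AudioMarkers', type='Markers', channel_count=1,
                         nominal_srate=0, channel_format='string',
                         source_id='audio_marker_outlet')
marker_outlet = StreamOutlet(marker_info)

# Inlet for the EEG stream published by the amplifier software (if available).
eeg_streams = resolve_byprop('type', 'EEG', timeout=10)
eeg_inlet = StreamInlet(eeg_streams[0])

# Push a time-stamped marker and pull one EEG sample with its timestamp.
marker_outlet.push_sample(['oddball_onset'], local_clock())
sample, timestamp = eeg_inlet.pull_sample()

# Clock offset between sender and receiver, used to align both streams.
offset = eeg_inlet.time_correction()
print(sample[:4], timestamp + offset)
```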
5. Challenging Times for Cochlear Implant Users - Effect of Face Masks on Audiovisual Speech Understanding during the COVID-19 Pandemic
- Author
-
Rasmus Sönnichsen, Gerard Llorach Tó, Volker Hohmann, Sabine Hochmuth, and Andreas Radeloff
- Subjects
Speech and Hearing ,Cochlear Implants ,Otorhinolaryngology ,Speech Intelligibility ,Masks ,Speech Perception ,Humans ,COVID-19 ,Pandemics - Abstract
Unhindered auditory and visual signals are essential for sufficient speech understanding by cochlear implant (CI) users. Face masks are an important hygiene measure against the COVID-19 virus but disrupt these signals. This study determines the extent and the mechanisms of speech intelligibility alteration in CI users caused by different face masks. The audiovisual German matrix sentence test was used to determine speech reception thresholds (SRT) in noise in different conditions (audiovisual, audio-only, speechreading, and masked audiovisual using two different face masks). Thirty-seven CI users and ten normal-hearing listeners (NH) were included. CI users showed a reduction in speech reception threshold of 5.0 dB due to a surgical mask and 6.5 dB due to an FFP2 mask compared to the audiovisual condition without a mask. The greater part of the mask-induced reduction in SRT could be accounted for by the loss of the visual signal (up to 4.5 dB). The effect of each mask was significantly larger in CI users who exclusively hear with their CI (surgical: 7.8 dB, p = 0.005 and FFP2: 8.7 dB, p = 0.01) compared to NH (surgical: 3.8 dB and FFP2: 5.1 dB). This study confirms that CI users who exclusively rely on their CI for hearing are particularly susceptible. Therefore, visual signals should be made accessible for communication whenever possible, especially when communicating with CI users.
- Published
- 2022
6. Enhancement of Hearing Aid Processing Via Spatial Spectro-Temporal Post-Filtering with a Prototype Eyeglass-Integrated Array
- Author
-
Marcos A. Cantu and Volker Hohmann
- Published
- 2022
7. Development and evaluation of video recordings for the OLSA matrix sentence test
- Author
-
Kirsten C. Wagener, Volker Hohmann, Gerard Llorach, Giso Grimm, Melanie A. Zokoll, and Frederike Kirschner
- Subjects
Linguistics and Language ,medicine.medical_specialty ,Computer science ,Speech recognition ,Video Recording ,Audiovisual perception ,Audiology ,Language and Linguistics ,German ,03 medical and health sciences ,Speech and Hearing ,0302 clinical medicine ,Audio and Speech Processing (eess.AS) ,ComputerApplications_MISCELLANEOUS ,FOS: Electrical engineering, electronic engineering, information engineering ,medicine ,Humans ,030223 otorhinolaryngology ,Speechreading ,Speech Reception Threshold Test ,Image and Video Processing (eess.IV) ,Speech Intelligibility ,Matrix (music) ,Electrical Engineering and Systems Science - Image and Video Processing ,language.human_language ,Test (assessment) ,Speech Perception ,language ,Female ,Noise ,030217 neurology & neurosurgery ,Sentence ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
One of the established multi-lingual methods for testing speech intelligibility is the matrix sentence test (MST). Most versions of this test are designed with audio-only stimuli. Nevertheless, visual cues play an important role in speech intelligibility, mostly making it easier to understand speech by speechreading. In this work we present the creation and evaluation of dubbed videos for the Oldenburger female MST (OLSA). 28 normal-hearing participants completed test and retest sessions with conditions including audio and visual modalities, speech in quiet and noise, and open- and closed-set response formats. The levels required to reach 80% sentence intelligibility were measured adaptively for the different conditions. In quiet, the audiovisual benefit compared to audio-only was 7 dB in sound pressure level (SPL). In noise, the audiovisual benefit was 5 dB in signal-to-noise ratio (SNR). Speechreading scores ranged from 0% to 84% speech reception in visual-only sentences, with an average of 50% across participants. This large variability in speechreading abilities was reflected in the audiovisual speech reception thresholds (SRTs), which had a larger standard deviation than the audio-only SRTs. Training and learning effects in audiovisual sentences were found: participants improved their SRTs by approximately 3 dB SNR after 5 trials. Participants retained their best scores on a separate retest session and further improved their SRTs by approximately 1.5 dB.
- Published
- 2021
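The SRTs above are obtained with an adaptive procedure that converges on the level for 80% sentence intelligibility. The sketch below illustrates the general idea of such a level-tracking rule; the step rule, step size and averaging window are assumptions and do not reproduce the actual OLSA adaptive procedure.

```python
def adaptive_track(present_sentence, n_trials=20, start_level=65.0,
                   target=0.8, step_db=2.0):
    """Toy adaptive procedure converging on the level for ~80% intelligibility.

    present_sentence(level) must present one 5-word matrix sentence at the
    given level (dB SPL or dB SNR) and return the proportion of correct words.
    """
    level = start_level
    levels = []
    for _ in range(n_trials):
        levels.append(level)
        p_correct = present_sentence(level)
        # Raise the level when below target, lower it when above target;
        # the step is scaled by the distance from the target score.
        level += step_db * (target - p_correct) / target
    # The SRT estimate is the mean level over the last trials (assumed rule).
    return sum(levels[-10:]) / len(levels[-10:])
```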
8. Low-delay interactive rendering of virtual acoustic environments with extensions for distributed low-delay transmission of audio and bio-physical sensor data
- Author
-
Giso Grimm, Angelika Kothe, and Volker Hohmann
- Subjects
Acoustics and Ultrasonics ,Arts and Humanities (miscellaneous) - Abstract
In this study, we present a system that enables low-delay rendering of interactive virtual acoustics. The tool operates in the time domain based on a physical sound propagation model with basic room acoustic modelling and a block-wise update and interpolation of the environment geometry. During the pandemic, the tool was extended by low-delay network transmission of audio and sensor data, e.g., from motion sensors or bio-physical sensors such as EEG. With this extension, distributed rendering of turn-taking conversations as well as ensemble music performances with individual head-tracked binaural rendering and interactive movement of directional sources is possible. Interactive communication requires a low time delay in sound transmission, which is particularly critical for musical communication, where the upper limit of tolerable delay is between 30 and 50 ms, depending on the genre. Our system can achieve latencies between 7 ms (dedicated local network) and 100 ms (intercontinental connection), with typical values of 25–40 ms. This is far below the delay achieved by typical video-conferencing tools and is sufficient for fluent speech communication and music applications. In addition to a technical description of the system, we show here example measurement data of head motion behaviour in a distributed triadic conversation.
- Published
- 2023
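The physical sound propagation model mentioned above implies a distance-dependent, time-varying delay and attenuation per source. The sketch below shows one common way to realize this with a fractional (interpolated) delay line updated block-wise; the linear distance interpolation, the 1/r gain law and the buffer handling are simplifying assumptions, not the system's actual implementation.

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def propagate(block, dist_start, dist_end, fs, history):
    """Apply distance-dependent delay and 1/r attenuation to one audio block.

    The source-receiver distance is interpolated linearly across the block
    (a simplified stand-in for block-wise geometry updates); the fractional
    delay is realized by linear interpolation into past samples. 'history'
    must be at least as long as the largest expected delay in samples.
    """
    n = len(block)
    buf = np.concatenate([history, block])          # past samples + current block
    dist = np.linspace(dist_start, dist_end, n)
    delay = dist / C * fs                           # delay in samples, per output sample
    read_pos = np.arange(len(history), len(history) + n) - delay
    read_pos = np.clip(read_pos, 0, len(buf) - 2)   # stay inside the buffer
    i0 = np.floor(read_pos).astype(int)
    frac = read_pos - i0
    out = (1.0 - frac) * buf[i0] + frac * buf[i0 + 1]
    return out / np.maximum(dist, 0.1), buf[-len(history):]  # attenuate, keep new history
```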
9. Der Einfluss von Gesichtsmasken auf das Sprachverstehen von Cochlea-Implantat-Patienten
- Author
-
Rasmus Sönnichsen, Tó Gerard Llorach, Sabine Hochmuth, Volker Hohmann, and Andreas Radeloff
- Published
- 2022
10. The influence of face masks on speech intelligibility of cochlear implant users
- Author
-
Rasmus Sönnichsen, Gerard Llorach Tó, Sabine Hochmuth, Volker Hohmann, and Andreas Radeloff
- Published
- 2022
11. Open community platform for hearing aid algorithm research: open Master Hearing Aid (openMHA)
- Author
-
Hendrik Kayser, Tobias Herzke, Paul Maanen, Max Zimmermann, Giso Grimm, and Volker Hohmann
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Hearing aids ,Audiological research ,Sound (cs.SD) ,Real-time audio signal processing ,Hearing aids, Real-time audio signal processing, Audiological research ,01 natural sciences ,Computer Science - Sound ,03 medical and health sciences ,QA76.75-76.765 ,0302 clinical medicine ,Audio and Speech Processing (eess.AS) ,0103 physical sciences ,FOS: Electrical engineering, electronic engineering, information engineering ,Computer software ,Electrical Engineering and Systems Science - Signal Processing ,030223 otorhinolaryngology ,010301 acoustics ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
open Master Hearing Aid (openMHA) was developed and provided to the hearing aid research community as an open-source software platform with the aim of supporting sustainable and reproducible research towards improvement and new types of assistive hearing systems, not limited by proprietary software. The software offers a flexible framework that allows users to conduct hearing aid research using tools and a number of signal processing plugins provided with the software, as well as to implement their own methods. The openMHA software is independent of specific hardware and supports Linux, macOS and Windows operating systems as well as 32-bit and 64-bit ARM-based architectures such as those used in small portable integrated systems. www.openmha.org
- Published
- 2022
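openMHA structures real-time processing as a configurable chain of signal-processing plugins. The following sketch illustrates the plugin-chain idea in generic Python; it deliberately does not use openMHA's actual C++ plugin API or configuration language, and the Gain/Limiter plugins are invented for illustration.

```python
import numpy as np

class Plugin:
    """Minimal stand-in for a block-based processing plugin."""
    def process(self, block):            # block: (n_samples, n_channels)
        return block

class Gain(Plugin):
    def __init__(self, gain_db):
        self.g = 10 ** (gain_db / 20)
    def process(self, block):
        return self.g * block

class Limiter(Plugin):
    def __init__(self, ceiling=1.0):
        self.ceiling = ceiling
    def process(self, block):
        return np.clip(block, -self.ceiling, self.ceiling)

class Chain(Plugin):
    """Runs plugins in sequence, as a hearing-aid processing chain would."""
    def __init__(self, plugins):
        self.plugins = plugins
    def process(self, block):
        for p in self.plugins:
            block = p.process(block)
        return block

chain = Chain([Gain(12.0), Limiter(0.5)])
out = chain.process(np.random.default_rng(0).normal(scale=0.1, size=(64, 2)))
```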
12. Self-motion with Hearing Impairment and (Directional) Hearing Aids
- Author
-
Maartje M. E. Hendrikse, Theda Eichler, Volker Hohmann, and Giso Grimm
- Subjects
Hearing Aids ,Speech Perception ,Humans ,Sound Localization ,Hearing Loss ,Noise - Abstract
When listening to a sound source in everyday situations, typical movement behavior is highly individual and may not result in the listener directly facing the sound source. Behavioral differences can affect the performance of directional algorithms in hearing aids, as was shown in previous work by using head movement trajectories of normal-hearing (NH) listeners in acoustic simulations for noise-suppression performance predictions. However, the movement behavior of hearing-impaired (HI) listeners with or without hearing aids may differ, and hearing-aid users might adapt their self-motion to improve the performance of directional algorithms. This work investigates the influence of hearing impairment on self-motion, and the interaction of hearing aids with self-motion. To this end, the self-motion of three HI participant groups (aided with an adaptive differential microphone (ADM), aided without ADM, and unaided) was measured and compared to previously measured self-motion data from younger and older NH participants. Self-motion was measured in virtual audiovisual environments (VEs) in the laboratory, and the signal-to-noise ratios (SNRs) and SNR improvement of the ADM resulting from the head movements of the participants were estimated using acoustic simulations. HI participants did almost all of the movement with their head and less with their eyes compared to NH participants, which led to a 0.3 dB increase in estimated SNR and to differences in estimated SNR improvement of the ADM. However, the self-motion of the HI participants aided with ADM was similar to that of the other HI participants, indicating that the ADM did not cause listeners to adapt their self-motion.
- Published
- 2022
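The adaptive differential microphone (ADM) referred to above is a classic first-order two-microphone beamformer. A minimal block-wise sketch of that general technique is shown below; the delay handling and the adaptation rule for the null-steering coefficient are textbook simplifications, not the hearing-aid implementation used in the study.

```python
import numpy as np

def adm_block(front, rear, delay=1, beta=None):
    """First-order adaptive differential microphone on one signal block.

    front, rear: same-length blocks from two closely spaced omni microphones.
    delay: inter-microphone delay in samples (spacing / speed of sound * fs).
    Returns the processed block and the adapted null-steering coefficient beta.
    """
    # Back-to-back cardioids from delay-and-subtract beamforming.
    cf = front[delay:] - rear[:-delay]     # forward-facing cardioid
    cb = rear[delay:] - front[:-delay]     # backward-facing cardioid
    if beta is None:
        # Adapt beta to minimize output power (steers a null toward the rear
        # interferer); clip to [0, 1] to keep the null in the back half plane.
        beta = np.clip(np.dot(cf, cb) / (np.dot(cb, cb) + 1e-12), 0.0, 1.0)
    return cf - beta * cb, beta
```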
13. Synchronization of ear-EEG and audio streams in a portable research hearing device
- Author
-
Steffen Dasenbrock, Sarah Blum, Paul Maanen, Stefan Debener, Volker Hohmann, and Hendrik Kayser
- Abstract
Recent advancements in neuroscientific research and miniaturized ear-electroencephalography (EEG) technologies have led to the idea of employing brain signals as additional input to hearing aid algorithms. The information acquired through EEG could potentially be used to control the audio signal processing of the hearing aid or to monitor communication-related physiological factors. In previous work, we implemented a research platform to develop methods that utilize EEG in combination with a hearing device. The setup combines currently available mobile EEG hardware and the so-called Portable Hearing Laboratory (PHL), which can fully replicate a complete hearing aid. Audio and EEG data are synchronized using the Lab Streaming Layer (LSL) framework. In this study, we evaluated the setup in three scenarios focusing particularly on the alignment of audio and EEG data. In Scenario I, we measured the latency between software event markers and actual audio playback of the PHL. In Scenario II, we measured the latency between an analog input signal and the sampled data stream of the EEG system. In Scenario III, we measured the latency in the whole setup as it would be used in a real EEG experiment. The results of Scenario I showed a jitter (standard deviation of trial latencies) of below 0.1 ms. The jitter in Scenarios II and III was around 3 ms in both cases. The results suggest that the increased jitter compared to Scenario I can be attributed to the EEG system. Overall, the findings show that the measurement setup can time-accurately present acoustic stimuli while generating LSL data streams over multiple hours of playback. Further, the setup can capture the audio and EEG LSL streams with sufficient temporal accuracy to extract event-related potentials from EEG signals. We conclude that our setup is suitable for studying closed-loop EEG-audio applications for future hearing aids.
- Published
- 2022
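The jitter figures above are standard deviations of per-trial latencies between event markers and the measured event times. A minimal sketch of that computation on hypothetical timestamps:

```python
import numpy as np

# Hypothetical per-trial timestamps (seconds): LSL marker time vs. the event
# time measured on a reference channel (e.g., a loopback of the audio output).
marker_t = np.array([1.000, 2.000, 3.000, 4.000])
measured_t = np.array([1.021, 2.019, 3.023, 4.020])

latency = measured_t - marker_t
print(f"mean latency: {latency.mean()*1e3:.1f} ms, "
      f"jitter (SD): {latency.std(ddof=1)*1e3:.2f} ms")
```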
14. The Virtual Reality Lab: Realization and Application of Virtual Sound Environments
- Author
-
Volker Hohmann, Markus Meis, Richard Paluch, Melanie Krueger, and Giso Grimm
- Subjects
Ecological validity ,Hearing aids ,Computer science ,media_common.quotation_subject ,Realization (linguistics) ,Virtual reality ,01 natural sciences ,Field (computer science) ,03 medical and health sciences ,Speech and Hearing ,User-Computer Interface ,0302 clinical medicine ,Human–computer interaction ,Factor (programming language) ,Perception ,0103 physical sciences ,Humans ,030223 otorhinolaryngology ,Function (engineering) ,010301 acoustics ,computer.programming_language ,media_common ,Audiovisual environments ,Hearing Tests ,Virtual Reality ,Audiology ,Hearing loss ,Acoustics ,Hearing acoustics ,Eriksholm Workshop: Ecological Validity ,Sound ,Otorhinolaryngology ,State (computer science) ,Comprehension ,computer - Abstract
To assess perception with and performance of modern and future hearing devices with advanced adaptive signal processing capabilities, novel evaluation methods are required that go beyond already established methods. These novel methods will simulate to a certain extent the complexity and variability of acoustic conditions and acoustic communication styles in real life. This article discusses the current state and the perspectives of virtual reality technology use in the lab for designing complex audiovisual communication environments for hearing assessment and hearing device design and evaluation. In an effort to increase the ecological validity of lab experiments, that is, to increase the degree to which lab data reflect real-life hearing-related function, and to support the development of improved hearing-related procedures and interventions, this virtual reality lab marks a transition from conventional (audio-only) lab experiments to the field. The first part of the article introduces and discusses the notion of the communication loop as a theoretical basis for understanding the factors that are relevant for acoustic communication in real life. From this, requirements are derived that allow an assessment of the extent to which a virtual reality lab reflects these factors, and which may be used as a proxy for ecological validity. The most important factor of real-life communication identified is a closed communication loop among the actively behaving participants. The second part of the article gives an overview of the current developments towards a virtual reality lab at Oldenburg University that aims at interactive and reproducible testing of subjects with and without hearing devices in challenging communication conditions. The extent to which the virtual reality lab in its current state meets the requirements defined in the first part is discussed, along with its limitations and potential further developments. Finally, data are presented from a qualitative study that compared subject behavior and performance in two audiovisual environments presented in the virtual reality lab (a street and a cafeteria) with the corresponding field environments. The results show similarities and differences in subject behavior and performance between the lab and the field, indicating that the virtual reality lab in its current state marks a step towards more ecological validity in lab-based hearing and hearing device research, but requires further development towards higher levels of ecological validity.
- Published
- 2020
15. How Face Masks Interfere With Speech Understanding of Normal-Hearing Individuals: Vision Makes the Difference
- Author
-
Rasmus Sönnichsen, Gerard Llorach Tó, Sabine Hochmuth, Volker Hohmann, and Andreas Radeloff
- Subjects
Male ,Otorhinolaryngology ,Hearing ,Speech Intelligibility ,Masks ,Speech Perception ,Humans ,Female ,Neurology (clinical) ,Prospective Studies ,Sensory Systems - Abstract
Objective: To investigate the effects of wearing a simulated mask on speech perception of normal-hearing subjects. Study design: Prospective cohort study. Setting: University hospital. Participants: Fifteen normal-hearing, native German speakers (8 female, 7 male). Interventions: Different experimental conditions with and without simulated face masks using the audiovisual version of the female German Matrix test (Oldenburger Satztest, OLSA). Main outcome measure: Signal-to-noise ratio (SNR) at speech intelligibility of 80%. Results: The SNR at which 80% speech intelligibility was achieved deteriorated by a mean of 4.1 dB SNR when simulating a medical mask and by 5.1 dB SNR when simulating a cloth mask in comparison to the audiovisual condition without mask. Interestingly, the contribution of the visual component alone was 2.6 dB SNR and thus had a larger effect than the acoustic component in the medical mask condition. Conclusions: As expected, speech understanding with face masks was significantly worse than under control conditions. Thus, the speaker's use of face masks leads to a significant deterioration of speech understanding by the normal-hearing listener. The data suggest that these effects may play a role in many everyday situations that typically involve noise.
- Published
- 2022
16. Comment on the Point of View 'Ecological Validity, External Validity and Mundane Realism in Hearing Science'
- Author
-
Gitte Keidser, Graham Naylor, Douglas S. Brungart, Andreas Caduff, Jennifer Campos, Simon Carlile, Mark G. Carpenter, Giso Grimm, Volker Hohmann, Inga Holube, Stefan Launer, Thomas Lunner, Ravish Mehra, Frances Rapport, Malcolm Slaney, and Karolina Smeds
- Subjects
Speech and Hearing ,Hearing ,Otorhinolaryngology ,Hearing Tests ,Oto-rino-laryngologi ,Humans - Abstract
Funding: William Demant Foundation
- Published
- 2022
17. The Period-Modulated Harmonic Locked Loop (PM-HLL): A low-effort algorithm for rapid time-domain periodicity estimation
- Author
-
Volker Hohmann
- Subjects
Audio Signal Processing, Periodicity, Pitch, Fundamental Frequency Estimation, Harmonic-to-Noise Ratio, Hearing Acoustics - Abstract
This is a preprint of the manuscript "The Period-Modulated Harmonic Locked Loop (PM-HLL): A low-effort algorithm for rapid time-domain periodicity estimation" by Volker Hohmann, submitted for publication to Acta Acustica. A package with the scripts required to generate the simulations and figures from the study is available at 10.5281/zenodo.5040358. Thank you for downloading the manuscript. Your comments are very welcome! This manuscript is licensed under a CC BY-NC-SA 4.0 https://creativecommons.org/licenses/by-nc-sa/4.0/ license.
- Published
- 2021
18. Age Effects on Concurrent Speech Segregation by Onset Asynchrony
- Author
-
Maria V. Stuckenberg, Bernd Meyer, Christoph Völker, Alexandra Bendixen, Chaitra V. Nayak, and Volker Hohmann
- Subjects
Adult ,Male ,Auditory perception ,Linguistics and Language ,medicine.medical_specialty ,Hearing loss ,media_common.quotation_subject ,Audiology ,Speech Acoustics ,Language and Linguistics ,Young Adult ,Speech and Hearing ,Audiometry ,Perception ,otorhinolaryngologic diseases ,medicine ,Humans ,Active listening ,Young adult ,Aged ,media_common ,Analysis of Variance ,medicine.diagnostic_test ,Age Factors ,Auditory Threshold ,Electroencephalography ,Middle Aged ,Asynchrony (computer programming) ,Speech Perception ,Female ,Cues ,medicine.symptom ,Psychology - Abstract
Purpose: For elderly listeners, it is more challenging to listen to 1 voice surrounded by other voices than for young listeners. This could be caused by a reduced ability to use acoustic cues—such as slight differences in onset time—for the segregation of concurrent speech signals. Here, we study whether the ability to benefit from onset asynchrony differs between young (18–33 years) and elderly (55–74 years) listeners. Method: We investigated young (normal hearing, N = 20) and elderly (mildly hearing impaired, N = 26) listeners' ability to segregate 2 vowels with onset asynchronies ranging from 20 to 100 ms. Behavioral measures were complemented by a specific event-related brain potential component, the object-related negativity, indicating the perception of 2 distinct auditory objects. Results: Elderly listeners' behavioral performance (identification accuracy of the 2 vowels) was considerably poorer than young listeners'. However, both age groups showed the same amount of improvement with increasing onset asynchrony. Object-related negativity amplitude also increased similarly in both age groups. Conclusion: Both age groups benefit to a similar extent from onset asynchrony as a cue for concurrent speech segregation during active (behavioral measurement) and during passive (electroencephalographic measurement) listening.
- Published
- 2019
19. Einfluss des kontralateralen Hörens von unilateral versorgten CI-Trägern auf das Sprachverstehen in einer simulierten Restaurantumgebung: Helfen Richtmikrofon-Technologien?
- Author
-
A. Aschendorff, Volker Hohmann, T. Wesarg, Giso Grimm, and J. Galindo Guerreros
- Published
- 2021
20. The Concurrent OLSA test: A method for speech recognition in multi-talker situations at fixed SNR
- Author
-
Jan Heeren, Theresa Nuesse, Matthias Latzel, Inga Holube, Volker Hohmann, Kirsten C. Wagener, and Michael Schulte
- Subjects
Speech and Hearing ,Otorhinolaryngology ,Hearing Tests ,Speech Perception ,FOS: Physical sciences ,Humans ,Speech ,Medical Physics (physics.med-ph) ,Signal-To-Noise Ratio ,Physics - Medical Physics ,Language - Abstract
A multi-talker paradigm is introduced that uses different attentional processes to adjust speech recognition scores with the goal of conducting measurements at high signal-to-noise ratios. The basic idea is to simulate a group conversation with three talkers and a participant. Talkers alternately speak sentences of the German matrix test OLSA. Each time a sentence begins with the name "Kerstin" (call sign), the participant is addressed and instructed to repeat the last words of all sentences from that talker, until another talker begins a sentence with "Kerstin". The alternation of the talkers is implemented with an adjustable overlap time that causes an overlap between the call sign "Kerstin" and the target words to be repeated. Thus, the two tasks of detecting "Kerstin" and repeating target words have to be processed at the same time as a dual task. The paradigm was tested with 22 young normal-hearing participants for three overlap times (0.6 s, 0.8 s, 1.0 s). Results for these overlap times show significant differences, with median target word recognition scores of 88%, 82%, and 77%, respectively (including call sign and dual task effects). A comparison of the dual task with the corresponding single tasks suggests that the observed effects reflect an increased cognitive load.
- Published
- 2021
21. Speaking with avatars - influence of social interaction on movement behavior in interactive hearing experiments
- Author
-
Volker Hohmann, Marie Hartwig, and Giso Grimm
- Subjects
Interactivity ,Speech recognition ,media_common.quotation_subject ,Saccade ,Conversation ,Noise (video) ,Virtual reality ,Psychology ,Gaze ,Motion (physics) ,Social relation ,media_common - Abstract
This study investigated to what extent social interaction influences the motion behavior of normal-hearing listeners in interactive hearing experiments involving audiovisual virtual reality. To answer this question, an experiment with eleven participants was performed, using two different levels of virtualization, two levels of interactivity, and two different noise levels. The task of the participants was either to communicate with two real or two virtual interlocutors (conditions ’real’ and ’active’) or to listen passively to a conversation between three virtual characters (condition ’passive’). During the experiment, the gaze, head and body motion of the participants was recorded. An analysis of variance showed that gaze and saccade behavior does not differ between the ’real’ and ’active’ conditions, whereas behavior was found to differ between the ’active’ and ’passive’ conditions. For the head-motion-related parameters, no such significant effect was found. A classifier was trained to distinguish between conditions based on motion and pose features. Performance was higher for the comparison between the ’active’ and ’passive’ conditions than for the comparison between the ’real’ and ’active’ conditions, indicating that the measured difference in motion behavior was not sufficient to distinguish between the ’real’ and the ’active’ condition.
- Published
- 2021
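The condition classifier mentioned above maps per-trial motion and pose features to condition labels. Below is a hedged sketch of such a classification pipeline with placeholder features and labels (scikit-learn, linear SVM with cross-validation); the feature set and classifier choice are assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Placeholder data: one row per trial, columns = motion/pose summary features
# (e.g., head-yaw RMS, saccade rate, gaze dispersion); labels = condition.
X = rng.normal(size=(120, 6))
y = rng.integers(0, 2, size=120)           # 0 = 'active', 1 = 'passive'

clf = make_pipeline(StandardScaler(), SVC(kernel='linear'))
scores = cross_val_score(clf, X, y, cv=5)  # chance level is 0.5 for two classes
print(scores.mean())
```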
22. Hörstörungen und Hörgeräte
- Author
-
Volker Hohmann, Birger Kollmeier, and Giso Grimm
- Published
- 2021
23. Estimating fundamental frequency and formants based on periodicity glimpses: a deep learning approach
- Author
-
Joanna Luberadzka, Hendrik Kayser, and Volker Hohmann
- Subjects
Artificial neural network ,business.industry ,Computer science ,Deep learning ,Feature vector ,Speech recognition ,Acoustic space ,03 medical and health sciences ,0302 clinical medicine ,Formant ,medicine.anatomical_structure ,Video tracking ,medicine ,Auditory system ,Artificial intelligence ,030223 otorhinolaryngology ,Set (psychology) ,business ,030217 neurology & neurosurgery - Abstract
Despite many technological advances, hearing aids still amplify background sounds together with the signal of interest. To understand how to process the acoustic information in an optimal way for a human listener, we have to understand why a healthy auditory system performs this task with such great efficiency. Several studies show the importance of so-called auditory glimpses in decoding the auditory scene. They are usually defined as time-frequency bins dominated by one source, which the auditory system may use to track this source in a crowded acoustic space. Josupeit et al. [6]-[8] developed an algorithm inspired by these findings. It extracts speech glimpses, defined as the salient tonal components of a sound mixture, called the sparse periodicity-based auditory features (sPAF). In this study, we investigated whether the sPAF can be used to estimate the instantaneous voice parameters: fundamental frequency F0 and formant frequencies F1 and F2. We used a supervised machine learning technique to find the mapping between parameter and feature space. Using a formant synthesizer, we created a labeled data set containing instantaneous sPAF and the corresponding parameter values. We trained a deep neural network and evaluated the prediction performance of the learned model. The results showed that the sPAF represent the parameters of a single voice very well, which opens the possibility of using the sPAF in more complex auditory object tracking scenarios.
- Published
- 2020
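The supervised mapping from sPAF feature vectors to instantaneous voice parameters can be illustrated with a small multi-output regressor. The sketch below uses scikit-learn's MLPRegressor on synthetic placeholder data; the feature dimensionality, network size and data are assumptions and do not correspond to the network or data set used in the study.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder data set: rows = instantaneous sPAF feature vectors,
# targets = the voice parameters [F0, F1, F2] used to synthesize them.
X = rng.normal(size=(5000, 40))
y = np.column_stack([rng.uniform(80, 300, 5000),     # F0 in Hz
                     rng.uniform(300, 900, 5000),    # F1 in Hz
                     rng.uniform(900, 2500, 5000)])  # F2 in Hz

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=200, random_state=0)
net.fit(X_tr, y_tr)
print(net.score(X_te, y_te))   # R^2 of the (F0, F1, F2) prediction
```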
24. The Quest for Ecological Validity in Hearing Science: What It Is, Why It Matters, and How to Advance It
- Author
-
Graham Naylor, Douglas S. Brungart, Frances Rapport, Giso Grimm, Gitte Keidser, Ravish Mehra, Andreas Caduff, Karolina Smeds, Stefan Launer, Thomas Lunner, Inga Holube, Jennifer L. Campos, Simon Carlile, Volker Hohmann, Malcolm Slaney, and Mark G. Carpenter
- Subjects
Ecological validity ,Emerging technologies ,media_common.quotation_subject ,Applied psychology ,Amplification, Ecological validity, Field study, Hearing, Hearing science, Hybrid study, Laboratory study, Outcome domains, Research, Test variables ,Amplification ,01 natural sciences ,03 medical and health sciences ,Speech and Hearing ,Hearing Aids ,0302 clinical medicine ,Hearing ,International Classification of Functioning, Disability and Health ,0103 physical sciences ,Humans ,Active listening ,030223 otorhinolaryngology ,Everyday life ,Set (psychology) ,Function (engineering) ,010301 acoustics ,media_common ,Research ,Hybrid study ,Field study ,Test variables ,Hearing science ,Eriksholm Workshop: Ecological Validity ,Comprehension ,Otorhinolaryngology ,Research Design ,Laboratory study ,Outcome domains ,Auditory Perception ,Psychology - Abstract
Ecological validity is a relatively new concept in hearing science. It has been cited as relevant with increasing frequency in publications over the past 20 years, but without any formal conceptual basis or clear motive. The sixth Eriksholm Workshop was convened to develop a deeper understanding of the concept for the purpose of applying it in hearing research in a consistent and productive manner. Inspired by relevant debate within the field of psychology, and taking into account the World Health Organization's International Classification of Functioning, Disability, and Health framework, the attendees at the workshop reached a consensus on the following definition: "In hearing science, ecological validity refers to the degree to which research findings reflect real-life hearing-related function, activity, or participation." Four broad purposes for striving for greater ecological validity in hearing research were determined: A (Understanding) better understanding the role of hearing in everyday life; B (Development) supporting the development of improved procedures and interventions; C (Assessment) facilitating improved methods for assessing and predicting ability to accomplish real-world tasks; and D (Integration and Individualization) enabling more integrated and individualized care. Discussions considered the effects of variables and phenomena commonly present in hearing-related research on the level of ecological validity of outcomes, supported by examples from a few selected outcome domains and for different types of studies. Illustrated with examples, potential strategies were offered for promoting a high level of ecological validity in a study and for how to evaluate the level of ecological validity of a study. Areas in particular that could benefit from more research to advance ecological validity in hearing science include: (1) understanding the processes of hearing and communication in everyday listening situations, and specifically the factors that make listening difficult in everyday situations; (2) developing new test paradigms that include more than one person (e.g., to encompass the interactive nature of everyday communication) and that are integrative of other factors that interact with hearing in real-life function; (3) integrating new and emerging technologies (e.g., virtual reality) with established test methods; and (4) identifying the key variables and phenomena affecting the level of ecological validity to develop verifiable ways to increase ecological validity and derive a set of benchmarks to strive for.
- Published
- 2020
25. Making sense of periodicity glimpses in a prediction-update-loop - a computational model of attentive voice tracking
- Author
-
Joanna Luberadzka, Hendrik Kayser, and Volker Hohmann
- Subjects
Computational Auditory Scene Analysis ,Sequential Bayesian estimation ,Periodicity Glimpses - Abstract
Humans are able to follow a given speaker even in challenging acoustic conditions. The perceptual mechanisms underlying this ability remain unclear. In this study, we present a computational model of attentive voice tracking, consisting of four main computational blocks: A) sparse periodicity-based auditory feature extraction, B) foreground-background segregation, C) state estimation and D) top-down knowledge. Conceptually, the model brings together ideas related to auditory glimpses, foreground-background segregation and Bayesian inference. Algorithmically, it combines sparse periodicity features, sequential Monte Carlo sampling and probabilistic voice models. We evaluate the model by comparing it with the data obtained by listeners in the study of Woods and McDermott (2015), which measured the ability to track one of two competing voices with time-varying parameters (fundamental frequency (F0) and first two formants (F1, F2)). We simulate two experiments: Stream Segregation of Sources Varying in Just One Feature and Effect of Source Proximity. In both experiments, we test three model versions, which differ in the type of information used in the segregation stage: version 1 uses oracle F0, version 2 uses estimated F0 and version 3 uses spectral shape derived from estimated F0 and oracle F1 and F2. Version 1 simulates optimal human performance in the conditions with the largest separation between the voices, version 2 simulates conditions where the separation between the voices is not sufficient for humans to follow the voices, and version 3 is closest to human performance for moderate separation between the voices.
- Published
- 2020
26. Exploiting Periodicity Features for Joint Detection and DOA Estimation of Speech Sources Using Convolutional Neural Networks
- Author
-
Kamil Adiloglu, Volker Hohmann, Simon Doclo, and Reza Varzandeh
- Subjects
Voice activity detection ,Computer science ,business.industry ,Direction of arrival ,020206 networking & telecommunications ,Context (language use) ,Pattern recognition ,02 engineering and technology ,Convolutional neural network ,Signal ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Feature (computer vision) ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,0305 other medical science ,Representation (mathematics) ,business - Abstract
While many algorithms deal with direction of arrival (DOA) estimation and voice activity detection (VAD) as two separate tasks, only a small number of data-driven methods have addressed these two tasks jointly. In this paper, a multi-input single-output convolutional neural network (CNN) is proposed which exploits a novel feature combination for joint DOA estimation and VAD in the context of binaural hearing aids. In addition to the well-known generalized cross correlation with phase transform (GCC-PHAT) feature, the network uses an auditory-inspired feature called periodicity degree (PD), which provides a broadband representation of the periodic structure of the signal. The proposed CNN has been trained in a multi-conditional training scheme across different signal-to-noise ratios. Experimental results for a single-talker scenario in reverberant environments show that by exploiting the PD feature, the proposed CNN is able to distinguish speech from non-speech signal blocks, thereby outperforming the baseline CNN in terms of DOA estimation accuracy. In addition, the results show that the proposed method is able to adapt to different unseen acoustic conditions and background noises.
- Published
- 2020
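The GCC-PHAT feature used as one of the CNN inputs above can be computed for a microphone pair as sketched below; the frame handling and normalization constant are assumptions.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """Generalized cross correlation with phase transform for one mic pair.

    Returns the correlation values for lags -max_lag..max_lag (in samples),
    which serve as a time-difference-of-arrival feature.
    """
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12          # phase transform: keep phase only
    cc = np.fft.irfft(cross, n=n)
    cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])  # center lag 0
    return cc
```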
27. Restoring Perceived Loudness for Listeners With Hearing Loss
- Author
-
Jens-E. Appell, Dirk Oetting, Volker Hohmann, Birger Kollmeier, Stephan D. Ewert, and Publica
- Subjects
Male ,medicine.medical_specialty ,Computer science ,Hearing loss ,Hearing Loss, Sensorineural ,Loudness Perception ,Audiology ,Monaural ,01 natural sciences ,Loudness ,03 medical and health sciences ,Speech and Hearing ,Hearing Aids ,0302 clinical medicine ,Narrowband ,0103 physical sciences ,Broadband ,medicine ,Humans ,030223 otorhinolaryngology ,010301 acoustics ,Aged ,Decibel ,Aged, 80 and over ,Equipment Design ,Loudness compensation ,Middle Aged ,Otorhinolaryngology ,Female ,medicine.symptom ,Binaural recording - Abstract
Objectives: Normalizing perceived loudness is an important rationale for gain adjustments in hearing aids. It has been demonstrated that gains required for restoring normal loudness perception for monaural narrowband signals can lead to higher-than-normal loudness in listeners with hearing loss, particularly for binaural broadband presentation. The present study presents a binaural bandwidth-adaptive dynamic compressor (BBDC) that can apply different gains for narrow- and broadband signals. It was hypothesized that normal perceived loudness for a broad variety of signals could be restored for listeners with mild to moderate high-frequency hearing loss by applying individual signal-dependent gain corrections. Design: Gains to normalize perceived loudness for narrowband stimuli were assessed in 15 listeners with mild to moderate high-frequency hearing loss using categorical loudness scaling. Gains for narrowband loudness compensation were calculated and applied in a standard compressor. Aided loudness functions for signals with different bandwidths were assessed. The deviation from the average normal-hearing loudness functions was used for gain correction in the BBDC. Aided loudness functions for narrow- and broadband signals with BBDC were then assessed. Gains for a 65 dB SPL speech-shaped noise of BBDC were compared with gains based on National Acoustic Laboratories' nonlinear fitting procedure version 2 (NAL-NL2). The perceived loudness for 20 real signals was compared to the average normal-hearing rating. Results: The suggested BBDC showed close-to-normal loudness functions for binaural narrow- and broadband signals for the listeners with hearing loss. Normal loudness ratings were observed for the real-world test signals. The proposed gain reduction method resulted on average in similar gains as prescribed by NAL-NL2. However, substantial gain variations compared to NAL-NL2 were observed in the data for individual listeners. Gain corrections after narrowband loudness compensation showed large interindividual differences for binaural broadband signals. Some listeners required no further gain reduction for broadband signals; for others, gains in decibels were more than halved for binaural broadband signals. Conclusion: The interindividual differences of the binaural broadband gain corrections indicate that relevant information for normalizing perceived loudness of binaural broadband signals cannot be inferred from monaural narrowband loudness functions. Over-amplification can be avoided if binaural broadband measurements are included in the fitting procedure. For listeners with a high binaural broadband gain correction factor, loudness compensation for narrowband and broadband stimuli cannot be achieved by compression algorithms that disregard the bandwidth of the input signals. The suggested BBDC includes individual binaural broadband corrections in a more appropriate way than threshold-based procedures.
- Published
- 2018
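The rationale behind narrowband loudness compensation in the study above is to amplify each input level to the level at which the hearing-impaired loudness function reaches the same loudness category as the normal-hearing function at the unamplified level. The sketch below illustrates this with hypothetical categorical loudness functions used as interpolation tables; the numbers are invented and the BBDC's broadband correction stage is not included.

```python
import numpy as np

# Hypothetical categorical loudness functions (categorical units, CU, vs. dB SPL).
levels_nh = np.array([20.0, 40.0, 60.0, 80.0, 100.0])
cu_nh     = np.array([ 5.0, 15.0, 25.0, 35.0, 45.0])
levels_hi = np.array([45.0, 60.0, 75.0, 85.0, 95.0])
cu_hi     = np.array([ 5.0, 15.0, 25.0, 35.0, 45.0])

def gain_for_loudness_match(input_level):
    """Gain that makes the HI loudness at the amplified level equal the
    NH loudness at the unamplified input level (narrowband rationale)."""
    target_cu = np.interp(input_level, levels_nh, cu_nh)      # NH loudness in CU
    needed_level = np.interp(target_cu, cu_hi, levels_hi)     # HI level for same CU
    return needed_level - input_level

print(gain_for_loudness_match(60.0))   # gain in dB at a 60 dB SPL input
```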
28. Modeling speech localization, talker identification, and word recognition in a multi-talker setting
- Author
-
Volker Hohmann and Angela Josupeit
- Subjects
Acoustics and Ultrasonics ,Computer science ,Acoustics ,Speech recognition ,media_common.quotation_subject ,Cognition ,01 natural sciences ,03 medical and health sciences ,Identification (information) ,0302 clinical medicine ,Arts and Humanities (miscellaneous) ,Salient ,Perception ,0103 physical sciences ,Word recognition ,Psychophysics ,Psychoacoustics ,010301 acoustics ,Binaural recording ,030217 neurology & neurosurgery ,media_common - Abstract
This study introduces a model for solving three different auditory tasks in a multi-talker setting: target localization, target identification, and word recognition. The model was used to simulate psychoacoustic data from a call-sign-based listening test involving multiple spatially separated talkers [Brungart and Simpson (2007). Percept. Psychophys. 69(1), 79-91]. The main characteristics of the model are (i) the extraction of salient auditory features ("glimpses") from the multi-talker signal and (ii) the use of a classification method that finds the best target hypothesis by comparing feature templates from clean target signals to the glimpses derived from the multi-talker mixture. The four features used were periodicity, periodic energy, and periodicity-based interaural time and level differences. The model results clearly exceeded chance level for all subtasks and conditions, and generally coincided strongly with the subject data. This indicates that, despite their sparsity, glimpses provide sufficient information about a complex auditory scene. This also suggests that complex source superposition models may not be needed for auditory scene analysis. Instead, simple models of clean speech may be sufficient to decode even complex multi-talker scenes.
- Published
- 2017
29. Movement and Gaze Behavior in Virtual Audiovisual Listening Environments Resembling Everyday Life
- Author
-
Maartje M. E. Hendrikse, Gerard Llorach, Volker Hohmann, and Giso Grimm
- Published
- 2019
30. Auditory Model-Based Dynamic Compression Controlled by Subband Instantaneous Frequency and Speech Presence Probability Estimates
- Author
-
Steffen Kortlang, Volker Hohmann, Giso Grimm, Birger Kollmeier, and Stephan D. Ewert
- Subjects
Speech perception ,Acoustics and Ultrasonics ,Dynamic range ,Speech recognition ,Intelligibility (communication) ,Speech processing ,01 natural sciences ,Instantaneous phase ,Loudness ,03 medical and health sciences ,Computational Mathematics ,0302 clinical medicine ,medicine.anatomical_structure ,0103 physical sciences ,otorhinolaryngologic diseases ,Computer Science (miscellaneous) ,medicine ,Auditory system ,Dynamic range compression ,Electrical and Electronic Engineering ,030223 otorhinolaryngology ,010301 acoustics ,Mathematics - Abstract
Sensorineural hearing loss typically results in elevated thresholds and steepened loudness growth, caused to a significant degree by damage to the outer hair cells (OHC). In hearing aids, amplification and dynamic compression aim at widening the limited available dynamic range. However, speech perception, particularly in complex acoustic scenes, often remains difficult. Here, a physiologically motivated, fast-acting, model-based dynamic compression algorithm (MDC) is introduced which aims at restoring the behaviorally estimated basilar membrane input–output (BM I/O) function of normal-hearing listeners. A system-specific gain prescription rule is suggested, based on the same model BM I/O function and a behavioral estimate of the individual OHC loss. Cochlear off-frequency component suppression is mimicked using an instantaneous frequency (IF) estimate. Increased loudness as a consequence of widened filters in the impaired system is considered in a further compensation stage. In an extended version, a subband estimate of the speech presence probability (MDC+SPP) additionally provides speech-selective amplification in stationary noise. Instrumental evaluation revealed that the IF control enhances the spectral contrast of vowels, and benefits in quality predictions at higher signal-to-noise ratios (SNRs) were observed. Compared with a conventional multiband dynamic compressor, MDC achieved objective quality and intelligibility benefits for a competing talker at lower SNRs. MDC+SPP outperformed the conventional compressor in the quality predictions and reached instrumental speech intelligibility comparable to that achieved with linear amplification. The proposed algorithm provides a first promising basis for auditory model-based compression with signal-type- and bandwidth-dependent gains.
- Published
- 2016
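Subband instantaneous frequency, which controls the off-frequency suppression described above, can be estimated in several ways. The sketch below uses a bandpass filter plus the analytic signal (Hilbert transform) as a stand-in estimator; the paper's algorithm uses its own filterbank-based IF estimate, so the band edges and filter order here are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def subband_instantaneous_frequency(x, fs, f_lo, f_hi):
    """Estimate the instantaneous frequency (Hz) within one analysis band."""
    sos = butter(4, [f_lo, f_hi], btype='bandpass', fs=fs, output='sos')
    band = sosfiltfilt(sos, x)
    phase = np.unwrap(np.angle(hilbert(band)))
    return np.diff(phase) * fs / (2 * np.pi)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
print(subband_instantaneous_frequency(x, fs, 300, 600).mean())  # ~440 Hz
```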
31. Spectral and binaural loudness summation for hearing-impaired listeners
- Author
-
Jens-E. Appell, Dirk Oetting, Volker Hohmann, Birger Kollmeier, Stephan D. Ewert, and Publica
- Subjects
Adult ,Male ,Auditory perception ,medicine.medical_specialty ,Hearing Loss, Sensorineural ,Loudness Perception ,Speech recognition ,Monaural ,Audiology ,Loudness ,Young Adult ,03 medical and health sciences ,Hearing Aids ,0302 clinical medicine ,Narrowband ,medicine ,Humans ,Hearing Loss ,030223 otorhinolaryngology ,Aged ,Mathematics ,Dynamic range ,Hearing Tests ,Auditory Threshold ,Audiogram ,Loudness compensation ,Sensory Systems ,Acoustic Stimulation ,Auditory Perception ,Female ,Binaural recording ,Algorithms ,030217 neurology & neurosurgery - Abstract
Sensorineural hearing loss typically results in a steepened loudness function and a reduced dynamic range from elevated thresholds to uncomfortably loud levels for narrowband and broadband signals. Restoring narrowband loudness perception for hearing-impaired (HI) listeners can lead to overly loud perception of broadband signals and it is unclear how binaural presentation affects loudness perception in this case. Here, loudness perception quantified by categorical loudness scaling for nine normal-hearing (NH) and ten HI listeners was compared for signals with different bandwidth and different spectral shape in monaural and in binaural conditions. For the HI listeners, frequency- and level-dependent amplification was used to match the narrowband monaural loudness functions of the NH listeners. The average loudness functions for NH and HI listeners showed good agreement for monaural broadband signals. However, HI listeners showed substantially greater loudness for binaural broadband signals than NH listeners: on average a 14.1 dB lower level was required to reach "very loud" (range 30.8 to −3.7 dB). Overall, with narrowband loudness compensation, a given binaural loudness for broadband signals above "medium loud" was reached at systematically lower levels for HI than for NH listeners. Such increased binaural loudness summation was not found for loudness categories below "medium loud" or for narrowband signals. Large individual variations in the increased loudness summation were observed and could not be explained by the audiogram or the narrowband loudness functions.
- Published
- 2016
32. Interaural Coherence Preservation in Multi-Channel Wiener Filtering-Based Noise Reduction for Binaural Hearing Aids
- Author
-
Daniel Marquardt, Simon Doclo, and Volker Hohmann
- Subjects
Sound localization ,Acoustics and Ultrasonics ,Computer science ,Noise reduction ,Speech recognition ,Wiener filter ,Speech processing ,Speech enhancement ,Computational Mathematics ,symbols.namesake ,Noise ,Computer Science (miscellaneous) ,symbols ,Coherence (signal processing) ,Electrical and Electronic Engineering ,Binaural recording - Abstract
Besides noise reduction an important objective of binaural speech enhancement algorithms is the preservation of the binaural cues of all sound sources. To this end, an extension of the binaural multi-channel Wiener filter (MWF), namely the MWF-ITF, has been proposed, which aims to preserve the Interaural Transfer Function (ITF) of the noise sources. However, the MWF-ITF is well-suited only for directional noise sources but not for, e.g., spatially isotropic noise, whose spatial characteristics cannot be properly described by the ITF but rather by the Interaural Coherence (IC). Hence, another extension of the binaural MWF, namely the MWF-IC, has been recently proposed, which aims to preserve the IC of the noise component. Since for the MWF-IC a substantial tradeoff between noise reduction and IC preservation exists, in this paper we propose a perceptually constrained version of the MWF-IC, where the amount of IC preservation is controlled based on the IC discrimination ability of the human auditory system. In addition, a theoretical analysis of the binaural cue preservation capabilities of the binaural MWF and the MWF-ITF for spatially isotropic noise fields is provided. Several simulations in diffuse noise scenarios show that the perceptually constrained MWF-IC yields a controllable preservation of the IC without significantly degrading the output SNR compared to the binaural MWF and the MWF-ITF. Furthermore, contrary to the binaural MWF and MWF-ITF, the proposed algorithm retains the spatial separation between the output speech and noise components while the binaural cues of the speech component are only slightly distorted, such that the binaural hearing advantage for speech intelligibility can still be exploited.
- Published
- 2015
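Interaural coherence (IC), the quantity preserved by the MWF-IC above, is the magnitude of the normalized cross-power spectral density between the left and right signals. A Welch-based sketch of its estimation is given below; the segment length and test signals are arbitrary example choices.

```python
import numpy as np
from scipy.signal import csd, welch

def interaural_coherence(left, right, fs, nperseg=512):
    """Magnitude coherence |S_lr| / sqrt(S_ll * S_rr) per frequency bin."""
    f, S_lr = csd(left, right, fs=fs, nperseg=nperseg)
    _, S_ll = welch(left, fs=fs, nperseg=nperseg)
    _, S_rr = welch(right, fs=fs, nperseg=nperseg)
    return f, np.abs(S_lr) / np.sqrt(S_ll * S_rr + 1e-20)

fs = 16000
rng = np.random.default_rng(0)
# Independent left/right noise approximates a diffuse field at high
# frequencies and yields low coherence values.
noise_l, noise_r = rng.normal(size=fs), rng.normal(size=fs)
f, ic = interaural_coherence(noise_l, noise_r, fs)
print(ic.mean())
```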
33. Effects of directional hearing aid settings on different laboratory measures of spatial awareness perception
- Author
-
Giso Grimm, Volker Hohmann, Micha Lundbeck, Lars Bramsløw, and Tobias Neher
- Subjects
Hearing aid ,Signal processing ,Reverberation ,medicine.medical_specialty ,Hearing aids ,Hearing loss ,Computer science ,Ecological validity ,medicine.medical_treatment ,media_common.quotation_subject ,Wearable computer ,Audiology ,Auditory movement perception ,hearing loss ,spatial awareness perception ,hearing aid signal processing ,050105 experimental psychology ,Article ,03 medical and health sciences ,0302 clinical medicine ,Perception ,medicine ,otorhinolaryngologic diseases ,0501 psychology and cognitive sciences ,030223 otorhinolaryngology ,Spatial awareness ,media_common ,Spatial contextual awareness ,05 social sciences ,lcsh:Otorhinolaryngology ,lcsh:RF1-547 ,Otorhinolaryngology ,Loudspeaker ,medicine.symptom - Abstract
Hearing loss can negatively influence the spatial hearing abilities of hearing-impaired listeners, not only in static but also in dynamic auditory environments. Therefore, ways of addressing these deficits with advanced hearing aid algorithms need to be investigated. In a previous study based on virtual acoustics and a computer simulation of different bilateral hearing aid fittings, we investigated auditory source movement detectability in older hearing-impaired (OHI) listeners. We found that two directional processing algorithms could substantially improve the detectability of left-right and near-far source movements in the presence of reverberation and multiple interfering sounds. In the current study, we carried out similar measurements with a loudspeaker-based setup and wearable hearing aids. We fitted a group of 15 OHI listeners with bilateral behind-the-ear devices that were programmed to have three different directional processing settings. Apart from source movement detectability, we assessed two other aspects of spatial awareness perception. Using a street scene with up to five environmental sound sources, the participants had to count the number of presented sources or to indicate the movement direction of a single target signal. The data analyses showed a clear influence of the number of concurrent sound sources and the starting position of the moving target signal on the participants’ performance, but no influence of the different hearing aid settings. Complementary artificial head recordings showed that the acoustic differences between the three hearing aid settings were rather small. Another explanation for the lack of effects of the tested hearing aid settings could be that the simulated street scenario was not sufficiently sensitive. Possible ways of improving the sensitivity of the laboratory measures while maintaining high ecological validity and complexity are discussed.
- Published
- 2018
34. Towards Realistic Immersive Audiovisual Simulations for Hearing Research
- Author
-
Volker Hohmann, Gerard Llorach, Giso Grimm, and Maartje M. E. Hendrikse
- Subjects
Hearing aid ,Ambisonics ,Computer science ,medicine.medical_treatment ,Virtual Reality ,Hearing Research ,Hearing research ,Virtual reality ,01 natural sciences ,Computer graphics ,03 medical and health sciences ,0302 clinical medicine ,Human–computer interaction ,0103 physical sciences ,medicine ,Use case ,Audiovisual Reproduction ,030223 otorhinolaryngology ,010301 acoustics ,Audiovisual Capture - Abstract
Most current hearing research laboratories and hearing aid evaluation setups are not sufficient to simulate real-life situations and to evaluate future generations of hearing aids that might include gaze information and brain signals. Thus, new methodologies and technologies might need to be implemented in hearing laboratories and clinics in order to generate audiovisually realistic testing environments. The aim of this work is to provide a comprehensive review of the currently available approaches and future directions for creating audiovisually realistic immersive simulations for hearing research. Additionally, we present the technologies and use cases of our laboratory, as well as the pros and cons of such technologies: from creating 3D virtual simulations with computer graphics and virtual acoustic simulations, to 360° videos and Ambisonic recordings.
- Published
- 2018
35. Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters
- Author
-
Volker Hohmann, Gerard Llorach, Maartje M. E. Hendrikse, and Giso Grimm
- Subjects
FOS: Computer and information sciences ,Hearing aid ,Linguistics and Language ,Hearing aids ,Visual perception ,medicine.medical_treatment ,media_common.quotation_subject ,Computer Science - Human-Computer Interaction ,FOS: Physical sciences ,01 natural sciences ,Language and Linguistics ,Human-Computer Interaction (cs.HC) ,03 medical and health sciences ,0302 clinical medicine ,Perception ,0103 physical sciences ,otorhinolaryngologic diseases ,medicine ,Active listening ,030223 otorhinolaryngology ,Evaluation ,010301 acoustics ,Sensory cue ,Animations ,media_common ,Audiovisual environments ,Communication ,Head- and eye movement ,Eye movement ,Animation ,Physics - Medical Physics ,Gaze ,Computer Science Applications ,Modeling and Simulation ,Medical Physics (physics.med-ph) ,Computer Vision and Pattern Recognition ,Psychology ,Software ,Cognitive psychology - Abstract
Accepted manuscript, published in Speech Communication (July 2018). Recent studies of hearing aid benefits indicate that head movement behavior influences performance. To systematically assess these effects, movement behavior must be measured in realistic communication conditions. For this, the use of virtual audiovisual environments with animated characters as visual stimuli has been proposed. It is unclear, however, how these animations influence the head- and eye-movement behavior of subjects. Here, two listening tasks were carried out with a group of 14 young normal hearing subjects to investigate the influence of visual cues on head- and eye-movement behavior; on combined localization and speech intelligibility task performance; as well as on perceived speech intelligibility, perceived listening effort and the general impression of the audiovisual environments. Animated characters with different lip-syncing and gaze patterns were compared to an audio-only condition and to a video of real persons. Results show that movement behavior, task performance, and perception were all influenced by visual cues. The movement behavior of young normal hearing listeners in animation conditions with lip-syncing was similar to that in the video condition. These results in young normal hearing listeners are a first step towards using the animated characters to assess the influence of head movement behavior on hearing aid performance. This study was funded by DFG research unit FOR1732 "Hearing Acoustics" and by European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 675324 (ENRICH).
- Published
- 2018
- Full Text
- View/download PDF
36. Evaluation of Spatial Audio Reproduction Schemes for Application in Hearing Aid Research
- Author
-
Giso Grimm, Volker Hohmann, and Stephan D. Ewert
- Subjects
FOS: Computer and information sciences ,Hearing aid ,Sound (cs.SD) ,Class (computer programming) ,Acoustics and Ultrasonics ,Computer science ,Ambisonics ,Speech recognition ,media_common.quotation_subject ,medicine.medical_treatment ,Amplitude panning ,Computer Science - Sound ,Perception ,medicine ,Loudspeaker ,Image resolution ,Music ,media_common ,Communication channel - Abstract
Loudspeaker-based spatial audio reproduction schemes are increasingly used for evaluating hearing aids in complex acoustic conditions. To further establish the feasibility of this approach, this study investigated the interaction between spatial resolution of different reproduction methods and technical and perceptual hearing aid performance measures using computer simulations. Three spatial audio reproduction methods -- discrete speakers, vector base amplitude panning and higher order ambisonics -- were compared in regular circular loudspeaker arrays with 4 to 72 channels. The influence of reproduction method and array size on performance measures of representative multi-microphone hearing aid algorithm classes with spatially distributed microphones and a representative single channel noise-reduction algorithm was analyzed. Algorithm classes differed in their way of analyzing and exploiting spatial properties of the sound field, requiring different accuracy of sound field reproduction. Performance measures included beam pattern analysis, signal-to-noise ratio analysis, perceptual localization prediction, and quality modeling. The results show performance differences and interaction effects between reproduction method and algorithm class that may be used for guidance when selecting the appropriate method and number of speakers for specific tasks in hearing aid research. The archived file is not the final published version of the article Evaluation of spatial audio reproduction schemes for application in hearing aid research, in Acta Acustica united with Acustica, volume 101, 2015, pp. 842-854(13). The definitive publisher-authenticated version is available online at http://www.ingentaconnect.com/content/dav/aaua. http://dx.doi.org/10.3813/AAA.918878
- Published
- 2015
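To illustrate one of the reproduction methods named in the entry above, the following is a minimal Python sketch of the textbook pairwise two-dimensional vector base amplitude panning (VBAP) gain computation. It is not the simulation code used in the study; the eight-channel circular array and the 15-degree source azimuth are hypothetical example values.

import numpy as np

def vbap_2d_gains(source_az_deg, spk_az_deg):
    """Pairwise 2-D VBAP: per-loudspeaker gains for one source direction."""
    src = np.array([np.cos(np.radians(source_az_deg)),
                    np.sin(np.radians(source_az_deg))])
    spk = np.radians(np.asarray(spk_az_deg, dtype=float))
    gains = np.zeros(len(spk))
    order = np.argsort(spk)                       # loudspeakers sorted by azimuth
    for i in range(len(order)):
        a, b = order[i], order[(i + 1) % len(order)]
        # Columns are the unit vectors of the two candidate loudspeakers
        L = np.array([[np.cos(spk[a]), np.cos(spk[b])],
                      [np.sin(spk[a]), np.sin(spk[b])]])
        g = np.linalg.solve(L, src)               # g_a * l_a + g_b * l_b = src
        if np.all(g >= -1e-9):                    # source lies between this pair
            g = np.clip(g, 0.0, None)
            g /= np.linalg.norm(g)                # power normalization
            gains[a], gains[b] = g
            return gains
    return gains

# Hypothetical 8-channel circular array, source at 15 degrees azimuth
print(vbap_2d_gains(15.0, np.arange(0, 360, 45)))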
37. The influence of pause, attack, and decay duration of the ongoing envelope on sound lateralization
- Author
-
Martin Klein-Hennig, Mathias Dietz, and Volker Hohmann
- Subjects
Adult ,Male ,medicine.medical_specialty ,Time Factors ,Acoustics and Ultrasonics ,Acoustics ,media_common.quotation_subject ,Audiology ,Functional Laterality ,Lateralization of brain function ,Motion ,Young Adult ,Arts and Humanities (miscellaneous) ,medicine ,Humans ,Contrast (vision) ,Sound Localization ,Mathematics ,Envelope (waves) ,media_common ,Auditory Threshold ,Sound ,Acoustic Stimulation ,Duration (music) ,Audiometry, Pure-Tone ,Female ,Psychoacoustics - Abstract
Klein-Hennig et al. [J. Acoust. Soc. Am. 129, 3856-3872 (2011)] introduced a class of high-frequency stimuli for which the envelope shape can be altered by independently varying the attack, hold, decay, and pause durations. These stimuli, originally employed for testing the shape dependence of human listeners' sensitivity to interaural temporal differences (ITDs) in the ongoing envelope, were used to measure the lateralization produced by fixed interaural disparities. Consistent with the threshold ITD data, a steep attack and a non-zero pause facilitate strong ITD-based lateralization. In contrast, those conditions resulted in the smallest interaural level-based lateralization.
- Published
- 2015
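The entry above describes envelopes built from independent attack, hold, decay, and pause segments, with interaural time differences applied only to the ongoing envelope. The Python sketch below illustrates that stimulus construction in a generic way; the raised-cosine flank shape, the 4-kHz carrier, the segment durations, and the 600-microsecond envelope ITD are illustrative assumptions, not the parameters of the study.

import numpy as np

fs = 48000  # sampling rate in Hz (illustrative)

def envelope_cycle(attack_ms, hold_ms, decay_ms, pause_ms):
    """One period of an ongoing envelope with independent attack/hold/decay/pause segments."""
    n = lambda ms: int(round(ms * 1e-3 * fs))
    attack = 0.5 - 0.5 * np.cos(np.pi * np.linspace(0.0, 1.0, n(attack_ms), endpoint=False))
    hold = np.ones(n(hold_ms))
    decay = 0.5 + 0.5 * np.cos(np.pi * np.linspace(0.0, 1.0, n(decay_ms), endpoint=False))
    pause = np.zeros(n(pause_ms))
    return np.concatenate([attack, hold, decay, pause])

def binaural_stimulus(carrier_hz, env_cycle, n_cycles, env_itd_us=0.0):
    """High-frequency carrier with an ITD imposed on the ongoing envelope only (diotic carrier)."""
    env_left = np.tile(env_cycle, n_cycles)
    shift = int(round(env_itd_us * 1e-6 * fs))
    env_right = np.roll(env_left, shift)  # circular shift; adequate for an ongoing periodic envelope
    t = np.arange(len(env_left)) / fs
    carrier = np.sin(2 * np.pi * carrier_hz * t)
    return env_left * carrier, env_right * carrier

# Illustrative values only: 4-kHz carrier, 12.5-ms segments, 600-us envelope ITD
left, right = binaural_stimulus(4000.0, envelope_cycle(12.5, 12.5, 12.5, 12.5), 20, 600.0)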
38. Sparse periodicity-based auditory features explain human performance in a spatial multitalker auditory scene analysis task
- Author
-
Volker Hohmann, Esther Schoenmaker, Angela Josupeit, and Steven van de Par
- Subjects
Auditory scene analysis ,Computer science ,Speech recognition ,Stimulus (physiology) ,Intelligibility (communication) ,Signal-To-Noise Ratio ,03 medical and health sciences ,0302 clinical medicine ,medicine ,Auditory system ,Humans ,Psychoacoustics ,030304 developmental biology ,0303 health sciences ,General Neuroscience ,Speech Intelligibility ,Novelty ,Auditory Threshold ,Neurophysiology ,medicine.anatomical_structure ,Sound ,Acoustic Stimulation ,Salient ,Speech Perception ,Perceptual Masking ,030217 neurology & neurosurgery - Abstract
Human listeners robustly decode speech information from a talker of interest that is embedded in a mixture of spatially distributed interferers. A relevant question is which time-frequency segments of the speech are predominantly used by a listener to solve such a complex Auditory Scene Analysis task. A recent psychoacoustic study investigated the relevance of low signal-to-noise ratio (SNR) components of a target signal on speech intelligibility in a spatial multitalker situation. For this, a three-talker stimulus was manipulated in the spectro-temporal domain such that target speech time-frequency units below a variable SNR threshold (SNRcrit) were discarded while keeping the interferers unchanged. The psychoacoustic data indicate that only target components at and above a local SNR of about 0 dB contribute to intelligibility. This study applies an auditory scene analysis "glimpsing" model to the same manipulated stimuli. Model data are found to be similar to the human data, supporting the notion of "glimpsing," that is, that salient speech-related information is predominantly used by the auditory system to decode speech embedded in a mixture of sounds, at least for the tested conditions of three overlapping speech signals. This implies that perceptually relevant auditory information is sparse and may be processed with low computational effort, which is relevant for neurophysiological research of scene analysis and novelty processing in the auditory system.
- Published
- 2017
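The stimulus manipulation described in the entry above discards target time-frequency units whose local SNR falls below a criterion. A minimal Python sketch of such a selection on STFT magnitudes is given below; the STFT parameters, the 0-dB criterion, and the white-noise placeholder signals are assumptions for illustration only, not the study's stimuli or model.

import numpy as np
from scipy.signal import stft

def glimpse_mask(target, interferers, fs, snr_crit_db=0.0, nperseg=512):
    """Binary time-frequency mask keeping target units at or above a local SNR criterion."""
    _, _, T = stft(target, fs, nperseg=nperseg)
    _, _, I = stft(interferers, fs, nperseg=nperseg)
    local_snr_db = 20 * np.log10((np.abs(T) + 1e-12) / (np.abs(I) + 1e-12))
    return local_snr_db >= snr_crit_db

# Illustrative use with white-noise placeholders for target and summed interferers
fs = 16000
rng = np.random.default_rng(0)
target = rng.standard_normal(fs)
interferers = rng.standard_normal(fs)
mask = glimpse_mask(target, interferers, fs)
print("fraction of target units retained:", mask.mean())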
39. Influence of multi-microphone signal enhancement algorithms on auditory movement detection in acoustically complex situations
- Author
-
Micha Lundbeck, Laura Hartog, Giso Grimm, Volker Hohmann, Lars Bramsløw, Tobias Neher, Santurette, S., Dau, T., Dalsgaard, J. C., Tranebjærg, L., Andersen, T., and Poulsen, T.
- Abstract
The influence of hearing aid (HA) signal processing on the perception of spatially dynamic sounds has not been systematically investigated so far. Previously, we observed that for elderly hearing-impaired (EHI) listeners concurrent distractor sounds impaired the detectability of left-right source movements, and reverberation that of near-far source movements (Lundbeck et al., Trends Hear 2017). Here, we explored potential ways of improving these deficits with HAs. To that end, we carried out detailed acoustic analyses on the stimuli used previously to examine the impact of two beamforming algorithms and a binaural coherence-based noise reduction scheme on the cues underlying movement perception. While the binaural cues remained mostly unchanged, the applied processing led to greater monaural spectral changes, as well as increases in signal-to-noise ratio and direct-to-reverberant sound ratio. Based on these findings, we conducted a listening test with 20 EHI listeners. Specifically, we performed aided measurements of movement detectability with three different processing conditions in two acoustic scenarios. Our results indicate that, for both movement dimensions, the applied processing could partly restore source movement detection in the presence of concurrent distractor sounds.
- Published
- 2017
40. Perceptual Consequences of Different Signal Changes Due to Binaural Noise Reduction
- Author
-
Giso Grimm, Volker Hohmann, and Tobias Neher
- Subjects
medicine.medical_specialty ,Signal Detection, Psychological ,Hearing loss ,Hearing Loss, Sensorineural ,Noise reduction ,media_common.quotation_subject ,Signal-To-Noise Ratio ,Audiology ,Stimulus (physiology) ,Background noise ,Speech and Hearing ,Hearing Aids ,Perception ,medicine ,Humans ,Aged ,media_common ,Aged, 80 and over ,Working memory ,Noise attenuation ,Middle Aged ,Memory, Short-Term ,Otorhinolaryngology ,Speech Perception ,Audiometry, Pure-Tone ,medicine.symptom ,Psychology ,Binaural recording - Abstract
OBJECTIVES In a previous study, Neher et al. (2014) investigated whether pure-tone average (PTA) hearing loss and working memory capacity (WMC) modulate benefit from different binaural noise reduction (NR) settings. Results showed that listeners with smaller WMC preferred strong over moderate NR even at the expense of poorer speech recognition due to greater speech distortion (SD), whereas listeners with larger WMC did not. To enable a better understanding of these findings, the main aims of the present study were (1) to explore the perceptual consequences of changes to the signal mixture, target speech, and background noise caused by binaural NR, and (2) to determine whether response to these changes varies with WMC and PTA. DESIGN As in the previous study, four age-matched groups of elderly listeners (with N = 10 per group) characterized by either mild or moderate PTAs and either better or worse performance on a visual measure of WMC participated. Five processing conditions were tested, which were based on the previously used (binaural coherence-based) NR scheme designed to attenuate diffuse signal components at mid to high frequencies. The five conditions differed in terms of the type of processing that was applied (no NR, strong NR, or strong NR with restoration of the long-term stimulus spectrum) and in terms of whether the target speech and background noise were processed in the same manner or whether one signal was left unprocessed while the other signal was processed with the gains computed for the signal mixture. Comparison across these conditions allowed assessing the effects of changes in high-frequency audibility (HFA), SD, and noise attenuation and distortion (NAD). Outcome measures included a dual-task paradigm combining speech recognition with a visual reaction time (VRT) task as well as ratings of perceived effort and overall preference. All measurements were carried out using headphone simulations of a frontal target speaker in a busy cafeteria. RESULTS Relative to no NR, strong NR was found to impair speech recognition and VRT performance slightly and to improve perceived effort and overall preference markedly. Relative to strong NR, strong NR with restoration of the long-term stimulus spectrum and thus HFA did not affect speech recognition, restored VRT performance to that achievable with no NR, and increased perceived effort and reduced overall preference markedly. SD had negative effects on speech recognition and perceived effort, particularly when both speech and noise were processed with the gains computed for the signal mixture. NAD had positive effects on speech recognition, perceived effort, and overall preference, particularly when the target speech was left unprocessed. VRT performance was unaffected by SD and NAD. None of the datasets exhibited any clear signs that response to the different signal changes varies with PTA or WMC. CONCLUSIONS For the outcome measures and stimuli applied here, the present study provides little evidence that PTA or WMC affect response to changes in HFA, SD, and NAD caused by binaural NR. However, statistical power restrictions suggest further research is needed. This research should also investigate whether partial HFA restoration combined with some pre-processing that reduces co-modulation distortion results in a more favorable balance of the effects of binaural NR across outcome dimensions and whether NR strength has any influence on these results.
- Published
- 2014
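The binaural coherence-based noise reduction referred to in this and the following entry attenuates diffuse signal components at mid to high frequencies. The Python sketch below shows the general idea of deriving a gain from short-time interaural coherence; the recursive smoothing constant, the 1.5-kHz lower frequency limit, and the use of the raw coherence as the gain are assumptions and do not reproduce the actual algorithm or its parameters.

import numpy as np
from scipy.signal import stft, istft

def coherence_noise_reduction(left, right, fs, nperseg=256, alpha=0.9, f_lo=1500.0):
    """Attenuate low-coherence (diffuse) components above f_lo in a binaural signal pair.

    Auto- and cross-spectra are smoothed recursively over time; the magnitude of the
    resulting short-time interaural coherence is used as a gain above f_lo, while
    frequencies below f_lo are passed unchanged.
    """
    f, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    s_ll = np.zeros(L.shape[0])
    s_rr = np.zeros(L.shape[0])
    s_lr = np.zeros(L.shape[0], dtype=complex)
    gains = np.ones(L.shape, dtype=float)
    for m in range(L.shape[1]):
        s_ll = alpha * s_ll + (1 - alpha) * np.abs(L[:, m]) ** 2
        s_rr = alpha * s_rr + (1 - alpha) * np.abs(R[:, m]) ** 2
        s_lr = alpha * s_lr + (1 - alpha) * L[:, m] * np.conj(R[:, m])
        coherence = np.clip(np.abs(s_lr) / np.sqrt(s_ll * s_rr + 1e-12), 0.0, 1.0)
        gains[:, m] = np.where(f >= f_lo, coherence, 1.0)
    _, out_left = istft(L * gains, fs, nperseg=nperseg)
    _, out_right = istft(R * gains, fs, nperseg=nperseg)
    return out_left, out_right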
41. Do Hearing Loss and Cognitive Function Modulate Benefit From Different Binaural Noise-Reduction Settings?
- Author
-
Volker Hohmann, Giso Grimm, Birger Kollmeier, and Tobias Neher
- Subjects
Male ,Hearing aid ,medicine.medical_specialty ,Speech perception ,Hearing loss ,Hearing Loss, Sensorineural ,medicine.medical_treatment ,Signal-To-Noise Ratio ,Intelligibility (communication) ,Audiology ,Speech and Hearing ,Cognition ,Hearing Aids ,Reaction Time ,otorhinolaryngologic diseases ,medicine ,Humans ,Aged ,Aged, 80 and over ,medicine.diagnostic_test ,Working memory ,Middle Aged ,Memory, Short-Term ,Treatment Outcome ,Otorhinolaryngology ,Pattern Recognition, Physiological ,Speech Perception ,Audiometry, Pure-Tone ,Female ,medicine.symptom ,Audiometry ,Psychology ,Binaural recording ,Algorithms - Abstract
Objectives Although previous research indicates that cognitive skills influence benefit from different types of hearing aid algorithms, comparatively little is known about the role of, and potential interaction with, hearing loss. This holds true especially for noise reduction (NR) processing. The purpose of the present study was thus to explore whether degree of hearing loss and cognitive function modulate benefit from different binaural NR settings based on measures of speech intelligibility, listening effort, and overall preference. Design Forty elderly listeners with symmetrical sensorineural hearing losses in the mild to severe range participated. They were stratified into four age-matched groups (with n = 10 per group) based on their pure-tone average hearing losses and their performance on a visual measure of working memory (WM) capacity. The algorithm under consideration was a binaural coherence-based NR scheme that suppressed reverberant signal components as well as diffuse background noise at mid to high frequencies. The strength of the applied processing was varied from inactive to strong, and testing was carried out across a range of fixed signal-to-noise ratios (SNRs). Potential benefit was assessed using a dual-task paradigm combining speech recognition with a visual reaction time (VRT) task indexing listening effort. Pairwise preference judgments were also collected. All measurements were made using headphone simulations of a frontal speech target in a busy cafeteria. Test-retest data were gathered for all outcome measures. Results Analysis of the test-retest data showed all data sets to be reliable. Analysis of the speech scores showed that, for all groups, speech recognition was unaffected by moderate NR processing, whereas strong NR processing reduced intelligibility by about 5%. Analysis of the VRT scores revealed a similar data pattern. That is, while moderate NR did not affect VRT performance, strong NR impaired the performance of all groups slightly. Analysis of the preference scores collapsed across SNR showed that all groups preferred some over no NR processing. Furthermore, the two groups with smaller WM capacity preferred strong over moderate NR processing; for the two groups with larger WM capacity, preference did not differ significantly between the moderate and strong settings. Conclusions The present study demonstrates that, for the algorithm and the measures of speech recognition and listening effort used here, the effects of different NR settings interact with neither degree of hearing loss nor WM capacity. However, preferred NR strength was found to be associated with smaller WM capacity, suggesting that hearing aid users with poorer cognitive function may prefer greater noise attenuation even at the expense of poorer speech intelligibility. Further research is required to enable a more detailed (SNR-dependent) analysis of this effect and to test its wider applicability.
- Published
- 2014
42. Interactive rendering of dynamic virtual audio-visual environments for 'subject-in-the-loop' experiments
- Author
-
Giso Grimm, Volker Hohmann, and Maartje M. E. Hendrikse
- Subjects
Acoustics and Ultrasonics ,Arts and Humanities (miscellaneous) ,Computer science ,Computer graphics (images) ,Audio visual ,Rendering (computer graphics) - Published
- 2019
43. A high-fidelity multi-channel portable platform for development of novel algorithms for assistive listening wearables
- Author
-
Andy P. Atamaniuk, Chaslav V. Pavlovic, Volker Hohmann, Hendrik Kayser, Reza Kassayan, and S. R. Prakash
- Subjects
Hearing aid ,Signal processing ,Acoustics and Ultrasonics ,Computer science ,Headset ,medicine.medical_treatment ,Wearable computer ,Hearing apparatus ,Arts and Humanities (miscellaneous) ,Human–computer interaction ,medicine ,Codec ,Active listening ,Set (psychology) ,Binaural recording - Abstract
The NIDCD has funded a number of projects to develop portable signal processing tools that enable real-time processing of the acoustic environment. The overarching goal is to provide a large group of researchers with the means to efficiently develop and evaluate novel signal processing schemes, individualized fitting procedures, and technical solutions and services for hearing apparatus such as hearing aids and assistive listening devices. We report here on a development done in the SBIR Phase II Project R44DC016247. This project builds on the software being concurrently developed in R01DC015429 to provide a complete portable and wearable software-hardware master hearing aid device needed for development of new solutions for assisted hearing. We will present and demonstrate the portable platform, currently in the Beta launch, that consists of a Cortex A8 based processing unit and a codec set able to support hearing aid architecture of up to 6 microphones and 4 speakers. It is currently accompanied by a binaural 2-microphone BTE hearing aid set, but will also support different headset form-factors of our partners. Additionally it features stereo line in and line out connections. The device can be remotely controlled with a smart phone.
- Published
- 2019
44. Open Master Hearing Aid (openMHA)—An integrated platform for hearing aid research
- Author
-
Hendrik Kayser, Volker Hohmann, Tobias Herzke, Chaslav V. Pavlovic, and Paul Maanen
- Subjects
Hearing aid ,Acoustics and Ultrasonics ,Computer science ,business.industry ,medicine.medical_treatment ,020207 software engineering ,02 engineering and technology ,Hearing research ,computer.software_genre ,Arts and Humanities (miscellaneous) ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Audio signal processing ,Set (psychology) ,Software engineering ,business ,computer - Abstract
The project R01DC015429 "Open community platform for hearing aid algorithm research" provides a software platform for real-time, low-latency audio signal processing: the open Master Hearing Aid (openMHA). It contains a versatile set of basic and advanced methods for hearing aid processing, as well as tools and manuals enabling the design of custom setups for algorithm development and evaluation. Documentation is provided for different user levels, in particular for audiologists, application engineers and algorithm designers. The software runs on various computer systems including lab setups and portable setups. Portable setups are of particular interest for the evaluation of new methods in real-world scenarios. In addition to standard off-the-shelf hardware, a portable, integrated research platform for openMHA is provided in conjunction with the SBIR project R44DC016247. This contribution introduces openMHA and discusses the usage and possible application scenarios of the portable openMHA setup in hearing research. The opportunity is given to try a smartphone-based self-fitting application for the portable openMHA, and to learn about the flexible configuration and remote control of openMHA running a typical hearing aid processing chain. Furthermore, a discussion and exchange of ideas on current challenges and future developments is offered.
- Published
- 2019
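The entry above presents openMHA as a platform for running typical hearing aid processing chains in real time. As a rough illustration of the kind of low-latency processing such a chain contains, here is a toy broadband dynamic-range compressor in Python; it is not openMHA code, and the threshold, compression ratio, and time constants are arbitrary example values rather than settings from the platform.

import numpy as np

def simple_compressor(x, fs, threshold_db=-40.0, ratio=3.0, attack_ms=5.0, release_ms=50.0):
    """Toy broadband dynamic-range compressor: peak-level tracking plus a static gain curve."""
    att = np.exp(-1.0 / (attack_ms * 1e-3 * fs))
    rel = np.exp(-1.0 / (release_ms * 1e-3 * fs))
    level = 1e-6
    out = np.empty_like(x)
    for n, sample in enumerate(x):
        peak = abs(sample)
        coeff = att if peak > level else rel      # fast attack, slow release
        level = coeff * level + (1 - coeff) * peak
        level_db = 20 * np.log10(level + 1e-12)
        gain_db = 0.0
        if level_db > threshold_db:               # compress only above threshold
            gain_db = (threshold_db - level_db) * (1 - 1 / ratio)
        out[n] = sample * 10 ** (gain_db / 20)
    return out

# Illustrative use on a 1-kHz tone with a level step after 0.5 s
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) * np.where(t < 0.5, 0.05, 0.5)
y = simple_compressor(x, fs)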
45. Relation between loudness in categorical units and loudness in phons and sones
- Author
-
Jens-E. Appell, Volker Hohmann, Jesko L. Verhey, and Wiebke Heeren
- Subjects
medicine.medical_specialty ,Acoustics and Ultrasonics ,Grey noise ,Loudness Perception ,Acoustics ,Auditory Threshold ,Signal Processing, Computer-Assisted ,Models, Theoretical ,Audiology ,Loudness ,Sound ,Acoustic Stimulation ,Audiometry ,Arts and Humanities (miscellaneous) ,Phon ,Pressure ,medicine ,Humans ,Narrowband noise ,Sound pressure ,Sone ,Categorical variable ,Mathematics - Abstract
Data are presented on the relation between loudness measured in categorical units (CUs) using a standardized loudness scaling method (ISO 16832, 2006) and loudness expressed as the classical standardized measures phon and sone. Based on loudness scaling of narrowband noise signals by 31 normal-hearing subjects, sound pressure levels eliciting the same categorical loudness were derived for various center frequencies. The results were comparable to the standardized equal-loudness level contours. A comparison between the loudness function in CUs at 1000 Hz and the standardized loudness function in sones indicates a cubic relation between the two loudness measures.
- Published
- 2013
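For orientation, the quantities related in the entry above can be summarized as follows (in LaTeX notation). The first two lines are the standard definitions of the phon and the sone; the last line expresses the cubic relation between loudness in sones and loudness in categorical units reported in the abstract, where the constant c is a hypothetical fitted parameter rather than a value from the paper.

L_N\,[\mathrm{phon}] = L_p\,[\mathrm{dB\ SPL}] \quad \text{for a 1 kHz pure tone}

N\,[\mathrm{sone}] = 2^{(L_N - 40)/10} \quad \text{for } L_N \ge 40\ \text{phon}

N \approx c \cdot \mathrm{CU}^{3} \quad \text{(cubic relation at 1 kHz; } c \text{ hypothetical)}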
46. Comparing the effect of pause duration on threshold interaural time differences between exponential and squared-sine envelopes (L)
- Author
-
Torben Wendt, Bernhard Laback, Stephan D. Ewert, Mathias Dietz, and Volker Hohmann
- Subjects
Adult ,Time Factors ,Acoustics and Ultrasonics ,Acoustics ,Auditory Threshold ,Ear ,Stimulus (physiology) ,Functional Laterality ,Exponential function ,Amplitude modulation ,Young Adult ,Acoustic Stimulation ,Audiometry ,Arts and Humanities (miscellaneous) ,Auditory Perception ,Humans ,Sine ,Cues ,psychological phenomena and processes ,Psychoacoustics ,Mathematics - Abstract
Recently two studies [Klein-Hennig et al., J. Acoust. Soc. Am. 129, 3856-3872 (2011); Laback et al., J. Acoust. Soc. Am. 130, 1515-1529 (2011)] independently investigated the isolated effect of pause duration on sensitivity to interaural time differences (ITD) in the ongoing stimulus envelope. The steepness of the threshold ITD as a function of pause duration functions differed considerably across studies. The present study, using matched carrier and modulation frequencies, directly compared threshold ITDs for the two envelope flank shapes from those studies. The results agree well when defining the metric of pause duration based on modulation depth sensitivity.
- Published
- 2013
47. Virtual acoustic environments for comprehensive evaluation of model-based hearing devices
- Author
-
Joanna Luberadzka, Volker Hohmann, and Giso Grimm
- Subjects
Hearing aid ,Linguistics and Language ,medicine.medical_specialty ,Computer science ,medicine.medical_treatment ,Audiology ,computer.software_genre ,01 natural sciences ,Language and Linguistics ,Rendering (computer graphics) ,03 medical and health sciences ,Speech and Hearing ,0302 clinical medicine ,Software ,Hearing Aids ,Hearing ,Human–computer interaction ,0103 physical sciences ,Evaluation methods ,Materials Testing ,medicine ,Humans ,Computer Simulation ,Correction of Hearing Impairment ,030223 otorhinolaryngology ,Hearing Loss ,010301 acoustics ,Multimedia ,business.industry ,Hearing Tests ,Virtual Reality ,Acoustics ,Equipment Design ,Models, Theoretical ,Environment, Controlled ,Toolbox ,Persons With Hearing Impairments ,Acoustic Stimulation ,Auditory Perception ,business ,Software architecture ,Noise ,computer ,Perceptual Masking ,Simulation methods ,Psychoacoustics - Abstract
Objective: Create virtual acoustic environments (VAEs) with interactive dynamic rendering for applications in audiology. Design: A toolbox for creation and rendering of dynamic virtual acoustic environments (TASCAR) that allows direct user interaction was developed for application in hearing aid research and audiology. The software architecture and the simulation methods used to produce VAEs are outlined. Example environments are described and analysed. Conclusion: With the proposed software, a tool for simulation of VAEs is available. A set of VAEs rendered with the proposed software was described.
- Published
- 2016
48. Online estimation of inter-channel phase differences using non-negative matrix factorization
- Author
-
Kamil Adiloglu, Graham Coleman, Volker Hohmann, and Hendrik Kayser
- Subjects
business.industry ,Computer science ,Phase (waves) ,Pattern recognition ,02 engineering and technology ,Measure (mathematics) ,Term (time) ,Non-negative matrix factorization ,Matrix decomposition ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,Source separation ,020201 artificial intelligence & image processing ,Artificial intelligence ,0305 other medical science ,business ,Linear phase ,Communication channel - Abstract
Estimating non-linearities in phase differences between channel pairs of a multi-channel audio recording in a reverberant environment provides more precise spatial information that yields direct improvement in signal enhancement, as we show for the case of source separation. In this study, we propose an online method for estimating inter-channel phase differences (IPDs) that do not linearly depend on frequency. For this task, we use short term cross-correlation features between the input channels and extract the non-linear IPDs as well as a measure of activation for each source using non-negative matrix factorization. Our evaluation shows that the proposed method outperforms a state-of-the-art approach based on linear phase differences by 9.4% relative improvement in signal-to-interference ratio on average. Furthermore, increasing the window length of temporal context used for the decomposition increases the source separation accuracy, which converges to the accuracy of the offline method.
- Published
- 2016
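The entry above factorizes short-term cross-correlation features with non-negative matrix factorization (NMF) to obtain per-source spatial patterns and activations. The Python sketch below is a simplified stand-in for that idea using PHAT-weighted cross-correlation magnitudes and scikit-learn's NMF; the feature definition, frame length, number of components, and the white-noise example input are assumptions and not the method evaluated in the paper.

import numpy as np
from scipy.signal import stft
from sklearn.decomposition import NMF

def crosscorr_features(left, right, fs, nperseg=512):
    """Nonnegative (lag x frame) matrix of short-term generalized cross-correlation magnitudes."""
    _, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    cross = L * np.conj(R)
    cross /= np.abs(cross) + 1e-12                # PHAT-style whitening of the cross-spectrum
    return np.abs(np.fft.irfft(cross, axis=0))    # back to the lag domain, one column per frame

def separate_spatial_patterns(features, n_sources=2):
    """Factorize features into per-source lag patterns (W) and frame-wise activations (H)."""
    model = NMF(n_components=n_sources, init='nndsvda', max_iter=500)
    W = model.fit_transform(features)             # (lag x source) spatial patterns
    H = model.components_                         # (source x frame) activations
    return W, H

# Illustrative two-channel noise input; real use would take a reverberant multi-source recording
fs = 16000
rng = np.random.default_rng(0)
left = rng.standard_normal(fs)
right = rng.standard_normal(fs)
W, H = separate_spatial_patterns(crosscorr_features(left, right, fs))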
49. Web-Based Live Speech-Driven Lip-Sync
- Author
-
Volker Hohmann, Alun Evans, Giso Grimm, Gerard Llorach, and Josep Blat
- Subjects
business.industry ,Computer science ,Speech recognition ,Feature extraction ,020207 software engineering ,02 engineering and technology ,Visual speech synthesis ,computer.software_genre ,Metaverse ,Visualization ,Set (abstract data type) ,Lip sync ,0202 electrical engineering, electronic engineering, information engineering ,Virtual characters ,Web application ,020201 artificial intelligence & image processing ,Lip synchronization ,business ,Audio signal processing ,computer ,Vocal tract - Abstract
Paper presented at the 8th International Conference on Games and Virtual Worlds for Serious Applications (VS-Games), held in Barcelona, 7-9 September 2016. Virtual characters are an integral part of many games and virtual worlds. The ability to accurately synchronize lip movement to audio speech is an important aspect in the believability of the character. In this paper we propose a simple rule-based lip-syncing algorithm for virtual agents using the web browser. It works in real-time with live input, unlike most current lip-syncing proposals, which may require considerable amounts of computation, expertise and time to set up. Our method generates reliable speech animation based on live speech using three blend shapes and no training, and it only needs manual adjustment of three parameters for each speaker (sensitivity, smoothness and vocal tract length). Our proposal is based on the limited real-time audio processing functions of the client web browser (thus, the algorithm needs to be simple), but this facilitates the use of web based embodied conversational agents. This research has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESET TIN2014-53199-C3-3-R), by the DFG research grant FOR173 and by the European Commission under the contract number H2020-645012-RIA (KRISTINA).
- Published
- 2016
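The entry above maps live microphone input to three blend shapes using only three speaker-specific parameters, implemented with the web browser's audio processing functions. The Python sketch below only illustrates the general rule-based idea of turning short-time band energies into smoothed blend-shape weights; the band edges, the normalization, and the interpretation of the three outputs are assumptions rather than the paper's actual rules, and a browser implementation would use the Web Audio API instead.

import numpy as np
from scipy.signal import stft

def lipsync_weights(x, fs, sensitivity=1.0, smoothness=0.6, nperseg=1024):
    """Map short-time energy in three frequency bands to three smoothed blend-shape weights."""
    f, _, X = stft(x, fs, nperseg=nperseg)
    power = np.abs(X) ** 2
    bands = [(100, 500), (500, 2000), (2000, 6000)]      # assumed band edges in Hz
    energies = np.array([power[(f >= lo) & (f < hi)].sum(axis=0) for lo, hi in bands])
    energies = np.sqrt(energies) * sensitivity
    energies /= energies.max() + 1e-12                   # normalize to [0, 1]
    weights = np.zeros_like(energies)
    for m in range(1, energies.shape[1]):                # one-pole temporal smoothing per band
        weights[:, m] = smoothness * weights[:, m - 1] + (1 - smoothness) * energies[:, m]
    return np.clip(weights, 0.0, 1.0)                    # rows: e.g. kiss / lips-closed / jaw-open

# Illustrative use on one second of noise standing in for live microphone input
fs = 16000
rng = np.random.default_rng(0)
w = lipsync_weights(rng.standard_normal(fs), fs)
print(w.shape)  # (3, number of frames)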
50. Effect of mistuning on the detection of a tone masked by a harmonic tone complex
- Author
-
Volker Hohmann, Georg M. Klump, Martin Klein-Hennig, Astrid Klinge-Strahl, and Mathias Dietz
- Subjects
Masking (art) ,Adult ,Male ,Anatomy and Physiology ,Sound Spectrography ,Speech recognition ,Sensory Physiology ,Perceptual Masking ,lcsh:Medicine ,Context (language use) ,Mistuning ,Neurological System ,Tone (musical instrument) ,Young Adult ,medicine ,Auditory system ,Humans ,lcsh:Science ,Biology ,Physics ,Multidisciplinary ,lcsh:R ,Auditory Threshold ,Fundamental frequency ,Acoustics ,Sensory Systems ,medicine.anatomical_structure ,Auditory System ,Acoustic Stimulation ,Harmonic ,Audiometry, Pure-Tone ,Sensory Perception ,Female ,lcsh:Q ,Physical Laws and Principles ,Research Article ,Neuroscience ,Psychoacoustics - Abstract
The human auditory system is sensitive in detecting “mistuned” components in a harmonic complex, which do not match the frequency pattern defined by the fundamental frequency of the complex. Depending on the frequency configuration, the mistuned component may be perceptually segregated from the complex and may be heard as a separate tone. In the context of a masking experiment, mistuning a single component decreases its masked threshold. In this study we propose to quantify the ability to detect a single component for fixed amounts of mistuning by adaptively varying its level. This method produces masking release by mistuning that can be compared to other masking release effects. Detection thresholds were obtained for various frequency configurations where the target component was resolved or unresolved in the auditory system. The results from 6 normal-hearing listeners show a significant decrease of masked thresholds between harmonic and mistuned conditions in all configurations and provide evidence for the employment of different detection strategies for resolved and unresolved components. The data suggest that across-frequency processing is involved in the release from masking. The results emphasize the ability of this method to assess integrative aspects of pitch and harmonicity perception.
- Published
- 2016