Context-Aware Audio-Visual Speech Enhancement Based on Neuro-Fuzzy Modeling and User Preference Learning

Authors :
Chen, Song
Kirton-Wingate, Jasper
Doctor, Faiyaz
Arshad, Usama
Dashtipour, Kia
Gogate, Mandar
Halim, Zahid
Al-Dubai, Ahmed
Arslan, Tughrul
Hussain, Amir
Source :
IEEE Transactions on Fuzzy Systems; October 2024, Vol. 32 Issue: 10 p5400-5412, 13p
Publication Year :
2024

Abstract

It is estimated that by 2050 approximately one in ten individuals globally will experience disabling hearing impairment. In the presence of everyday reverberant noise, a substantial proportion of individual users encounter challenges in speech comprehension. This study introduces a novel application of neuro-fuzzy modeling that fuses audio-visual speech enhancement (AV SE) with an initial user preference learning framework. Specifically, our approach integrates multimodal AV speech data with innovative SE methods and fuzzy inferencing techniques. This integration is further enriched by a user-preference learning model that adapts to environmental and user-specific contexts, including signal-to-noise ratios, sound power, and the quality of visual information. The proposed framework facilitates the incorporation of clinical measures, such as user cognitive load (or listening effort), together with real-world uncertainty to steer the system outputs. We employ an adaptive fuzzy neural network to derive the most effective Sugeno fuzzy inference model, using particle swarm optimization to ensure optimal SE by considering sound power, ambient noise levels, and visual quality. Experiments on our new benchmark AV multitalker challenge dataset demonstrate the superiority of our user preference-informed, context-aware AV SE approach in enhancing speech intelligibility and quality in challenging noisy conditions, marking a significant advancement over conventional methods while reducing energy consumption. The conclusion supports the ecological scalability of our approach and its potential for real-world applications, setting a new benchmark in AV SE research and paving the way for future assistive hearing and communication technologies.

Details

Language :
English
ISSN :
1063-6706
Volume :
32
Issue :
10
Database :
Supplemental Index
Journal :
IEEE Transactions on Fuzzy Systems
Publication Type :
Periodical
Accession number :
ejs67653649
Full Text :
https://doi.org/10.1109/TFUZZ.2024.3435050