Speaker separation in realistic noise environments with applications to a cognitively-controlled hearing aid.

Authors :: Borgström, Bengt J.
Brandstein, Michael S.
Ciccarelli, Gregory A.
Quatieri, Thomas F.
Smalt, Christopher J.
Source :: Neural Networks. Aug2021, Vol. 140, p136-147. 12p.
Publication Year :: 2021
Abstract: Future wearable technology may provide for enhanced communication in noisy environments and for the ability to pick out a single talker of interest in a crowded room simply by the listener shifting their attentional focus. Such a system relies on two components: speaker separation and decoding the listener's attention to acoustic streams in the environment. To address the former, we present a system for joint speaker separation and noise suppression, referred to as the Binaural Enhancement via Attention Masking Network (BEAMNET). The BEAMNET system is an end-to-end neural network architecture based on self-attention. Binaural input waveforms are mapped to a joint embedding space via a learned encoder, and separate multiplicative masking mechanisms are included for noise suppression and speaker separation. Pairs of output binaural waveforms are then synthesized using learned decoders, each capturing a separated speaker while maintaining spatial cues. A key contribution of BEAMNET is that the architecture contains a separation path, an enhancement path, and an autoencoder path. This paper proposes a novel loss function which simultaneously trains these paths, so that disabling the masking mechanisms during inference causes BEAMNET to reconstruct the input speech signals. This allows dynamic control of the level of suppression applied by BEAMNET via a minimum gain level, which is not possible in other state-of-the-art approaches to end-to-end speaker separation. This paper also proposes a perceptually-motivated waveform distance measure. Using objective speech quality metrics, the proposed system is demonstrated to perform well at separating two equal-energy talkers, even in high levels of background noise. Subjective testing shows an improvement in speech intelligibility across a range of noise levels, for signals with artificially added head-related transfer functions and background noise.
Finally, when used as part of an auditory attention decoder (AAD) system using existing electroencephalogram (EEG) data, BEAMNET is found to maintain the decoding accuracy achieved with ideal speaker separation, even in severe acoustic conditions. These results suggest that this enhancement system is highly effective at decoding auditory attention in realistic noise environments, and could possibly lead to improved speech perception in a cognitively controlled hearing aid. [ABSTRACT FROM AUTHOR]
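The minimum-gain control described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that a mask takes values in [0, 1], that it multiplies the encoder embedding element-wise, and that the minimum gain acts as a floor on the mask. The function name apply_mask_with_floor and the parameter min_gain are hypothetical, and a single mask stands in for either of BEAMNET's two masking mechanisms. Under these assumptions, min_gain = 0 applies the mask unchanged, while min_gain = 1 bypasses masking entirely, mirroring the abstract's point that disabling the masks makes the network reconstruct its input.

```python
from typing import List


def apply_mask_with_floor(embedding: List[float], mask: List[float],
                          min_gain: float) -> List[float]:
    """Scale an encoder embedding by a mask floored at min_gain.

    Hypothetical sketch, not BEAMNET's published code: assumes mask values
    lie in [0, 1] and that the minimum gain is a floor applied element-wise.
    """
    if not 0.0 <= min_gain <= 1.0:
        raise ValueError("min_gain must lie in [0, 1]")
    if len(embedding) != len(mask):
        raise ValueError("embedding and mask must have the same length")
    # Flooring the mask caps how strongly any element can be attenuated.
    return [e * max(m, min_gain) for e, m in zip(embedding, mask)]


if __name__ == "__main__":
    embedding = [0.5, -1.0, 2.0]  # toy stand-in for a learned encoder output
    mask = [0.1, 0.9, 0.0]        # toy mask: 0 suppresses, 1 passes through
    print(apply_mask_with_floor(embedding, mask, 0.0))  # full masking
    print(apply_mask_with_floor(embedding, mask, 0.5))  # limited suppression
    print(apply_mask_with_floor(embedding, mask, 1.0))  # masking disabled
```

With a floor of 0.5, no element is attenuated by more than about 6 dB (20 log10 0.5 is roughly -6.02 dB), which is one simple way a listener-adjustable suppression limit could be expressed.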

Subjects :: *SPEECH processing systems
*HEARING aids
*AUDITORY selective attention
*INTELLIGIBILITY of speech
*ACOUSTIC streaming
*SPEECH perception
*ASSISTIVE listening systems

Details

Language :: English
ISSN :: 08936080
Volume :: 140
Database :: Academic Search Index
Journal :: Neural Networks
Publication Type :: Academic Journal
Accession number :: 150297035
Full Text :: https://doi.org/10.1016/j.neunet.2021.02.020