1. Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time
- Author
Li, Yue, Hindriks, Koen V., and Kunneman, Florian A.
- Subjects
Computer Science - Robotics, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, 68T50
- Abstract
Spectral subtraction, widely used for its simplicity, has been employed to address the Robot Ego Speech Filtering (RESF) problem: detecting the content of human interruptions in a robot's single-channel microphone recordings while the robot is speaking. However, this approach suffers from oversubtraction in the fundamental frequency range (FFR), which degrades speech content recognition. To address this, we propose a Two-Mask Conformer-based Metric Generative Adversarial Network (CMGAN) that enhances the detected speech and improves recognition results. Our model compensates for oversubtracted FFR values with high-frequency information and long-term features and then denoises the resulting spectrogram. In addition, we introduce an incremental processing method that enables semi-real-time processing of streaming audio input with a network trained on long, fixed-length input. Evaluations on two datasets, including one with unseen noise, demonstrate significant improvements in recognition accuracy and confirm the effectiveness of the proposed two-mask approach and incremental processing, enhancing the robustness of the proposed RESF pipeline in real-world HRI scenarios.
- Comment
6 pages, 2 figures, submitted to 2025 IEEE ICASSP
- Published
- 2024
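To illustrate why spectral subtraction can oversubtract where the robot's own voice is strongest, here is a minimal sketch of generic per-frame magnitude spectral subtraction with an oversubtraction factor `alpha` and a spectral floor `beta`. This is the textbook form, not the authors' RESF implementation. The function name, the `alpha`/`beta` values, the six-bin toy spectra, and the simplification that magnitudes add linearly (which ignores phase) are all hypothetical assumptions. Bins 0 and 1 stand in for the low-frequency region where robot and human voices overlap most.

```python
def spectral_subtract(mixture_mag, ego_mag, alpha=2.0, beta=0.01):
    """Generic magnitude spectral subtraction for a single frame.

    mixture_mag: |X(f)| of the microphone signal (robot + human).
    ego_mag:     |R(f)| estimate of the robot's own (ego) speech.
    alpha:       oversubtraction factor (hypothetical value).
    beta:        spectral floor as a fraction of the mixture (hypothetical value).
    """
    out = []
    for x, r in zip(mixture_mag, ego_mag):
        s = x - alpha * r             # subtract the scaled ego-speech estimate
        out.append(max(s, beta * x))  # clamp negative results to the floor
    return out


# Hypothetical toy spectra: the robot's energy is concentrated in bins 0-1.
human = [1.0, 0.8, 0.5, 0.3, 0.2, 0.1]
robot = [0.9, 0.9, 0.2, 0.1, 0.05, 0.02]
# Simplifying assumption: magnitudes add linearly (phase is ignored).
mixture = [h + r for h, r in zip(human, robot)]

out = spectral_subtract(mixture, robot, alpha=2.0, beta=0.01)
for k, (h, s) in enumerate(zip(human, out)):
    print(f"bin {k}: human={h:.2f} recovered={s:.3f} ratio={s / h:.2f}")
```

In this toy frame most of the human energy in bins 0 and 1 is removed (bin 1 falls all the way to the floor), while the higher bins keep most of it. This mirrors, in simplified form, the kind of low-frequency loss the abstract describes.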
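The abstract does not detail how a network trained on long, fixed-length input is applied to streaming audio. One generic way to do this, shown here only as an assumption and not as the authors' incremental method, is a sliding window. The scheme keeps the most recent `window` samples, runs the fixed-length model on that full window whenever a new chunk arrives, and emits only the outputs for the newest chunk. The function names and the toy `first_difference` "model" are hypothetical placeholders for a real enhancement network.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def incremental_process(
    chunks: Iterable[List[float]],
    model: Callable[[List[float]], List[float]],
    window: int,
) -> Iterator[List[float]]:
    """Apply a fixed-length model to a stream of chunks (hypothetical sketch).

    The buffer always holds `window` samples (zero-padded at the start), so the
    model sees left context from earlier chunks. Only the output samples that
    correspond to the newest chunk are yielded.
    """
    buffer = deque([0.0] * window, maxlen=window)
    for chunk in chunks:
        if len(chunk) > window:
            raise ValueError("chunk is longer than the model window")
        buffer.extend(chunk)            # oldest samples drop off the left
        output = model(list(buffer))    # model always gets a full-length input
        if len(output) != window:
            raise ValueError("model must preserve input length")
        yield output[window - len(chunk):]


def first_difference(x: List[float]) -> List[float]:
    """Toy stand-in for a network: needs one sample of left context."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]


stream = [[1, 2], [3, 4], [5]]
streamed = [s for out in incremental_process(stream, first_difference, window=4)
            for s in out]
print(streamed)
```

Because the toy model's context requirement (one sample) fits inside the window, the streamed result matches what offline processing of the whole signal would give. In this scheme, latency is set by the chunk size, while every chunk costs one full-window forward pass.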