A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement
- Authors
Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, and Paris Smaragdis
- Subjects
FOS: Computer and information sciences; Sound (cs.SD); Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement. This is achieved by incorporating a frame-wise conditioning input that specifies the type of enhancement output. To improve the quality of the enhanced output and mitigate oversuppression, we experiment with re-weighting frames by the presence or absence of speech activity and with applying augmentations to speaker embeddings. By training under a multi-task learning setting, we empirically show that the proposed unified model obtains promising results on both personalized and non-personalized speech enhancement benchmarks and reaches performance similar to that of models trained specifically for either task. The strong performance of the proposed method demonstrates that the unified model is a more economical alternative to keeping separate task-specific models at inference time.
- Comment
Accepted by ICASSP 2023
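The two mechanisms named in the abstract, a frame-wise conditioning input that selects the enhancement mode and a loss re-weighted by speech activity, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, dimensions, and weight values below are hypothetical, chosen only to show the general idea of tiling a (possibly zeroed) speaker embedding across frames and weighting frames by a voice-activity mask.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
n_frames, n_freq, emb_dim = 50, 257, 16

noisy_spec = rng.random((n_frames, n_freq))   # magnitude spectrogram frames
speaker_emb = rng.random(emb_dim)             # enrollment speaker embedding

def build_conditioned_input(spec, embedding, personalized):
    """Tile the conditioning vector across frames and concatenate it
    to each spectrogram frame.

    For non-personalized enhancement the conditioning slot is zeroed,
    so a single network can serve both modes at inference time.
    """
    cond = embedding if personalized else np.zeros_like(embedding)
    cond_frames = np.tile(cond, (spec.shape[0], 1))   # (n_frames, emb_dim)
    return np.concatenate([spec, cond_frames], axis=1)

def vad_weighted_loss(est, target, vad, active_w=1.0, inactive_w=0.5):
    """Frame-wise MSE with frames re-weighted by speech activity,
    a sketch of the oversuppression mitigation described above."""
    weights = np.where(vad, active_w, inactive_w)[:, None]
    return float(np.mean(weights * (est - target) ** 2))

# Personalized vs. non-personalized inputs to the same (shared) network.
x_pse = build_conditioned_input(noisy_spec, speaker_emb, personalized=True)
x_se = build_conditioned_input(noisy_spec, speaker_emb, personalized=False)

# Example loss with a random voice-activity mask.
vad = rng.random(n_frames) > 0.5
loss = vad_weighted_loss(noisy_spec, noisy_spec * 0.9, vad)
```

In a multi-task training loop, batches of both input variants would be fed to one shared model, which is what lets the unified network replace two task-specific ones.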
- Published
- 2023