1. Self-supervised speech denoising using only noisy audio signals
- Author
Jiasong Wu, Qingchun Li, Guanyu Yang, Lei Li, Lotfi Senhadji, Huazhong Shu
Affiliations: Laboratoire Traitement du Signal et de l'Image (LTSI), Université de Rennes (UR)-Institut National de la Santé et de la Recherche Médicale (INSERM); Centre de Recherche en Information Biomédicale sino-français (CRIBS), Université de Rennes (UR)-Southeast University [Jiangsu]-Institut National de la Santé et de la Recherche Médicale (INSERM); Laboratory of Image Science and Technology [Nanjing] (LIST), Southeast University [Jiangsu]-School of Computer Science and Engineering
Funding: National Key Research and Development Program of China (NKRDPC): 2021ZD0113202; Institut National de la Santé et de la Recherche Médicale (Inserm); National Natural Science Foundation of China (NSFC): 50912040302, 61876037, 62171125
- Subjects
FOS: Computer and information sciences, Audio sub-sampler, Linguistics and Language, Sound (cs.SD), Speech denoising, Communication, Computer Science - Sound, Language and Linguistics, Computer Science Applications, Self-supervised, Training target, Computer Science::Sound, Audio and Speech Processing (eess.AS), Modeling and Simulation, FOS: Electrical engineering, electronic engineering, information engineering, Computer Vision and Pattern Recognition, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, Software, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
In traditional speech denoising tasks, clean audio signals are often used as the training target, but perfectly clean signals can only be collected with expensive recording equipment or in strictly controlled studio environments. To overcome this drawback, we propose an end-to-end self-supervised speech denoising training scheme that uses only noisy audio signals, named Only-Noisy Training (ONT), and requires no extra training conditions. The proposed ONT strategy constructs training pairs from each single noisy audio signal and contains two modules: a training-pair generation module and a speech denoising module. The first module applies a random audio sub-sampler to each noisy audio signal to generate training pairs. The sub-sampled pairs are then fed into a novel complex-valued speech denoising module. Experimental results show that the proposed method not only eliminates the heavy dependence of traditional audio denoising tasks on clean targets, but also achieves performance on par with or better than other training strategies.
Availability: ONT is available at https://github.com/liqingchunnnn/Only-Noisy-Training
Comment: 11 pages, 4 figures, 6 tables
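As an illustration of the pair-generation idea in the abstract, the following is a minimal sketch of a random audio sub-sampler. It is not taken from the ONT repository: the function name `subsample_pair`, the window size `k`, and the rule of drawing two distinct positions per non-overlapping window are all assumptions made for this example.

```python
import numpy as np


def subsample_pair(noisy, k=2, rng=None):
    """Split one noisy waveform into two sub-sampled signals.

    Hypothetical rule (an assumption, not the authors' code): the waveform
    is cut into non-overlapping windows of k samples, and in each window
    two distinct positions are drawn at random. One sample goes to the
    network input, the other to the training target.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.asarray(noisy)

    # Drop the tail that does not fill a whole window.
    n_win = noisy.shape[-1] // k
    windows = noisy[: n_win * k].reshape(n_win, k)

    # First position is uniform in [0, k); the second is shifted by a
    # non-zero offset modulo k, so it always differs from the first.
    first = rng.integers(0, k, size=n_win)
    offset = rng.integers(1, k, size=n_win)
    second = (first + offset) % k

    rows = np.arange(n_win)
    return windows[rows, first], windows[rows, second]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(16000)  # stand-in for 1 s of 16 kHz audio
    net_input, target = subsample_pair(noisy, k=2, rng=rng)
    print(net_input.shape, target.shape)
```

Under this assumption, both outputs come from the same noisy recording and have the same length, so one could serve as the input and the other as the target of a denoising network without any clean reference.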
- Published
- 2023