Language and noise transfer in speech enhancement generative adversarial network

Authors :: Antonio Bonafonte
Maruchan Park
Joan Serrà
Santiago Pascual
Kang-Hun Ahn
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
Source :: UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC), ICASSP, Recercat. Dipòsit de la Recerca de Catalunya
Publication Year :: 2018
Publisher :: Institute of Electrical and Electronics Engineers (IEEE), 2018.
Abstract: Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems to new, low-resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by fine-tuning the generator with small amounts of data. We investigate the minimum requirements to obtain stable behavior in terms of several objective metrics in two very different languages: Catalan and Korean. We also study the variability of test performance on unseen noise as a function of the number of different noise types available for training. Results show that adapting a pre-trained English model with 10 min of data already achieves performance comparable to that obtained with two orders of magnitude more data. They also demonstrate the relative stability of test performance with respect to the number of training noise types.

©2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Subjects :: FOS: Computer and information sciences
Sound (cs.SD)
Generative adversarial networks
Computer science
Speech recognition
Speech enhancement
02 engineering and technology
Computer Science - Sound
Parla
Machine Learning (cs.LG)
030507 speech-language pathology & audiology
03 medical and health sciences
Audio and Speech Processing (eess.AS)
Transfer (computing)
Aprenentatge
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
Speech
Learning
business.industry
Deep learning
020206 networking & telecommunications
Function (mathematics)
Enginyeria de la telecomunicació [Àrees temàtiques de la UPC]
Transfer learning
Noise
Computer Science - Learning
Artificial intelligence
0305 other medical science
business
Ensenyament i aprenentatge [Àrees temàtiques de la UPC]
Generator (mathematics)
Electrical Engineering and Systems Science - Audio and Speech Processing

Details

Language :: English
Database :: OpenAIRE
Journal :: UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC), ICASSP, Recercat. Dipòsit de la Recerca de Catalunya
Accession number :: edsair.doi.dedup.....2214a8f00709b6b5573c5108c867938f
