Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction

Authors :
Soboleva, Daria
Skopek, Ondrej
Šajgalík, Márius
Cărbune, Victor
Weissenberger, Felix
Proskurnia, Julia
Prisacari, Bogdan
Valcarce, Daniel
Lu, Justin
Prabhavalkar, Rohit
Miklos, Balint
Publication Year :
2020

Abstract

We present a novel multi-modal unspoken punctuation prediction system for the English language which combines acoustic and text features. We demonstrate, for the first time, that by relying exclusively on synthetic data generated using a prosody-aware text-to-speech system, we can outperform a model trained with expensive human audio recordings on the unspoken punctuation prediction problem. Our model architecture is well suited for on-device use. This is achieved by leveraging hash-based embeddings of automatic speech recognition text output in conjunction with acoustic features as input to a quasi-recurrent neural network, keeping the model size small and latency low.

Comment: Accepted to IEEE ICASSP 2021
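
Illustrative note: the abstract describes hash-based embeddings of recognized text combined with acoustic features as the model input. The sketch below is a minimal, hypothetical illustration of that general idea, not the authors' implementation. The hash function (MD5), the table size, the embedding width, the random initialization, and the names `bucket_for`, `make_table`, and `embed_with_audio` are all assumptions made for this example, and the quasi-recurrent network that would consume these vectors is not shown.

```python
import hashlib
import random

NUM_BUCKETS = 1024  # assumed table size, not taken from the paper
EMBED_DIM = 8       # assumed embedding width, not taken from the paper


def bucket_for(token: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a token to a bucket index using a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def make_table(num_buckets: int = NUM_BUCKETS, dim: int = EMBED_DIM, seed: int = 0):
    """Build a small embedding table; random values stand in for trained weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_buckets)]


def embed_with_audio(tokens, acoustic, table):
    """Concatenate each token's hashed embedding with its acoustic feature vector."""
    if len(tokens) != len(acoustic):
        raise ValueError("expected one acoustic feature vector per token")
    return [
        table[bucket_for(tok, len(table))] + list(feats)
        for tok, feats in zip(tokens, acoustic)
    ]


# Usage: two recognized words, each with two hypothetical acoustic features.
table = make_table()
inputs = embed_with_audio(["hello", "world"], [[0.1, 0.2], [0.3, 0.4]], table)
```

Because tokens are hashed into a fixed number of buckets, the table size does not grow with the vocabulary, which is one plausible reason such embeddings help keep a model small; distinct tokens can collide in the same bucket, which is the usual trade-off of this approach.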

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2010.10203
Document Type :
Working Paper