Transformer-Based Automatic Punctuation Prediction and Word Casing Reconstruction of the ASR Output

Authors :
Jan Lehečka
Pavel Ircing
Luboš Šmídl
Jan Švec
Source :
Text, Speech, and Dialogue ISBN: 9783030835262, TSD
Publication Year :
2021
Publisher :
Springer International Publishing, 2021.

Abstract

The paper proposes a module for automatic punctuation prediction and casing reconstruction based on transformer architectures (BERT/T5), which constitute the current state of the art in many similar NLP tasks. The main motivation for our work was to increase the readability of ASR output, which usually takes the form of a continuous stream of text without punctuation marks and with all words in lowercase. The resulting punctuation and casing reconstruction module is evaluated on both written text and actual ASR output in three languages (English, Czech, and Slovak).
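
To make the task concrete, the following is a minimal sketch of how written text can be reduced to ASR-like input (lowercase, no punctuation) together with per-token labels that a model would have to predict. It is an illustration only, not the authors' method: the label names (`LOWER`, `CAPITAL`, `UPPER`), the punctuation set, and the functions `to_asr_like` and `restore` are assumptions introduced here, since the abstract does not specify a label scheme.

```python
import string

# Assumed punctuation marks to predict; the abstract does not list the actual set.
PUNCT = {",", ".", "?"}


def to_asr_like(text):
    """Split written text into lowercase, unpunctuated tokens plus
    one (casing, trailing punctuation) label per token."""
    tokens, labels = [], []
    for word in text.split():
        # Remember a trailing punctuation mark, if it is one we model.
        punct = word[-1] if word[-1] in PUNCT else ""
        # Drop leading/trailing punctuation; inner marks (e.g. "don't") stay.
        core = word.strip(string.punctuation)
        if not core:
            continue
        if core.isupper() and len(core) > 1:
            case = "UPPER"      # e.g. acronyms
        elif core[0].isupper():
            case = "CAPITAL"    # first letter capitalised
        else:
            case = "LOWER"
        tokens.append(core.lower())
        labels.append((case, punct))
    return tokens, labels


def restore(tokens, labels):
    """Apply predicted (casing, punctuation) labels to ASR-like tokens."""
    words = []
    for tok, (case, punct) in zip(tokens, labels):
        if case == "UPPER":
            tok = tok.upper()
        elif case == "CAPITAL":
            tok = tok[0].upper() + tok[1:]
        words.append(tok + punct)
    return " ".join(words)


if __name__ == "__main__":
    tokens, labels = to_asr_like("Hello, how are you? I work at IBM.")
    print(tokens)
    print(restore(tokens, labels))
```

In this framing, a transformer model would take `tokens` as input and predict `labels`; the sketch simplifies mixed-case words (such as "McDonald"), which the three casing labels cannot restore exactly.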

Details

ISBN :
978-3-030-83526-2
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....0afb7b3d748ec7158e437bda8c0d2185
Full Text :
https://doi.org/10.1007/978-3-030-83527-9_7