PSST! Prosodic Speech Segmentation with Transformers
- Publication Year :
- 2023
Abstract
- Self-attention mechanisms have enabled transformers to achieve superhuman-level performance on many speech-to-text (STT) tasks, yet the challenge of automatic prosodic segmentation has remained unsolved. In this paper we fine-tune Whisper, a pretrained STT model, to annotate intonation unit (IU) boundaries by repurposing low-frequency tokens. Our approach achieves an accuracy of 95.8%, outperforming previous methods without the need for large-scale labeled data or enterprise-grade compute resources. We also degrade the input signal with a series of filters, finding that a low-pass filter with a 3.2 kHz cutoff improves segmentation performance in out-of-sample and out-of-distribution contexts. We release our model as both a transcription tool and a baseline for further improvements in prosodic segmentation.
- Comment: 5 pages, 3 figures. For associated repository, see https://github.com/Nathan-Roll1/psst
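The low-pass preprocessing described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline: the filter family (Butterworth), order, and zero-phase application are assumptions; only the 3.2 kHz cutoff comes from the abstract.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def low_pass(audio, sr, cutoff_hz=3200.0, order=5):
    """Apply a zero-phase low-pass filter to an audio signal.

    Hypothetical sketch of the paper's 3.2 kHz filtering step;
    Butterworth design and order 5 are assumptions.
    """
    sos = butter(order, cutoff_hz, btype="low", fs=sr, output="sos")
    return sosfiltfilt(sos, audio)

# Example: a 1 kHz tone passes through; a 6 kHz tone is attenuated.
sr = 16000
t = np.arange(sr) / sr  # one second of audio
signal = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)
filtered = low_pass(signal, sr)
```

After filtering, the signal retains roughly the energy of the 1 kHz component alone, since 6 kHz lies well above the cutoff.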
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2302.01984
- Document Type :
- Working Paper
- Full Text :
- https://doi.org/10.18653/v1/2023.conll-1.31