
Precise Detection of Speech Endpoints Dynamically: A Wavelet Convolution based approach

Authors :
Roy, Tanmoy
Marwala, Tshilidzi
Chakraverty, Snehashish
Publication Year :
2018

Abstract

Precise detection of speech endpoints is an important factor affecting the performance of systems that must extract speech utterances from a speech signal, such as Automatic Speech Recognition (ASR) systems. Existing endpoint detection (EPD) methods mostly use approaches based on Short-Term Energy (STE) and Zero-Crossing Rate (ZCR), along with their variants. However, STE- and ZCR-based EPD algorithms often fail in the presence of Non-speech Sound Artifacts (NSAs) produced by the speakers. Algorithms based on pattern recognition and classification techniques have also been proposed, but they require labeled data for training. This article proposes a new algorithm, termed Wavelet Convolution based Speech Endpoint Detection (WCSEPD), to extract speech endpoints. WCSEPD decomposes the speech signal into high-frequency and low-frequency components using wavelet convolution and computes entropy-based thresholds for the two frequency components. The low-frequency thresholds are used to extract voiced speech segments, whereas the high-frequency thresholds are used to extract unvoiced speech segments by filtering out the NSAs. WCSEPD does not require any labeled data for training and can extract speech segments automatically. Experimental results show that the proposed algorithm precisely extracts speech endpoints in the presence of NSAs.

Comment: 25 pages
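The abstract only outlines WCSEPD, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a single-level Haar filter pair for the wavelet convolution, measures each component with frame-wise short-term energy, and swaps in Kapur's maximum-entropy histogram thresholding as a stand-in for the paper's own entropy-based thresholds, whose exact formula is not given here. The function names (`wavelet_split`, `kapur_threshold`, `detect_endpoints`) and the default frame size are hypothetical, and the paper's NSA-filtering logic is not modeled: a frame simply counts as speech if either component exceeds its threshold.

```python
import math

# Assumption: Haar analysis filters; the paper's actual wavelet is not stated in the abstract.
LOW_PASS = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HIGH_PASS = [1 / math.sqrt(2), -1 / math.sqrt(2)]


def convolve(signal, kernel):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += x * h
    return out


def wavelet_split(signal):
    """Single-level decomposition: convolve with each filter, then keep every 2nd sample."""
    low = convolve(signal, LOW_PASS)[1::2]    # (x[2m] + x[2m+1]) / sqrt(2)
    high = convolve(signal, HIGH_PASS)[1::2]  # (x[2m+1] - x[2m]) / sqrt(2)
    return low, high


def frames(signal, size):
    """Non-overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def short_term_energy(frame):
    """STE: sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def kapur_threshold(values, bins=16):
    """Kapur's maximum-entropy threshold over a histogram of `values` (stand-in technique)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return hi
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    p = [c / len(values) for c in counts]
    best_t, best_h = 0, -math.inf
    for t in range(bins - 1):
        pb, pf = sum(p[:t + 1]), sum(p[t + 1:])
        if pb <= 0 or pf <= 0:
            continue
        # Entropy of the "background" and "foreground" classes, each renormalized.
        hb = -sum(q / pb * math.log(q / pb) for q in p[:t + 1] if q > 0)
        hf = -sum(q / pf * math.log(q / pf) for q in p[t + 1:] if q > 0)
        if hb + hf > best_h:
            best_t, best_h = t, hb + hf
    return lo + (best_t + 1) * width  # upper edge of the chosen bin


def detect_endpoints(signal, frame_size=64):
    """Return (start, end) sample indices of detected speech, or None.

    frame_size must be even: each component is half-length after downsampling.
    """
    low, high = wavelet_split(signal)
    half = frame_size // 2
    low_e = [short_term_energy(f) for f in frames(low, half)]
    high_e = [short_term_energy(f) for f in frames(high, half)]
    t_low, t_high = kapur_threshold(low_e), kapur_threshold(high_e)
    # Low band ~ voiced speech, high band ~ unvoiced speech (NSA filtering omitted).
    active = [le > t_low or he > t_high for le, he in zip(low_e, high_e)]
    if not any(active):
        return None
    first = active.index(True)
    last = len(active) - 1 - active[::-1].index(True)
    return first * frame_size, (last + 1) * frame_size
```

For example, a signal made of 512 zero samples, 512 samples of `math.sin(2 * math.pi * n / 32)`, and 512 more zeros yields the endpoints `(512, 1024)` with the default frame size, while an all-zero signal yields `None`. Thresholding the two bands separately mirrors the abstract's idea that voiced and unvoiced speech show up in different frequency components.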

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.1804.06159
Document Type :
Working Paper
Full Text :
https://doi.org/10.1016/j.cnsns.2018.07.008