Significance of relative phase features for shouted and normal speech classification
- Source :
- EURASIP Journal on Audio, Speech, and Music Processing, Vol 2024, Iss 1, Pp 1-14 (2024)
- Publication Year :
- 2024
- Publisher :
- SpringerOpen, 2024.
Abstract
- Shouted and normal speech classification plays an important role in many speech-related applications. The existing works are often based on magnitude-based features and ignore phase-based features, which are directly related to magnitude information. In this paper, the importance of phase-based features is explored for the detection of shouted speech. The novel contributions of this work are as follows. (1) Three phase-based features, namely, relative phase (RP), linear prediction analysis estimated speech-based RP (LPAES-RP) and linear prediction residual-based RP (LPR-RP) features, are explored for shouted and normal speech classification. (2) We propose a new RP feature, called the glottal source-based RP (GRP) feature. The main idea of the proposed GRP feature is to exploit the difference between RP and LPAES-RP features to detect shouted speech. (3) A score combination of phase- and magnitude-based features is also employed to further improve the classification performance. The proposed feature and combination are evaluated using the shouted normal electroglottograph speech (SNE-Speech) corpus. The experimental findings show that the RP, LPAES-RP, and LPR-RP features provide promising results for the detection of shouted speech. We also find that the proposed GRP feature can provide better results than those of the standard mel-frequency cepstral coefficient (MFCC) feature. Moreover, compared to using individual features, the score combination of the MFCC and RP/LPAES-RP/LPR-RP/GRP features yields an improved detection performance. Performance analysis under noisy environments shows that the score combination of the MFCC and the RP/LPAES-RP/LPR-RP features gives more robust classification. These outcomes show the importance of RP features in distinguishing shouted speech from normal speech.
Details
- Language :
- English
- ISSN :
- 1687-4722
- Volume :
- 2024
- Issue :
- 1
- Database :
- Directory of Open Access Journals
- Journal :
- EURASIP Journal on Audio, Speech, and Music Processing
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.00f296c9c67e48219caa4a7c2ced5455
- Document Type :
- article
- Full Text :
- https://doi.org/10.1186/s13636-023-00324-4