Back to Search Start Over

Significance of relative phase features for shouted and normal speech classification

Authors :
Khomdet Phapatanaburi
Longbiao Wang
Meng Liu
Seiichi Nakagawa
Talit Jumphoo
Peerapong Uthansakul
Source :
EURASIP Journal on Audio, Speech, and Music Processing, Vol 2024, Iss 1, Pp 1-14 (2024)
Publication Year :
2024
Publisher :
SpringerOpen, 2024.

Abstract

Abstract Shouted and normal speech classification plays an important role in many speech-related applications. The existing works are often based on magnitude-based features and ignore phase-based features, which are directly related to magnitude information. In this paper, the importance of phase-based features is explored for the detection of shouted speech. The novel contributions of this work are as follows. (1) Three phase-based features, namely, relative phase (RP), linear prediction analysis estimated speech-based RP (LPAES-RP) and linear prediction residual-based RP (LPR-RP) features, are explored for shouted and normal speech classification. (2) We propose a new RP feature, called the glottal source-based RP (GRP) feature. The main idea of the proposed GRP feature is to exploit the difference between RP and LPAES-RP features to detect shouted speech. (3) A score combination of phase- and magnitude-based features is also employed to further improve the classification performance. The proposed feature and combination are evaluated using the shouted normal electroglottograph speech (SNE-Speech) corpus. The experimental findings show that the RP, LPAES-RP, and LPR-RP features provide promising results for the detection of shouted speech. We also find that the proposed GRP feature can provide better results than those of the standard mel-frequency cepstral coefficient (MFCC) feature. Moreover, compared to using individual features, the score combination of the MFCC and RP/LPAES-RP/LPR-RP/GRP features yields an improved detection performance. Performance analysis under noisy environments shows that the score combination of the MFCC and the RP/LPAES-RP/LPR-RP features gives more robust classification. These outcomes show the importance of RP features in distinguishing shouted speech from normal speech.

Details

Language :
English
ISSN :
16874722
Volume :
2024
Issue :
1
Database :
Directory of Open Access Journals
Journal :
EURASIP Journal on Audio, Speech, and Music Processing
Publication Type :
Academic Journal
Accession number :
edsdoj.00f296c9c67e48219caa4a7c2ced5455
Document Type :
article
Full Text :
https://doi.org/10.1186/s13636-023-00324-4