Multilingual speech mode classification model for Indian languages

Authors :: K. Sreenivasa Rao
Kumud Tripathi
Source :: NCC
Publication Year :: 2020
Publisher :: IEEE, 2020.
Abstract: This paper explores vocal tract and excitation source information for the multilingual speech mode classification (MSMC) task. MSMC is a language-independent speech mode classification model that can detect the mode of speech spoken in any language. We consider data from three broad speech modes (conversation, extempore, and read) in three Indian languages, namely Telugu, Bengali, and Odia. The vocal tract information is captured using Mel-frequency cepstral coefficients. The pitch contour, processed at the supra-segmental level, represents the excitation source information. The MSMC model is developed using a multilayer perceptron. Experimental results show that the vocal tract features provide better overall identification accuracy than the excitation source information. Further, overall accuracy improves when the scores obtained from two separate MSMC models, one based on excitation source features and one on vocal tract features, are combined. The results generated using the combined score outperform the model developed using standard vocal tract features alone.
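
The score combination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so a weighted sum of per-mode scores is assumed here; the function names, the weight parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# The three speech modes named in the abstract.
MODES = ("conversation", "extempore", "read")


def fuse_scores(vocal_tract: Sequence[float],
                excitation: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Combine two per-mode score vectors with a weighted sum.

    Assumption: the paper's actual fusion rule is not given in the
    abstract; a weighted sum is one common choice for score-level fusion.
    `weight` is the share given to the vocal tract model.
    """
    if len(vocal_tract) != len(excitation):
        raise ValueError("score vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * v + (1.0 - weight) * e
            for v, e in zip(vocal_tract, excitation)]


def predict_mode(scores: Sequence[float]) -> str:
    """Return the speech mode with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return MODES[best]


# Made-up scores for one utterance, for illustration only.
vt_scores = [0.40, 0.45, 0.15]   # hypothetical output of the MFCC-based model
ex_scores = [0.60, 0.20, 0.20]   # hypothetical output of the pitch-based model

fused = fuse_scores(vt_scores, ex_scores, weight=0.5)
print(predict_mode(vt_scores))   # decision from vocal tract scores alone
print(predict_mode(fused))       # decision after score-level fusion
```

In this made-up case the two models disagree, and the fused score follows the more confident one. In practice the weight would be tuned on held-out data rather than fixed at 0.5.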