1. Exploring Bengali speech for gender classification: machine learning and deep learning approaches.
- Author
Arpita, Habiba Dewan; Al Ryan, Abdullah; Hossain, Md. Fahad; Rahman, Md. Sadekur; Sajjad, Md; and Islam Prova, Nuzhat Noor
- Subjects
MACHINE learning, DEEP learning, CONVOLUTIONAL neural networks, SUPPORT vector machines, RANDOM forest algorithms, SPEECH perception
- Abstract
Speech enables clear and powerful idea transmission. The human voice, rich in tone and emotion, holds unique beauty and significance in daily life. Vocal pitches vary by gender and are influenced by emotions and languages. While people naturally perceive these nuances, machines often struggle to capture these subtle distinctions. This project aims to use various machine learning (ML) and deep learning (DL) techniques to reliably determine an individual's gender from a corpus of Bengali conversations. Our dataset comprises 3185 Bengali speeches, with 1100 delivered by men, 1035 by women, and 1050 by those who identify as third gender. We employed six distinct feature extraction techniques to examine the audio data: roll-off, spectral centroid, chroma-stft, spectral bandwidth, zero crossing rate, and Mel-frequency cepstral coefficients (MFCC). Five ML algorithms were employed to comprehensively analyze the dataset: extreme gradient boosting (XGBoost), support vector machines (SVM), K-nearest neighbors (KNN), decision tree classifier (DTC), and random forest (RF). For a full study, we also included a 1D convolutional neural network (CNN) from the DL domain. The 1D CNN outperformed all other algorithms, reaching an accuracy of 99.37%. [ABSTRACT FROM AUTHOR]
- Published
- 2025