13 results on '"Tumisho Billson Mokgonyane"'
Search Results
2. Automatic Speaker Recognition System based on Optimised Machine Learning Algorithms.
- Author
-
Tumisho Billson Mokgonyane, Tshephisho Joseph Sefara, Thipe Isaiah Modipa, and Madimetja Jonas D. Manamela
- Published
- 2019
- Full Text
- View/download PDF
3. Gender Identification in Sepedi Speech Corpus
- Author
-
Tumisho Billson Mokgonyane and Tshephisho Joseph Sefara
- Subjects
Identification (information), Audio signal, Computer science, Multilayer perceptron, Speech recognition, Feature extraction, Speech corpus, Feature selection, Convolutional neural network, Data modeling - Abstract
Gender identification is the task of determining the gender of a speaker from the audio signal. Most gender identification systems are developed using datasets from well-resourced languages, and little attention has been paid to under-resourced African languages. This paper presents the development of a gender identification system using a Sepedi speech dataset of 55.7 hours, with 30776 male and 28337 female samples. We build gender identification systems using machine learning models trained as a multilayer perceptron (MLP), a convolutional neural network (CNN), and a long short-term memory (LSTM) network. Mid-term features are extracted from time-domain, frequency-domain and cepstral-domain features, and normalised using the Z-score normalisation technique. XGBoost is used as a feature selection method to select important features. On data with seen speakers, MLP achieved an F-score and accuracy of 94%, while LSTM and CNN each achieved an F-score and accuracy of 97%. We further evaluated the models on data with unseen speakers, where all models achieved good F-scores and accuracy.
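The Z-score normalisation step described in the abstract can be sketched as follows; the data here is a synthetic stand-in for the extracted mid-term features, not the Sepedi corpus itself:

```python
import numpy as np

def zscore_normalise(X):
    # Z-score normalisation: per feature, subtract the mean and divide by
    # the standard deviation so each feature has mean 0 and std 1.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X - mu) / sigma

# Synthetic stand-in for a matrix of mid-term features (rows = utterances).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 10))
Xn = zscore_normalise(X)
```

After normalisation every column of `Xn` has (sample) mean 0 and standard deviation 1, which puts features measured on very different scales on an equal footing before feature selection and model training.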
- Published
- 2021
4. A Cross-platform Interface for Automatic Speaker Identification and Verification
- Author
-
Thipe Isaiah Modipa, Tshephisho Joseph Sefara, Madimetja Jonas Manamela, and Tumisho Billson Mokgonyane
- Subjects
Identification (information), Computer science, Interface (Java), Speech recognition, Multilayer perceptron, Feature extraction, Audio analyzer, Language technology, Classifier (linguistics), Speaker recognition - Abstract
Automatic speaker recognition, the task of automatically identifying and/or verifying the identity of a speaker from a recording of a speech sample, has been studied for many years. Speaker recognition technologies have improved in recent years and are becoming inexpensive and reliable methods for identifying and verifying people. Although automatic speaker recognition research now spans over 50 years, little of it addresses low-resourced South African indigenous languages. In this paper, a multilayer perceptron (MLP) classifier model is trained and deployed on a graphical user interface for real-time identification and verification of native Sepedi speakers. Sepedi is a low-resourced language spoken by the majority of residents in the Limpopo province of South Africa. The data used to train the speaker recognition system is obtained from the NCHLT (National Centre for Human Language Technology) project. A total of 34 short-term acoustic features of speech are extracted using the pyAudioAnalysis library, and Scikit-learn is used to train the MLP classifier model, which performs well with an accuracy of 95%. The GUI, developed with Qt Creator and PyQt4, obtained a true acceptance rate (TAR) of 66.67% and a true rejection rate (TRR) of 13.33%.
- Published
- 2021
5. Emotional Speaker Recognition based on Machine and Deep Learning
- Author
-
Tshephisho Joseph Sefara and Tumisho Billson Mokgonyane
- Subjects
Support vector machine, Artificial neural network, Computer science, Speech recognition, Multilayer perceptron, Deep learning, Deep neural networks, Artificial intelligence, Speaker recognition, Convolutional neural network, Random forest - Abstract
Speaker recognition is a method that recognises a speaker from the characteristics of their voice. Speaker recognition technologies have been widely used in many domains. Most speaker recognition systems are trained on clean, neutral recordings; however, their performance tends to degrade when recognising emotional speech. This paper presents an emotional speaker recognition system trained with machine and deep learning algorithms using time, frequency and spectral features on the emotional speech database acquired from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). We trained and compared the performance of five machine learning models (Logistic Regression, Support Vector Machine, Random Forest, XGBoost, and k-Nearest Neighbours) and three deep learning models (Long Short-Term Memory network, Multilayer Perceptron, and Convolutional Neural Network). After evaluation, the deep neural networks outperformed the machine learning models, attaining the highest accuracy of 92% and surpassing state-of-the-art models in emotional speaker detection from speech signals.
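Training and comparing a bank of classifiers, as the abstract describes, follows a standard pattern in Scikit-learn. This is a minimal sketch on synthetic two-class features; the real work uses RAVDESS audio and additionally XGBoost and the deep models:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Synthetic features standing in for time/frequency/spectral descriptors
# of emotional speech; RAVDESS itself must be downloaded separately.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(60, 20)) for c in (0, 2)])
y = np.repeat([0, 1], 60)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}
# Mean cross-validated accuracy per model, for side-by-side comparison.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}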
- Published
- 2020
6. The Effects of Acoustic Features of Speech for Automatic Speaker Recognition
- Author
-
Tumisho Billson Mokgonyane, Thipe Isaiah Modipa, Moses Sebaka Masekwameng, Tshephisho Joseph Sefara, and Madimetja Jonas Manamela
- Subjects
Support vector machine, Kernel (linear algebra), Computer science, Speech recognition, Language technology, Feature extraction, Identity (object-oriented programming), Sample (statistics), Perceptron, Speaker recognition - Abstract
Automatic speaker recognition is the task of automatically determining or verifying the identity of a speaker from a recording of his or her speech sample, and it has been studied for many decades. One of the steps that most significantly influences speaker recognition performance is feature extraction. Acoustic features of speech have been researched by many researchers around the world; however, limited research has been conducted on African indigenous languages, South African official languages in particular. This paper presents the effects of acoustic features of speech on the performance of speaker recognition systems, focusing on South African low-resourced languages. The study investigates acoustic features using the National Centre for Human Language Technology (NCHLT) Sepedi speech data. Time-domain, frequency-domain and cepstral-domain features are evaluated on four machine learning algorithms: K-Nearest Neighbours (K-NN), two kernel-based Support Vector Machines (SVM), and a Multilayer Perceptron (MLP). The results show that performance is poor for time-domain features, good for frequency-domain features, and better still for cepstral-domain features. The combination of all three feature sets yields the highest accuracy and F1 score of 98%.
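The evaluate-each-domain-then-combine protocol can be sketched on synthetic data. The three feature groups are invented stand-ins, with the cepstral group made the most discriminative to mirror the reported trend; concatenation is the simple combination strategy assumed here:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for three acoustic feature domains; class separation
# is chosen so the cepstral group is the most informative.
rng = np.random.default_rng(7)
n = 200
y = np.repeat([0, 1], 100)
sep = {"time": 0.2, "frequency": 0.8, "cepstral": 1.5}
groups = {name: rng.normal(size=(n, 8)) + y[:, None] * s
          for name, s in sep.items()}

def accuracy(X):
    # Held-out accuracy of a K-NN classifier on one feature matrix.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, random_state=0, stratify=y)
    return KNeighborsClassifier().fit(X_tr, y_tr).score(X_te, y_te)

per_domain = {name: accuracy(X) for name, X in groups.items()}
combined = accuracy(np.hstack(list(groups.values())))
```

`per_domain` shows the per-domain ranking and `combined` the accuracy when all three feature sets are concatenated, the same comparison the paper performs on real Sepedi speech.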
- Published
- 2020
7. Effects of Language Modelling for Sepedi-English Code-Switched Speech in Automatic Speech Recognition System
- Author
-
Mercy Mosibudi Mogale, Madimetja Jonas Manamela, Tumisho Billson Mokgonyane, Moses Sebaka Masekwameng, and Thipe Isaiah Modipa
- Subjects
Computer science, Speech recognition, Languages of Africa, Speech coding, Code (semiotics), Data modeling, Language modelling, Language model, Smoothing - Abstract
Speech is the primary means of communication among people. Spoken dialogue systems provide a means for people to interact with computer systems, and the automatic speech recognition (ASR) system forms part of a spoken dialogue system. Such systems perform well for European languages, but more challenges are encountered in recognising South African languages. In this study, we investigate appropriate approaches for developing language models for the recognition of Sepedi-English code-switched speech and their effect on ASR. The SRI Language Modeling (SRILM) toolkit was used to develop the language models, and the Kaldi speech recognition toolkit was used to build the ASR system and evaluate the effects of the smoothing techniques. We evaluated four smoothing techniques, namely Good-Turing (GT), Witten-Bell (WB), Modified Kneser-Ney (MKN), and Laplace (LP) smoothing. Witten-Bell smoothing was found to outperform the other three techniques for Sepedi-English code-switched data in both language modelling and ASR.
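Witten-Bell smoothing, the winner here, interpolates the maximum-likelihood bigram estimate with the unigram distribution, weighting the backoff by T(h), the number of distinct word types seen after each history h. A minimal bigram sketch on a toy token stream (the paper itself uses SRILM, not hand-rolled code):

```python
from collections import Counter, defaultdict

def witten_bell_bigram(tokens):
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)
    total = sum(unigrams.values())
    followers = defaultdict(set)
    history_count = Counter()
    for (h, w), c in bigrams.items():
        followers[h].add(w)
        history_count[h] += c

    def prob(w, h):
        # P_WB(w|h) = (c(h,w) + T(h) * P_uni(w)) / (c(h) + T(h)),
        # where T(h) = number of distinct types observed after h.
        T = len(followers[h])
        p_uni = unigrams[w] / total
        if history_count[h] + T == 0:
            return p_uni  # unseen history: back off to the unigram model
        return (bigrams[(h, w)] + T * p_uni) / (history_count[h] + T)

    return prob

corpus = "the cat sat the cat ran the dog sat".split()  # toy token stream
p = witten_bell_bigram(corpus)
total_mass = sum(p(w, "the") for w in set(corpus))
```

Seen continuations like "cat" after "the" keep most of their mass, unseen ones like "sat" after "the" receive a small unigram-weighted share, and the probabilities over the vocabulary still sum to one.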
- Published
- 2020
8. Automatic Speaker Recognition System based on Optimised Machine Learning Algorithms
- Author
-
Madimetja Jonas Manamela, Thipe Isaiah Modipa, Tumisho Billson Mokgonyane, and Tshephisho Joseph Sefara
- Subjects
Progress in artificial intelligence, Automatic speaker recognition, Artificial neural network, Computer science, Machine learning, Speaker recognition, k-nearest neighbors algorithm, Random forest, Support vector machine, Artificial intelligence, Classifier (UML), Algorithm - Abstract
Speaker recognition is a technique that automatically identifies a speaker from a recording of their voice. Speaker recognition technologies are taking a new direction due to progress in artificial intelligence and machine learning, and have been widely used in many domains. Research in the field of speaker recognition now spans over 50 years, and in that time a great deal of progress has been made towards improving the accuracy of such systems through more successful machine learning algorithms. This paper presents the development of an automatic speaker recognition system based on optimised machine learning algorithms, where the algorithms are tuned for improved performance. Five classifier models, namely Support Vector Machines, K-Nearest Neighbours, Random Forest, Logistic Regression, and Artificial Neural Networks, are trained and compared. The Artificial Neural Network obtained the best accuracy of 96.03%, outperforming the KNN, SVM, RF and LR classifiers.
- Published
- 2019
9. HMM-based Speech Synthesis System incorporated with Language Identification for Low-resourced Languages
- Author
-
Madimetja Jonas Manamela, Thipe Isaiah Modipa, Tshephisho Joseph Sefara, and Tumisho Billson Mokgonyane
- Subjects
Language identification, Computer science, Mean opinion score, Languages of Africa, Foreign language, Word error rate, Speech synthesis, Intelligibility (communication), Artificial intelligence, Hidden Markov model, Natural language processing - Abstract
Text-to-speech (TTS) synthesis systems are beneficial for learning new or foreign languages. Such systems are currently available for major languages but not for low-resourced languages, and their scarcity may hinder the learning of low-resourced languages in particular. The development of language-specific systems such as TTS and language identification (LID) has an important role in mitigating the historical linguistic effects of discrimination and domination imposed on low-resourced indigenous languages. This paper presents the development of a multi-language LID+TTS synthesis system that generates audio for input text in the predicted language for four South African languages, namely Tshivenda, Sepedi, Xitsonga and isiNdebele. On the front end, the LID module detects the language of the input text before the TTS synthesis module produces the output audio. The LID module, trained on a 4-million-word dataset, achieved 99% accuracy, outperforming state-of-the-art systems. A robust method for building TTS voices, the hidden Markov model approach, is used to build new voices in the selected languages. Voice quality is measured using the mean opinion score and word error rate metrics, with positive results on the understandability, naturalness, pleasantness, intelligibility and overall impression of the newly created TTS voices. The system is available as a website service.
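A text-based LID front end like the one described is commonly built from character n-grams. A hedged sketch with two invented toy "languages" (the real system is trained on 4 million words of Tshivenda, Sepedi, Xitsonga and isiNdebele text; the classifier choice here is an assumption):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Two invented toy "languages" with disjoint character patterns stand in
# for real multilingual training text.
lang_a = ["lo ka ri vona", "ri lo ka", "vona ri ka lo", "ka lo ri vona ka"]
lang_b = ["zum bek tol", "tol zum bek zum", "bek tol zum", "zum tol bek tol"]
texts = lang_a + lang_b
labels = ["A"] * len(lang_a) + ["B"] * len(lang_b)

# Character n-grams are a standard, language-agnostic signal for text LID.
lid = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB())
lid.fit(texts, labels)
pred = lid.predict(["ka ri lo", "bek zum"])
```

In the full pipeline, the predicted label would select which HMM-based TTS voice synthesises the input text.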
- Published
- 2019
10. The Effects of Data Size on Text-Independent Automatic Speaker Identification System
- Author
-
Thipe Isaiah Modipa, Madimetja Jonas Manamela, Tshephisho Joseph Sefara, and Tumisho Billson Mokgonyane
- Subjects
Progress in artificial intelligence, Support vector machine, Computer science, Speech recognition, Multilayer perceptron, Perceptron, Speaker recognition, Field (computer science), Utterance, k-nearest neighbors algorithm - Abstract
Speaker recognition is a technique that automatically identifies a speaker from a recording of their speech utterance. Speaker recognition technologies are taking a new direction due to rapid progress in artificial intelligence, and research in the field has shown fruitful results. There is, however, not much work done for African indigenous languages that have limited speech data resources. This paper examines how data size affects the accuracy of automatic speaker recognition models, focusing on Sepedi, one of the under-resourced South African languages. The speech data is acquired from the South African Centre for Digital Language Resources. Four machine learning models, namely Support Vector Machines (SVM), K-Nearest Neighbours (KNN), Multilayer Perceptrons (MLP) and Logistic Regression (LR), are trained under four data-size settings. LR performed best with a top accuracy of 91%, while SVM showed the largest gain, a 4% increase in accuracy, as data size increased.
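The four-data-size-settings protocol amounts to training the same model on growing subsets and scoring on a fixed test set. A minimal sketch on synthetic speaker features (the subset fractions are illustrative, not the paper's actual splits):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic speaker features; accuracy is measured as the training set grows.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=i, scale=2.0, size=(200, 10)) for i in range(3)])
y = np.repeat(np.arange(3), 200)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

accuracies = []
for frac in (0.1, 0.25, 0.5, 1.0):  # four data-size settings
    n = int(len(X_tr) * frac)
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    accuracies.append(clf.score(X_te, y_te))
```

Keeping the test set fixed while only the training fraction changes isolates the effect of data size, which is the comparison the paper reports per model.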
- Published
- 2019
11. Automatic Speaker Recognition System based on Machine Learning Algorithms
- Author
-
Thipe Isaiah Modipa, Tumisho Billson Mokgonyane, Mercy Mosibudi Mogale, Tshephisho Joseph Sefara, Madimetja Jonas Manamela, and Phuti J. Manamela
- Subjects
Computer science, Perceptron, Machine learning, Speaker recognition, Cross-validation, Random forest, Support vector machine, Multilayer perceptron, Artificial intelligence, Classifier (UML), Graphical user interface - Abstract
Speaker recognition is a technique used to automatically recognise a speaker from a recording of their voice or speech utterance. Speaker recognition technology has improved over recent years and has become an inexpensive and reliable method for person identification and verification. Research in the field now spans over five decades and has shown fruitful results; however, little work has addressed South African indigenous languages. This paper presents the development of an automatic speaker recognition system that incorporates classification and recognition of Sepedi home-language speakers. Four classifier models, namely Support Vector Machines, K-Nearest Neighbours, Multilayer Perceptrons (MLP) and Random Forest (RF), are trained using the WEKA data mining tool. Auto-WEKA is applied to determine the best classifier model together with its best hyper-parameters, and the performance of each model is evaluated in WEKA using 10-fold cross-validation. MLP and RF yielded good accuracy, surpassing the state of the art with accuracies of 97% and 99.9% respectively; the RF model was then deployed on a graphical user interface for development testing.
- Published
- 2019
12. The Automatic Recognition of Sepedi Speech Emotions Based on Machine Learning Algorithms
- Author
-
Tshepisho J Sefara, Tumisho Billson Mokgonyane, Madimetja Jonas Manamela, Thipe Isaiah Modipa, and Phuti J. Manamela
- Subjects
Support vector machine, Sadness, Statistical classification, Computer science, Emotion classification, Feature extraction, Happiness, Speech corpus, Affective computing, Algorithm - Abstract
Over the past years, speech emotion recognition (SER) studies have gained much interest in the fields of affective computing and human-computer interaction (HCI), with the aim of improving the interaction between humans and machines. This paper discusses an SER system that classifies and recognises six basic emotions (anger, sadness, disgust, fear, happiness, and neutral) from speech spoken in Sepedi, one of South Africa's official languages. Speech recordings were collected from Sepedi speakers and TV drama broadcasts to create emotional speech corpora. Thirty-four speech features were then extracted from the corpora, using the pyAudioAnalysis tool, to train and compare different algorithms using 10-fold cross-validation. The experiments were conducted using the WEKA data-mining software. The results showed that Auto-WEKA outperforms all the standard algorithms (SVM, KNN and MLP), and the recorded speech corpus yielded better recognition accuracy than the TV broadcast speech corpus.
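Short-term feature extraction of the kind pyAudioAnalysis performs can be illustrated with two classic time-domain descriptors, zero-crossing rate and short-term energy, computed per frame over a sliding window (the frame sizes are common defaults, assumed here rather than taken from the paper):

```python
import numpy as np

def short_term_features(signal, fs, win=0.05, step=0.025):
    # Zero-crossing rate and short-term energy per frame: two of the
    # time-domain descriptors extracted by tools like pyAudioAnalysis.
    n, s = int(win * fs), int(step * fs)
    feats = []
    for start in range(0, len(signal) - n + 1, s):
        frame = signal[start:start + n]
        # ZCR: fraction of consecutive sample pairs whose sign flips.
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        energy = np.mean(frame ** 2)
        feats.append((zcr, energy))
    return np.array(feats)

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)  # a 440 Hz test tone, 1 second long
feats = short_term_features(tone, fs)
```

For a pure 440 Hz tone the per-sample zero-crossing rate comes out near 2·440/16000 ≈ 0.055 and the energy near 0.5, the mean square of a unit sine, which makes the two columns easy to sanity-check.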
- Published
- 2018
13. Development of a speech-enabled basic arithmetic m-learning application for foundation phase learners
- Author
-
Madimetja Jonas Manamela, Tumisho Billson Mokgonyane, Thipe Isaiah Modipa, Tshephisho Joseph Sefara, and Phuti J. Manamela
- Subjects
Computer program, Process (engineering), Computer science, Speech synthesis, Software, Numeracy, M-learning, Reading (process), Arithmetic, Spoken language - Abstract
In very simple terms, speech synthesis is the process of generating spoken language by machine from text input; text-to-speech is the specific form that takes raw text as input and aims to mimic the human process of reading. Computer-assisted learning (CAL) can be defined as learning or teaching through the use of computers with packaged knowledge-content learning materials; it involves a computer program or file developed specifically for educational purposes. Mobile learning, or "m-learning", is the ability to obtain or provide educational content on personal pocket devices such as PDAs, smartphones and mobile phones; as an educational activity it makes sense only when the technology in use facilitates and supports mobility in learning. In this paper, we discuss the development of a mathematical computer-assisted learning mobile application that integrates a text-to-speech synthesis module for South African low-resourced languages, initially targeting Sepedi. The system is aimed at assisting mathematically illiterate persons and foundation-phase learners to learn and understand the representation and articulation of mathematical expressions involving the four basic arithmetic operations (addition, subtraction, multiplication, and division), and it incorporates a few numeracy functions. The results of experiments conducted with the prototype CAL system show that 80% of participants were impressed by the developed mobile application. There is a great need to enhance the development of software applications that support teaching and learning at the foundation phase of education in South Africa.
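The core of such an application is turning an arithmetic expression into a sentence a TTS engine can speak. A hypothetical toy sketch in English (the actual application verbalises expressions in Sepedi, and the helper names here are invented for illustration):

```python
import operator

# Hypothetical helper: verbalise a simple arithmetic expression as text
# that a TTS module could then synthesise. English stands in for Sepedi.
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten", "eleven", "twelve"]
OPS = {"+": ("plus", operator.add), "-": ("minus", operator.sub),
       "*": ("times", operator.mul), "/": ("divided by", operator.floordiv)}

def speak_expression(expr):
    # Expects "a op b" with small whole-number operands and result.
    a, op, b = expr.split()
    word, fn = OPS[op]
    result = fn(int(a), int(b))
    return f"{WORDS[int(a)]} {word} {WORDS[int(b)]} equals {WORDS[result]}"

sentence = speak_expression("3 + 4")  # "three plus four equals seven"
```

Feeding `sentence` to the TTS synthesis module is what lets the learner hear both the expression and its answer articulated, covering all four basic operations.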
- Published
- 2017
Discovery Service for Jio Institute Digital Library