89 results on '"Tien-Ping Tan"'
Search Results
52. Examining Machine Learning Techniques in Business News Headline Sentiment Analysis
- Author
Eng Kee Tan, Hooi Mei Lim, Tien-Ping Tan, and Seong Liang Ooi Lim
- Subjects
Computer science; Sentiment analysis; Decision tree; Headline; Perceptron; Machine learning; Naive Bayes classifier; Recurrent neural network; Classifier (linguistics); Artificial intelligence; Market sentiment
- Abstract
Sentiment analysis is a natural language processing task that attempts to predict the opinion, feeling or view expressed in a text. Interest in sentiment analysis has been rising due to the availability of large sentiment corpora and the enormous potential of sentiment analysis applications. This work evaluates different machine learning techniques for predicting readers' sentiment toward business news headlines. News articles report events that have happened in the world as well as expert opinions. These factors affect market sentiment, and a headline can be considered a one-sentence summary of an article. In this study, we constructed a sentiment analysis corpus consisting of business news headlines. We examined two approaches, namely text classification and recurrent neural networks (RNN), for predicting the sentiment of a business news headline. For the text classification approach, we experimented with a multi-layer perceptron (MLP) classifier, multinomial naive Bayes, complement naive Bayes and decision trees. For the RNN approach, we evaluated the typical RNN architecture and the encoder-decoder architecture.
- Published
- 2020
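As an illustration of one of the text classifiers listed in this abstract, the sketch below implements multinomial naive Bayes with add-alpha (Laplace) smoothing from scratch. It is not the authors' code: the function names `train_multinomial_nb` and `predict_nb`, the toy headlines and the labels are all hypothetical, and out-of-vocabulary tokens are simply ignored at prediction time.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha (Laplace) smoothing.

    docs: list of token lists; labels: one class label per document.
    Returns log priors, per-class log likelihoods and the training vocabulary.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior; unseen tokens are ignored."""
    log_prior, log_lik, vocab = model
    scores = {
        c: prior + sum(log_lik[c][w] for w in tokens if w in vocab)
        for c, prior in log_prior.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy headlines, not the corpus built in the study.
headlines = [["shares", "surge"], ["profit", "rises"], ["shares", "plunge"], ["losses", "widen"]]
sentiment = ["positive", "positive", "negative", "negative"]
model = train_multinomial_nb(headlines, sentiment)
print(predict_nb(model, ["profit", "surge"]))  # positive
```

Complement naive Bayes differs mainly in estimating each class's word weights from the documents of all *other* classes, which tends to help on imbalanced data.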
53. Sliding Window and Parallel LSTM with Attention and CNN for Sentence Alignment on Low-Resource Languages.
- Author
Tien-Ping Tan, Chai Kim Lim, and Wan Rose Eliza Abdul Rahman
- Subjects
Machine translating; Feedforward neural networks; Convolutional neural networks; Artificial neural networks; Machine learning
- Abstract
A parallel text corpus is an important resource for building a machine translation (MT) system. Existing resources such as translated documents, bilingual dictionaries, and translated subtitles are excellent sources for constructing a parallel text corpus. Because manual sentence alignment is resource-intensive, sentence alignment algorithms are used to align source and target sentences automatically. Over the years, sentence alignment approaches have progressed from sentence-length heuristics to statistical lexical models to deep neural networks. Treating alignment as a classification problem is attractive because classification is at the core of machine learning. This paper proposes a parallel long short-term memory with attention and convolutional neural network (parallel LSTM+Attention+CNN) for classifying a pair of sentences as parallel or non-parallel. A sliding window approach is also proposed that uses the classifier to align sentences in the source and target languages. The proposed approach was compared with three classifiers, namely a feedforward neural network, a CNN, and a bi-directional LSTM. It was also compared with the BleuAlign sentence alignment system. The classification accuracy of these models was evaluated using a Malay-English parallel text corpus and the UN French-English parallel text corpus. The Malay-English sentence alignment performance was then evaluated using research documents and a very challenging Classical Malay-English document. The proposed classifier obtained more than 80% accuracy in classifying parallel/non-parallel sentences with a model trained on only five thousand parallel sentences. It also achieved higher sentence alignment accuracy than the baseline systems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
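The abstract pairs a sentence-pair classifier with a sliding window for alignment. The sketch below shows one plausible way such a window search could work: each source sentence is compared only with target sentences near its proportionally expected position, and the best pair is kept if it clears a threshold. It is an assumption-laden illustration, not the paper's algorithm; `sliding_window_align`, the window size, the threshold and especially `length_ratio_score` (a toy length-similarity scorer standing in for the trained LSTM+Attention+CNN classifier) are all hypothetical.

```python
from typing import Callable, List, Tuple

def sliding_window_align(src: List[str], tgt: List[str],
                         score: Callable[[str, str], float],
                         window: int = 2, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Align each source sentence to the best-scoring target sentence inside a window
    centred on its proportionally expected position; pairs below threshold are dropped."""
    pairs = []
    ratio = len(tgt) / len(src) if src else 1.0
    for i, sentence in enumerate(src):
        centre = round(i * ratio)
        lo, hi = max(0, centre - window), min(len(tgt), centre + window + 1)
        best_j, best_score = None, float("-inf")
        for j in range(lo, hi):
            s = score(sentence, tgt[j])  # a trained classifier would return P(parallel)
            if s > best_score:
                best_j, best_score = j, s
        if best_j is not None and best_score >= threshold:
            pairs.append((i, best_j))
    return pairs

def length_ratio_score(a: str, b: str) -> float:
    """Toy stand-in for the parallel/non-parallel classifier: character-length similarity."""
    longest = max(len(a), len(b))
    return min(len(a), len(b)) / longest if longest else 1.0

src = ["short", "a much longer sentence here", "mid length"]
tgt = ["brief", "a considerably longer sentence", "medium len"]
print(sliding_window_align(src, tgt, length_ratio_score))  # [(0, 0), (1, 1), (2, 2)]
```

Restricting the search to a window keeps the number of classifier calls roughly linear in document length instead of quadratic.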
54. Mining opinionated product features using WordNet lexicographer files
- Author
Keng Hoon Gan, Saif A. Ahmad Alrababah, and Tien-Ping Tan
- Subjects
Service (business); Information retrieval; Computer science; Sentiment analysis; Feature extraction; WordNet; Customer perspective; 02 engineering and technology; Library and Information Sciences; Domain (software engineering); Lexicography; Product (business); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Information Systems
- Abstract
Online customer reviews are an important assessment tool for businesses because they contain feedback that is valuable from the customer perspective. These reviews provide a significant basis on which potential customers can select the product that best meets their preferences. In online reviews, customers describe positive or negative experiences with a product or service or any part of it (i.e. features). Because of the massive number of online reviews, consumers frequently have difficulty finding the desired product for comparison. Automatic extraction of important product features is therefore necessary to support customers searching for relevant product features. These features are the criteria that allow customers to characterise different types of products. This article proposes a domain-independent approach that uses WordNet lexicographer files to identify explicit opinionated features and attributes strongly related to a specific domain product. In our approach, N-gram analysis and the SentiStrength opinion lexicon are employed to support the extraction of opinionated features. An empirical evaluation of the proposed system on online reviews from two popular datasets used by both supervised and unsupervised systems showed that our approach achieved competitive results for feature extraction from product reviews.
- Published
- 2016
55. Using Dictionary and Lemmatizer to Improve Low Resource English-Malay Statistical Machine Translation System
- Author
Tien-Ping Tan, Yin-Lai Yeong, and Siti Khaotijah Mohammad
- Subjects
Machine translation; Computer science; Speech recognition; parallel corpus; 02 engineering and technology; Machine translation software usability; lemmatization; Domain (software engineering); Example-based machine translation; English-Malay; Rule-based machine translation; 0202 electrical engineering, electronic engineering, information engineering; Evaluation of machine translation; Statistical machine translation; General Environmental Science; BLEU; Malay; Lemmatisation; Bilingual dictionary; 020206 networking & telecommunications; Machine-readable dictionary; General Earth and Planetary Sciences; 020201 artificial intelligence & image processing; Artificial intelligence; Natural language processing; dictionary
- Abstract
Statistical Machine Translation (SMT) is one of the most popular methods for machine translation. In this work, we carried out English-Malay SMT by acquiring an English-Malay parallel corpus in the computer science domain. However, the training parallel corpus is from a general domain, so many words are out of vocabulary during translation. We attempt to improve English-Malay SMT in the computer science domain using a dictionary and an English lemmatizer. Our study shows that combining a bilingual dictionary with English lemmatization improves the BLEU score for English-to-Malay translation from 12.90 to 15.41.
- Published
- 2016
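The gains in this abstract are reported as BLEU scores. For readers unfamiliar with the metric, below is a minimal sketch of standard corpus-level BLEU (clipped n-gram precisions for n = 1..4, uniform weights, brevity penalty) with one reference per sentence and no smoothing. It is an illustration only: the function names are hypothetical, and published scores usually come from toolkit scripts whose tokenization and details may differ, so this sketch will not reproduce the paper's numbers.

```python
import math
from collections import Counter
from typing import List, Sequence

def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses: Sequence[List[str]], references: Sequence[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale with one reference per sentence:
    clipped n-gram precisions (n = 1..max_n), uniform weights, brevity penalty."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # no smoothing: any n-gram order without matches gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)

hyp = "the cat sat on the mat".split()
ref = "the cat sat on the red mat".split()
print(round(corpus_bleu([hyp], [ref]), 2))  # 67.32
```

In this example the precisions are 6/6, 4/5, 3/4 and 2/3, and the hypothesis is one word shorter than the reference, so the brevity penalty exp(1 - 7/6) also lowers the score.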
56. Evaluating Code-Switched Malay-English Speech Using Time Delay Neural Networks
- Author
Tien-Ping Tan and Anand Singh
- Subjects
Artificial neural network; Computer science; Speech recognition; 0202 electrical engineering, electronic engineering, information engineering; Code (cryptography); 020206 networking & telecommunications; 020201 artificial intelligence & image processing; 02 engineering and technology; Malay
- Published
- 2018
57. Mandarin–English code-switching speech corpus in South-East Asia: SEAME
- Author
Dau-Cheng Lyu, Haizhou Li, Tien-Ping Tan, and Eng Siong Chng
- Subjects
Text corpus; Linguistics and Language; Phrase; Computer science; Language change; Speech corpus; Library and Information Sciences; Code-switching; Mandarin Chinese; Language and Linguistics; Linguistics; Education; Corpus linguistics; Artificial intelligence; Computational linguistics; Transcription (software); Natural language processing
- Abstract
This paper introduces the South East Asia Mandarin–English (SEAME) corpus, a 63-hour spontaneous Mandarin–English code-switching transcribed speech corpus suitable for LVCSR and language change detection/identification research. The corpus was recorded in unscripted interview and conversational settings from 157 Singaporean and Malaysian speakers who spoke a mixture of Mandarin and English within a single sentence. About 82% of the transcribed utterances are intra-sentential code-switching speech, and the corpus will be released by LDC in 2015. This paper presents an analysis of the code-switching statistics of the corpus, such as the duration of monolingual segments and the frequency of language turns in code-switch utterances. We also summarize the development effort, including details such as the processing time for transcription, validation and language boundary labelling. Lastly, we present textual analyses of code-switch segments, examining the word length of monolingual segments in code-switch utterances and the most common single-word and two-word phrases in such segments.
- Published
- 2015
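The statistics described in this abstract (monolingual segments, language turns, share of code-switched utterances) can be illustrated with a short sketch over language-tagged words. This is a hypothetical simplification: it assumes each word already carries a language tag, measures segment length in words rather than in seconds (durations would need time-aligned transcripts), and treats any utterance with at least one turn as code-switched. The function names, tags and the example utterance are invented for illustration.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, language tag)

def monolingual_segments(utterance: Sequence[Token]) -> List[Tuple[str, List[str]]]:
    """Split an utterance into maximal runs of words that share a language tag."""
    return [(lang, [word for word, _ in run]) for lang, run in groupby(utterance, key=lambda t: t[1])]

def language_turns(utterance: Sequence[Token]) -> int:
    """Number of switches between languages inside one utterance."""
    return max(len(monolingual_segments(utterance)) - 1, 0)

def code_switch_share(utterances: Sequence[Sequence[Token]]) -> float:
    """Fraction of utterances that contain at least one language turn."""
    return sum(language_turns(u) > 0 for u in utterances) / len(utterances)

# Hypothetical Mandarin-English utterance with word-level language tags.
utt = [("我", "ZH"), ("觉得", "ZH"), ("this", "EN"), ("movie", "EN"), ("很", "ZH"), ("好看", "ZH")]
print(monolingual_segments(utt))
print(language_turns(utt))  # 2
```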
58. Malay speech corpus of telecommunication call center preparation for ASR
- Author
N. Abu Haris, M. Draman, D. C. Tee, Tien-Ping Tan, S. Saidon, Mohd Izhan Mohd Yusoff, Z. Lambak, M.R. Yahya, and Safwati Ibrahim
- Subjects
Matching (statistics); Computer science; 020209 energy; Speech recognition; Acoustic model; Speech corpus; 02 engineering and technology; Set (abstract data type); 0202 electrical engineering, electronic engineering, information engineering; Code (cryptography); Speech analytics; Artificial intelligence; Word (computer architecture); Natural language processing; Malay
- Abstract
This paper presents the methodology used to prepare a conversational speech corpus for acoustic model training of a Malay automatic speech recognition (ASR) system for a telco call center. Data preparation is significant and must be done properly to build a robust model for an ASR system. We describe the issues encountered during the filtering process and the list of sensitive data removed to avoid leaking personal information to third parties. We then manually transcribed the filtered data according to a set of transcription rules designed specifically for the Malay ASR engine. Finally, we analysed the 5 hours of transcribed data to obtain N-gram models and word occurrence frequencies for our call center sample voice data, which can help us develop a symptom-cause code matching application in the future.
- Published
- 2017
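The analysis in this abstract derives word frequencies and N-gram models from transcripts. A minimal sketch of both, using unsmoothed maximum-likelihood bigram estimates with sentence boundary markers, is shown below. The function names and the two Malay utterances are hypothetical examples, not the call-center data; a production language model would normally add smoothing (for example Kneser-Ney) so unseen bigrams do not get zero probability.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def word_frequencies(sentences: Sequence[List[str]]) -> Counter:
    """Count how often each word occurs across all transcribed sentences."""
    return Counter(word for tokens in sentences for word in tokens)

def bigram_mle(sentences: Sequence[List[str]]) -> Dict[Tuple[str, str], float]:
    """Maximum-likelihood bigram probabilities P(w2 | w1) with <s> and </s> boundary markers."""
    bigrams = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        bigrams.update(zip(padded, padded[1:]))
    history = Counter()
    for (w1, _), count in bigrams.items():
        history[w1] += count
    return {bg: count / history[bg[0]] for bg, count in bigrams.items()}

# Hypothetical transcribed customer utterances.
transcripts = [["talian", "saya", "terputus"], ["internet", "saya", "lambat"]]
print(word_frequencies(transcripts).most_common(1))  # [('saya', 2)]
print(bigram_mle(transcripts)[("saya", "terputus")])  # 0.5
```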
59. Comparative analysis of MCDM methods for product aspect ranking: TOPSIS and VIKOR
- Author
Keng Hoon Gan, Saif A. Ahmad Alrababah, and Tien-Ping Tan
- Subjects
Engineering; Process management; Process (engineering); Sentiment analysis; TOPSIS; Multiple-criteria decision analysis; Identification (information); Ranking; Product (category theory); Data mining; Decision-making
- Abstract
Product aspects extracted from online customer reviews (like “battery life” and “zoom”) differ in significance; some of these aspects have a great influence on potential customers' decisions as well as on businesses' strategies for product enhancement. Supporting prospective customers with a list of the most representative product aspects will assist their purchasing decisions and facilitate comparison among the presented products. For firms, identifying critical product aspects creates a new perspective on product manufacturing and marketing strategies, helping them stay competitive and innovative. However, manually identifying the most representative product aspects among the huge number of aspects extracted from online reviews is a tedious and time-consuming task. Thus, ranking the extracted aspects becomes necessary to identify the important product aspects mentioned in customer reviews. The purpose of this study is to formulate the product aspect ranking problem as a decision-making process using Multi-Criteria Decision Making (MCDM). To this end, a comparative analysis of two MCDM ranking approaches, namely TOPSIS and VIKOR, was conducted to investigate their performance in prioritizing the most important product aspects in customer reviews. The experimental results on different product reviews demonstrate the effectiveness of both methods in prioritizing the genuine product aspects in customer feedback.
- Published
- 2017
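To make the VIKOR side of this comparison concrete, here is a minimal sketch of the standard VIKOR ranking index: group utility S, individual regret R, and the compromise index Q (lower is better). It assumes every criterion is a benefit criterion and omits VIKOR's extra "acceptable advantage" and "acceptable stability" checks for the compromise solution. The criteria, weights and scores are hypothetical, not the study's data.

```python
from typing import List, Sequence

def vikor(matrix: Sequence[Sequence[float]], weights: Sequence[float], v: float = 0.5) -> List[float]:
    """Return the VIKOR compromise index Q for each alternative (row); lower Q ranks higher.

    Assumes every criterion is a benefit criterion (larger is better).
    v weights group utility (S) against individual regret (R).
    """
    n_crit = len(weights)
    best = [max(row[j] for row in matrix) for j in range(n_crit)]
    worst = [min(row[j] for row in matrix) for j in range(n_crit)]
    S, R = [], []
    for row in matrix:
        # Weighted, normalised gap between this alternative and the best value per criterion.
        gaps = [
            weights[j] * (best[j] - row[j]) / (best[j] - worst[j]) if best[j] != worst[j] else 0.0
            for j in range(n_crit)
        ]
        S.append(sum(gaps))
        R.append(max(gaps))

    def scale(x: float, lo: float, hi: float) -> float:
        return (x - lo) / (hi - lo) if hi != lo else 0.0

    s_lo, s_hi, r_lo, r_hi = min(S), max(S), min(R), max(R)
    return [v * scale(s, s_lo, s_hi) + (1 - v) * scale(r, r_lo, r_hi) for s, r in zip(S, R)]

aspects = ["battery life", "zoom", "strap"]
# Hypothetical criteria per aspect: mention frequency, share of opinionated mentions.
scores = [[120, 0.8], [60, 0.6], [5, 0.1]]
q = vikor(scores, weights=[0.5, 0.5])
ranking = [aspects[i] for i in sorted(range(len(aspects)), key=q.__getitem__)]
print(ranking)  # ['battery life', 'zoom', 'strap']
```

Unlike TOPSIS, VIKOR explicitly balances overall performance (S) against the worst single-criterion shortfall (R), which is why the two methods can disagree on borderline aspects.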
60. Product aspect ranking using sentiment analysis and TOPSIS
- Author
Keng Hoon Gan, Saif A. Ahmad Alrababah, and Tien-Ping Tan
- Subjects
Engineering; Sentiment analysis; WordNet; TOPSIS; Multiple-criteria decision analysis; Domain (software engineering); Ranking; Similarity (psychology); Product (category theory); Data mining
- Abstract
The explosive growth of customer reviews on e-commerce websites has inspired many researchers to explore the problem of identifying the product aspects mentioned in online reviews. Most existing studies extract product aspects based on three main criteria: 1) the aspects that are commented on repeatedly in online reviews, 2) important aspects, determined as those described positively and negatively by many customers in their reviews, and 3) the association between the domain product aspect (like ‘camera’) and the other aspects contained in a specific product review. However, a gap remains as to how to efficiently investigate online reviews to identify the most important product aspects by considering all three criteria jointly. In response, this paper proposes a novel product aspect ranking framework using sentiment analysis and TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution). The proposed work is decomposed into two stages: aspect extraction and aspect ranking. In the aspect extraction stage, sentiment analysis is used to identify product aspects from customer reviews in an unsupervised manner based on the three extraction criteria. In the second stage, the product aspects extracted under these criteria are fed simultaneously into TOPSIS to produce a ranked list of the most representative product aspects. An empirical evaluation of the proposed work using online reviews of four products shows its effectiveness in finding representative aspects.
- Published
- 2016
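The ranking stage in this abstract uses TOPSIS. The sketch below implements the textbook TOPSIS steps: vector normalisation, weighting, positive- and negative-ideal solutions, Euclidean distances, and relative closeness. It assumes all criteria are benefit criteria and equal weights; the three criteria loosely mirror the abstract's extraction criteria but the values, weights and function name are hypothetical, not the paper's configuration.

```python
import math
from typing import List, Sequence

def topsis(matrix: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Return the TOPSIS relative closeness (0..1) of each alternative; higher ranks higher.

    Assumes every criterion is a benefit criterion (larger is better).
    """
    n_crit = len(weights)
    # 1) Vector-normalise each criterion column, 2) apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] if norms[j] else 0.0 for j in range(n_crit)]
        for row in matrix
    ]
    # 3) Positive-ideal and negative-ideal solutions, column by column.
    ideal = [max(col) for col in zip(*weighted)]
    anti_ideal = [min(col) for col in zip(*weighted)]
    # 4) Euclidean distances to both, 5) relative closeness to the ideal.
    closeness = []
    for row in weighted:
        d_pos, d_neg = math.dist(row, ideal), math.dist(row, anti_ideal)
        closeness.append(d_neg / (d_pos + d_neg) if d_pos + d_neg else 0.0)
    return closeness

aspects = ["battery life", "zoom", "strap"]
# Hypothetical criteria: mention frequency, opinionated mentions, association with the product.
criteria = [[120, 90, 0.7], [60, 40, 0.5], [5, 2, 0.1]]
c = topsis(criteria, weights=[1 / 3, 1 / 3, 1 / 3])
print([aspects[i] for i in sorted(range(len(aspects)), key=lambda i: -c[i])])
```

Here "battery life" dominates on every criterion, so it coincides with the ideal solution and receives closeness 1.0, while "strap" coincides with the anti-ideal and receives 0.0.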
61. Hybrid Machine Translation with Multi-Source Encoder-Decoder Long Short-Term Memory in English-Malay Translation
- Author
Siti Khaotijah Mohammad, Tien-Ping Tan, Yin-Lai Yeong, and Keng Hoon Gan
- Subjects
010302 applied physics; Text corpus; General Computer Science; Machine translation; Artificial neural network; Computer science; General Engineering; 02 engineering and technology; Hybrid machine translation; 021001 nanoscience & nanotechnology; Translation (geometry); 01 natural sciences; Domain (software engineering); 0103 physical sciences; Artificial intelligence; 0210 nano-technology; General Agricultural and Biological Sciences; Encoder; Natural language processing; Sentence
- Abstract
Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) are the state-of-the-art approaches in machine translation (MT). The translation produced by SMT is based on statistical analysis of text corpora, while NMT uses a deep neural network to model and generate a translation. SMT and NMT each have strengths and weaknesses. With a small parallel text corpus, SMT may produce better translations than NMT. Nevertheless, when a large amount of parallel text is available, the translations produced by NMT are often of higher quality than those of SMT. Studies have also shown that SMT translates better than NMT when there is a domain mismatch between training and testing. SMT also has an advantage on long sentences. In addition, when a translation produced by NMT is wrong, it is very difficult to find the error. In this paper, we investigate a hybrid approach that combines SMT and NMT to perform English-to-Malay translation. The motivation for hybrid machine translation is to combine the strengths of both approaches to produce more accurate translations. Our approach uses the multi-source encoder-decoder long short-term memory (LSTM) architecture. The architecture uses two encoders: one embeds the sentence to be translated, and the other embeds the initial translation produced by SMT. The SMT output can be viewed as a “suggestion translation” for the neural MT. Our experiments show that the hybrid MT increases the BLEU scores of our best baseline machine translation in the computer science and news domains from 21.21 and 48.35 to 35.97 and 61.81, respectively.
- Published
- 2018
62. A FAST ADAPTATION TECHNIQUE FOR BUILDING DIALECTAL MALAY SPEECH SYNTHESIS ACOUSTIC MODEL
- Author
Yen-Min Jasmina Khaw and Tien-Ping Tan
- Subjects
Engineering; Speech recognition; Speech quality; General Engineering; Acoustic model; Speech synthesis; Speech corpus; Term (time); Hidden Markov model; Adaptation (computer science); Malay
- Abstract
This paper presents a fast adaptation technique for building a hidden Markov model (HMM) based dialectal speech synthesis acoustic model. In this study, Standard Malay is used as the source language and Kelantanese Malay as the target language. The Kelantan dialect is a Malay dialect from the northeast of Peninsular Malaysia. One of the most important and time-consuming steps in building an HMM acoustic model is the alignment of speech sounds, and a good alignment produces clear and natural synthesized speech. This study proposes a quick approach for aligning and building a good dialectal speech synthesis acoustic model by using a different source acoustic model. Two adaptation approaches are proposed for synthesizing dialectal Malay sentences; they use different amounts of target speech together with a source acoustic model to build the target acoustic model of the speech synthesis system. The results show that the dialectal speech synthesis systems built with the adaptation approaches have much better speech quality than the one built without adaptation.
- Published
- 2015
63. A SYSTEM COMBINATION FOR MALAY BROADCAST NEWS TRANSCRIPTION
- Author
Zainab A. Khalaf, Li-Pei Wong, Basem H. A. Ahmed, and Tien-Ping Tan
- Subjects
Engineering; System combination; Speech recognition; General Engineering; Word error rate; Domain model; Transcription (linguistics); Voting; Language model; Decoding methods; Malay
- Abstract
In this paper, we propose a post-decoding system combination approach for automatically transcribing Malay broadcast news. This approach combines the hypotheses produced by parallel automatic speech recognition (ASR) systems that use different language models: one a generic domain model and the other a domain-specific model. The main idea is to exploit the different knowledge of each ASR system to improve the decoding result. The approach uses the language score and time information to produce a 1-best lattice, and then rescores the 1-best lattice to obtain the most likely word sequence as the final output. The proposed approach was compared with the conventional combination approach, recognizer output voting error reduction (ROVER). Our approach improved the word error rate (WER) from 33.9% to 30.6%, an average relative WER improvement of 9.74%, and outperformed ROVER.
- Published
- 2015
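The baseline in this abstract, ROVER, ends with a word-level vote over system outputs. The sketch below shows only that voting step, under the simplifying assumption that the hypotheses have already been aligned slot by slot (real ROVER first builds a word transition network by dynamic-programming alignment and can also weight votes by confidence scores). It is not the paper's proposed lattice-rescoring method; `vote_aligned`, the `@` null marker and the Malay hypotheses are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

NULL = "@"  # marks an empty slot (this system produced no word there)

def vote_aligned(hypotheses: Sequence[Sequence[str]]) -> List[str]:
    """Word-level majority vote over hypotheses that are already aligned slot by slot.

    Ties go to the earliest system listing a tied word; NULL winners are dropped.
    """
    if len({len(h) for h in hypotheses}) != 1:
        raise ValueError("hypotheses must be aligned to the same number of slots")
    output = []
    for slot in zip(*hypotheses):
        counts = Counter(slot)
        top = max(counts.values())
        winner = next(word for word in slot if counts[word] == top)
        if winner != NULL:
            output.append(winner)
    return output

# Hypothetical aligned outputs of three ASR systems.
systems = [
    ["perdana", "menteri", "umum", "bajet", NULL],
    ["perdana", "menteri", "umumkan", "bajet", "baru"],
    ["perdana", "menteri", "umumkan", NULL, "baru"],
]
print(vote_aligned(systems))  # ['perdana', 'menteri', 'umumkan', 'bajet', 'baru']
```

With only two systems, as in this paper's generic-plus-domain setup, plain majority voting ties on every disagreement, which is one reason score-based combination can do better than voting.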
64. Merging of Native and Non-native Speech for Low-resource Accented ASR
- Author
Laurent Besacier, Sarah Samson Juan, Tien-Ping Tan, and Benjamin Lecouteux; Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Institut National Polytechnique de Grenoble (INPG), Centre National de la Recherche Scientifique (CNRS), Université Pierre Mendès France - Grenoble 2 (UPMF), Université Joseph Fourier - Grenoble 1 (UJF)
- Subjects
Computer science; Low resource; Speech recognition; SGMM; 02 engineering and technology; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; modelling; 030507 speech-language pathology & audiology; 03 medical and health sciences; Cross-lingual acoustic modelling; multi-accent; 0202 electrical engineering, electronic engineering, information engineering; Artificial neural network; automatic speech recognition; accent-specific DNN; 020206 networking & telecommunications; Mixture model; Weighting; non-native speech; cross-lingual acoustic; Multi-accent SGMM; Deep neural networks; Artificial intelligence; 0305 other medical science; Merge (version control); Subspace topology; Natural language processing; Speaker adaptation; low-resource system
- Abstract
This paper presents our recent study on low-resource automatic speech recognition (ASR) for accented speech. We propose multi-accent Subspace Gaussian Mixture Models (SGMM) and accent-specific Deep Neural Networks (DNN) to improve non-native ASR performance. In the SGMM framework, we present an original language weighting strategy to merge the globally shared parameters of two models trained on native and non-native speech, respectively. In the DNN framework, a native deep neural network is fine-tuned to non-native speech. Over the non-native baseline, we achieved relative improvements of 15% for multi-accent SGMM and 34% for accent-specific DNN with speaker adaptation.
- Published
- 2015
65. A grapheme and phone rescoring combination system for Malay broadcast news recognition
- Author
Li-Pei Wong, Tien-Ping Tan, and Zainab A. Khalaf
- Subjects
Decision support system; Computer science; Speech recognition; Multiple hypotheses; Grapheme; Word error rate; Phone; Artificial intelligence; Transcription (software); Combination system; Natural language processing; Malay
- Abstract
The main motivation of this paper is to improve automatic speech recognition (ASR) hypotheses for the Malay language. Manual news transcription is too expensive and takes a long time. Without an ASR system, access to and search within audio archives would be restricted to the limited number of documents that have been manually transcribed by humans or indexed with keywords. Multiple hypotheses are useful because the single best recognition output still contains numerous errors, even for state-of-the-art systems. In this paper, we propose an approach to reduce the word error rate (WER) of an ASR hypothesis. This approach, known as the three-pass combination method, uses parallel ASR systems. The three-pass combination system, based on grapheme rescoring and phone rescoring, re-evaluates all of the hypotheses produced by the ASR systems to produce a more accurate hypothesis. To evaluate the proposed approach, we use Malay broadcast news recorded from Malaysian local news channels, containing speech from newscasters, reporters and interviewers in noisy environments. The approach reduced the WER by 4.4 percentage points, from 34.5% to 30.1%. Its performance was compared with six approaches that are frequently used for ASR rescoring and combination.
- Published
- 2015
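The results in this abstract are stated as word error rates. WER is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a word-level Levenshtein alignment; the sketch below implements that definition. The helper `relative_reduction` is a hypothetical addition that shows the difference between the absolute drop (4.4 percentage points) and the corresponding relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of hypothesis word h
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1] / len(ref)

def relative_reduction(before: float, after: float) -> float:
    """Relative improvement, e.g. a WER drop from 34.5 to 30.1."""
    return (before - after) / before

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 1 substitution + 1 deletion over 6 words
print(round(relative_reduction(34.5, 30.1), 4))  # 0.1275
```

So the 4.4-point absolute reduction reported above corresponds to a relative WER reduction of roughly 12.8%.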
66. A bee colony optimization with automated parameter tuning for sequential ordering problem
- Author
Tien-Ping Tan, Moon Hong Wun, Ahamad Tajudin Khader, and Li-Pei Wong
- Subjects
Mathematical optimization; Optimization problem; Computer science; Hamiltonian path; Production planning; Genetic algorithm; Vehicle routing problem; Benchmark (computing); Local search (optimization); Metaheuristic
- Abstract
The Sequential Ordering Problem (SOP) is a type of Combinatorial Optimization Problem (COP). Solving SOP requires finding a feasible Hamiltonian path with minimum cost that does not violate the precedence constraints. SOP models a myriad of real-world industrial applications, particularly in transportation, vehicle routing and production planning. The main objective of this research is to solve SOP using the Bee Colony Optimization (BCO) algorithm, whose underlying mechanism is the foraging behavior of bees in a typical bee colony. The SOP benchmark problems from TSPLIB are used as the testbed to evaluate the performance of the BCO algorithm in terms of solution cost and the computational time needed to obtain an optimum solution. Moreover, we investigate the feasibility of using a Genetic Algorithm to optimally tune the parameters of the existing BCO model. Over the selected 40 benchmark problems, the proposed method solved 9 (22.5%) benchmark problems to optimum, 17 (42.5%) benchmark problems to within 1% of the known optimum, and 37 (85%) benchmark problems to within 5% of the known optimum. Overall, the solutions to the 40 benchmark problems deviate from the known optimum by 2.19% on average.
- Published
- 2014
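The SOP definition in this abstract (a minimum-cost Hamiltonian path that respects precedence constraints) can be made concrete with a feasibility check, a cost function and an exact brute-force solver for tiny instances. The BCO metaheuristic itself is not shown; this only illustrates the objective and constraint that any SOP solver must evaluate. Representing precedences as an explicit list of `(a, b)` pairs and fixing the start and end nodes are assumptions of this sketch (TSPLIB encodes precedences differently, inside the cost matrix), and the 4-node matrix is invented.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

def is_feasible(path: Sequence[int], n: int, precedences: Sequence[Tuple[int, int]]) -> bool:
    """Feasible = visits every node 0..n-1 exactly once and, for each (a, b), a comes before b."""
    if sorted(path) != list(range(n)):
        return False
    position = {node: i for i, node in enumerate(path)}
    return all(position[a] < position[b] for a, b in precedences)

def path_cost(path: Sequence[int], cost: Sequence[Sequence[float]]) -> float:
    """Sum of (possibly asymmetric) arc costs along an open Hamiltonian path."""
    return sum(cost[a][b] for a, b in zip(path, path[1:]))

def brute_force_sop(cost, precedences) -> Tuple[Optional[List[int]], float]:
    """Exact reference solver for tiny instances: start at node 0, end at node n-1."""
    n = len(cost)
    best_path, best_cost = None, float("inf")
    for middle in permutations(range(1, n - 1)):
        path = (0, *middle, n - 1)
        if is_feasible(path, n, precedences):
            c = path_cost(path, cost)
            if c < best_cost:
                best_path, best_cost = list(path), c
    return best_path, best_cost

# Hypothetical asymmetric 4-node instance.
cost = [[0, 3, 1, 9],
        [4, 0, 2, 5],
        [6, 7, 0, 1],
        [8, 2, 3, 0]]
print(brute_force_sop(cost, precedences=[]))        # ([0, 1, 2, 3], 6)
print(brute_force_sop(cost, precedences=[(2, 1)]))  # ([0, 2, 1, 3], 13)
```

The second call shows how a single precedence constraint (node 2 before node 1) can force a much costlier path; brute force grows factorially, which is why metaheuristics such as BCO are used on real benchmarks.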
67. Using closely-related language to build an ASR for a very under-resourced language: Iban
- Author
Laurent Besacier, Benjamin Lecouteux, Tien-Ping Tan, and Sarah Samson Juan; Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Institut National Polytechnique de Grenoble (INPG), Centre National de la Recherche Scientifique (CNRS), Université Pierre Mendès France - Grenoble 2 (UPMF), Université Joseph Fourier - Grenoble 1 (UJF)
- Subjects
Subspace Gaussian Mixture Model; acoustic modelling; Computer science; Speech recognition; Pronunciation; Mixture model; 01 natural sciences; bootstrapping g2p; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; 030507 speech-language pathology & audiology; 03 medical and health sciences; 0103 physical sciences; Artificial intelligence; 0305 other medical science; 010301 acoustics; Natural language processing; Malay
- Abstract
This paper describes our work on an automatic speech recognition (ASR) system for an under-resourced language, Iban, which is mainly spoken in Sarawak, Malaysia. Because no ASR resources existed for Iban, we collected 8 hours of data to begin this study. We employed bootstrapping techniques involving a closely-related language to rapidly build and improve an Iban system. First, we used existing data from Malay, a locally dominant language in Malaysia, to bootstrap a grapheme-to-phoneme (G2P) system for the target language. We also built various types of G2P systems, including a grapheme-based G2P and an English G2P, to produce different versions of the pronunciation dictionary. We tested all of the dictionaries on the Iban ASR to identify the best version. Second, we improved the word error rate (WER) of the baseline GMM system by using subspace Gaussian mixture models (SGMM). For testing, we set two levels of data sparseness on the Iban data: 7 hours and 1 hour of transcribed speech. We investigated cross-lingual SGMM, where the shared parameters were obtained in either a monolingual or a multilingual fashion and then applied to the target language for training. Using out-of-language data (English and Malay) as source languages results in lower WERs when Iban data are very limited.
- Published
- 2014
68. Solving Asymmetric Traveling Salesman Problems using a generic Bee Colony Optimization framework with insertion local search
- Author
Ahamad Tajudin Khader, Mohammed Azmi Al-Betar, Li-Pei Wong, and Tien-Ping Tan
- Subjects
Set (abstract data type); Mathematical optimization; Computer science; Benchmark (computing); Combinatorial optimization problem; Initialization; Local search (optimization); Pruning (decision trees); Metaheuristic; Travelling salesman problem
- Abstract
The Asymmetric Traveling Salesman Problem (ATSP) is a Combinatorial Optimization Problem that has been intensively studied in computer science and operations research. ATSP is NP-hard, and large-scale instances are harder still. This paper addresses the ATSP using a hybrid approach that integrates the generic Bee Colony Optimization (BCO) framework with an insertion-based local search procedure. The generic BCO framework computationally realizes the foraging behaviour of a typical bee colony, in which bees travel across different locations to discover new food sources and perform waggle dances to recruit more bees towards newly discovered food sources. Besides the bee foraging behaviour, the generic BCO framework is enriched with an initialization engine, a fragmented solution construction mechanism, a local search and a pruning strategy. When the proposed algorithm is tested on a set of 27 ATSP benchmark problem instances, 37% of the instances are consistently solved to optimum, and 89% are solved optimally at least once. On average, the proposed BCO algorithm obtains a 0.140% deviation from the known optimum over all 27 instances. In terms of computational time, the proposed algorithm requires on average 48.955 s (under one minute) to obtain the best tour length for each instance.
- Published
- 2013
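Item 68 pairs BCO with an insertion-based local search. The paper's exact procedure is not given in the abstract; the sketch below is a generic first-improvement insertion neighbourhood, assumed for illustration: remove one city, reinsert it at another position, and keep the move if the asymmetric tour gets shorter. The names `tour_length` and `insertion_local_search` are hypothetical, and the full-tour recomputation favours clarity over speed.

```python
from typing import List

Matrix = List[List[float]]


def tour_length(tour: List[int], dist: Matrix) -> float:
    """Length of the closed tour; dist[a][b] may differ from dist[b][a]."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def insertion_local_search(tour: List[int], dist: Matrix) -> List[int]:
    """Move single cities to new positions while that shortens the tour.

    Generic first-improvement insertion search, not the BCO paper's exact code.
    Terminates because every accepted move strictly reduces the tour length.
    """
    best = list(tour)
    best_len = tour_length(best, dist)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            city = best[i]
            rest = best[:i] + best[i + 1:]  # tour with `city` removed
            for j in range(len(rest) + 1):
                candidate = rest[:j] + [city] + rest[j:]
                cand_len = tour_length(candidate, dist)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
                    break
            if improved:
                break  # restart the scan from the improved tour
    return best


if __name__ == "__main__":
    # Hypothetical 4-city asymmetric instance: 0->1->2->3->0 costs 4.
    d = [[0, 1, 9, 9], [9, 0, 1, 9], [9, 9, 0, 1], [1, 9, 9, 0]]
    print(insertion_local_search([0, 2, 1, 3], d))
```

In a metaheuristic such as BCO, a step like this would typically polish the tours that the constructive phase produces.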
69. A Malay Text Corpus Analysis for Sentence Compression Using Pattern-Growth Method
- Author
-
Suraya Alias, Siti Khaotijah Mohammad, Keng Hoon Gan, and Tien Ping Tan
- Published
- 2016
70. Broadcast News Story Clustering via Term and Sentence Matching
- Author
-
Tien-Ping Tan and Foong Kuin Yow
- Subjects
Fuzzy clustering ,Data stream clustering ,Computer science ,Speech recognition ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Correlation clustering ,Search engine indexing ,Vector space model ,Canopy clustering algorithm ,Cluster analysis ,Sentence - Abstract
In this paper, we propose a rule-based approach that uses term and sentence matching criteria to cluster Malay broadcast news into different stories. The proposed clustering method does not require users to predefine the number of clusters. The clustering has three main stages: sentence segmentation, indexing, and term and sentence matching clustering. The transcription is segmented into sentences before indexing. Indexing involves tokenization, stop word removal, stemming, term selection, and term representation. A vector space model (VSM) is used to represent the terms and sentences as vectors. The sentences are then grouped into clusters using term and sentence matching thresholds. The proposed approach achieves significantly better accuracy than the baseline approaches.
- Published
- 2013
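Item 70 clusters sentences with a vector space model and matching thresholds, without a predefined number of clusters. The paper's specific rules are not stated; the sketch below assumes a simpler single-pass scheme: each sentence becomes a term-frequency vector, joins the most cosine-similar existing cluster if the similarity reaches a threshold, and otherwise opens a new cluster. The stop-word list, the `threshold` value, and all function names are hypothetical, and stemming is omitted.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical tiny Malay stop-word list, for illustration only.
STOP_WORDS = {"yang", "dan", "di", "ke", "ini", "itu", "untuk"}


def to_vector(sentence: str) -> Counter:
    """Bag-of-words term-frequency vector (no stemming in this sketch)."""
    return Counter(w for w in sentence.lower().split() if w not in STOP_WORDS)


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def threshold_cluster(sentences: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Single pass: join the most similar cluster or open a new one."""
    clusters: List[List[int]] = []
    centroids: List[Counter] = []  # summed term counts per cluster
    for idx, sentence in enumerate(sentences):
        vec = to_vector(sentence)
        scores = [cosine(vec, c) for c in centroids]
        if scores and max(scores) >= threshold:
            best = scores.index(max(scores))
            clusters[best].append(idx)
            centroids[best].update(vec)  # add this sentence's term counts
        else:
            clusters.append([idx])
            centroids.append(Counter(vec))
    return clusters


if __name__ == "__main__":
    news = [
        "banjir di kelantan semakin teruk",
        "mangsa banjir di kelantan dipindahkan",
        "harga minyak naik lagi",
        "kerajaan kawal harga minyak",
    ]
    print(threshold_cluster(news))
```

The number of clusters emerges from the threshold rather than being fixed in advance, which matches the property the abstract highlights.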
71. Improving the Accuracy of Large Vocabulary Continuous Speech Recognizer Using Dependency Parse Tree and Chomsky Hierarchy in Lattice Rescoring
- Author
-
Enya Kong Tang, Kai Sze Hong, and Tien-Ping Tan
- Subjects
Vocabulary ,Dependency (UML) ,Chomsky hierarchy ,Computer science ,business.industry ,media_common.quotation_subject ,Speech recognition ,Parse tree ,computer.software_genre ,Rule-based machine translation ,Regular language ,Language model ,Artificial intelligence ,business ,computer ,Natural language processing ,Natural language ,media_common - Abstract
This work describes our approach to using dependency parse tree information to derive useful hidden word statistics for improving a baseline Malay large vocabulary automatic speech recognition system. Traditional approaches to training language models are mainly based on Chomsky hierarchy type 3, which approximates natural language as a regular language. This approach ignores many characteristics of natural language. Our work attempts to overcome this limitation by extending the approach to Chomsky hierarchy types 1 and 2. We extracted lexical information from dependency trees and incorporated it into the language model. Second-pass lattice rescoring was then performed to produce better hypotheses for the Malay large vocabulary continuous speech recognition system. The absolute WER reductions were 2.2% and 3.8% for the MASS and MASS-NEWS corpora, respectively.
- Published
- 2013
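Item 71 improves recognition output by second-pass lattice rescoring. A full lattice is hard to show briefly, so the sketch below substitutes the simpler N-best-list form of rescoring, an assumption rather than the paper's setup: each first-pass hypothesis keeps its acoustic log score, receives a new language-model log score, and the scores are combined log-linearly. The `Hypothesis` class, `lm_weight`, `word_penalty`, and the toy language model are hypothetical; the paper's dependency-based model is not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    words: List[str]
    acoustic_logprob: float  # score kept from the first decoding pass


def rescore(nbest: List[Hypothesis],
            lm_logprob: Callable[[List[str]], float],
            lm_weight: float = 10.0,
            word_penalty: float = 0.0) -> Hypothesis:
    """Return the hypothesis with the highest combined log-linear score."""
    def total(h: Hypothesis) -> float:
        return (h.acoustic_logprob
                + lm_weight * lm_logprob(h.words)
                + word_penalty * len(h.words))
    return max(nbest, key=total)


if __name__ == "__main__":
    # Toy second-pass LM (hypothetical): prefers sentences containing "makan".
    toy_lm = lambda ws: -1.0 if "makan" in ws else -2.0
    nbest = [Hypothesis(["dia", "makam", "nasi"], -10.0),
             Hypothesis(["dia", "makan", "nasi"], -11.0)]
    print(rescore(nbest, toy_lm).words)
```

Here the new language model overturns the first-pass ranking; with `lm_weight=0.0` the acoustically better hypothesis would win instead.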
72. MBNSeg: A Clustering System for Segmenting Malay Spoken Broadcast News
- Author
-
Tien Ping Tan and Zainab A. Khalaf Aleqili
- Subjects
Data source ,Computer Networks and Communications ,Latent semantic analysis ,Computer science ,business.industry ,Speech recognition ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,computer.software_genre ,language.human_language ,Market segmentation ,language ,Unsupervised learning ,Artificial intelligence ,Transcription (software) ,Document retrieval ,Cluster analysis ,business ,computer ,Natural language processing ,Malay - Abstract
This paper describes a spoken document retrieval system for processing Malay spoken broadcast news that aims to enhance retrieval performance. An automatic speech recognition (ASR) system was adapted to reduce the impact of ASR transcription errors on retrieval performance. The performance of unsupervised learning was evaluated using Malay broadcast news as the data source. Latent semantic analysis (LSA) was used to reduce the impact of synonymous words and to identify story boundaries within the news segments. The resulting system proved effective at identifying news story boundaries automatically.
- Published
- 2013
73. Randomized psychoacoustic model for mobile, panoramic, heritage-viewing applications
- Author
-
Kian Lam Tan, Chen Kim Lim, Tien-Ping Tan, and Abdullah Zawawi Talib
- Subjects
Multimedia ,Panorama ,Computer science ,ComputingMethodologies_MISCELLANEOUS ,Town hall ,Virtual heritage ,Psychoacoustics ,Interaction design ,computer.software_genre ,computer ,Digital audio - Abstract
Digital audio preservation has grown remarkably in recent years as part of heritage preservation. A well-preserved virtual heritage should ideally include not only visual images of the heritage sites but also their surrounding sounds, incorporated into virtual heritage applications as a whole. An acoustical heritage comprises the audio elements that surround the heritage sites, and psychoacoustic heritage is the additional, psychologically significant audio character that enriches those surroundings. In this paper, a randomized psychoacoustic model is proposed for 360° panoramic views on a mobile platform. User interaction through multi-gesture taps is vital in delivering the psychoacoustic effects to the user. The contribution and novelty of this research lie in proposing and deploying a new psychoacoustic model designed mainly for 360° panoramic views in virtual heritage. In the proposed model, the audio interaction design varies acoustic characteristics such as amplitude, frequency, and velocity within the panorama's balanced azimuthal zones. Simulations were carried out on a model of the Town Hall, one of the UNESCO heritage sites in George Town, Penang, Malaysia. Experimental data sets were also collected and were found to produce promising psychoacoustic effects in the panoramic view.
- Published
- 2012
74. Collection and annotation of Malay conversational speech corpus
- Author
-
Eng Siong Chng, Haizhou Li, Tze Yuang Chong, Xiong Xiao, Tien-Ping Tan, School of Computer Engineering, and International Conference on Speech Database and Assessments (2012 : Macau)
- Subjects
Conversational speech ,business.industry ,Computer science ,media_common.quotation_subject ,Speech technology ,Speech corpus ,computer.software_genre ,VoxForge ,Linguistics ,language.human_language ,Annotation ,language ,Conversation ,Artificial intelligence ,Transcription (software) ,business ,computer ,Natural language processing ,Malay ,media_common - Abstract
We report the development of a Malay conversational speech corpus as part of our research on spontaneous conversational speech LVCSR. This corpus development effort is a collaboration between NTU and USM. The goal is to collect, transcribe, and annotate 50 hours of conversational Malay speech. The conversations are recorded over both close-talk and telephone channels, and each speaker's utterances are kept on a separate track. Besides the word transcription, we also annotate linguistic phenomena such as fillers and disfluencies. To date, 20 hours have been recorded, transcribed, and analyzed. The details of our analysis are presented in this report.
- Published
- 2012
75. A Malay Dialect Translation and Synthesis System: Proposal and Preliminary System
- Author
-
Yen-Min Jasmina Khaw, Sang-Seong Goh, and Tien-Ping Tan
- Subjects
Vocabulary ,business.industry ,Computer science ,media_common.quotation_subject ,Context (language use) ,Speech synthesis ,Pronunciation ,computer.software_genre ,language.human_language ,Linguistics ,language ,Artificial intelligence ,Official language ,business ,computer ,Utterance ,Sentence ,Natural language processing ,Malay ,media_common - Abstract
Malay is a language from the Austronesian family. Malay is the official language in Malaysia, Indonesia, Singapore, and Brunei. However, the Malay spoken in different countries, and even within a single country, can vary in pronunciation and vocabulary from one place to another. The Malay dialects in Malaysia can be grouped according to the states of the country. In this paper, we propose the architecture of a Malay dialect translation and synthesis system that, given a sentence in standard Malay, translates it and synthesizes an utterance in the requested dialect. The system consists of three modules: a dialect translation system, a dialect G2P system, and a speech synthesis system. The outcome of this study is twofold. First, from a linguistic viewpoint, it will help us understand and appreciate the interesting differences among the Malay dialects in Malaysia, which is important for preserving the dialects and the culture they carry. Second, the proposed system will be useful for people who want to learn a particular dialect, and it can be used in places that require such a facility. At this stage, we have completed the standard Malay system, and this paper presents our work so far.
- Published
- 2012
76. Analysis of Malay Speech Recognition for Different Speaker Origins
- Author
-
Tien-Ping Tan, Laurent Besacier, and Sarah Samson Juan
- Subjects
business.industry ,Computer science ,Speech recognition ,Acoustic model ,Context (language use) ,computer.software_genre ,Linear discriminant analysis ,language.human_language ,language ,Trigram ,Artificial intelligence ,Language model ,Hidden Markov model ,business ,computer ,Accent (sociolinguistics) ,Natural language processing ,Malay - Abstract
This paper explores speech recognition performance for the Malay language with multiple accents from speakers of different origins or ethnicities. Accented speech degrades the accuracy of automatic speech recognition systems. This frequently affects non-native speakers of a language because recognizers are trained with insufficient non-native data. In this study, we investigate this problem by building a Malay model in our recognizer and testing its performance for speakers of various ethnicities. Our Malay corpora consist of read speech and text collected from local newspapers in Malaysia. The speakers who contributed the recordings come from different ethnic backgrounds. We employ context-dependent acoustic models with linear discriminant analysis (LDA) and a trigram language model. Our experiments show improved results when LDA is applied, while the recognizer performs worst for speakers whose accents are not represented in the training data.
- Published
- 2012
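Item 76 uses a trigram language model. As a minimal sketch of the underlying idea (not the recognizer's actual training code), a trigram model estimates P(w3 | w1, w2) = c(w1 w2 w3) / c(w1 w2) from counts over sentences padded with start and end markers. This maximum-likelihood version has no smoothing, so unseen trigrams get probability 0; practical recognizers add smoothing or backoff. The function names and toy corpus are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

BOS, EOS = "<s>", "</s>"


def train_trigram_counts(sentences: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    """Count trigrams and their two-word histories, padding each sentence."""
    tri, hist = Counter(), Counter()
    for words in sentences:
        padded = [BOS, BOS] + words + [EOS]
        for k in range(2, len(padded)):
            history = (padded[k - 2], padded[k - 1])
            tri[history + (padded[k],)] += 1
            hist[history] += 1
    return tri, hist


def trigram_prob(w1: str, w2: str, w3: str, tri: Counter, hist: Counter) -> float:
    """Maximum-likelihood P(w3 | w1, w2); 0.0 when the history is unseen."""
    denom = hist[(w1, w2)]
    return tri[(w1, w2, w3)] / denom if denom else 0.0


if __name__ == "__main__":
    corpus = [["saya", "makan", "nasi"], ["saya", "makan", "roti"]]
    tri, hist = train_trigram_counts(corpus)
    print(trigram_prob("saya", "makan", "nasi", tri, hist))
```

In this toy corpus "saya makan" is followed once by "nasi" and once by "roti", so each continuation receives probability 0.5.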
77. Pronunciation Modeling for Malaysian English
- Author
-
Tien-Ping Tan and Yen Min Khaw
- Subjects
Computer science ,business.industry ,Generalization ,First language ,Speech recognition ,Word error rate ,Speech synthesis ,Non-native pronunciations of English ,Pronunciation ,computer.software_genre ,language.human_language ,language ,Malaysian English ,Artificial intelligence ,Syllable ,business ,computer ,Natural language processing - Abstract
In this paper, we propose an approach to modeling the pronunciation of Malaysian English for automatic speech recognition. The proposed method consists of two phases: phone adaptation and pronunciation generalization. In the first phase, phone adaptation, we identify the English phonemes used by Malaysian speakers by carrying out a perception test, and then remove the mismatch caused by the influence of the speakers' mother tongue. The second phase is pronunciation generalization. Here, the hypothesis is that non-native speakers generalize the pronunciation of a syllable across different contexts and words by applying the same pronunciation. The proposed approach improves the performance of automatic speech recognition, reducing the word error rate (WER) from 34.6% to 33.9%.
- Published
- 2012
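The WER change reported for item 77 can be expressed in absolute or relative terms. The relative figure below is simple arithmetic on the two reported numbers, not a value stated in the abstract:

```latex
\Delta_{\text{abs}} = 34.6\% - 33.9\% = 0.7 \text{ percentage points}, \qquad
\Delta_{\text{rel}} = \frac{34.6 - 33.9}{34.6} \approx 0.020 = 2.0\%
```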
78. BASRAH: Arabic Verses Meters Identification System
- Author
-
Tien-Ping Tan, Maytham Alabbas, and Zainab A. Khalaf
- Subjects
Arabic ,Computer science ,business.industry ,language ,Artificial intelligence ,computer.software_genre ,business ,computer ,language.human_language ,Natural language processing ,Identification system - Abstract
In this paper, we present BASRAH, a system that automatically identifies the meter of Arabic verse, a task that normally requires a certain level of human expertise. BASRAH uses the numerical prosody method, which encodes verses according to the general concept of al-Khalil's feet using two primary units (cord = 2 and peg = 3). Tested on thousands of old and modern Arabic verses, BASRAH proved to be an efficient tool for helping inexperienced users determine the meters of Arabic verses.
- Published
- 2011
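Item 78 describes numerical prosody with cord = 2 and peg = 3. As an illustration under simplifying assumptions (not BASRAH's code), al-Khalil's feet can be written as sequences of these units, and a hemistich's unit code can be compared with idealised meter patterns. Converting Arabic text into the unit code, and handling the permitted foot variations (zihafat), are the hard parts and are not shown; the four meters below are a small illustrative subset, and the names `FEET`, `METERS`, and `identify_meter` are hypothetical.

```python
from typing import Dict, List, Optional

# Numerical prosody units from the abstract: cord = 2, peg = 3.
FEET: Dict[str, List[int]] = {
    "fa'ulun": [3, 2],          # peg + cord
    "mafa'ilun": [3, 2, 2],     # peg + cord + cord
    "fa'ilatun": [2, 3, 2],     # cord + peg + cord
    "mustaf'ilun": [2, 2, 3],   # cord + cord + peg
}

# Idealised hemistich patterns for four meters (no permitted variations).
METERS: Dict[str, List[str]] = {
    "al-Tawil": ["fa'ulun", "mafa'ilun", "fa'ulun", "mafa'ilun"],
    "al-Mutaqarib": ["fa'ulun"] * 4,
    "al-Rajaz": ["mustaf'ilun"] * 3,
    "al-Ramal": ["fa'ilatun"] * 3,
}


def meter_code(feet: List[str]) -> List[int]:
    """Concatenate the unit codes of a sequence of feet."""
    return [unit for foot in feet for unit in FEET[foot]]


def identify_meter(code: List[int]) -> Optional[str]:
    """Return the meter whose idealised code equals the hemistich code, if any."""
    for name, feet in METERS.items():
        if meter_code(feet) == code:
            return name
    return None


if __name__ == "__main__":
    print(identify_meter([3, 2, 3, 2, 2, 3, 2, 3, 2, 2]))
```

Exact matching is the simplest possible rule; a practical system would need to accept the regular variants of each foot as well.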
79. Applying Grapheme, Word, and Syllable Information for Language Identification in Code Switching Sentences
- Author
-
Tien-Ping Tan and Yin-Lai Yeong
- Subjects
Vocabulary ,Language identification ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,Word processing ,Grapheme ,computer.software_genre ,n-gram ,Artificial intelligence ,Syllable ,business ,computer ,Sentence ,Natural language processing ,Word (computer architecture) ,media_common - Abstract
In this paper, we propose an automatic language identification approach for code-switching sentences that uses morphological structure and syllable sequences. The approach was tested on Malay-English code-switching sentences. The proposed language identification approach achieves 90.75% accuracy on the vocabulary. The approach was further improved by combining knowledge from other levels of the sentence: the word and alphabet levels. This additional information further improves the accuracy of our language identification method to 96.36%.
- Published
- 2011
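Item 79 identifies the language of each word in Malay-English code-switching sentences using syllable, word, and grapheme information. The sketch below swaps in a simpler, commonly used technique as an assumption: a naive-Bayes-style score over character trigrams with add-one smoothing, trained on tiny hypothetical word lists. It is not the paper's method, and the class and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def char_ngrams(word: str, n: int = 3) -> List[str]:
    padded = f"#{word.lower()}#"  # '#' marks word boundaries
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Add-one smoothed bag of character n-grams for one language."""

    def __init__(self, words: Iterable[str], n: int = 3):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def log_likelihood(self, word: str) -> float:
        return sum(math.log((self.counts[g] + 1) / (self.total + self.vocab))
                   for g in char_ngrams(word, self.n))


def identify(word: str, models: Dict[str, CharNgramModel]) -> str:
    """Pick the language whose model gives the word the highest score."""
    return max(models, key=lambda lang: models[lang].log_likelihood(word))


if __name__ == "__main__":
    models = {
        "ms": CharNgramModel(["saya", "makan", "nasi", "kami", "minum", "air"]),
        "en": CharNgramModel(["the", "this", "that", "with", "think", "thing"]),
    }
    print([identify(w, models) for w in "saya think makan".split()])
```

Character n-grams generalize to unseen words such as "makanan" because they share substrings with the training vocabulary; with realistic training lists, the word-level context the paper adds would help resolve short or ambiguous words.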
80. Recherche par le contenu dans des documents audiovisuels multilingues [Content-based search in multilingual audiovisual documents]
- Author
-
Georges Quénot, Tien-Ping Tan, Viet Bac Le, Philippe Mulhem, Laurent Besacier, and Stéphane Ayache; affiliations: Modélisation et Recherche d’Information Multimédia [Grenoble] (MRIM) and Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Institut National Polytechnique de Grenoble (INPG), Centre National de la Recherche Scientifique (CNRS), Université Pierre Mendès France - Grenoble 2 (UPMF), Université Joseph Fourier - Grenoble 1 (UJF); Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria), CNRS, Institut National Polytechnique de Lorraine (INPL), Université Nancy 2, Université Henri Poincaré - Nancy 1 (UHP); Laboratoire d'informatique Fondamentale de Marseille - UMR 6166 (LIF), Université de la Méditerranée - Aix-Marseille 2, Université de Provence - Aix-Marseille 1, CNRS
- Subjects
060201 languages & linguistics ,International Phonetic Alphabet ,06 humanities and the arts ,02 engineering and technology ,Library and Information Sciences ,audio retrieval ,dynamic programming ,multilingual ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,0602 languages and literature ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,star challenge ,ComputingMilieux_MISCELLANEOUS - Abstract
In this article, we present an approach based on the International Phonetic Alphabet (IPA) for content-based indexing and retrieval of multilingual audiovisual documents. The approach works even if the documents contain unknown languages. It was validated in the "Star Challenge" search engine competition organized by the A*STAR agency of Singapore. Our approach comprises building an IPA-based multilingual acoustic model and a dynamic programming method for retrieving document segments by "IPA string spotting". Dynamic programming makes it possible to locate the query string within the document string even when the phone-level transcription error rate is significant. The methods we developed ranked us first and third on the monolingual (English) search tasks, fifth on the multilingual search task, and first on the multimodal (audio and image) search task.
- Published
- 2010
81. MASS: A Malay language LVCSR corpus resource
- Author
-
Eng Siong Chng, Haizhou Li, Xiong Xiao, Tien-Ping Tan, and Enya Kong Tang
- Subjects
Text corpus ,Vocabulary ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,Speech corpus ,Rule-based system ,Pronunciation ,computer.software_genre ,language.human_language ,Web page ,language ,Artificial intelligence ,business ,Accent (sociolinguistics) ,computer ,Natural language processing ,Malay ,media_common - Abstract
This paper presents the development of the speech, text, and pronunciation dictionary resources required to build a large vocabulary speech recognizer for the Malay language. The project is a collaboration among three universities: USM and MMU from Malaysia, and NTU from Singapore. The Malay speech corpus consists of read speech (speaker-independent/dependent and accent-independent/dependent) and broadcast news. To date, 90 speakers have been recorded, amounting to nearly 70 hours of read speech, and 10 hours of broadcast news from local TV stations in Malaysia have been transcribed. The text corpus consists of 700 MB of data extracted from Malaysian local news web pages from 1998–2008, and a rule-based G2P tool was developed to generate the pronunciation dictionary.
- Published
- 2009
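Item 81 mentions a rule-based G2P tool that generates the MASS pronunciation dictionary. That tool's rules are not given in the abstract; the sketch below is a hypothetical, simplified set of Malay letter-to-phone rules, using greedy longest match on the digraphs ng, ny, sy, kh, and gh, plus word-final k as a glottal stop. It maps every e to schwa, although Malay e can also be /e/, so a real tool needs more context or an exception list. The rule tables and `malay_g2p` are assumptions for illustration.

```python
from typing import List

# Hypothetical, simplified Malay letter-to-phone rules (IPA symbols).
DIGRAPHS = {"ng": "ŋ", "ny": "ɲ", "sy": "ʃ", "kh": "x", "gh": "ɣ"}
SINGLE = {"c": "tʃ", "j": "dʒ", "y": "j", "e": "ə"}  # 'e' is really ambiguous (e/ə)


def malay_g2p(word: str) -> List[str]:
    """Greedy longest-match grapheme-to-phone conversion."""
    word = word.lower()
    phones: List[str] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phones.append(DIGRAPHS[pair])  # two letters, one phone
            i += 2
            continue
        letter = word[i]
        if letter == "k" and i == len(word) - 1:
            phones.append("ʔ")  # word-final k is commonly a glottal stop
        else:
            phones.append(SINGLE.get(letter, letter))  # default: same symbol
        i += 1
    return phones


if __name__ == "__main__":
    for w in ["nyanyi", "anak", "tangga", "cuaca"]:
        print(w, " ".join(malay_g2p(w)))
```

Because Malay spelling is largely phonemic, a short rule table covers most words, which is why a rule-based G2P is a reasonable way to bootstrap a large pronunciation dictionary.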
82. Content-Based Search in Multilingual Audiovisual Documents using the International Phonetic Alphabet
- Author
-
Philippe Mulhem, Tien-Ping Tan, Stéphane Ayache, Le Viet Bac, Georges Quénot, and Laurent Besacier; affiliations: Modélisation et Recherche d’Information Multimédia [Grenoble] (MRIM) and Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP), Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Institut National Polytechnique de Grenoble (INPG), Centre National de la Recherche Scientifique (CNRS), Université Pierre Mendès France - Grenoble 2 (UPMF), Université Joseph Fourier - Grenoble 1 (UJF); Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria), CNRS, Institut National Polytechnique de Lorraine (INPL), Université Nancy 2, Université Henri Poincaré - Nancy 1 (UHP); Laboratoire d'informatique Fondamentale de Marseille - UMR 6166 (LIF), Université de la Méditerranée - Aix-Marseille 2, Université de Provence - Aix-Marseille 1, CNRS
- Subjects
Computer Networks and Communications ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Context (language use) ,02 engineering and technology ,Dynamic programming ,computer.software_genre ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Task (project management) ,030507 speech-language pathology & audiology ,03 medical and health sciences ,Search engine ,Phone ,International Phonetic Alphabet ,Multilingual ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,ComputingMilieux_MISCELLANEOUS ,Context model ,Information retrieval ,Query string ,business.industry ,Star Challenge ,String (computer science) ,Search engine indexing ,Audio retrieval ,Acoustic model ,Spotting ,Ranking ,Hardware and Architecture ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,020201 artificial intelligence & image processing ,Artificial intelligence ,0305 other medical science ,business ,computer ,Natural language ,Software ,Natural language processing - Abstract
Published online in October 2009. We present in this paper an approach based on the International Phonetic Alphabet (IPA) for content-based indexing and retrieval of multilingual audiovisual documents. The approach works even if the languages of the document are unknown. It has been validated in the context of the "Star Challenge" search engine competition organized by the Agency for Science, Technology and Research (A*STAR) of Singapore. Our approach includes building an IPA-based multilingual acoustic model and a dynamic programming method for searching document segments by "IPA string spotting". Dynamic programming allows the query string to be retrieved in the document string even with a significant transcription error rate at the phone level. The methods that we developed ranked us first and third on the monolingual (English) search task, fifth on the multilingual search task, and first on the multimodal (audio and image) search task.
- Published
- 2009
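Items 80 and 82 search by "IPA string spotting" with dynamic programming, which tolerates phone-level transcription errors. One standard way to do this, assumed here rather than taken from the paper, is approximate substring matching: an edit-distance table whose first row is all zeros, so the query may start anywhere in the document's phone string, and whose best score is read from the last row. The function name `spot` is hypothetical.

```python
from typing import Sequence, Tuple


def spot(query: Sequence[str], document: Sequence[str]) -> Tuple[int, int]:
    """Best approximate occurrence of `query` inside `document`.

    Returns (edit_distance, end) such that the best match ends just before
    document[end]. Row 0 is all zeros, so a match may start at any position.
    """
    m, n = len(query), len(document)
    prev = [0] * (n + 1)  # D[0][j] = 0: free start anywhere in the document
    for i in range(1, m + 1):
        cur = [i] + [0] * n  # D[i][0] = i: query prefix vs empty document
        for j in range(1, n + 1):
            sub = 0 if query[i - 1] == document[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # query phone missing from document
                         cur[j - 1] + 1,     # extra phone in document
                         prev[j - 1] + sub)  # match or substitution
        prev = cur
    best_end = min(range(n + 1), key=lambda j: prev[j])
    return prev[best_end], best_end


if __name__ == "__main__":
    # Phone strings as lists of IPA symbols; "r" vs "ɹ" costs one substitution.
    print(spot(["s", "t", "a", "r"], ["ð", "ə", "s", "t", "a", "ɹ", "tʃ"]))
```

Ties are broken toward the earliest end position; a retrieval system would typically keep every end position whose distance falls under a threshold and rank those hits.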
83. Reconnaissance automatique de la parole non native [Automatic recognition of non-native speech]
- Author
-
Tien Ping Tan, Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF), Université Joseph-Fourier - Grenoble I, Laurent Besacier, and Eric Castelli (laurent.besacier@imag.fr, Eric.Castelli@mica.edu.vn)
- Subjects
pronunciation modeling ,non-native speech recognition ,[INFO.INFO-OH]Computer Science [cs]/Other [cs.OH] ,non-native pronunciation modeling ,accent identification ,non-native multilingual acoustic modeling - Abstract
Automatic speech recognition technology has matured and is now widely integrated into many systems. However, speech recognition systems still suffer from high error rates for non-native speakers, because of the mismatch between non-native speech and the trained models. Recording enough non-native speech for training is time-consuming and often difficult. In this thesis, we propose approaches for adapting acoustic and pronunciation models for non-native speakers under different resource constraints. Preliminary work on accent identification has also been carried out. Multilingual acoustic modeling is proposed for modeling the cross-lingual transfer of non-native speakers, to overcome the difficulty of obtaining non-native speech. When multilingual acoustic models are available, a hybrid approach of acoustic interpolation and merging is proposed for adapting the target acoustic model; this approach also proves useful for context modeling. If multilingual corpora are available instead, a class of three interpolation methods is introduced for adaptation. Two of them are supervised speaker adaptation methods that can be carried out with only a few non-native utterances. For pronunciation modeling, two existing approaches that model pronunciation variants, one in the pronunciation dictionary and the other in the rescoring module, have been revisited so that they work with a limited amount of non-native speech. We also propose a speaker clustering approach called “latent pronunciation analysis” for clustering non-native speakers by their pronunciation habits; this approach can also be used for pronunciation adaptation. Finally, a text-dependent accent identification method is proposed. It can create robust accent models from a small amount of non-native speech, which is made possible by the generalizability of decision trees and the use of multilingual resources to improve the accent models.
- Published
- 2008
84. Automatic Speech Recognition for Non-Native Speakers
- Author
-
Tien Ping Tan, Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF), Université Joseph-Fourier - Grenoble I, Laurent Besacier, and Eric Castelli (laurent.besacier@imag.fr, Eric.Castelli@mica.edu.vn)
- Subjects
pronunciation modeling ,non-native speech recognition ,[INFO.INFO-OH]Computer Science [cs]/Other [cs.OH] ,non-native pronunciation modeling ,accent identification ,non-native multilingual acoustic modeling - Abstract
Automatic speech recognition technology has achieved maturity, where it has been widely integrated into many systems. However, speech recognition system for non-native speakers still suffers from high error rate, which is due to the mismatch between the non-native speech and the trained models. Recording sufficient non-native speech for training is time consuming and often difficult. In this thesis, we propose approaches to adapt acoustic and pronunciation model under different resource constraints for non-native speakers. A preliminary work on accent identification has also been carried out.Multilingual acoustic modeling has been proposed for modeling cross-lingual transfer of non-native speakers to overcome the difficulty in obtaining non-native speech. In cases where multilingual acoustic models are available, a hybrid approach of acoustic interpolation and merging has been proposed for adapting the target acoustic model. The proposed approach has also proven to be useful for context modeling. However, if multilingual corpora are available instead, a class of three interpolation methods has equally been introduced for adaptation. Two of them are supervised speaker adaptation methods, which can be carried out with only few non-native utterances.In term of pronunciation modeling, two existing approaches which model pronunciation variants, one at the pronunciation dictionary and another at the rescoring module have been revisited, so that they can work under limited amount of non-native speech. We have also proposed a speaker clustering approach called “latent pronunciation analysis” for clustering non-native speakers based on pronunciation habits. This approach can also be used for pronunciation adaptation.Finally, a text dependent accent identification method has been proposed. The approach can work with little amount of non-native speech for creating robust accent models. 
This is made possible by the generalization ability of decision trees and by the use of multilingual resources to improve the performance of the accent models.
Automatic speech recognition technologies are now integrated into many systems. However, the performance of speech recognition systems for non-native speakers still suffers from high error rates, owing to the mismatch between non-native speech and the trained models. Recording large amounts of non-native speech is often difficult, and it is unrealistic to cover every speaker origin in this way. In this thesis, we propose approaches for adapting acoustic and pronunciation models for non-native speakers under different resource conditions. Preliminary work on accent identification is also presented.
This thesis builds on the concept of cross-lingual acoustic modeling, which makes it possible to represent non-native speakers in a multilingual space without using (or using very little) non-native speech. A hybrid interpolation and merging approach is proposed for adapting the target-language models using a collection of multilingual acoustic models. The proposed approach is also useful for modeling pronunciation context. If, on the other hand, multilingual corpora are available, interpolation methods can be used to adapt to non-native speech. Two of these methods are proposed for supervised adaptation and can be applied with only a few non-native sentences.
Regarding pronunciation modeling, two existing approaches (one based on modifying the pronunciation dictionary, the other based on defining a pronunciation score used in a re-scoring phase) are revisited in this thesis and adapted to work with a limited amount of data. A new approach for grouping speakers according to their pronunciation habits is also presented, which we call “latent pronunciation analysis”. This approach also proves useful for improving the pronunciation model for non-native automatic speech recognition.
Finally, an accent identification method is proposed. It requires only a small amount of non-native speech to create the accent models. This is made possible by the generalization ability of decision trees and by the use of multilingual resources to increase the performance of the accent model.
- Published
- 2008
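The thesis abstract mentions a hybrid of acoustic interpolation and merging for adapting the target acoustic model but gives no formulas. As a rough illustration of the merging half only, the sketch below pools the Gaussian components of a target-language state and a native-language state and rescales their mixture weights by lam and 1 - lam, so the merged weights still sum to 1. This is a generic mixture-merging technique assumed for illustration, not the procedure from the thesis; `Gaussian`, `merge_mixtures`, `log_likelihood` and `lam` are hypothetical names, and diagonal covariances are assumed.

```python
# Hypothetical sketch: generic GMM merging, NOT the thesis's exact method.
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    """One diagonal-covariance component of a GMM (hypothetical container)."""
    weight: float
    mean: List[float]
    var: List[float]


def merge_mixtures(target: List[Gaussian], native: List[Gaussian], lam: float) -> List[Gaussian]:
    """Pool the components of two GMMs modeling the same phone state.

    Target-language weights are scaled by lam and native-language weights by
    (1 - lam), so if each input mixture sums to 1 the merged mixture does too.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    merged = [Gaussian(lam * g.weight, list(g.mean), list(g.var)) for g in target]
    merged += [Gaussian((1.0 - lam) * g.weight, list(g.mean), list(g.var)) for g in native]
    return merged


def log_likelihood(gmm: List[Gaussian], x: List[float]) -> float:
    """Naive log density of frame x under a diagonal-covariance GMM."""
    total = 0.0
    for g in gmm:
        log_pdf = 0.0
        for xi, mu, v in zip(x, g.mean, g.var):
            log_pdf += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        total += g.weight * math.exp(log_pdf)
    return math.log(total)
```

For example, `merge_mixtures(target_state, native_state, 0.75)` keeps three quarters of the probability mass on the target-language components. Pooling grows the mixture, so a practical system would likely need to prune or re-cluster components afterwards; that step is not shown.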
85. Acoustic Model Interpolation for Non-Native Speech Recognition
- Author
-
Laurent Besacier and Tien-Ping Tan
- Subjects
Computer science ,business.industry ,Speech recognition ,Contrast (statistics) ,Acoustic model ,Pattern recognition ,Speaker recognition ,Least squares ,Reduction (complexity) ,Artificial intelligence ,Loudspeaker ,business ,Natural language ,Interpolation - Abstract
This paper proposes three interpolation techniques that use the target language and the speaker's native language to improve a non-native speech recognition system. These interpolation techniques are manual interpolation, weighted least squares and eigenvoices. Each of them suits different situations and constraints. In contrast to the weighted least squares and eigenvoices methods, manual interpolation can be carried out offline without any adaptation data. These methods can also be combined with MLLR to improve the recognition rate. Experiments presented in this paper show that the best non-native adaptation method, combined with MLLR, gives an absolute WER reduction of 10% on a French automatic speech recognition system for both Chinese and Vietnamese native speakers.
- Published
- 2007
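To make the interpolation idea concrete, the sketch below interpolates a single Gaussian mean vector between a target-language model and a native-language model, mu = lam * mu_target + (1 - lam) * mu_native. "Manual" interpolation corresponds to choosing lam offline; the weighted least squares variant estimates lam from adaptation frames x_t weighted by their occupation probabilities gamma_t, using the closed form lam = sum_t gamma_t (x_t - mu_native)·d / (sum_t gamma_t ||d||^2) with d = mu_target - mu_native. Assumptions: one scalar weight per Gaussian mean and clipping to [0, 1]; the paper's exact parameterization, the eigenvoices method and MLLR are not shown, and the function names are hypothetical.

```python
# Hypothetical sketch of mean interpolation; not the paper's exact formulation.
from typing import List, Sequence


def interpolate_mean(mu_target: Sequence[float], mu_native: Sequence[float], lam: float) -> List[float]:
    """Per-dimension linear interpolation: lam * target + (1 - lam) * native."""
    if len(mu_target) != len(mu_native):
        raise ValueError("mean vectors must have the same dimension")
    return [lam * t + (1.0 - lam) * n for t, n in zip(mu_target, mu_native)]


def wls_weight(frames: Sequence[Sequence[float]], gammas: Sequence[float],
               mu_target: Sequence[float], mu_native: Sequence[float],
               clip: bool = True) -> float:
    """Weighted least-squares estimate of lam from adaptation frames.

    Minimises sum_t gamma_t * ||x_t - (lam * mu_target + (1 - lam) * mu_native)||^2,
    where gamma_t is the occupation probability of frame t for this Gaussian.
    """
    d = [t - n for t, n in zip(mu_target, mu_native)]
    dd = sum(di * di for di in d)
    occupancy = sum(gammas)
    if dd == 0.0 or occupancy == 0.0:
        # Identical means or no data: lam is not identifiable, keep the target model.
        return 1.0
    num = 0.0
    for x, g in zip(frames, gammas):
        num += g * sum((xi - ni) * di for xi, ni, di in zip(x, mu_native, d))
    lam = num / (occupancy * dd)
    # Optionally keep lam in [0, 1] so the mean stays between the two models.
    return min(1.0, max(0.0, lam)) if clip else lam


lam = wls_weight([[1.0, 1.0], [0.0, 0.0]], [3.0, 1.0], [1.0, 1.0], [0.0, 0.0])
print(lam)                                              # 0.75
print(interpolate_mean([1.0, 1.0], [0.0, 0.0], lam))   # [0.75, 0.75]
```

Because the estimate only needs weighted frame statistics, it requires some adaptation data, which matches the abstract's contrast with manual interpolation.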
86. Solving Asymmetric Traveling Salesman Problems using a generic Bee Colony Optimization framework with insertion local search.
- Author
-
Wong, Li-Pei, Khader, Ahamad Tajudin, Al-Betar, Mohammed Azmi, and Tien-Ping Tan
- Published
- 2013
87. MASS: A Malay language LVCSR corpus resource.
- Author
-
Tien-Ping Tan, Xiong Xiao, Tang, E.K., Eng Siong Chng, and Haizhou Li
- Published
- 2009
88. Content-Based Search in Multilingual Audiovisual Documents Using the International Phonetic Alphabet.
- Author
-
Quenot, G., Tien Ping Tan, Le Viet Bac, Ayache, S., Besacier, L., and Mulhem, P.
- Published
- 2009
89. Modeling context and language variation for non-native speech recognition
- Author
-
Laurent Besacier and Tien-Ping Tan
- Subjects
business.industry ,Computer science ,Speech recognition ,Vietnamese ,Face (sociological concept) ,Context (language use) ,Pronunciation ,computer.software_genre ,Speaker recognition ,language.human_language ,Speaker diarisation ,Variation (linguistics) ,language ,Language model ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Non-native speakers often have difficulty pronouncing words as native speakers do. This paper proposes to model pronunciation variation in non-native speech using only acoustic models, without the need for a non-native speech corpus. Variation in terms of both context and language is modeled. Combining the two reduced the absolute WER by as much as 16% for native Vietnamese speakers and 6% for native Chinese speakers of French.
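Several abstracts in this listing report absolute WER reductions. WER counts substitutions, deletions and insertions against the number of reference words, and an absolute reduction is the difference between two WERs in percentage points, which is not the same as a relative reduction. The sketch below illustrates these standard definitions with a word-level Levenshtein distance; the function names and the 40% to 24% figures are hypothetical and not taken from the paper.

```python
# Generic WER illustration; names and example numbers are hypothetical.
from typing import List


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit operations divided by reference length."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / n


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, e.g. 0.40 -> 0.24 is about 0.16 (16 points)."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Reduction as a fraction of the baseline, e.g. 0.40 -> 0.24 is about 0.40."""
    return (baseline_wer - new_wer) / baseline_wer


print(wer("le chat est noir", "le chat noir"))  # 0.25 (one deletion out of four words)
```

Under these hypothetical numbers, a 16-point absolute reduction from a 40% baseline would be a 40% relative reduction, so the two figures should not be compared directly.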
Discovery Service for Jio Institute Digital Library