Author: "Tien-Ping Tan" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Tien-Ping Tan"' showing total 103 results

1. Iterative Self-Supervised Learning for Legal Similar Case Retrieval

Author: Yao Liu, Tien-Ping Tan, and Xiaoping Zhan
Subjects: Legal information retrieval, similar case retrieval, iterative training, self-supervised learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: In the realm of legal artificial intelligence (AI), the spotlight has been cast on its remarkable precision and efficiency, especially in tasks such as similar case retrieval where the identification of pertinent cases in response to a given query is of paramount importance. This task, distinct from traditional text retrieval, presents a set of unique challenges that necessitate the availability of high-quality, annotated datasets to facilitate efficient model training. The intricacies of handling extended queries and candidate documents, coupled with the varied interpretations of similarity, further compound the complexity of this endeavor. This study introduces an innovative training approach, combining dense and sparse retrieval methods. Utilizing a sparse retrieval model, we extract unlabeled data from extensive legal cases. Subsequently, a dense retrieval model screens this data, merging it with labeled data to create pseudo-labeled data, iteratively training until convergence. The results demonstrate exceptional performance in the Chinese law retrieval task dataset, showcasing a notable 3.66% precision enhancement and a substantial 3.62% improvement in mean average precision (MAP). However, the dataset’s imbalance across different charges of cases poses a challenge, potentially affecting retrieval performance for long-tailed legal cases. Nonetheless, these outcomes signify accelerated and more efficient retrieval of similar cases for legal professionals. Additionally, they provide high-quality references for non-legal individuals lacking expertise in the field.
Published: 2024
Full Text: View/download PDF

2. Explaining legal judgments: A multitask learning framework for enhancing factual consistency in rationale generation

Author: Congqing He, Tien-Ping Tan, Sheng Xue, and Yanyu Tan
Subjects: Legal judgments, Explanation, Rationale generation, Event Chain, Factual consistency, Electronic computers. Computer science, QA75.5-76.95
Abstract: The explanation of legal judgments is crucial for transparency, fairness, and trustworthiness, aiming to provide rationales for decision-making. While previous works have focused on improving the accuracy of legal judgment prediction, the lack of explainability seriously limits the practical application of these methods. Many researchers have dedicated effort to extracting or generating rationales as explanations for legal judgments, ignoring the factual consistency of these rationales. Inconsistencies between rationales and fact descriptions severely impact their applicability. To address these issues, we investigate Event Chain – ordered sequences of events related to criminal behavior – as an intermediate representation in the fact descriptions, to enhance the representation of causal relationships among events, and focus on crucial events within the fact description. Specifically, we propose a multi-task learning approach, dubbed LegalMind, that introduces Event Chain as an auxiliary task and jointly models Event Chain and rationale in a unified decoder, to improve factual consistency in rationale generation. The experimental results show that our model outperforms the state-of-the-art methods, achieving improvements of 7.65% and 6.65% in AVG-BLEU on the CJO dataset and LAIC2021 dataset respectively compared to BART-C3VG. Furthermore, in comparison to BART-C3VG, our model demonstrated superior performance with increases of 6.17% and 8.33% in factual consistency on the CJO and LAIC2021 datasets, respectively.
Published: 2023
Full Text: View/download PDF

3. Hybrid transfer learning strategy for cross-subject EEG emotion recognition

Author: Wei Lu, Haiyan Liu, Hua Ma, Tien-Ping Tan, and Lingnan Xia
Subjects: affective computing, cross-subject EEG emotion recognition, fine-tuning, domain adaptation, few-shot, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571
Abstract: Emotion recognition constitutes a pivotal research topic within affective computing, owing to its potential applications across various domains. Currently, emotion recognition methods based on deep learning frameworks utilizing electroencephalogram (EEG) signals have demonstrated effective application and achieved impressive performance. However, in EEG-based emotion recognition, there exists a significant performance drop in cross-subject EEG Emotion recognition due to inter-individual differences among subjects. In order to address this challenge, a hybrid transfer learning strategy is proposed, and the Domain Adaptation with a Few-shot Fine-tuning Network (DFF-Net) is designed for cross-subject EEG emotion recognition. The first step involves the design of a domain adaptive learning module specialized for EEG emotion recognition, known as the Emo-DA module. Following this, the Emo-DA module is utilized to pre-train a model on both the source and target domains. Subsequently, fine-tuning is performed on the target domain specifically for the purpose of cross-subject EEG emotion recognition testing. This comprehensive approach effectively harnesses the attributes of domain adaptation and fine-tuning, resulting in a noteworthy improvement in the accuracy of the model for the challenging task of cross-subject EEG emotion recognition. The proposed DFF-Net surpasses the state-of-the-art methods in the cross-subject EEG emotion recognition task, achieving an average recognition accuracy of 93.37% on the SEED dataset and 82.32% on the SEED-IV dataset.
Published: 2023
Full Text: View/download PDF

4. Knowledge-Enriched Multi-Cross Attention Network for Legal Judgment Prediction

Author: Congqing He, Tien-Ping Tan, Xiaobo Zhang, and Sheng Xue
Subjects: Legal judgment prediction, legal charge knowledge, multi-cross attention, confusing charges and law articles, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Legal judgment prediction (LJP) automatically predicts the judgment results of a legal case based on its fact description, which has excellent prospects in judicial assistance systems and consultation services for the public. Most previous studies either focused on enhancing LJP’s performance while ignoring the issue of confusing charges and law articles, or only used law articles to improve the judgment of confusing verdicts, resulting in the limited model performance. This paper introduces legal charge knowledge as a type of knowledge to enhance the representation of fact descriptions and incorporates it into deep neural networks. We then propose a Knowledge-enriched Multi-Cross Attention Network (KEMCAN) to improve LJP’s performance, and resolve legal cases involving confusing charges and law articles. Specifically, a cross-attention mechanism is proposed to model the relationship between legal charge knowledge and fact description in a unified model. The experimental results demonstrate that our model outperforms the state-of-the-art methods on two real-world datasets, achieving an average improvement of 3.95% in macro-F1 for charge prediction and 1.98% for law article prediction.
Published: 2023
Full Text: View/download PDF

5. Bi-Branch Vision Transformer Network for EEG Emotion Recognition

Author: Wei Lu, Tien-Ping Tan, and Hua Ma
Subjects: Affective computing, EEG-based emotion recognition, transformer, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Electroencephalogram (EEG) signals have emerged as an important tool for emotion research due to their objective reflection of real emotional states. Deep learning-based EEG emotion classification algorithms have made encouraging progress, but existing models struggle with capturing long-range dependence and integrating temporal, frequency, and spatial domain features, which limits their classification ability. To address these challenges, this study proposes a Bi-branch Vision Transformer-based EEG emotion recognition model, Bi-ViTNet, that integrates spatial-temporal and spatial-frequency feature representations. Specifically, Bi-ViTNet is composed of a spatial-frequency feature extraction branch and a spatial-temporal feature extraction branch that fuse spatial-frequency-temporal features in a unified framework. Each branch is composed of a Linear Embedding and a Transformer Encoder, which are used to extract spatial-frequency features and spatial-temporal features. Finally, fusion and classification are performed by the Fusion and Classification layer. Experiments on SEED and SEED-IV datasets demonstrate that Bi-ViTNet outperforms state-of-the-art baselines.
Published: 2023
Full Text: View/download PDF

6. Bilingual Automatic Speech Recognition: A Review, Taxonomy and Open Challenges

Author: Ahmad A. M. Abushariah, Hua-Nong Ting, Mumtaz Begum Peer Mustafa, Anis Salwa Mohd Khairuddin, Mohammad A. M. Abushariah, and Tien-Ping Tan
Subjects: ASR, bilingual ASR, ASR architecture, code mixing, code switching, cross lingual, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: In this technological era, smart and intelligent systems that are integrated with artificial intelligence (AI) techniques, algorithms, tools, and technologies have an impact on various aspects of our daily lives. Communication and interaction between humans and machines using speech is becoming increasingly important, since it is an obvious substitute for keyboards and screens in the communication process. Therefore, numerous technologies take advantage of speech, such as Automatic Speech Recognition (ASR), where natural human speech in many languages is used as the means to interact with machines. The majority of related work on ASR concentrates on the development and evaluation of ASR systems that serve a single language only, such as Arabic, English, Chinese, French, and many others. However, research attempts that combine multiple languages (bilingual and multilingual) during the development and evaluation of ASR systems are very limited. This paper aims to provide a comprehensive research background and the fundamentals of bilingual ASR, and related works that have combined two languages for ASR tasks from 2010 to 2021. It also formulates a research taxonomy and discusses open challenges for bilingual ASR research. Based on our literature investigation, it is clear that bilingual ASR using a deep learning approach is in high demand and is able to provide acceptable performance. In addition, work on many combinations of two languages, such as Arabic-English, Arabic-Malay, and others, is still limited, which can open new research opportunities. Finally, it is clear that ASR research is moving towards not only bilingual ASR, but also multilingual ASR.
Published: 2023
Full Text: View/download PDF

7. Enhanced DouDiZhu Card Game Strategy Using Oracle Guiding and Adaptive Deep Monte Carlo Method.

Author: Qian Luo, Tien Ping Tan, Daochen Zha, and Tianqiao Zhang
Published: 2024

8. Learning to Automatically Generating Genre-Specific Song Lyrics: A Comparative Study.

Author: Tze Huat Tee, Belicia Qiao Bei Yeap, Keng Hoon Gan, and Tien Ping Tan
Published: 2022
Full Text: View/download PDF

9. Low Resource Malay Dialect Automatic Speech Recognition Modeling Using Transfer Learning from a Standard Malay Model.

Author: Tien-Ping Tan, Lei Qin, Sarah Flora Samson Juan, and Yen-Min Jasmina Khaw
Subjects: ARTIFICIAL neural networks, MALAY language, LINGUISTICS, HIDDEN Markov models, SPEECH, SPEECH perception
Abstract: Approaches to automatic speech recognition have transited from Hidden Markov Model (HMM)-based ASR to deep neural networks. The advantages of deep neural network approaches are that they can be developed quickly and perform better given large language resources. Nevertheless, dialect speech recognition is still challenging due to the limited resources. Transfer learning approaches have been proposed to improve speech recognition for low resources. In the first approach, the model is pre-trained on a large and diverse labeled dataset to learn the acoustic and language patterns from the speech signal. Then, the model parameters are updated with a new dataset, and the pre-trained model is fine-tuned on a low-resource language dataset. The fine-tuning process is usually completed by freezing the pre-trained layers and training the remaining layers of the model on the low-resource language corpus. Another approach is to use a pre-trained model to capture the compact and meaningful features as input to the encoder. Pre-training in this approach usually involves using unsupervised learning methods to train models on a corpus of large amounts of unmarked data. It enables the model to learn the general patterns and relationships between the input speech signals. This paper proposes a training recipe using transfer learning and Standard Malay models to improve automatic speech recognition for Kelantan and Sarawak Malay dialects. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Evaluating Code-Switched Malay-English Speech Using Time Delay Neural Networks.

Author: Anand Singh and Tien-Ping Tan
Published: 2018
Full Text: View/download PDF

11. Using Dictionary and Lemmatizer to Improve Low Resource English-Malay Statistical Machine Translation System.

Author: Yin-Lai Yeong, Tien Ping Tan, and Siti Khaotijah Mohammad
Published: 2016
Full Text: View/download PDF

12. HYBRID DISTANCE-STATISTICAL-BASED PHRASE ALIGNMENT FOR ANALYZING PARALLEL TEXTS IN STANDARD MALAY AND MALAY DIALECTS.

Author: Yen-Min Jasmina Khaw, Tien-Ping Tan, and Bali Ranaivo-Malançon
Subjects: CROSS-language information retrieval, COMPUTATIONAL linguistics, PARALLEL algorithms, MALAY language, DIALECTS, SPEECH
Abstract: Parallel text corpora are essential resources in linguistics and natural language processing, especially in translation and multilingual information retrieval. The publicly available parallel text corpora are limited to certain genres, types and domains. Furthermore, parallel dialect text is scarce, even though it is important in the analysis and study of a dialect. Collecting parallel dialect text is challenging because dialects typically appear in the form of speech and very limited dialectal texts exist. Moreover, there is no standard orthography in most dialects. The contributions of this paper are threefold. First, the paper describes a methodology for acquiring a parallel text corpus of Standard Malay and Malay dialects, particularly Kelantan Malay and Sarawak Malay. Second, we propose a hybrid of distance-based and statistical-based alignment algorithms to align words and phrases in the parallel text. The results show that the precision and recall values of the proposed alignment algorithm are more than 95% and better than the state-of-the-art GIZA++. Third, the alignments obtained were compared to find out the lexical similarities and differences between Standard Malay and the two studied Malay dialects, contributing valuable insights into the linguistic variations within the Malay language family. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. A grapheme and phone rescoring combination system for Malay broadcast news recognition.

Author: Zainab Ali Khalaf, Tien Ping Tan, and Li-Pei Wong
Published: 2015
Full Text: View/download PDF

14. Merging of Native and Non-native Speech for Low-resource Accented ASR.

Author: Sarah Flora Samson Juan, Laurent Besacier, Benjamin Lecouteux, and Tien Ping Tan
Published: 2015
Full Text: View/download PDF

15. Language identification of code Switching sentences and multilingual sentences of under-resourced languages by using multi structural word information.

Author: Yin-Lai Yeong and Tien-Ping Tan
Published: 2014
Full Text: View/download PDF

16. Preparation of MaDiTS corpus for Malay dialect translation and speech synthesis system.

Author: Yen-Min Jasmina Khaw and Tien-Ping Tan
Published: 2014

17. Using closely-related language to build an ASR for a very under-resourced language: Iban.

Author: Sarah Flora Samson Juan, Laurent Besacier, Benjamin Lecouteux, and Tien-Ping Tan
Published: 2014
Full Text: View/download PDF

18. Acoustic model merging using acoustic models from multilingual speakers for automatic speech recognition.

Author: Tien-Ping Tan, Laurent Besacier, and Benjamin Lecouteux
Published: 2014
Full Text: View/download PDF

19. Hybrid approach for aligning parallel sentences for languages without a written form using standard Malay and Malay dialects.

Author: Yen-Min Jasmina Khaw and Tien-Ping Tan
Published: 2014
Full Text: View/download PDF

20. Using Informative Score for Instance Selection Strategy in Semi-Supervised Sentiment Classification

Author: Vivian Lay Shan Lee, Keng Hoon Gan, Tien-Ping Tan, and Rosni Abdullah
Published: 2023
Full Text: View/download PDF

21. MDou: Accelerating DouDiZhu Self-Play Learning Using Monte-Carlo Method With Minimum Split Pruning and a Single Q-Network

Author: Qian Luo, Tien-Ping Tan, Yi Su, and Zhanggen Jin
Subjects: Artificial Intelligence, Control and Systems Engineering, Electrical and Electronic Engineering, Software
Published: 2022

22. A Deep Learning Model with Name Attention to Predict the Stock Trend from News Headline

Author: Tien-Ping Tan, Phaik Ching Soon, Marwan Aleasa, Huah Yong Chan, and Keng Hoon Gan
Published: 2023

23. Text Analytics of Vaccine Myths on Reddit

Author: Sylvia Shiau Ching Wong, Jing-Ru Tan, Keng Hoon Gan, and Tien Ping Tan
Abstract: Widespread online misinformation that aims to convince vaccine-hesitant populations continues to threaten healthcare systems globally. Assessing features of online content including topics and sentiments against vaccines could help curb the spread of vaccine-related misinformation and allow stakeholders to draft better regulations and public policies. Using a public dataset extracted from Reddit, the authors performed text analytics including sentiment analysis, N-gram, and topic modeling to grasp the sentiments, the most popular phrases (N-grams), and topics of the subreddit. The sentiment analysis results revealed mostly positive sentiments in the subreddit's discussions. The N-gram analysis identified “cause autism” and “MMR cause autism” as the most frequent bigram and trigram. The NMF topic modeling results revealed five topics discussing different aspects of vaccines. These findings implied the significance of the ability to assess public confidence and sentiment from social media platforms to enable effective responses against the proliferation of vaccine misinformation.
Published: 2022
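
The N-gram step mentioned in the abstract of item 23 can be illustrated with a small sketch. This is not the authors' pipeline: the tokenizer is a simple regular expression chosen for illustration, and the example posts are made up rather than taken from the Reddit dataset.

```python
from collections import Counter
import re


def ngram_counts(texts, n):
    """Count word n-grams across a list of texts (lowercased, regex tokenization)."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up example posts, used only to exercise the function.
posts = [
    "Some people claim vaccines cause autism",
    "No, studies show vaccines do not cause autism",
    "The claim that MMR vaccines cause autism was retracted",
]

top_bigram, top_count = ngram_counts(posts, 2).most_common(1)[0]
print(" ".join(top_bigram), top_count)
```

Calling `ngram_counts(posts, 3)` gives the trigram counts in the same way.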

24. Solving Asymmetric Traveling Salesman Problems using a generic Bee Colony Optimization framework with insertion local search.

Author: Li-Pei Wong, Ahamad Tajudin Khader, Mohammed Azmi Al-Betar, and Tien-Ping Tan
Published: 2013
Full Text: View/download PDF

25. The development and analysis of a Malay broadcast news corpus.

Author: Tze Yuang Chong, Xiong Xiao, Haihua Xu, Tien Ping Tan, Chau Khoa Pham, Dau-Cheng Lyu, Chng Eng Siong, and Haizhou Li
Published: 2013
Full Text: View/download PDF

26. Improving the Accuracy of Large Vocabulary Continuous Speech Recognizer Using Dependency Parse Tree and Chomsky Hierarchy in Lattice Rescoring.

Author: Kai Sze Hong, Tien-Ping Tan, and Enya Kong Tang
Published: 2013
Full Text: View/download PDF

27. Broadcast News Story Clustering via Term and Sentence Matching.

Author: Foong Kuin Yow and Tien-Ping Tan
Published: 2013
Full Text: View/download PDF

28. Randomized psychoacoustic model for mobile, panoramic, heritage-viewing applications.

Author: Chen Kim Lim, Tien Ping Tan, Kian Lam Tan, and Abdullah Zawawi Talib
Published: 2012
Full Text: View/download PDF

29. A Malay Dialect Translation and Synthesis System: Proposal and Preliminary System.

Author: Tien-Ping Tan, Sang-Seong Goh, and Yen-Min Jasmina Khaw
Published: 2012
Full Text: View/download PDF

30. Automatic Speech Recognition of Code Switching Speech Using 1-Best Rescoring.

Author: Basem H. A. Ahmed and Tien-Ping Tan
Published: 2012
Full Text: View/download PDF

31. Analysis of Malay Speech Recognition for Different Speaker Origins.

Author: Sarah Flora Samson Juan, Laurent Besacier, and Tien Ping Tan
Published: 2012
Full Text: View/download PDF

32. Pronunciation Modeling for Malaysian English.

Author: Yen-Min Jasmina Khaw and Tien Ping Tan
Published: 2012
Full Text: View/download PDF

33. Sliding Window and Parallel LSTM with Attention and CNN for Sentence Alignment on Low-Resource Languages

Author: Tien-Ping Tan, Chai Kim Lim, and Wan Rose Eliza Abdul Rahman
Subjects: Information Systems: Information Storage and Retrieval, Computing Methodologies: Document and Text Processing
Abstract: A parallel text corpus is an important resource for building a machine translation (MT) system. Existing resources such as translated documents, bilingual dictionaries, and translated subtitles are excellent resources for constructing parallel text corpus. A sentence alignment algorithm automatically aligns source sentences and target sentences because manual sentence alignment is resource-intensive. Over the years, sentence alignment approaches have improved from sentence length heuristics to statistical lexical models to deep neural networks. Solving the alignment problem as a classification problem is interesting as classification is the core of machine learning. This paper proposes a parallel long-short-term memory with attention and convolutional neural network (parallel LSTM+Attention+CNN) for classifying two sentences as parallel or non-parallel sentences. A sliding window approach is also proposed with the classifier to align sentences in the source and target languages. The proposed approach was compared with three classifiers, namely the feedforward neural network, CNN, and bi-directional LSTM. It is also compared with the BleuAlign sentence alignment system. The classification accuracy of these models was evaluated using Malay-English parallel text corpus and UN French-English parallel text corpus. The Malay-English sentence alignment performance was then evaluated using research documents and the very challenging Classical Malay-English document. The proposed classifier obtained more than 80% accuracy in categorizing parallel/non-parallel sentences with a model built using only five thousand training parallel sentences. It has a higher sentence alignment accuracy than other baseline systems.
Published: 2021
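
The abstract of item 33 pairs a parallel/non-parallel sentence classifier with a sliding window. The sketch below shows one way such a combination could work; it is an assumption-laden illustration, not the paper's procedure. The function `align_sliding_window`, its `window` and `threshold` parameters, and the stand-in scorer `toy_score` are all hypothetical, and the neural classifier is replaced by any function that returns a score between 0 and 1.

```python
def align_sliding_window(src, tgt, score, window=2, threshold=0.5):
    """Greedy monotone sentence alignment.

    For each source sentence, only target sentences inside a forward window
    starting at the current target position are scored; the best candidate
    above `threshold` is accepted and the window start moves past it.
    """
    pairs = []
    start = 0  # first target index still available for alignment
    for i, s in enumerate(src):
        end = min(len(tgt), start + window + 1)
        best_j, best_score = None, threshold
        for j in range(start, end):
            candidate = score(s, tgt[j])
            if candidate > best_score:
                best_j, best_score = j, candidate
        if best_j is not None:
            pairs.append((i, best_j))
            start = best_j + 1
    return pairs


def toy_score(s, t):
    """Hypothetical stand-in for a trained classifier: 1.0 if t is s uppercased."""
    return 1.0 if s.upper() == t else 0.0


src = ["a", "b", "c"]
tgt = ["A", "X", "B", "C"]  # "X" has no counterpart in the source
print(align_sliding_window(src, tgt, toy_score))
```

Restricting the search to a window keeps the number of classifier calls linear in the document length rather than quadratic, at the cost of missing matches that fall outside the window.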

34. Non-native Accent Pronunciation Modeling in Automatic Speech Recognition.

Author: Basem H. A. Ahmed and Tien-Ping Tan
Published: 2011
Full Text: View/download PDF

35. Applying Grapheme, Word, and Syllable Information for Language Identification in Code Switching Sentences.

Author: Yin-Lai Yeong and Tien-Ping Tan
Published: 2011
Full Text: View/download PDF

36. BASRAH: Arabic Verses Meters Identification System.

Author: Zainab Ali Khalaf, Maytham Alabbas, and Tien-Ping Tan
Published: 2011
Full Text: View/download PDF

37. SEAME: a Mandarin-English code-switching speech corpus in south-east asia.

Author: Dau-Cheng Lyu, Tien Ping Tan, Engsiong Chng, and Haizhou Li
Published: 2010
Full Text: View/download PDF

38. Language identification of code switching Malay-English words using syllable structure information.

Author: Yin-Lai Yeong and Tien-Ping Tan
Published: 2010

39. Content-Based Search in Multilingual Audiovisual Documents Using the International Phonetic Alphabet.

Author: Georges Quénot, Tien Ping Tan, Viet Bac Le, Stéphane Ayache, Laurent Besacier, and Philippe Mulhem
Published: 2009
Full Text: View/download PDF

40. Recherche par le contenu dans des documents audiovisuels multilingues.

Author: Georges Quénot, Tien Ping Tan, Viet Bac Le, Stéphane Ayache, Laurent Besacier, and Philippe Mulhem
Published: 2009
Full Text: View/download PDF

41. Improving pronunciation modeling for non-native speech recognition.

Author: Tien Ping Tan and Laurent Besacier
Published: 2008
Full Text: View/download PDF

42. Modeling context and language variation for non-native speech recognition.

Author: Tien Ping Tan and Laurent Besacier
Published: 2007
Full Text: View/download PDF

43. Acoustic Model Interpolation for Non-Native Speech Recognition.

Author: Tien-Ping Tan and Laurent Besacier
Published: 2007
Full Text: View/download PDF

44. A French Non-Native Corpus for Automatic Speech Recognition.

Author: Tien-Ping Tan and Laurent Besacier
Published: 2006

45. Product Aspect Ranking Using Multi-Criteria Decision Making

Author: Saif A. Ahmad Alrababah, Keng Hoon Gan, Tien-Ping Tan, and Mohammed N. Al-Kabi
Abstract: Online product reviews often mention several aspects of the product. Reviews with multiple aspects are sometimes problematic because some of the aspects mentioned are of little or no relevance to either consumers or providers. Hence, it is important to identify relevant aspects of a product by ranking them in the order of their importance. With that, this chapter introduces a new criterion known as Aspect Relevancy in the process of ranking aspects. The study also incorporates multi-criteria decision-making (MCDM) to recognize vital aspects retrieved from consumer reviews of products and services. In ranking the selected aspects, the subjective technique for order of preference by similarity to ideal solution (TOPSIS) is employed. The experimental results using Bing Liu and SemEval 2016 Task 5 datasets have demonstrated a positive outcome of the proposed approach when compared with two baseline approaches in terms of the NDCG@k ranking measure.
Published: 2022
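
Item 45 ranks product aspects with TOPSIS. As a rough illustration of the standard TOPSIS procedure (not the chapter's exact variant), the sketch below scores alternatives by their relative closeness to an ideal solution. The criteria, weights, and numbers in the example are invented placeholders; the abstract does not list the actual criteria values.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives with TOPSIS.

    matrix  : rows are alternatives, columns are criteria (raw scores)
    weights : one weight per criterion
    benefit : True where higher is better, False where lower is better
    Returns one closeness score in [0, 1] per alternative (higher is better).
    """
    n_crit = len(weights)
    # Step 1: vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    # Step 2: ideal and anti-ideal value for each criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # Step 3: relative closeness to the ideal solution.
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Invented example: three aspects scored on (mention frequency, relevancy).
aspects = ["battery", "screen", "box"]
scores = topsis([[120, 0.9], [300, 0.2], [50, 0.1]], [0.5, 0.5], [True, True])
ranking = sorted(zip(aspects, scores), key=lambda p: p[1], reverse=True)
print(ranking)
```

An alternative that is worst on every criterion gets a score of 0, and one that is best on every criterion gets 1; everything else falls in between.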

46. Spatial information extraction from travel narratives: Analysing the notion of co-occurrence indicating closeness of tourist places

Author: Erum Haris, Keng Hoon Gan, and Tien-Ping Tan
Subjects: Geospatial analysis, Closeness, Other engineering and technologies, Engineering and technology, Library and Information Sciences, Data science, Spatial relation, Geography, Information systems, Electrical engineering, electronic engineering, information engineering, Social media, Narrative, Reflection (computer graphics), Spatial analysis, Tourism, Geological & geomatics engineering
Abstract: Recent advancements in social media have generated a myriad of unstructured geospatial data. Travel narratives are among the richest sources of such spatial clues. They are also a reflection of writers’ interaction with places. One of the prevalent ways to model this interaction is a points of interest (POIs) graph depicting popular POIs and routes. A relevant notion is that frequent pairwise occurrences of POIs indicate their geographic proximity. This work presents an empirical interpretation of this theory and constructs spatially enriched POI graphs, a clear augmentation to popularity-based POI graphs. A triplet pattern, rule-based spatial relation extraction technique SpatRE is proposed and compared with standard relation extraction systems Ollie and Stanford OpenIE. A travel blogs data set is also contributed containing labelled spatial relations. The performance is further evaluated on SemEval 2013 benchmark data sets. Finally, spatially enriched POI graphs are qualitatively compared with TripAdvisor and Google Maps to visualise information accuracy.
Published: 2019

47. Applying Linguistic G2P Knowledge on a Statistical Grapheme-to-phoneme Conversion in Khmer

Author: Tien-Ping Tan and Vathnak Sar
Subjects: Sequence, Computer science, Grapheme, Word error rate, Networking & telecommunications, Speech synthesis, Engineering and technology, Pronunciation, Linguistics, Simple (abstract algebra), Vowel, Electrical engineering, electronic engineering, information engineering, General Earth and Planetary Sciences, Artificial intelligence & image processing, Word (computer architecture), General Environmental Science
Abstract: A Grapheme-to-Phoneme (G2P) convertor generates the pronunciation given a word. G2P is an important module in a speech synthesis system and an automatic speech recognition system. The two main G2P approaches are knowledge-based and data-driven. The knowledge-based G2P is built on linguistic knowledge. The data-driven approach, such as the statistical approach, on the other hand does not need expert knowledge, but it requires data to learn the rules. In this research, we propose an approach that incorporates linguistic knowledge into a statistical-based G2P convertor for Khmer. We examined a simple way of adding linguistic knowledge into the statistical G2P convertor by simply inserting vowel tags into a Khmer word. Three types of vowel tags were used. The main strength of this approach is that it combines the strengths of linguistic knowledge and the statistical-based approach to build a robust G2P model. The information allows better modeling and prediction of the phoneme sequence, thus improving the phoneme error rate (PER) and word error rate (WER). The PER and WER of our proposed Khmer G2P improve from 23.2% and 69.6% to 11.1% and 51.4% respectively.
Published: 2019
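
The error rates quoted in the abstract of item 47 are absolute values. A short calculation, added here for convenience and not taken from the paper, converts them into relative reductions, which are often easier to compare across metrics. The helper name `relative_reduction` is illustrative.

```python
def relative_reduction(before, after):
    """Relative reduction of an error rate, as a percentage of the starting value."""
    return 100.0 * (before - after) / before


per = relative_reduction(23.2, 11.1)  # phoneme error rate, from the abstract
wer = relative_reduction(69.6, 51.4)  # word error rate, from the abstract
print(f"PER: {per:.1f}% relative reduction")
print(f"WER: {wer:.1f}% relative reduction")
```

By this measure the phoneme error rate falls by roughly 52% and the word error rate by roughly 26%.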

48. Semi-supervised Learning for Sentiment Classification using Small Number of Labeled Data

Author: Vivian Lay Shan Lee, Rosni Abdullah, Keng Hoon Gan, and Tien-Ping Tan
Subjects: Artificial neural network, Computer science, Small number, Sentiment analysis, Networking & telecommunications, Engineering and technology, Semi-supervised learning, Machine learning, Task (project management), Pattern recognition, Electrical engineering, electronic engineering, information engineering, General Earth and Planetary Sciences, Labeled data, Artificial intelligence & image processing, Artificial intelligence, General Environmental Science
Abstract: Sentiment analysis is an essential task for gaining insights from a huge volume of opinions and thoughts. Timeliness of data is important in making major decisions. However, manual data labeling is slow and expensive, and it cannot cope with this enormous amount of data. We surveyed the sentiment analysis literature and found that most works rely on manual data labeling. We propose semi-supervised learning as a method that helps reduce the effort and time needed for data labeling, as it uses a combination of a small amount of labeled data and a large pool of unlabeled data for model training. In our work, we trained semi-supervised deep neural networks with different settings and compared their performance to a baseline, a supervised deep neural network trained with the same number of labeled data. The results show that the unlabeled data is useful in improving model performance. This is not guaranteed, however: the unlabeled data must be handled with care, otherwise it degrades model performance.
Published: 2019

49. A Hybrid of Sentence-Level Approach and Fragment-Level Approach of Parallel Text Extraction from Comparable Text

Author: Tien-Ping Tan, Yin-Lai Yeong, and Keng Hoon Gan
Subjects: Machine translation, Computer science, Networking & telecommunications, Engineering and technology, Domain (software engineering), Fragment (logic), Electrical engineering, electronic engineering, information engineering, General Earth and Planetary Sciences, Artificial intelligence & image processing, Artificial intelligence, Natural language processing, Sentence, General Environmental Science, BLEU
Abstract: Parallel texts are essential resources in linguistics, natural language processing, and multilingual information retrieval. Many studies attempt to extract parallel text from existing resources, particularly from comparable texts. The approaches to extract parallel text from comparable text can be divided into sentence-level approach and fragment-level approach. In this paper, an approach that combines sentence-level approach and fragment-level approach is proposed. The study was evaluated using statistical machine translation (SMT) and neural machine translation (NMT). The experiment results show a very significant improvement in the BLEU scores of SMT and NMT. The BLEU scores for SMT for the test in computer science domain and news domain increase from 17.45 and 41.45 to 18.56 and 48.65 respectively. On the other hand, the BLEU scores for NMT in the computer science domain and news domain increase from 14.42 and 19.39 to 21.17 and 41.75 respectively.
Published: 2019

50. Translating Idioms using Paraphrasing, Machine Translation and Rescoring

Author: Tien-Ping Tan et al.
Abstract: Idioms are rich multi-word expressions that can be found in many works of literature. The meaning of most idioms cannot be deduced literally. This makes translating idioms challenging. Moreover, the parallel text that contains idioms is limited. As a result, machine translation has difficulty in translating idioms correctly. Paraphrasing is a process to restate the meaning of a text or a passage using different words in the same language. Often, paraphrasing is used to give readers a clearer understanding of the original text. Paraphrasing can be used to assist machine translation in translating idioms. In this article, we attempted to improve the translation of idioms using paraphrasing. An approach that combines paraphrasing and rescoring with machine translation is proposed. The paraphrasing and rescoring improve the translation produced by neural machine translation from 12.03% to 12.92%.
Published: 2021
