36 results on '"Chu-Ren Huang"'
Search Results
2. Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT
- Author
-
Chu-Ren Huang, Won Ik Cho, Emmanuele Chersoni, and Yu-yin Hsu
- Subjects
Computer science ,business.industry ,Event (relativity) ,Verb ,Artificial intelligence ,business ,computer.software_genre ,computer ,Natural language processing - Published
- 2021
3. PolyU CBS-Comp at SemEval-2021 Task 1: Lexical Complexity Prediction (LCP)
- Author
-
Chu-Ren Huang, Rong Xiang, Wenjie Li, Qin Lu, Emmanuele Chersoni, and Jinghang Gu
- Subjects
business.industry ,Computer science ,Context (language use) ,Gradient boosting ,Artificial intelligence ,business ,computer.software_genre ,computer ,Sentence ,Word (computer architecture) ,Natural language processing ,SemEval ,Task (project management) - Abstract
In this contribution, we describe the system presented by the PolyU CBS-Comp Team in Task 1 of SemEval-2021, where the goal was the estimation of the complexity of words in a given sentence context. Our top system, based on a combination of lexical, syntactic, word-embedding, and Transformer-derived features with a Gradient Boosting Regressor, achieves a correlation score of 0.754 on subtask 1 (single words) and 0.659 on subtask 2 (multiword expressions).
- Published
- 2021
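The subtask scores in the entry above are Pearson correlations between predicted and gold complexity values. A minimal stdlib sketch of that evaluation metric (the function name and toy scores are invented for illustration, not taken from the system description):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: gold vs. predicted lexical complexity scores in [0, 1]
gold = [0.10, 0.35, 0.50, 0.80]
pred = [0.15, 0.30, 0.55, 0.70]
score = pearson(gold, pred)
```

A perfect system would reach 1.0; the submitted system's 0.754 sits well above a random baseline near 0.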
4. Is Domain Adaptation Worth Your Investment? Comparing BERT and FinBERT on Financial Tasks
- Author
-
Bo Peng, Emmanuele Chersoni, Yu-Yin Hsu, and Chu-Ren Huang
- Published
- 2021
5. Using Conceptual Norms for Metaphor Detection
- Author
-
Menghan Jiang, Chu-Ren Huang, Emmanuele Chersoni, Mingyu Wan, Rong Xiang, Qi Su, and Kathleen Ahrens
- Subjects
Modality (human–computer interaction) ,Computer science ,business.industry ,Metaphor ,media_common.quotation_subject ,05 social sciences ,Context (language use) ,Representation (arts) ,computer.software_genre ,050105 experimental psychology ,Task (project management) ,03 medical and health sciences ,0302 clinical medicine ,0501 psychology and cognitive sciences ,Artificial intelligence ,business ,computer ,030217 neurology & neurosurgery ,Natural language processing ,media_common - Abstract
This paper reports a linguistically enriched method of detecting token-level metaphors for the second shared task on Metaphor Detection. We participate in all four phases of the competition with both datasets, i.e. Verbs and AllPOS on the VUA and TOEFL datasets. We use the modality exclusivity and embodiment norms to construct a conceptual representation of the nodes and the context. Our system obtains an F-score of 0.652 on the VUA Verbs track, 5% higher than the strong baselines. The experimental results across models and datasets indicate the salient contribution of modality exclusivity and modality shift information for predicting metaphoricity.
- Published
- 2020
6. Comparing Probabilistic, Distributional and Transformer-Based Models on Logical Metonymy Interpretation
- Author
-
Giulia Rambelli, Emmanuele Chersoni, Alessandro Lenci, Philippe Blache, and Chu-Ren Huang
- Subjects
distributional semantics ,Logical metonymy ,deep learning ,transformers
- 2020
7. A Report on the Third VarDial Evaluation Campaign
- Author
-
Miikka Silfverberg, Radu Tudor Ionescu, Natalia Klyueva, Tanja Samardžić, Chu-Ren Huang, Tung-Le Pan, Francis M. Tyers, Andrei M. Butnaru, Shervin Malmasi, Marcos Zampieri, Tommi Jauhiainen, and Yves Scherrer
- Subjects
History ,Language identification ,Romanian ,Library science ,020206 networking & telecommunications ,02 engineering and technology ,Mandarin Chinese ,language.human_language ,Task (project management) ,German ,Identification (information) ,Variation (linguistics) ,0202 electrical engineering, electronic engineering, information engineering ,language ,020201 artificial intelligence & image processing ,Mainland - Abstract
In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019. This year, the campaign included five shared tasks, including one task re-run – German Dialect Identification (GDI) – and four new tasks – Cross-lingual Morphological Analysis (CMA), Discriminating between Mainland and Taiwan variation of Mandarin Chinese (DMT), Moldavian vs. Romanian Cross-dialect Topic identification (MRC), and Cuneiform Language Identification (CLI). A total of 22 teams submitted runs across the five shared tasks. After the end of the competition, we received 14 system description papers, which are published in the VarDial workshop proceedings and referred to in this report.
- Published
- 2019
8. A Cognition Based Attention Model for Sentiment Analysis
- Author
-
Minglei Li, Qin Lu, Yunfei Long, Chu-Ren Huang, and Rong Xiang
- Subjects
Computer science ,business.industry ,media_common.quotation_subject ,Sentiment analysis ,Context (language use) ,Cognition ,02 engineering and technology ,Attention model ,computer.software_genre ,Preference ,020204 information systems ,Reading (process) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Layer (object-oriented design) ,business ,computer ,Natural language processing ,media_common - Abstract
Attention models are proposed in sentiment analysis because some words are more important than others. However, most existing methods use either local-context-based text information or user preference information. In this work, we propose a novel attention model trained on cognition-grounded eye-tracking data. A reading prediction model is first built using eye-tracking data as the dependent data and other features in the context as independent data. The predicted reading time is then used to build a cognition-based attention (CBA) layer for neural sentiment analysis. As a comprehensive model, we can capture attention over words in sentences as well as over sentences in documents. Different attention mechanisms can also be incorporated to capture other aspects of attention. Evaluations show that the CBA-based method significantly outperforms state-of-the-art local-context-based attention methods. This offers insight into how cognition-grounded data can be brought into NLP tasks.
- Published
- 2017
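The CBA layer described above can be caricatured in a few lines: predicted per-word reading times act as attention logits, softmax-normalized into weights over word representations. This is a hypothetical sketch with invented toy numbers, not the paper's neural architecture:

```python
import math

def cba_weights(reading_times):
    """Softmax over predicted per-word reading times -> attention weights."""
    m = max(reading_times)                      # subtract max for stability
    exps = [math.exp(t - m) for t in reading_times]
    s = sum(exps)
    return [e / s for e in exps]

def attend(word_vectors, reading_times):
    """Attention-weighted sum of word vectors (a CBA-style sentence vector)."""
    w = cba_weights(reading_times)
    dim = len(word_vectors[0])
    return [sum(w[i] * v[d] for i, v in enumerate(word_vectors))
            for d in range(dim)]

# Toy sentence: 3 words, 2-d vectors; longer predicted reading time -> more weight
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
times = [0.2, 0.9, 0.4]    # hypothetical predicted reading times
sent_vec = attend(vecs, times)
```

The word with the longest predicted reading time (the second one) receives the largest attention weight, which is the core intuition of the cognition-based layer.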
9. Leveraging Eventive Information for Better Metaphor Detection and Classification
- Author
-
Yunfei Long, I-Hsuan Chen, Chu-Ren Huang, and Qin Lu
- Subjects
Metaphor ,Event (computing) ,Computer science ,business.industry ,media_common.quotation_subject ,02 engineering and technology ,010501 environmental sciences ,computer.software_genre ,Chinese writing ,01 natural sciences ,Identification (information) ,Classifier (linguistics) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Representation (mathematics) ,Set (psychology) ,business ,computer ,Natural language processing ,0105 earth and related environmental sciences ,media_common - Abstract
Metaphor detection has been both challenging and rewarding in natural language processing applications. This study offers a new approach based on eventive information, detecting metaphors by leveraging the Chinese writing system, which is a culturally bound ontological system organized according to the basic concepts represented by radicals. As such, the information represented is available in all Chinese text without pre-processing. Since metaphor detection is another culturally based conceptual representation, we hypothesize that sub-textual information can facilitate the identification and classification of the types of metaphoric events denoted in Chinese text. We propose a set of syntactic conditions crucial to event structures to improve the model based on the classification of radical groups. With the proposed syntactic conditions, the model achieves an F-score of 0.8859, a 1.7% improvement over the same classifier with only bag-of-words features. The results show that eventive information can improve the effectiveness of metaphor detection. Event information is rooted in every language, and thus this approach has high potential to be applied to metaphor detection in other languages.
- Published
- 2017
10. Representing Verbs with Rich Contexts: an Evaluation on Verb Similarity
- Author
-
Philippe Blache, Chu-Ren Huang, Alessandro Lenci, Enrico Santus, and Emmanuele Chersoni
- Subjects
FOS: Computer and information sciences ,Computer Science - Artificial Intelligence ,Computer science ,[INFO.INFO-TT] Computer Science [cs]/Document and Text Processing ,Context (language use) ,Verb ,02 engineering and technology ,computer.software_genre ,050105 experimental psychology ,Sentence processing ,Task (project management) ,Linguistica computazionale ,Similarity (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Semantica distribuzionale ,0501 psychology and cognitive sciences ,Distributional semantics ,verb similarity ,Computer Science - Computation and Language ,Mental lexicon ,business.industry ,05 social sciences ,Linguistica computazionale, Semantica distribuzionale ,joint context ,[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing ,Artificial Intelligence (cs.AI) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Computation and Language (cs.CL) ,computer ,Natural language processing ,Word order - Abstract
Several studies on sentence processing suggest that the mental lexicon keeps track of the mutual expectations between words. Current DSMs, however, represent context words as separate features, thereby losing important information for word expectations, such as word interrelations. In this paper, we present a DSM that addresses this issue by defining verb contexts as joint syntactic dependencies. We test our representation in a verb similarity task on two datasets, showing that joint contexts achieve performance comparable to, or even better than, single dependencies. Moreover, they are able to overcome the data sparsity problem of joint feature spaces, in spite of the limited size of our training corpus.
- Published
- 2016
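The joint-context idea in the entry above — treating a verb's co-occurring dependencies as a single feature rather than separate ones — can be sketched with count dictionaries. The parsed triples below are invented for illustration:

```python
from collections import Counter

# (verb, subject, object) triples as they might come from a parsed corpus
triples = [
    ("eat", "dog", "bone"),
    ("eat", "child", "apple"),
    ("eat", "dog", "bone"),
    ("read", "child", "book"),
]

separate = {}   # classic DSM: each dependency is its own feature
joint = {}      # joint-context DSM: the (subj, obj) pair is one feature
for verb, subj, obj in triples:
    separate.setdefault(verb, Counter()).update(
        [("subj", subj), ("obj", obj)])
    joint.setdefault(verb, Counter()).update([(subj, obj)])
```

The joint representation keeps the information that "dog" and "bone" co-occur as arguments of the same event, which the separate-feature counts discard.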
11. Create a Manual Chinese Word Segmentation Dataset Using Crowdsourcing Method
- Author
-
Yao Yao, Chu-Ren Huang, Angel Chan, and Shichang Wang
- Subjects
Data collection ,business.industry ,Computer science ,Text segmentation ,Word error rate ,Crowdsourcing ,computer.software_genre ,Data quality ,Segmentation ,Artificial intelligence ,Chinese word ,business ,computer ,Natural language processing ,Sentence - Abstract
The manual Chinese word segmentation dataset WordSegCHC 1.0, built through eight crowdsourcing tasks conducted on the Crowdflower platform, contains manual word segmentation data for 152 Chinese sentences whose length ranges from 20 to 46 characters, excluding punctuation. Each sentence received 200 segmentation responses in its crowdsourcing task, of which between 123 and 143 were valid (each sentence was thus segmented by more than 120 subjects). We also propose an evaluation method, the manual segmentation error rate (MSER); the MSER of the dataset proves to be very low, indicating reliable data quality. In this work we applied the crowdsourcing method to the Chinese word segmentation task, and the results confirm again that crowdsourcing is a promising tool for linguistic data collection. The framework of crowdsourced linguistic data collection used here can be reused in similar tasks; to the best of our knowledge the resulting dataset fills a gap in Chinese language resources, and it has potential applications in research on the word intuition of Chinese speakers and in Chinese language processing.
- Published
- 2015
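The abstract above does not define MSER precisely, but a plausible stdlib sketch scores each crowd response against the majority-vote boundary set. Boundary positions are character indices; the response data and the exact scoring formula are assumptions for illustration:

```python
from collections import Counter

def majority_boundaries(responses):
    """Boundary positions chosen by more than half of the responses."""
    n = len(responses)
    counts = Counter(b for r in responses for b in r)
    return {b for b, c in counts.items() if c > n / 2}

def mser(responses):
    """Mean per-response disagreement with the majority segmentation."""
    gold = majority_boundaries(responses)
    errors = [len(gold.symmetric_difference(r)) / max(len(gold | r), 1)
              for r in responses]
    return sum(errors) / len(errors)

# Three toy responses: each is a set of boundary positions in one sentence
responses = [{2, 4, 7}, {2, 4, 7}, {2, 4}]
rate = mser(responses)
```

With mostly agreeing responses the rate stays near zero, which is the "very low MSER = reliable data" reading of the entry.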
12. LLT-PolyU: Identifying Sentiment Intensity in Ironic Tweets
- Author
-
Enrico Santus, Chu-Ren Huang, Anna Laszlo, and Hongzhi Xu
- Subjects
Training set ,Sarcasm ,Computer science ,business.industry ,media_common.quotation_subject ,Decision tree ,Regression analysis ,computer.software_genre ,Task (project management) ,Irony ,Identification (information) ,Artificial intelligence ,business ,computer ,Natural language processing ,media_common - Abstract
In this paper, we describe the system we built for Task 11 of SemEval-2015, which aims at identifying the sentiment intensity of figurative language in tweets. We use various features, including some specifically concerned with the identification of irony and sarcasm. The features are evaluated through a decision tree regression model and a support vector regression model. The results of five-fold cross-validation on the training data show that the tree regression model outperforms the support vector regression model; the former is therefore used for the final evaluation of the task. The results show that our model performs especially well in predicting the sentiment intensity of tweets involving irony and sarcasm.
- Published
- 2015
13. EVALution 1.0: an Evolving Semantic Dataset for Training and Evaluation of Distributional Semantic Models
- Author
-
Frances Yung, Enrico Santus, Chu-Ren Huang, and Alessandro Lenci
- Subjects
Relation (database) ,Computer science ,business.industry ,WordNet ,Extension (predicate logic) ,Semantic field ,computer.software_genre ,Meronymy ,Word lists by frequency ,Semantic computing ,Data mining ,Artificial intelligence ,Tuple ,business ,computer ,Natural language processing - Abstract
In this paper, we introduce EVALution 1.0, a dataset designed for the training and evaluation of Distributional Semantic Models (DSMs). This version consists of almost 7.5K tuples, instantiating several semantic relations between word pairs (including hypernymy, synonymy, antonymy, and meronymy). The dataset is enriched with a large amount of additional information (i.e. relation domain, word frequency, word POS, word semantic field, etc.) that can be used either for filtering the pairs or for performing an in-depth analysis of the results. The tuples were extracted from a combination of ConceptNet 5.0 and WordNet 4.0, and subsequently filtered through automatic methods and crowdsourcing to ensure their quality. The dataset is freely downloadable. An extension in RDF format, also including scripts for data processing, is under development.
- Published
- 2015
14. What You Need to Know about Chinese for Chinese Language Processing
- Author
-
Chu-Ren Huang
- Subjects
Computer science ,Need to know ,Language technology ,Chinese language ,Computational linguistics ,Linguistics ,Task (project management) - Abstract
The synergy between language sciences and language technology has been an elusive one for the computational linguistics community, especially when dealing with a language other than English. The reasons are two-fold: the lack of an accessible, comprehensive, and robust account of a specific language that would allow strategic linking between a processing task and linguistic devices, and the lack of successful computational studies taking advantage of such links. With a fast-growing number of available online resources, as well as a rapidly increasing number of members of the CL community who are interested in and/or working on Chinese language processing, the time is ripe to take a serious look at how knowledge of Chinese can help Chinese language processing.
- Published
- 2015
15. CWN-LMF
- Author
-
Lung-Hao Lee, Chu-Ren Huang, and Shu-Kai Hsieh
- Subjects
Lexical choice ,Lexical semantics ,Lexical functional grammar ,business.industry ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,WordNet ,Future application ,computer.software_genre ,Data exchange ,Lexical Markup Framework ,Artificial intelligence ,business ,computer ,Natural language processing ,Collaborative Application Markup Language - Abstract
Lexical Markup Framework (LMF, ISO-24613) is the ISO standard which provides a common standardized framework for the construction of natural language processing lexicons. LMF facilitates data exchange among computational linguistic resources, and also promises convenient uniformity for future applications. This study describes the design and implementation of the WordNet-LMF used to represent lexical semantics in Chinese WordNet. The compiled CWN-LMF will be released to the community for linguistic research.
- Published
- 2009
16. A framework of feature selection methods for text categorization
- Author
-
Shoushan Li, Chu-Ren Huang, Chengqing Zong, and Rui Xia
- Subjects
Text categorization ,business.industry ,Computer science ,Feature selection ,Artificial intelligence ,Ratio measurement ,business ,Machine learning ,computer.software_genre ,computer ,Odds ,Task (project management) - Abstract
In text categorization, feature selection (FS) is a strategy that aims at making text classifiers more efficient and accurate. However, when dealing with a new task, it is still difficult to quickly select a suitable method from the various FS methods provided by previous studies. In this paper, we propose a theoretical framework of FS methods based on two basic measurements: frequency measurement and ratio measurement. Six popular FS methods are then discussed in detail under this framework. Moreover, with the guidance of our theoretical analysis, we propose a novel method called weighed frequency and odds (WFO) that combines the two measurements with trained weights. Experimental results on datasets from both topic-based and sentiment classification tasks show that the new method is robust across different tasks and numbers of selected features.
- Published
- 2009
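A common formulation of a WFO-style score combines a term's frequency in the positive class with its log-odds ratio, traded off by a trained weight λ. The exact estimator in the paper is not given in the entry above, so the formula and numbers below are an assumption for illustration:

```python
import math

def wfo(p_pos, p_neg, lam=0.5, eps=1e-9):
    """Weighed-frequency-and-odds-style score for one term.

    p_pos, p_neg: P(term | class) in the positive / negative class.
    lam: weight trading off frequency measurement vs. ratio measurement.
    NOTE: hypothetical formulation, not verbatim from the paper.
    """
    odds = math.log((p_pos + eps) / (p_neg + eps))
    if odds <= 0:              # term not indicative of the positive class
        return 0.0
    return (p_pos ** lam) * (odds ** (1 - lam))

# lam=1 reduces to a pure frequency measurement, lam=0 to pure log-odds
freq_like = wfo(0.02, 0.005, lam=1.0)
odds_like = wfo(0.02, 0.005, lam=0.0)
```

Tuning λ per task is what makes the combined method robust across topic-based and sentiment datasets, per the framework's analysis.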
17. Query expansion using LMF-compliant lexical resources
- Author
-
Virach Sornlertlamvanich, Kiyoaki Shirai, Thatsanee Charoenporn, Dain Kaplan, Takenobu Tokunaga, Shu-Kai Hsieh, Chu-Ren Huang, Monica Monachini, Nicoletta Calzolari, Yingju Xia, and Claudia Soria
- Subjects
Information retrieval ,Computer science ,business.industry ,International standard ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,computer.software_genre ,Query language ,Query expansion ,Robustness (computer science) ,Lexical Markup Framework ,Artificial intelligence ,business ,computer ,Natural language processing ,RDF query language ,computer.programming_language - Abstract
This paper reports on a prototype multilingual query expansion system relying on LMF-compliant lexical resources. The system is one of the deliverables of a three-year project aiming at establishing an international standard for language resources that is applicable to Asian languages. Our important contributions to ISO 24613, the standard Lexical Markup Framework (LMF), include its robustness in dealing with Asian languages and its applicability to cross-lingual query tasks, as illustrated by the prototype introduced in this paper.
- Published
- 2009
18. Multilingual conceptual access to lexicon based on shared orthography
- Author
-
Chiyo Hotani, Wan Ying Lin, Sheng-Yi Chen, Chu-Ren Huang, and Ya-Min Chou
- Subjects
Class (computer programming) ,Kanji ,Computer science ,business.industry ,WordNet ,Ontology (information science) ,Lexicon ,computer.software_genre ,Linguistics ,Writing system ,Artificial intelligence ,business ,Generative lexicon ,computer ,Orthography ,Natural language processing - Abstract
In this paper we propose a model for conceptual access to a multilingual lexicon based on shared orthography. Our proposal relies crucially on two facts: that both Chinese and Japanese conventionally use Chinese orthography in their respective writing systems, and that the Chinese orthography is anchored on a system of radical parts which encodes basic concepts. Each orthographic unit, called hanzi and kanji respectively, contains a radical which indicates the broad semantic class of the meaning of that unit. Our study utilizes the homomorphism between the Chinese hanzi and Japanese kanji systems to identify bilingual word correspondences. We use bilingual dictionaries, including WordNet, to verify semantic relations between the cross-lingual pairs. These bilingual pairs are then mapped to an ontology constructed from the relation between the meaning of each character and the basic concept of its radical part. The conceptual structure of the radical ontology is proposed as a model for simultaneous conceptual access to both languages. A study based on words containing characters composed of the 口 ("mouth") radical is given to illustrate the proposal and the actual model. The fact that this model works for two typologically very different languages, and that the model contains Generative-Lexicon-like coercive links, suggests that it has the conceptual robustness to be applied to other languages.
- Published
- 2008
19. Automatic discovery of named entity variants
- Author
-
Petr Šimon, Shu-Kai Hsieh, and Chu-Ren Huang
- Subjects
Information retrieval ,Grammar ,Computer science ,business.industry ,media_common.quotation_subject ,Context (language use) ,Bootstrapping (linguistics) ,computer.software_genre ,Named entity ,Entity linking ,Identification (information) ,Named-entity recognition ,Artificial intelligence ,business ,Semantic Web ,computer ,Natural language processing ,media_common - Abstract
Identification of transliterated names is a particularly difficult task in Named Entity Recognition (NER), especially in the Chinese context. Of all possible variations of transliterated named entities, the difference between PRC and Taiwan transliterations is the most prevalent and most challenging. In this paper, we introduce a novel approach to the automatic extraction of diverging transliterations of foreign named entities by bootstrapping co-occurrence statistics from a tagged and segmented Chinese corpus. Preliminary experiments yield promising results and show the approach's potential in NLP applications.
- Published
- 2007
20. Rethinking Chinese word segmentation
- Author
-
Chu-Ren Huang, Petr Šimon, Laurent Prevot, and Shu-Kai Hsieh
- Subjects
Computer science ,business.industry ,Segmentation-based object categorization ,Speech recognition ,Lexical analysis ,String (computer science) ,Text segmentation ,computer.software_genre ,Speech segmentation ,Identification (information) ,Character (mathematics) ,Tokenization (data security) ,Segmentation ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
This paper addresses two remaining challenges in Chinese word segmentation. The challenge in HLT is to find a robust segmentation method that requires no prior lexical knowledge and no extensive training to adapt to new types of data. The challenge in modelling human cognition and acquisition is to segment words efficiently without using knowledge of wordhood. We propose a radical method of word segmentation to meet both challenges. The most critical concept we introduce is that Chinese word segmentation is the classification of a string of character-boundaries (CB's) into either word-boundaries (WB's) or non-word-boundaries. In Chinese, CB's are delimited and distributed between two characters. Hence we can use the distributional properties of CB's among the background character strings to predict which CB's are WB's.
- Published
- 2007
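The CB-classification view in the entry above can be sketched as: score each character boundary from distributional statistics of the surrounding characters and classify low-cohesion boundaries as word boundaries. The cohesion heuristic and threshold below stand in for the paper's actual model and are assumptions:

```python
from collections import Counter

def boundary_scores(text, bigram_counts, unigram_counts):
    """Score each character boundary: low bigram cohesion suggests a WB."""
    scores = []
    for i in range(1, len(text)):
        a, b = text[i - 1], text[i]
        cohesion = bigram_counts[a + b] / max(unigram_counts[a], 1)
        scores.append((i, cohesion))
    return scores

def segment(text, bigram_counts, unigram_counts, threshold=0.5):
    """Classify each CB: insert a WB wherever cohesion falls below threshold."""
    words, start = [], 0
    for i, score in boundary_scores(text, bigram_counts, unigram_counts):
        if score < threshold:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words

# Toy counts (ASCII stands in for characters): "ab" is cohesive, "bc" is not
uni = Counter({"a": 10, "b": 10, "c": 10})
bi = Counter({"ab": 9, "bc": 1})
```

No lexicon is consulted anywhere: only the distributional statistics of the character strings decide which CB's become WB's, which is the point of the approach.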
21. Towards agent-based cross-lingual interoperability of distributed lexical resources
- Author
-
Chu-Ren Huang, Andrea Marchetti, Monica Monachini, Claudia Soria, Maurizio Tesconi, Nicoletta Calzolari, and Francesca Bertagna
- Subjects
World Wide Web ,Cross lingual ,ComputingMethodologies_PATTERNRECOGNITION ,Information retrieval ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Interoperability ,Cross-domain interoperability ,Semantic interoperability ,ComputingMethodologies_ARTIFICIALINTELLIGENCE - Abstract
In this paper we present an application fostering the integration and interoperability of computational lexicons, focusing on the particular case of mutual linking and cross-lingual enrichment of two wordnets, the ItalWordNet and Sinica BOW lexicons. This is intended as a case study investigating the needs and requirements of semi-automatic integration and interoperability of lexical resources.
- Published
- 2006
22. When conset meets synset
- Author
-
Shu-Kai Hsieh and Chu-Ren Huang
- Subjects
business.industry ,Computer science ,Context (language use) ,computer.software_genre ,Task (project management) ,Resource (project management) ,Lexical resource ,Ontology ,Artificial intelligence ,Chinese characters ,business ,Set (psychology) ,Construct (philosophy) ,computer ,Natural language processing - Abstract
This paper describes an on-going project concerned with an ontological lexical resource based on the abundant conceptual information grounded in Chinese characters. The ultimate goal of this project is to construct a cognitively sound and computationally effective character-grounded machine-understandable resource. Philosophically, Chinese ideograms have ontological status, but their applicability to NLP tasks has not been expressed explicitly in terms of language resources. We thus propose the first attempt to locate Chinese characters within the context of ontology. Having had initial success in applying it to some NLP tasks, we believe that the construction of this knowledge resource will shed new light on theoretical settings as well as on the construction of Chinese lexical semantic resources.
- Published
- 2006
23. Conceptual metaphors
- Author
-
Chu-Ren Huang, Siaw-Fong Chung, and Kathleen Ahrens
- Subjects
Knowledge representation and reasoning ,business.industry ,Computer science ,Operational definition ,Suggested Upper Merged Ontology ,Representation (systemics) ,Conceptual metaphor ,Ontology (information science) ,computer.software_genre ,Domain (software engineering) ,Ontology ,Upper ontology ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
The goal of this paper is to integrate the Conceptual Mapping Model with an ontology-based knowledge representation (i.e. the Suggested Upper Merged Ontology (SUMO)) in order to demonstrate that conceptual metaphor analysis can be restricted and, eventually, automated. In particular, we propose a corpus-based operational definition for Mapping Principles, which are explanations of why a conventional conceptual metaphor has a particular source-target domain pairing. The paper examines 2000 random examples of 'economy' (jingji) in Mandarin Chinese and postulates Mapping Principles based on frequency and delimited with SUMO.
- Published
- 2003
24. OLACMS
- Author
-
Chu-Ren Huang and Ru-Yng Chang
- Subjects
Metadata ,Set (abstract data type) ,Computer science ,business.industry ,Artificial intelligence ,computer.software_genre ,business ,computer ,Natural language processing - Abstract
OLACMS (Open Language Archives Community Metadata Set) is a standard for describing language resources. This paper provides suggestions for OLACMS version 0.4 by comparing it with other standards and applying it to Chinese and Formosan languages.
- Published
- 2002
25. Induction of classification from lexicon expansion
- Author
-
Chu-Ren Huang, Echa Chang, Changhua Yang, and Sue-Jin Ker
- Subjects
Information retrieval ,business.industry ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,WordNet ,computer.software_genre ,Lexicon ,eXtended WordNet ,Semantic network ,Categorization ,Taxonomy (general) ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
We present in this paper a series of induced methods to assign domain tags to WordNet entries. Our prime objective is to enrich the contextual information in WordNet specific to each synset entry. Using available lexical sources such as the Far East Dictionary and the contextual information in WordNet itself, we establish a foundation for our categorization. Next, we further examine the similarity between common lexical taxonomy and the semantic hierarchy of WordNet. Based on this observation and on knowledge of other semantic relations, we enlarge the coverage of our findings in a systematic way. Evaluation of the results shows that we achieved reasonable and satisfactory accuracy. We propose this as the first step of WordNet expansion into a bona fide semantic network linked to real-world knowledge.
- Published
- 2002
26. Translating lexical semantic relations
- Author
-
Chu-Ren Huang, Dylan B. S. Tsai, and I-Ju E. Tseng
- Subjects
business.industry ,Computer science ,WordNet ,Ontology (information science) ,Logical entailment ,computer.software_genre ,Lexical item ,Linguistics ,Semantic computing ,Ontology ,Equivalence relation ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Natural language processing - Abstract
Establishing correspondences between wordnets of different languages is essential both to multilingual knowledge processing and to bootstrapping wordnets of low-density languages. We claim that such correspondences must be based on lexical semantic relations, rather than on top ontology or word translations. In particular, we define a translation equivalence relation as a bilingual lexical semantic relation. Such relations can then be part of a logical entailment predicting whether source language semantic relations will hold in a target language. Our claim is tested with a study of 210 Chinese lexical lemmas and their possible semantic relation links bootstrapped from the Princeton WordNet. The results show that lexical semantic relation translations are indeed highly precise when they are logically inferable.
- Published
- 2002
27. Categorical ambiguity and information content
- Author
-
Chu-Ren Huang and Ru-Yng Chang
- Subjects
Structure (mathematical logic) ,Ambiguity resolution ,business.industry ,Computer science ,media_common.quotation_subject ,Grammatical category ,Ambiguity ,Resolution (logic) ,computer.software_genre ,Symbol (chemistry) ,Artificial intelligence ,business ,Categorical variable ,computer ,Natural language processing ,media_common - Abstract
Assignment of grammatical categories is a fundamental step in natural language processing, and ambiguity resolution is one of the most challenging NLP tasks, currently still beyond the power of machines. Taken together, resolution of categorical ambiguity is something a computational linguistic system can do reasonably well, yet it still cannot match the excellence of human beings. The task is even more challenging in Chinese language processing because of the poverty of morphological information to mark categories and the lack of a convention to mark word boundaries. In this paper, we investigate the nature of categorical ambiguity in Chinese based on the Sinica Corpus. The study differs crucially from previous studies in that it directly measures information content as the degree of ambiguity. This method not only offers an alternative interpretation of ambiguity; it also allows a different measure of the success of categorical disambiguation: instead of precision or recall, we can measure by how much the information load has been reduced. The approach also allows us to identify the most ambiguous words in terms of information content. The somewhat surprising result reinforces the Saussurean view that, underlying the systemic linguistic structure, the assignment of linguistic content to each linguistic symbol is arbitrary.
- Published
- 2002
28. Sinica Treebank
- Author
-
Keh-Jiann Chen, Fengyi Chen, Zhao-Ming Gao, Kuang-Yu Chen, and Chu-Ren Huang
- Subjects
Annotation ,Information retrieval ,business.industry ,Interface (Java) ,Computer science ,Treebank ,Phrase structure rules ,Artificial intelligence ,Line (text file) ,business ,computer.software_genre ,computer ,Natural language processing - Abstract
This paper describes the design criteria and annotation guidelines of Sinica Treebank. The three design criteria are: Maximal Resource Sharing, Minimal Structural Complexity, and Optimal Semantic Information. One of the important design decisions following these criteria is the encoding of thematic role information. An on-line interface facilitating empirical studies of Chinese phrase structure is also described.
- Published
- 2000
29. Segmentation standard for Chinese natural language processing
- Author
-
Li-Li Chang, Keh-Jiann Chen, and Chu-Ren Huang
- Subjects
Set (abstract data type) ,Computer science ,Segmentation-based object categorization ,business.industry ,Theoretical definition ,Segmentation ,Artificial intelligence ,computer.software_genre ,Lexicon ,business ,computer ,Natural language processing ,Word (computer architecture) - Abstract
This paper proposes a segmentation standard for Chinese natural language processing. The standard is proposed to achieve linguistic felicity, computational feasibility, and data uniformity. Linguistic felicity is maintained by defining a segmentation unit to be equivalent to the theoretical definition of word, and by providing a set of segmentation principles that are equivalent to a functional definition of a word. Computational feasibility is ensured by the fact that the above functional definitions are procedural in nature and can be converted to segmentation algorithms, as well as by the implementable heuristic guidelines which deal with specific linguistic categories. Data uniformity is achieved by stratification of the standard itself and by defining a standard lexicon as part of the segmentation standard.
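The abstract notes that the standard's functional definitions are procedural and convertible to segmentation algorithms that consult a standard lexicon. As a hedged illustration of that idea only (not an algorithm from the standard itself), greedy forward maximum matching segments text by always taking the longest lexicon entry at the current position:

```python
LEXICON = {"中文", "自然", "語言", "處理", "自然語言"}  # toy standard lexicon

def max_match(text, lexicon, max_len=4):
    """Segment text by forward maximum matching against a lexicon.
    Characters starting no lexicon entry become single-character units."""
    units, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                units.append(text[i:j])
                i = j
                break
        else:  # no lexicon entry starts here: emit one character
            units.append(text[i])
            i += 1
    return units

print(max_match("中文自然語言處理", LEXICON))  # ['中文', '自然語言', '處理']
```

The example also shows why the standard lexicon is part of the standard: with "自然語言" listed, the longest-match principle yields a four-character unit where a smaller lexicon would produce "自然" + "語言", so uniform lexicons are needed for uniform segmentation.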
- Published
- 1996
30. Character-based collocation for Mandarin Chinese
- Author
-
Keh-Jiann Chen, Chu-Ren Huang, and Yun-yan Yang
- Subjects
Character (mathematics) ,Collocation ,business.industry ,Computer science ,language ,Artificial intelligence ,business ,computer.software_genre ,Mandarin Chinese ,computer ,language.human_language ,Natural language processing - Published
- 1994
31. Wiktionary and NLP
- Author
-
Emmanuel Navarro, Franck Sajous, Bruno Gaume, Laurent Prévot, Shu-Kai Hsieh, Tzu-Yi Kuo, Pierre Magistry, and Chu-Ren Huang
- Published
- 2009
- Full Text
- View/download PDF
32. A Chinese corpus for linguistic research
- Author
-
Keh-Jiann Chen and Chu-Ren Huang
- Subjects
business.industry ,Computer science ,Dual (grammatical number) ,computer.software_genre ,Mandarin Chinese ,language.human_language ,Linguistics ,Corpus linguistics ,language ,Classical Chinese ,Chinese language ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
This is a project note on the first stage of the construction of a comprehensive corpus of both Modern and Classical Chinese. The corpus is built with the dual aim of serving as the central database for Chinese language processing and for supporting in-depth linguistic research in Mandarin Chinese.
- Published
- 1992
33. Information-based Case Grammar
- Author
-
Keh-Jiann Chen and Chu-Ren Huang
- Subjects
Parsing ,Unification ,Computer science ,Formalism (philosophy) ,business.industry ,Context (language use) ,Case grammar ,computer.software_genre ,Syntax ,Adjunct ,Lexical item ,Feature (linguistics) ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Artificial intelligence ,business ,computer ,Feature structure ,Natural language processing - Abstract
In this paper we propose a framework of Information-based Case Grammar (ICG). This grammatical formalism requires that the lexical entry for each word contain both semantic and syntactic feature structures. In the feature structure of a phrasal head, we encode syntactic and semantic constraints on grammatical phrasal patterns in terms of thematic structures, and encode precedence relations in terms of adjunct structures. Such feature structures denote partial information defining the set of legal phrases, and they provide sufficient information to identify thematic roles. With this formalism, parsing and thematic analysis can be achieved simultaneously. Owing to the simplicity and flexibility of Information-based Case Grammar, context-dependent and discontinuous relations such as agreement, coordination, long-distance dependencies, and control and binding can be easily expressed. As a unification-based formalism, ICG inherits the advantages of that family of formalisms, and more.
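The "partial information" idea behind ICG can be illustrated with plain unification over feature structures represented as nested dicts: each structure constrains, without fully specifying, the phrases it describes, and combining a head's constraints with an argument's features either merges their information or fails on a conflict. This is a generic sketch of unification, not ICG's own richer formalism.

```python
def unify(fs1, fs2):
    """Return the most general feature structure consistent with both
    inputs, or None if they conflict on an atomic value."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for key, val in fs2.items():
            if key in result:
                merged = unify(result[key], val)
                if merged is None:
                    return None  # conflicting information: unification fails
                result[key] = merged
            else:
                result[key] = val
        return result
    return fs1 if fs1 == fs2 else None

# A head's thematic constraint unifies with a candidate argument's features,
# accumulating partial information from both (feature names are invented):
head = {"agent": {"cat": "NP", "animate": True}}
arg = {"agent": {"animate": True, "num": "sg"}}
print(unify(head, arg))

# Incompatible atomic values make unification fail:
print(unify({"cat": "NP"}, {"cat": "VP"}))  # None
```

Because unification is order-independent and monotonic, the same operation serves both parsing and thematic-role identification, which is how a unification-based formalism can do the two simultaneously.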
- Published
- 1990
34. Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation, PACLIC 2023, The Hong Kong Polytechnic University, Hong Kong, SAR, China, 2-4 December 2023.
- Author
-
Chu-Ren Huang, Yasunari Harada, Jong-Bok Kim, Si Chen, Yu-Yin Hsu, Emmanuele Chersoni, Pranav A, Winnie Huiheng Zeng, Bo Peng, Yuxi Li, and Junlin Li
- Published
- 2023
35. Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation, PACLIC 2018, Hong Kong, December 1-3, 2018
- Author
-
Stephen Politzer-Ahles, Yu-Yin Hsu, Chu-Ren Huang, and Yao Yao
- Published
- 2018
36. Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation, WAT@PACLIC 2018, Hong Kong, December 1-3, 2018
- Author
-
Stephen Politzer-Ahles, Yu-Yin Hsu, Chu-Ren Huang, and Yao Yao
- Published
- 2018