14 results for "Cross-lingual learning"
Search Results
2. Source-Free Transductive Transfer Learning for Structured Prediction
- Author
- Kurniawan, Kemal Maulana
- Abstract
Current transfer learning approaches require two strong assumptions: that source domain data is available and that the target domain has labelled data. These assumptions are problematic when the source domain data is private and the target domain has no labelled data. We therefore consider the source-free unsupervised transfer setup, in which both assumptions are violated, across both languages and domains (genres). To transfer structured prediction models in the source-free setting, we propose two methods: Parsimonious Parser Transfer (PPT), designed for single-source transfer of dependency parsers across languages, and PPTX, the multi-source version of PPT. Both methods outperform baselines. We then propose to improve PPTX with logarithmic opinion pooling (PPTX-LOP), and find that it is an effective multi-source transfer method for structured prediction in general. Next, we study whether our proposed source-free transfer methods provide improvements when pretrained language models (PTLMs) are employed. We first propose Parsimonious Transfer for Sequence Tagging (PTST), a variation of PPT designed for sequence tagging. We then evaluate PTST and PPTX-LOP on domain adaptation of semantic tasks using PTLMs and show that, for globally normalised models, PTST improves precision and PPTX-LOP improves recall. Besides unlabelled data, the target domain may have models trained on various tasks (but not the task of interest). To investigate whether these models can be used to improve performance in source-free transfer, we propose two methods, and find that one of them improves recall over direct transfer. Finally, we critically discuss the findings of this thesis, cover relevant subsequent work, and close with a discussion of limitations and future work.
- Published
- 2023
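The PPTX-LOP method in the entry above combines the predictions of several source-language models by logarithmic opinion pooling, i.e. a weighted geometric mean of their output distributions. Below is a minimal, hypothetical sketch of that general pooling rule, assuming each source model emits a per-token label distribution; it illustrates the rule itself, not the thesis's actual implementation.

```python
# Hypothetical sketch of logarithmic opinion pooling (LOP): combine K
# label distributions p_k via p(y) ∝ prod_k p_k(y)^{w_k}, computed in
# log space for numerical stability. Not the thesis's exact code.
import numpy as np

def log_opinion_pool(distributions: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """distributions: (K, num_labels), rows summing to 1; weights: (K,)."""
    log_p = np.log(distributions + 1e-12)   # avoid log(0)
    pooled = weights @ log_p                # weighted sum of log-probs
    pooled -= pooled.max()                  # stabilise before exponentiating
    p = np.exp(pooled)
    return p / p.sum()                      # renormalise to a distribution

# Three source parsers disagree about one token's label:
dists = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.1, 0.1, 0.8]])
print(log_opinion_pool(dists, np.array([1/3, 1/3, 1/3])))
```

With uniform weights this is the normalised geometric mean: a label scores well only if every source assigns it non-negligible probability, so a single confident dissenter cannot dominate the way it can under linear (arithmetic) pooling.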
3. Cross-Lingual and Genre-Supervised Parsing and Tagging for Low-Resource Spoken Data
- Author
- Fosteri, Iliana
- Abstract
Dealing with low-resource languages is challenging because there is not enough data to train machine-learning models to make predictions for these languages. One way to address this problem is to use data from higher-resource languages, transferring what is learned from those languages to the low-resource targets. The present study focuses on dependency parsing and part-of-speech tagging of low-resource languages belonging to the spoken genre, i.e., languages whose treebank data is transcribed speech: Beja, Chukchi, Komi-Zyrian, Frisian-Dutch, and Cantonese. Our approach investigates different types of transfer languages, employing MaChAmp, a state-of-the-art parser and tagger built on contextualized word embeddings, in particular mBERT and XLM-R. The main idea is to explore how genre match, language similarity, their combination, or neither affects model performance on the aforementioned downstream tasks for our selected target treebanks. Our findings suggest that capturing speech-specific dependency relations requires incorporating at least some genre-matched source data, whereas similarity-matched source data are the better choice when the task at hand is part-of-speech tagging. We also explore the impact of multi-task learning in one of our proposed methods, but observe only minor differences in model performance.
- Published
- 2023
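The study above selects transfer languages by genre and by language similarity. A hypothetical sketch of that selection step follows; the feature vectors and the scoring weights are invented placeholders (the study itself trains MaChAmp on the selected treebanks rather than scoring them this way).

```python
# Hypothetical source-treebank ranking: a weighted mix of genre match and
# typological similarity. Vectors below are invented stand-ins for real
# typological features; this is not the study's actual procedure.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_sources(target: dict, candidates: list[dict], alpha: float = 0.5):
    """Score each candidate: alpha * genre match + (1 - alpha) * similarity."""
    scored = []
    for cand in candidates:
        genre = 1.0 if cand["genre"] == target["genre"] else 0.0
        sim = cosine(cand["typology"], target["typology"])
        scored.append((alpha * genre + (1 - alpha) * sim, cand["name"]))
    return sorted(scored, reverse=True)

target = {"genre": "spoken", "typology": np.array([0.9, 0.1, 0.4])}
candidates = [
    {"name": "written-close-lang", "genre": "written", "typology": np.array([0.8, 0.2, 0.5])},
    {"name": "spoken-distant-lang", "genre": "spoken", "typology": np.array([0.1, 0.9, 0.2])},
]
for score, name in rank_sources(target, candidates):
    print(f"{score:.3f}  {name}")
```

Shifting alpha trades off the study's two findings: weight genre higher when the downstream task is parsing of speech-specific relations, and similarity higher for part-of-speech tagging.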
5. Translation-Based Implicit Annotation Projection for Zero-Shot Cross-Lingual Event Argument Extraction
- Author
- Lou, Chenwei, Gao, Jun, Yu, Changlong, Wang, Wei, Zhao, Huan, Tu, Weiwei, and Xu, Ruifeng
- Abstract
Zero-shot cross-lingual event argument extraction (EAE) is a challenging yet practical problem in Information Extraction. Most previous work relies heavily on external structured linguistic features, which are not easily accessible in real-world scenarios. This paper investigates a translation-based method to implicitly project annotations from the source language to the target language. With the use of translation-based parallel corpora, no additional linguistic features are required during training and inference. As a result, the proposed approach is more cost-effective than previous work on zero-shot cross-lingual EAE. Moreover, our implicit annotation projection approach introduces less noise and is hence more effective and robust than explicit ones. Experimental results show that our model achieves the best performance, outperforming a number of competitive baselines. A thorough analysis further demonstrates the effectiveness of our model compared to explicit annotation projection approaches.
- Published
- 2022
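For contrast with the implicit projection the entry above describes, here is a toy sketch of the explicit, marker-based projection it improves on: argument spans are wrapped in sentinel tags before translation and read back afterwards. The markers are exactly the noise source an implicit approach avoids; `translate` here is a stub lookup table, not a real MT system.

```python
# Toy sketch of EXPLICIT marker-based annotation projection, shown only to
# contrast with the paper's implicit method. The translator is a stub and
# assumes markers survive translation, which real MT often violates.
import re

def mark(sentence: str, span: str, role: str) -> str:
    """Wrap an argument span in sentinel tags before translation."""
    return sentence.replace(span, f"<{role}>{span}</{role}>")

def translate(text: str) -> str:
    # Stub standing in for a real MT system.
    table = {"The committee": "Das Komitee", "the law": "das Gesetz",
             "approved": "verabschiedete"}
    for src, tgt in table.items():
        text = text.replace(src, tgt)
    return text

def recover(translated: str, role: str) -> str:
    """Read the projected span back out of the translated text."""
    m = re.search(f"<{role}>(.*?)</{role}>", translated)
    return m.group(1) if m else ""

src = "The committee approved the law"
marked = mark(mark(src, "The committee", "agent"), "the law", "theme")
tgt = translate(marked)
print(recover(tgt, "agent"), "|", recover(tgt, "theme"))
# -> Das Komitee | das Gesetz
```

When the MT system reorders, drops, or rewrites the sentinel tags, the recovered spans are corrupted; translating clean text and projecting annotations implicitly sidesteps that failure mode.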
6. Cross-Lingual Word Embeddings
- Author
- Søgaard, Anders, Vulić, Ivan, Ruder, Sebastian, and Faruqui, Manaal
- Abstract
The majority of natural language processing (NLP) is English language processing, and while there is good language technology support for (standard varieties of) English, support for Albanian, Burmese, or Cebuano (and most other languages) remains limited. Being able to bridge this digital divide is important for scientific and democratic reasons and also represents enormous growth potential. A key challenge is learning to align the basic meaning-bearing units of different languages. In this book, the authors survey and discuss recent and historical work on supervised and unsupervised learning of such alignments. Specifically, the book focuses on so-called cross-lingual word embeddings. The survey is intended to be systematic, using consistent notation and putting the available methods in comparable form, making it easy to compare wildly different approaches. In doing so, the authors establish previously unreported relations between these methods and present a fast-growing literature in a very compact way. Furthermore, the authors discuss how best to evaluate cross-lingual word embedding methods and survey the resources available for students and researchers interested in this topic. Table of Contents: Preface / Introduction / Monolingual Word Embedding Models / Cross-Lingual Word Embedding Models: Typology / A Brief History of Cross-Lingual Word Representations / Word-Level Alignment Models / Sentence-Level Alignment Methods / Document-Level Alignment Models / From Bilingual to Multilingual Training / Unsupervised Learning of Cross-Lingual Word Embeddings / Applications and Evaluation / Useful Data and Software / General Challenges and Future Directions / Bibliography / Authors' Biographies.
- Published
- 2019
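One classic word-level alignment model from the literature this book surveys is the supervised orthogonal (Procrustes) mapping between two monolingual embedding spaces: given a seed dictionary of row-aligned vector pairs, the optimal orthogonal map has a closed-form SVD solution. A minimal sketch, with random matrices standing in for real pretrained embeddings:

```python
# Minimal Procrustes alignment sketch: find the orthogonal W minimising
# ||XW - Y||_F over row-aligned seed pairs, via W = U V^T where
# U S V^T is the SVD of X^T Y. Embeddings below are simulated.
import numpy as np

def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
src = rng.normal(size=(1000, 300))                 # source-language vectors
true_rot, _ = np.linalg.qr(rng.normal(size=(300, 300)))
tgt = src @ true_rot                               # simulated target space
W = procrustes(src[:500], tgt[:500])               # fit on a seed dictionary
err = np.linalg.norm(src[500:] @ W - tgt[500:])
print(f"held-out alignment error: {err:.6f}")      # ~0 in this noiseless toy
```

Real embedding spaces are only approximately isometric, so the held-out error is never zero in practice; the orthogonality constraint is what keeps the map from overfitting the seed dictionary.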
7. Learning Language-Independent Representations of Verbs and Adjectives from Multimodal Retrieval
- Author
- Hansen, Victor Petren Bach and Søgaard, Anders
- Abstract
This paper presents a simple modification to previous work on learning cross-lingual, grounded word representations from image-word pairs that, unlike that work, is robust across different parts of speech, e.g., able to find the translation of the adjective 'social' relying only on image features associated with its translation candidates. Our method does not rely on black-box image search engines or any direct cross-lingual supervision. We evaluate our approach on English-German and English-Japanese word alignment, as well as on existing English-German bilingual dictionary induction datasets.
- Published
- 2019
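A hypothetical sketch of the multimodal retrieval idea behind the paper above: represent each word by the mean feature vector of the images paired with it, then rank translation candidates by cosine similarity of these grounded representations. The random features below stand in for real image encoder outputs; this is not the authors' exact model.

```python
# Hypothetical grounded translation retrieval: words whose associated
# images depict similar scenes get similar mean feature vectors, so the
# correct translation is the nearest neighbour across languages.
import numpy as np

rng = np.random.default_rng(1)

def word_vector(image_feats: np.ndarray) -> np.ndarray:
    """Grounded word representation: unit-normalised mean of image features."""
    v = image_feats.mean(axis=0)
    return v / np.linalg.norm(v)

# Simulated image features: 'social' scenes (en) vs. two German candidates.
en_social = word_vector(rng.normal(size=(20, 512)) + 1.0)
de_sozial = word_vector(rng.normal(size=(20, 512)) + 1.0)   # similar scenes
de_rot    = word_vector(rng.normal(size=(20, 512)) - 1.0)   # different scenes

for name, cand in [("sozial", de_sozial), ("rot", de_rot)]:
    print(name, float(en_social @ cand))   # cosine of unit vectors
# 'sozial' scores higher, i.e. is retrieved as the translation.
```

Because the signal comes entirely from shared visual context, no bilingual dictionary or image search engine is needed, which is the point the abstract makes.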
8. Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision
- Author
- Täckström, Oscar
- Abstract
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. Through careful empirical evaluation, it
- Published
- 2013
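The fifth contribution in the abstract above combines type-level constraints (which tags a word form can ever take, e.g. from a Wiktionary-style dictionary) with token-level constraints (noisy tags projected across a bitext). A small, hypothetical sketch of how the two constraint sets could be intersected into a pruned tag lattice; the data and back-off rule are invented for illustration, not taken from the dissertation.

```python
# Hypothetical token+type constraint pruning for POS tagging: a type
# dictionary bounds each word's possible tags, and a projected token-level
# tag narrows that set further when the dictionary licenses it. A tagger
# would then be trained over the resulting pruned lattice.
TYPE_DICT = {"can": {"AUX", "NOUN", "VERB"}, "run": {"VERB", "NOUN"}}
ALL_TAGS = {"AUX", "NOUN", "VERB", "ADJ", "ADV"}

def allowed_tags(token: str, projected: str | None) -> set[str]:
    """Intersect type- and token-level constraints; back off to the type
    set (or all tags) when no licensed projected tag is available."""
    type_set = TYPE_DICT.get(token, ALL_TAGS)
    if projected is not None and projected in type_set:
        return {projected}      # token-level evidence wins when licensed
    return type_set             # otherwise keep the ambiguous type set

# Projected tags are incomplete: 'run' got none through the alignment.
sentence = [("can", "AUX"), ("run", None), ("fast", "ADV")]
for token, proj in sentence:
    print(token, sorted(allowed_tags(token, proj)))
```

The learner then treats each token's surviving tag set as ambiguous supervision, which is exactly the setting the dissertation's latent variable models are built for.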