"Language Modeling" / Publication Type: Electronic Resources / Publisher: association for computational linguistics - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Language Modeling"' showing total 19 results

Start Over "Language Modeling" Publication Type Electronic Resources Publisher association for computational linguistics

19 results on '"Language Modeling"'

1. How Long is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling

Author: Cahyawijaya, Samuel, Wilie, Bryan, Lovenia, Holy, Zhong, Mingqian, Zhong, Huan, Ip, Nancy Yuk-yu, Fung, Pascale, Cahyawijaya, Samuel, Wilie, Bryan, Lovenia, Holy, Zhong, Mingqian, Zhong, Huan, Ip, Nancy Yuk-yu, and Fung, Pascale
Abstract: Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA. However, the applicability of these methods to real clinical use cases is hindered, due to the limitation of pre-trained LMs in processing long textual data with thousands of words, which is a common length for a clinical note. In this work, we explore long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical notes context. We conduct experiments on three n2c2 challenges datasets and a longitudinal clinical dataset from Hong Kong Hospital Authority electronic health record (EHR) system to show the effectiveness and generalizability of this concept, achieving ~10% F1-score improvement. Based on our experiments, we conclude that capturing a longer clinical note interval is beneficial to the model performance, but there are different cut-off intervals to achieve the optimal performance for different target variables.
Published: 2022

2. The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives

Author: Voita, Elena, Sennrich, Rico; https://orcid.org/0000-0002-1438-4741, Titov, Ivan, Voita, Elena, Sennrich, Rico; https://orcid.org/0000-0002-1438-4741, and Titov, Ivan
Abstract: We seek to understand how the representations of individual tokens and the structure of the learned feature space evolve between layers in deep neural networks under different learning objectives. We chose the Transformers for our analysis as they have been shown effective with various tasks, including machine translation (MT), standard left-to-right language models (LM) and masked language modeling (MLM). Previous work used black-box probing tasks to show that the representations learned by the Transformer differ significantly depending on the objective. In this work, we use canonical correlation analysis and mutual information estimators to study how information flows across Transformer layers and observe that the choice of the objective determines this process. For example, as you go from bottom to top layers, information about the past in left-to-right language models gets vanished and predictions about the future get formed. In contrast, for MLM, representations initially acquire information about the context around the token, partially forgetting the token identity and producing a more generalized token representation. The token identity then gets recreated at the top MLM layers.
Published: 2019

3. Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning

Author: Winata, Genta Indra ECE, Madotto, Andrea, Wu, Chien-sheng, Fung, Pascale Ngan, Winata, Genta Indra ECE, Madotto, Andrea, Wu, Chien-sheng, and Fung, Pascale Ngan
Abstract: Lack of text data has been the major issue on code-switching language modeling. In this paper, we introduce multitask learning based language model which shares syntax representation of languages to leverage linguistic information and tackle the low resource data issue. Our model jointly learns both language modeling and Part-of-Speech tagging on codeswitched utterances. In this way, the model is able to identify the location of code-switching points and improves the prediction of next word. Our approach outperforms standard LSTM based language model, with an improvement of 9.7% and 7.4% in perplexity on SEAME Phase I and Phase II dataset respectively.
Published: 2018

4. Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning

Author: Winata, Genta Indra ECE, Madotto, Andrea, Wu, Chien-sheng, Fung, Pascale Ngan, Winata, Genta Indra ECE, Madotto, Andrea, Wu, Chien-sheng, and Fung, Pascale Ngan
Abstract: Lack of text data has been the major issue on code-switching language modeling. In this paper, we introduce multitask learning based language model which shares syntax representation of languages to leverage linguistic information and tackle the low resource data issue. Our model jointly learns both language modeling and Part-of-Speech tagging on codeswitched utterances. In this way, the model is able to identify the location of code-switching points and improves the prediction of next word. Our approach outperforms standard LSTM based language model, with an improvement of 9.7% and 7.4% in perplexity on SEAME Phase I and Phase II dataset respectively.
Published: 2018

5. Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning

Author: Winata, Genta Indra, Madotto, Andrea, Wu, Chien-sheng, Fung, Pascale, Winata, Genta Indra, Madotto, Andrea, Wu, Chien-sheng, and Fung, Pascale
Abstract: Lack of text data has been the major issue on code-switching language modeling. In this paper, we introduce multitask learning based language model which shares syntax representation of languages to leverage linguistic information and tackle the low resource data issue. Our model jointly learns both language modeling and Part-of-Speech tagging on codeswitched utterances. In this way, the model is able to identify the location of code-switching points and improves the prediction of next word. Our approach outperforms standard LSTM based language model, with an improvement of 9.7% and 7.4% in perplexity on SEAME Phase I and Phase II dataset respectively.
Published: 2018

6. Fast, Small and Exact: Infinite-order Language Modeling with Compressed Suffix Trees

Author: Shareghi, E, Petri, M, Haffari, G, Cohn, T, Shareghi, E, Petri, M, Haffari, G, and Cohn, T
Abstract: Efficient methods for storing and querying are critical for scaling high-order m-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabilities on-the-fly. We present several optimisations which improve query runtimes up to 2500×, despite only incurring a modest increase in construction time and memory usage. For large corpora and high Markov orders, our method is highly competitive with the state-of-the-art KenLM package. It imposes much lower memory requirements, often by orders of magnitude, and has runtimes that are either similar (for training) or comparable (for querying).
Published: 2016

7. Structural Supervision Improves Learning of Non-Local Grammatical Dependencies

Author: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences, MIT-IBM Watson AI Lab, Wilcox, Ethan, Qian, Peng, Futrell, Richard, Ballesteros, Miguel, Levy, Roger P, Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences, MIT-IBM Watson AI Lab, Wilcox, Ethan, Qian, Peng, Futrell, Richard, Ballesteros, Miguel, and Levy, Roger P
Abstract: © 2019 Association for Computational Linguistics State-of-the-art LSTM language models trained on large corpora learn sequential contingencies in impressive detail and have been shown to acquire a number of non-local grammatical dependencies with some success. Here we investigate whether supervision with hierarchical structure enhances learning of a range of grammatical dependencies, a question that has previously been addressed only for subject-verb agreement. Using controlled experimental methods from psycholinguistics, we compare the performance of word-based LSTM models versus two models that represent hierarchical structure and deploy it in left-to-right processing: Recurrent Neural Network Grammars (RNNGs) (Dyer et al., 2016) and a incrementalized version of the Parsing-as-Language-Modeling configuration from Charniak et al. (2016). Models are tested on a diverse range of configurations for two classes of non-local grammatical dependencies in English-Negative Polarity licensing and Filler-Gap Dependencies. Using the same training data across models, we find that structurally-supervised models outperform the LSTM, with the RNNG demonstrating best results on both types of grammatical dependencies and even learning many of the Island Constraints on the filler-gap dependency. Structural supervision thus provides data efficiency advantages over purely string-based training of neural language models in acquiring human-like generalizations about non-local grammatical dependencies.
Published: 2022

8. You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas

Author: Li, Haoran, Song, Yangqiu, Fan, Lixin, Li, Haoran, Song, Yangqiu, and Fan, Lixin
Abstract: Social chatbots, also known as chit-chat chatbots, evolve rapidly with large pretrained language models. Despite the huge progress, privacy concerns have arisen recently: training data of large language models can be extracted via model inversion attacks. On the other hand, the datasets used for training chatbots contain many private conversations between two individuals. In this work, we further investigate the privacy leakage of the hidden states of chatbots trained by language modeling which has not been well studied yet. We show that speakers' personas can be inferred through a simple neural network with high accuracy. To this end, we propose effective defense objectives to protect persona leakage from hidden states. We conduct extensive experiments to demonstrate that our proposed defense objectives can greatly reduce the attack accuracy from 37.6% to 0.5%. Meanwhile, the proposed objectives preserve language models' powerful generation ability. © 2022 Association for Computational Linguistics.
Published: 2022

9. Controlled text generation using dictionary prior in variational autoencoders

Author: Fang, Xianghong, Li, Jian, Shang, Lifeng, Jiang, Xin, Liu, Qun, Yeung, Dit Yan, Fang, Xianghong, Li, Jian, Shang, Lifeng, Jiang, Xin, Liu, Qun, and Yeung, Dit Yan
Abstract: While variational autoencoders (VAEs) have been widely applied in text generation tasks, they are troubled by two challenges: insufficient representation capacity and poor controllability. The former results from the posterior collapse and restrictive assumption, which impede better representation learning. The latter arises as continuous latent variables in traditional formulations hinder VAEs from interpretability and controllability. In this paper, we propose Dictionary Prior (DPrior), a new data-driven prior that enjoys the merits of expressivity and controllability. To facilitate controlled text generation with DPrior, we propose to employ contrastive learning to separate the latent space into several parts. Extensive experiments on both language modeling and controlled text generation demonstrate the effectiveness of the proposed approach.
Published: 2022

10. Distributionally Robust Recurrent Decoders with Random Network Distillation

Author: Miceli Barone, Antonio Valerio, Birch, Alexandra, Sennrich, Rico, Miceli Barone, Antonio Valerio, Birch, Alexandra, and Sennrich, Rico
Abstract: Neural machine learning models can successfully model language that is similar to their training distribution, but they are highly susceptible to degradation under distribution shift, which occurs in many practical applications when processing out-of-domain (OOD) text. This has been attributed to “shortcut learning”":" relying on weak correlations over arbitrary large contexts. We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to automatically disregard OOD context during inference, smoothly transitioning towards a less expressive but more robust model as the data becomes more OOD, while retaining its full context capability when operating in-distribution. We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
Published: 2022

11. Structural Supervision Improves Learning of Non-Local Grammatical Dependencies

Abstract: © 2019 Association for Computational Linguistics State-of-the-art LSTM language models trained on large corpora learn sequential contingencies in impressive detail and have been shown to acquire a number of non-local grammatical dependencies with some success. Here we investigate whether supervision with hierarchical structure enhances learning of a range of grammatical dependencies, a question that has previously been addressed only for subject-verb agreement. Using controlled experimental methods from psycholinguistics, we compare the performance of word-based LSTM models versus two models that represent hierarchical structure and deploy it in left-to-right processing: Recurrent Neural Network Grammars (RNNGs) (Dyer et al., 2016) and a incrementalized version of the Parsing-as-Language-Modeling configuration from Charniak et al. (2016). Models are tested on a diverse range of configurations for two classes of non-local grammatical dependencies in English-Negative Polarity licensing and Filler-Gap Dependencies. Using the same training data across models, we find that structurally-supervised models outperform the LSTM, with the RNNG demonstrating best results on both types of grammatical dependencies and even learning many of the Island Constraints on the filler-gap dependency. Structural supervision thus provides data efficiency advantages over purely string-based training of neural language models in acquiring human-like generalizations about non-local grammatical dependencies.
Published: 2021

12. Meta-Transfer Learning for Code-Switched Speech Recognition

Author: Winata, Genta Indra, Cahyawijaya, Samuel, Lin, Zhaojiang, Liu, Zihan, Xu, Peng, Fung, Pascale Ngan, Winata, Genta Indra, Cahyawijaya, Samuel, Lin, Zhaojiang, Liu, Zihan, Xu, Peng, and Fung, Pascale Ngan
Abstract: An increasing number of people in the world today speak a mixed-language as a result of being multilingual. However, building a speech recognition system for code-switching remains difficult due to the availability of limited resources and the expense and significant effort required to collect mixed-language data. We therefore propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting by judiciously extracting information from high-resource monolingual datasets. Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data. Based on experimental results, our model outperforms existing baselines on speech recognition and language modeling tasks, and is faster to converge.
Published: 2020

13. Meta-Transfer Learning for Code-Switched Speech Recognition

Author: Winata, Genta Indra, Cahyawijaya, Samuel, Lin, Zhaojiang, Liu, Zihan, Xu, Peng, Fung, Pascale Ngan, Winata, Genta Indra, Cahyawijaya, Samuel, Lin, Zhaojiang, Liu, Zihan, Xu, Peng, and Fung, Pascale Ngan
Abstract: An increasing number of people in the world today speak a mixed-language as a result of being multilingual. However, building a speech recognition system for code-switching remains difficult due to the availability of limited resources and the expense and significant effort required to collect mixed-language data. We therefore propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting by judiciously extracting information from high-resource monolingual datasets. Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data. Based on experimental results, our model outperforms existing baselines on speech recognition and language modeling tasks, and is faster to converge.
Published: 2020

14. Meta-Transfer Learning for Code-Switched Speech Recognition

Author: Winata, Genta Indra, Cahyawijaya, Samuel, Lin, Zhaojiang, Liu, Zihan, Xu, Peng, Fung, Pascale, Winata, Genta Indra, Cahyawijaya, Samuel, Lin, Zhaojiang, Liu, Zihan, Xu, Peng, and Fung, Pascale
Abstract: An increasing number of people in the world today speak a mixed-language as a result of being multilingual. However, building a speech recognition system for code-switching remains difficult due to the availability of limited resources and the expense and significant effort required to collect mixed-language data. We therefore propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting by judiciously extracting information from high-resource monolingual datasets. Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data. Based on experimental results, our model outperforms existing baselines on speech recognition and language modeling tasks, and is faster to converge.
Published: 2020

15. Syntax-driven iterative expansion language models for controllable text generation

Author: Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Casas Manzanares, Noé, Rodríguez Fonollosa, José Adrián, Ruiz Costa-Jussà, Marta, Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla, Casas Manzanares, Noé, Rodríguez Fonollosa, José Adrián, and Ruiz Costa-Jussà, Marta
Abstract: The dominant language modeling paradigm handles text as a sequence of discrete tokens. While that approach can capture the latent structure of the text, it is inherently constrained to sequential dynamics for text generation. We propose a new paradigm for introducing a syntactic inductive bias into neural text generation, where the dependency parse tree is used to drive the Transformer model to generate sentences iteratively. Our experiments show that this paradigm is effective at text generation, with quality between LSTMs and Transformers, and comparable diversity, requiring less than half their decoding steps, and its generation process allows direct control over the syntactic constructions of the generated text, enabling the induction of stylistic variations., This work is partially supported by Lucy Software / United Language Group (ULG) and the Catalan Agency for Management of University and Research Grants (AGAUR) through an Industrial Ph.D. Grant. This work also is supported in part by the Spanish Ministerio de Economía y Competitividad, the European Regional Development Fund through the postdoctoral senior grant Ramón y Cajal and by the Agencia Estatal de Investigación through the projects EUR2019-103819, PCIN2017-079 and PID2019-107579RB-I00 / AEI / 10.13039/501100011033, Peer Reviewed, Postprint (published version)
Published: 2020

16. Deep syntax language models and statistical machine translation

Author: Graham, Yvette, van Genabith, Josef, Graham, Yvette, and van Genabith, Josef
Abstract: Hierarchical Models increase the reordering capabilities of MT systems by introducing non-terminal symbols to phrases that map source language (SL) words/phrases to the correct position in the target language (TL) translation. Building translations via discontiguous TL phrases increases the difficulty of language modeling, however, introducing the need for heuristic techniques such as cube pruning (Chiang, 2005), for example. An additional possibility to aid language modeling in hierarchical systems is to use a language model that models fluency of words not using their local context in the string, as in traditional language models, but instead using the deeper context of a word. In this paper, we explore the potential of deep syntax language models providing an interesting comparison with the traditional string-based language model. We include an experimental evaluation that compares the two kinds of models independently of any MT system to investigate the possible potential of integrating a deep syntax language model into Hierarchical SMT systems.
Published: 2010

17. Deep syntax language models and statistical machine translation

Author: Graham, Yvette, van Genabith, Josef, Graham, Yvette, and van Genabith, Josef
Abstract: Hierarchical Models increase the reordering capabilities of MT systems by introducing non-terminal symbols to phrases that map source language (SL) words/phrases to the correct position in the target language (TL) translation. Building translations via discontiguous TL phrases increases the difficulty of language modeling, however, introducing the need for heuristic techniques such as cube pruning (Chiang, 2005), for example. An additional possibility to aid language modeling in hierarchical systems is to use a language model that models fluency of words not using their local context in the string, as in traditional language models, but instead using the deeper context of a word. In this paper, we explore the potential of deep syntax language models providing an interesting comparison with the traditional string-based language model. We include an experimental evaluation that compares the two kinds of models independently of any MT system to investigate the possible potential of integrating a deep syntax language model into Hierarchical SMT systems.
Published: 2010

18. Deep syntax language models and statistical machine translation

Author: Graham, Yvette, van Genabith, Josef, Graham, Yvette, and van Genabith, Josef
Abstract: Hierarchical Models increase the reordering capabilities of MT systems by introducing non-terminal symbols to phrases that map source language (SL) words/phrases to the correct position in the target language (TL) translation. Building translations via discontiguous TL phrases increases the difficulty of language modeling, however, introducing the need for heuristic techniques such as cube pruning (Chiang, 2005), for example. An additional possibility to aid language modeling in hierarchical systems is to use a language model that models fluency of words not using their local context in the string, as in traditional language models, but instead using the deeper context of a word. In this paper, we explore the potential of deep syntax language models providing an interesting comparison with the traditional string-based language model. We include an experimental evaluation that compares the two kinds of models independently of any MT system to investigate the possible potential of integrating a deep syntax language model into Hierarchical SMT systems.
Published: 2010

19. Deep syntax language models and statistical machine translation

Author: Graham, Yvette, van Genabith, Josef, Graham, Yvette, and van Genabith, Josef
Abstract: Hierarchical Models increase the reordering capabilities of MT systems by introducing non-terminal symbols to phrases that map source language (SL) words/phrases to the correct position in the target language (TL) translation. Building translations via discontiguous TL phrases increases the difficulty of language modeling, however, introducing the need for heuristic techniques such as cube pruning (Chiang, 2005), for example. An additional possibility to aid language modeling in hierarchical systems is to use a language model that models fluency of words not using their local context in the string, as in traditional language models, but instead using the deeper context of a word. In this paper, we explore the potential of deep syntax language models providing an interesting comparison with the traditional string-based language model. We include an experimental evaluation that compares the two kinds of models independently of any MT system to investigate the possible potential of integrating a deep syntax language model into Hierarchical SMT systems.
Published: 2010

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

19 results on '"Language Modeling"'

1. How Long is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling

2. The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives

3. Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning

4. Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning

5. Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning

6. Fast, Small and Exact: Infinite-order Language Modeling with Compressed Suffix Trees

7. Structural Supervision Improves Learning of Non-Local Grammatical Dependencies

8. You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas

9. Controlled text generation using dictionary prior in variational autoencoders

10. Distributionally Robust Recurrent Decoders with Random Network Distillation

11. Structural Supervision Improves Learning of Non-Local Grammatical Dependencies

12. Meta-Transfer Learning for Code-Switched Speech Recognition

13. Meta-Transfer Learning for Code-Switched Speech Recognition

14. Meta-Transfer Learning for Code-Switched Speech Recognition

15. Syntax-driven iterative expansion language models for controllable text generation

16. Deep syntax language models and statistical machine translation

17. Deep syntax language models and statistical machine translation

18. Deep syntax language models and statistical machine translation

19. Deep syntax language models and statistical machine translation

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Publication Year Range

Publication Type

Database

19 results on '"Language Modeling"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources