Author: "Collier, Nigel" / Topic: computer science - machine learning - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Collier, Nigel"' showing total 19 results

Start Over Author "Collier, Nigel" Topic computer science - machine learning

19 results on '"Collier, Nigel"'

1. Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments

Author: Zhou, Han, Wan, Xingchen, Liu, Yinhong, Collier, Nigel, Vulić, Ivan, and Korhonen, Anna
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: Large language models (LLMs) have shown promising abilities as cost-effective and reference-free evaluators for assessing language generation quality. In particular, pairwise LLM evaluators, which compare two generated texts and determine the preferred one, have been employed in a wide range of applications. However, LLMs exhibit preference biases and worrying sensitivity to prompt designs. In this work, we first reveal that the predictive preference of LLMs can be highly brittle and skewed, even with semantically equivalent instructions. We find that fairer predictive preferences from LLMs consistently lead to judgments that are better aligned with humans. Motivated by this phenomenon, we propose an automatic Zero-shot Evaluation-oriented Prompt Optimization framework, ZEPO, which aims to produce fairer preference decisions and improve the alignment of LLM evaluators with human judgments. To this end, we propose a zero-shot learning objective based on the preference decision fairness. ZEPO demonstrates substantial performance improvements over state-of-the-art LLM evaluators, without requiring labeled data, on representative meta-evaluation benchmarks. Our findings underscore the critical correlation between preference fairness and human alignment, positioning ZEPO as an efficient prompt optimizer for bridging the gap between LLM evaluators and human judgments., Comment: EMNLP 2024
Published: 2024

2. TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

Author: Li, Chengzu, Zhang, Caiqi, Zhou, Han, Collier, Nigel, Korhonen, Anna, and Vulić, Ivan
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Top-view perspective denotes a typical way in which humans read and reason over different types of maps, and it is vital for localization and navigation of humans as well as of `non-human' agents, such as the ones backed by large Vision-Language Models (VLMs). Nonetheless, spatial reasoning capabilities of modern VLMs remain unattested and underexplored. In this work, we thus study their capability to understand and reason over spatial relations from the top view. The focus on top view also enables controlled evaluations at different granularity of spatial reasoning; we clearly disentangle different abilities (e.g., recognizing particular objects versus understanding their relative positions). We introduce the TopViewRS (Top-View Reasoning in Space) dataset, consisting of 11,384 multiple-choice questions with either realistic or semantic top-view map as visual input. We then use it to study and evaluate VLMs across 4 perception and reasoning tasks with different levels of complexity. Evaluation of 10 representative open- and closed-source VLMs reveals the gap of more than 50% compared to average human performance, and it is even lower than the random baseline in some cases. Although additional experiments show that Chain-of-Thought reasoning can boost model capabilities by 5.82% on average, the overall performance of VLMs remains limited. Our findings underscore the critical need for enhanced model capability in top-view spatial reasoning and set a foundation for further research towards human-level proficiency of VLMs in real-world multimodal tasks., Comment: 9 pages, 3 figures, 3 tables (21 pages, 4 figures, 15 tables including references and appendices)
Published: 2024

3. Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators

Author: Liu, Yinhong, Zhou, Han, Guo, Zhijiang, Shareghi, Ehsan, Vulić, Ivan, Korhonen, Anna, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) have demonstrated promising capabilities as automatic evaluators in assessing the quality of generated natural language. However, LLMs still exhibit biases in evaluation and often struggle to generate coherent evaluations that align with human assessments. In this work, we first conduct a systematic study of the misalignment between LLM evaluators and human judgement, revealing that existing calibration methods aimed at mitigating biases are insufficient for effectively aligning LLM evaluators. Inspired by the use of preference data in RLHF, we formulate the evaluation as a ranking problem and introduce Pairwise-preference Search (PairS), an uncertainty-guided search method that employs LLMs to conduct pairwise comparisons and efficiently ranks candidate texts. PairS achieves state-of-the-art performance on representative evaluation tasks and demonstrates significant improvements over direct scoring. Furthermore, we provide insights into the role of pairwise preference in quantifying the transitivity of LLMs and demonstrate how PairS benefits from calibration., Comment: This paper has been accepted by COLM 2024
Published: 2024

4. FireAct: Toward Language Agent Fine-tuning

Author: Chen, Baian, Shu, Chang, Shareghi, Ehsan, Collier, Nigel, Narasimhan, Karthik, and Yao, Shunyu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent efforts have augmented language models (LMs) with external tools or environments, leading to the development of language agents that can reason and act. However, most of these agents rely on few-shot prompting techniques with off-the-shelf LMs. In this paper, we investigate and argue for the overlooked direction of fine-tuning LMs to obtain language agents. Using a setup of question answering (QA) with a Google search API, we explore a variety of base LMs, prompting methods, fine-tuning data, and QA tasks, and find language agents are consistently improved after fine-tuning their backbone LMs. For example, fine-tuning Llama2-7B with 500 agent trajectories generated by GPT-4 leads to a 77% HotpotQA performance increase. Furthermore, we propose FireAct, a novel approach to fine-tuning LMs with trajectories from multiple tasks and prompting methods, and show having more diverse fine-tuning data can further improve agents. Along with other findings regarding scaling effects, robustness, generalization, efficiency and cost, our work establishes comprehensive benefits of fine-tuning LMs for agents, and provides an initial set of experimental designs, insights, as well as open questions toward language agent fine-tuning., Comment: Code, data, and models are available at https://fireact-agent.github.io
Published: 2023

5. Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Author: Fu, Zihao, Lam, Wai, Yu, Qian, So, Anthony Man-Cho, Hu, Shengding, Liu, Zhiyuan, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most of the seq2seq task is resolved by the Encoder-Decoder framework which requires an encoder to encode the source sequence and a decoder to generate the target text. Recently, a bunch of new approaches have emerged that apply decoder-only language models directly to the seq2seq task. Despite the significant advancements in applying language models to the seq2seq task, there is still a lack of thorough analysis on the effectiveness of the decoder-only language model architecture. This paper aims to address this gap by conducting a detailed comparison between the encoder-decoder architecture and the decoder-only language model framework through the analysis of a regularized encoder-decoder structure. This structure is designed to replicate all behaviors in the classical decoder-only language model but has an encoder and a decoder making it easier to be compared with the classical encoder-decoder structure. Based on the analysis, we unveil the attention degeneration problem in the language model, namely, as the generation step number grows, less and less attention is focused on the source sequence. To give a quantitative understanding of this problem, we conduct a theoretical sensitivity analysis of the attention output with respect to the source input. Grounded on our analysis, we propose a novel partial attention language model to solve the attention degeneration problem. Experimental results on machine translation, summarization, and data-to-text generation tasks support our analysis and demonstrate the effectiveness of our proposed model.
Published: 2023

6. A Stability Analysis of Fine-Tuning a Pre-Trained Model

Author: Fu, Zihao, So, Anthony Man-Cho, and Collier, Nigel
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Fine-tuning a pre-trained model (such as BERT, ALBERT, RoBERTa, T5, GPT, etc.) has proven to be one of the most promising paradigms in recent NLP research. However, numerous recent works indicate that fine-tuning suffers from the instability problem, i.e., tuning the same model under the same setting results in significantly different performance. Many recent works have proposed different methods to solve this problem, but there is no theoretical understanding of why and how these methods work. In this paper, we propose a novel theoretical stability analysis of fine-tuning that focuses on two commonly used settings, namely, full fine-tuning and head tuning. We define the stability under each setting and prove the corresponding stability bounds. The theoretical bounds explain why and how several existing methods can stabilize the fine-tuning procedure. In addition to being able to explain most of the observed empirical discoveries, our proposed theoretical analysis framework can also help in the design of effective and provable methods. Based on our theory, we propose three novel strategies to stabilize the fine-tuning procedure, namely, Maximal Margin Regularizer (MMR), Multi-Head Loss (MHLoss), and Self Unsupervised Re-Training (SURT). We extensively evaluate our proposed approaches on 11 widely used real-world benchmark datasets, as well as hundreds of synthetic classification datasets. The experiment results show that our proposed methods significantly stabilize the fine-tuning procedure and also corroborate our theoretical analysis.
Published: 2023

7. On the Effectiveness of Parameter-Efficient Fine-Tuning

Author: Fu, Zihao, Yang, Haoran, So, Anthony Man-Cho, Lam, Wai, Bing, Lidong, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter inefficient as it always yields an entirely new model for each task. Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their corresponding fully fine-tuned counterparts. However, such kind of methods is still not well understood. Some natural questions arise: How does the parameter sparsity lead to promising performance? Why is the model more stable than the fully fine-tuned models? How to choose the tunable parameters? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and projection-based approaches based on how they choose which parameters to tune. Then, we show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them. We indicate that the sparsity is actually imposing a regularization on the original model by controlling the upper bound of the stability. Such stability leads to better generalization capability which has been empirically observed in a lot of recent research works. Despite the effectiveness of sparsity grounded by our theory, it still remains an open problem of how to choose the tunable parameters. To better choose the tunable parameters, we propose a novel Second-order Approximation Method (SAM) which approximates the original problem with an analytically solvable optimization function. The tunable parameters are determined by directly optimizing the approximation function. The experimental results show that our proposed SAM model outperforms many strong baseline models and it also verifies our theoretical analysis.
Published: 2022

8. How to tackle an emerging topic? Combining strong and weak labels for Covid news NER

Author: Ficek, Aleksander, Liu, Fangyu, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Being able to train Named Entity Recognition (NER) models for emerging topics is crucial for many real-world applications especially in the medical domain where new topics are continuously evolving out of the scope of existing models and datasets. For a realistic evaluation setup, we introduce a novel COVID-19 news NER dataset (COVIDNEWS-NER) and release 3000 entries of hand annotated strongly labelled sentences and 13000 auto-generated weakly labelled sentences. Besides the dataset, we propose CONTROSTER, a recipe to strategically combine weak and strong labels in improving NER in an emerging topic through transfer learning. We show the effectiveness of CONTROSTER on COVIDNEWS-NER while providing analysis on combining weak and strong labels for training. Our key findings are: (1) Using weak data to formulate an initial backbone before tuning on strong data outperforms methods trained on only strong or weak data. (2) A combination of out-of-domain and in-domain weak label training is crucial and can overcome saturation when being training on weak labels from a single source., Comment: AACL-IJCNLP 2022
Published: 2022

9. On Reality and the Limits of Language Data: Aligning LLMs with Human Norms

Author: Collier, Nigel H., Liu, Fangyu, and Shareghi, Ehsan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent advancements in Large Language Models (LLMs) harness linguistic associations in vast natural language data for practical applications. However, their ability to understand the physical world using only language data remains a question. After reviewing existing protocols, we explore this question using a novel and tightly controlled reasoning test (ART) and compare human norms against versions of GPT-3. Our findings highlight the categories of common-sense relations models that could learn directly from data and areas of weakness. GPT-3 offers evidence for verbal reasoning on a par with human subjects for several relations including Synonymy, Antonymy, and Default inheritance, Without reinforcement learning from human judgements, it appears GPT-3 performs at the lower end of the reference interval for Has-part and Contained-in. Weaknesses were observed also in affordance characteristics through Necessary-quality, Order-of-size and Order-of-intensity. Combining LLMs with symbolic world grounding is a promising direction to address associative learning., Comment: 9 pages; data available, see https://sites.google.com/site/nhcollier/projects/art
Published: 2022

10. Improving Word Translation via Two-Stage Contrastive Learning

Author: Li, Yaoyiran, Liu, Fangyu, Collier, Nigel, Korhonen, Anna, and Vulić, Ivan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Word translation or bilingual lexicon induction (BLI) is a key cross-lingual task, aiming to bridge the lexical gap between different languages. In this work, we propose a robust and effective two-stage contrastive learning framework for the BLI task. At Stage C1, we propose to refine standard cross-lingual linear maps between static word embeddings (WEs) via a contrastive learning objective; we also show how to integrate it into the self-learning procedure for even more refined cross-lingual maps. In Stage C2, we conduct BLI-oriented contrastive fine-tuning of mBERT, unlocking its word translation capability. We also show that static WEs induced from the `C2-tuned' mBERT complement static WEs from Stage C1. Comprehensive experiments on standard BLI datasets for diverse languages and different experimental setups demonstrate substantial gains achieved by our framework. While the BLI method from Stage C1 already yields substantial gains over all state-of-the-art BLI methods in our comparison, even stronger improvements are met with the full two-stage framework: e.g., we report gains for 112/112 BLI setups, spanning 28 language pairs., Comment: ACL 2022 Main
Published: 2022
Full Text: View/download PDF

11. Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking

Author: Liu, Fangyu, Vulić, Ivan, Korhonen, Anna, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Injecting external domain-specific knowledge (e.g., UMLS) into pretrained language models (LMs) advances their capability to handle specialised in-domain tasks such as biomedical entity linking (BEL). However, such abundant expert knowledge is available only for a handful of languages (e.g., English). In this work, by proposing a novel cross-lingual biomedical entity linking task (XL-BEL) and establishing a new XL-BEL benchmark spanning 10 typologically diverse languages, we first investigate the ability of standard knowledge-agnostic as well as knowledge-enhanced monolingual and multilingual LMs beyond the standard monolingual English BEL task. The scores indicate large gaps to English performance. We then address the challenge of transferring domain-specific knowledge in resource-rich languages to resource-poor ones. To this end, we propose and evaluate a series of cross-lingual transfer methods for the XL-BEL task, and demonstrate that general-domain bitext helps propagate the available English knowledge to languages with little to no in-domain data. Remarkably, we show that our proposed domain-specific transfer methods yield consistent gains across all target languages, sometimes up to 20 Precision@1 points, without any in-domain knowledge in the target language, and without any in-domain parallel data., Comment: ACL-IJCNLP 2021
Published: 2021

12. Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders

Author: Liu, Fangyu, Vulić, Ivan, Korhonen, Anna, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years. However, previous work has indicated that off-the-shelf MLMs are not effective as universal lexical or sentence encoders without further task-specific fine-tuning on NLI, sentence similarity, or paraphrasing tasks using annotated task data. In this work, we demonstrate that it is possible to turn MLMs into effective universal lexical and sentence encoders even without any additional data and without any supervision. We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT, which converts MLMs (e.g., BERT and RoBERTa) into such encoders in 20-30 seconds without any additional external knowledge. Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples, and aims to maximise their similarity during identity fine-tuning. We report huge gains over off-the-shelf MLMs with Mirror-BERT in both lexical-level and sentence-level tasks, across different domains and different languages. Notably, in the standard sentence semantic similarity (STS) tasks, our self-supervised Mirror-BERT model even matches the performance of the task-tuned Sentence-BERT models from prior work. Finally, we delve deeper into the inner workings of MLMs, and suggest some evidence on why this simple approach can yield effective universal lexical and sentence encoders., Comment: EMNLP 2021 camera-ready version
Published: 2021

13. Self-Alignment Pretraining for Biomedical Entity Representations

Author: Liu, Fangyu, Shareghi, Ehsan, Meng, Zaiqiao, Basaldella, Marco, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Despite the widespread success of self-supervised learning via masked language models (MLM), accurately capturing fine-grained semantic relationships in the biomedical domain remains a challenge. This is of paramount importance for entity-level tasks such as entity linking where the ability to model entity relations (especially synonymy) is pivotal. To address this challenge, we propose SapBERT, a pretraining scheme that self-aligns the representation space of biomedical entities. We design a scalable metric learning framework that can leverage UMLS, a massive collection of biomedical ontologies with 4M+ concepts. In contrast with previous pipeline-based hybrid systems, SapBERT offers an elegant one-model-for-all solution to the problem of medical entity linking (MEL), achieving a new state-of-the-art (SOTA) on six MEL benchmarking datasets. In the scientific domain, we achieve SOTA even without task-specific supervision. With substantial improvement over various domain-specific pretrained MLMs such as BioBERT, SciBERTand and PubMedBERT, our pretraining scheme proves to be both effective and robust., Comment: NAACL 2021 camera-ready version
Published: 2020

14. Learning Sparse Sentence Encoding without Supervision: An Exploration of Sparsity in Variational Autoencoders

Author: Prokhorov, Victor, Li, Yingzhen, Shareghi, Ehsan, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: It has been long known that sparsity is an effective inductive bias for learning efficient representation of data in vectors with fixed dimensionality, and it has been explored in many areas of representation learning. Of particular interest to this work is the investigation of the sparsity within the VAE framework which has been explored a lot in the image domain, but has been lacking even a basic level of exploration in NLP. Additionally, NLP is also lagging behind in terms of learning sparse representations of large units of text e.g., sentences. We use the VAEs that induce sparse latent representations of large units of text to address the aforementioned shortcomings. First, we move in this direction by measuring the success of unsupervised state-of-the-art (SOTA) and other strong VAE-based sparsification baselines for text and propose a hierarchical sparse VAE model to address the stability issue of SOTA. Then, we look at the implications of sparsity on text classification across 3 datasets, and highlight a link between performance of sparse latent representations on downstream tasks and its ability to encode task-related information.
Published: 2020

15. On the Importance of the Kullback-Leibler Divergence Term in Variational Autoencoders for Text Generation

Author: Prokhorov, Victor, Shareghi, Ehsan, Li, Yingzhen, Pilehvar, Mohammad Taher, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Variational Autoencoders (VAEs) are known to suffer from learning uninformative latent representation of the input due to issues such as approximated posterior collapse, or entanglement of the latent space. We impose an explicit constraint on the Kullback-Leibler (KL) divergence term inside the VAE objective function. While the explicit constraint naturally avoids posterior collapse, we use it to further understand the significance of the KL term in controlling the information transmitted through the VAE channel. Within this framework, we explore different properties of the estimated posterior distribution, and highlight the trade-off between the amount of information encoded in a latent code during training, and the generative capacity of the model., Comment: 10 pages; Accepted in 3rd Workshop on Neural Generation and Translation (WNGT 2019)
Published: 2019

16. Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces

Author: Prokhorov, Victor, Pilehvar, Mohammad Taher, Kartsaklis, Dimitri, Lio, Pietro, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Word embedding techniques heavily rely on the abundance of training data for individual words. Given the Zipfian distribution of words in natural language texts, a large number of words do not usually appear frequently or at all in the training data. In this paper we put forward a technique that exploits the knowledge encoded in lexical resources, such as WordNet, to induce embeddings for unseen words. Our approach adapts graph embedding and cross-lingual vector space transformation techniques in order to merge lexical knowledge encoded in ontologies with that derived from corpus statistics. We show that the approach can provide consistent performance improvements across multiple evaluation benchmarks: in-vitro, on multiple rare word similarity datasets, and in-vivo, in two downstream text classification tasks., Comment: Accepted for presentation at AAAI 2019
Published: 2018

17. Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs

Author: Kartsaklis, Dimitri, Pilehvar, Mohammad Taher, and Collier, Nigel
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: This paper addresses the problem of mapping natural language text to knowledge base entities. The mapping process is approached as a composition of a phrase or a sentence into a point in a multi-dimensional entity space obtained from a knowledge graph. The compositional model is an LSTM equipped with a dynamic disambiguation mechanism on the input word embeddings (a Multi-Sense LSTM), addressing polysemy issues. Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and knowledge base entities. The ideas of this work are demonstrated on large-scale text-to-entity mapping and entity classification tasks, with state of the art results., Comment: Accepted for presentation at EMNLP 2018 (main conference)
Published: 2018

18. On Reality and the Limits of Language Data: Aligning LLMs with Human Norms

Author: Collier, Nigel H., Liu, Fangyu, and Shareghi, Ehsan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Neural Networks, Computer Science - Artificial Intelligence, Artificial Intelligence, Computer Science, Machine learning, Linguistics, Computation and Language (cs.CL), Machine Learning (cs.LG), Natural Language Processing
Abstract: Recent advancements in Large Language Models (LLMs) harness linguistic associations in vast natural language data for practical applications. However, their ability to understand the physical world using only language data remains a question. After reviewing existing protocols, we explore this question using a novel and tightly controlled reasoning test (ART) and compare human norms against versions of GPT-3. Our findings highlight the categories of common-sense relations models that could learn directly from data and areas of weakness. GPT-3 offers evidence for verbal reasoning on a par with human subjects for several relations including Synonymy, Antonymy, and Default inheritance, Without reinforcement learning from human judgements, it appears GPT-3 performs at the lower end of the reference interval for Has-part and Contained-in. Weaknesses were observed also in affordance characteristics through Necessary-quality, Order-of-size and Order-of-intensity. Combining LLMs with symbolic world grounding is a promising direction to address associative learning., Comment: 9 pages; data available, see https://sites.google.com/site/nhcollier/projects/art
Published: 2023

19. Unseen word representation by aligning heterogeneous lexical semantic spaces

Author: Mohammad Taher Pilehvar, Nigel Collier, Pietro Liò, Victor Prokhorov, Dimitri Kartsaklis, Prokhorov, Victor [0000-0002-5843-8756], Lio, Pietro [0000-0002-0540-5053], Collier, Nigel [0000-0002-7230-4164], and Apollo - University of Cambridge Repository
Subjects: FOS: Computer and information sciences, 050101 languages & linguistics, Computer Science - Machine Learning, Word embedding, Graph embedding, Computer science, Computer Science - Artificial Intelligence, cs.LG, WordNet, 02 engineering and technology, computer.software_genre, Machine Learning (cs.LG), Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, Computer Science - Computation and Language, business.industry, 05 social sciences, cs.CL, General Medicine, cs.AI, Artificial Intelligence (cs.AI), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Computation and Language (cs.CL), Natural language processing, Word (computer architecture), Natural language, Merge (linguistics)
Abstract: Word embedding techniques heavily rely on the abundance of training data for individual words. Given the Zipfian distribution of words in natural language texts, a large number of words do not usually appear frequently or at all in the training data. In this paper we put forward a technique that exploits the knowledge encoded in lexical resources, such as WordNet, to induce embeddings for unseen words. Our approach adapts graph embedding and cross-lingual vector space transformation techniques in order to merge lexical knowledge encoded in ontologies with that derived from corpus statistics. We show that the approach can provide consistent performance improvements across multiple evaluation benchmarks: in-vitro, on multiple rare word similarity datasets, and in- vivo, in two downstream text classification tasks., MRC
Published: 2019
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

19 results on '"Collier, Nigel"'

1. Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments

2. TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

3. Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators

4. FireAct: Toward Language Agent Fine-tuning

5. Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

6. A Stability Analysis of Fine-Tuning a Pre-Trained Model

7. On the Effectiveness of Parameter-Efficient Fine-Tuning

8. How to tackle an emerging topic? Combining strong and weak labels for Covid news NER

9. On Reality and the Limits of Language Data: Aligning LLMs with Human Norms

10. Improving Word Translation via Two-Stage Contrastive Learning

11. Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking

12. Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders

13. Self-Alignment Pretraining for Biomedical Entity Representations

14. Learning Sparse Sentence Encoding without Supervision: An Exploration of Sparsity in Variational Autoencoders

15. On the Importance of the Kullback-Leibler Divergence Term in Variational Autoencoders for Text Generation

16. Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces

17. Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs

18. On Reality and the Limits of Language Data: Aligning LLMs with Human Norms

19. Unseen word representation by aligning heterogeneous lexical semantic spaces

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

19 results on '"Collier, Nigel"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources