Author: "Iacobacci, Ignacio" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Iacobacci, Ignacio"' showing total 47 results

1. Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency

Author: Gee, Leonidas, Gritta, Milan, Lampouras, Gerasimos, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: Code Language Models have been trained to generate accurate solutions, typically with no regard for runtime. On the other hand, previous works that explored execution optimisation have observed corresponding drops in functional correctness. To that end, we introduce Code-Optimise, a framework that incorporates both correctness (passed, failed) and runtime (quick, slow) as learning signals via self-generated preference data. Our framework is both lightweight and robust as it dynamically selects solutions to reduce overfitting while avoiding a reliance on larger models for learning signals. Code-Optimise achieves significant improvements in pass@k while decreasing the competitive baseline runtimes by an additional 6% for in-domain data and up to 3% for out-of-domain data. As a byproduct, the average length of the generated solutions is reduced by up to 48% on MBPP and 23% on HumanEval, resulting in faster and cheaper inference. The generated data and codebase will be open-sourced at www.open-source.link., Comment: Under review at ARR (for EMNLP 2024)
Published: 2024

2. HumanRankEval: Automatic Evaluation of LMs as Conversational Assistants

Author: Gritta, Milan, Lampouras, Gerasimos, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: Language models (LMs) as conversational assistants recently became popular tools that help people accomplish a variety of tasks. These typically result from adapting LMs pretrained on general domain text sequences through further instruction-tuning and possibly preference optimisation methods. The evaluation of such LMs would ideally be performed using human judgement; however, this is not scalable. On the other hand, automatic evaluation featuring auxiliary LMs as judges and/or knowledge-based tasks is scalable but struggles with assessing conversational ability and adherence to instructions. To help accelerate the development of LMs as conversational assistants, we propose a novel automatic evaluation task: HumanRankEval (HRE). It consists of a large-scale, diverse and high-quality set of questions, each with several answers authored and scored by humans. To perform evaluation, HRE ranks these answers based on their log-likelihood under the LM's distribution, and subsequently calculates their correlation with the corresponding human rankings. We support HRE's efficacy by investigating how efficiently it separates pretrained and instruction-tuned LMs of various sizes. We show that HRE correlates well with human judgements and is particularly responsive to model changes following instruction-tuning., Comment: Accepted to NAACL 2024 main conference
Published: 2024
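
The ranking-and-correlation step described in this abstract can be illustrated with a minimal sketch. It assumes a Spearman-style rank correlation; the abstract does not say which coefficient HRE uses. The function names and the toy numbers are hypothetical, and the log-likelihoods are taken as already computed by some LM.

```python
from math import sqrt


def average_ranks(values):
    """Rank values in ascending order (1-based); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_correlation(log_likelihoods, human_scores):
    """Assumed metric: Spearman correlation between LM and human rankings."""
    return pearson(average_ranks(log_likelihoods), average_ranks(human_scores))


# Toy data for one question with three answers (hypothetical values).
lm_scores = [-12.0, -30.5, -18.2]   # answer log-likelihoods under the LM
human_votes = [5, 1, 3]             # human-assigned scores for the same answers
agreement = rank_correlation(lm_scores, human_votes)
```

A value near 1 means the LM orders the answers the way the human raters did; a value near -1 means it orders them in reverse.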

3. MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

Author: Tudosiu, Petru-Daniel, Yang, Yongxin, Zhang, Shifeng, Chen, Fei, McDonagh, Steven, Lampouras, Gerasimos, Iacobacci, Ignacio, and Parisot, Sarah
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-to-image generation has achieved astonishing results, yet precise spatial controllability and prompt fidelity remain highly challenging. This limitation is typically addressed through cumbersome prompt engineering, scene layout conditioning, or image editing techniques which often require hand-drawn masks. Nonetheless, pre-existing works struggle to take advantage of the natural instance-level compositionality of scenes due to the typically flat nature of rasterized RGB output images. Towards addressing this challenge, we introduce MuLAn: a novel dataset comprising over 44K MUlti-Layer ANnotations of RGB images as multilayer, instance-wise RGBA decompositions, and over 100K instance images. To build MuLAn, we developed a training-free pipeline which decomposes a monocular RGB image into a stack of RGBA layers comprising background and isolated instances. We achieve this through the use of pretrained general-purpose models, and by developing three modules: image decomposition for instance discovery and extraction, instance completion to reconstruct occluded areas, and image re-assembly. We use our pipeline to create MuLAn-COCO and MuLAn-LAION datasets, which contain a variety of image decompositions in terms of style, composition and complexity. With MuLAn, we provide the first photorealistic resource providing instance decomposition and occlusion information for high quality images, opening up new avenues for text-to-image generative AI research. With this, we aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions. MuLAn data resources are available at https://MuLAn-dataset.github.io/., Comment: CVPR 2024 - Project page: https://MuLAn-dataset.github.io/
Published: 2024

4. Findings of the First Workshop on Simulating Conversational Intelligence in Chat

Author: Graham, Yvette, Qureshi, Mohammed Rameez, Khalid, Haider, Lampouras, Gerasimos, Iacobacci, Ignacio, and Liu, Qun
Subjects: Computer Science - Computation and Language
Abstract: The aim of this workshop is to bring together experts working on open-domain dialogue research. In this speedily advancing research area many challenges still exist, such as learning information from conversations, engaging in realistic and convincing simulation of human intelligence and reasoning. SCI-CHAT follows previous workshops on open domain dialogue but with a focus on the simulation of intelligent conversation as judged in a live human evaluation. Models aim to include the ability to follow a challenging topic over a multi-turn conversation, while positing, refuting and reasoning over arguments. The workshop included both a research track and shared task. The main goal of this paper is to provide an overview of the shared task and a link to an additional paper that will include an in depth analysis of the shared task results following presentation at the workshop.
Published: 2024

5. Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis

Author: Gorinski, Philip John, Zimmer, Matthieu, Lampouras, Gerasimos, Deik, Derrick Goh Xin, and Iacobacci, Ignacio
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Programming Languages
Abstract: The advent of large pre-trained language models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, the property of programming language code being precisely evaluable with respect to its semantics -- through the use of Unit Tests to check its functional correctness -- lends itself to using Reinforcement Learning (RL) as a further training paradigm. Previous work has shown that RL can be applied as such to improve models' coding capabilities; however, such RL-based methods rely on a reward signal based on defined Unit Tests, which are much harder to obtain compared to the huge crawled code datasets used in LM objectives. In this work, we present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests, suitable for RL training of Code Synthesis models. We also introduce a straightforward, simple yet effective Actor-Critic RL training scheme and show that it, in conjunction with automatically generated training data, improves a pre-trained code language model's performance by up to 9.9% over the original underlying code synthesis LM, and by up to 4.3% over RL-based models trained with standard PPO or CodeRL., Comment: 9 pages + 4 pages appendix; 4 Figures, 4 Tables, 1 Algorithm; Accepted to Findings of EMNLP 2023
Published: 2023

6. A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems

Author: Hu, Songbo, Zhou, Han, Yuan, Moy, Gritta, Milan, Zhang, Guchun, Iacobacci, Ignacio, Korhonen, Anna, and Vulić, Ivan
Subjects: Computer Science - Computation and Language
Abstract: Achieving robust language technologies that can perform well across the world's many languages is a central goal of multilingual NLP. In this work, we take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue (ToD) systems. We first define new quantitative measures of absolute and relative equivalence in system performance, capturing disparities across languages and within individual languages. Through a series of controlled experiments, we demonstrate that performance disparities depend on a number of factors: the nature of the ToD task at hand, the underlying pretrained language model, the target language, and the amount of ToD annotated data. We empirically prove the existence of the adaptation and intrinsic biases in current ToD systems: e.g., ToD systems trained for Arabic or Turkish using annotated ToD data fully parallel to English ToD data still exhibit diminished ToD task performance. Beyond providing a series of insights into the performance disparities of ToD systems in different languages, our analyses offer practical tips on how to approach ToD data collection and system development for new languages., Comment: Accepted to EMNLP 2023
Published: 2023

7. Correct and Optimal: the Regular Expression Inference Challenge

Author: Valizadeh, Mojtaba, Gorinski, Philip John, Iacobacci, Ignacio, and Berger, Martin
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Formal Languages and Automata Theory
Abstract: We propose regular expression inference (REI) as a challenge for code/language modelling, and the wider machine learning community. REI is a supervised machine learning (ML) and program optimisation task, and poses the problem of finding minimal regular expressions from examples: Given two finite sets of strings $P$ and $N$ and a cost function $cost(\cdot)$, the task is to generate an expression $r$ that accepts all strings in $P$ and rejects all strings in $N$, while no other such expression $r'$ exists with $cost(r') < cost(r)$.
Published: 2023
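
The acceptance and cost conditions in this abstract can be made concrete with a small sketch. It rests on several assumptions: Python's `re` syntax stands in for whatever regular expression language the challenge defines, the expression's string length stands in for $cost(\cdot)$, and the helper names and example sets are hypothetical. It only picks the cheapest expression from a given candidate list, so it does not establish minimality over all possible expressions.

```python
import re


def is_consistent(pattern, positives, negatives):
    """True if the pattern accepts every string in P and rejects every string in N."""
    compiled = re.compile(pattern)
    accepts_all = all(compiled.fullmatch(s) is not None for s in positives)
    rejects_all = all(compiled.fullmatch(s) is None for s in negatives)
    return accepts_all and rejects_all


def cheapest_consistent(candidates, positives, negatives, cost=len):
    """Return the lowest-cost consistent candidate, or None if none is consistent.

    `cost=len` (pattern length) is an illustrative assumption, not the
    challenge's cost function.
    """
    consistent = [r for r in candidates if is_consistent(r, positives, negatives)]
    return min(consistent, key=cost) if consistent else None


# Hypothetical example sets P and N, plus a few hand-written candidates.
P = ["ab", "abab"]
N = ["", "a", "ba"]
candidates = ["(ab)*", "a*b*", "ab|abab", "(ab)+"]
best = cheapest_consistent(candidates, P, N)
```

Here `(ab)*` fails because it accepts the empty string, `a*b*` fails because it rejects `abab`, and of the two remaining candidates `(ab)+` is the shorter.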

8. Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

Author: Hu, Songbo, Zhou, Han, Hergul, Mete, Gritta, Milan, Zhang, Guchun, Iacobacci, Ignacio, Vulić, Ivan, and Korhonen, Anna
Subjects: Computer Science - Computation and Language
Abstract: Creating high-quality annotated data for task-oriented dialog (ToD) is known to be notoriously difficult, and the challenges are amplified when the goal is to create equitable, culturally adapted, and large-scale ToD datasets for multiple languages. Therefore, the current datasets are still very scarce and suffer from limitations such as translation-based non-native dialogs with translation artefacts, small scale, or lack of cultural adaptation, among others. In this work, we first take stock of the current landscape of multilingual ToD datasets, offering a systematic overview of their properties and limitations. Aiming to reduce all the detected limitations, we then introduce Multi3WOZ, a novel multilingual, multi-domain, multi-parallel ToD dataset. It is large-scale and offers culturally adapted dialogs in 4 languages to enable training and evaluation of multilingual and cross-lingual ToD systems. We describe a complex bottom-up data collection process that yielded the final dataset, and offer the first sets of baseline scores across different ToD-related tasks for future reference, also highlighting its challenging nature., Comment: A pre-MIT Press publication version for TACL
Published: 2023

9. Topic-Aware Response Generation in Task-Oriented Dialogue with Unstructured Knowledge Access

Author: Feng, Yue, Lampouras, Gerasimos, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: To alleviate the problem of structured databases' limited coverage, recent task-oriented dialogue systems incorporate external unstructured knowledge to guide the generation of system responses. However, these usually use word or sentence level similarities to detect the relevant knowledge context, which only partially capture the topical level relevance. In this paper, we examine how to better integrate topical information in knowledge grounded task-oriented dialogue and propose ``Topic-Aware Response Generation'' (TARG), an end-to-end response generation model. TARG incorporates multiple topic-aware attention mechanisms to derive the importance weighting scheme over dialogue utterances and external knowledge sources towards a better understanding of the dialogue history. Experimental results indicate that TARG achieves state-of-the-art performance in knowledge selection and response generation, outperforming previous state-of-the-art by 3.2, 3.6, and 4.2 points in EM, F1 and BLEU-4 respectively on Doc2Dial, and performing comparably with previous work on DSTC9; both being knowledge-grounded task-oriented dialogue datasets., Comment: Findings of EMNLP 2022
Published: 2022

10. EntityCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching

Author: Whitehouse, Chenxi, Christopoulou, Fenia, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: Accurate alignment between languages is fundamental for improving cross-lingual pre-trained language models (XLMs). Motivated by the natural phenomenon of code-switching (CS) in multilingual speakers, CS has been used as an effective data augmentation method that offers language alignment at the word- or phrase-level, in contrast to sentence-level via parallel instances. Existing approaches either use dictionaries or parallel sentences with word alignment to generate CS data by randomly switching words in a sentence. However, such methods can be suboptimal as dictionaries disregard semantics, and syntax might become invalid after random word switching. In this work, we propose EntityCS, a method that focuses on Entity-level Code-Switching to capture fine-grained cross-lingual semantics without corrupting syntax. We use Wikidata and English Wikipedia to construct an entity-centric CS corpus by switching entities to their counterparts in other languages. We further propose entity-oriented masking strategies during intermediate model training on the EntityCS corpus for improving entity prediction. Evaluation of the trained models on four entity-centric downstream tasks shows consistent improvements over the baseline with a notable increase of 10% in Fact Retrieval. We release the corpus and models to assist research on code-switching and enriching XLMs with external knowledge., Comment: Findings of EMNLP 2022
Published: 2022

11. Training Dynamics for Curriculum Learning: A Study on Monolingual and Cross-lingual NLU

Author: Christopoulou, Fenia, Lampouras, Gerasimos, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: Curriculum Learning (CL) is a technique of training models via ranking examples in a typically increasing difficulty trend with the aim of accelerating convergence and improving generalisability. Current approaches for Natural Language Understanding (NLU) tasks use CL to improve in-distribution data performance often via heuristic-oriented or task-agnostic difficulties. In this work, instead, we employ CL for NLU by taking advantage of training dynamics as difficulty metrics, i.e., statistics that measure the behavior of the model at hand on specific task-data instances during training and propose modifications of existing CL schedulers based on these statistics. Unlike existing works, we focus on evaluating models on in-distribution (ID), out-of-distribution (OOD) as well as zero-shot (ZS) cross-lingual transfer datasets. We show across several NLU tasks that CL with training dynamics can result in better performance mostly on zero-shot cross-lingual transfer and OOD settings with improvements of up to 8.5% in certain cases. Overall, experiments indicate that training dynamics can lead to better performing models with smoother training compared to other difficulty metrics while being 20% faster on average. In addition, through analysis we shed light on the correlations of task-specific versus task-agnostic metrics., Comment: 17 pages, 4 figures, 6 tables. To appear in EMNLP 2022
Published: 2022

12. Relational Graph Convolutional Neural Networks for Multihop Reasoning: A Comparative Study

Author: Staliūnaitė, Ieva, Gorinski, Philip John, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Multihop Question Answering is a complex Natural Language Processing task that requires multiple steps of reasoning to find the correct answer to a given question. Previous research has explored the use of models based on Graph Neural Networks for tackling this task. Various architectures have been proposed, including Relational Graph Convolutional Networks (RGCN). For these, many node types and relations between them have been introduced, such as simple entity co-occurrences, modelling coreferences, or "reasoning paths" from questions to answers via intermediary entities. Nevertheless, a thoughtful analysis of which relations, node types, embeddings and architecture are the most beneficial for this task is still missing. In this paper we explore a number of RGCN-based Multihop QA models, graph relations, and node embeddings, and empirically explore the influence of each on Multihop QA performance on the WikiHop dataset., Comment: 8 pages + 2 pages references, 3 figures, 3 tables
Published: 2022

13. PanGu-Coder: Program Synthesis with Function-Level Language Modeling

Author: Christopoulou, Fenia, Lampouras, Gerasimos, Gritta, Milan, Zhang, Guchun, Guo, Yinpeng, Li, Zhongqi, Zhang, Qi, Xiao, Meng, Shen, Bo, Li, Lin, Yu, Hao, Yan, Li, Zhou, Pingyi, Wang, Xin, Ma, Yuchi, Iacobacci, Ignacio, Wang, Yasheng, Liang, Guangtai, Wei, Jiansheng, Jiang, Xin, Wang, Qianxiang, and Liu, Qun
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Programming Languages, Computer Science - Software Engineering
Abstract: We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e. the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage uses a combination of Causal Language Modelling and Masked Language Modelling (MLM) training objectives that focus on the downstream task of text-to-code generation and train on loosely curated pairs of natural language program definitions and code functions. Finally, we discuss PanGu-Coder-FT, which is fine-tuned on a combination of competitive programming problems and code with continuous integration tests. We evaluate PanGu-Coder with a focus on whether it generates functionally correct programs and demonstrate that it achieves equivalent or better performance than similarly sized models, such as CodeX, while attending a smaller context window and training on less data., Comment: 27 pages
Published: 2022

14. XQA-DST: Multi-Domain and Multi-Lingual Dialogue State Tracking

Author: Zhou, Han, Iacobacci, Ignacio, and Minervini, Pasquale
Subjects: Computer Science - Computation and Language
Abstract: Dialogue State Tracking (DST), a crucial component of task-oriented dialogue (ToD) systems, keeps track of all important information pertaining to dialogue history: filling slots with the most probable values throughout the conversation. Existing methods generally rely on a predefined set of values and struggle to generalise to previously unseen slots in new domains. To overcome these challenges, we propose a domain-agnostic extractive question answering (QA) approach with shared weights across domains. To disentangle the complex domain information in ToDs, we train our DST with a novel domain filtering strategy by excluding out-of-domain question samples. With an independent classifier that predicts the presence of multiple domains given the context, our model tackles DST by extracting spans in active domains. Empirical results demonstrate that our model can efficiently leverage domain-agnostic QA datasets by two-stage fine-tuning while being both domain-scalable and open-vocabulary in DST. It shows strong transferability by achieving zero-shot domain-adaptation results on MultiWOZ 2.1 with an average JGA of 36.7%. It further achieves cross-lingual transfer with state-of-the-art zero-shot results, 66.2% JGA from English to German and 75.7% JGA from English to Italian on WOZ 2.0., Comment: Accepted to Findings of EACL 2023
Published: 2022

15. CrossAligner & Co: Zero-Shot Transfer Methods for Task-Oriented Cross-lingual Natural Language Understanding

Author: Gritta, Milan, Hu, Ruoyu, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: Task-oriented personal assistants enable people to interact with a host of devices and services using natural language. One of the challenges of making neural dialogue systems available to more users is the lack of training data for all but a few languages. Zero-shot methods try to solve this issue by acquiring task knowledge in a high-resource language such as English with the aim of transferring it to the low-resource language(s). To this end, we introduce CrossAligner, the principal method of a variety of effective approaches for zero-shot cross-lingual transfer based on learning alignment from unlabelled parallel data. We present a quantitative analysis of individual methods as well as their weighted combinations, several of which exceed state-of-the-art (SOTA) scores as evaluated across nine languages, fifteen test sets and three benchmark multilingual datasets. A detailed qualitative error analysis of the best methods shows that our fine-tuned language models can zero-shot transfer the task knowledge better than anticipated., Comment: Long paper (multilingual track) to appear at ACL (Findings) 2022
Published: 2022

16. Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning

Author: Minixhofer, Benjamin, Gritta, Milan, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: Transfer learning has become the dominant paradigm for many natural language processing tasks. In addition to models being pretrained on large datasets, they can be further trained on intermediate (supervised) tasks that are similar to the target task. For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning with each NLI subtask. In this work, we explore Gradient Boosted Decision Trees (GBDTs) as an alternative to the commonly used Multi-Layer Perceptron (MLP) classification head. GBDTs have desirable properties such as good performance on dense, numerical features and are effective where the ratio of the number of samples w.r.t the number of features is low. We then introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning to increase performance without additional computation by the neural network. We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model (RoBERTa-large with MNLI pretraining). The FreeGBDT shows a consistent improvement over the MLP classification head., Comment: Findings of ACL 2021
Published: 2021

17. XeroAlign: Zero-Shot Cross-lingual Transformer Alignment

Author: Gritta, Milan and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: The introduction of pretrained cross-lingual language models brought decisive improvements to multilingual NLP tasks. However, the lack of labelled task data necessitates a variety of methods aiming to close the gap to high-resource languages. Zero-shot methods, in particular, often use translated task data as a training signal to bridge the performance gap between the source and target language(s). We introduce XeroAlign, a simple method for task-specific alignment of cross-lingual pretrained transformers such as XLM-R. XeroAlign uses translated task data to encourage the model to generate similar sentence embeddings for different languages. The XeroAligned XLM-R, called XLM-RA, shows strong improvements over the baseline models to achieve state-of-the-art zero-shot results on three multilingual natural language understanding tasks. XLM-RA's text classification accuracy exceeds that of XLM-R trained with labelled data and performs on par with state-of-the-art models on a cross-lingual adversarial paraphrasing task., Comment: Findings of ACL 2021 - Code: https://github.com/huawei-noah/noah-research/tree/master/xero_align
Published: 2021

18. Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation

Author: Staliūnaitė, Ieva, Gorinski, Philip John, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: Determining the plausibility of causal relations between clauses is a commonsense reasoning task that requires complex inference ability. The general approach to this task is to train a large pretrained language model on a specific dataset. However, the available training data for the task is often scarce, which leads to instability of model training or reliance on the shallow features of the dataset. This paper presents a number of techniques for making models more robust in the domain of causal reasoning. Firstly, we perform adversarial training by generating perturbed inputs through synonym substitution. Secondly, based on a linguistic theory of discourse connectives, we perform data augmentation using a discourse parser for detecting causally linked clauses in large text, and a generative language model for generating distractors. Both methods boost model performance on the Choice of Plausible Alternatives (COPA) dataset, as well as on a Balanced COPA dataset, which is a modified version of the original data that has been developed to avoid superficial cues, leading to a more challenging benchmark. We show a statistically significant improvement in performance and robustness on both datasets, even with only a small number of additionally generated data points., Comment: 7 pages + pages references, 4 figures, 3 tables, paper accepted at AAAI2021
Published: 2021

19. Conversation Graph: Data Augmentation, Training and Evaluation for Non-Deterministic Dialogue Management

Author: Gritta, Milan, Lampouras, Gerasimos, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language
Abstract: Task-oriented dialogue systems typically rely on large amounts of high-quality training data or require complex handcrafted rules. However, existing datasets are often limited in size considering the complexity of the dialogues. Additionally, conventional training signal inference is not suitable for non-deterministic agent behaviour, i.e. considering multiple actions as valid in identical dialogue states. We propose the Conversation Graph (ConvGraph), a graph-based representation of dialogues that can be exploited for data augmentation, multi-reference training and evaluation of non-deterministic agents. ConvGraph generates novel dialogue paths to augment data volume and diversity. Intrinsic and extrinsic evaluation across three datasets shows that data augmentation and/or multi-reference training with ConvGraph can improve dialogue success rates by up to 6.4%., Comment: Accepted at Transactions of Association of Computational Linguistics (to be presented at ACL 2021)
Published: 2020

20. Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA

Author: Staliūnaitė, Ieva and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Many NLP tasks have benefited from transferring knowledge from contextualized word embeddings; however, the picture of what type of knowledge is transferred is incomplete. This paper studies the types of linguistic phenomena accounted for by language models in the context of a Conversational Question Answering (CoQA) task. We identify the problematic areas for the finetuned RoBERTa, BERT and DistilBERT models through systematic error analysis - basic arithmetic (counting phrases), compositional semantics (negation and Semantic Role Labeling), and lexical semantics (surprisal and antonymy). When enhanced with the relevant linguistic knowledge through multitask learning, the models improve in performance. Ensembles of the enhanced models yield a boost between 2.2 and 2.7 points in F1 score overall, and up to 42.1 points in F1 on the hardest question classes. The results show differences in ability to represent compositional and lexical information between RoBERTa, BERT and DistilBERT., Comment: 8 pages + 2 pages references, 3 tables, 1 figure. Accepted as long paper in EMNLP2020 main conference
Published: 2020

21. Show Us the Way: Learning to Manage Dialog from Demonstrations

Author: Gordon-Hall, Gabriel, Gorinski, Philip John, Lampouras, Gerasimos, and Iacobacci, Ignacio
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: We present our submission to the End-to-End Multi-Domain Dialog Challenge Track of the Eighth Dialog System Technology Challenge. Our proposed dialog system adopts a pipeline architecture, with distinct components for Natural Language Understanding, Dialog State Tracking, Dialog Management and Natural Language Generation. At the core of our system is a reinforcement learning algorithm which uses Deep Q-learning from Demonstrations to learn a dialog policy with the help of expert examples. We find that demonstrations are essential to training an accurate dialog policy where both state and action spaces are large. Evaluation of our Dialog Management component shows that our approach is effective - beating supervised and reinforcement learning baselines., Comment: 8 pages + 2 pages references, 4 figures, 4 tables, accepted to DSTC8 Workshop at AAAI2020
Published: 2020

22. Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

Author: Mancini, Massimiliano, Camacho-Collados, Jose, Iacobacci, Ignacio, and Navigli, Roberto
Subjects: Computer Science - Computation and Language
Abstract: Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach both qualitatively and quantitatively in a variety of tasks, highlighting the advantages of the proposed method in comparison to state-of-the-art word- and sense-based models., Comment: Accepted in CoNLL 2017. 12 pages
Published: 2016

23. Semantic Representations of Word Senses and Concepts

Author: Camacho-Collados, José, Iacobacci, Ignacio, Navigli, Roberto, and Pilehvar, Mohammad Taher
Subjects: Computer Science - Computation and Language
Abstract: Representing the semantics of linguistic items in a machine-interpretable form has been a major goal of Natural Language Processing since its earliest days. Among the range of different linguistic items, words have attracted the most research attention. However, word representations have an important limitation: they conflate different meanings of a word into a single vector. Representations of word senses have the potential to overcome this inherent limitation. Indeed, the representation of individual word senses and concepts has recently gained in popularity with several experimental results showing that a considerable performance improvement can be achieved across different NLP applications upon moving from word level to the deeper sense and concept levels. Another interesting point regarding the representation of concepts and word senses is that these models can be seamlessly applied to other linguistic items, such as words, phrases and sentences.
Published: 2016

24. Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis

Author: Gorinski, Philip, primary, Zimmer, Matthieu, additional, Lampouras, Gerasimos, additional, Deik, Derrick Goh Xin, additional, and Iacobacci, Ignacio, additional
Published: 2023
Full Text: View/download PDF

25. Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

Author: Hu, Songbo, primary, Zhou, Han, additional, Hergul, Mete, additional, Gritta, Milan, additional, Zhang, Guchun, additional, Iacobacci, Ignacio, additional, Vulić, Ivan, additional, and Korhonen, Anna, additional
Published: 2023
Full Text: View/download PDF

26. A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems

Author: Hu, Songbo, primary, Zhou, Han, additional, Yuan, Moy, additional, Gritta, Milan, additional, Zhang, Guchun, additional, Iacobacci, Ignacio, additional, Korhonen, Anna, additional, and Vulić, Ivan, additional
Published: 2023
Full Text: View/download PDF

27. Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation

Author: Staliūnaitė, Ieva, Gorinski, Philip John, and Iacobacci, Ignacio
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, General Medicine, Computation and Language (cs.CL)
Abstract: Determining the plausibility of causal relations between clauses is a commonsense reasoning task that requires complex inference ability. The general approach to this task is to train a large pretrained language model on a specific dataset. However, the available training data for the task is often scarce, which leads to instability of model training or reliance on the shallow features of the dataset. This paper presents a number of techniques for making models more robust in the domain of causal reasoning. Firstly, we perform adversarial training by generating perturbed inputs through synonym substitution. Secondly, based on a linguistic theory of discourse connectives, we perform data augmentation using a discourse parser for detecting causally linked clauses in large text, and a generative language model for generating distractors. Both methods boost model performance on the Choice of Plausible Alternatives (COPA) dataset, as well as on a Balanced COPA dataset, which is a modified version of the original data that has been developed to avoid superficial cues, leading to a more challenging benchmark. We show a statistically significant improvement in performance and robustness on both datasets, even with only a small number of additionally generated data points., Comment: 7 pages + pages references, 4 figures, 3 tables, paper accepted at AAAI2021
Published: 2021

28. Hierarchical Recurrent Aggregative Generation for Few-Shot NLG

Author: Zhou, Giulio, primary, Lampouras, Gerasimos, additional, and Iacobacci, Ignacio, additional
Published: 2022
Full Text: View/download PDF

29. Training Dynamics for Curriculum Learning: A Study on Monolingual and Cross-lingual NLU

Author: Christopoulou, Fenia, primary, Lampouras, Gerasimos, additional, and Iacobacci, Ignacio, additional
Published: 2022
Full Text: View/download PDF

30. EntityCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching

Author: Whitehouse, Chenxi, primary, Christopoulou, Fenia, additional, and Iacobacci, Ignacio, additional
Published: 2022
Full Text: View/download PDF

31. CrossAligner & Co: Zero-Shot Transfer Methods for Task-Oriented Cross-lingual Natural Language Understanding

Author: Gritta, Milan, primary, Hu, Ruoyu, additional, and Iacobacci, Ignacio, additional
Published: 2022
Full Text: View/download PDF

32. Topic-Aware Response Generation in Task-Oriented Dialogue with Unstructured Knowledge Access

Author: Feng, Yue, primary, Lampouras, Gerasimos, additional, and Iacobacci, Ignacio, additional
Published: 2022
Full Text: View/download PDF

33. Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation

Author: Staliūnaitė, Ieva, primary, Gorinski, Philip John, additional, and Iacobacci, Ignacio, additional
Published: 2021
Full Text: View/download PDF

34. Conversation Graph: Data Augmentation, Training, and Evaluation for Non-Deterministic Dialogue Management

Author: Gritta, Milan, primary, Lampouras, Gerasimos, additional, and Iacobacci, Ignacio, additional
Published: 2021
Full Text: View/download PDF

35. XeroAlign: Zero-shot cross-lingual transformer alignment

Author: Gritta, Milan, primary and Iacobacci, Ignacio, additional
Published: 2021
Full Text: View/download PDF

36. Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning

Author: Minixhofer, Benjamin, primary, Gritta, Milan, additional, and Iacobacci, Ignacio, additional
Published: 2021
Full Text: View/download PDF

37. Auxiliary Capsules for Natural Language Understanding

Author: Staliunaite, Ieva, primary and Iacobacci, Ignacio, additional
Published: 2020
Full Text: View/download PDF

38. Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA

Author: Staliūnaitė, Ieva, primary and Iacobacci, Ignacio, additional
Published: 2020
Full Text: View/download PDF

39. Neural-grounded semantic representations and word sense disambiguation: a mutually beneficial relationship

Author: Iacobacci, IGNACIO JAVIER
Subjects: word sense disambiguation, distributional semantics, Settore INF/01 - Informatica, Embeddings
Published: 2019

40. LSTMEmbed: learning Word and Sense Representations from a Large Semantically Annotated Corpus with Long Short-Term Memories

Author: Iacobacci, IGNACIO JAVIER and Navigli, Roberto
Subjects: Neural Networks, Embeddings, Semantic Representations, LSTM
Published: 2019

41. LSTMEmbed: Learning Word and Sense Representations from a Large Semantically Annotated Corpus with Long Short-Term Memories

Author: Iacobacci, Ignacio, primary and Navigli, Roberto, additional
Published: 2019
Full Text: View/download PDF

42. Embeddings for word sense disambiguation: an evaluation study

Author: Iacobacci, IGNACIO JAVIER, Pilehvar, MOHAMMED TAHER, and Navigli, Roberto
Subjects: Language and Linguistics, Natural language processing systems
Published: 2016

43. Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

Author: Mancini, Massimiliano, primary, Camacho-Collados, Jose, additional, Iacobacci, Ignacio, additional, and Navigli, Roberto, additional
Published: 2017
Full Text: View/download PDF

44. SensEmbed: Learning sense embeddings for word and relational similarity

Author: Iacobacci, IGNACIO JAVIER, Pilehvar, MOHAMMED TAHER, and Navigli, Roberto
Subjects: Semantic similarity, Semantic network, Word embeddings
Published: 2015

45. Embeddings for Word Sense Disambiguation: An Evaluation Study

Author: Iacobacci, Ignacio, primary, Pilehvar, Mohammad Taher, additional, and Navigli, Roberto, additional
Published: 2016
Full Text: View/download PDF

46. SensEmbed: Learning Sense Embeddings for Word and Relational Similarity

Author: Iacobacci, Ignacio, primary, Pilehvar, Mohammad Taher, additional, and Navigli, Roberto, additional
Published: 2015
Full Text: View/download PDF

47. Embedding words and senses together via joint knowledge-enhanced training

Author: Mancini, Massimiliano, Camacho Collados, Jose, Iacobacci, Ignacio, and Navigli, Roberto
Abstract: Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be automatically separated, as it conflates them into a single vector. We address this issue by proposing a new model which learns word and sense embeddings jointly. Our model exploits large corpora and knowledge from semantic networks in order to produce a unified vector space of word and sense embeddings. We evaluate the main features of our approach both qualitatively and quantitatively in a variety of tasks, highlighting the advantages of the proposed method in comparison to state-of-the-art word- and sense-based models.
