Author: "Hovy, Eduard" / Search Limiters: Full Text - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hovy, Eduard"' showing total 450 results


1. Packing Analysis: Packing Is More Appropriate for Large Models or Datasets in Supervised Fine-tuning

Author: Wang, Shuhe, Wang, Guoyin, Wang, Yizhong, Li, Jiwei, Hovy, Eduard, and Guo, Chen
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Packing, initially utilized in the pre-training phase, is an optimization technique designed to maximize hardware resource efficiency by combining different training sequences to fit the model's maximum input length. Although it has demonstrated effectiveness during pre-training, there remains a lack of comprehensive analysis for the supervised fine-tuning (SFT) stage on the following points: (1) whether packing can effectively enhance training efficiency while maintaining performance, (2) the suitable size of the model and dataset for fine-tuning with the packing method, and (3) whether packing unrelated or related training samples might cause the model to either excessively disregard or over-rely on the context. In this paper, we perform extensive comparisons between SFT methods using padding and packing, covering SFT datasets ranging from 69K to 1.2M and models from 8B to 70B. This provides the first comprehensive analysis of the advantages and limitations of packing versus padding, as well as practical considerations for implementing packing in various training scenarios. Our analysis covers various benchmarks, including knowledge, reasoning, and coding, as well as GPT-based evaluations, time efficiency, and other fine-tuning parameters. We also open-source our code for fine-tuning and evaluation and provide checkpoints fine-tuned on datasets of different sizes, aiming to advance future research on packing methods. Code is available at: https://github.com/ShuheWang1998/Packing-Analysis?tab=readme-ov-file.
Published: 2024
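
The contrast between padding and packing described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the greedy first-fit strategy, and the truncation of over-long samples are all assumptions made for illustration.

```python
from typing import List


def pack_sequences(sequences: List[List[int]], max_len: int) -> List[List[int]]:
    """Greedy first-fit packing: concatenate samples into rows of at most max_len tokens."""
    packs: List[List[int]] = []
    for seq in sequences:
        seq = seq[:max_len]  # assumption: truncate samples longer than one row
        for pack in packs:
            if len(pack) + len(seq) <= max_len:
                pack.extend(seq)  # the sample fits into an existing row
                break
        else:
            packs.append(list(seq))  # no row had room, so open a new one
    return packs


def pad_sequences(sequences: List[List[int]], max_len: int, pad_id: int = 0) -> List[List[int]]:
    """Padding baseline: one sample per row, filled up to max_len with pad_id."""
    rows = []
    for seq in sequences:
        seq = seq[:max_len]
        rows.append(seq + [pad_id] * (max_len - len(seq)))
    return rows


samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
packed = pack_sequences(samples, max_len=6)
padded = pad_sequences(samples, max_len=6)
print(len(padded), "padded rows vs", len(packed), "packed rows")
```

In practice, packed rows usually also need attention masks or position resets so that unrelated samples do not attend to one another; the sketch leaves that out.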

2. Can a Neural Model Guide Fieldwork? A Case Study on Morphological Inflection

Author: Mahmudi, Aso, Herce, Borja, Amestica, Demian Inostroza, Scherbakov, Andreas, Hovy, Eduard, and Vylomova, Ekaterina
Subjects: Computer Science - Computation and Language
Abstract: Linguistic fieldwork is an important component in language documentation and preservation. However, it is a long, exhaustive, and time-consuming process. This paper presents a novel model that guides a linguist during the fieldwork and accounts for the dynamics of linguist-speaker interactions. We introduce a novel framework that evaluates the efficiency of various sampling strategies for obtaining morphological data and assesses the effectiveness of state-of-the-art neural models in generalising morphological structures. Our experiments highlight two key strategies for improving the efficiency: (1) increasing the diversity of annotated data by uniform sampling among the cells of the paradigm tables, and (2) using model confidence as a guide to enhance positive interaction by providing reliable predictions during annotation.
Published: 2024

3. RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning based on Emotional Information

Author: Liu, Zhiwei, Yang, Kailai, Xie, Qianqian, de Kock, Christine, Ananiadou, Sophia, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: Misinformation is prevalent in various fields such as education, politics, health, etc., causing significant harm to society. However, current methods for cross-domain misinformation detection rely on time- and resource-consuming fine-tuning and complex model structures. With the outstanding performance of LLMs, many studies have employed them for misinformation detection. Unfortunately, they focus on in-domain tasks and do not incorporate significant sentiment and emotion features (which we jointly call affect). In this paper, we propose RAEmoLLM, the first retrieval augmented (RAG) LLMs framework to address cross-domain misinformation detection using in-context learning based on affective information. It accomplishes this by applying an emotion-aware LLM to construct a retrieval database of affective embeddings. This database is used by our retrieval module to obtain source-domain samples, which are subsequently used for the inference module's in-context few-shot learning to detect target domain misinformation. We evaluate our framework on three misinformation benchmarks. Results show that RAEmoLLM achieves significant improvements compared to the zero-shot method on three datasets, with the highest increases of 20.69%, 23.94%, and 39.11% respectively. This work will be released on https://github.com/lzw108/RAEmoLLM.
Published: 2024

4. Discourse in Multimedia: A Case Study in Extracting Geometry Knowledge from Textbooks

Author: Sachan, Mrinmaya, Dubey, Avinava, Hovy, Eduard H., Mitchell, Tom M., Roth, Dan, and Xing, Eric P.
Subjects: Computational linguistics. Natural language processing, P98-98.5
Abstract: To ensure readability, text is often written and presented with due formatting. These text formatting devices help the writer to effectively convey the narrative. At the same time, these help the readers pick up the structure of the discourse and comprehend the conveyed information. There have been a number of linguistic theories on discourse structure of text. However, these theories only consider unformatted text. Multimedia text contains rich formatting features that can be leveraged for various NLP tasks. In this article, we study some of these discourse features in multimedia text and what communicative function they fulfill in the context. As a case study, we use these features to harvest structured subject knowledge of geometry from textbooks. We conclude that the discourse and text layout features provide information that is complementary to lexical semantic information. Finally, we show that the harvested structured knowledge can be used to improve an existing solver for geometry problems, making it more accurate as well as more explainable.
Published: 2020

5. What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

Author: Choe, Sang Keun, Ahn, Hwijeen, Bae, Juhan, Zhao, Kewen, Kang, Minsoo, Chung, Youngseog, Pratapa, Adithya, Neiswanger, Willie, Strubell, Emma, Mitamura, Teruko, Schneider, Jeff, Hovy, Eduard, Grosse, Roger, and Xing, Eric
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In response to this issue, data valuation (or data attribution), which quantifies the contribution or value of each data to the model output, has been discussed as a potential solution. Nevertheless, applying existing data valuation methods to recent LLMs and their vast training datasets has been largely limited by prohibitive compute and memory costs. In this work, we focus on influence functions, a popular gradient-based data valuation method, and significantly improve its scalability with an efficient gradient projection strategy called LoGra that leverages the gradient structure in backpropagation. We then provide a theoretical motivation of gradient projection approaches to influence functions to promote trust in the data valuation process. Lastly, we lower the barrier to implementing data valuation systems by introducing LogIX, a software package that can transform existing training code into data valuation code with minimal effort. In our data valuation experiments, LoGra achieves competitive accuracy against more expensive baselines while showing up to 6,500x improvement in throughput and 5x reduction in GPU memory usage when applied to Llama3-8B-Instruct and the 1B-token dataset.
Published: 2024

6. A Sentiment Consolidation Framework for Meta-Review Generation

Author: Li, Miao, Lau, Jey Han, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Modern natural language generation systems with Large Language Models (LLMs) exhibit the capability to generate a plausible summary of multiple documents; however, it is uncertain if they truly possess the capability of information consolidation to generate summaries, especially on documents with opinionated information. We focus on meta-review generation, a form of sentiment summarisation for the scientific domain. To make scientific sentiment summarization more grounded, we hypothesize that human meta-reviewers follow a three-layer framework of sentiment consolidation to write meta-reviews. Based on the framework, we propose novel prompting methods for LLMs to generate meta-reviews and evaluation metrics to assess the quality of generated meta-reviews. Our framework is validated empirically as we find that prompting LLMs based on the framework -- compared with prompting them with simple instructions -- generates better meta-reviews.
Comment: Long paper, ACL 2024 Main
Published: 2024

7. Sim-GPT: Text Similarity via GPT Annotated Data

Author: Wang, Shuhe, Cao, Beiming, Zhang, Shengyu, Li, Xiaoya, Li, Jiwei, Wu, Fei, Wang, Guoyin, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Due to the lack of a large collection of high-quality labeled sentence pairs with textual similarity scores, existing approaches for Semantic Textual Similarity (STS) mostly rely on unsupervised techniques or training signals that are only partially correlated with textual similarity, e.g., NLI-based datasets. To tackle this issue, in this paper, we propose the strategy of measuring text similarity via GPT annotated data (Sim-GPT for short). The core idea of Sim-GPT is to generate data with STS labels using GPT-4, based on which an STS model is trained. Sim-GPT framework utilizes LLMs to provide a substantial amount of reliable annotated data filling the gap of the lack of training signals for STS. Sim-GPT is trained on a one-time generated dataset using BERT or RoBERTa as the backbone, which offers long-term savings in cost and speed compared to repeatedly invoking LLMs for each sentence pair. Trained on the examples from GPT-4 (371K), Sim-GPT yields SOTA performances on the widely-used seven STS benchmarks: +0.99 over supervised-SimCSE, and +0.42 over the current SOTA PromCSE model. To encourage further advancements of the field, we release both models and the 371K annotated examples from GPT-4. Code, models and annotated data are available at: https://github.com/ShuheWang1998/Sim-GPT.
Published: 2023

8. Defining a New NLP Playground

Author: Li, Sha, Han, Chi, Yu, Pengfei, Edwards, Carl, Li, Manling, Wang, Xingyao, Fung, Yi R., Yu, Charles, Tetreault, Joel R., Hovy, Eduard H., and Ji, Heng
Subjects: Computer Science - Computation and Language
Abstract: The recent explosion of performance of large language models (LLMs) has changed the field of Natural Language Processing (NLP) more abruptly and seismically than any other shift in the field's 80-year history. This has resulted in concerns that the field will become homogenized and resource-intensive. The new status quo has put many academic researchers, especially PhD students, at a disadvantage. This paper aims to define a new NLP playground by proposing 20+ PhD-dissertation-worthy research directions, covering theoretical analysis, new and challenging problems, learning paradigms, and interdisciplinary applications.
Comment: EMNLP Findings 2023 "Theme Track: Large Language Models and the Future of NLP"
Published: 2023

9. Factuality Challenges in the Era of Large Language Models

Author: Augenstein, Isabelle, Baldwin, Timothy, Cha, Meeyoung, Chakraborty, Tanmoy, Ciampaglia, Giovanni Luca, Corney, David, DiResta, Renee, Ferrara, Emilio, Hale, Scott, Halevy, Alon, Hovy, Eduard, Ji, Heng, Menczer, Filippo, Miguez, Ruben, Nakov, Preslav, Scheufele, Dietram, Sharma, Shivam, and Zagni, Giovanni
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention. These incredibly useful, natural-sounding tools mark significant advances in natural language generation, yet they exhibit a propensity to generate false, erroneous, or misleading content -- commonly referred to as "hallucinations." Moreover, LLMs can be exploited for malicious applications, such as generating false but credible-sounding content and profiles at scale. This poses a significant challenge to society in terms of the potential deception of users and the increasing dissemination of inaccurate information. In light of these risks, we explore the kinds of technological innovations, regulatory reforms, and AI literacy initiatives needed from fact-checkers, news organizations, and the broader research and policy communities. By identifying the risks, the imminent threats, and some viable solutions, we seek to shed light on navigating various aspects of veracity in the era of generative AI.
Comment: Our article offers a comprehensive examination of the challenges and risks associated with Large Language Models (LLMs), focusing on their potential impact on the veracity of information in today's digital landscape
Published: 2023

10. Data-efficient Active Learning for Structured Prediction with Partial Annotation and Self-Training

Author: Zhang, Zhisong, Strubell, Emma, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: In this work we propose a pragmatic method that reduces the annotation cost for structured label spaces using active learning. Our approach leverages partial annotation, which reduces labeling costs for structured outputs by selecting only the most informative sub-structures for annotation. We also utilize self-training to incorporate the current model's automatic predictions as pseudo-labels for un-annotated sub-structures. A key challenge in effectively combining partial annotation with self-training to reduce annotation cost is determining which sub-structures to select to label. To address this challenge, we adopt an error estimator to adaptively decide the partial selection ratio according to the current model's capability. In evaluations spanning four structured prediction tasks, we show that our combination of partial annotation and self-training using an adaptive selection ratio reduces annotation cost over strong full annotation baselines under a fair comparison scheme that takes reading time into consideration.
Comment: Findings of EMNLP 2023
Published: 2023

11. What's the Meaning of Superhuman Performance in Today's NLU?

Author: Tedeschi, Simone, Bos, Johan, Declerck, Thierry, Hajic, Jan, Hershcovich, Daniel, Hovy, Eduard H., Koller, Alexander, Krek, Simon, Schockaert, Steven, Sennrich, Rico, Shutova, Ekaterina, and Navigli, Roberto
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocative idea that certain tasks have been solved. In this position paper, we take a critical look at these claims and ask whether PLMs truly have superhuman abilities and what the current benchmarks are really evaluating. We show that these benchmarks have serious limitations affecting the comparison between humans and PLMs and provide recommendations for fairer and more transparent benchmarks.
Comment: 9 pages, long paper at ACL 2023 proceedings
Published: 2023

12. Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation

Author: Li, Miao, Hovy, Eduard, and Lau, Jey Han
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We present PeerSum, a novel dataset for generating meta-reviews of scientific papers. The meta-reviews can be interpreted as abstractive summaries of reviews, multi-turn discussions and the paper abstract. These source documents have rich inter-document relationships with an explicit hierarchical conversational structure, cross-references and (occasionally) conflicting information. To introduce the structural inductive bias into pre-trained language models, we introduce Rammer (Relationship-aware Multi-task Meta-review Generator), a model that uses sparse attention based on the conversational structure and a multi-task training objective that predicts metadata features (e.g., review ratings). Our experimental results show that Rammer outperforms other strong baseline models in terms of a suite of automatic evaluation metrics. Further analyses, however, reveal that Rammer and other models struggle to handle conflicts in source documents of PeerSum, suggesting meta-review generation is a challenging task and a promising avenue for further research.
Comment: Long paper; Accepted to EMNLP 2023; Soundness: 3, 3, 4; Excitement: 3, 4, 4
Published: 2023

13. EvEntS ReaLM: Event Reasoning of Entity States via Language Models

Author: Spiliopoulou, Evangelia, Pagnoni, Artidoro, Bisk, Yonatan, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: This paper investigates models of event implications. Specifically, how well models predict entity state-changes, by targeting their understanding of physical attributes. Nominally, Large Language models (LLM) have been exposed to procedural knowledge about how objects interact, yet our benchmarking shows they fail to reason about the world. Conversely, we also demonstrate that existing approaches often misrepresent the surprising abilities of LLMs via improper task encodings and that proper model prompting can dramatically improve performance of reported baseline results across multiple tasks. In particular, our results indicate that our prompting technique is especially useful for unseen attributes (out-of-domain) or when only limited data is available.
Comment: EMNLP 2022
Published: 2022

14. A Survey of Active Learning for Natural Language Processing

Author: Zhang, Zhisong, Strubell, Emma, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: In this work, we provide a survey of active learning (AL) for its applications in natural language processing (NLP). In addition to a fine-grained categorization of query strategies, we also investigate several other important aspects of applying AL to NLP problems. These include AL for structured prediction tasks, annotation cost, model learning (especially with deep neural models), and starting and stopping AL. Finally, we conclude with a discussion of related topics and future directions.
Comment: EMNLP 2022
Published: 2022

15. CHARD: Clinical Health-Aware Reasoning Across Dimensions for Text Generation Models

Author: Feng, Steven Y., Khetan, Vivek, Sacaleanu, Bogdan, Gershman, Anatole, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We motivate and introduce CHARD: Clinical Health-Aware Reasoning across Dimensions, to investigate the capability of text generation models to act as implicit clinical knowledge bases and generate free-flow textual explanations about various health-related conditions across several dimensions. We collect and present an associated dataset, CHARDat, consisting of explanations about 52 health conditions across three clinical dimensions. We conduct extensive experiments using BART and T5 along with data augmentation, and perform automatic, human, and qualitative analyses. We show that while our models can perform decently, CHARD is very challenging with strong potential for further exploration.
Comment: EACL 2023. Code available at https://github.com/styfeng/CHARD
Published: 2022

16. PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation

Author: Keh, Sedrick Scott, Lu, Kevin, Gangal, Varun, Feng, Steven Y., Jhamtani, Harsh, Alikhani, Malihe, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: A personification is a figure of speech that endows inanimate entities with properties and actions typically seen as requiring animacy. In this paper, we explore the task of personification generation. To this end, we propose PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation. We curate a corpus of personifications called PersonifCorp, together with automatically generated de-personified literalizations of these personifications. We demonstrate the usefulness of this parallel corpus by training a seq2seq model to personify a given literal input. Both automatic and human evaluations show that fine-tuning with PersonifCorp leads to significant gains in personification-related qualities such as animacy and interestingness. A detailed qualitative analysis also highlights key strengths and imperfections of PINEAPPLE over baselines, demonstrating a strong ability to generate diverse and creative personifications that enhance the overall appeal of a sentence.
Comment: Accepted to COLING 2022; official Github repo at https://github.com/sedrickkeh/PINEAPPLE
Published: 2022

17. PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically

Author: Keh, Sedrick Scott, Feng, Steven Y., Gangal, Varun, Alikhani, Malihe, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Tongue twisters are meaningful sentences that are difficult to pronounce. The process of automatically generating tongue twisters is challenging since the generated utterance must satisfy two conditions at once: phonetic difficulty and semantic meaning. Furthermore, phonetic difficulty is itself hard to characterize and is expressed in natural tongue twisters through a heterogeneous mix of phenomena such as alliteration and homophony. In this paper, we propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically. We leverage phoneme representations to capture the notion of phonetic difficulty, and we train language models to generate original tongue twisters on two proposed task settings. To do this, we curate a dataset called PANCETTA, consisting of existing English tongue twisters. Through automatic and human evaluation, as well as qualitative analysis, we show that PANCETTA generates novel, phonetically difficult, fluent, and semantically meaningful tongue twisters.
Comment: EACL 2023. Code at https://github.com/sedrickkeh/PANCETTA
Published: 2022

18. Counterfactual Data Augmentation improves Factuality of Abstractive Summarization

Author: Rajagopal, Dheeraj, Shakeri, Siamak, Santos, Cicero Nogueira dos, Hovy, Eduard, and Chang, Chung-Ching
Subjects: Computer Science - Computation and Language
Abstract: Abstractive summarization systems based on pretrained language models often generate coherent but factually inconsistent sentences. In this paper, we present a counterfactual data augmentation approach where we augment data with perturbed summaries that increase the training data diversity. Specifically, we present three augmentation approaches based on replacing (i) entities from other and the same category and (ii) nouns with their corresponding WordNet hypernyms. We show that augmenting the training data with our approach improves the factual correctness of summaries without significantly affecting the ROUGE score. We show that in two commonly used summarization datasets (CNN/Dailymail and XSum), we improve the factual correctness by about 2.5 points on average.
Published: 2022
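
The hypernym-replacement perturbation mentioned in this abstract can be sketched as follows. This is a toy illustration only: the paper uses WordNet, whereas the hand-written `TOY_HYPERNYMS` table and the `hypernym_perturb` function are hypothetical stand-ins, and the abstract does not say how the perturbed summaries enter training.

```python
import re

# Assumption: a tiny hand-written noun-to-hypernym table standing in for WordNet.
TOY_HYPERNYMS = {"dog": "animal", "car": "vehicle", "rose": "flower"}


def hypernym_perturb(summary: str, hypernyms: dict) -> str:
    """Replace every word found in the table with its hypernym; leave other words unchanged."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        return hypernyms.get(word.lower(), word)

    # Match alphabetic runs only, so punctuation and spacing are preserved.
    return re.sub(r"[A-Za-z]+", swap, summary)


original = "The dog chased the car."
perturbed = hypernym_perturb(original, TOY_HYPERNYMS)
print(perturbed)
```

A fuller version would tag parts of speech first so that only nouns are replaced; the toy table sidesteps this by listing nouns only.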

19. NewsClaims: A New Benchmark for Claim Detection from News with Attribute Knowledge

Author: Reddy, Revanth Gangi, Chetan, Sai, Wang, Zhenhailong, Fung, Yi R., Conger, Kathryn, Elsayed, Ahmed, Palmer, Martha, Nakov, Preslav, Hovy, Eduard, Small, Kevin, and Ji, Heng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Claim detection and verification are crucial for news understanding and have emerged as promising technologies for mitigating misinformation and disinformation in the news. However, most existing work has focused on claim sentence analysis while overlooking additional crucial attributes (e.g., the claimer and the main object associated with the claim). In this work, we present NewsClaims, a new benchmark for attribute-aware claim detection in the news domain. We extend the claim detection problem to include extraction of additional attributes related to each claim and release 889 claims annotated over 143 news articles. NewsClaims aims to benchmark claim detection systems in emerging scenarios, comprising unseen topics with little or no training data. To this end, we see that zero-shot and prompt-based baselines show promising performance on this benchmark, while still considerably behind human performance.
Comment: Accepted at EMNLP 2022
Published: 2021

20. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Author: Dhole, Kaustubh D., Gangal, Varun, Gehrmann, Sebastian, Gupta, Aadesh, Li, Zhenhao, Mahamood, Saad, Mahendiran, Abinaya, Mille, Simon, Shrivastava, Ashish, Tan, Samson, Wu, Tongshuang, Sohl-Dickstein, Jascha, Choi, Jinho D., Hovy, Eduard, Dusek, Ondrej, Ruder, Sebastian, Anand, Sajant, Aneja, Nagender, Banjade, Rabin, Barthe, Lisa, Behnke, Hanna, Berlot-Attwell, Ian, Boyle, Connor, Brun, Caroline, Cabezudo, Marco Antonio Sobrevilla, Cahyawijaya, Samuel, Chapuis, Emile, Che, Wanxiang, Choudhary, Mukund, Clauss, Christian, Colombo, Pierre, Cornell, Filip, Dagan, Gautier, Das, Mayukh, Dixit, Tanay, Dopierre, Thomas, Dray, Paul-Alexis, Dubey, Suchitra, Ekeinhor, Tatiana, Di Giovanni, Marco, Goyal, Tanya, Gupta, Rishabh, Hamla, Louanes, Han, Sang, Harel-Canada, Fabrice, Honore, Antoine, Jindal, Ishan, Joniak, Przemyslaw K., Kleyko, Denis, Kovatchev, Venelin, Krishna, Kalpesh, Kumar, Ashutosh, Langer, Stefan, Lee, Seungjae Ryan, Levinson, Corey James, Liang, Hualou, Liang, Kaizhao, Liu, Zhexiong, Lukyanenko, Andrey, Marivate, Vukosi, de Melo, Gerard, Meoni, Simon, Meyer, Maxime, Mir, Afnan, Moosavi, Nafise Sadat, Muennighoff, Niklas, Mun, Timothy Sum Hon, Murray, Kenton, Namysl, Marcin, Obedkova, Maria, Oli, Priti, Pasricha, Nivranshu, Pfister, Jan, Plant, Richard, Prabhu, Vinay, Pais, Vasile, Qin, Libo, Raji, Shahab, Rajpoot, Pawan Kumar, Raunak, Vikas, Rinberg, Roy, Roberts, Nicolas, Rodriguez, Juan Diego, Roux, Claude, S., Vasconcellos P. H., Sai, Ananya B., Schmidt, Robin M., Scialom, Thomas, Sefara, Tshephisho, Shamsi, Saqib N., Shen, Xudong, Shi, Haoyue, Shi, Yiwen, Shvets, Anna, Siegel, Nick, Sileo, Damien, Simon, Jamie, Singh, Chandan, Sitelew, Roman, Soni, Priyank, Sorensen, Taylor, Soto, William, Srivastava, Aman, Srivatsa, KV Aditya, Sun, Tony, T, Mukund Varma, Tabassum, A, Tan, Fiona Anting, Teehan, Ryan, Tiwari, Mo, Tolkiehn, Marie, Wang, Athena, Wang, Zijian, Wang, Gloria, Wang, Zijie J., Wei, Fuxuan, Wilie, Bryan, Winata, Genta Indra, Wu, Xinyi, Wydmański, Witold, Xie, Tianbao, Yaseen, Usama, Yee, Michael A., Zhang, Jing, and Zhang, Yue
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
Comment: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter
Published: 2021

21. Template Filling for Controllable Commonsense Reasoning

Author: Rajagopal, Dheeraj, Khetan, Vivek, Sacaleanu, Bogdan, Gershman, Anatole, Fano, Andrew, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large-scale sequence-to-sequence models have been shown to be adept at both multiple-choice and open-domain commonsense reasoning tasks. However, the current systems do not provide the ability to control the various attributes of the reasoning chain. To enable better controllability, we propose to study commonsense reasoning as a template filling task (TemplateCSR) -- where the language model fills reasoning templates with the given constraints as control factors. As an approach to TemplateCSR, we (i) propose a dataset of commonsense reasoning template-expansion pairs and (ii) introduce POTTER, a pretrained sequence-to-sequence model using prompts to perform commonsense reasoning across concepts. Our experiments show that our approach outperforms baselines both in generation metrics and factuality metrics. We also present a detailed error analysis on our approach's ability to reliably perform commonsense reasoning.
Published: 2021

22. Think about it! Improving defeasible reasoning by first modeling the question scenario

Author: Madaan, Aman, Tandon, Niket, Rajagopal, Dheeraj, Clark, Peter, Yang, Yiming, and Hovy, Eduard
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence. Existing cognitive science literature on defeasible reasoning suggests that a person forms a mental model of the problem scenario before answering questions. Our research goal asks whether neural models can similarly benefit from envisioning the question scenario before answering a defeasible query. Our approach is, given a question, to have a model first create a graph of relevant influences, and then leverage that graph as an additional input when answering the question. Our system, CURIOUS, achieves a new state-of-the-art on three different defeasible reasoning datasets. This result is significant as it illustrates that performance can be improved by guiding a system to "think about" a question and explicitly model the scenario, rather than answering reflexively. Code, data, and pre-trained models are located at https://github.com/madaan/thinkaboutit.
Comment: EMNLP 2021
Published: 2021

23. Interpreting Deep Learning Models in Natural Language Processing: A Review

Author: Sun, Xiaofei, Yang, Diyi, Li, Xiaoya, Zhang, Tianwei, Meng, Yuxian, Qiu, Han, Wang, Guoyin, Hovy, Eduard, and Li, Jiwei
Subjects: Computer Science - Computation and Language
Abstract: Neural network models have achieved state-of-the-art performances in a wide range of natural language processing (NLP) tasks. However, a long-standing criticism against neural network models is the lack of interpretability, which not only reduces the reliability of neural NLP systems but also limits the scope of their applications in areas where interpretability is essential (e.g., health care applications). In response, the increasing interest in interpreting neural NLP models has spurred a diverse array of interpretation methods over recent years. In this survey, we provide a comprehensive review of various interpretation methods for neural models in NLP. We first stretch out a high-level taxonomy for interpretation methods in NLP, i.e., training-based approaches, test-based approaches, and hybrid approaches. Next, we describe sub-categories in each category in detail, e.g., influence-function based methods, KNN-based methods, attention-based models, saliency-based methods, perturbation-based methods, etc. We point out deficiencies of current methods and suggest some avenues for future research.
Published: 2021

24. Investigating Robustness of Dialog Models to Popular Figurative Language Constructs

Author: Jhamtani, Harsh, Gangal, Varun, Hovy, Eduard, and Berg-Kirkpatrick, Taylor
Subjects: Computer Science - Computation and Language
Abstract: Humans often employ figurative language in communication, including during interactions with dialog systems. Thus, it is important for real-world dialog systems to be able to handle popular figurative language constructs like metaphor and simile. In this work, we analyze the performance of existing dialog models in situations where the input dialog context exhibits use of figurative language. We observe large gaps in handling of figurative language when evaluating the models on two open domain dialog datasets. When faced with dialog contexts consisting of figurative language, some models show very large drops in performance compared to contexts without figurative language. We encourage future research in dialog modeling to separately analyze and report results on figurative language in order to better test model capabilities relevant to real-world use. Finally, we propose lightweight solutions to help existing models become more robust to figurative language by simply using an external resource to translate figurative language to literal (non-figurative) forms while preserving the meaning to the best extent possible., Comment: EMNLP 2021 Short Paper
Published: 2021

25. Knowledge-Enhanced Evidence Retrieval for Counterargument Generation

Author: Jo, Yohan, Yoo, Haneul, Bak, JinYeong, Oh, Alice, Reed, Chris, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: Finding counterevidence to statements is key to many tasks, including counterargument generation. We build a system that, given a statement, retrieves counterevidence from diverse sources on the Web. At the core of this system is a natural language inference (NLI) model that determines whether a candidate sentence is valid counterevidence or not. Most NLI models to date, however, lack proper reasoning abilities necessary to find counterevidence that involves complex inference. Thus, we present a knowledge-enhanced NLI model that aims to handle causality- and example-based inference by incorporating knowledge graphs. Our NLI model outperforms baselines for NLI tasks, especially for instances that require the targeted inference. In addition, this NLI model further improves the counterevidence retrieval system, notably finding complex counterevidence better., Comment: To appear in Findings of EMNLP 2021
Published: 2021

26. Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models

Author: Feng, Steven Y., Lu, Kevin, Tao, Zhuofu, Alikhani, Malihe, Mitamura, Teruko, Hovy, Eduard, and Gangal, Varun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We investigate the use of multimodal information contained in images as an effective method for enhancing the commonsense of Transformer models for text generation. We perform experiments using BART and T5 on concept-to-text generation, specifically the task of generative commonsense reasoning, or CommonGen. We call our approach VisCTG: Visually Grounded Concept-to-Text Generation. VisCTG involves captioning images representing appropriate everyday scenarios, and using these captions to enrich and steer the generation process. Comprehensive evaluation and analysis demonstrate that VisCTG noticeably improves model performance while successfully addressing several issues of the baseline generations, including poor commonsense, fluency, and specificity., Comment: Accepted to AAAI 2022. Code at https://github.com/styfeng/VisCTG
Published: 2021

27. SAPPHIRE: Approaches for Enhanced Concept-to-Text Generation

Author: Feng, Steven Y., Huynh, Jessica, Narisetty, Chaitanya, Hovy, Eduard, and Gangal, Varun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We motivate and propose a suite of simple but effective improvements for concept-to-text generation called SAPPHIRE: Set Augmentation and Post-hoc PHrase Infilling and REcombination. We demonstrate their effectiveness on generative commonsense reasoning, a.k.a. the CommonGen task, through experiments using both BART and T5 models. Through extensive automatic and human evaluation, we show that SAPPHIRE noticeably improves model performance. An in-depth qualitative analysis illustrates that SAPPHIRE effectively addresses many issues of the baseline model generations, including lack of commonsense, insufficient specificity, and poor fluency., Comment: INLG 2021 [Best Long Paper]. Code available at https://github.com/styfeng/SAPPHIRE
Published: 2021

28. Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction

Author: Ryskina, Maria, Hovy, Eduard, Berg-Kirkpatrick, Taylor, and Gormley, Matthew R.
Subjects: Computer Science - Computation and Language
Abstract: Traditionally, character-level transduction problems have been solved with finite-state models designed to encode structural and linguistic knowledge of the underlying process, whereas recent approaches rely on the power and flexibility of sequence-to-sequence models with attention. Focusing on the less explored unsupervised learning scenario, we compare the two model classes side by side and find that they tend to make different types of errors even when achieving comparable performance. We analyze the distributions of different error classes using two unsupervised tasks as testbeds: converting informally romanized text into the native script of its language (for Russian, Arabic, and Kannada) and translating between a pair of closely related languages (Serbian and Bosnian). Finally, we investigate how combining finite-state and sequence-to-sequence models at decoding time affects the output quantitatively and qualitatively., Comment: Accepted to SIGMORPHON 2021
Published: 2021

29. Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation

Author: Gangal, Varun, Jhamtani, Harsh, Hovy, Eduard, and Berg-Kirkpatrick, Taylor
Subjects: Computer Science - Computation and Language
Abstract: Multiple different responses are often plausible for a given open domain dialog context. Prior work has shown the importance of having multiple valid reference responses for meaningful and robust automated evaluations. In such cases, common practice has been to collect more human written references. However, such collection can be expensive, time consuming, and not easily scalable. Instead, we propose a novel technique for automatically expanding a human generated reference to a set of candidate references. We fetch plausible references from knowledge sources, and adapt them so that they are more fluent in context of the dialog instance in question. More specifically, we use (1) a commonsense knowledge base to elicit a large number of plausible reactions given the dialog history (2) relevant instances retrieved from dialog corpus, using similar past as well as future contexts. We demonstrate that our automatically expanded reference sets lead to large improvements in correlations of automated metrics with human ratings of system outputs for DailyDialog dataset., Comment: Findings of ACL 2021
Published: 2021

30. More Identifiable yet Equally Performant Transformers for Text Classification

Author: Bhardwaj, Rishabh, Majumder, Navonil, Poria, Soujanya, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: Interpretability is an important aspect of the trustworthiness of a model's predictions. Transformer's predictions are widely explained by the attention weights, i.e., a probability distribution generated at its self-attention unit (head). Current empirical studies provide shreds of evidence that attention weights are not explanations by proving that they are not unique. A recent study showed theoretical justifications to this observation by proving the non-identifiability of attention weights. For a given input to a head and its output, if the attention weights generated in it are unique, we call the weights identifiable. In this work, we provide deeper theoretical analysis and empirical observations on the identifiability of attention weights. Ignored in the previous works, we find the attention weights are more identifiable than we currently perceive by uncovering the hidden role of the key vector. However, the weights are still prone to be non-unique attentions that make them unfit for interpretation. To tackle this issue, we provide a variant of the encoder layer that decouples the relationship between key and value vector and provides identifiable weights up to the desired length of the input. We prove the applicability of such variations by providing empirical justifications on varied text classification tasks. The implementations are available at https://github.com/declare-lab/identifiable-transformers., Comment: ACL 2021
Published: 2021

31. Classifying Argumentative Relations Using Logical Mechanisms and Argumentation Schemes

Author: Jo, Yohan, Bang, Seojin, Reed, Chris, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: While argument mining has achieved significant success in classifying argumentative relations between statements (support, attack, and neutral), we have a limited computational understanding of logical mechanisms that constitute those relations. Most recent studies rely on black-box models, which are not as linguistically insightful as desired. On the other hand, earlier studies use rather simple lexical features, missing logical relations between statements. To overcome these limitations, our work classifies argumentative relations based on four logical and theory-informed mechanisms between two statements, namely (i) factual consistency, (ii) sentiment coherence, (iii) causal relation, and (iv) normative relation. We demonstrate that our operationalization of these logical mechanisms classifies argumentative relations without directly training on data labeled with the relations, significantly better than several unsupervised baselines. We further demonstrate that these mechanisms also improve supervised classifiers through representation learning., Comment: To Appear in TACL 2021
Published: 2021

32. Could you give me a hint? Generating inference graphs for defeasible reasoning

Author: Madaan, Aman, Rajagopal, Dheeraj, Tandon, Niket, Yang, Yiming, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence. A commonly used method in cognitive science and logic literature is to handcraft argumentation supporting inference graphs. While humans find inference graphs very useful for reasoning, constructing them at scale is difficult. In this paper, we automatically generate such inference graphs through transfer learning from another NLP task that shares the kind of reasoning that inference graphs support. Through automated metrics and human evaluation, we find that our method generates meaningful graphs for the defeasible inference task. Human accuracy on this task improves by 20% by consulting the generated graphs. Our findings open up exciting new research avenues for cases where machine reasoning can help human reasoning. (A dataset of 230,000 influence graphs for each defeasible query is located at: https://tinyurl.com/defeasiblegraphs.), Comment: Findings of the Association for Computational Linguistics: ACL 2021
Published: 2021

33. A Survey of Data Augmentation Approaches for NLP

Author: Feng, Steven Y., Gangal, Varun, Wei, Jason, Chandar, Sarath, Vosoughi, Soroush, Mitamura, Teruko, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data. Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data. In this paper, we present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner. We first introduce and motivate data augmentation for NLP, and then discuss major methodologically representative approaches. Next, we highlight techniques that are used for popular NLP applications and tasks. We conclude by outlining current challenges and directions for future research. Overall, our paper aims to clarify the landscape of existing literature in data augmentation for NLP and motivate additional work in this area. We also present a GitHub repository with a paper list that will be continuously updated at https://github.com/styfeng/DataAug4NLP, Comment: Accepted to ACL 2021 Findings. GitHub repo with paper list at https://github.com/styfeng/DataAug4NLP ; Talk at https://www.youtube.com/watch?v=kNBVesKUZCk&ab_channel=StevenFeng ; Podcast at https://www.youtube.com/watch?v=qmqyT_97Poc&ab_channel=GradientFlow and https://thedataexchange.media/data-augmentation-in-natural-language-processing
Published: 2021

34. NAREOR: The Narrative Reordering Problem

Author: Gangal, Varun, Feng, Steven Y., Alikhani, Malihe, Mitamura, Teruko, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Many implicit inferences exist in text depending on how it is structured that can critically impact the text's interpretation and meaning. One such structural aspect present in text with chronology is the order of its presentation. For narratives or stories, this is known as the narrative order. Reordering a narrative can impact the temporal, causal, event-based, and other inferences readers draw from it, which in turn can have strong effects both on its interpretation and interestingness. In this paper, we propose and investigate the task of Narrative Reordering (NAREOR) which involves rewriting a given story in a different narrative order while preserving its plot. We present a dataset, NAREORC, with human rewritings of stories within ROCStories in non-linear orders, and conduct a detailed analysis of it. Further, we propose novel task-specific training methods with suitable evaluation metrics. We perform experiments on NAREORC using state-of-the-art models such as BART and T5 and conduct extensive automatic and human evaluations. We demonstrate that although our models can perform decently, NAREOR is a challenging task with potential for further exploration. We also investigate two applications of NAREOR: generation of more interesting variations of stories and serving as adversarial sets for temporal/event-related tasks, besides discussing other prospective ones, such as for pedagogical setups related to language skills like essay writing and applications to medicine involving clinical narratives., Comment: Accepted to AAAI 2022; Code at https://github.com/vgtomahawk/NAREORCamReady
Published: 2021

35. StylePTB: A Compositional Benchmark for Fine-grained Controllable Text Style Transfer

Author: Lyu, Yiwei, Liang, Paul Pu, Pham, Hai, Hovy, Eduard, Póczos, Barnabás, Salakhutdinov, Ruslan, and Morency, Louis-Philippe
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Text style transfer aims to controllably generate text with targeted stylistic changes while maintaining core meaning from the source sentence constant. Many of the existing style transfer benchmarks primarily focus on individual high-level semantic changes (e.g. positive to negative), which enable controllability at a high level but do not offer fine-grained control involving sentence structure, emphasis, and content of the sentence. In this paper, we introduce a large-scale benchmark, StylePTB, with (1) paired sentences undergoing 21 fine-grained stylistic changes spanning atomic lexical, syntactic, semantic, and thematic transfers of text, as well as (2) compositions of multiple transfers which allow modeling of fine-grained stylistic changes as building blocks for more complex, high-level transfers. By benchmarking existing methods on StylePTB, we find that they struggle to model fine-grained changes and have an even more difficult time composing multiple styles. As a result, StylePTB brings novel challenges that we hope will encourage future research in controllable text style transfer, compositional models, and learning disentangled representations. Solving these challenges would present important steps towards controllable text generation., Comment: NAACL 2021, code available at https://github.com/lvyiwei1/StylePTB/
Published: 2021

36. CURIE: An Iterative Querying Approach for Reasoning About Situations

Author: Rajagopal, Dheeraj, Madaan, Aman, Tandon, Niket, Yang, Yiming, Prabhumoye, Shrimai, Ravichander, Abhilasha, Clark, Peter, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: Recently, models have been shown to predict the effects of unexpected situations, e.g., would cloudy skies help or hinder plant growth? Given a context, the goal of such situational reasoning is to elicit the consequences of a new situation (st) that arises in that context. We propose a method to iteratively build a graph of relevant consequences explicitly in a structured situational graph (st-graph) using natural language queries over a finetuned language model (M). Across multiple domains, CURIE generates st-graphs that humans find relevant and meaningful in eliciting the consequences of a new situation. We show that st-graphs generated by CURIE improve a situational reasoning end task (WIQA-QA) by 3 points on accuracy by simply augmenting their input with our generated situational graphs, especially for a hard subset that requires background knowledge and multi-hop reasoning., Comment: This paper builds upon EIGEN (arXiv:2010.11764) and proposes a general framework for situational reasoning
Published: 2021

37. SelfExplain: A Self-Explaining Architecture for Neural Text Classifiers

Author: Rajagopal, Dheeraj, Balachandran, Vidhisha, Hovy, Eduard, and Tsvetkov, Yulia
Subjects: Computer Science - Computation and Language
Abstract: We introduce SelfExplain, a novel self-explaining model that explains a text classifier's predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies the most influential concepts in the training set for a given sample and (2) a locally interpretable layer that quantifies the contribution of each local input concept by computing a relevance score relative to the predicted label. Experiments across five text-classification datasets show that SelfExplain facilitates interpretability without sacrificing performance. Most importantly, explanations from SelfExplain show sufficiency for model predictions and are perceived as adequate, trustworthy and understandable by human judges compared to existing widely-used baselines.
Published: 2021

38. NoiseQA: Challenge Set Evaluation for User-Centric Question Answering

Author: Ravichander, Abhilasha, Dalmia, Siddharth, Ryskina, Maria, Metze, Florian, Hovy, Eduard, and Black, Alan W
Subjects: Computer Science - Computation and Language
Abstract: When Question-Answering (QA) systems are deployed in the real world, users query them through a variety of interfaces, such as speaking to voice assistants, typing questions into a search engine, or even translating questions to languages supported by the QA system. While there has been significant community attention devoted to identifying correct answers in passages assuming a perfectly formed question, we show that components in the pipeline that precede an answering engine can introduce varied and considerable sources of error, and performance can degrade substantially based on these upstream noise sources even for powerful pre-trained QA models. We conclude that there is substantial room for progress before QA systems can be effectively deployed, highlight the need for QA evaluation to expand to consider real-world use, and hope that our findings will spur greater community interest in the issues that arise when our systems actually need to be of utility to humans., Comment: EACL 2021
Published: 2021

39. Measuring and Improving Consistency in Pretrained Language Models

Author: Elazar, Yanai, Kassner, Nora, Ravfogel, Shauli, Ravichander, Abhilasha, Hovy, Eduard, Schütze, Hinrich, and Goldberg, Yoav
Subjects: Computer Science - Computation and Language
Abstract: Consistency of a model -- that is, the invariance of its behavior under meaning-preserving alternations in its input -- is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel, we show that the consistency of all PLMs we experiment with is poor -- though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they have a poor structure and are currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness., Comment: Accepted to the TACL journal, pre-MIT Press publication version
Published: 2021

40. Exploring Neural Entity Representations for Semantic Information

Author: Runge, Andrew and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: Neural methods for embedding entities are typically extrinsically evaluated on downstream tasks and, more recently, intrinsically using probing tasks. Downstream task-based comparisons are often difficult to interpret due to differences in task structure, while probing task evaluations often look at only a few attributes and models. We address both of these issues by evaluating a diverse set of eight neural entity embedding methods on a set of simple probing tasks, demonstrating which methods are able to remember words used to describe entities, learn type, relationship and factual information, and identify how frequently an entity is mentioned. We also compare these methods in a unified framework on two entity linking tasks and discuss how they generalize to different model architectures and datasets., Comment: 9 pages, 1 figure
Published: 2020

41. Incorporating a Local Translation Mechanism into Non-autoregressive Translation

Author: Kong, Xiang, Zhang, Zhisong, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: In this work, we introduce a novel local autoregressive translation (LAT) mechanism into non-autoregressive translation (NAT) models so as to capture local dependencies among target outputs. Specifically, for each target decoding position, instead of only one token, we predict a short sequence of tokens in an autoregressive way. We further design an efficient merging algorithm to align and merge the output pieces into one final output sequence. We integrate LAT into the conditional masked language model (CMLM; Ghazvininejad et al., 2019) and similarly adopt iterative decoding. Empirical results on five translation tasks show that compared with CMLM, our method achieves comparable or better performance with fewer decoding iterations, bringing a 2.5x speedup. Further analysis indicates that our method reduces repeated translations and performs better at longer sentences., Comment: EMNLP 2020
Published: 2020

42. Event-Related Bias Removal for Real-time Disaster Events

Author: Spiliopoulou, Evangelia, Maza, Salvador Medina, Hovy, Eduard, and Hauptmann, Alexander
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Social media has become an important tool to share information about crisis events such as natural disasters and mass attacks. Detecting actionable posts that contain useful information requires rapid analysis of huge volumes of data in real-time. This poses a complex problem due to the large number of posts that do not contain any actionable information. Furthermore, the classification of information in real-time systems requires training on out-of-domain data, as we do not have any data from a new emerging crisis. Prior work focuses on models pre-trained on similar event types. However, those models capture unnecessary event-specific biases, like the location of the event, which affect the generalizability and performance of the classifiers on new unseen data from an emerging new event. In our work, we train an adversarial neural model to remove latent event-specific biases and improve the performance on tweet importance classification., Comment: To appear in EMNLP Findings 2020
Published: 2020

43. A Dataset for Tracking Entities in Open Domain Procedural Text

Author: Tandon, Niket, Sakaguchi, Keisuke, Mishra, Bhavana Dalvi, Rajagopal, Dheeraj, Clark, Peter, Guerquin, Michal, Richardson, Kyle, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a small, pre-defined set of attributes (e.g., location), limiting their fidelity. Our solution is a new task formulation where given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary. Using crowdsourcing, we create OPENPI, a high-quality (91.5% coverage as judged by humans and completely vetted), and large-scale dataset comprising 29,928 state changes over 4,050 sentences from 810 procedural real-world paragraphs from WikiHow.com. A current state-of-the-art generation model on this task achieves 16.1% F1 based on BLEU metric, leaving enough room for novel model architectures., Comment: To appear in EMNLP 2020
Published: 2020

44. EIGEN: Event Influence GENeration using Pre-trained Language Models

Author: Madaan, Aman, Rajagopal, Dheeraj, Yang, Yiming, Ravichander, Abhilasha, Hovy, Eduard, and Prabhumoye, Shrimai
Subjects: Computer Science - Computation and Language
Abstract: Reasoning about events and tracking their influences is fundamental to understanding processes. In this paper, we present EIGEN - a method to leverage pre-trained language models to generate event influences conditioned on a context, nature of their influence, and the distance in a reasoning chain. We also derive a new dataset for research and evaluation of methods for event influence generation. EIGEN outperforms strong baselines both in terms of automated evaluation metrics (by 10 ROUGE points) and human judgments on closeness to reference and relevance of generations. Furthermore, we show that the event influences generated by EIGEN improve the performance on a "what-if" Question Answering (WIQA) benchmark (over 3% F1), especially for questions that require background knowledge and multi-hop reasoning.
Published: 2020

45. Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability

Author: Meng, Yuxian, Fan, Chun, Sun, Zijun, Hovy, Eduard, Wu, Fei, and Li, Jiwei
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Any prediction from a model is made by a combination of learning history and test stimuli. This provides significant insights for improving model interpretability: because of which part(s) of which training example(s), the model attends to which part(s) of a test example. Unfortunately, existing methods to interpret a model's predictions are only able to capture a single aspect of either test stimuli or learning history, and evidence from both is never combined or integrated. In this paper, we propose an efficient and differentiable approach to make it feasible to interpret a model's prediction by jointly examining training history and test stimuli. Test stimuli are first identified by gradient-based methods, signifying the part of a test example that the model attends to. The gradient-based saliency scores are then propagated to training examples using influence functions to identify which part(s) of which training example(s) make the model attend to the test stimuli. The system is differentiable and time efficient: the adoption of saliency scores from gradient-based methods allows us to efficiently trace a model's prediction through test stimuli, and then back to training examples through influence functions. We demonstrate that the proposed methodology offers clear explanations about neural model decisions, along with being useful for performing error analysis, crafting adversarial examples and fixing erroneously classified examples.
Published: 2020

46. Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task

Author: Kang, Dongyeop and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: Despite the recent success of contextualized language models on various NLP tasks, a language model by itself cannot capture the textual coherence of a long, multi-sentence document (e.g., a paragraph). Humans often make structural decisions on what to say and how to say it before making utterances. Guiding surface realization with such high-level decisions and structuring text in a coherent way is essentially called a planning process. Where can the model learn such high-level coherence? A paragraph itself contains various forms of inductive coherence signals called self-supervision in this work, such as sentence orders, topical keywords, rhetorical structures, and so on. Motivated by that, this work proposes a new paragraph completion task PARCOM; predicting masked sentences in a paragraph. However, the task suffers from predicting and selecting appropriate topical content with respect to the given context. To address that, we propose a self-supervised text planner SSPlanner that predicts what to say first (content prediction), then guides the pretrained language model (surface realization) using the predicted content. SSPlanner outperforms the baseline generation models on the paragraph completion task in both automatic and human evaluation. We also find that a combination of noun and verb types of keywords is the most effective for content selection. As more content keywords are provided, overall generation quality also increases., Comment: EMNLP 2020
Published: 2020

47. BERTering RAMS: What and How Much does BERT Already Know About Event Arguments? -- A Study on the RAMS Dataset

Author: Gangal, Varun and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Using the attention map based probing framework from (Clark et al., 2019), we observe that, on the RAMS dataset (Ebner et al., 2020), BERT's attention heads have modest but well above-chance ability to spot event arguments sans any training or domain finetuning, varying from a low of 17.77% for Place to a high of 51.61% for Artifact. Next, we find that linear combinations of these heads, estimated with approximately 11% of available total event argument detection supervision, can push performance well higher for some roles, the highest two being Victim (68.29% Accuracy) and Artifact (58.82% Accuracy). Furthermore, we investigate how well our methods do for cross-sentence event arguments. We propose a procedure to isolate "best heads" for cross-sentence argument detection separately from those for intra-sentence arguments. The heads thus estimated have superior cross-sentence performance compared to their jointly estimated equivalents, albeit only under the unrealistic assumption that we already know the argument is present in another sentence. Lastly, we seek to isolate to what extent our numbers stem from lexical frequency based associations between gold arguments and roles. We propose NONCE, a scheme to create adversarial test examples by replacing gold arguments with randomly generated "nonce" words. We find that learnt linear combinations are robust to NONCE, though individual best heads can be more sensitive.
Comment: Accepted for the BlackBoxNLP 2020 Workshop @EMNLP 2020; Pre-camera-ready copy
Published: 2020

48. Detecting Attackable Sentences in Arguments

Author: Jo, Yohan, Bang, Seojin, Manzoor, Emaad, Hovy, Eduard, and Reed, Chris
Subjects: Computer Science - Computation and Language
Abstract: Finding attackable sentences in an argument is the first step toward successful refutation in argumentation. We present a first large-scale analysis of sentence attackability in online arguments. We analyze driving reasons for attacks in argumentation and identify relevant characteristics of sentences. We demonstrate that a sentence's attackability is associated with many of these characteristics regarding the sentence's content, proposition types, and tone, and that an external knowledge source can provide useful information about attackability. Building on these findings, we demonstrate that machine learning models can automatically detect attackable sentences in arguments, significantly better than several baselines and comparably well to laypeople.
Comment: EMNLP 2020
Published: 2020

49. Extracting Implicitly Asserted Propositions in Argumentation

Author: Jo, Yohan, Visser, Jacky, Reed, Chris, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: Argumentation accommodates various rhetorical devices, such as questions, reported speech, and imperatives. These rhetorical tools usually assert argumentatively relevant propositions rather implicitly, so understanding their true meaning is key to understanding certain arguments properly. However, most argument mining systems and computational linguistics research have paid little attention to implicitly asserted propositions in argumentation. In this paper, we examine a wide range of computational methods for extracting propositions that are implicitly asserted in questions, reported speech, and imperatives in argumentation. By evaluating the models on a corpus of 2016 U.S. presidential debates and online commentary, we demonstrate the effectiveness and limitations of the computational models. Our study may inform future research on argument mining and the semantics of these rhetorical devices in argumentation.
Comment: EMNLP 2020
Published: 2020

50. GenAug: Data Augmentation for Finetuning Text Generators

Author: Feng, Steven Y., Gangal, Varun, Kang, Dongyeop, Mitamura, Teruko, and Hovy, Eduard
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for finetuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We utilize several metrics that evaluate important aspects of the generated text including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that the quality of generations improves to a peak at approximately three times the amount of original data.
Comment: EMNLP 2020 Deep Learning Inside Out (DeeLIO) Workshop; Code available at https://github.com/styfeng/GenAug
Published: 2020
