Author: "Prabhumoye, Shrimai" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Prabhumoye, Shrimai"' showing total 117 results

Start Over Author "Prabhumoye, Shrimai"

117 results on '"Prabhumoye, Shrimai"'

1. MIND: Math Informed syNthetic Dialogues for Pretraining LLMs

Author: Akter, Syeda Nahida, Prabhumoye, Shrimai, Kamalu, John, Satheesh, Sanjeev, Nyberg, Eric, Patwary, Mostofa, Shoeybi, Mohammad, and Catanzaro, Bryan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The utility of synthetic data to enhance pretraining data quality and hence to improve downstream task accuracy has been widely explored in recent large language models (LLMs). Yet, these approaches fall inadequate in complex, multi-hop and mathematical reasoning tasks as the synthetic data typically fails to add complementary knowledge to the existing raw corpus. In this work, we propose a novel large-scale and diverse Math Informed syNthetic Dialogue (MIND) generation method that improves the mathematical reasoning ability of LLMs. Specifically, using MIND, we generate synthetic conversations based on OpenWebMath (OWM), resulting in a new math corpus, MIND-OWM. Our experiments with different conversational settings reveal that incorporating knowledge gaps between dialog participants is essential for generating high-quality math data. We further identify an effective way to format and integrate synthetic and raw data during pretraining to maximize the gain in mathematical reasoning, emphasizing the need to restructure raw data rather than use it as-is. Compared to pretraining just on raw data, a model pretrained on MIND-OWM shows significant boost in mathematical reasoning (GSM8K: +13.42%, MATH: +2.30%), including superior performance in specialized knowledge (MMLU: +4.55%, MMLU-STEM: +4.28%) and general purpose reasoning tasks (GENERAL REASONING: +2.51%)., Comment: 31 pages, 5 figures, 14 tables
Published: 2024

2. Data, Data Everywhere: A Guide for Pretraining Dataset Construction

Author: Parmar, Jupinder, Prabhumoye, Shrimai, Jennings, Joseph, Liu, Bo, Jhunjhunwala, Aastha, Wang, Zhilin, Patwary, Mostofa, Shoeybi, Mohammad, and Catanzaro, Bryan
Subjects: Computer Science - Computation and Language
Abstract: The impressive capabilities of recent language models can be largely attributed to the multi-trillion token pretraining datasets that they are trained on. However, model developers fail to disclose their construction methodology which has lead to a lack of open information on how to develop effective pretraining sets. To address this issue, we perform the first systematic study across the entire pipeline of pretraining set construction. First, we run ablations on existing techniques for pretraining set development to identify which methods translate to the largest gains in model accuracy on downstream evaluations. Then, we categorize the most widely used data source, web crawl snapshots, across the attributes of toxicity, quality, type of speech, and domain. Finally, we show how such attribute information can be used to further refine and improve the quality of a pretraining set. These findings constitute an actionable set of steps that practitioners can use to develop high quality pretraining sets., Comment: Accepted as an oral presentation at EMNLP 2024
Published: 2024

3. Nemotron-4 340B Technical Report

Author: Nvidia, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, and Zhu, Chen
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
Published: 2024

4. AgentKit: Structured LLM Reasoning with Dynamic Graphs

Author: Wu, Yue, Fan, Yewen, Min, So Yeon, Prabhumoye, Shrimai, McAleer, Stephen, Bisk, Yonatan, Salakhutdinov, Ruslan, Li, Yuanzhi, and Mitchell, Tom
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offers a unified framework for explicitly constructing a complex "thought process" from simple natural language prompts. The basic building block in AgentKit is a node, containing a natural language prompt for a specific subtask. The user then puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process". For example, for the task of writing a paper, one may start with the thought process of 1) identify a core message, 2) identify prior research gaps, etc. The nodes in AgentKit can be designed and combined in different ways to implement multiple advanced capabilities including on-the-fly hierarchical planning, reflection, and learning from interactions. In addition, due to the modular nature and the intuitive design to simulate explicit human thought process, a basic agent could be implemented as simple as a list of prompts for the subtasks and therefore could be designed and tuned by someone without any programming experience. Quantitatively, we show that agents designed through AgentKit achieve SOTA performance on WebShop and Crafter. These advances underscore AgentKit's potential in making LLM agents effective and accessible for a wider range of applications. https://github.com/holmeswww/AgentKit
Published: 2024

5. Nemotron-4 15B Technical Report

Author: Parmar, Jupinder, Prabhumoye, Shrimai, Jennings, Joseph, Patwary, Mostofa, Subramanian, Sandeep, Su, Dan, Zhu, Chen, Narayanan, Deepak, Jhunjhunwala, Aastha, Dattagupta, Ayush, Jawa, Vibhu, Liu, Jiwei, Mahabaleshwarkar, Ameya, Nitski, Osvald, Brundyn, Annika, Maki, James, Martinez, Miguel, You, Jiaxuan, Kamalu, John, LeGresley, Patrick, Fridman, Denys, Casper, Jared, Aithal, Ashwath, Kuchaiev, Oleksii, Shoeybi, Mohammad, Cohen, Jonathan, and Catanzaro, Bryan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.
Published: 2024

6. SPRING: Studying the Paper and Reasoning to Play Games

Author: Wu, Yue, Prabhumoye, Shrimai, Min, So Yeon, Bisk, Yonatan, Salakhutdinov, Ruslan, Azaria, Amos, Mitchell, Tom, and Li, Yuanzhi
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Open-world survival games pose significant challenges for AI algorithms due to their multi-tasking, deep exploration, and goal prioritization requirements. Despite reinforcement learning (RL) being popular for solving games, its high sample complexity limits its effectiveness in complex open-world games like Crafter or Minecraft. We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM). Prompted with the LaTeX source as game context and a description of the agent's current observation, our SPRING framework employs a directed acyclic graph (DAG) with game-related questions as nodes and dependencies as edges. We identify the optimal action to take in the environment by traversing the DAG and calculating LLM responses for each node in topological order, with the LLM's answer to final node directly translating to environment actions. In our experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training. Finally, we show the potential of games as a test bed for LLMs.
Published: 2023

7. Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

Author: Wu, Yue, Min, So Yeon, Bisk, Yonatan, Salakhutdinov, Ruslan, Azaria, Amos, Li, Yuanzhi, Mitchell, Tom, and Prabhumoye, Shrimai
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged LLM's ability to generate abstract plans to simplify challenging control tasks, either by action scoring, or action modeling (fine-tuning). However, the transformer architecture inherits several constraints that make it difficult for the LLM to directly serve as the agent: e.g. limited input lengths, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose to instead use the knowledge in LLMs to simplify the control problem, rather than solving it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out irrelevant objects and receptacles from the observation for the current sub-task. Finally, the Track module determines whether the agent has accomplished each sub-task. On the AlfWorld instruction following benchmark, the PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
Published: 2023

8. Self-Refine: Iterative Refinement with Self-Feedback

Author: Madaan, Aman, Tandon, Niket, Gupta, Prakhar, Hallinan, Skyler, Gao, Luyu, Wiegreffe, Sarah, Alon, Uri, Dziri, Nouha, Prabhumoye, Shrimai, Yang, Yiming, Gupta, Shashank, Majumder, Bodhisattwa Prasad, Hermann, Katherine, Welleck, Sean, Yazdanbakhsh, Amir, and Clark, Peter
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLMs; then, the same LLMs provides feedback for its output and uses it to refine itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner, and feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs. Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by ~20% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach., Comment: Code, data, and demo at https://selfrefine.info/
Published: 2023

9. Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

Author: Prabhumoye, Shrimai, Patwary, Mostofa, Shoeybi, Mohammad, and Catanzaro, Bryan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks. However, safely deploying them in real world applications is challenging because they generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility. Our two strategies are: (1) MEDA: adds raw toxicity score as meta-data to the pretraining samples, and (2) INST: adds instructions to those samples indicating their toxicity. Our results indicate that our best performing strategy (INST) substantially reduces the toxicity probability up to 61% while preserving the accuracy on five benchmark NLP tasks as well as improving AUC scores on four bias detection tasks by 1.3%. We also demonstrate the generalizability of our techniques by scaling the number of training samples and the number of model parameters., Comment: This paper will be presented at EACL 2023
Published: 2023

10. BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models

Author: Kocielnik, Rafal, Prabhumoye, Shrimai, Zhang, Vivian, Jiang, Roy, Alvarez, R. Michael, and Anandkumar, Anima
Subjects: Computer Science - Computation and Language, Computer Science - Computers and Society, 68T50, I.2.7, J.5, K.4.1
Abstract: Pretrained Language Models (PLMs) harbor inherent social biases that can result in harmful real-world implications. Such social biases are measured through the probability values that PLMs output for different social groups and attributes appearing in a set of test sentences. However, bias testing is currently cumbersome since the test sentences are generated either from a limited set of manual templates or need expensive crowd-sourcing. We instead propose using ChatGPT for the controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes appearing in the test sentences. When compared to template-based methods, our approach using ChatGPT for test sentence generation is superior in detecting social bias, especially in challenging settings such as intersectional biases. We present an open-source comprehensive bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing. User testing with domain experts from various fields has shown their interest in being able to test modern AI for social biases. Our tool has significantly improved their awareness of such biases in PLMs, proving to be learnable and user-friendly. We thus enable seamless open-ended social bias testing of PLMs by domain experts through an automatic large-scale generation of diverse test sentences for any combination of social categories and attributes.
Published: 2023

11. Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions

Author: Kocielnik, Rafal, Kangaslahti, Sara, Prabhumoye, Shrimai, Hari, Meena, Alvarez, R. Michael, and Anandkumar, Anima
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Labeling social-media data for custom dimensions of toxicity and social bias is challenging and labor-intensive. Existing transfer and active learning approaches meant to reduce annotation effort require fine-tuning, which suffers from over-fitting to noise and can cause domain shift with small sample sizes. In this work, we propose a novel Active Transfer Few-shot Instructions (ATF) approach which requires no fine-tuning. ATF leverages the internal linguistic knowledge of pre-trained language models (PLMs) to facilitate the transfer of information from existing pre-labeled datasets (source-domain task) with minimum labeling effort on unlabeled target data (target-domain task). Our strategy can yield positive transfer achieving a mean AUC gain of 10.5% compared to no transfer with a large 22b parameter PLM. We further show that annotation of just a few target-domain samples via active learning can be beneficial for transfer, but the impact diminishes with more annotation effort (26% drop in gain between 100 and 2000 annotated examples). Finally, we find that not all transfer scenarios yield a positive gain, which seems related to the PLMs initial performance on the target-domain task., Comment: Accepted to NeurIPS Workshop on Transfer Learning for Natural Language Processing, 2022, New Orleans
Published: 2022

12. Evaluating Parameter Efficient Learning for Generation

Author: Xu, Peng, Patwary, Mostofa, Prabhumoye, Shrimai, Adams, Virginia, Prenger, Ryan J., Ping, Wei, Lee, Nayeon, Shoeybi, Mohammad, and Catanzaro, Bryan
Subjects: Computer Science - Computation and Language
Abstract: Parameter efficient learning methods (PERMs) have recently gained significant attention as they provide an efficient way for pre-trained language models (PLMs) to adapt to a downstream task. However, these conclusions are mostly drawn from in-domain evaluations over the full training set. In this paper, we present comparisons between PERMs and finetuning from three new perspectives: (1) the effect of sample and model size to in-domain evaluations, (2) generalization to unseen domains and new datasets, and (3) the faithfulness of generations. Our results show that for in-domain settings (a) there is a cross point of sample size for which PERMs will perform better than finetuning when training with fewer samples, and (b) larger PLMs have larger cross points. For cross-domain and cross-dataset cases, we show that (a) Adapter (Houlsby et al., 2019) performs the best amongst all the PERMs studied here, and (b) it outperforms finetuning if the task dataset is below a certain size. We also compare the faithfulness of generations and show that PERMs can achieve better faithfulness score than finetuning, especially for small training set, by as much as 6%. Finally, we apply Adapter to MT-NLG 530b (Smith et al., 2022) and achieve new state-of-the-art results on Xsum (Narayan et al., 2018) for all ROUGE scores (ROUGE-1 49.17, ROUGE-2 27.20, ROUGE-L 40.98)., Comment: Accepted to EMNLP 2022 main conference
Published: 2022

13. Context Generation Improves Open Domain Question Answering

Author: Su, Dan, Patwary, Mostofa, Prabhumoye, Shrimai, Xu, Peng, Prenger, Ryan, Shoeybi, Mohammad, Fung, Pascale, Anandkumar, Anima, and Catanzaro, Bryan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Closed-book question answering (QA) requires a model to directly answer an open-domain question without access to any external knowledge. Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge. However, they do not fully exploit the parameterized knowledge. To address this issue, we propose a two-stage, closed-book QA framework which employs a coarse-to-fine approach to extract relevant knowledge and answer a question. Our approach first generates a related context for a given question by prompting a pretrained LM. We then prompt the same LM for answer prediction using the generated context and the question. Additionally, to eliminate failure caused by context uncertainty, we marginalize over generated contexts. Experimental results on three QA benchmarks show that our method significantly outperforms previous closed-book QA methods (e.g. exact matching 68.6% vs. 55.3%), and is on par with open-book methods that exploit external knowledge sources (e.g. 68.6% vs. 68.0%). Our method is able to better exploit the stored knowledge in pretrained LMs without adding extra learnable parameters or needing finetuning, and paves the way for hybrid models that integrate pretrained LMs with external knowledge., Comment: 8 pages; Accepted at EACL2023
Published: 2022

14. Multi-Stage Prompting for Knowledgeable Dialogue Generation

Author: Liu, Zihan, Patwary, Mostofa, Prenger, Ryan, Prabhumoye, Shrimai, Ping, Wei, Shoeybi, Mohammad, and Catanzaro, Bryan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Existing knowledge-grounded dialogue systems typically use finetuned versions of a pretrained language model (LM) and large-scale knowledge bases. These models typically fail to generalize on topics outside of the knowledge base, and require maintaining separate potentially large checkpoints each time finetuning is needed. In this paper, we aim to address these limitations by leveraging the inherent knowledge stored in the pretrained LM as well as its powerful generation ability. We propose a multi-stage prompting approach to generate knowledgeable responses from a single pretrained LM. We first prompt the LM to generate knowledge based on the dialogue context. Then, we further prompt it to generate responses based on the dialogue context and the previously generated knowledge. Results show that our knowledge generator outperforms the state-of-the-art retrieval-based model by 5.8% when combining knowledge relevance and correctness. In addition, our multi-stage prompting outperforms the finetuning-based dialogue model in terms of response knowledgeability and engagement by up to 10% and 5%, respectively. Furthermore, we scale our model up to 530 billion parameters and show that larger LMs improve the generation correctness score by up to 10%, and response relevance, knowledgeability and engagement by up to 10%. Our code is available at: https://github.com/NVIDIA/Megatron-LM.
Published: 2022

15. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Author: Smith, Shaden, Patwary, Mostofa, Norick, Brandon, LeGresley, Patrick, Rajbhandari, Samyam, Casper, Jared, Liu, Zhun, Prabhumoye, Shrimai, Zerveas, George, Korthikanti, Vijay, Zhang, Elton, Child, Rewon, Aminabadi, Reza Yazdani, Bernauer, Julie, Song, Xia, Shoeybi, Mohammad, He, Yuxiong, Houston, Michael, Tiwary, Saurabh, and Catanzaro, Bryan
Subjects: Computer Science - Computation and Language
Abstract: Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success, the size of these models has increased rapidly, requiring high-performance hardware, software, and algorithmic techniques to enable training such large models. As the result of a joint effort between Microsoft and NVIDIA, we present details on the training of the largest monolithic transformer based language model, Megatron-Turing NLG 530B (MT-NLG), with 530 billion parameters. In this paper, we first focus on the infrastructure as well as the 3D parallelism methodology used to train this model using DeepSpeed and Megatron. Next, we detail the training process, the design of our training corpus, and our data curation techniques, which we believe is a key ingredient to the success of the model. Finally, we discuss various evaluation results, as well as other interesting observations and new properties exhibited by MT-NLG. We demonstrate that MT-NLG achieves superior zero-, one-, and few-shot learning accuracies on several NLP benchmarks and establishes new state-of-the-art results. We believe that our contributions will help further the development of large-scale training infrastructures, large-scale language models, and natural language generations., Comment: Shaden Smith and Mostofa Patwary contributed equally
Published: 2022

16. Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

Author: Prabhumoye, Shrimai, Kocielnik, Rafal, Shoeybi, Mohammad, Anandkumar, Anima, and Catanzaro, Bryan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Detecting social bias in text is challenging due to nuance, subjectivity, and difficulty in obtaining good quality labeled datasets at scale, especially given the evolving nature of social biases and society. To address these challenges, we propose a few-shot instruction-based method for prompting pre-trained language models (LMs). We select a few class-balanced exemplars from a small support repository that are closest to the query to be labeled in the embedding space. We then provide the LM with instruction that consists of this subset of labeled exemplars, the query text to be classified, a definition of bias, and prompt it to make a decision. We demonstrate that large LMs used in a few-shot context can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models. We observe that the largest 530B parameter model is significantly more effective in detecting social bias compared to smaller models (achieving at least 13% improvement in AUC metric compared to other models). It also maintains a high AUC (dropping less than 2%) when the labeled repository is reduced to as few as $100$ samples. Large pretrained language models thus make it easier and quicker to build new bias detectors., Comment: Submission revised with new results
Published: 2021

17. Focused Attention Improves Document-Grounded Generation

Author: Prabhumoye, Shrimai, Hashimoto, Kazuma, Zhou, Yingbo, Black, Alan W, and Salakhutdinov, Ruslan
Subjects: Computer Science - Computation and Language
Abstract: Document grounded generation is the task of using the information provided in a document to improve text generation. This work focuses on two different document grounded generation tasks: Wikipedia Update Generation task and Dialogue response generation. Our work introduces two novel adaptations of large scale pre-trained encoder-decoder models focusing on building context driven representation of the document and enabling specific attention to the information in the document. Additionally, we provide a stronger BART baseline for these tasks. Our proposed techniques outperform existing methods on both automated (at least 48% increase in BLEU-4 points) and human evaluation for closeness to reference and relevance to the document. Furthermore, we perform comprehensive manual inspection of the generated output and categorize errors to provide insights into future directions in modeling these tasks., Comment: Accepted at North American Chapter of the Association for Computational Linguistics (NAACL) 2021
Published: 2021

18. CURIE: An Iterative Querying Approach for Reasoning About Situations

Author: Rajagopal, Dheeraj, Madaan, Aman, Tandon, Niket, Yang, Yiming, Prabhumoye, Shrimai, Ravichander, Abhilasha, Clark, Peter, and Hovy, Eduard
Subjects: Computer Science - Computation and Language
Abstract: Recently, models have been shown to predict the effects of unexpected situations, e.g., would cloudy skies help or hinder plant growth? Given a context, the goal of such situational reasoning is to elicit the consequences of a new situation (st) that arises in that context. We propose a method to iteratively build a graph of relevant consequences explicitly in a structured situational graph (st-graph) using natural language queries over a finetuned language model (M). Across multiple domains, CURIE generates st-graphs that humans find relevant and meaningful in eliciting the consequences of a new situation. We show that st-graphs generated by CURIE improve a situational reasoning end task (WIQA-QA) by 3 points on accuracy by simply augmenting their input with our generated situational graphs, especially for a hard subset that requires background knowledge and multi-hop reasoning., Comment: This paper builds upon EIGEN (arXiv:2010.11764) and proposes a general framework for situational reasoning
Published: 2021

19. EIGEN: Event Influence GENeration using Pre-trained Language Models

Author: Madaan, Aman, Rajagopal, Dheeraj, Yang, Yiming, Ravichander, Abhilasha, Hovy, Eduard, and Prabhumoye, Shrimai
Subjects: Computer Science - Computation and Language
Abstract: Reasoning about events and tracking their influences is fundamental to understanding processes. In this paper, we present EIGEN - a method to leverage pre-trained language models to generate event influences conditioned on a context, nature of their influence, and the distance in a reasoning chain. We also derive a new dataset for research and evaluation of methods for event influence generation. EIGEN outperforms strong baselines both in terms of automated evaluation metrics (by 10 ROUGE points) and human judgments on closeness to reference and relevance of generations. Furthermore, we show that the event influences generated by EIGEN improve the performance on a "what-if" Question Answering (WIQA) benchmark (over 3% F1), especially for questions that require background knowledge and multi-hop reasoning.
Published: 2020

20. Case Study: Deontological Ethics in NLP

Author: Prabhumoye, Shrimai, Boldt, Brendon, Salakhutdinov, Ruslan, and Black, Alan W
Subjects: Computer Science - Computation and Language
Abstract: Recent work in natural language processing (NLP) has focused on ethical challenges such as understanding and mitigating bias in data and algorithms; identifying objectionable content like hate speech, stereotypes and offensive language; and building frameworks for better system design and data handling practices. However, there has been little discussion about the ethical foundations that underlie these efforts. In this work, we study one ethical theory, namely deontological ethics, from the perspective of NLP. In particular, we focus on the generalization principle and the respect for autonomy through informed consent. We provide four case studies to demonstrate how these principles can be used with NLP systems. We also recommend directions to avoid the ethical issues in these systems., Comment: Accepted at North American Chapter of the Association for Computational Linguistics (NAACL) 2021
Published: 2020

21. Exploring Controllable Text Generation Techniques

Author: Prabhumoye, Shrimai, Black, Alan W, and Salakhutdinov, Ruslan
Subjects: Computer Science - Computation and Language
Abstract: Neural controllable text generation is an important area gaining attention due to its plethora of applications. Although there is a large body of prior work in controllable text generation, there is no unifying theme. In this work, we provide a new schema of the pipeline of the generation process by classifying it into five modules. The control of attributes in the generation process requires modification of these modules. We present an overview of different techniques used to perform the modulation of these modules. We also provide an analysis on the advantages and disadvantages of these techniques. We further pave ways to develop new architectures based on the combination of the modules described in this paper., Comment: Will be published at COLING 2020
Published: 2020

22. Topological Sort for Sentence Ordering

Author: Prabhumoye, Shrimai, Salakhutdinov, Ruslan, and Black, Alan W
Subjects: Computer Science - Computation and Language
Abstract: Sentence ordering is the task of arranging the sentences of a given text in the correct order. Recent work using deep neural networks for this task has framed it as a sequence prediction problem. In this paper, we propose a new framing of this task as a constraint solving problem and introduce a new technique to solve it. Additionally, we propose a human evaluation for this task. The results on both automatic and human metrics across four different datasets show that this new technique is better at capturing coherence in documents., Comment: Will be published at the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) 2020
Published: 2020

23. Politeness Transfer: A Tag and Generate Approach

Author: Madaan, Aman, Setlur, Amrith, Parekh, Tanmay, Poczos, Barnabas, Neubig, Graham, Yang, Yiming, Salakhutdinov, Ruslan, Black, Alan W, and Prabhumoye, Shrimai
Subjects: Computer Science - Computation and Language
Abstract: This paper introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning. We also provide a dataset of more than 1.39 instances automatically labeled for politeness to encourage benchmark evaluations on this new task. We design a tag and generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style while preserving most of the source content. For politeness as well as five other transfer tasks, our model outperforms the state-of-the-art methods on automatic metrics for content preservation, with a comparable or better performance on style transfer accuracy. Additionally, our model surpasses existing methods on human evaluations for grammaticality, meaning preservation and transfer accuracy across all the six style transfer tasks. The data and code is located at https://github.com/tag-and-generate., Comment: To appear at ACL 2020
Published: 2020

24. I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

Author: Prabhumoye, Shrimai, Li, Margaret, Urbanek, Jack, Dinan, Emily, Kiela, Douwe, Weston, Jason, and Szlam, Arthur
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Dialogue research tends to distinguish between chit-chat and goal-oriented tasks. While the former is arguably more naturalistic and has a wider use of language, the latter has clearer metrics and a straightforward learning signal. Humans effortlessly combine the two, for example engaging in chit-chat with the goal of exchanging information or eliciting a specific response. Here, we bridge the divide between these two domains in the setting of a rich multi-player text-based fantasy environment where agents and humans engage in both actions and dialogue. Specifically, we train a goal-oriented model with reinforcement learning against an imitation-learned ``chit-chat'' model with two approaches: the policy either learns to pick a topic or learns to pick an utterance given the top-K utterances from the chit-chat model. We show that both models outperform an inverse model baseline and can converse naturally with their dialogue partner in order to achieve goals.
Published: 2020

25. Modeling Product Search Relevance in e-Commerce

Author: Iyer, Rahul Radhakrishnan, Kohli, Rohan, and Prabhumoye, Shrimai
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: With the rapid growth of e-Commerce, online product search has emerged as a popular and effective paradigm for customers to find desired products and engage in online shopping. However, there is still a big gap between the products that customers really desire to purchase and relevance of products that are suggested in response to a query from the customer. In this paper, we propose a robust way of predicting relevance scores given a search query and a product, using techniques involving machine learning, natural language processing and information retrieval. We compare conventional information retrieval models such as BM25 and Indri with deep learning models such as word2vec, sentence2vec and paragraph2vec. We share some of our insights and findings from our experiments., Comment: 11 pages, 3 figures
Published: 2020

26. Generating Interactive Worlds with Text

Author: Fan, Angela, Urbanek, Jack, Ringshia, Pratik, Dinan, Emily, Qian, Emma, Karamcheti, Siddharth, Prabhumoye, Shrimai, Kiela, Douwe, Rocktaschel, Tim, Szlam, Arthur, and Weston, Jason
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Procedurally generating cohesive and interesting game environments is challenging and time-consuming. In order for the relationships between the game elements to be natural, common-sense has to be encoded into arrangement of the elements. In this work, we investigate a machine learning approach for world creation using content from the multi-player text adventure game environment LIGHT. We introduce neural network based models to compositionally arrange locations, characters, and objects into a coherent whole. In addition to creating worlds based on existing elements, our models can generate new game content. Humans can also leverage our models to interactively aid in worldbuilding. We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators compared to other machine learning based world construction algorithms.
Published: 2019

27. Principled Frameworks for Evaluating Ethics in NLP Systems

Author: Prabhumoye, Shrimai, Mayfield, Elijah, and Black, Alan W
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We critique recent work on ethics in natural language processing. Those discussions have focused on data collection, experimental design, and interventions in modeling. But we argue that we ought to first understand the frameworks of ethics that are being used to evaluate the fairness and justice of algorithmic systems. Here, we begin that discussion by outlining deontological ethics, and envision a research agenda prioritized by it.
Published: 2019

28. 'My Way of Telling a Story': Persona based Grounded Story Generation

Author: Prabhumoye, Shrimai, Chandu, Khyathi Raghavi, Salakhutdinov, Ruslan, and Black, Alan W
Subjects: Computer Science - Computation and Language
Abstract: Visual storytelling is the task of generating stories based on a sequence of images. Inspired by the recent works in neural generation focusing on controlling the form of text, this paper explores the idea of generating these stories in different personas. However, one of the main challenges of performing this task is the lack of a dataset of visual stories in different personas. Having said that, there are independent datasets for both visual storytelling and annotated sentences for various persona. In this paper we describe an approach to overcome this by getting labelled persona data from a different task and leveraging those annotations to perform persona based story generation. We inspect various ways of incorporating personality in both the encoder and the decoder representations to steer the generation in the target direction. To this end, we propose five models which are incremental extensions to the baseline model to perform the task at hand. In our experiments we use five different personas to guide the generation process. We find that the models based on our hypotheses perform better at capturing words while generating stories in the target persona.
Published: 2019

29. Towards Content Transfer through Grounded Text Generation

Author: Prabhumoye, Shrimai, Quirk, Chris, and Galley, Michel
Subjects: Computer Science - Computation and Language
Abstract: Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness. However, there has been less work on controlling neural text generation for content. This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. Our experiments on Wikipedia data show significant improvements against competitive baselines. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task.
Published: 2019

30. The Second Conversational Intelligence Challenge (ConvAI2)

Author: Dinan, Emily, Logacheva, Varvara, Malykh, Valentin, Miller, Alexander, Shuster, Kurt, Urbanek, Jack, Kiela, Douwe, Szlam, Arthur, Serban, Iulian, Lowe, Ryan, Prabhumoye, Shrimai, Black, Alan W, Rudnicky, Alexander, Williams, Jason, Pineau, Joelle, Burtsev, Mikhail, and Weston, Jason
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction
Abstract: We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics like perplexity to measure the performance across sequences of utterances (conversations) -- in terms of repetition, consistency and balance of dialogue acts (e.g. how many questions asked vs. answered).
Published: 2019

31. A Dataset for Document Grounded Conversations

Author: Zhou, Kangyan, Prabhumoye, Shrimai, and Black, Alan W
Subjects: Computer Science - Computation and Language
Abstract: This paper introduces a document grounded dataset for text conversations. We define "Document Grounded Conversations" as conversations that are about the contents of a specified document. In this dataset the specified documents were Wikipedia articles about popular movies. The dataset contains 4112 conversations with an average of 21.43 turns per conversation. This positions this dataset to not only provide a relevant chat history while generating responses but also provide a source of information that the models could use. We describe two neural architectures that provide benchmark performance on the task of generating the next response. We also evaluate our models for engagement and fluency, and find that the information from the document helps in generating more engaging and fluent responses.
Published: 2018

32. Style Transfer Through Multilingual and Feedback-Based Back-Translation

Author: Prabhumoye, Shrimai, Tsvetkov, Yulia, Black, Alan W, and Salakhutdinov, Ruslan
Subjects: Computer Science - Computation and Language
Abstract: Style transfer is the task of transferring an attribute of a sentence (e.g., formality) while maintaining its semantic content. The key challenge in style transfer is to strike a balance between the competing goals, one to preserve meaning and the other to improve the style transfer accuracy. Prior research has identified that the task of meaning preservation is generally harder to attain and evaluate. This paper proposes two extensions of the state-of-the-art style transfer models aiming at improving the meaning preservation in style transfer. Our evaluation shows that these extensions help to ground meaning better while improving the transfer accuracy.
Published: 2018

33. Style Transfer Through Back-Translation

Author: Prabhumoye, Shrimai, Tsvetkov, Yulia, Salakhutdinov, Ruslan, and Black, Alan W
Subjects: Computer Science - Computation and Language
Abstract: Style transfer is the task of rephrasing the text to contain specific stylistic properties without changing the intent or affect within the context. This paper introduces a new method for automatic style transfer. We first learn a latent representation of the input sentence which is grounded in a language translation model in order to better preserve the meaning of the sentence while reducing stylistic properties. Then adversarial generation techniques are used to make the output match the desired style. We evaluate this technique on three different style transformations: sentiment, gender and political slant. Compared to two state-of-the-art style transfer modeling techniques we show improvements both in automatic evaluation of style transfer and in manual evaluation of meaning preservation and fluency., Comment: Accepted at ACL 2018
Published: 2018

34. Linguistic Markers of Influence in Informal Interactions

Author: Prabhumoye, Shrimai, Choudhary, Samridhi, Spiliopoulou, Evangelia, Bogart, Christopher, Rose, Carolyn Penstein, and Black, Alan W
Subjects: Computer Science - Computation and Language, Computer Science - Social and Information Networks
Abstract: There has been a long standing interest in understanding `Social Influence' both in Social Sciences and in Computational Linguistics. In this paper, we present a novel approach to study and measure interpersonal influence in daily interactions. Motivated by the basic principles of influence, we attempt to identify indicative linguistic features of the posts in an online knitting community. We present the scheme used to operationalize and label the posts with indicator features. Experiments with the identified features show an improvement in the classification accuracy of influence by 3.15%. Our results illustrate the important correlation between the characteristics of the language and its potential to influence others., Comment: 10 pages, Accepted in NLP+CSS workshop for ACL (Association for Computational Linguistics) 2017
Published: 2017

35. The Second Conversational Intelligence Challenge (ConvAI2)

Author: Dinan, Emily, Logacheva, Varvara, Malykh, Valentin, Miller, Alexander, Shuster, Kurt, Urbanek, Jack, Kiela, Douwe, Szlam, Arthur, Serban, Iulian, Lowe, Ryan, Prabhumoye, Shrimai, Black, Alan W., Rudnicky, Alexander, Williams, Jason, Pineau, Joelle, Burtsev, Mikhail, Weston, Jason, Escalante, Hugo Jair, Series Editor, Guyon, Isabelle, Series Editor, Escalera, Sergio, Series Editor, and Herbrich, Ralf, editor
Published: 2020
Full Text: View/download PDF

36. AgentKit: Flow Engineering with Graphs, not Coding

Author: Wu, Yue, Fan, Yewen, Min, So Yeon, Prabhumoye, Shrimai, McAleer, Stephen, Bisk, Yonatan, Salakhutdinov, Ruslan, Li, Yuanzhi, Mitchell, Tom, Wu, Yue, Fan, Yewen, Min, So Yeon, Prabhumoye, Shrimai, McAleer, Stephen, Bisk, Yonatan, Salakhutdinov, Ruslan, Li, Yuanzhi, and Mitchell, Tom
Abstract: We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offers a unified framework for explicitly constructing a complex "thought process" from simple natural language prompts. The basic building block in AgentKit is a node, containing a natural language prompt for a specific subtask. The user then puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process". For example, for the task of writing a paper, one may start with the thought process of 1) identify a core message, 2) identify prior research gaps, etc. The nodes in AgentKit can be designed and combined in different ways to implement multiple advanced capabilities including on-the-fly hierarchical planning, reflection, and learning from interactions. In addition, due to the modular nature and the intuitive design to simulate explicit human thought process, a basic agent could be implemented as simple as a list of prompts for the subtasks and therefore could be designed and tuned by someone without any programming experience. Quantitatively, we show that agents designed through AgentKit achieve SOTA performance on WebShop and Crafter. These advances underscore AgentKit's potential in making LLM agents effective and accessible for a wider range of applications. https://github.com/holmeswww/AgentKit
Published: 2024

37. Introduction to NIPS 2017 Competition Track

Author: Escalera, Sergio, Weimer, Markus, Burtsev, Mikhail, Malykh, Valentin, Logacheva, Varvara, Lowe, Ryan, Serban, Iulian Vlad, Bengio, Yoshua, Rudnicky, Alexander, Black, Alan W., Prabhumoye, Shrimai, Kidziński, Łukasz, Mohanty, Sharada Prasanna, Ong, Carmichael F., Hicks, Jennifer L., Levine, Sergey, Salathé, Marcel, Delp, Scott, Huerga, Iker, Grigorenko, Alexander, Thorbergsson, Leifur, Das, Anasuya, Nemitz, Kyla, Sandker, Jenna, King, Stephen, Ecker, Alexander S., Gatys, Leon A., Bethge, Matthias, Boyd-Graber, Jordan, Feng, Shi, Rodriguez, Pedro, Iyyer, Mohit, He, He, Daumé, Hal, III, McGregor, Sean, Banifatemi, Amir, Kurakin, Alexey, Goodfellow, Ian, Bengio, Samy, Escalante, Hugo Jair, Series Editor, Guyon, Isabelle, Series Editor, Escalera, Sergio, Series Editor, and Weimer, Markus, editor
Published: 2018
Full Text: View/download PDF

38. The First Conversational Intelligence Challenge

Author: Burtsev, Mikhail, Logacheva, Varvara, Malykh, Valentin, Serban, Iulian Vlad, Lowe, Ryan, Prabhumoye, Shrimai, Black, Alan W., Rudnicky, Alexander, Bengio, Yoshua, Escalante, Hugo Jair, Series Editor, Guyon, Isabelle, Series Editor, Escalera, Sergio, Series Editor, and Weimer, Markus, editor
Published: 2018
Full Text: View/download PDF

39. The Second Conversational Intelligence Challenge (ConvAI2)

Author: Dinan, Emily, primary, Logacheva, Varvara, additional, Malykh, Valentin, additional, Miller, Alexander, additional, Shuster, Kurt, additional, Urbanek, Jack, additional, Kiela, Douwe, additional, Szlam, Arthur, additional, Serban, Iulian, additional, Lowe, Ryan, additional, Prabhumoye, Shrimai, additional, Black, Alan W., additional, Rudnicky, Alexander, additional, Williams, Jason, additional, Pineau, Joelle, additional, Burtsev, Mikhail, additional, and Weston, Jason, additional
Published: 2019
Full Text: View/download PDF

40. AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models

Author: Kocielnik, Rafal, Prabhumoye, Shrimai, Zhang, Vivian, Alvarez, R. Michael, Anandkumar, Anima, Kocielnik, Rafal, Prabhumoye, Shrimai, Zhang, Vivian, Alvarez, R. Michael, and Anandkumar, Anima
Abstract: Social bias in Pretrained Language Models (PLMs) affects text generation and other downstream NLP tasks. Existing bias testing methods rely predominantly on manual templates or on expensive crowd-sourced data. We propose a novel AutoBiasTest method that automatically generates sentences for testing bias in PLMs, hence providing a flexible and low-cost alternative. Our approach uses another PLM for generation and controls the generation of sentences by conditioning on social group and attribute terms. We show that generated sentences are natural and similar to human-produced content in terms of word length and diversity. We illustrate that larger models used for generation produce estimates of social bias with lower variance. We find that our bias scores are well correlated with manual templates, but AutoBiasTest highlights biases not captured by these templates due to more diverse and realistic test sentences. By automating large-scale test sentence generation, we enable better estimation of underlying bias distributions.
Published: 2023

41. Context Generation Improves Open Domain Question Answering

Author: Su, Dan, Patwary, Mostofa, Prabhumoye, Shrimai, Xu, Peng, Prenger, Ryan, Shoeybi, Mohammad, Fung, Pascale Ngan, Anandkumar, Anima, Catanzaro, Bryan, Su, Dan, Patwary, Mostofa, Prabhumoye, Shrimai, Xu, Peng, Prenger, Ryan, Shoeybi, Mohammad, Fung, Pascale Ngan, Anandkumar, Anima, and Catanzaro, Bryan
Abstract: Closed-book question answering (QA) requires a model to directly answer an open-domain question without access to any external knowledge. Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge. However, they do not fully exploit the parameterized knowledge. To address this inefficiency, we propose a two-stage, closed-book QA framework which employs a coarse-to-fine approach to extract the relevant knowledge and answer a question. We first generate a related context for a given question by prompting a pretrained LM. We then prompt the same LM to generate an answer using the generated context and the question. Additionally, we marginalize over the generated contexts to improve the accuracies and reduce context uncertainty. Experimental results on three QA benchmarks show that our method significantly outperforms previous closed-book QA methods. For example on TriviaQA, our method improves exact match accuracy from 55.3% to 68.6%, and is on par with open-book QA methods (68.6% vs. 68.0%). Our results show that our new methodology is able to better exploit the stored knowledge in pretrained LMs without adding extra learnable parameters or needing finetuning, and paves the way for hybrid models that integrate pretrained LMs with external knowledge. © 2023 Association for Computational Linguistics.
Published: 2023

42. A Prototype of an Intelligent Search Engine Using Machine Learning Based Training for Learning to Rank

Author: Rai, Piyush, Prabhumoye, Shrimai, Khattri, Pranay, Sandhu, Love Rose Singh, Sowmya Kamath, S., Howlett, Robert J., Series editor, Jain, Lakhmi C., Series editor, Kumar Kundu, Malay, editor, Mohapatra, Durga Prasad, editor, Konar, Amit, editor, and Chakraborty, Aruna, editor
Published: 2014
Full Text: View/download PDF

43. The First Conversational Intelligence Challenge

Author: Burtsev, Mikhail, primary, Logacheva, Varvara, additional, Malykh, Valentin, additional, Serban, Iulian Vlad, additional, Lowe, Ryan, additional, Prabhumoye, Shrimai, additional, Black, Alan W., additional, Rudnicky, Alexander, additional, and Bengio, Yoshua, additional
Published: 2018
Full Text: View/download PDF

44. Introduction to NIPS 2017 Competition Track

Author: Escalera, Sergio, primary, Weimer, Markus, additional, Burtsev, Mikhail, additional, Malykh, Valentin, additional, Logacheva, Varvara, additional, Lowe, Ryan, additional, Serban, Iulian Vlad, additional, Bengio, Yoshua, additional, Rudnicky, Alexander, additional, Black, Alan W., additional, Prabhumoye, Shrimai, additional, Kidziński, Łukasz, additional, Mohanty, Sharada Prasanna, additional, Ong, Carmichael F., additional, Hicks, Jennifer L., additional, Levine, Sergey, additional, Salathé, Marcel, additional, Delp, Scott, additional, Huerga, Iker, additional, Grigorenko, Alexander, additional, Thorbergsson, Leifur, additional, Das, Anasuya, additional, Nemitz, Kyla, additional, Sandker, Jenna, additional, King, Stephen, additional, Ecker, Alexander S., additional, Gatys, Leon A., additional, Bethge, Matthias, additional, Boyd-Graber, Jordan, additional, Feng, Shi, additional, Rodriguez, Pedro, additional, Iyyer, Mohit, additional, He, He, additional, Daumé, Hal, additional, McGregor, Sean, additional, Banifatemi, Amir, additional, Kurakin, Alexey, additional, Goodfellow, Ian, additional, and Bengio, Samy, additional
Published: 2018
Full Text: View/download PDF

45. SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning

Author: Wu, Yue, Prabhumoye, Shrimai, Min, So Yeon, Bisk, Yonatan, Salakhutdinov, Ruslan, Azaria, Amos, Mitchell, Tom, and Li, Yuanzhi
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: Open-world survival games pose significant challenges for AI algorithms due to their multi-tasking, deep exploration, and goal prioritization requirements. Despite reinforcement learning (RL) being popular for solving games, its high sample complexity limits its effectiveness in complex open-world games like Crafter or Minecraft. We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM). Prompted with the LaTeX source as game context and a description of the agent's current observation, our SPRING framework employs a directed acyclic graph (DAG) with game-related questions as nodes and dependencies as edges. We identify the optimal action to take in the environment by traversing the DAG and calculating LLM responses for each node in topological order, with the LLM's answer to final node directly translating to environment actions. In our experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training. Finally, we show the potential of games as a test bed for LLMs.
Published: 2023
Full Text: View/download PDF

46. Multi-Stage Prompting for Knowledgeable Dialogue Generation

Author: Liu, Zihan, primary, Patwary, Mostofa, additional, Prenger, Ryan, additional, Prabhumoye, Shrimai, additional, Ping, Wei, additional, Shoeybi, Mohammad, additional, and Catanzaro, Bryan, additional
Published: 2022
Full Text: View/download PDF

47. Evaluating Parameter Efficient Learning for Generation

Author: Xu, Peng, primary, Patwary, Mostofa, additional, Prabhumoye, Shrimai, additional, Adams, Virginia, additional, Prenger, Ryan, additional, Ping, Wei, additional, Lee, Nayeon, additional, Shoeybi, Mohammad, additional, and Catanzaro, Bryan, additional
Published: 2022
Full Text: View/download PDF

48. CURIE: An Iterative Querying Approach for Reasoning About Situations

Author: Rajagopal, Dheeraj, primary, Madaan, Aman, additional, Tandon, Niket, additional, Yang, Yiming, additional, Prabhumoye, Shrimai, additional, Ravichander, Abhilasha, additional, Clark, Peter, additional, and Hovy, Eduard H, additional
Published: 2022
Full Text: View/download PDF

49. Five sources of bias in natural language processing

Author: Hovy, Dirk, primary and Prabhumoye, Shrimai, additional
Published: 2021
Full Text: View/download PDF

50. Guiding Principles for Participatory Design-inspired Natural Language Processing

Author: Caselli, Tommaso, Cibin, Roberto, Conforti, Costanza, Encinas, Enrique, Teli, Maurizio, Field, Anjalie, Prabhumoye, Shrimai, Sap, Maarten, Jin, Zhijing, Zhao, Jieyu, Brockett, Chris, and Computational Linguistics (CL)
Subjects: Guiding Principles, Computer science, business.industry, Process (engineering), Context (language use), computer.software_genre, Identification (information), Software deployment, Participatory design, Reflexivity, ethical nlp, Artificial intelligence, participatory design, natural language processing, business, computer, Natural language, Natural language processing
Abstract: We introduce 9 guiding principles to integrate Participatory Design (PD) methods in the development of Natural Language Processing (NLP) systems. The adoption of PD methods by NLP will help to alleviate issues concerning the development of more democratic, fairer, less-biased technologies to process natural language data. This short paper is the outcome of an ongoing dialogue between designers and NLP experts and adopts a non-standard format following previous work by Traum (2000); Bender (2013); Abzianidze and Bos (2019). Every section is a guiding principle. While principles 1–3 illustrate assumptions and methods that inform community-based PD practices, we used two fictional design scenarios (Encinas and Blythe, 2018), which build on top of situations familiar to the authors, to elicit the identification of the other 6. Principles 4–6 describes the impact of PD methods on the design of NLP systems, targeting two critical aspects: data collection & annotation, and the deployment & evaluation. Finally, principles 7–9 guide a new reflexivity of the NLP research with respect to its context, actors and participants, and aims. We hope this guide will offer inspiration and a road-map to develop a new generation of PD-inspired NLP. We introduce 9 guiding principles to integrate Participatory Design (PD) methods in the development of Natural Language Processing (NLP) systems. The adoption of PD methods by NLP will help to alleviate issues concerning the development of more democratic, fairer, less-biased technologies to process natural language data. This short paper is the outcome of an ongoing dialogue between designers and NLP experts and adopts a non-standard format following previous work by Traum (2000); Bender (2013); Abzianidze and Bos (2019). Every section is a guiding principle. While principles 1–3 illustrate assumptions and methods that inform community-based PD practices, we used two fictional design scenarios (Encinas and Blythe, 2018), which build on top of situations familiar to the authors, to elicit the identification of the other 6. Principles 4–6 describes the impact of PD methods on the design of NLP systems, targeting two critical aspects: data collection & annotation, and the deployment & evaluation. Finally, principles 7–9 guide a new reflexivity of the NLP research with respect to its context, actors and participants, and aims. We hope this guide will offer inspiration and a road-map to develop a new generation of PD-inspired NLP
Published: 2021

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

117 results on '"Prabhumoye, Shrimai"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources