Author: "Prabhumoye, Shrimai" / Language: english - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Prabhumoye, Shrimai"' showing total 14 results

Start Over Author "Prabhumoye, Shrimai" Language english

14 results on '"Prabhumoye, Shrimai"'

1. Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

Author: Wu, Yue, Min, So Yeon, Bisk, Yonatan, Salakhutdinov, Ruslan, Azaria, Amos, Li, Yuanzhi, Mitchell, Tom, and Prabhumoye, Shrimai
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged LLM's ability to generate abstract plans to simplify challenging control tasks, either by action scoring, or action modeling (fine-tuning). However, the transformer architecture inherits several constraints that make it difficult for the LLM to directly serve as the agent: e.g. limited input lengths, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose to instead use the knowledge in LLMs to simplify the control problem, rather than solving it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out irrelevant objects and receptacles from the observation for the current sub-task. Finally, the Track module determines whether the agent has accomplished each sub-task. On the AlfWorld instruction following benchmark, the PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
Published: 2023

2. Self-Refine: Iterative Refinement with Self-Feedback

Author: Madaan, Aman, Tandon, Niket, Gupta, Prakhar, Hallinan, Skyler, Gao, Luyu, Wiegreffe, Sarah, Alon, Uri, Dziri, Nouha, Prabhumoye, Shrimai, Yang, Yiming, Gupta, Shashank, Majumder, Bodhisattwa Prasad, Hermann, Katherine, Welleck, Sean, Yazdanbakhsh, Amir, and Clark, Peter
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Machine Learning (cs.LG)
Abstract: Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLMs; then, the same LLMs provides feedback for its output and uses it to refine itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner, and feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs. Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by ~20% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach., Code, data, and demo at https://selfrefine.info/
Published: 2023

3. Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

Author: Prabhumoye, Shrimai, Patwary, Mostofa, Shoeybi, Mohammad, and Catanzaro, Bryan
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks. However, safely deploying them in real world applications is challenging because they generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility. Our two strategies are: (1) MEDA: adds raw toxicity score as meta-data to the pretraining samples, and (2) INST: adds instructions to those samples indicating their toxicity. Our results indicate that our best performing strategy (INST) substantially reduces the toxicity probability up to 61% while preserving the accuracy on five benchmark NLP tasks as well as improving AUC scores on four bias detection tasks by 1.3%. We also demonstrate the generalizability of our techniques by scaling the number of training samples and the number of model parameters., This paper will be presented at EACL 2023
Published: 2023

4. AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models

Author: Kocielnik, Rafal, Prabhumoye, Shrimai, Zhang, Vivian, Alvarez, R. Michael, and Anandkumar, Anima
Subjects: FOS: Computer and information sciences, Computer Science - Computers and Society, Computer Science - Computation and Language, 68T50, I.2.7, Computers and Society (cs.CY), J.5, Computation and Language (cs.CL), K.4.1
Abstract: Social bias in Pretrained Language Models (PLMs) affects text generation and other downstream NLP tasks. Existing bias testing methods rely predominantly on manual templates or on expensive crowd-sourced data. We propose a novel AutoBiasTest method that automatically generates sentences for testing bias in PLMs, hence providing a flexible and low-cost alternative. Our approach uses another PLM for generation and controls the generation of sentences by conditioning on social group and attribute terms. We show that generated sentences are natural and similar to human-produced content in terms of word length and diversity. We illustrate that larger models used for generation produce estimates of social bias with lower variance. We find that our bias scores are well correlated with manual templates, but AutoBiasTest highlights biases not captured by these templates due to more diverse and realistic test sentences. By automating large-scale test sentence generation, we enable better estimation of underlying bias distributions
Published: 2023

5. Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions

Author: Kocielnik, Rafal, Kangaslahti, Sara, Prabhumoye, Shrimai, Hari, Meena, Alvarez, R. Michael, and Anandkumar, Anima
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Labeling social-media data for custom dimensions of toxicity and social bias is challenging and labor-intensive. Existing transfer and active learning approaches meant to reduce annotation effort require fine-tuning, which suffers from over-fitting to noise and can cause domain shift with small sample sizes. In this work, we propose a novel Active Transfer Few-shot Instructions (ATF) approach which requires no fine-tuning. ATF leverages the internal linguistic knowledge of pre-trained language models (PLMs) to facilitate the transfer of information from existing pre-labeled datasets (source-domain task) with minimum labeling effort on unlabeled target data (target-domain task). Our strategy can yield positive transfer achieving a mean AUC gain of 10.5% compared to no transfer with a large 22b parameter PLM. We further show that annotation of just a few target-domain samples via active learning can be beneficial for transfer, but the impact diminishes with more annotation effort (26% drop in gain between 100 and 2000 annotated examples). Finally, we find that not all transfer scenarios yield a positive gain, which seems related to the PLMs initial performance on the target-domain task., Comment: Accepted to NeurIPS Workshop on Transfer Learning for Natural Language Processing, 2022, New Orleans
Published: 2022

6. Context Generation Improves Open Domain Question Answering

Author: Su, Dan, Patwary, Mostofa, Prabhumoye, Shrimai, Xu, Peng, Prenger, Ryan, Shoeybi, Mohammad, Fung, Pascale, Anandkumar, Anima, and Catanzaro, Bryan
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Closed-book question answering (QA) requires a model to directly answer an open-domain question without access to any external knowledge. Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge. However, they do not fully exploit the parameterized knowledge. To address this issue, we propose a two-stage, closed-book QA framework which employs a coarse-to-fine approach to extract relevant knowledge and answer a question. Our approach first generates a related context for a given question by prompting a pretrained LM. We then prompt the same LM for answer prediction using the generated context and the question. Additionally, to eliminate failure caused by context uncertainty, we marginalize over generated contexts. Experimental results on three QA benchmarks show that our method significantly outperforms previous closed-book QA methods (e.g. exact matching 68.6% vs. 55.3%), and is on par with open-book methods that exploit external knowledge sources (e.g. 68.6% vs. 68.0%). Our method is able to better exploit the stored knowledge in pretrained LMs without adding extra learnable parameters or needing finetuning, and paves the way for hybrid models that integrate pretrained LMs with external knowledge., 8 pages; Accepted at EACL2023
Published: 2022

7. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Author: Smith, Shaden, Patwary, Mostofa, Norick, Brandon, LeGresley, Patrick, Rajbhandari, Samyam, Casper, Jared, Liu, Zhun, Prabhumoye, Shrimai, Zerveas, George, Korthikanti, Vijay, Zhang, Elton, Child, Rewon, Aminabadi, Reza Yazdani, Bernauer, Julie, Song, Xia, Shoeybi, Mohammad, He, Yuxiong, Houston, Michael, Tiwary, Saurabh, and Catanzaro, Bryan
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success, the size of these models has increased rapidly, requiring high-performance hardware, software, and algorithmic techniques to enable training such large models. As the result of a joint effort between Microsoft and NVIDIA, we present details on the training of the largest monolithic transformer based language model, Megatron-Turing NLG 530B (MT-NLG), with 530 billion parameters. In this paper, we first focus on the infrastructure as well as the 3D parallelism methodology used to train this model using DeepSpeed and Megatron. Next, we detail the training process, the design of our training corpus, and our data curation techniques, which we believe is a key ingredient to the success of the model. Finally, we discuss various evaluation results, as well as other interesting observations and new properties exhibited by MT-NLG. We demonstrate that MT-NLG achieves superior zero-, one-, and few-shot learning accuracies on several NLP benchmarks and establishes new state-of-the-art results. We believe that our contributions will help further the development of large-scale training infrastructures, large-scale language models, and natural language generations., Shaden Smith and Mostofa Patwary contributed equally
Published: 2022

8. Guiding Principles for Participatory Design-inspired Natural Language Processing

Author: Caselli, Tommaso, Cibin, Roberto, Conforti, Costanza, Encinas, Enrique, Teli, Maurizio, Field, Anjalie, Prabhumoye, Shrimai, Sap, Maarten, Jin, Zhijing, Zhao, Jieyu, Brockett, Chris, and Computational Linguistics (CL)
Subjects: Guiding Principles, Computer science, business.industry, Process (engineering), Context (language use), computer.software_genre, Identification (information), Software deployment, Participatory design, Reflexivity, ethical nlp, Artificial intelligence, participatory design, natural language processing, business, computer, Natural language, Natural language processing
Abstract: We introduce 9 guiding principles to integrate Participatory Design (PD) methods in the development of Natural Language Processing (NLP) systems. The adoption of PD methods by NLP will help to alleviate issues concerning the development of more democratic, fairer, less-biased technologies to process natural language data. This short paper is the outcome of an ongoing dialogue between designers and NLP experts and adopts a non-standard format following previous work by Traum (2000); Bender (2013); Abzianidze and Bos (2019). Every section is a guiding principle. While principles 1–3 illustrate assumptions and methods that inform community-based PD practices, we used two fictional design scenarios (Encinas and Blythe, 2018), which build on top of situations familiar to the authors, to elicit the identification of the other 6. Principles 4–6 describes the impact of PD methods on the design of NLP systems, targeting two critical aspects: data collection & annotation, and the deployment & evaluation. Finally, principles 7–9 guide a new reflexivity of the NLP research with respect to its context, actors and participants, and aims. We hope this guide will offer inspiration and a road-map to develop a new generation of PD-inspired NLP. We introduce 9 guiding principles to integrate Participatory Design (PD) methods in the development of Natural Language Processing (NLP) systems. The adoption of PD methods by NLP will help to alleviate issues concerning the development of more democratic, fairer, less-biased technologies to process natural language data. This short paper is the outcome of an ongoing dialogue between designers and NLP experts and adopts a non-standard format following previous work by Traum (2000); Bender (2013); Abzianidze and Bos (2019). Every section is a guiding principle. While principles 1–3 illustrate assumptions and methods that inform community-based PD practices, we used two fictional design scenarios (Encinas and Blythe, 2018), which build on top of situations familiar to the authors, to elicit the identification of the other 6. Principles 4–6 describes the impact of PD methods on the design of NLP systems, targeting two critical aspects: data collection & annotation, and the deployment & evaluation. Finally, principles 7–9 guide a new reflexivity of the NLP research with respect to its context, actors and participants, and aims. We hope this guide will offer inspiration and a road-map to develop a new generation of PD-inspired NLP
Published: 2021

9. I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

Author: Prabhumoye, Shrimai, Li, Margaret, Urbanek, Jack, Dinan, Emily, Kiela, Douwe, Weston, Jason, and Szlam, Arthur
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), InformationSystems_MODELSANDPRINCIPLES, Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Computation and Language (cs.CL)
Abstract: Dialogue research tends to distinguish between chit-chat and goal-oriented tasks. While the former is arguably more naturalistic and has a wider use of language, the latter has clearer metrics and a straightforward learning signal. Humans effortlessly combine the two, for example engaging in chit-chat with the goal of exchanging information or eliciting a specific response. Here, we bridge the divide between these two domains in the setting of a rich multi-player text-based fantasy environment where agents and humans engage in both actions and dialogue. Specifically, we train a goal-oriented model with reinforcement learning against an imitation-learned ``chit-chat'' model with two approaches: the policy either learns to pick a topic or learns to pick an utterance given the top-K utterances from the chit-chat model. We show that both models outperform an inverse model baseline and can converse naturally with their dialogue partner in order to achieve goals.
Published: 2020

10. Principled Frameworks for Evaluating Ethics in NLP Systems

Author: Prabhumoye, Shrimai, Mayfield, Elijah, and Black, Alan W
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, ComputingMilieux_THECOMPUTINGPROFESSION, Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: We critique recent work on ethics in natural language processing. Those discussions have focused on data collection, experimental design, and interventions in modeling. But we argue that we ought to first understand the frameworks of ethics that are being used to evaluate the fairness and justice of algorithmic systems. Here, we begin that discussion by outlining deontological ethics, and envision a research agenda prioritized by it.
Published: 2019

11. The Second Conversational Intelligence Challenge (ConvAI2)

Author: Dinan, Emily, Logacheva, Varvara, Malykh, Valentin, Miller, Alexander, Shuster, Kurt, Urbanek, Jack, Kiela, Douwe, Szlam, Arthur, Serban, Iulian, Lowe, Ryan, Prabhumoye, Shrimai, Black, Alan W, Rudnicky, Alexander, Williams, Jason, Pineau, Joelle, Burtsev, Mikhail, and Weston, Jason
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computation and Language (cs.CL), Human-Computer Interaction (cs.HC)
Abstract: We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics like perplexity to measure the performance across sequences of utterances (conversations) -- in terms of repetition, consistency and balance of dialogue acts (e.g. how many questions asked vs. answered).
Published: 2019

12. Five sources of bias in natural language processing.

Author: Hovy, Dirk and Prabhumoye, Shrimai
Subjects: NATURAL language processing, EXPERIMENTAL design
Abstract: Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much of the recent work has focused on describing bias and providing an overview of bias in a larger context. Here, we provide a simple, actionable summary of this recent work. We outline five sources where bias can occur in NLP systems: (1) the data, (2) the annotation process, (3) the input representations, (4) the models, and finally (5) the research design (or how we conceptualize our research). We explore each of the bias sources in detail in this article, including examples and links to related work, as well as potential counter‐measures. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

13. Automated query analysis techniques for semantics based question answering system.

Author: Prabhumoye, Shrimai, Rai, Piyush, Sandhu, Loverose S., Priya L, and Sowmya Kamath S
Published: 2014
Full Text: View/download PDF

14. A Prototype of an Intelligent Search Engine Using Machine Learning Based Training for Learning to Rank.

Author: Rai, Piyush, Prabhumoye, Shrimai, Khattri, Pranay, Sandhu, Love Rose Singh, and Sowmya Kamath, S.
Published: 2014
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

14 results on '"Prabhumoye, Shrimai"'

1. Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

2. Self-Refine: Iterative Refinement with Self-Feedback

3. Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

4. AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models

5. Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions

6. Context Generation Improves Open Domain Question Answering

7. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

8. Guiding Principles for Participatory Design-inspired Natural Language Processing

9. I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

10. Principled Frameworks for Evaluating Ethics in NLP Systems

11. The Second Conversational Intelligence Challenge (ConvAI2)

12. Five sources of bias in natural language processing.

13. Automated query analysis techniques for semantics based question answering system.

14. A Prototype of an Intelligent Search Engine Using Machine Learning Based Training for Learning to Rank.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Publication Type

Journal

Database

Publisher

14 results on '"Prabhumoye, Shrimai"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources