Author: "Wilie, Bryan" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Wilie, Bryan"' showing total 39 results

Start Over Author "Wilie, Bryan"

39 results on '"Wilie, Bryan"'

1. WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

Author: Winata, Genta Indra, Hudi, Frederikus, Irawan, Patrick Amadeus, Anugraha, David, Putri, Rifki Afina, Wang, Yutong, Nohejl, Adam, Prathama, Ubaidillah Ariq, Ousidhoum, Nedjma, Amriani, Afifa, Rzayev, Anar, Das, Anirban, Pramodya, Ashmari, Adila, Aulia, Wilie, Bryan, Mawalim, Candy Olivia, Cheng, Ching Lam, Abolade, Daud, Chersoni, Emmanuele, Santus, Enrico, Ikhwantri, Fariz, Kuwanto, Garry, Zhao, Hanyang, Wibowo, Haryo Akbarianto, Lovenia, Holy, Cruz, Jan Christian Blaise, Putra, Jan Wira Gotama, Myung, Junho, Susanto, Lucky, Machin, Maria Angelica Riera, Zhukova, Marina, Anugraha, Michael, Adilazuarda, Muhammad Farid, Santosa, Natasha, Limkonchotiwat, Peerat, Dabre, Raj, Audino, Rio Alexander, Cahyawijaya, Samuel, Zhang, Shi-Xiong, Salim, Stephanie Yulia, Zhou, Yi, Gui, Yinxuan, Adelani, David Ifeoluwa, Lee, En-Shiun Annie, Okada, Shogo, Purwarianti, Ayu, Aji, Alham Fikri, Watanabe, Taro, Wijaya, Derry Tanti, Oh, Alice, and Ngo, Chong-Wah
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial contexts and predicting specific regional cuisines and languages. To support future research, we release a knowledge base with annotated food entries and images along with the VQA data., Comment: Preprint
Published: 2024

2. LLM Internal States Reveal Hallucination Risk Faced With a Query

Author: Ji, Ziwei, Chen, Delong, Ishii, Etsuko, Cahyawijaya, Samuel, Bang, Yejin, Wilie, Bryan, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: The hallucination problem of Large Language Models (LLMs) significantly limits their reliability and trustworthiness. Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries. Inspired by this, our paper investigates whether LLMs can estimate their own hallucination risk before response generation. We analyze the internal mechanisms of LLMs broadly both in terms of training data sources and across 15 diverse Natural Language Generation (NLG) tasks, spanning over 700 datasets. Our empirical analysis reveals two key insights: (1) LLM internal states indicate whether they have seen the query in training data or not; and (2) LLM internal states show they are likely to hallucinate or not regarding the query. Our study explores particular neurons, activation layers, and tokens that play a crucial role in the LLM perception of uncertainty and hallucination risk. By a probing estimator, we leverage LLM self-assessment, achieving an average hallucination estimation accuracy of 84.32\% at run time.
Published: 2024

3. Belief Revision: The Adaptability of Large Language Models Reasoning

Author: Wilie, Bryan, Cahyawijaya, Samuel, Ishii, Etsuko, He, Junxian, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: The capability to reason from text is crucial for real-world NLP applications. Real-world scenarios often involve incomplete or evolving data. In response, individuals update their beliefs and understandings accordingly. However, most existing evaluations assume that language models (LMs) operate with consistent information. We introduce Belief-R, a new dataset designed to test LMs' belief revision ability when presented with new evidence. Inspired by how humans suppress prior inferences, this task assesses LMs within the newly proposed delta reasoning ($\Delta R$) framework. Belief-R features sequences of premises designed to simulate scenarios where additional information could necessitate prior conclusions drawn by LMs. We evaluate $\sim$30 LMs across diverse prompting strategies and found that LMs generally struggle to appropriately revise their beliefs in response to new information. Further, models adept at updating often underperformed in scenarios without necessary updates, highlighting a critical trade-off. These insights underscore the importance of improving LMs' adaptiveness to changing information, a step toward more reliable AI systems.
Published: 2024

4. High-Dimension Human Value Representation in Large Language Models

Author: Cahyawijaya, Samuel, Chen, Delong, Bang, Yejin, Khalatbari, Leila, Wilie, Bryan, Ji, Ziwei, Ishii, Etsuko, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The widespread application of Large Language Models (LLMs) across various tasks and fields has necessitated the alignment of these models with human values and preferences. Given various approaches of human value alignment, ranging from Reinforcement Learning with Human Feedback (RLHF), to constitutional learning, etc. there is an urgent need to understand the scope and nature of human values injected into these models before their release. There is also a need for model alignment without a costly large scale human annotation effort. We propose UniVaR, a high-dimensional representation of human value distributions in LLMs, orthogonal to model architecture and training data. Trained from the value-relevant output of eight multilingual LLMs and tested on the output from four multilingual LLMs, namely LlaMA2, ChatGPT, JAIS and Yi, we show that UniVaR is a powerful tool to compare the distribution of human values embedded in different LLMs with different langauge sources. Through UniVaR, we explore how different LLMs prioritize various values in different languages and cultures, shedding light on the complex interplay between human values and language modeling.
Published: 2024

5. Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages

Author: Cahyawijaya, Samuel, Lovenia, Holy, Koto, Fajri, Putri, Rifki Afina, Dave, Emmanuel, Lee, Jhonson, Shadieq, Nuur, Cenggoro, Wawan, Akbar, Salsabil Maulana, Mahendra, Muhammad Ihza, Putri, Dea Annisayanti, Wilie, Bryan, Winata, Genta Indra, Aji, Alham Fikri, Purwarianti, Ayu, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) show remarkable human-like capability in various domains and languages. However, a notable quality gap arises in low-resource languages, e.g., Indonesian indigenous languages, rendering them ineffective and inefficient in such linguistic contexts. To bridge this quality gap, we introduce Cendol, a collection of Indonesian LLMs encompassing both decoder-only and encoder-decoder architectures across a range of model sizes. We highlight Cendol's effectiveness across a diverse array of tasks, attaining 20% improvement, and demonstrate its capability to generalize to unseen tasks and indigenous languages of Indonesia. Furthermore, Cendol models showcase improved human favorability despite their limitations in capturing indigenous knowledge and cultural values in Indonesia. In addition, we discuss the shortcomings of parameter-efficient tunings, such as LoRA, for language adaptation. Alternatively, we propose the usage of vocabulary adaptation to enhance efficiency. Lastly, we evaluate the safety of Cendol and showcase that safety in pre-training in one language such as English is transferable to low-resource languages, such as Indonesian, even without RLHF and safety fine-tuning., Comment: Cendol models are released under Apache 2.0 license and will be made publicly available soon
Published: 2024

6. Contrastive Learning for Inference in Dialogue

Author: Ishii, Etsuko, Xu, Yan, Wilie, Bryan, Ji, Ziwei, Lovenia, Holy, Chung, Willy, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Inference, especially those derived from inductive processes, is a crucial component in our conversation to complement the information implicitly or explicitly conveyed by a speaker. While recent large language models show remarkable advances in inference tasks, their performance in inductive reasoning, where not all information is present in the context, is far behind deductive reasoning. In this paper, we analyze the behavior of the models based on the task difficulty defined by the semantic information gap -- which distinguishes inductive and deductive reasoning (Johnson-Laird, 1988, 1993). Our analysis reveals that the disparity in information between dialogue contexts and desired inferences poses a significant challenge to the inductive inference process. To mitigate this information gap, we investigate a contrastive learning approach by feeding negative samples. Our experiments suggest negative samples help models understand what is wrong and improve their inference generations., Comment: Accepted to EMNLP2023
Published: 2023

7. InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems

Author: Chung, Willy, Cahyawijaya, Samuel, Wilie, Bryan, Lovenia, Holy, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries for efficient interaction with any KB. Our extensive experiments demonstrate that InstructTODS achieves comparable performance to fully fine-tuned TODS in guiding dialogues to successful completion without prior knowledge or task-specific data. Furthermore, a rigorous human evaluation of end-to-end TODS shows that InstructTODS produces dialogue responses that notably outperform both the gold responses and the state-of-the-art TODS in terms of helpfulness, informativeness, and humanness. Moreover, the effectiveness of LLMs in TODS is further supported by our comprehensive evaluations on TODS subtasks: dialogue state tracking, intent classification, and response generation. Code and implementations could be found here https://github.com/WillyHC22/InstructTODS/
Published: 2023

8. PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems

Author: Wilie, Bryan, Xu, Yan, Chung, Willy, Cahyawijaya, Samuel, Lovenia, Holy, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Grounding dialogue response generation on external knowledge is proposed to produce informative and engaging responses. However, current knowledge-grounded dialogue (KGD) systems often fail to align the generated responses with human-preferred qualities due to several issues like hallucination and the lack of coherence. Upon analyzing multiple language model generations, we observe the presence of alternative generated responses within a single decoding process. These alternative responses are more faithful and exhibit a comparable or higher level of relevance to prior conversational turns compared to the optimal responses prioritized by the decoding processes. To address these challenges and driven by these observations, we propose Polished \& Informed Candidate Scoring (PICK), a generation re-scoring framework that empowers models to generate faithful and relevant responses without requiring additional labeled data or model tuning. Through comprehensive automatic and human evaluations, we demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history. Furthermore, PICK consistently improves the system's performance with both oracle and retrieved knowledge in all decoding strategies. We provide the detailed implementation in https://github.com/bryanwilie/pick .
Published: 2023

9. NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

Author: Cahyawijaya, Samuel, Lovenia, Holy, Koto, Fajri, Adhista, Dea, Dave, Emmanuel, Oktavianti, Sarah, Akbar, Salsabil Maulana, Lee, Jhonson, Shadieq, Nuur, Cenggoro, Tjeng Wawan, Linuwih, Hanung Wahyuning, Wilie, Bryan, Muridan, Galih Pradipta, Winata, Genta Indra, Moeljadi, David, Aji, Alham Fikri, Purwarianti, Ayu, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Democratizing access to natural language processing (NLP) technology is crucial, especially for underrepresented and extremely low-resource languages. Previous research has focused on developing labeled and unlabeled corpora for these languages through online scraping and document translation. While these methods have proven effective and cost-efficient, we have identified limitations in the resulting corpora, including a lack of lexical diversity and cultural relevance to local communities. To address this gap, we conduct a case study on Indonesian local languages. We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets. Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content. In addition, we present the \datasetname{} benchmark, encompassing 12 underrepresented and extremely low-resource languages spoken by millions of individuals in Indonesia. Our empirical experiment results using existing multilingual large language models conclude the need to extend these models to more underrepresented languages. We release the NusaWrites dataset at https://github.com/IndoNLP/nusa-writes.
Published: 2023

10. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

Author: Bang, Yejin, Cahyawijaya, Samuel, Lee, Nayeon, Dai, Wenliang, Su, Dan, Wilie, Bryan, Lovenia, Holy, Ji, Ziwei, Yu, Tiezheng, Chung, Willy, Do, Quyet V., Xu, Yan, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks. We find that it is better at understanding non-Latin script languages than generating them. It is able to generate multimodal content from textual prompts, via an intermediate code generation step. Moreover, we find that ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning, hence making it an unreliable reasoner. It is, for example, better at deductive than inductive reasoning. ChatGPT suffers from hallucination problems like other LLMs and it generates more extrinsic hallucinations from its parametric memory as it does not have access to an external knowledge base. Finally, the interactive feature of ChatGPT enables human collaboration with the underlying LLM to improve its performance, i.e, 8% ROUGE-1 on summarization and 2% ChrF++ on machine translation, in a multi-turn "prompt engineering" fashion. We also release codebase for evaluation set extraction., Comment: 45 pages, AACL 2022
Published: 2023

11. NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Author: Cahyawijaya, Samuel, Lovenia, Holy, Aji, Alham Fikri, Winata, Genta Indra, Wilie, Bryan, Mahendra, Rahmad, Wibisono, Christian, Romadhony, Ade, Vincentio, Karissa, Koto, Fajri, Santoso, Jennifer, Moeljadi, David, Wirawan, Cahya, Hudi, Frederikus, Parmonangan, Ivan Halim, Alfina, Ika, Wicaksono, Muhammad Satrio, Putra, Ilham Firdausi, Rahmadani, Samsul, Oenang, Yulianti, Septiandri, Ali Akbar, Jaya, James, Dhole, Kaustubh D., Suryani, Arie Ardiyanti, Putri, Rifki Afina, Su, Dan, Stevens, Keith, Nityasya, Made Nindyatama, Adilazuarda, Muhammad Farid, Ignatius, Ryan, Diandaru, Ryandito, Yu, Tiezheng, Ghifari, Vito, Dai, Wenliang, Xu, Yan, Damapuspita, Dyah, Tho, Cuk, Karo, Ichwanul Muslim Karo, Fatyanosa, Tirana Noor, Ji, Ziwei, Fung, Pascale, Neubig, Graham, Baldwin, Timothy, Ruder, Sebastian, Sujaini, Herry, Sakti, Sakriani, and Purwarianti, Ayu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
Published: 2022

12. RHO ($\rho$): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding

Author: Ji, Ziwei, Liu, Zihan, Lee, Nayeon, Yu, Tiezheng, Wilie, Bryan, Zeng, Min, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Dialogue systems can leverage large pre-trained language models and knowledge to generate fluent and informative responses. However, these models are still prone to produce hallucinated responses not supported by the input source, which greatly hinders their application. The heterogeneity between external knowledge and dialogue context challenges representation learning and source integration, and further contributes to unfaithfulness. To handle this challenge and generate more faithful responses, this paper presents RHO ($\rho$) utilizing the representations of linked entities and relation predicates from a knowledge graph (KG). We propose (1) local knowledge grounding to combine textual embeddings with the corresponding KG embeddings; and (2) global knowledge grounding to equip RHO with multi-hop reasoning abilities via the attention mechanism. In addition, we devise a response re-ranking technique based on walks over KG sub-graphs for better conversational reasoning. Experimental results on OpenDialKG show that our approach significantly outperforms state-of-the-art methods on both automatic and human evaluation by a large margin, especially in hallucination reduction (17.54% in FeQA)., Comment: accepted by ACL 2023 Findings
Published: 2022

13. How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling

Author: Cahyawijaya, Samuel, Wilie, Bryan, Lovenia, Holy, Zhong, Huan, Zhong, MingQian, Ip, Yuk-Yu Nancy, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA. However, the applicability of these methods to real clinical use cases is hindered, due to the limitation of pre-trained LMs in processing long textual data with thousands of words, which is a common length for a clinical note. In this work, we explore long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical notes context. We conduct experiments on three n2c2 challenges datasets and a longitudinal clinical dataset from Hong Kong Hospital Authority electronic health record (EHR) system to show the effectiveness and generalizability of this concept, achieving 10\% F1-score improvement. Based on our experiments, we conclude that capturing a longer clinical note interval is beneficial to the model performance, but there are different cut-off intervals to achieve the optimal performance for different target variables. Our code is available at https://github.com/HLTCHKUST/long-biomedical-model.
Published: 2022

14. Every picture tells a story: Image-grounded controllable stylistic story generation

Author: Lovenia, Holy, Wilie, Bryan, Barraud, Romain, Cahyawijaya, Samuel, Chung, Willy, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Generating a short story out of an image is arduous. Unlike image captioning, story generation from an image poses multiple challenges: preserving the story coherence, appropriately assessing the quality of the story, steering the generated story into a certain style, and addressing the scarcity of image-story pair reference datasets limiting supervision during training. In this work, we introduce Plug-and-Play Story Teller (PPST) and improve image-to-story generation by: 1) alleviating the data scarcity problem by incorporating large pre-trained models, namely CLIP and GPT-2, to facilitate a fluent image-to-text generation with minimal supervision, and 2) enabling a more style-relevant generation by incorporating stylistic adapters to control the story generation. We conduct image-to-story generation experiments with non-styled, romance-styled, and action-styled PPST approaches and compare our generated stories with those of previous work over three aspects, i.e., story coherence, image-story relevance, and style fitness, using both automatic and human evaluation. The results show that PPST improves story coherence and has better image-story relevance, but has yet to be adequately stylistic., Comment: Accepted in LaTeCH-CLfL 2022 (6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature), COLING 2022
Published: 2022

15. NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages

Author: Cahyawijaya, Samuel, Aji, Alham Fikri, Lovenia, Holy, Winata, Genta Indra, Wilie, Bryan, Mahendra, Rahmad, Koto, Fajri, Moeljadi, David, Vincentio, Karissa, Romadhony, Ade, and Purwarianti, Ayu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: At the center of the underlying issues that halt Indonesian natural language processing (NLP) research advancement, we find data scarcity. Resources in Indonesian languages, especially the local ones, are extremely scarce and underrepresented. Many Indonesian researchers do not publish their dataset. Furthermore, the few public datasets that we have are scattered across different platforms, thus makes performing reproducible and data-centric research in Indonesian NLP even more arduous. Rising to this challenge, we initiate the first Indonesian NLP crowdsourcing effort, NusaCrowd. NusaCrowd strives to provide the largest datasheets aggregation with standardized data loading for NLP tasks in all Indonesian languages. By enabling open and centralized access to Indonesian NLP resources, we hope NusaCrowd can tackle the data scarcity problem hindering NLP progress in Indonesia and bring NLP practitioners to move towards collaboration.
Published: 2022

16. GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Author: Gehrmann, Sebastian, Bhattacharjee, Abhik, Mahendiran, Abinaya, Wang, Alex, Papangelis, Alexandros, Madaan, Aman, McMillan-Major, Angelina, Shvets, Anna, Upadhyay, Ashish, Yao, Bingsheng, Wilie, Bryan, Bhagavatula, Chandra, You, Chaobin, Thomson, Craig, Garbacea, Cristina, Wang, Dakuo, Deutsch, Daniel, Xiong, Deyi, Jin, Di, Gkatzia, Dimitra, Radev, Dragomir, Clark, Elizabeth, Durmus, Esin, Ladhak, Faisal, Ginter, Filip, Winata, Genta Indra, Strobelt, Hendrik, Hayashi, Hiroaki, Novikova, Jekaterina, Kanerva, Jenna, Chim, Jenny, Zhou, Jiawei, Clive, Jordan, Maynez, Joshua, Sedoc, João, Juraska, Juraj, Dhole, Kaustubh, Chandu, Khyathi Raghavi, Perez-Beltrachini, Laura, Ribeiro, Leonardo F. R., Tunstall, Lewis, Zhang, Li, Pushkarna, Mahima, Creutz, Mathias, White, Michael, Kale, Mihir Sanjay, Eddine, Moussa Kamal, Daheim, Nico, Subramani, Nishant, Dusek, Ondrej, Liang, Paul Pu, Ammanamanchi, Pawan Sasanka, Zhu, Qi, Puduppully, Ratish, Kriz, Reno, Shahriyar, Rifat, Cardenas, Ronald, Mahamood, Saad, Osei, Salomey, Cahyawijaya, Samuel, Štajner, Sanja, Montella, Sebastien, Shailza, Jolly, Shailza, Mille, Simon, Hasan, Tahmid, Shen, Tianhao, Adewumi, Tosin, Raunak, Vikas, Raheja, Vipul, Nikolaev, Vitaly, Tsai, Vivian, Jernite, Yacine, Xu, Ying, Sang, Yisi, Liu, Yixin, and Hou, Yufang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each others work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
Published: 2022

17. Towards Answering Open-ended Ethical Quandary Questions

Author: Bang, Yejin, Lee, Nayeon, Yu, Tiezheng, Khalatbari, Leila, Xu, Yan, Cahyawijaya, Samuel, Su, Dan, Wilie, Bryan, Barraud, Romain, Barezi, Elham J., Madotto, Andrea, Kee, Hayden, and Fung, Pascale
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Considerable advancements have been made in various NLP tasks based on the impressive power of large language models (LLMs) and many NLP applications are deployed in our daily lives. In this work, we challenge the capability of LLMs with the new task of Ethical Quandary Generative Question Answering. Ethical quandary questions are more challenging to address because multiple conflicting answers may exist to a single quandary. We explore the current capability of LLMs in providing an answer with a deliberative exchange of different perspectives to an ethical quandary, in the approach of Socratic philosophy, instead of providing a closed answer like an oracle. We propose a model that searches for different ethical principles applicable to the ethical quandary and generates an answer conditioned on the chosen principles through prompt-based few-shot learning. We also discuss the remaining challenges and ethical issues involved in this task and suggest the direction toward developing responsible NLP systems by incorporating human values explicitly., Comment: 16 pages
Published: 2022

18. Can Question Rewriting Help Conversational Question Answering?

Author: Ishii, Etsuko, Xu, Yan, Cahyawijaya, Samuel, and Wilie, Bryan
Subjects: Computer Science - Computation and Language
Abstract: Question rewriting (QR) is a subtask of conversational question answering (CQA) aiming to ease the challenges of understanding dependencies among dialogue history by reformulating questions in a self-contained form. Despite seeming plausible, little evidence is available to justify QR as a mitigation method for CQA. To verify the effectiveness of QR in CQA, we investigate a reinforcement learning approach that integrates QR and CQA tasks and does not require corresponding QR datasets for targeted CQA. We find, however, that the RL method is on par with the end-to-end baseline. We provide an analysis of the failure and describe the difficulty of exploiting QR for CQA., Comment: Accepted at Workshop on Insights from Negative Results in NLP at ACL2022
Published: 2022

19. Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

Author: Lovenia, Holy, Wilie, Bryan, Chung, Willy, Zeng, Min, Cahyawijaya, Samuel, Dan, Su, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides performance lift by adapting unlabelled data to downstream task. Unfortunately, existing adaptations mainly involve deterministic rules that cannot generalize well. Here, we propose Clozer, a sequence-tagging based cloze answer extraction method used in TAPT that is extendable for adaptation on any cloze-style machine reading comprehension (MRC) downstream tasks. We experiment on multiple-choice cloze-style MRC tasks, and show that Clozer performs significantly better compared to the oracle and state-of-the-art in escalating TAPT effectiveness in lifting model performance, and prove that Clozer is able to recognize the gold answers independently of any heuristics.
Published: 2022
Full Text: View/download PDF

20. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Author: Dhole, Kaustubh D., Gangal, Varun, Gehrmann, Sebastian, Gupta, Aadesh, Li, Zhenhao, Mahamood, Saad, Mahendiran, Abinaya, Mille, Simon, Shrivastava, Ashish, Tan, Samson, Wu, Tongshuang, Sohl-Dickstein, Jascha, Choi, Jinho D., Hovy, Eduard, Dusek, Ondrej, Ruder, Sebastian, Anand, Sajant, Aneja, Nagender, Banjade, Rabin, Barthe, Lisa, Behnke, Hanna, Berlot-Attwell, Ian, Boyle, Connor, Brun, Caroline, Cabezudo, Marco Antonio Sobrevilla, Cahyawijaya, Samuel, Chapuis, Emile, Che, Wanxiang, Choudhary, Mukund, Clauss, Christian, Colombo, Pierre, Cornell, Filip, Dagan, Gautier, Das, Mayukh, Dixit, Tanay, Dopierre, Thomas, Dray, Paul-Alexis, Dubey, Suchitra, Ekeinhor, Tatiana, Di Giovanni, Marco, Goyal, Tanya, Gupta, Rishabh, Hamla, Louanes, Han, Sang, Harel-Canada, Fabrice, Honore, Antoine, Jindal, Ishan, Joniak, Przemyslaw K., Kleyko, Denis, Kovatchev, Venelin, Krishna, Kalpesh, Kumar, Ashutosh, Langer, Stefan, Lee, Seungjae Ryan, Levinson, Corey James, Liang, Hualou, Liang, Kaizhao, Liu, Zhexiong, Lukyanenko, Andrey, Marivate, Vukosi, de Melo, Gerard, Meoni, Simon, Meyer, Maxime, Mir, Afnan, Moosavi, Nafise Sadat, Muennighoff, Niklas, Mun, Timothy Sum Hon, Murray, Kenton, Namysl, Marcin, Obedkova, Maria, Oli, Priti, Pasricha, Nivranshu, Pfister, Jan, Plant, Richard, Prabhu, Vinay, Pais, Vasile, Qin, Libo, Raji, Shahab, Rajpoot, Pawan Kumar, Raunak, Vikas, Rinberg, Roy, Roberts, Nicolas, Rodriguez, Juan Diego, Roux, Claude, S., Vasconcellos P. H., Sai, Ananya B., Schmidt, Robin M., Scialom, Thomas, Sefara, Tshephisho, Shamsi, Saqib N., Shen, Xudong, Shi, Haoyue, Shi, Yiwen, Shvets, Anna, Siegel, Nick, Sileo, Damien, Simon, Jamie, Singh, Chandan, Sitelew, Roman, Soni, Priyank, Sorensen, Taylor, Soto, William, Srivastava, Aman, Srivatsa, KV Aditya, Sun, Tony, T, Mukund Varma, Tabassum, A, Tan, Fiona Anting, Teehan, Ryan, Tiwari, Mo, Tolkiehn, Marie, Wang, Athena, Wang, Zijian, Wang, Gloria, Wang, Zijie J., Wei, Fuxuan, Wilie, Bryan, Winata, Genta Indra, Wu, Xinyi, Wydmański, Witold, Xie, Tianbao, Yaseen, Usama, Yee, Michael A., Zhang, Jing, and Zhang, Yue
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter)., Comment: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter
Published: 2021

21. Greenformer: Factorization Toolkit for Efficient Deep Neural Networks

Author: Cahyawijaya, Samuel, Winata, Genta Indra, Lovenia, Holy, Wilie, Bryan, Dai, Wenliang, Ishii, Etsuko, and Fung, Pascale
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: While the recent advances in deep neural networks (DNN) bring remarkable success, the computational cost also increases considerably. In this paper, we introduce Greenformer, a toolkit to accelerate the computation of neural networks through matrix factorization while maintaining performance. Greenformer can be easily applied with a single line of code to any DNN model. Our experimental results show that Greenformer is effective for a wide range of scenarios. We provide the showcase of Greenformer at https://samuelcahyawijaya.github.io/greenformer-demo/.
Published: 2021

22. IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

Author: Cahyawijaya, Samuel, Winata, Genta Indra, Wilie, Bryan, Vincentio, Karissa, Li, Xiaohong, Kuncoro, Adhiguna, Ruder, Sebastian, Lim, Zhi Yuan, Bahar, Syafri, Khodra, Masayu Leylia, Purwarianti, Ayu, and Fung, Pascale
Subjects: Computer Science - Computation and Language
Abstract: Natural language generation (NLG) benchmarks provide an important avenue to measure progress and develop better NLG systems. Unfortunately, the lack of publicly available NLG benchmarks for low-resource languages poses a challenging barrier for building NLG systems that work well for languages with limited amounts of data. Here we introduce IndoNLG, the first benchmark to measure natural language generation (NLG) progress in three low-resource -- yet widely spoken -- languages of Indonesia: Indonesian, Javanese, and Sundanese. Altogether, these languages are spoken by more than 100 million native speakers, and hence constitute an important use case of NLG systems today. Concretely, IndoNLG covers six tasks: summarization, question answering, chit-chat, and three different pairs of machine translation (MT) tasks. We collate a clean pretraining corpus of Indonesian, Sundanese, and Javanese datasets, Indo4B-Plus, which is used to pretrain our models: IndoBART and IndoGPT. We show that IndoBART and IndoGPT achieve competitive performance on all tasks -- despite using only one-fifth the parameters of a larger multilingual model, mBART-LARGE (Liu et al., 2020). This finding emphasizes the importance of pretraining on closely related, local languages to achieve more efficient learning and faster inference for very low-resource languages like Javanese and Sundanese., Comment: Accepted in EMNLP 2021, 10 pages
Published: 2021

23. IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding

Author: Wilie, Bryan, Vincentio, Karissa, Winata, Genta Indra, Cahyawijaya, Samuel, Li, Xiaohong, Lim, Zhi Yuan, Soleman, Sidik, Mahendra, Rahmad, Fung, Pascale, Bahar, Syafri, and Purwarianti, Ayu
Subjects: Computer Science - Computation and Language
Abstract: Although Indonesian is known to be the fourth most frequently used language over the internet, the research progress on this language in the natural language processing (NLP) is slow-moving due to a lack of available resources. In response, we introduce the first-ever vast resource for the training, evaluating, and benchmarking on Indonesian natural language understanding (IndoNLU) tasks. IndoNLU includes twelve tasks, ranging from single sentence classification to pair-sentences sequence labeling with different levels of complexity. The datasets for the tasks lie in different domains and styles to ensure task diversity. We also provide a set of Indonesian pre-trained models (IndoBERT) trained from a large and clean Indonesian dataset Indo4B collected from publicly available sources such as social media texts, blogs, news, and websites. We release baseline models for all twelve tasks, as well as the framework for benchmark evaluation, and thus it enables everyone to benchmark their system performances., Comment: This paper will be presented in AACL-IJCNLP 2020 (with new results and acknowledgment)
Published: 2020

24. NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Author: Cahyawijaya, Samuel, Lovenia, Holy, Wilie, Bryan, Su, Dan, Yu, Tiezheng, Dai, Wenliang, Xu, Yan, Ji, Ziwei, Fung, Pascale Ngan, Cahyawijaya, Samuel, Lovenia, Holy, Wilie, Bryan, Su, Dan, Yu, Tiezheng, Dai, Wenliang, Xu, Yan, Ji, Ziwei, and Fung, Pascale Ngan
Abstract: We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are underrepresented despite being widely spoken. © 2023 Association for Computational Linguistics.
Published: 2023

25. Towards Answering Ethical Quandary Questions

Author: Bang, Ye Jin, Lee, Nayeon, Yu, Tiezheng, Khalatbari, Leila, Xu, Yan, Cahyawijaya, Samuel, Su, Dan, Wilie, Bryan, Barraud, Romain Maurice, Jebalbarezi Sarbijan, Elham, Madotto, Andrea, Kee, Hayden, Fung, Pascale Ngan, Bang, Ye Jin, Lee, Nayeon, Yu, Tiezheng, Khalatbari, Leila, Xu, Yan, Cahyawijaya, Samuel, Su, Dan, Wilie, Bryan, Barraud, Romain Maurice, Jebalbarezi Sarbijan, Elham, Madotto, Andrea, Kee, Hayden, and Fung, Pascale Ngan
Published: 2023

26. InstructTODS: Large language models for end-to-end task-oriented dialogue systems

Author: Chung, Willy Hoo Choun, Cahyawijaya, Samuel, Lovenia, Holy, Wilie, Bryan, Fung, Pascale Ngan, Chung, Willy Hoo Choun, Cahyawijaya, Samuel, Lovenia, Holy, Wilie, Bryan, and Fung, Pascale Ngan
Abstract: Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries for efficient interaction with any KB. Our extensive experiments demonstrate that InstructTODS achieves comparable performance to fully fine-tuned TODS in guiding dialogues to successful completion without prior knowledge or task-specific data. Furthermore, a rigorous human evaluation of end-to-end TODS shows that InstructTODS produces dialogue responses that notably outperform both the gold responses and the state-of-the-art TODS in terms of helpfulness, informativeness, and humanness. Moreover, the effectiveness of LLMs in TODS is further supported by our comprehensive evaluations on TODS subtasks: dialogue state tracking, intent classification, and response generation. Code and implementations could be found here1 .
Published: 2023

27. Contrastive Learning for Inference in Dialogue

Author: Ishii, Etsuko, primary, Xu, Yan, additional, Wilie, Bryan, additional, Ji, Ziwei, additional, Lovenia, Holy, additional, Chung, Willy, additional, and Fung, Pascale, additional
Published: 2023
Full Text: View/download PDF

28. RHO: Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding

Author: Ji, Ziwei, primary, Liu, Zihan, additional, Lee, Nayeon, additional, Yu, Tiezheng, additional, Wilie, Bryan, additional, Zeng, Min, additional, and Fung, Pascale, additional
Published: 2023
Full Text: View/download PDF

29. NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Author: Cahyawijaya, Samuel, primary, Lovenia, Holy, additional, Aji, Alham Fikri, additional, Winata, Genta, additional, Wilie, Bryan, additional, Koto, Fajri, additional, Mahendra, Rahmad, additional, Wibisono, Christian, additional, Romadhony, Ade, additional, Vincentio, Karissa, additional, Santoso, Jennifer, additional, Moeljadi, David, additional, Wirawan, Cahya, additional, Hudi, Frederikus, additional, Wicaksono, Muhammad Satrio, additional, Parmonangan, Ivan, additional, Alfina, Ika, additional, Putra, Ilham Firdausi, additional, Rahmadani, Samsul, additional, Oenang, Yulianti, additional, Septiandri, Ali, additional, Jaya, James, additional, Dhole, Kaustubh, additional, Suryani, Arie, additional, Putri, Rifki Afina, additional, Su, Dan, additional, Stevens, Keith, additional, Nityasya, Made Nindyatama, additional, Adilazuarda, Muhammad, additional, Hadiwijaya, Ryan, additional, Diandaru, Ryandito, additional, Yu, Tiezheng, additional, Ghifari, Vito, additional, Dai, Wenliang, additional, Xu, Yan, additional, Damapuspita, Dyah, additional, Wibowo, Haryo, additional, Tho, Cuk, additional, Karo Karo, Ichwanul, additional, Fatyanosa, Tirana, additional, Ji, Ziwei, additional, Neubig, Graham, additional, Baldwin, Timothy, additional, Ruder, Sebastian, additional, Fung, Pascale, additional, Sujaini, Herry, additional, Sakti, Sakriani, additional, and Purwarianti, Ayu, additional
Published: 2023
Full Text: View/download PDF

30. Integrating Question Rewriting in Conversational Question Answering: A Reinforcement Learning Approach

Author: Ishii, Etsuko, Wilie, Bryan, Xu, Yan, Cahyawijaya, Samuel, Fung, Pascale, Ishii, Etsuko, Wilie, Bryan, Xu, Yan, Cahyawijaya, Samuel, and Fung, Pascale
Abstract: Resolving dependencies among dialogue history is one of the main obstacles in the research on conversational question answering (CQA). The conversational question rewrites (QR) task has been shown to be effective to solve this problem by reformulating questions in a self-contained form. However, QR datasets are limited and existing methods tend to depend on the assumption of the existence of corresponding QR datasets for every CQA dataset. This paper proposes a reinforcement learning approach that integrates QR and CQA tasks without corresponding labeled QR datasets. We train a QR model based on the reward signal obtained from the CQA, and the experimental results show that our approach can bring improvement over the pipeline approaches. The code is available at https://github. com/HLTCHKUST/cqr4cqa.
Published: 2022

31. Integrating Question Rewrites in Conversational Question Answering: A Reinforcement Learning Approach

Author: Ishii, Etsuko, primary, Wilie, Bryan, additional, Xu, Yan, additional, Cahyawijaya, Samuel, additional, and Fung, Pascale, additional
Published: 2022
Full Text: View/download PDF

32. How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling

Author: Cahyawijaya, Samuel, primary, Wilie, Bryan, additional, Lovenia, Holy, additional, Zhong, Huan, additional, Zhong, MingQian, additional, Ip, Yuk-Yu Nancy, additional, and Fung, Pascale, additional
Published: 2022
Full Text: View/download PDF

33. GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Author: Gehrmann, Sebastian, primary, Bhattacharjee, Abhik, additional, Mahendiran, Abinaya, additional, Wang, Alex, additional, Papangelis, Alexandros, additional, Madaan, Aman, additional, Mcmillan-major, Angelina, additional, Shvets, Anna, additional, Upadhyay, Ashish, additional, Bohnet, Bernd, additional, Yao, Bingsheng, additional, Wilie, Bryan, additional, Bhagavatula, Chandra, additional, You, Chaobin, additional, Thomson, Craig, additional, Garbacea, Cristina, additional, Wang, Dakuo, additional, Deutsch, Daniel, additional, Xiong, Deyi, additional, Jin, Di, additional, Gkatzia, Dimitra, additional, Radev, Dragomir, additional, Clark, Elizabeth, additional, Durmus, Esin, additional, Ladhak, Faisal, additional, Ginter, Filip, additional, Winata, Genta Indra, additional, Strobelt, Hendrik, additional, Hayashi, Hiroaki, additional, Novikova, Jekaterina, additional, Kanerva, Jenna, additional, Chim, Jenny, additional, Zhou, Jiawei, additional, Clive, Jordan, additional, Maynez, Joshua, additional, Sedoc, João, additional, Juraska, Juraj, additional, Dhole, Kaustubh, additional, Chandu, Khyathi Raghavi, additional, Beltrachini, Laura Perez, additional, Ribeiro, Leonardo F . R., additional, Tunstall, Lewis, additional, Zhang, Li, additional, Pushkarna, Mahim, additional, Creutz, Mathias, additional, White, Michael, additional, Kale, Mihir Sanjay, additional, Eddine, Moussa Kamal, additional, Daheim, Nico, additional, Subramani, Nishant, additional, Dusek, Ondrej, additional, Liang, Paul Pu, additional, Ammanamanchi, Pawan Sasanka, additional, Zhu, Qi, additional, Puduppully, Ratish, additional, Kriz, Reno, additional, Shahriyar, Rifat, additional, Cardenas, Ronald, additional, Mahamood, Saad, additional, Osei, Salomey, additional, Cahyawijaya, Samuel, additional, Štajner, Sanja, additional, Montella, Sebastien, additional, Jolly, Shailza, additional, Mille, Simon, additional, Hasan, Tahmid, additional, Shen, Tianhao, additional, Adewumi, Tosin, additional, Raunak, Vikas, additional, Raheja, Vipul, additional, Nikolaev, Vitaly, additional, Tsai, Vivian, additional, Jernite, Yacine, additional, Xu, Ying, additional, Sang, Yisi, additional, Liu, Yixin, additional, and Hou, Yufang, additional
Published: 2022
Full Text: View/download PDF

34. Can Question Rewriting Help Conversational Question Answering?

Author: Ishii, Etsuko, primary, Xu, Yan, additional, Cahyawijaya, Samuel, additional, and Wilie, Bryan, additional
Published: 2022
Full Text: View/download PDF

35. Clozer”:" Adaptable Data Augmentation for Cloze-style Reading Comprehension

Author: Lovenia, Holy, primary, Wilie, Bryan, additional, Chung, Willy, additional, Min, Zeng, additional, Cahyawijaya, Samuel, additional, Su, Dan, additional, and Fung, Pascale, additional
Published: 2022
Full Text: View/download PDF

36. IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

Author: Cahyawijaya, Samuel, primary, Winata, Genta Indra, additional, Wilie, Bryan, additional, Vincentio, Karissa, additional, Li, Xiaohong, additional, Kuncoro, Adhiguna, additional, Ruder, Sebastian, additional, Lim, Zhi Yuan, additional, Bahar, Syafri, additional, Khodra, Masayu, additional, Purwarianti, Ayu, additional, and Fung, Pascale, additional
Published: 2021
Full Text: View/download PDF

37. Identifying Indicators of Household Indebtedness by Provinces

Author: Rezzy Eko Caraka, Hafianti, Silvi, Hidayati, Sri, Wilie, Bryan, and Muztahid, Muhammad Rheza
Published: 2019
Full Text: View/download PDF

38. CountNet: End to End Deep Learning for Crowd Counting

Author: Wilie, Bryan, primary, Cahyawijaya, Samuel, additional, and Adiprawita, Widyawardana, additional
Published: 2018
Full Text: View/download PDF

39. IDEnet : Inception-Based Deep Convolutional Neural Network for Crowd Counting Estimation

Author: Cahyawijaya, Samuel, primary, Wilie, Bryan, additional, and Adiprawita, Widyawardana, additional
Published: 2018
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

39 results on '"Wilie, Bryan"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources