Author: "Dernoncourt, Franck" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Dernoncourt, Franck"' showing total 450 results

1. DynaSaur: Large Language Agents Beyond Predefined Actions

Author: Nguyen, Dang, Lai, Viet Dac, Yoon, Seunghyun, Rossi, Ryan A., Zhao, Handong, Zhang, Ruiyi, Mathur, Puneet, Lipka, Nedim, Wang, Yu, Bui, Trung, Dernoncourt, Franck, and Zhou, Tianyi
Subjects: Computer Science - Computation and Language
Abstract: Existing LLM agent systems typically select actions from a fixed and predefined set at every step. While this approach is effective in closed, narrowly-scoped environments, we argue that it presents two major challenges when deploying LLM agents in real-world scenarios: (1) selecting from a fixed set of actions significantly restricts the planning and acting capabilities of LLM agents, and (2) this approach requires substantial human effort to enumerate and implement all possible actions, which becomes impractical in complex environments with a vast number of potential actions. In this work, we propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. In this framework, the agent interacts with the environment by generating and executing programs written in a general-purpose programming language at each step. Furthermore, generated actions are accumulated over time for future reuse. Our extensive experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods. Notably, it allows an LLM agent to recover in scenarios where no relevant action exists in the predefined set or when existing actions fail due to unforeseen edge cases. At the time of writing, we hold the top position on the GAIA public leaderboard. Our code can be found in https://github.com/adobe-research/dynasaur.
Comment: 15 pages, 8 figures
Published: 2024

2. LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding

Author: Chen, Jian, Zhang, Ruiyi, Zhou, Yufan, Yu, Tong, Dernoncourt, Franck, Gu, Jiuxiang, Rossi, Ryan A., Chen, Changyou, and Sun, Tong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large multimodal models (LMMs) have recently shown great progress in text-rich image understanding, yet they still struggle with complex, multi-page, visually-rich documents. Traditional methods using document parsers for retrieval-augmented generation suffer from performance and efficiency limitations, while directly presenting all pages to LMMs leads to inefficiencies, especially with lengthy documents. In this work, we present a novel framework named LoRA-Contextualizing Adaptation of Large multimodal models (LoCAL), which broadens the capabilities of any LMM to support long-document understanding. We demonstrate that LMMs can effectively serve as multimodal retrievers, fetching relevant pages to answer user questions based on these pages. LoCAL is implemented with two specific LMM adapters: one for evidence page retrieval and another for question answering. Empirical results show state-of-the-art performance on public benchmarks, demonstrating the effectiveness of LoCAL.
Comment: Currently Under Review
Published: 2024

3. GRS-QA -- Graph Reasoning-Structured Question Answering Dataset

Author: Pahilajani, Anish, Trivedi, Devasha, Shuai, Jincen, Yone, Khin S., Jain, Samyak Rajesh, Park, Namyong, Rossi, Ryan A., Ahmed, Nesreen K., Dernoncourt, Franck, and Wang, Yu
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have excelled in multi-hop question-answering (M-QA) due to their advanced reasoning abilities. However, the impact of the inherent reasoning structures on LLM M-QA performance remains unclear, largely due to the absence of QA datasets that provide fine-grained reasoning structures. To address this gap, we introduce the Graph Reasoning-Structured Question Answering Dataset (GRS-QA), which includes both semantic contexts and reasoning structures for QA pairs. Unlike existing M-QA datasets, where different reasoning structures are entangled together, GRS-QA explicitly captures intricate reasoning pathways by constructing reasoning graphs, where nodes represent textual contexts and edges denote logical flows. These reasoning graphs of different structures enable a fine-grained evaluation of LLM reasoning capabilities across various reasoning structures. Our empirical analysis reveals that LLMs perform differently when handling questions with varying reasoning structures. This finding facilitates the exploration of textual structures as compared with semantics.
Comment: 15 pages, 24 figures, 10 tables
Published: 2024

4. Personalization of Large Language Models: A Survey

Author: Zhang, Zhehao, Rossi, Ryan A., Kveton, Branislav, Shao, Yijia, Yang, Diyi, Zamani, Hamed, Dernoncourt, Franck, Barrow, Joe, Yu, Tong, Kim, Sungchul, Zhang, Ruiyi, Gu, Jiuxiang, Derr, Tyler, Chen, Hongjie, Wu, Junda, Chen, Xiang, Wang, Zichao, Mitra, Subrata, Lipka, Nedim, Ahmed, Nesreen, and Wang, Yu
Subjects: Computer Science - Computation and Language
Abstract: Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. In this work, we bridge the gap between these two separate main directions for the first time by introducing a taxonomy for personalized LLM usage and summarizing the key differences and challenges. We provide a formalization of the foundations of personalized LLMs that consolidates and expands notions of personalization of LLMs, defining and discussing novel facets of personalization, usage, and desiderata of personalized LLMs. We then unify the literature across these diverse fields and usage scenarios by proposing systematic taxonomies for the granularity of personalization, personalization techniques, datasets, evaluation methods, and applications of personalized LLMs. Finally, we highlight challenges and important open problems that remain to be addressed. By unifying and surveying recent research using the proposed taxonomies, we aim to provide a clear guide to the existing literature and different facets of personalization in LLMs, empowering both researchers and practitioners.
Published: 2024

5. Survey of User Interface Design and Interaction Techniques in Generative AI Applications

Author: Luera, Reuben, Rossi, Ryan A., Siu, Alexa, Dernoncourt, Franck, Yu, Tong, Kim, Sungchul, Zhang, Ruiyi, Chen, Xiang, Salehy, Hanieh, Zhao, Jian, Basu, Samyadeep, Mathur, Puneet, and Lipka, Nedim
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The applications of generative AI have become extremely impressive, and the interplay between users and AI is even more so. Current human-AI interaction literature has taken a broad look at how humans interact with generative AI, but it lacks specificity regarding the user interface designs and patterns used to create these applications. Therefore, we present a survey that comprehensively presents taxonomies of how a human interacts with AI and the user interaction patterns designed to meet the needs of a variety of relevant use cases. We focus primarily on user-guided interactions, surveying interactions that are initiated by the user and do not include any implicit signals given by the user. With this survey, we aim to create a compendium of different user-interaction patterns that can be used as a reference for designers and developers alike. In doing so, we also strive to lower the entry barrier for those attempting to learn more about the design of generative AI applications.
Published: 2024

6. A Survey of Small Language Models

Author: Van Nguyen, Chien, Shen, Xuan, Aponte, Ryan, Xia, Yu, Basu, Samyadeep, Hu, Zhengmian, Chen, Jian, Parmar, Mihir, Kunapuli, Sasidhar, Barrow, Joe, Wu, Junda, Singh, Ashish, Wang, Yu, Gu, Jiuxiang, Dernoncourt, Franck, Ahmed, Nesreen K., Lipka, Nedim, Zhang, Ruiyi, Chen, Xiang, Yu, Tong, Kim, Sungchul, Deilamsalehy, Hanieh, Park, Namyong, Rimer, Mike, Zhang, Zhehao, Yang, Huanrui, Rossi, Ryan A., and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language
Abstract: Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources, making them ideal for various settings including on-device, mobile, edge devices, among many others. In this article, we present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques. We propose a novel taxonomy for categorizing the methods used to optimize SLMs, including model compression, pruning, and quantization techniques. We summarize the benchmark datasets that are useful for benchmarking SLMs along with the evaluation metrics commonly used. Additionally, we highlight key open challenges that remain to be addressed. Our survey aims to serve as a valuable resource for researchers and practitioners interested in developing and deploying small yet efficient language models.
Published: 2024

7. Taipan: Efficient and Expressive State Space Language Models with Selective Attention

Author: Van Nguyen, Chien, Nguyen, Huy Huu, Pham, Thang M., Zhang, Ruiyi, Deilamsalehy, Hanieh, Mathur, Puneet, Rossi, Ryan A., Bui, Trung, Lai, Viet Dac, Dernoncourt, Franck, and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Efficient long-context language modeling remains a significant challenge in Natural Language Processing (NLP). While Transformers dominate language tasks, they struggle with long sequences due to quadratic computational complexity in training and linearly scaling memory costs during inference. Recent State Space Models (SSMs) such as Mamba offer alternatives with constant memory usage, but they underperform in tasks requiring extensive in-context retrieval. We introduce Taipan, a novel hybrid architecture that combines Mamba-2 with Selective Attention Layers (SALs). These SALs identify tokens requiring long-range interactions, remove less important features, and then augment their representations using the attention module. This approach balances Mamba's efficiency with Transformer-like performance in memory-intensive tasks. By constraining the attention budget, Taipan extends accurate predictions to context lengths of up to 1 million tokens while preserving computational efficiency. Our experiments demonstrate Taipan's superior performance across various scales and tasks, offering a promising solution for efficient long-context language modeling.
Published: 2024

8. DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding

Author: Suri, Manan, Mathur, Puneet, Dernoncourt, Franck, Jain, Rajiv, Morariu, Vlad I, Sawhney, Ramit, Nakov, Preslav, and Manocha, Dinesh
Subjects: Computer Science - Computation and Language
Abstract: Document structure editing involves manipulating localized textual, visual, and layout components in document images based on the user's requests. Past works have shown that multimodal grounding of user requests in the document image and identifying the accurate structural components and their associated attributes remain key challenges for this task. To address these, we introduce the DocEdit-v2, a novel framework that performs end-to-end document editing by leveraging Large Multimodal Models (LMMs). It consists of three novel components: (1) Doc2Command, which simultaneously localizes edit regions of interest (RoI) and disambiguates user edit requests into edit commands; (2) LLM-based Command Reformulation prompting to tailor edit commands originally intended for specialized software into edit instructions suitable for generalist LMMs. (3) Moreover, DocEdit-v2 processes these outputs via Large Multimodal Models like GPT-4V and Gemini, to parse the document layout, execute edits on grounded Region of Interest (RoI), and generate the edited document image. Extensive experiments on the DocEdit dataset show that DocEdit-v2 significantly outperforms strong baselines on edit command generation (2-33%), RoI bounding box detection (12-31%), and overall document editing (1-12%) tasks.
Comment: EMNLP 2024 (Main)
Published: 2024

9. VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use

Author: Zhang, Zhehao, Rossi, Ryan, Yu, Tong, Dernoncourt, Franck, Zhang, Ruiyi, Gu, Jiuxiang, Kim, Sungchul, Chen, Xiang, Wang, Zichao, and Lipka, Nedim
Subjects: Computer Science - Computation and Language
Abstract: While vision-language models (VLMs) have demonstrated remarkable performance across various tasks combining textual and visual information, they continue to struggle with fine-grained visual perception tasks that require detailed pixel-level analysis. Effectively eliciting comprehensive reasoning from VLMs on such intricate visual elements remains an open challenge. In this paper, we present VipAct, an agent framework that enhances VLMs by integrating multi-agent collaboration and vision expert models, enabling more precise visual understanding and comprehensive reasoning. VipAct consists of an orchestrator agent, which manages task requirement analysis, planning, and coordination, along with specialized agents that handle specific tasks such as image captioning and vision expert models that provide high-precision perceptual information. This multi-agent approach allows VLMs to better perform fine-grained visual perception tasks by synergizing planning, reasoning, and tool use. We evaluate VipAct on benchmarks featuring a diverse set of visual perception tasks, with experimental results demonstrating significant performance improvements over state-of-the-art baselines across all tasks. Furthermore, comprehensive ablation studies reveal the critical role of multi-agent collaboration in eliciting more detailed System-2 reasoning and highlight the importance of image input for task planning. Additionally, our error analysis identifies patterns of VLMs' inherent limitations in visual perception, providing insights into potential future improvements. VipAct offers a flexible and extensible framework, paving the way for more advanced visual perception systems across various real-world applications.
Published: 2024

10. A Multi-LLM Debiasing Framework

Author: Owens, Deonna M., Rossi, Ryan A., Kim, Sungchul, Yu, Tong, Dernoncourt, Franck, Chen, Xiang, Zhang, Ruiyi, Gu, Jiuxiang, Deilamsalehy, Hanieh, and Lipka, Nedim
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) are powerful tools with the potential to benefit society immensely, yet, they have demonstrated biases that perpetuate societal inequalities. Despite significant advancements in bias mitigation techniques using data augmentation, zero-shot prompting, and model fine-tuning, biases continuously persist, including subtle biases that may elude human detection. Recent research has shown a growing interest in multi-LLM approaches, which have been demonstrated to be effective in improving the quality of reasoning and factuality in LLMs. Building on this approach, we propose a novel multi-LLM debiasing framework aimed at reducing bias in LLMs. Our work is the first to introduce and evaluate two distinct approaches within this framework for debiasing LLMs: a centralized method, where the conversation is facilitated by a single central LLM, and a decentralized method, where all models communicate directly. Our findings reveal that our multi-LLM framework significantly reduces bias in LLMs, outperforming the baseline method across several social groups.
Published: 2024

11. ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning

Author: Man, Hieu, Ngo, Nghia Trung, Dernoncourt, Franck, and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval
Abstract: Large Language Models (LLMs) excel in various natural language processing tasks, but leveraging them for dense passage embedding remains challenging. This is due to their causal attention mechanism and the misalignment between their pre-training objectives and the text ranking tasks. Despite some recent efforts to address these issues, existing frameworks for LLM-based text embeddings have been limited by their support for only a limited range of LLM architectures and fine-tuning strategies, limiting their practical application and versatility. In this work, we introduce the Unified framework for Large Language Model Embedding (ULLME), a flexible, plug-and-play implementation that enables bidirectional attention across various LLMs and supports a range of fine-tuning strategies. We also propose Generation-augmented Representation Learning (GRL), a novel fine-tuning method to boost LLMs for text embedding tasks. GRL enforces consistency between representation-based and generation-based relevance scores, leveraging LLMs' powerful generative abilities for learning passage embeddings. To showcase our framework's flexibility and effectiveness, we release three pre-trained models from ULLME with different backbone architectures, ranging from 1.5B to 8B parameters, all of which demonstrate strong performance on the Massive Text Embedding Benchmark. Our framework is publicly available at: https://github.com/nlp-uoregon/ullme. A demo video for ULLME can also be found at https://rb.gy/ws1ile.
Published: 2024

12. A Framework for Fine-Tuning LLMs using Heterogeneous Feedback

Author: Aponte, Ryan, Rossi, Ryan A., Guo, Shunan, Dernoncourt, Franck, Yu, Tong, Chen, Xiang, Mitra, Subrata, and Lipka, Nedim
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, I.2.7
Abstract: Large language models (LLMs) have been applied to a wide range of tasks, including text summarization, web navigation, and chatbots. They have benefitted from supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) following an unsupervised pretraining. These datasets can be difficult to collect, limited in scope, and vary in sample quality. Additionally, datasets can vary extensively in supervision format, from numerical to binary as well as multi-dimensional with many different values. We present a framework for fine-tuning LLMs using heterogeneous feedback, which has two main components. First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF. Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases potentially exceeding the full dataset. We conduct extensive experiments to understand the effectiveness of these techniques for incorporating heterogeneous feedback, and demonstrate improvements from using a high-quality and diverse subset of the data. We find that our framework is able to improve models in multiple areas simultaneously, such as in instruction following and bias reduction.
Comment: 7 pages, 1 figure
Published: 2024

13. KaPQA: Knowledge-Augmented Product Question-Answering

Author: Eppalapally, Swetha, Dangi, Daksh, Bhat, Chaithra, Gupta, Ankita, Zhang, Ruiyi, Agarwal, Shubham, Bagga, Karishma, Yoon, Seunghyun, Lipka, Nedim, Rossi, Ryan A., and Dernoncourt, Franck
Subjects: Computer Science - Computation and Language
Abstract: Question-answering for domain-specific applications has recently attracted much interest due to the latest advancements in large language models (LLMs). However, accurately assessing the performance of these applications remains a challenge, mainly due to the lack of suitable benchmarks that effectively simulate real-world scenarios. To address this challenge, we introduce two product question-answering (QA) datasets focused on Adobe Acrobat and Photoshop products to help evaluate the performance of existing models on domain-specific product QA tasks. Additionally, we propose a novel knowledge-driven RAG-QA framework to enhance the performance of the models in the product QA task. Our experiments demonstrated that inducing domain knowledge through query reformulation allowed for increased retrieval and generative performance when compared to standard RAG-QA methods. This improvement, however, is slight, and thus illustrates the challenge posed by the datasets introduced.
Comment: Accepted at the ACL 2024 Workshop on Knowledge Augmented Methods for NLP
Published: 2024

14. Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

Author: Nguyen, Minh, Dernoncourt, Franck, Yoon, Seunghyun, Deilamsalehy, Hanieh, Tan, Hao, Rossi, Ryan, Tran, Quan Hung, Bui, Trung, and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language
Abstract: We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite the advancements in speech recognition, the task of text-based speaker identification (SpeakerID) has received limited attention, lacking large-scale, diverse datasets for effective model training. Addressing these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names. Through extensive experiments, our best model achieves a great precision of 80.3%, setting a new benchmark for SpeakerID. The data and code are publicly available here: https://github.com/adobe-research/speaker-identification
Comment: accepted to INTERSPEECH 2024
Published: 2024

15. Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs

Author: Parmar, Mihir, Deilamsalehy, Hanieh, Dernoncourt, Franck, Yoon, Seunghyun, Rossi, Ryan A., and Bui, Trung
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Extractive summarization plays a pivotal role in natural language processing due to its wide-range applications in summarizing diverse content efficiently, while also being faithful to the original content. Despite significant advancement achieved in extractive summarization by Large Language Models (LLMs), these summaries frequently exhibit incoherence. An important aspect of the coherent summary is its readability for intended users. Although there have been many datasets and benchmarks proposed for creating coherent extractive summaries, none of them currently incorporate user intent to improve coherence in extractive summarization. Motivated by this, we propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback, offering valuable insights into how to improve coherence in extractive summaries. We utilize this dataset for aligning LLMs through supervised fine-tuning with natural language human feedback to enhance the coherence of their generated summaries. Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (~10% Rouge-L) in terms of producing coherent summaries. We further utilize human feedback to benchmark results over instruction-tuned models such as FLAN-T5 which resulted in several interesting findings. Data and source code are available at https://github.com/Mihir3009/Extract-AI.
Comment: 10 pages
Published: 2024

16. LongLaMP: A Benchmark for Personalized Long-form Text Generation

Author: Kumar, Ishita, Viswanathan, Snigdha, Yerra, Sushrita, Salemi, Alireza, Rossi, Ryan A., Dernoncourt, Franck, Deilamsalehy, Hanieh, Chen, Xiang, Zhang, Ruiyi, Agarwal, Shubham, Lipka, Nedim, Van Nguyen, Chien, Nguyen, Thien Huu, and Zamani, Hamed
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Long-text generation is seemingly ubiquitous in real-world applications of large language models such as generating an email or writing a review. Despite the fundamental importance and prevalence of long-text generation in many practical applications, existing work on personalized generation has focused on the generation of very short text. To overcome these limitations, we study the problem of personalized long-text generation, that is, generating long-text that is personalized for a specific user while being practically useful for the vast majority of real-world applications that naturally require the generation of longer text. In this work, we demonstrate the importance of user-specific personalization for long-text generation tasks and develop the Long-text Language Model Personalization (LongLaMP) Benchmark. LongLaMP provides a comprehensive and diverse evaluation framework for personalized long-text generation. Extensive experiments on LongLaMP for zero-shot and fine-tuned language tasks demonstrate the effectiveness of the proposed benchmark and its utility for developing and evaluating techniques for personalized long-text generation across a wide variety of long-text generation tasks. The results highlight the importance of personalization across a wide variety of long-text generation tasks. Finally, we release the benchmark for others to use for this important problem.
Published: 2024

17. An Analysis of Multilingual FActScore

Author: Vu, Kim Trong, Krumdick, Michael, Reddy, Varshini, Dernoncourt, Franck, and Lai, Viet Dac
Subjects: Computer Science - Computation and Language
Abstract: FActScore has gained popularity as a metric to estimate the factuality of long-form texts generated by Large Language Models (LLMs) in English. However, there has not been any work in studying the behavior of FActScore in other languages. This paper studies the limitations of each component in the four-component pipeline of FActScore in the multilingual setting. We introduce a new dataset for FActScore on texts generated by strong multilingual LLMs. Our evaluation shows that LLMs exhibit distinct behaviors in both fact extraction and fact scoring tasks. No LLM produces consistent and reliable FActScore across languages with varying levels of resources. We also find that the knowledge source plays an important role in the quality of the estimated FActScore. Using Wikipedia as the knowledge source may hinder the true FActScore of long-form text due to its limited coverage in medium- and low-resource languages. We also incorporate three mitigations to our knowledge source that ultimately improve FActScore estimation across all languages.
Published: 2024

18. Large Generative Graph Models

Author: Wang, Yu, Rossi, Ryan A., Park, Namyong, Chen, Huiyuan, Ahmed, Nesreen K., Trivedi, Puja, Dernoncourt, Franck, Koutra, Danai, and Derr, Tyler
Subjects: Computer Science - Machine Learning
Abstract: Large Generative Models (LGMs) such as GPT, Stable Diffusion, Sora, and Suno are trained on a huge amount of language corpus, images, videos, and audio that are extremely diverse from numerous domains. This training paradigm over diverse well-curated data lies at the heart of generating creative and sensible content. However, all previous graph generative models (e.g., GraphRNN, MDVAE, MoFlow, GDSS, and DiGress) have been trained only on one dataset each time, which cannot replicate the revolutionary success achieved by LGMs in other fields. To remedy this crucial gap, we propose a new class of graph generative model called Large Graph Generative Model (LGGM) that is trained on a large corpus of graphs (over 5000 graphs) from 13 different domains. We empirically demonstrate that the pre-trained LGGM has superior zero-shot generative capability to existing graph generative models. Furthermore, our pre-trained LGGM can be easily fine-tuned with graphs from target domains and demonstrate even better performance than those directly trained from scratch, behaving as a solid starting point for real-world customization. Inspired by Stable Diffusion, we further equip LGGM with the capability to generate graphs given text prompts (Text-to-Graph), such as the description of the network name and domain (i.e., "The power-1138-bus graph represents a network of buses in a power distribution system."), and network statistics (i.e., "The graph has a low average degree, suitable for modeling social media interactions."). This Text-to-Graph capability integrates the extensive world knowledge in the underlying language model, offering users fine-grained control of the generated graphs. We release the code, the model checkpoint, and the datasets at https://lggm-lg.github.io/.
Published: 2024

19. ROAST: Review-level Opinion Aspect Sentiment Target Joint Detection for ABSA

Author: Chebolu, Siva Uday Sampreeth, Dernoncourt, Franck, Lipka, Nedim, and Solorio, Thamar
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Aspect-Based Sentiment Analysis (ABSA) has experienced tremendous expansion and diversity due to various shared tasks spanning several languages and fields and organized via SemEval workshops and Germeval. Nonetheless, a few shortcomings still need to be addressed, such as the lack of low-resource language evaluations and the emphasis on sentence-level analysis. To thoroughly assess ABSA techniques in the context of complete reviews, this research presents a novel task, Review-Level Opinion Aspect Sentiment Target (ROAST). ROAST seeks to close the gap between sentence-level and text-level ABSA by identifying every ABSA constituent at the review level. We extend the available datasets to enable ROAST, addressing the drawbacks noted in previous research by incorporating low-resource languages, numerous languages, and a variety of topics. Through this effort, ABSA research will be able to cover more ground and get a deeper comprehension of the task and its practical application in a variety of languages and domains (https://github.com/RiTUAL-UH/ROAST-ABSA).
Comment: arXiv admin note: text overlap with arXiv:2309.13297
Published: 2024

20. Retrieval Augmented Generation for Domain-specific Question Answering

Author: Sharma, Sanat, Yoon, David Seunghyun, Dernoncourt, Franck, Sultania, Dewang, Bagga, Karishma, Zhang, Mengjiao, Bui, Trung, and Kotte, Varun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Question answering (QA) has become an important application in the advanced development of large language models. General pre-trained large language models for question-answering are not trained to properly understand the knowledge or terminology for a specific domain, such as finance, healthcare, education, and customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop the approach for retrieval-aware finetuning of a Large Language model. We showcase that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping in context the latest retrieval information for contextual grounding.
Comment: AAAI 2024 (Association for the Advancement of Artificial Intelligence) Scientific Document Understanding Workshop
Published: 2024

21. Scaling Up Video Summarization Pretraining with Large Language Models

Author: Argaw, Dawit Mureja, Yoon, Seunghyun, Heilbron, Fabian Caba, Deilamsalehy, Hanieh, Bui, Trung, Wang, Zhaowen, Dernoncourt, Franck, and Chung, Joon Son
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem. However, existing video summarization datasets are notably limited in their size, constraining the effectiveness of state-of-the-art methods for generalization. Our work aims to overcome this limitation by capitalizing on the abundance of long-form videos with dense speech-to-video alignment and the remarkable capabilities of recent large language models (LLMs) in summarizing long text. We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset using LLMs as Oracle summarizers. By leveraging the generated dataset, we analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them. To facilitate further research in the field, our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals. Extensive experiments clearly indicate that our proposed approach sets a new state-of-the-art in video summarization across several benchmarks., Comment: Accepted to CVPR 2024
Published: 2024

22. Fine-tuning CLIP Text Encoders with Two-step Paraphrasing

Author: Kim, Hyunjae, Yoon, Seunghyun, Bui, Trung, Zhao, Handong, Tran, Quan, Dernoncourt, Franck, and Kang, Jaewoo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Contrastive language-image pre-training (CLIP) models have demonstrated considerable success across various vision-language tasks, such as text-to-image retrieval, where the model is required to effectively process natural language input to produce an accurate visual output. However, current models still face limitations in dealing with linguistic variations in input queries, such as paraphrases, making it challenging to handle a broad range of user queries in real-world applications. In this study, we introduce a straightforward fine-tuning approach to enhance the representations of CLIP models for paraphrases. Our approach involves a two-step paraphrase generation process, where we automatically create two categories of paraphrases from web-scale image captions by leveraging large language models. Subsequently, we fine-tune the CLIP text encoder using these generated paraphrases while freezing the image encoder. Our resulting model, which we call ParaCLIP, exhibits significant improvements over baseline CLIP models across various tasks, including paraphrased retrieval (with rank similarity scores improved by up to 2.0% and 5.6%), Visual Genome Relation and Attribution, as well as seven semantic textual similarity tasks., Comment: EACL 2024 (Findings of the ACL)
Published: 2024

23. Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes

Author: Gallegos, Isabel O., Rossi, Ryan A., Barrow, Joe, Tanjim, Md Mehrab, Yu, Tong, Deilamsalehy, Hanieh, Zhang, Ruiyi, Kim, Sungchul, and Dernoncourt, Franck
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: Large language models (LLMs) have shown remarkable advances in language generation and understanding but are also prone to exhibiting harmful social biases. While recognition of these behaviors has generated an abundance of bias mitigation techniques, most require modifications to the training data, model parameters, or decoding strategy, which may be infeasible without access to a trainable model. In this work, we leverage the zero-shot capabilities of LLMs to reduce stereotyping in a technique we introduce as zero-shot self-debiasing. With two approaches, self-debiasing via explanation and self-debiasing via reprompting, we show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups while relying only on the LLM itself and a simple prompt, with explanations correctly identifying invalid assumptions and reprompting delivering the greatest reductions in bias. We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
Published: 2024

24. Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Author: Xing, Linzi, Tran, Quan, Caba, Fabian, Dernoncourt, Franck, Yoon, Seunghyun, Wang, Zhaowen, Bui, Trung, and Carenini, Giuseppe
Subjects: Computer Science - Multimedia, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Video topic segmentation unveils the coarse-grained semantic structure underlying videos and is essential for other video understanding tasks. Given the recent surge in multi-modal content, relying solely on a single modality is arguably insufficient. On the other hand, prior solutions for similar tasks like video scene/shot segmentation cater to short videos with clear visual shifts but falter for long videos with subtle changes, such as livestreams. In this paper, we introduce a multi-modal video topic segmenter that utilizes both video transcripts and frames, bolstered by a cross-modal attention mechanism. Furthermore, we propose a dual-contrastive learning framework adhering to the unsupervised domain adaptation paradigm, enhancing our model's adaptability to longer, more semantically complex videos. Experiments on short and long video corpora demonstrate that our proposed solution significantly surpasses baseline methods in terms of both accuracy and transferability, in both intra- and cross-domain settings., Comment: Accepted at the 30th International Conference on Multimedia Modeling (MMM 2024)
Published: 2023

25. Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification

Author: Deng, Zhongfen, Yoon, Seunghyun, Bui, Trung, Dernoncourt, Franck, Tran, Quan Hung, Liu, Shuaiqi, Zhao, Wenting, Zhang, Tao, Wang, Yibo, and Yu, Philip S.
Subjects: Computer Science - Computation and Language
Abstract: Aspect-based meeting transcript summarization aims to produce multiple summaries, each focusing on one aspect of content in a meeting transcript. It is challenging as sentences related to different aspects can mingle together, and those relevant to a specific aspect can be scattered throughout the long transcript of a meeting. The traditional summarization methods produce one summary mixing information of all aspects, which cannot deal with the above challenges of aspect-based meeting transcript summarization. In this paper, we propose a two-stage method for aspect-based meeting transcript summarization. To select the input content related to specific aspects, we train a sentence classifier on a dataset constructed from the AMI corpus with pseudo-labeling. Then we merge the sentences selected for a specific aspect as the input for the summarizer to produce the aspect-based summary. Experimental results on the AMI corpus outperform many strong baselines, which verifies the effectiveness of our proposed method., Comment: Accepted by 2023 IEEE International Conference on Big Data
Published: 2023

26. OATS: Opinion Aspect Target Sentiment Quadruple Extraction Dataset for Aspect-Based Sentiment Analysis

Author: Chebolu, Siva Uday Sampreeth, Dernoncourt, Franck, Lipka, Nedim, and Solorio, Thamar
Subjects: Computer Science - Computation and Language
Abstract: Aspect-based sentiment analysis (ABSA) delves into understanding sentiments specific to distinct elements within a user-generated review. It aims to analyze user-generated reviews to determine a) the target entity being reviewed, b) the high-level aspect to which it belongs, c) the sentiment words used to express the opinion, and d) the sentiment expressed toward the targets and the aspects. While various benchmark datasets have fostered advancements in ABSA, they often come with domain limitations and data granularity challenges. Addressing these, we introduce the OATS dataset, which encompasses three fresh domains and consists of 27,470 sentence-level quadruples and 17,092 review-level tuples. Our initiative seeks to bridge specific observed gaps: the recurrent focus on familiar domains like restaurants and laptops, limited data for intricate quadruple extraction tasks, and an occasional oversight of the synergy between sentence and review-level sentiments. Moreover, to elucidate OATS's potential and shed light on various ABSA subtasks that OATS can solve, we conducted experiments, establishing initial baselines. We hope the OATS dataset augments current resources, paving the way for an encompassing exploration of ABSA (https://github.com/RiTUAL-UH/OATS-ABSA)., Comment: Accepted in COLING/LREC-2024. Camera Ready submission
Published: 2023

27. CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Author: Nguyen, Thuat, Van Nguyen, Chien, Lai, Viet Dac, Man, Hieu, Ngo, Nghia Trung, Dernoncourt, Franck, Rossi, Ryan A., and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have been frequently made accessible to the public to foster deeper investigation and applications. However, when it comes to training datasets for these LLMs, especially the recent state-of-the-art models, they are often not fully disclosed. Creating training data for high-performing LLMs involves extensive cleaning and deduplication to ensure the necessary level of quality. The lack of transparency for training data has thus hampered research on attributing and addressing hallucination and bias issues in LLMs, hindering replication efforts and further advancements in the community. These challenges become even more pronounced in multilingual learning scenarios, where the available multilingual text datasets are often inadequately collected and cleaned. Consequently, there is a lack of open-source and readily usable datasets to effectively train LLMs in multiple languages. To overcome this issue, we present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development. Our dataset undergoes meticulous cleaning and deduplication through a rigorous pipeline of multiple stages to accomplish the best quality for model training, including language identification, URL-based filtering, metric-based cleaning, document refinement, and data deduplication. CulturaX is fully released to the public in HuggingFace to facilitate research and advancements in multilingual LLMs: https://huggingface.co/datasets/uonlp/CulturaX., Comment: Ongoing Work
Published: 2023

28. PDFTriage: Question Answering over Long, Structured Documents

Author: Saad-Falcon, Jon, Barrow, Joe, Siu, Alexa, Nenkova, Ani, Yoon, David Seunghyun, Rossi, Ryan A., and Dernoncourt, Franck
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) have issues with document question answering (QA) in situations where the document is unable to fit in the small context length of an LLM. To overcome this issue, most existing works focus on retrieving the relevant context from the document, representing them as plain text. However, documents such as PDFs, web pages, and presentations are naturally structured with different pages, tables, sections, and so on. Representing such structured documents as plain text is incongruous with the user's mental model of these documents with rich structure. When a system has to query the document for context, this incongruity is brought to the fore, and seemingly trivial questions can trip up the QA system. To bridge this fundamental gap in handling structured documents, we propose an approach called PDFTriage that enables models to retrieve the context based on either structure or content. Our experiments demonstrate the effectiveness of the proposed PDFTriage-augmented models across several classes of questions where existing retrieval-augmented LLMs fail. To facilitate further research on this fundamental problem, we release our benchmark dataset consisting of 900+ human-generated questions over 80 structured documents from 10 different categories of question types for document QA. Our code and datasets will be released soon on Github.
Published: 2023

29. Multilingual Sentence-Level Semantic Search using Meta-Distillation Learning

Author: M'hamdi, Meryem, May, Jonathan, Dernoncourt, Franck, Bui, Trung, and Yoon, Seunghyun
Subjects: Computer Science - Computation and Language
Abstract: Multilingual semantic search is the task of retrieving relevant contents to a query expressed in different language combinations. This requires a better semantic understanding of the user's intent and its contextual meaning. Multilingual semantic search is less explored and more challenging than its monolingual or bilingual counterparts, due to the lack of multilingual parallel resources for this task and the need to circumvent "language bias". In this work, we propose an alignment approach: MAML-Align, specifically for low-resource scenarios. Our approach leverages meta-distillation learning based on MAML, an optimization-based Model-Agnostic Meta-Learner. MAML-Align distills knowledge from a Teacher meta-transfer model T-MAML, specialized in transferring from monolingual to bilingual semantic search, to a Student model S-MAML, which meta-transfers from bilingual to multilingual semantic search. To the best of our knowledge, we are the first to extend meta-distillation to a multilingual search application. Our empirical results show that on top of a strong baseline based on sentence transformers, our meta-distillation approach boosts the gains provided by MAML and significantly outperforms naive fine-tuning methods. Furthermore, multilingual meta-distillation learning improves generalization even to unseen languages.
Published: 2023

30. TaleStream: Supporting Story Ideation with Trope Knowledge

Author: Chou, Jean-Peïc, Siu, Alexa F., Lipka, Nedim, Rossi, Ryan, Dernoncourt, Franck, and Agrawala, Maneesh
Subjects: Computer Science - Human-Computer Interaction, D.2.2, H.1.2, H.5.2
Abstract: Story ideation is a critical part of the story-writing process. It is challenging to support computationally due to its exploratory and subjective nature. Tropes, which are recurring narrative elements across stories, are essential in stories as they shape the structure of narratives and our understanding of them. In this paper, we propose to use tropes as an intermediate representation of stories to approach story ideation. We present TaleStream, a canvas system that uses tropes as building blocks of stories while providing steerable suggestions of story ideas in the form of tropes. Our trope suggestion methods leverage data from the tvtropes.org wiki. We find that 97% of the time, trope suggestions generated by our methods provide better story ideation materials than random tropes. Our system evaluation suggests that TaleStream can support writers' creative flow and greatly facilitates story development. Tropes, as a rich lexicon of narratives with available examples, play a key role in TaleStream and hold promise for story-creation support systems., Comment: 12 pages, 6 figures, 3 tables
Published: 2023

31. Bias and Fairness in Large Language Models: A Survey

Author: Gallegos, Isabel O., Rossi, Ryan A., Barrow, Joe, Tanjim, Md Mehrab, Kim, Sungchul, Dernoncourt, Franck, Yu, Tong, Zhang, Ruiyi, and Ahmed, Nesreen K.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs., Comment: Accepted at Computational Linguistics, Volume 50, Number 3
Published: 2023

32. Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

Author: Lai, Viet Dac, Van Nguyen, Chien, Ngo, Nghia Trung, Nguyen, Thuat, Dernoncourt, Franck, Rossi, Ryan A., and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: A key technology for the development of large language models (LLMs) involves instruction tuning that helps align the models' responses with human expectations to realize impressive learning abilities. The two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca, Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, thus hindering their impact and accessibility for many other languages in the world. Among the few very recent works that explore instruction tuning for LLMs in multiple languages, SFT has been used as the only approach to instruction-tune LLMs for multiple languages. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions on how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction over SFT for different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.
Published: 2023

33. Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Author: Lai, Viet Dac, Salinas, Abel, Tan, Hao, Bui, Trung, Tran, Quan, Yoon, Seunghyun, Deilamsalehy, Hanieh, Dernoncourt, Franck, and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language
Abstract: Punctuation restoration is an important task in automatic speech recognition (ASR), which aims to restore the syntactic structure of generated ASR texts to improve readability. While punctuated texts are abundant from written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts. This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. The experiments show that our method achieves state-of-the-art performance on the ASR test set on two benchmark datasets for punctuation restoration., Comment: Accepted at INTERSPEECH 2023, 6 pages
Published: 2023

34. Learning Navigational Visual Representations with Semantic Map Supervision

Author: Hong, Yicong, Zhou, Yang, Zhang, Ruiyi, Dernoncourt, Franck, Bui, Trung, Gould, Stephen, and Tan, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot. However, most existing works only employ visual backbones pre-trained either with independent images for classification or with self-supervised learning methods to adapt to the indoor navigation domain, neglecting the spatial relationships that are essential to the learning of navigation. Inspired by the behavior that humans naturally build semantically and spatially meaningful cognitive maps in their brains during navigation, in this paper, we propose a novel navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps (Ego$^2$-Map). We apply the visual transformer as the backbone encoder and train the model with data collected from the large-scale Habitat-Matterport3D environments. Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation. Experiments show that agents using our learned representations on object-goal navigation outperform recent visual pre-training methods. Moreover, our representations significantly improve vision-and-language navigation in continuous environments for both high-level and low-level action spaces, achieving new state-of-the-art results of 47% SR and 41% SPL on the test server.
Published: 2023

35. Fairness-Aware Graph Neural Networks: A Survey

Author: Chen, April, Rossi, Ryan A., Park, Namyong, Trivedi, Puja, Wang, Yu, Yu, Tong, Kim, Sungchul, Dernoncourt, Franck, and Ahmed, Nesreen K.
Subjects: Computer Science - Machine Learning, Computer Science - Information Retrieval, Computer Science - Social and Information Networks
Abstract: Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the fundamental aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. Previous work on fair GNN models and techniques is discussed in terms of whether it focuses on improving fairness during a preprocessing step, during training, or in a post-processing phase. Furthermore, we discuss how such techniques can be used together whenever appropriate, and highlight the advantages and intuition as well. We also introduce an intuitive taxonomy for fairness evaluation metrics including graph-level fairness, neighborhood-level fairness, embedding-level fairness, and prediction-level fairness metrics. In addition, graph datasets that are useful for benchmarking the fairness of GNN models are summarized succinctly. Finally, we highlight key open problems and challenges that remain to be addressed.
Published: 2023

36. Efficient Spoken Language Recognition via Multilabel Classification

Author: Nieto, Oriol, Jin, Zeyu, Dernoncourt, Franck, and Salamon, Justin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Spoken language recognition (SLR) is the task of automatically identifying the language present in a speech signal. Existing SLR models are either too computationally expensive or too large to run effectively on devices with limited resources. For real-world deployment, a model should also gracefully handle unseen languages outside of the target language set, yet prior work has focused on closed-set classification where all input languages are known a-priori. In this paper we address these two limitations: we explore efficient model architectures for SLR based on convolutional networks, and propose a multilabel training strategy to handle non-target languages at inference time. Using the VoxLingua107 dataset, we show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods, and that our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification., Comment: Accepted to InterSpeech 2023
Published: 2023

37. MeetingBank: A Benchmark Dataset for Meeting Summarization

Author: Hu, Yebowen, Ganter, Tim, Deilamsalehy, Hanieh, Dernoncourt, Franck, Foroosh, Hassan, and Liu, Fei
Subjects: Computer Science - Computation and Language
Abstract: As the number of recorded meetings increases, it becomes increasingly important to utilize summarization technology to create useful summaries of these recordings. However, there is a crucial lack of annotated meeting corpora for developing this technology, as it can be hard to collect meetings, especially when the topics discussed are confidential. Furthermore, meeting summaries written by experienced writers are scarce, making it hard for abstractive summarizers to produce sensible output without a reliable reference. This lack of annotated corpora has hindered the development of meeting summarization technology. In this paper, we present MeetingBank, a new benchmark dataset of city council meetings over the past decade. MeetingBank is unique among other meeting corpora due to its divide-and-conquer approach, which involves dividing professionally written meeting minutes into shorter passages and aligning them with specific segments of the meeting. This breaks down the process of summarizing a lengthy meeting into smaller, more manageable tasks. The dataset provides a new testbed of various meeting summarization systems and also allows the public to gain insight into how council decisions are made. We make the collection, including meeting video links, transcripts, reference summaries, agenda, and other metadata, publicly available to facilitate the development of better meeting summarization techniques. Our dataset can be accessed at: https://meetingbank.github.io, Comment: ACL 2023 Long Paper
Published: 2023

38. ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning

Author: Lai, Viet Dac, Ngo, Nghia Trung, Veyseh, Amir Pouran Ben, Man, Hieu, Dernoncourt, Franck, Bui, Trung, and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Over the last few years, large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP) that fundamentally transform research and developments in the field. ChatGPT represents one of the most exciting LLM systems developed recently, showcasing impressive skills for language generation and attracting considerable public attention. Among various exciting applications discovered for ChatGPT in English, the model can process and generate texts for multiple languages due to its multilingual training data. Given the broad adoption of ChatGPT for English in different problems and areas, a natural question is whether ChatGPT can also be applied effectively for other languages or whether it is necessary to develop more language-specific technologies. The answer to this question requires a thorough evaluation of ChatGPT over multiple tasks with diverse languages and large datasets (i.e., beyond reported anecdotes), which is still missing or limited in current research. Our work aims to fill this gap for the evaluation of ChatGPT and similar LLMs to provide more comprehensive information for multilingual NLP applications. While this work will be an ongoing effort to include additional experiments in the future, our current paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources. We also focus on the zero-shot learning setting for ChatGPT to improve reproducibility and better simulate the interactions of general users. Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages, calling for further research to develop better models and understanding for multilingual learning.
Published: 2023

39. Envisioning the Next-Gen Document Reader

Author: Yeh, Catherine, Lipka, Nedim, and Dernoncourt, Franck
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: People read digital documents on a daily basis to share, exchange, and understand information in electronic settings. However, current document readers create a static, isolated reading experience, which does not support users' goals of gaining more knowledge and performing additional tasks through document interaction. In this work, we present our vision for the next-gen document reader that strives to enhance user understanding and create a more connected, trustworthy information experience. We describe 18 NLP-powered features to add to existing document readers and propose a novel plug-in marketplace that allows users to further customize their reading experience, as demonstrated through 3 exploratory UI prototypes available at https://github.com/catherinesyeh/nextgen-prototypes, Comment: Paper accepted at the AAAI 2023 Workshop on Scientific Document Understanding
Published: 2023

40. Curriculum-Guided Abstractive Summarization

Author: Sotudeh, Sajad, Deilamsalehy, Hanieh, Dernoncourt, Franck, and Goharian, Nazli
Subjects: Computer Science - Computation and Language
Abstract: Recent Transformer-based summarization models have provided a promising approach to abstractive summarization. They go beyond sentence selection and extractive strategies to deal with more complicated tasks such as novel word generation and sentence paraphrasing. Nonetheless, these models have two shortcomings: (1) they often perform poorly in content selection, and (2) their training strategy is not quite efficient, which restricts model performance. In this paper, we explore two orthogonal ways to compensate for these pitfalls. First, we augment the Transformer network with a sentence cross-attention module in the decoder, encouraging more abstraction of salient content. Second, we include a curriculum learning approach to reweight the training samples, bringing about an efficient learning procedure. Our second approach, which enhances the training strategy of Transformer networks, yields stronger gains than the first approach. We apply our model to the extreme summarization dataset of Reddit TIFU posts. We further look into three cross-domain summarization datasets (Webis-TLDR-17, CNN/DM, and XSum), measuring the efficacy of curriculum learning when applied in summarization. Moreover, a human evaluation is conducted to show the efficacy of the proposed method in terms of qualitative criteria, namely, fluency, informativeness, and overall quality., Comment: 8 pages, Long paper. arXiv admin note: text overlap with arXiv:2302.00954
Published: 2023

41. Curriculum-guided Abstractive Summarization for Mental Health Online Posts

Author: Sotudeh, Sajad, Goharian, Nazli, Deilamsalehy, Hanieh, and Dernoncourt, Franck
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Automatically generating short summaries from users' online mental health posts could save counselors' reading time and reduce their fatigue so that they can provide timely responses to those seeking help for improving their mental state. Recent Transformer-based summarization models have presented a promising approach to abstractive summarization. They go beyond sentence selection and extractive strategies to deal with more complicated tasks such as novel word generation and sentence paraphrasing. Nonetheless, these models have a prominent shortcoming: their training strategy is not quite efficient, which restricts the model's performance. In this paper, we include a curriculum learning approach to reweight the training samples, bringing about an efficient learning procedure. We apply our model to the extreme summarization dataset of MentSum posts -- a dataset of mental health related posts from Reddit social media. Compared to the state-of-the-art model, our proposed method makes substantial gains in terms of Rouge and Bertscore evaluation metrics, yielding relative improvements of 3.5% (Rouge-1), 10.4% (Rouge-2), 4.7% (Rouge-L), and 1.5% (Bertscore)., Comment: 4 pages, short paper, accepted to The 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022)
Published: 2023

42. Multi-modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Author: Xing, Linzi, Tran, Quan, Caba, Fabian, Dernoncourt, Franck, Yoon, Seunghyun, Wang, Zhaowen, Bui, Trung, Carenini, Giuseppe, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rudinac, Stevan, editor, Hanjalic, Alan, editor, Liem, Cynthia, editor, Worring, Marcel, editor, Jónsson, Björn Þór, editor, Liu, Bei, editor, and Yamakata, Yoko, editor
Published: 2024

43. MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection

Author: Veyseh, Amir Pouran Ben, Van Nguyen, Minh, Dernoncourt, Franck, and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language
Abstract: Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research efforts in recent years for English text, the task of ED in other languages has been significantly less explored. Switching to non-English languages, important research questions for ED include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages. To answer those questions, it is crucial to obtain multilingual ED datasets that provide consistent event annotation for multiple languages. There exist some multilingual ED datasets; however, they tend to cover a handful of languages and mainly focus on popular ones. Many languages are not covered in existing multilingual ED datasets. In addition, the current datasets are often small and not accessible to the public. To overcome those shortcomings, we introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages; 5 of them have not been supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION, all of which call for more research effort in this area., Comment: Accepted at NAACL 2022
Published: 2022

44. MEE: A Novel Multilingual Event Extraction Dataset

Author: Veyseh, Amir Pouran Ben, Ebrahimi, Javid, Dernoncourt, Franck, and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language
Abstract: Event Extraction (EE) is one of the fundamental tasks in Information Extraction (IE) that aims to recognize event mentions and their arguments (i.e., participants) from text. Due to its importance, extensive methods and resources have been developed for Event Extraction. However, one limitation of current research on EE is the under-exploration of non-English languages, for which the lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance. To address this limitation, we propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages. MEE comprehensively annotates data for entity mentions, event triggers and event arguments. We conduct extensive experiments on the proposed dataset to reveal challenges and opportunities for multilingual EE., Comment: Accepted at EMNLP 2022
Published: 2022

45. User-Entity Differential Privacy in Learning Natural Language Models

Author: Lai, Phung, Phan, NhatHai, Sun, Tong, Jain, Rajiv, Dernoncourt, Franck, Gu, Jiuxiang, and Barmpalios, Nikolaos
Subjects: Computer Science - Cryptography and Security, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs). To preserve UeDP, we developed a novel algorithm, called UeDP-Alg, optimizing the trade-off between privacy loss and model utility with a tight sensitivity bound derived from seamlessly combining user and sensitive entity sampling processes. An extensive theoretical analysis and evaluation show that our UeDP-Alg outperforms baseline approaches in model utility under the same privacy budget consumption on several NLM tasks, using benchmark datasets., Comment: Accepted at IEEE BigData 2022
Published: 2022

46. LiveSeg: Unsupervised Multimodal Temporal Segmentation of Long Livestream Videos

Author: Qiu, Jielin, Dernoncourt, Franck, Bui, Trung, Wang, Zhaowen, Zhao, Ding, and Jin, Hailin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Livestream videos have become a significant part of online learning, where design, digital marketing, creative painting, and other skills are taught by experienced experts in the sessions, making them valuable materials. However, Livestream tutorial videos are usually hours long, recorded, and uploaded to the Internet directly after the live sessions, making it hard for other people to catch up quickly. An outline will be a beneficial solution, which requires the video to be temporally segmented according to topics. In this work, we introduced a large Livestream video dataset named MultiLive, and formulated the temporal segmentation of the long Livestream videos (TSLLV) task. We propose LiveSeg, an unsupervised Livestream video temporal Segmentation solution, which takes advantage of multimodal features from different domains. Our method achieved a 16.8% F1-score performance improvement compared with the state-of-the-art method.
Published: 2022

47. Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment

Author: Qiu, Jielin, Zhu, Jiacheng, Xu, Mengdi, Dernoncourt, Franck, Bui, Trung, Wang, Zhaowen, Li, Bo, Zhao, Ding, and Jin, Hailin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimedia summarization with multimodal output (MSMO) is a recently explored application in language grounding. It plays an essential role in real-world applications, e.g., automatically generating cover images and titles for news articles or providing introductions to online videos. However, existing methods extract features from the whole video and article and use fusion methods to select the representative one, thus usually ignoring the critical structure and varying semantics. In this work, we propose a Semantics-Consistent Cross-domain Summarization (SCCS) model based on optimal transport alignment with visual and textual segmentation. Specifically, our method first decomposes both the video and the article into segments in order to capture their respective structural semantics. Then SCCS follows a cross-domain alignment objective with optimal transport distance, which leverages multimodal interaction to match and select the visual and textual summary. We evaluated our method on three recent multimodal datasets and demonstrated its effectiveness in producing high-quality multimodal summaries.
Published: 2022

48. Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision

Author: Mrini, Khalil, Singh, Harpreet, Dernoncourt, Franck, Yoon, Seunghyun, Bui, Trung, Chang, Walter, Farcas, Emilia, and Nakashole, Ndapa
Subjects: Computer Science - Computation and Language
Abstract: Current medical question answering systems have difficulty processing long, detailed and informally worded questions submitted by patients, called Consumer Health Questions (CHQs). To address this issue, we introduce a medical question understanding and answering system with knowledge grounding and semantic self-supervision. Our system is a pipeline that first summarizes a long, medical, user-written question, using a supervised summarization loss. Then, our system performs a two-step retrieval to return answers. The system first matches the summarized user question with an FAQ from a trusted medical knowledge base, and then retrieves a fixed number of relevant sentences from the corresponding answer document. In the absence of labels for question matching or answer relevance, we design 3 novel, self-supervised and semantically-guided losses. We evaluate our model against two strong retrieval-based question answering baselines. Evaluators ask their own questions and rate the answers retrieved by the baselines and our own system according to their relevance. They find that our system retrieves more relevant answers, while achieving speeds 20 times faster. Our self-supervised losses also help the summarizer achieve higher scores in ROUGE, as well as in human evaluation metrics. We release our code to encourage further research., Comment: Accepted as Main Conference Long paper at COLING 2022
Published: 2022

49. Tutorial Recommendation for Livestream Videos using Discourse-Level Consistency and Ontology-Based Filtering

Author: Veyseh, Amir Pouran Ben, Dernoncourt, Franck, and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language
Abstract: Streaming videos is one of the methods for creators to share their creative works with their audience. In these videos, streamers share how they achieve their final objective by using various tools in one or several programs for creative projects. To this end, the steps required to achieve the final goal can be discussed. As such, these videos could provide substantial educational content that can be used to learn how to employ the tools used by the streamer. However, one of the drawbacks is that the streamer might not provide enough details for every step. Therefore, it might be difficult for learners to catch up with all the steps. In order to alleviate this issue, one solution is to link the streaming videos with the relevant tutorials available for the tools used in the streaming video. More specifically, a system can analyze the content of the live streaming video and recommend the most relevant tutorials. Since the existing document recommendation models cannot handle this situation, in this work, we present a novel dataset and model for the task of tutorial recommendation for live-streamed videos. We conduct extensive analyses on the proposed dataset and models, revealing the challenging nature of this task.
Published: 2022

50. Improving Keyphrase Extraction with Data Augmentation and Information Filtering

Author: Veyseh, Amir Pouran Ben, Meister, Nicole, Dernoncourt, Franck, and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language
Abstract: Keyphrase extraction is one of the essential tasks for document understanding in NLP. While the majority of prior works are dedicated to formal settings, e.g., books, news or web-blogs, informal texts such as video transcripts are less explored. To address this limitation, in this work we present a novel corpus and method for keyphrase extraction from the transcripts of videos streamed on the Behance platform. More specifically, a novel data augmentation is proposed to enrich the model with background knowledge about the keyphrase extraction task from other domains. Extensive experiments on the proposed dataset show the effectiveness of the introduced method.
Published: 2022
