Author: "Hwang, Sung" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hwang, Sung"' showing total 9,766 results

1. Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models

Author: Kang, Minki, Hwang, Sung Ju, Lee, Gibbeum, and Cho, Jaewoong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: As Large Language Models (LLMs) are increasingly deployed in specialized domains with continuously evolving knowledge, the need for timely and precise knowledge injection has become essential. Fine-tuning with paraphrased data is a common approach to enhance knowledge injection, yet it faces two significant challenges: high computational costs due to repetitive external model usage and limited sample diversity. To this end, we introduce LaPael, a latent-level paraphrasing method that applies input-dependent noise to early LLM layers. This approach enables diverse and semantically consistent augmentations directly within the model. Furthermore, it eliminates the recurring costs of paraphrase generation for each knowledge update. Our extensive experiments on question-answering benchmarks demonstrate that LaPael improves knowledge injection over standard fine-tuning and existing noise-based approaches. Additionally, combining LaPael with data-level paraphrasing further enhances performance.
Comment: NeurIPS 2024
Published: 2024

2. Rethinking Code Refinement: Learning to Judge Code Efficiency

Author: Seo, Minju, Baek, Jinheon, and Hwang, Sung Ju
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in understanding and generating codes. Due to these capabilities, many recent methods are proposed to automatically refine the codes with LLMs. However, we should rethink that the refined codes (from LLMs and even humans) are not always more efficient than their original versions. On the other hand, running two different versions of codes and comparing them every time is not ideal and time-consuming. Therefore, in this work, we propose a novel method based on the code language model that is trained to judge the efficiency between two different codes (generated across humans and machines) by either classifying the superior one or predicting the relative improvement. We validate our method on multiple programming languages with multiple refinement steps, demonstrating that the proposed method can effectively distinguish between more and less efficient versions of code.
Published: 2024

3. AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Author: Trirat, Patara, Jeong, Wonyong, and Hwang, Sung Ju
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Multiagent Systems
Abstract: Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline, such as optimal model search and hyperparameter tuning. Existing AutoML systems often require technical expertise to set up complex tools, which is in general time-consuming and requires a large amount of human effort. Therefore, recent works have started exploiting large language models (LLMs) to lessen such burden and increase the usability of AutoML frameworks via a natural language interface, allowing non-expert users to build their data-driven solutions. These methods, however, are usually designed only for a particular process in the AI development pipeline and do not efficiently use the inherent capacity of the LLMs. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML, i.e., from data retrieval to model deployment. AutoML-Agent takes user's task descriptions, facilitates collaboration between specialized LLM agents, and delivers deployment-ready models. Unlike existing work, instead of devising a single plan, we introduce a retrieval-augmented planning strategy to enhance exploration to search for more optimal plans. We also decompose each plan into sub-tasks (e.g., data preprocessing and neural network design) each of which is solved by a specialized agent we build via prompting executing in parallel, making the search process more efficient. Moreover, we propose a multi-stage verification to verify executed results and guide the code generation LLM in implementing successful solutions. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process, yielding systems with good performance throughout the diverse domains.
Comment: 47 pages, 5 figures
Published: 2024

4. Unified Multi-Modal Interleaved Document Representation for Information Retrieval

Author: Lee, Jaewoo, Ko, Joonho, Baek, Jinheon, Jeong, Soyeong, and Hwang, Sung Ju
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: Information Retrieval (IR) methods aim to identify relevant documents in response to a given query, which have gained remarkable attention due to their successful application in various natural language tasks. However, existing approaches typically consider only the textual information within the documents, which overlooks the fact that documents can contain multiple modalities, including texts, images, and tables. Further, they often segment each long document into multiple discrete passages for embedding, preventing them from capturing the overall document context and interactions between paragraphs. We argue that these two limitations lead to suboptimal document representations for retrieval. In this work, to address them, we aim to produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities. Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation. Moreover, to mitigate the information loss from segmenting documents into passages, instead of representing and retrieving passages individually, we further merge the representations of segmented passages into one single document representation, while we additionally introduce a reranking strategy to decouple and identify the relevant passage within the document if necessary. Then, through extensive experiments on diverse information retrieval scenarios considering both the textual and multimodal queries, we show that our approach substantially outperforms relevant baselines, thanks to the consideration of the multimodal information interleaved within the documents in a unified way.
Comment: Preprint
Published: 2024

5. HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Author: Lee, Seanie, Seong, Haebin, Lee, Dong Bok, Kang, Minki, Chen, Xiaoyin, Wagner, Dominik, Bengio, Yoshua, Lee, Juho, and Hwang, Sung Ju
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Safety guard models that detect malicious queries aimed at large language models (LLMs) are essential for ensuring the secure and responsible deployment of LLMs in real-world applications. However, deploying existing safety guard models with billions of parameters alongside LLMs on mobile devices is impractical due to substantial memory requirements and latency. To reduce this cost, we distill a large teacher safety guard model into a smaller one using a labeled dataset of instruction-response pairs with binary harmfulness labels. Due to the limited diversity of harmful instructions in the existing labeled dataset, naively distilled models tend to underperform compared to larger models. To bridge the gap between small and large models, we propose HarmAug, a simple yet effective data augmentation method that involves jailbreaking an LLM and prompting it to generate harmful instructions. Given a prompt such as, "Make a single harmful instruction prompt that would elicit offensive content", we add an affirmative prefix (e.g., "I have an idea for a prompt:") to the LLM's response. This encourages the LLM to continue generating the rest of the response, leading to sampling harmful instructions. Another LLM generates a response to the harmful instruction, and the teacher model labels the instruction-response pair. We empirically show that our HarmAug outperforms other relevant baselines. Moreover, a 435-million-parameter safety guard model trained with HarmAug achieves an F1 score comparable to larger models with over 7 billion parameters, and even outperforms them in AUPRC, while operating at less than 25% of their computational cost.
Published: 2024

6. Randomized, double-blind, phase 1a single-ascending dose and food effect studies assessing safety and pharmacokinetics of EC5026 in healthy volunteers.

Author: Schmidt, William, Cortés-Puch, Irene, McReynolds, Cindy, Croston, Glenn, Hwang, Sung, Yang, Jun, Pedersen, Theresa, Wagner, Karen, Pham, Theresa, Hunt, Thomas, and Hammock, Bruce
Subjects: Humans, Adult, Male, Healthy Volunteers, Double-Blind Method, Female, Food-Drug Interactions, Young Adult, Middle Aged, Dose-Response Relationship, Drug, Epoxide Hydrolases, Fasting, Adolescent, Administration, Oral, Half-Life
Abstract: Chronic pain represents a significant unmet medical need, affecting one-fifth of the U.S. population. EC5026 is a small molecule inhibitor of the enzyme soluble epoxide hydrolase (sEH) which is being developed as a novel non-opioid, non-NSAID analgesic. EC5026 prolongs the action of epoxy fatty acids, endogenous analgesic lipid mediators that are rapidly metabolized by sEH. We evaluated the safety and pharmacokinetic profile of EC5026 in two phase I trials, a single-ascending dose (SAD) study and a fed-fasted study. The SAD study evaluated EC5026 doses ranging from 0.5 to 24 mg in healthy volunteers. EC5026 was well tolerated. No treatment-emergent adverse events were considered related to EC5026. No apparent treatment- or dose-related trends in laboratory results, vital signs, physical examinations, or electrocardiograms were observed. A linear, near-dose-proportional increase in exposure was observed with progressive doses in the SAD study; plasma exposure was below or near the lower limit of quantification after 0.5-2 mg doses. Mean half-lives ranged from 41.8 to 59.1 h for doses of 8-24 mg, supporting a once-daily dosing regimen. In the fed-fasted study using 8 mg EC5026 tablets, higher peak concentrations (66%) and total exposures (53%) were observed under the fed condition. Plasma concentrations declined in a monoexponential manner with mean half-lives of 59.5 h in the fed state and 66.9 h in the fasted state. Future clinical trials using EC5026 for the treatment of pain are justified based on the favorable outcomes from both clinical trials along with preclinical evidence of analgesic activity.
Published: 2024

7. Optimizing Query Generation for Enhanced Document Retrieval in RAG

Author: Koo, Hamin, Kim, Minseon, and Hwang, Sung Ju
Subjects: Computer Science - Information Retrieval
Abstract: Large Language Models (LLMs) excel in various language tasks but they often generate incorrect information, a phenomenon known as "hallucinations". Retrieval-Augmented Generation (RAG) aims to mitigate this by using document retrieval for accurate responses. However, RAG still faces hallucinations due to vague queries. This study aims to improve RAG by optimizing query generation with a query-document alignment score, refining queries using LLMs for better precision and efficiency of document retrieval. Experiments have shown that our approach improves document retrieval, resulting in an average accuracy gain of 1.6%.
Published: 2024

8. One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Author: Wang, Ruochen, An, Sohyun, Cheng, Minhao, Zhou, Tianyi, Hwang, Sung Ju, and Hsieh, Cho-Jui
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning, 68T01
Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction. Such simplification significantly limits their capacity, as a single demo-free instruction might not be able to cover the entire complex problem space of the targeted task. To alleviate this issue, we adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions; each sub-region is governed by a specialized expert, equipped with both an instruction and a set of demos. A two-phase process is developed to construct the specialized expert for each region: (1) demo assignment: inspired by the theoretical connection between in-context learning and kernel regression, we group demos into experts based on their semantic similarity; (2) instruction assignment: a region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect. The resulting method, codenamed Mixture-of-Prompts (MoP), achieves an average win rate of 81% against prior arts across several major benchmarks.
Comment: ICML 2024. Code available at https://github.com/ruocwang/mixture-of-prompts
Published: 2024

9. Database-Augmented Query Representation for Information Retrieval

Author: Jeong, Soyeong, Baek, Jinheon, Cho, Sukmin, Hwang, Sung Ju, and Park, Jong C.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: Information retrieval models that aim to search for the documents relevant to the given query have shown many successes, which have been applied to diverse tasks. However, the query provided by the user is oftentimes very short, which challenges the retrievers to correctly fetch relevant documents. To tackle this, existing studies have proposed expanding the query with a couple of additional (user-related) features related to the query. Yet, they may be suboptimal to effectively augment the query, though there is plenty of information available to augment it in a relational database. Motivated by this, we present a novel retrieval framework called Database-Augmented Query representation (DAQu), which augments the original query with various (query-related) metadata across multiple tables. In addition, as the number of features in the metadata can be very large and there is no order among them, we encode them with our graph-based set encoding strategy, which considers hierarchies of features in the database without order. We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database, demonstrating that ours significantly enhances overall retrieval performance, compared to existing query augmentation methods.
Published: 2024

10. Training-Free Exponential Context Extension via Cascading KV Cache

Author: Willette, Jeffrey, Lee, Heejun, Lee, Youngwan, Jeon, Myeongjae, and Hwang, Sung Ju
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The transformer's context window is vital for tasks such as few-shot learning and conditional generation as it preserves previous tokens for active memory. However, as the context lengths increase, the computational costs grow quadratically, hindering the deployment of large language models (LLMs) in real-world, long sequence scenarios. Although some recent key-value caching (KV Cache) methods offer linear inference complexity, they naively manage the stored context, prematurely evicting tokens and losing valuable information. Moreover, they lack an optimized prefill/prompt stage strategy, resulting in higher latency than even quadratic attention for realistic context sizes. In response, we introduce a novel mechanism that leverages cascading sub-cache buffers to selectively retain the most relevant tokens, enabling the model to maintain longer context histories without increasing the cache size. Our approach outperforms linear caching baselines across key benchmarks, including streaming perplexity, question answering, book summarization, and passkey retrieval, where it retains better retrieval accuracy at 1M tokens after four doublings of the cache size of 65K. Additionally, our method reduces prefill stage latency by a factor of 6.8 when compared to flash attention on 1M tokens. These innovations not only enhance the computational efficiency of LLMs but also pave the way for their effective deployment in resource-constrained environments, enabling large-scale, real-time applications with significantly reduced latency.
Published: 2024

11. Concept-skill Transferability-based Data Selection for Large Vision-Language Models

Author: Lee, Jaewoo, Li, Boyang, and Hwang, Sung Ju
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Instruction tuning, or supervised finetuning on extensive task-specific data, is necessary for Large Vision-Language Models (LVLMs) to generalize well across a broad range of vision-language (VL) tasks. However, training on large VL datasets can become prohibitively expensive. In this work, we introduce COINCIDE, an effective and scalable data selection technique that uses a small model as a reference model to select visual instruction tuning data for efficient finetuning of a target LVLM, focusing on diversity and transferability. Specifically, we cluster the training data using internal activations from a small model, which identifies VL concept-skill compositions needed by a target LVLM. We then sample data from these diverse clusters by considering their density and transferability, or the ability to transfer well to other concept-skill compositions. This approach ensures the diversity of these compositions, which is vital for LVLM generalization. Extensive experiments demonstrate that COINCIDE achieves superior performance and data selection efficiency against 8 strong baselines on two distinct datasets: LLaVA-1.5 and Vision-Flan. Using only 20% of the LLaVA-1.5 dataset, COINCIDE achieves performance comparable to the LVLM finetuned on the whole dataset, with 70% reduction of the wall-clock running time. On the Vision-Flan dataset, our method achieves superior results with only 16.7% of the training data.
Comment: EMNLP 2024
Published: 2024

12. A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention

Author: Lee, Heejun, Park, Geon, Lee, Youngwan, Suh, Jaduk, Kim, Jina, Jeong, Wonyoung, Kim, Bumsik, Lee, Hyemin, Jeon, Myeongjae, and Hwang, Sung Ju
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: In modern large language models (LLMs), increasing the context length is crucial for improving comprehension and coherence in long-context, multi-modal, and retrieval-augmented language generation. While many recent transformer models attempt to extend their context length over a million tokens, they remain impractical due to the quadratic time and space complexities. Although recent works on linear and sparse attention mechanisms can achieve this goal, their real-world applicability is often limited by the need to re-train from scratch and significantly worse performance. In response, we propose a novel approach, Hierarchically Pruned Attention (HiP), which reduces the time complexity of the attention mechanism to $O(T \log T)$ and the space complexity to $O(T)$, where $T$ is the sequence length. We notice a pattern in the attention scores of pretrained LLMs where tokens close together tend to have similar scores, which we call "attention locality". Based on this observation, we utilize a novel tree-search-like algorithm that estimates the top-$k$ key tokens for a given query on the fly, which is mathematically guaranteed to have better performance than random attention pruning. In addition to improving the time complexity of the attention mechanism, we further optimize GPU memory usage by implementing KV cache offloading, which stores only $O(\log T)$ tokens on the GPU while maintaining similar decoding throughput. Experiments on benchmarks show that HiP, with its training-free nature, significantly reduces both prefill and decoding latencies, as well as memory usage, while maintaining high-quality generation with minimal degradation. HiP enables pretrained LLMs to scale up to millions of tokens on commodity GPUs, potentially unlocking long-context LLM applications previously deemed infeasible.
Comment: 44 pages
Published: 2024

13. Visualizing the loss landscape of Self-supervised Vision Transformer

Author: Lee, Youngwan, Willette, Jeffrey Ryan, Kim, Jonghee, and Hwang, Sung Ju
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: The Masked autoencoder (MAE) has drawn attention as a representative self-supervised approach for masked image modeling with vision transformers. However, even though MAE shows better generalization capability than fully supervised training from scratch, the reason why has not been explored. In another line of work, the Reconstruction Consistent Masked Auto Encoder (RC-MAE), has been proposed which adopts a self-distillation scheme in the form of an exponential moving average (EMA) teacher into MAE, and it has been shown that the EMA-teacher performs a conditional gradient correction during optimization. To further investigate the reason for better generalization of the self-supervised ViT when trained by MAE (MAE-ViT) and the effect of the gradient correction of RC-MAE from the perspective of optimization, we visualize the loss landscapes of the self-supervised vision transformer by both MAE and RC-MAE and compare them with the supervised ViT (Sup-ViT). Unlike previous loss landscape visualizations of neural networks based on classification task loss, we visualize the loss landscape of ViT by computing pre-training task loss. Through the lens of loss landscapes, we find two interesting observations: (1) MAE-ViT has a smoother and wider overall loss curvature than Sup-ViT. (2) The EMA-teacher allows MAE to widen the region of convexity in both pretraining and linear probing, leading to quicker convergence. To the best of our knowledge, this work is the first to investigate the self-supervised ViT through the lens of the loss landscape.
Comment: NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice
Published: 2024

14. Cost-Sensitive Multi-Fidelity Bayesian Optimization with Transfer of Learning Curve Extrapolation

Author: Lee, Dong Bok, Zhang, Aoxuan Silvia, Kim, Byungjoo, Park, Junhyeon, Lee, Juho, Hwang, Sung Ju, and Lee, Hae Beom
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In this paper, we address the problem of cost-sensitive multi-fidelity Bayesian Optimization (BO) for efficient hyperparameter optimization (HPO). Specifically, we assume a scenario where users want to early-stop the BO when the performance improvement is not satisfactory with respect to the required computational cost. Motivated by this scenario, we introduce utility, which is a function predefined by each user and describes the trade-off between cost and performance of BO. This utility function, combined with our novel acquisition function and stopping criterion, allows us to dynamically choose for each BO step the best configuration that we expect to maximally improve the utility in future, and also automatically stop the BO around the maximum utility. Further, we improve the sample efficiency of existing learning curve (LC) extrapolation methods with transfer learning, while successfully capturing the correlations between different configurations to develop a sensible surrogate function for multi-fidelity BO. We validate our algorithm on various LC datasets and find that it outperforms all the previous multi-fidelity BO and transfer-BO baselines we consider, achieving a significantly better trade-off between cost and performance of BO.
Published: 2024

15. Learning diverse attacks on large language models for robust red-teaming and safety tuning

Author: Lee, Seanie, Kim, Minsu, Cherif, Lynn, Dobre, David, Lee, Juho, Hwang, Sung Ju, Kawaguchi, Kenji, Gidel, Gauthier, Bengio, Yoshua, Malkin, Nikolay, and Jain, Moksh
Subjects: Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs). Developing effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that elicit undesirable responses from a target LLM, as measured, for example, by an auxiliary toxicity classifier. We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks. As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts. We find that the attacks generated by our method are effective against a wide range of target LLMs, both with and without safety tuning, and transfer well between target LLMs. Finally, we demonstrate that models safety-tuned using a dataset of red-teaming prompts generated by our method are robust to attacks from other RL-based red-teaming approaches.
Published: 2024

16. Automatic Jailbreaking of the Text-to-Image Generative AI Systems

Author: Kim, Minseon, Lee, Hyomin, Gong, Boqing, Zhang, Huishuai, and Hwang, Sung Ju
Subjects: Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious contents by circumventing the alignment in LLMs, which are often referred to as jailbreaking. However, most of the previous works only focused on the text-based jailbreaking in LLMs, and the jailbreaking of the text-to-image (T2I) generation system has been relatively overlooked. In this paper, we first evaluate the safety of the commercial T2I generation systems, such as ChatGPT, Copilot, and Gemini, on copyright infringement with naive prompts. From this empirical study, we find that Copilot and Gemini block only 12% and 17% of the attacks with naive prompts, respectively, while ChatGPT blocks 84% of them. Then, we further propose a stronger automated jailbreaking pipeline for T2I generation systems, which produces prompts that bypass their safety guards. Our automated jailbreaking framework leverages an LLM optimizer to generate prompts to maximize degree of violation from the generated images without any weight updates or gradient computation. Surprisingly, our simple yet effective approach successfully jailbreaks ChatGPT with an 11.0% block rate, making it generate copyrighted contents 76% of the time. Finally, we explore various defense strategies, such as post-generation filtering and machine unlearning techniques, but found that they were inadequate, which suggests the necessity of stronger defense mechanisms.
Comment: Under review
Published: 2024

17. LG AI Research & KAIST at EHRSQL 2024: Self-Training Large Language Models with Pseudo-Labeled Unanswerable Questions for a Reliable Text-to-SQL System on EHRs

Author: Jo, Yongrae, Lee, Seongyun, Seo, Minju, Hwang, Sung Ju, and Lee, Moontae
Subjects: Computer Science - Computation and Language
Abstract: Text-to-SQL models are pivotal for making Electronic Health Records (EHRs) accessible to healthcare professionals without SQL knowledge. With the advancements in large language models, these systems have become more adept at translating complex questions into SQL queries. Nonetheless, the critical need for reliability in healthcare necessitates these models to accurately identify unanswerable questions or uncertain predictions, preventing misinformation. To address this problem, we present a self-training strategy using pseudo-labeled unanswerable questions to enhance the reliability of text-to-SQL models for EHRs. This approach includes a two-stage training process followed by a filtering method based on the token entropy and query execution. Our methodology's effectiveness is validated by our top performance in the EHRSQL 2024 shared task, showcasing the potential to improve healthcare decision-making through more reliable text-to-SQL systems.
Comment: NAACL 2024 Clinical NLP Workshop
Published: 2024

18. ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models

Author: Baek, Jinheon, Jauhar, Sujay Kumar, Cucerzan, Silviu, and Hwang, Sung Ju
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Scientific Research, vital for improving human life, is hindered by its inherent complexity, slow pace, and the need for specialized experts. To enhance its productivity, we propose a ResearchAgent, a large language model-powered research idea writing agent, which automatically generates problems, methods, and experiment designs while iteratively refining them based on scientific literature. Specifically, starting with a core paper as the primary focus to generate ideas, our ResearchAgent is augmented not only with relevant publications through connecting information over an academic graph but also entities retrieved from an entity-centric knowledge store based on their underlying concepts, mined and shared across numerous papers. In addition, mirroring the human approach to iteratively improving ideas with peer discussions, we leverage multiple ReviewingAgents that provide reviews and feedback iteratively. Further, they are instantiated with human preference-aligned large language models whose criteria for evaluation are derived from actual human judgments. We experimentally validate our ResearchAgent on scientific publications across multiple disciplines, showcasing its effectiveness in generating novel, clear, and valid research ideas based on human and model-based evaluation results.
Published: 2024

19. Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models

Author: Jang, Sangwon, Jo, Jaehyeong, Lee, Kimin, and Hwang, Sung Ju
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Text-to-image diffusion models have shown remarkable success in generating personalized subjects based on a few reference images. However, current methods often fail when generating multiple subjects simultaneously, resulting in mixed identities with combined attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling identities from multiple subjects. Our main idea is to utilize segmented subjects generated by a foundation model for segmentation (Segment Anything) for both training and inference, as a form of data augmentation for training and initialization for the generation process. Moreover, we further introduce a new metric to better evaluate the performance of our method on multi-subject personalization. Experimental results show that our MuDI can produce high-quality personalized images without identity mixing, even for highly similar subjects as shown in Figure 1. Specifically, in human evaluation, MuDI obtains twice the success rate for personalizing multiple subjects without identity mixing over existing baselines and is preferred over 70% against the strongest baseline.
Comment: NeurIPS 2024. Project page: https://mudi-t2i.github.io/
Published: 2024

20. Rethinking Saliency-Guided Weakly-Supervised Semantic Segmentation

Author: Kim, Beomyoung, Kim, Donghyun, and Hwang, Sung Ju
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper presents a fresh perspective on the role of saliency maps in weakly-supervised semantic segmentation (WSSS) and offers new insights and research directions based on our empirical findings. We conduct comprehensive experiments and observe that the quality of the saliency map is a critical factor in saliency-guided WSSS approaches. Nonetheless, we find that the saliency maps used in previous works are often arbitrarily chosen, despite their significant impact on WSSS. Additionally, we observe that the choice of the threshold, which has received less attention before, is non-trivial in WSSS. To facilitate more meaningful and rigorous research for saliency-guided WSSS, we introduce WSSS-BED, a standardized framework for conducting research under unified conditions. WSSS-BED provides various saliency maps and activation maps for seven WSSS methods, as well as saliency maps from unsupervised salient object detection models., Comment: Preprint, 17 pages, 7 figures
Published: 2024

21. Towards Label-Efficient Human Matting: A Simple Baseline for Weakly Semi-Supervised Trimap-Free Human Matting

Author: Kim, Beomyoung, Yi, Myeong Yeon, Yu, Joonsang, Yoo, Young Joon, and Hwang, Sung Ju
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper presents a new practical training method for human matting, which demands delicate pixel-level human region identification and significantly laborious annotations. To reduce the annotation cost, most existing matting approaches often rely on image synthesis to augment the dataset. However, the unnaturalness of synthesized training images brings in a new domain generalization challenge for natural images. To address this challenge, we introduce a new learning paradigm, weakly semi-supervised human matting (WSSHM), which leverages a small amount of expensive matte labels and a large amount of budget-friendly segmentation labels, to save the annotation cost and resolve the domain generalization problem. To achieve the goal of WSSHM, we propose a simple and effective training method, named Matte Label Blending (MLB), that selectively guides only the beneficial knowledge of the segmentation and matte data to the matting model. Extensive experiments with our detailed analysis demonstrate our method can substantially improve the robustness of the matting model using a few matte data and numerous segmentation data. Our training method is also easily applicable to real-time models, achieving competitive accuracy with breakneck inference speed (328 FPS on NVIDIA V100 GPU). The implementation code is available at https://github.com/clovaai/WSSHM., Comment: Preprint, 15 pages, 13 figures
Published: 2024

22. ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning

Author: Kim, Beomyoung, Yu, Joonsang, and Hwang, Sung Ju
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Panoptic segmentation, combining semantic and instance segmentation, stands as a cutting-edge computer vision task. Despite recent progress with deep learning models, the dynamic nature of real-world applications necessitates continual learning, where models adapt to new classes (plasticity) over time without forgetting old ones (catastrophic forgetting). Current continual segmentation methods often rely on distillation strategies like knowledge distillation and pseudo-labeling, which are effective but result in increased training complexity and computational overhead. In this paper, we introduce a novel and efficient method for continual panoptic segmentation based on Visual Prompt Tuning, dubbed ECLIPSE. Our approach involves freezing the base model parameters and fine-tuning only a small set of prompt embeddings, addressing both catastrophic forgetting and plasticity and significantly reducing the trainable parameters. To mitigate inherent challenges such as error propagation and semantic drift in continual segmentation, we propose logit manipulation to effectively leverage common knowledge across the classes. Experiments on ADE20K continual panoptic segmentation benchmark demonstrate the superiority of ECLIPSE, notably its robustness against catastrophic forgetting and its reasonable plasticity, achieving a new state-of-the-art. The code is available at https://github.com/clovaai/ECLIPSE., Comment: CVPR 2024
Published: 2024

23. Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

Author: Jeong, Soyeong, Baek, Jinheon, Cho, Sukmin, Hwang, Sung Ju, and Park, Jong C.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Retrieval-Augmented Large Language Models (LLMs), which incorporate the non-parametric knowledge from external knowledge bases into LLMs, have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA). However, even though there are various approaches dealing with queries of different complexities, they either handle simple queries with unnecessary computational overhead or fail to adequately address complex multi-step queries; yet, not all user requests fall into only one of the simple or complex categories. In this work, we propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs from the simplest to the most sophisticated ones based on the query complexity. Also, this selection process is operationalized with a classifier, which is a smaller LM trained to predict the complexity level of incoming queries with automatically collected labels, obtained from actual predicted outcomes of models and inherent inductive biases in datasets. This approach offers a balanced strategy, seamlessly adapting between the iterative and single-step retrieval-augmented LLMs, as well as the no-retrieval methods, in response to a range of query complexities. We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems, compared to relevant baselines including the adaptive retrieval approaches. Code is available at: https://github.com/starsuzi/Adaptive-RAG., Comment: NAACL 2024
Published: 2024

24. Use of Endoscopic Images in the Prediction of Submucosal Invasion of Gastric Neoplasms: Automated Deep Learning Model Development and Usability Study

Author: Bang, Chang Seok, Lim, Hyun, Jeong, Hae Min, and Hwang, Sung Hyeon
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Public aspects of medicine, RA1-1270
Abstract: Background: In a previous study, we examined the use of deep learning models to classify the invasion depth (mucosa-confined versus submucosa-invaded) of gastric neoplasms using endoscopic images. The external test accuracy reached 77.3%. However, model establishment is labor intensive, requiring high performance. Automated deep learning (AutoDL) models, which enable fast searching of optimal neural architectures and hyperparameters without complex coding, have been developed. Objective: The objective of this study was to establish AutoDL models to classify the invasion depth of gastric neoplasms. Additionally, endoscopist–artificial intelligence interactions were explored. Methods: The same 2899 endoscopic images that were employed to establish the previous model were used. A prospective multicenter validation using 206 and 1597 novel images was conducted. The primary outcome was external test accuracy. Neuro-T, Create ML Image Classifier, and AutoML Vision were used in establishing the models. Three doctors with different levels of endoscopy expertise were asked to classify the invasion depth of gastric neoplasms for each image without AutoDL support, with faulty AutoDL support, and with best performance AutoDL support in sequence. Results: The Neuro-T–based model reached 89.3% (95% CI 85.1%-93.5%) external test accuracy. For the model establishment time, Create ML Image Classifier showed the fastest time of 13 minutes while reaching 82.0% (95% CI 76.8%-87.2%) external test accuracy. While the expert endoscopist's decisions were not influenced by AutoDL, the faulty AutoDL misled the endoscopy trainee and the general physician. However, this was corrected by the support of the best performance AutoDL model. The trainee gained the most benefit from the AutoDL support. Conclusions: AutoDL is deemed useful for the on-site establishment of customized deep learning models. An inexperienced endoscopist with at least a certain level of expertise can benefit from AutoDL support.
Published: 2021

25. Twelve-Lead Electrocardiogram Acquisition With a Patchy-Type Wireless Device in Ambulance Transport: Simulation-Based Randomized Controlled Trial

Author: Yoon, Sunyoung, Kim, Taerim, Roh, Taehwan, Chang, Hansol, Hwang, Sung Yeon, Yoon, Hee, Shin, Tae Gun, Sim, Min Seob, Jo, Ik Joon, and Cha, Won Chul
Subjects: Information technology, T58.5-58.64, Public aspects of medicine, RA1-1270
Abstract: Background: Cardiovascular disease is the leading cause of death worldwide. Early recognition, diagnosis, and reperfusion are the key elements of treatment for ST-segment elevation myocardial infarction. The absence of a prehospital 12-lead electrocardiogram (P12ECG) can cause definitive treatment delay and repeated transfer. Although guidelines highly recommend the measurement and transmission of P12ECG data, P12ECG use has not been widely established. Objective: The aim of this study was to verify the time-efficiency and feasibility of the use of a patchy-type 12-lead ECG measuring and transmitting device (P-ECG) by an emergency medical technician (EMT) in an ambulance during patient transport. Methods: This was a simulation-based prospective randomized crossover-controlled study that included EMTs. The participants were randomly assigned to one of two groups. Group A began the experiment with a conventional 12-lead ECG (C-ECG) device and then switched to the intervention device (P-ECG), whereas group B began the experiment with the P-ECG and then switched to the C-ECG. All simulations were performed inside an ambulance driving at 30 km/h. The time interval was measured from the beginning of ECG application to completion of sending the results. After the simulation, participants were administered the System Usability Scale questionnaire about usability of the P-ECG. Results: A total of 18 EMTs were recruited for this study with a median age of 35 years. The overall interval time for the C-ECG was 254 seconds (IQR 247-270), whereas the overall interval time for the P-ECG was 130 seconds (IQR 112-150), with a significant difference (P
Published: 2021

26. GLP-1 and its derived peptides mediate pain relief through direct TRPV1 inhibition without affecting thermoregulation

Author: Go, Eun Jin, Hwang, Sung-Min, Jo, Hyunjung, Rahman, Md. Mahbubur, Park, Jaeik, Lee, Ji Yeon, Jo, Youn Yi, Lee, Byung-Gil, Jung, YunJae, Berta, Temugin, Kim, Yong Ho, and Park, Chul-Kyu
Published: 2024

27. 3D-printed chitosan-pectin-sodium alginate scaffolds for post-surgical peritoneal wound dressing and sustained delivery of oxaliplatin

Author: Dinh, Linh, Machamasi, Rukesh, Kim, Chae Jeong, Lee, Jong-Ju, Choi, Yeonju, Kim, Haneul, Mahon, Lanesa, Grabenbauer, Ayla, Yan, Bingfang, and Hwang, Sung-Joo
Published: 2024

28. Transgelin 2 guards T cell lipid metabolism and antitumour function

Author: Hwang, Sung-Min, Awasthi, Deepika, Jeong, Jieun, Sandoval, Tito A., Chae, Chang-Suk, Ramos, Yusibeska, Tan, Chen, Marin Falco, Matías, Salvagno, Camilla, Emmanuelli, Alexander, McBain, Ian T., Mishra, Bikash, Ivashkiv, Lionel B., Zamarin, Dmitriy, Cantillo, Evelyn, Chapman-Davis, Eloise, Holcomb, Kevin, Morales, Diana K., Yu, Xiaoqing, Rodriguez, Paulo C., Conejo-Garcia, Jose R., Kaczocha, Martin, Vähärautio, Anna, Song, Minkyung, and Cubillos-Ruiz, Juan R.
Published: 2024

29. Intramuscular neural distribution of the vastus medialis for botulinum neurotoxin injection: application to spasticity

Author: Yi, Kyu-Ho, Hu, Hyewon, Hwang, Sung-Oh, Ahn, Haeryun, Lee, Ji-Hyun, and Lee, Hyung-Jin
Published: 2024

30. Neutral-Connect Control in a Two-Speed Transmission Based on Demand Torque Prediction Using a Time Series Deep Learning Model

Author: Ahn, Jihyeok, Gwak, Seoku, Jeong, Seyoung, Kim, Kyung-Ho, and Hwang, Sung-Ho
Published: 2024

31. Preparation and characterization of intramuscularly long-acting celecoxib nanosuspensions for postoperative pain management

Author: Dinh, Linh, Choi, Junhuyk, Machamasi, Rukesh, Lee, Jong-Ju, Kim, Minkyu, and Hwang, Sung-Joo
Published: 2024

32. Characteristics of phthalate concentrations in propellant- and trigger-type consumer spray products

Author: Hwang, Sung Ho, Oh, Gi Taek, Park, Jeung Yeon, Lee, Kiyoung, Zho, Kyung-Duk, and Yoon, Chungsik
Published: 2024

33. Diffusion-Based Neural Network Weights Generation

Author: Soro, Bedionita, Andreis, Bruno, Lee, Hayeon, Jeong, Wonyong, Chong, Song, Hutter, Frank, and Hwang, Sung Ju
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Transfer learning has gained significant attention in recent deep learning research due to its ability to accelerate convergence and enhance performance on new tasks. However, its success is often contingent on the similarity between source and target data, and training on numerous datasets can be costly, leading to blind selection of pretrained models with limited insight into their effectiveness. To address these challenges, we introduce D2NWG, a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning, conditioned on the target dataset. Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation, learning the weight distributions of models pretrained on various datasets. This allows for automatic generation of weights that generalize well across both seen and unseen tasks, outperforming state-of-the-art meta-learning methods and pretrained models. Moreover, our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques that rely on task-specific model collections or access to original training data. By modeling the parameter distribution of LLMs, D2NWG enables task-specific parameter generation without requiring additional fine-tuning or large collections of model variants. Extensive experiments show that our method consistently enhances the performance of diverse base models, regardless of their size or complexity, positioning it as a robust solution for scalable transfer learning., Comment: 32 pages
Published: 2024

34. Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks

Author: Seo, Minju, Baek, Jinheon, Thorne, James, and Hwang, Sung Ju
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Despite large successes of recent language models on diverse tasks, they suffer from severe performance degeneration in low-resource settings with limited training data available. Many existing works tackle this problem by generating synthetic data from the training data and then training models on them, recently using Large Language Models (LLMs). However, in low-resource settings, the amount of seed data samples to use for data augmentation is very small, which makes generated samples suboptimal and less diverse. To tackle this challenge, we propose a novel method that augments training data by incorporating a wealth of examples from other datasets, along with the given training data. Specifically, we first retrieve the relevant instances from other datasets, such as their input-output pairs or contexts, based on their similarities with the given seed data, and then prompt LLMs to generate new samples with the contextual information within and across the original and retrieved samples. This approach can ensure that the generated data is not only relevant but also more diverse than what could be achieved using the limited seed data alone. We validate our proposed Retrieval-Augmented Data Augmentation (RADA) framework on multiple datasets under low-resource settings of training and test-time data augmentation scenarios, on which it outperforms existing LLM-powered data augmentation baselines.
Published: 2024

35. BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation

Author: Lee, Daeun, Yoon, Jaehong, and Hwang, Sung Ju
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Continual Test Time Adaptation (CTTA) is required to adapt efficiently to continuous unseen domains while retaining previously learned knowledge. However, despite the progress of CTTA, it is still challenging to deploy the model with improved forgetting-adaptation trade-offs and efficiency. In addition, current CTTA scenarios assume only the disjoint situation, even though real-world domains are seamlessly changed. To address these challenges, this paper proposes BECoTTA, an input-dependent and efficient modular framework for CTTA. We propose Mixture-of-Domain Low-rank Experts (MoDE) that contains two core components: (i) Domain-Adaptive Routing, which helps to selectively capture the domain adaptive knowledge with multiple domain routers, and (ii) Domain-Expert Synergy Loss to maximize the dependency between each domain and expert. We validate that our method outperforms multiple CTTA scenarios, including disjoint and gradual domain shifts, while only requiring ~98% fewer trainable parameters. We also provide analyses of our method, including the construction of experts, the effect of domain-adaptive experts, and visualizations., Comment: Accepted by ICML2024, 22 pages, Project page: https://becotta-ctta.github.io/
Published: 2024

36. Continual Learning: Forget-free Winning Subnetworks for Video Representations

Author: Kang, Haeyong, Yoon, Jaehong, Hwang, Sung Ju, and Yoo, Chang D.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Inspired by the Lottery Ticket Hypothesis (LTH), which highlights the existence of efficient subnetworks within larger, dense networks, a high-performing Winning Subnetwork (WSN) in terms of task performance under appropriate sparsity conditions is considered for various continual learning tasks. It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incremental Learning (TIL) and Task-agnostic Incremental Learning (TaIL) scenarios. In Few-Shot Class Incremental Learning (FSCIL), a variation of WSN referred to as the Soft subnetwork (SoftNet) is designed to prevent overfitting when the data samples are scarce. Furthermore, the sparse reuse of WSN weights is considered for Video Incremental Learning (VIL). The use of Fourier Subneural Operator (FSO) within WSN is considered. It enables compact encoding of videos and identifies reusable subnetworks across varying bandwidths. We have integrated FSO into different architectural frameworks for continual learning, including VIL, TIL, and FSCIL. Our comprehensive experiments demonstrate FSO's effectiveness, significantly improving task performance at various convolutional representational levels. Specifically, FSO enhances higher-layer performance in TIL and FSCIL and lower-layer performance in VIL., Comment: arXiv admin note: substantial text overlap with arXiv:2303.14962, arXiv:2306.11305
Published: 2023

37. LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers

Author: Nam, Taewook, Lee, Juyong, Zhang, Jesse, Hwang, Sung Ju, Lim, Joseph J., and Pertsch, Karl
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded in a training environment from large language models. Then, a vision-language model guides the agent in learning the multi-task language-conditioned policy by providing reward feedback. We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment while prior unsupervised skill discovery methods struggle. Additionally, we discuss observed challenges of using off-the-shelf foundation models as teachers and our efforts to address them., Comment: 2nd Workshop on Agent Learning in Open-Endedness (ALOE) at NeurIPS 2023
Published: 2023

38. KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis

Author: Lee, Youngwan, Park, Kwanyong, Cho, Yoorhim, Lee, Yong-Ju, and Hwang, Sung Ju
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: As text-to-image (T2I) synthesis models increase in size, they demand higher inference costs due to the need for more expensive GPUs with larger memory, which makes it challenging to reproduce these models in addition to the restricted access to training datasets. Our study aims to reduce these inference costs and explores how far the generative capabilities of T2I models can be extended using only publicly available datasets and open-source models. To this end, by using the de facto standard text-to-image model, Stable Diffusion XL (SDXL), we present three key practices in building an efficient T2I model: (1) Knowledge distillation: we explore how to effectively distill the generation capability of SDXL into an efficient U-Net and find that self-attention is the most crucial part. (2) Data: despite fewer samples, high-resolution images with rich captions are more crucial than a larger number of low-resolution images with short captions. (3) Teacher: Step-distilled Teacher allows T2I models to reduce the noising steps. Based on these findings, we build two types of efficient text-to-image models, called KOALA-Turbo &-Lightning, with two compact U-Nets (1B & 700M), reducing the model size up to 54% and 69% of the SDXL U-Net. In particular, the KOALA-Lightning-700M is 4x faster than SDXL while still maintaining satisfactory generation quality. Moreover, unlike SDXL, our KOALA models can generate 1024px high-resolution images on consumer-grade GPUs with 8GB of VRAMs (3060Ti). We believe that our KOALA models will have a significant practical impact, serving as cost-effective alternatives to SDXL for academic researchers and general users in resource-constrained environments., Comment: Project page: https://youngwanlee.github.io/KOALA/
Published: 2023

39. Analysis of Retrofit SCR System for Small-Sized Ship Diesel Engines Using Numerical Methods

Author: Hwang, Sung-Chul and Nam, Hyungseok
Published: 2024

40. Lyotropic liquid crystalline nanoparticles for oral delivery: formulation and evaluation of sustained-released cromolyn sodium loaded cubosomes

Author: Dinh, Linh, Kim, Dong Min, Lee, Gawon, Yoon, Yangno, Han, Hyeji, Oh, Dong Joon, Lee, Juseung, and Hwang, Sung-Joo
Published: 2024

41. Influence of Modified Poly(Glycolic acid) on the Physical and Mechanical Properties of PLA/PBAT/mPGA Multi-phase Blends

Author: Choo, Ji Eun, Kim, Do Yeop, Park, Tae Hyeong, and Hwang, Sung Wook
Published: 2024

42. Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models

Author: Kim, Yujin, Yoon, Jaehong, Ye, Seonghyeon, Bae, Sangmin, Ho, Namgyu, Hwang, Sung Ju, and Yun, Se-young
Subjects: Computer Science - Computation and Language
Abstract: The dynamic nature of knowledge in an ever-changing world presents challenges for language models trained on static data; the model in the real world often requires not only acquiring new knowledge but also overwriting outdated information into updated ones. To study the ability of language models for these time-dependent dynamics in human language, we introduce a novel task, EvolvingQA, a temporally evolving question-answering benchmark designed for training and evaluating LMs on an evolving Wikipedia database. The construction of EvolvingQA is automated with our pipeline using large language models. We uncover that existing continual learning baselines suffer from updating and removing outdated knowledge. Our analysis suggests that models fail to rectify knowledge due to small weight gradients. In addition, we elucidate that language models particularly struggle to reflect the change of numerical or temporal information. Our work aims to model the dynamic nature of real-world information, suggesting faithful evaluations of the evolution-adaptability of language models., Comment: 15 pages, 10 figures, 5 tables; accepted to NAACL 2024
Published: 2023

43. Context-dependent Instruction Tuning for Dialogue Response Generation

Author: Kwak, Jin Myung, Kim, Minseon, and Hwang, Sung Ju
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent language models have achieved impressive performance in natural language tasks by incorporating instructions with task input during fine-tuning. Since all samples in the same natural language task can be explained with the same task instructions, many instruction datasets only provide a few instructions for the entire task, without considering the input of each example in the task. However, this approach becomes ineffective in complex multi-turn dialogue generation tasks, where the input varies highly with each turn as the dialogue context changes, so that simple task instructions cannot improve the generation performance. To address this limitation, we introduce a context-based instruction fine-tuning framework for each multi-turn dialogue which generates both responses and instructions based on the previous context as input. During the evaluation, the model generates instructions based on the previous context to self-guide the response. The proposed framework produces comparable or even outstanding results compared to the baselines by aligning instructions to the input during fine-tuning with the instructions in quantitative evaluations on dialogue benchmark datasets with reduced computation budget., Comment: Work in Progress
Published: 2023

44. Co-training and Co-distillation for Quality Improvement and Compression of Language Models

Author: Lee, Hayeon, Hou, Rui, Kim, Jongpil, Liang, Davis, Zhang, Hongbo, Hwang, Sung Ju, and Min, Alexander
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Knowledge Distillation (KD) compresses computationally expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models, allowing their use in resource-constrained or real-time settings. However, most smaller models fail to surpass the performance of the original larger model, resulting in sacrificing performance to improve inference speed. To address this issue, we propose Co-Training and Co-Distillation (CTCD), a novel framework that improves performance and inference speed together by co-training two models while mutually distilling knowledge. The CTCD framework successfully achieves this based on two significant findings: 1) Distilling knowledge from the smaller model to the larger model during co-training improves the performance of the larger model. 2) The enhanced performance of the larger model further boosts the performance of the smaller model. The CTCD framework shows promise as it can be combined with existing techniques like architecture design or data augmentation, replacing one-way KD methods, to achieve further performance improvement. Extensive ablation studies demonstrate the effectiveness of CTCD, and the small model distilled by CTCD outperforms the original larger model by a significant margin of 1.66 on the GLUE benchmark., Comment: Findings of EMNLP 2023
Published: 2023

45. Test-Time Self-Adaptive Small Language Models for Question Answering

Author: Jeong, Soyeong, Baek, Jinheon, Cho, Sukmin, Hwang, Sung Ju, and Park, Jong C.
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Recent instruction-finetuned large language models (LMs) have achieved notable performances in various tasks, such as question-answering (QA). However, despite their ability to memorize a vast amount of general knowledge across diverse tasks, they might be suboptimal on specific tasks due to their limited capacity to transfer and adapt knowledge to target tasks. Moreover, further finetuning LMs with labeled datasets is often infeasible due to their absence, but it is also questionable if we can transfer smaller LMs having limited knowledge only with unlabeled test data. In this work, we show and investigate the capabilities of smaller self-adaptive LMs, only with unlabeled test data. In particular, we first stochastically generate multiple answers, and then ensemble them while filtering out low-quality samples to mitigate noise from inaccurate labels. Our proposed self-adaption strategy demonstrates significant performance improvements on benchmark QA datasets with higher robustness across diverse prompts, enabling LMs to stay stable. Code is available at: https://github.com/starsuzi/T-SAS., Comment: EMNLP Findings 2023
Published: 2023

46. Knowledge-Augmented Language Model Verification

Author: Baek, Jinheon, Jeong, Soyeong, Kang, Minki, Park, Jong C., and Hwang, Sung Ju
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. Yet, LMs often generate the factually incorrect responses to the given queries, since their knowledge may be inaccurate, incomplete, and outdated. To address this problem, previous works propose to augment LMs with the knowledge retrieved from an external knowledge source. However, such approaches often show suboptimal text generation performance due to two reasons: 1) the model may fail to retrieve the knowledge relevant to the given query, or 2) the model may not faithfully reflect the retrieved knowledge in the generated text. To overcome these, we propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier, which is a small LM that is trained to detect those two types of errors through instruction-finetuning. Then, when the verifier recognizes an error, we can rectify it by either retrieving new knowledge or generating new text. Further, we use an ensemble of the outputs from different instructions with a single verifier to enhance the reliability of the verification processes. We validate the effectiveness of the proposed verification steps on multiple question answering benchmarks, whose results show that the proposed verifier effectively identifies retrieval and generation errors, allowing LMs to provide more factually correct outputs. Our code is available at https://github.com/JinheonBaek/KALMV., Comment: EMNLP 2023
Published: 2023

47. STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment

Author: Lee, Jaewoo, Yoon, Jaehong, Kim, Wonjae, Kim, Yunji, and Hwang, Sung Ju
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Continuously learning a variety of audio-video semantics over time is crucial for audio-related reasoning tasks in our ever-evolving world. However, this is a nontrivial problem and poses two critical challenges: sparse spatio-temporal correlation between audio-video pairs and multimodal correlation overwriting that forgets audio-video relations. To tackle this problem, we propose a new continual audio-video pre-training method with two novel ideas: (1) Localized Patch Importance Scoring: we introduce a multimodal encoder to determine the importance score for each patch, emphasizing semantically intertwined audio-video patches. (2) Replay-guided Correlation Assessment: to reduce the corruption of previously learned audiovisual knowledge due to drift, we propose to assess the correlation of the current patches on the past steps to identify the patches exhibiting high correlations with the past steps. Based on the results from the two ideas, we perform probabilistic patch selection for effective continual audio-video pre-training. Experimental validation on multiple benchmarks shows that our method achieves a 3.69%p of relative performance gain in zero-shot retrieval tasks compared to strong continual learning baselines, while reducing memory consumption by ~45%., Comment: ICML 2024
Published: 2023

48. Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes

Author: Jo, Jaehyeong and Hwang, Sung Ju
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Learning the distribution of data on Riemannian manifolds is crucial for modeling data from non-Euclidean space, which is required by many applications in diverse scientific fields. Yet, existing generative models on manifolds suffer from expensive divergence computation or rely on approximations of heat kernel. These limitations restrict their applicability to simple geometries and hinder scalability to high dimensions. In this work, we introduce the Riemannian Diffusion Mixture, a principled framework for building a generative diffusion process on manifolds. Instead of following the denoising approach of previous diffusion models, we construct a diffusion process using a mixture of bridge processes derived on general manifolds without requiring heat kernel estimations. We develop a geometric understanding of the mixture process, deriving the drift as a weighted mean of tangent directions to the data points that guides the process toward the data distribution. We further propose a scalable training objective for learning the mixture process that readily applies to general manifolds. Our method achieves superior performance on diverse manifolds with dramatically reduced number of in-training simulation steps for general manifolds., Comment: ICML 2024
Published: 2023

49. Self-Supervised Dataset Distillation for Transfer Learning

Author: Lee, Dong Bok, Lee, Seanie, Ko, Joonho, Kawaguchi, Kenji, Lee, Juho, and Hwang, Sung Ju
Subjects: Computer Science - Machine Learning
Abstract: Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL). We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is biased due to the randomness originating from data augmentations or masking. To address this issue, we propose to minimize the mean squared error (MSE) between a model's representations of the synthetic examples and their corresponding learnable target feature representations for the inner objective, which does not introduce any randomness. Our primary motivation is that the model obtained by the proposed inner optimization can mimic the self-supervised target model. To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization. Lastly, assuming that a feature extractor is fixed, we only optimize a linear head on top of the feature extractor, which allows us to reduce the computational cost and obtain a closed-form solution of the head with kernel ridge regression. We empirically validate the effectiveness of our method on various applications involving transfer learning.
Published: 2023

50. Improving Neural Radiance Field using Near-Surface Sampling with Point Cloud Generation

Author: Yoo, Hye Bin, Han, Hyun Min, Hwang, Sung Soo, and Chun, Il Yong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Neural radiance field (NeRF) is an emerging view synthesis method that samples points in a three-dimensional (3D) space and estimates their existence and color probabilities. The disadvantage of NeRF is that it requires a long training time since it samples many 3D points. In addition, if one samples points from occluded regions or in the space where an object is unlikely to exist, the rendering quality of NeRF can be degraded. These issues can be solved by estimating the geometry of 3D scene. This paper proposes a near-surface sampling framework to improve the rendering quality of NeRF. To this end, the proposed method estimates the surface of a 3D object using depth images of the training set and sampling is performed around there only. To obtain depth information on a novel view, the paper proposes a 3D point cloud generation method and a simple refining method for projected depth from a point cloud. Experimental results show that the proposed near-surface sampling NeRF framework can significantly improve the rendering quality, compared to the original NeRF and three different state-of-the-art NeRF. In addition, one can significantly accelerate the training time of a NeRF model with the proposed near-surface sampling framework., Comment: 14 figures, 3 tables
Published: 2023
