2,477 results on '"Kim, HyunWoo"'
Search Results
2. VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
- Author
Lee, Ji Soo, Kim, Jongha, Na, Jeehye, Park, Jinyoung, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Despite the advancements of Video Large Language Models (VideoLLMs) in various tasks, they struggle with fine-grained temporal understanding, such as Dense Video Captioning (DVC). DVC is a complicated task of describing all events within a video while also temporally localizing them, which integrates multiple fine-grained tasks, including video segmentation, video captioning, and temporal video grounding. Previous VideoLLMs attempt to solve DVC in a single step, failing to utilize their reasoning capability. Moreover, previous training objectives for VideoLLMs do not fully reflect the evaluation metrics, therefore not providing supervision directly aligned to target tasks. To address such a problem, we propose a novel framework named VidChain comprised of Chain-of-Tasks (CoTasks) and Metric-based Direct Preference Optimization (M-DPO). CoTasks decompose a complex task into a sequence of sub-tasks, allowing VideoLLMs to leverage their reasoning capabilities more effectively. M-DPO aligns a VideoLLM with evaluation metrics, providing fine-grained supervision to each task that is well-aligned with metrics. Applied to two different VideoLLMs, VidChain consistently improves their fine-grained video understanding, thereby outperforming previous VideoLLMs on two different DVC benchmarks and also on the temporal video grounding task. Code is available at https://github.com/mlvlab/VidChain. Comment: AAAI 2025
- Published
- 2025
3. Super-class guided Transformer for Zero-Shot Attribute Classification
- Author
Kim, Sehyung, Yang, Chanhyeong, Park, Jihwan, Song, Taehoon, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Attribute classification is crucial for identifying specific characteristics within image regions. Vision-Language Models (VLMs) have been effective in zero-shot tasks by leveraging their general knowledge from large-scale datasets. Recent studies demonstrate that transformer-based models with class-wise queries can effectively address zero-shot multi-label classification. However, poor utilization of the relationship between seen and unseen attributes makes the model lack generalizability. Additionally, attribute classification generally involves many attributes, making maintaining the model's scalability difficult. To address these issues, we propose Super-class guided transFormer (SugaFormer), a novel framework that leverages super-classes to enhance scalability and generalizability for zero-shot attribute classification. SugaFormer employs Super-class Query Initialization (SQI) to reduce the number of queries, utilizing common semantic information from super-classes, and incorporates Multi-context Decoding (MD) to handle diverse visual cues. To strengthen generalizability, we introduce two knowledge transfer strategies that utilize VLMs. During training, Super-class guided Consistency Regularization (SCR) aligns the model's features with VLMs using super-class guided prompts, and during inference, Zero-shot Retrieval-based Score Enhancement (ZRSE) refines predictions for unseen attributes. Extensive experiments demonstrate that SugaFormer achieves state-of-the-art performance across three widely-used attribute classification benchmarks under zero-shot and cross-dataset transfer settings. Our code is available at https://github.com/mlvlab/SugaFormer. Comment: AAAI25
- Published
- 2025
4. EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
- Author
Lee, Sanghyeok, Choi, Joonmyung, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
For the deployment of neural networks in resource-constrained environments, prior works have built lightweight architectures with convolution and attention for capturing local and global dependencies, respectively. Recently, the state space model has emerged as an effective global token interaction with its favorable linear computational cost in the number of tokens. Yet, efficient vision backbones built with SSM have been explored less. In this paper, we introduce Efficient Vision Mamba (EfficientViM), a novel architecture built on hidden state mixer-based state space duality (HSM-SSD) that efficiently captures global dependencies with further reduced computational cost. In the HSM-SSD layer, we redesign the previous SSD layer to enable the channel mixing operation within hidden states. Additionally, we propose multi-stage hidden state fusion to further reinforce the representation power of hidden states, and provide the design alleviating the bottleneck caused by the memory-bound operations. As a result, the EfficientViM family achieves a new state-of-the-art speed-accuracy trade-off on ImageNet-1k, offering up to a 0.7% performance improvement over the second-best model SHViT with faster speed. Further, we observe significant improvements in throughput and accuracy compared to prior works, when scaling images or employing distillation training. Code is available at https://github.com/mlvlab/EfficientViM. Comment: preprint
- Published
- 2024
5. Inversion-based Latent Bayesian Optimization
- Author
Chu, Jaewon, Park, Jinyoung, Lee, Seunghun, and Kim, Hyunwoo J.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Latent Bayesian optimization (LBO) approaches have successfully adopted Bayesian optimization over a continuous latent space by employing an encoder-decoder architecture to address the challenge of optimization in a high dimensional or discrete input space. LBO learns a surrogate model to approximate the black-box objective function in the latent space. However, we observed that most LBO methods suffer from the 'misalignment problem', which is induced by the reconstruction error of the encoder-decoder architecture. It hinders learning an accurate surrogate model and generating high-quality solutions. In addition, several trust region-based LBO methods select the anchor, the center of the trust region, based solely on the objective function value without considering the trust region's potential to enhance the optimization process. To address these issues, we propose Inversion-based Latent Bayesian Optimization (InvBO), a plug-and-play module for LBO. InvBO consists of two components: an inversion method and a potential-aware trust region anchor selection. The inversion method searches the latent code that completely reconstructs the given target data. The potential-aware trust region anchor selection considers the potential capability of the trust region for better local optimization. Experimental results demonstrate the effectiveness of InvBO on nine real-world benchmarks, such as molecule design and arithmetic expression fitting tasks. Code is available at https://github.com/mlvlab/InvBO. Comment: Accepted to NeurIPS 2024
- Published
- 2024
6. Constant Acceleration Flow
- Author
Park, Dogyun, Lee, Sojin, Kim, Sihyeon, Lee, Taehoon, Hong, Youngjoon, and Kim, Hyunwoo J.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Rectified flow and reflow procedures have significantly advanced fast generation by progressively straightening ordinary differential equation (ODE) flows. They operate under the assumption that image and noise pairs, known as couplings, can be approximated by straight trajectories with constant velocity. However, we observe that modeling with constant velocity and using reflow procedures have limitations in accurately learning straight trajectories between pairs, resulting in suboptimal performance in few-step generation. To address these limitations, we introduce Constant Acceleration Flow (CAF), a novel framework based on a simple constant acceleration equation. CAF introduces acceleration as an additional learnable variable, allowing for more expressive and accurate estimation of the ODE flow. Moreover, we propose two techniques to further improve estimation accuracy: initial velocity conditioning for the acceleration model and a reflow process for the initial velocity. Our comprehensive studies on toy datasets, CIFAR-10, and ImageNet 64x64 demonstrate that CAF outperforms state-of-the-art baselines for one-step generation. We also show that CAF dramatically improves few-step coupling preservation and inversion over Rectified flow. Code is available at https://github.com/mlvlab/CAF.
- Published
- 2024
7. LLaMo: Large Language Model-based Molecular Graph Assistant
- Author
Park, Jinyoung, Bae, Minseong, Ko, Dohwan, and Kim, Hyunwoo J.
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Molecular Networks
- Abstract
Large Language Models (LLMs) have demonstrated remarkable generalization and instruction-following capabilities with instruction tuning. The advancements in LLMs and instruction tuning have led to the development of Large Vision-Language Models (LVLMs). However, the competency of LLMs and instruction tuning has been less explored in the molecular domain. Thus, we propose LLaMo: Large Language Model-based Molecular graph assistant, which is an end-to-end trained large molecular graph-language model. To bridge the discrepancy between the language and graph modalities, we present the multi-level graph projector that transforms graph representations into graph tokens by abstracting the output representations of each GNN layer and motif representations with the cross-attention mechanism. We also introduce machine-generated molecular graph instruction data to instruction-tune the large molecular graph-language model for general-purpose molecule and language understanding. Our extensive experiments demonstrate that LLaMo shows the best performance on diverse tasks, such as molecular description generation, property prediction, and IUPAC name prediction. The code of LLaMo is available at https://github.com/mlvlab/LLaMo. Comment: NeurIPS 2024
- Published
- 2024
8. LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
- Author
Shen, Xiaoqian, Xiong, Yunyang, Zhao, Changsheng, Wu, Lemeng, Chen, Jun, Zhu, Chenchen, Liu, Zechun, Xiao, Fanyi, Varadarajan, Balakrishnan, Bordes, Florian, Liu, Zhuang, Xu, Hu, Kim, Hyunwoo J., Soran, Bilge, Krishnamoorthi, Raghuraman, Elhoseiny, Mohamed, and Chandra, Vikas
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Multimodal Large Language Models (MLLMs) have shown promising progress in understanding and analyzing video content. However, processing long videos remains a significant challenge constrained by the LLM's context size. To address this limitation, we propose LongVU, a spatiotemporal adaptive compression mechanism that reduces the number of video tokens while preserving visual details of long videos. Our idea is based on leveraging cross-modal query and inter-frame dependencies to adaptively reduce temporal and spatial redundancy in videos. Specifically, we leverage DINOv2 features to remove redundant frames that exhibit high similarity. Then we utilize text-guided cross-modal query for selective frame feature reduction. Further, we perform spatial token reduction across frames based on their temporal dependencies. Our adaptive compression strategy effectively processes a large number of frames with little visual information loss within a given context length. Our LongVU consistently surpasses existing methods across a variety of video understanding benchmarks, especially on hour-long video understanding tasks such as VideoMME and MLVU. Given a light-weight LLM, our LongVU also scales effectively into a smaller size with state-of-the-art video understanding performance. Comment: Project page: https://vision-cair.github.io/LongVU
- Published
- 2024
9. SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
- Author
Gu, Yuling, Tafjord, Oyvind, Kim, Hyunwoo, Moore, Jared, Bras, Ronan Le, Clark, Peter, and Choi, Yejin
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
While prior work has explored whether large language models (LLMs) possess a "theory of mind" (ToM) - the ability to attribute mental states to oneself and others - there has been little work testing whether LLMs can implicitly apply such knowledge to predict behavior, or to judge whether an observed behavior is rational. Such skills are critical for appropriate interaction in social environments. We create a new dataset, SimpleToM, containing concise, diverse stories (e.g., "The can of Pringles has moldy chips in it. Mary picks up the can in the supermarket and walks to the cashier."), each with three questions that test different degrees of ToM reasoning, asking models to predict (a) mental state ("Is Mary aware of the mold?"), (b) behavior ("Will Mary pay for the chips or report the mold?"), and (c) judgment ("Mary paid for the chips. Was that reasonable?"). To our knowledge, SimpleToM is the first dataset to systematically explore downstream reasoning requiring knowledge of mental states in realistic scenarios. Our experimental results are intriguing: While most models can reliably predict mental state on our dataset (a), they often fail to correctly predict the behavior (b), and fare even worse at judging whether given behaviors are reasonable (c), even though being correctly aware of the protagonist's mental state should make such secondary predictions obvious. We further show that we can help models do better at (b) and (c) via interventions such as reminding the model of its earlier mental state answer and mental-state-specific chain-of-thought prompting, raising the action prediction accuracies (e.g., from 49.5% to 93.5% for GPT-4o) and judgment accuracies (e.g., from 15.3% to 94.7% in GPT-4o). While this shows that models can be coaxed to perform well, it requires task-specific interventions, and the natural model performances remain low, a cautionary tale for LLM deployment.
- Published
- 2024
10. Generative Subgraph Retrieval for Knowledge Graph-Grounded Dialog Generation
- Author
Park, Jinyoung, Joo, Minseok, Kim, Joo-Kyung, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Knowledge graph-grounded dialog generation requires retrieving a dialog-relevant subgraph from the given knowledge base graph and integrating it with the dialog history. Previous works typically represent the graph using an external encoder, such as graph neural networks, and retrieve relevant triplets based on the similarity between single-vector representations of triplets and the dialog history. However, these external encoders fail to leverage the rich knowledge of pretrained language models, and the retrieval process is also suboptimal due to the information bottleneck caused by the single-vector abstraction of the dialog history. In this work, we propose Dialog generation with Generative Subgraph Retrieval (DialogGSR), which retrieves relevant knowledge subgraphs by directly generating their token sequences on top of language models. For effective generative subgraph retrieval, we introduce two key methods: (i) structure-aware knowledge graph linearization with self-supervised graph-specific tokens and (ii) graph-constrained decoding utilizing graph structural proximity-based entity informativeness scores for valid and relevant generative retrieval. DialogGSR achieves state-of-the-art performance in knowledge graph-grounded dialog generation, as demonstrated on OpenDialKG and KOMODIS datasets. Comment: EMNLP (main)
- Published
- 2024
11. HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
- Author
Zhou, Xuhui, Kim, Hyunwoo, Brahman, Faeze, Jiang, Liwei, Zhu, Hao, Lu, Ximing, Xu, Frank, Lin, Bill Yuchen, Choi, Yejin, Mireshghallah, Niloofar, Bras, Ronan Le, and Sap, Maarten
- Subjects
Computer Science - Artificial Intelligence
- Abstract
AI agents are increasingly autonomous in their interactions with human users and tools, leading to increased interactional safety risks. We present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM features a modular sandbox environment that simulates multi-turn interactions between human users and AI agents, where the AI agents are equipped with a variety of tools (e.g., patient management platforms) to navigate diverse scenarios (e.g., a user attempting to access other patients' profiles). To examine the safety of AI agents in these interactions, we develop a comprehensive multi-dimensional evaluation framework that uses metrics covering operational, content-related, societal, and legal risks. Through running 1840 simulations based on 92 scenarios across seven domains (e.g., healthcare, finance, education), we demonstrate that HAICOSYSTEM can emulate realistic user-AI interactions and complex tool use by AI agents. Our experiments show that state-of-the-art LLMs, both proprietary and open-sourced, exhibit safety risks in over 50% of cases, with models generally showing higher risks when interacting with simulated malicious users. Our findings highlight the ongoing challenge of building agents that can safely navigate complex interactions, particularly when faced with malicious users. To foster the AI agent safety ecosystem, we release a code platform that allows practitioners to create custom scenarios, simulate interactions, and evaluate the safety and performance of their agents. Comment: Both the second and third authors contributed equally
- Published
- 2024
12. SoccerNet 2024 Challenges Results
- Author
Cioppa, Anthony, Giancola, Silvio, Somers, Vladimir, Joos, Victor, Magera, Floriane, Held, Jan, Ghasemzadeh, Seyed Abolfazl, Zhou, Xin, Seweryn, Karolina, Kowalczyk, Mateusz, Mróz, Zuzanna, Łukasik, Szymon, Hałoń, Michał, Mkhallati, Hassan, Deliège, Adrien, Hinojosa, Carlos, Sanchez, Karen, Mansourian, Amir M., Miralles, Pierre, Barnich, Olivier, De Vleeschouwer, Christophe, Alahi, Alexandre, Ghanem, Bernard, Van Droogenbroeck, Marc, Gorski, Adam, Clapés, Albert, Boiarov, Andrei, Afanasiev, Anton, Xarles, Artur, Scott, Atom, Lim, ByoungKwon, Yeung, Calvin, Gonzalez, Cristian, Rüfenacht, Dominic, Pacilio, Enzo, Deuser, Fabian, Altawijri, Faisal Sami, Cachón, Francisco, Kim, HanKyul, Wang, Haobo, Choe, Hyeonmin, Kim, Hyunwoo J, Kim, Il-Min, Kang, Jae-Mo, Tursunboev, Jamshid, Yang, Jian, Hong, Jihwan, Lee, Jimin, Zhang, Jing, Lee, Junseok, Zhang, Kexin, Habel, Konrad, Jiao, Licheng, Li, Linyi, Gutiérrez-Pérez, Marc, Ortega, Marcelo, Li, Menglong, Lopatto, Milosz, Kasatkin, Nikita, Nemtsev, Nikolay, Oswald, Norbert, Udin, Oleg, Kononov, Pavel, Geng, Pei, Alotaibi, Saad Ghazai, Kim, Sehyung, Ulasen, Sergei, Escalera, Sergio, Zhang, Shanshan, Yang, Shuyuan, Moon, Sunghwan, Moeslund, Thomas B., Shandyba, Vasyl, Golovkin, Vladimir, Dai, Wei, Chung, WonTaek, Liu, Xinyu, Zhu, Yongqiang, Kim, Youngseo, Li, Yuan, Yang, Yuting, Xiao, Yuxuan, Cheng, Zehua, and Li, Zhihao
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks: (1) Ball Action Spotting, focusing on precisely localizing when and which soccer actions related to the ball occur, (2) Dense Video Captioning, focusing on describing the broadcast with natural language and anchored timestamps, (3) Multi-View Foul Recognition, a novel task focusing on analyzing multiple viewpoints of a potential foul incident to classify whether a foul occurred and assess its severity, and (4) Game State Reconstruction, another novel task focusing on reconstructing the game state from broadcast videos onto a 2D top-view map of the field. Detailed information about the tasks, challenges, and leaderboards can be found at https://www.soccer-net.org, with baselines and development kits available at https://github.com/SoccerNet. Comment: 7 pages, 1 figure
- Published
- 2024
13. MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation
- Author
Kim, Hyunwoo, Lang, Itai, Aigerman, Noam, Groueix, Thibault, Kim, Vladimir G., and Hanocka, Rana
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
- Abstract
We propose MeshUp, a technique that deforms a 3D mesh towards multiple target concepts, and intuitively controls the region where each concept is expressed. Conveniently, the concepts can be defined as either text queries, e.g., "a dog" and "a turtle," or inspirational images, and the local regions can be selected as any number of vertices on the mesh. We can effectively control the influence of the concepts and mix them together using a novel score distillation approach, referred to as the Blended Score Distillation (BSD). BSD operates on each attention layer of the denoising U-Net of a diffusion model as it extracts and injects the per-objective activations into a unified denoising pipeline from which the deformation gradients are calculated. To localize the expression of these activations, we create a probabilistic Region of Interest (ROI) map on the surface of the mesh, and turn it into 3D-consistent masks that we use to control the expression of these activations. We demonstrate the effectiveness of BSD empirically and show that it can deform various meshes towards multiple objectives. Our project page is at https://threedle.github.io/MeshUp. Comment: Project page: https://threedle.github.io/MeshUp
- Published
- 2024
14. Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization
- Author
Lim, Geuntaek, Kim, Hyunwoo, Kim, Joonsoo, and Choi, Yukyung
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Weakly supervised temporal action localization (WTAL) aims to detect action instances in untrimmed videos using only video-level annotations. Since many existing works optimize WTAL models based on action classification labels, they encounter the task discrepancy problem (i.e., localization-by-classification). To tackle this issue, recent studies have attempted to utilize action category names as auxiliary semantic knowledge through vision-language pre-training (VLP). However, there are still areas where existing research falls short. Previous approaches primarily focused on leveraging textual information from language models but overlooked the alignment of dynamic human action and VLP knowledge in a joint space. Furthermore, the deterministic representation employed in previous studies struggles to capture fine-grained human motions. To address these problems, we propose a novel framework that aligns human action knowledge and VLP knowledge in a probabilistic embedding space. Moreover, we propose intra- and inter-distribution contrastive learning to enhance the probabilistic embedding space based on statistical similarities. Extensive experiments and ablation studies reveal that our method significantly outperforms all previous state-of-the-art methods. Code is available at https://github.com/sejong-rcv/PVLR. Comment: Accepted to ACM MM 2024
- Published
- 2024
15. Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble
- Author
Cha, Juhan, Joo, Minseok, Park, Jihwan, Lee, Sanghyeok, Kim, Injae, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent advancements in 3D object detection have benefited from multi-modal information from the multi-view cameras and LiDAR sensors. However, the inherent disparities between the modalities pose substantial challenges. We observe that existing multi-modal 3D object detection methods heavily rely on the LiDAR sensor, treating the camera as an auxiliary modality for augmenting semantic details. This often leads to not only underutilization of camera data but also significant performance degradation in scenarios where LiDAR data is unavailable. Additionally, existing fusion methods overlook the detrimental impact of sensor noise induced by environmental changes on detection performance. In this paper, we propose MEFormer to address the LiDAR over-reliance problem by harnessing critical information for 3D object detection from every available modality while concurrently safeguarding against corrupted signals during the fusion process. Specifically, we introduce Modality Agnostic Decoding (MOAD) that extracts geometric and semantic features with a shared transformer decoder regardless of input modalities and provides promising improvement with a single modality as well as multi-modality. Additionally, our Proximity-based Modality Ensemble (PME) module adaptively utilizes the strengths of each modality depending on the environment while mitigating the effects of a noisy sensor. Our MEFormer achieves state-of-the-art performance of 73.9% NDS and 71.5% mAP on the nuScenes validation set. Extensive analyses validate that our MEFormer improves robustness against challenging conditions such as sensor malfunctions or environmental changes. The source code is available at https://github.com/hanchaa/MEFormer.
- Published
- 2024
16. Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems
- Author
Lee, Sojin, Park, Dogyun, Kong, Inho, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent studies on inverse problems have proposed posterior samplers that leverage the pre-trained diffusion models as powerful priors. These attempts have paved the way for using diffusion models in a wide range of inverse problems. However, the existing methods entail computationally demanding iterative sampling procedures and optimize a separate solution for each measurement, which leads to limited scalability and lack of generalization capability across unseen samples. To address these limitations, we propose a novel approach, Diffusion prior-based Amortized Variational Inference (DAVI) that solves inverse problems with a diffusion prior from an amortized variational inference perspective. Specifically, instead of separate measurement-wise optimization, our amortized inference learns a function that directly maps measurements to the implicit posterior distributions of corresponding clean data, enabling a single-step posterior sampling even for unseen measurements. Extensive experiments on image restoration tasks, e.g., Gaussian deblur, 4× super-resolution, and box inpainting with two benchmark datasets, demonstrate our approach's superior performance over strong baselines. Code is available at https://github.com/mlvlab/DAVI. Comment: ECCV 2024; 41 pages, 19 figures
- Published
- 2024
17. Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction
- Author
Kim, Hyunwoo, Choi, Yoonseo, Yang, Taehyun, Lee, Honggu, Park, Chaneon, Lee, Yongju, Kim, Jin Young, and Kim, Juho
- Subjects
Computer Science - Human-Computer Interaction, Computer Science - Information Retrieval
- Abstract
With large language models (LLMs), conversational search engines shift how users retrieve information from the web by enabling natural conversations to express their search intents over multiple turns. Users' natural conversation embodies rich but implicit signals of users' search intents and evaluation of search results to understand user experience with the system. However, it is underexplored how and why users ask follow-up queries to continue conversations with conversational search engines and how the follow-up queries signal users' satisfaction. From qualitative analysis of 250 conversational turns from an in-lab user evaluation of Naver Cue:, a commercial conversational search engine, we propose a taxonomy of 18 users' follow-up query patterns from conversational search, comprising two major axes: (1) users' motivations behind continuing conversations (N = 7) and (2) actions of follow-up queries (N = 11). Compared to the existing literature on query reformulations, we uncovered a new set of motivations and actions behind follow-up queries, including asking for subjective opinions or providing natural language feedback on the engine's responses. To analyze conversational search logs with our taxonomy in a scalable and efficient manner, we built an LLM-powered classifier (73% accuracy). With our classifier, we analyzed 2,061 conversational tuples collected from real-world usage logs of Cue: and examined how the conversation patterns from our taxonomy correlate with satisfaction. Our initial findings suggest some signals of dissatisfaction, such as Clarifying Queries, Excluding Condition, and Substituting Condition with follow-up queries. We envision our approach could contribute to automated evaluation of conversation search experience by providing satisfaction signals and grounds for realistic user simulations. Comment: Accepted to LLM4Eval @ SIGIR 2024 - The First Workshop on Large Language Models (LLMs) for Evaluation in Information Retrieval
- Published
- 2024
18. Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models
- Author
Jung, Chani, Kim, Dongkwan, Jin, Jiho, Kim, Jiseon, Seonwoo, Yeon, Choi, Yejin, Oh, Alice, and Kim, Hyunwoo
- Subjects
Computer Science - Computation and Language
- Abstract
While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM benchmarks. We posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors (perception inference and perception-to-belief inference) in LLMs. We introduce two datasets, Percept-ToMi and Percept-FANToM, to evaluate these precursory inferences for ToM in LLMs by annotating characters' perceptions on ToMi and FANToM, respectively. Our evaluation of eight state-of-the-art LLMs reveals that the models generally perform well in perception inference while exhibiting limited capability in perception-to-belief inference (e.g., lack of inhibitory control). Based on these results, we present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference. Experimental results demonstrate that PercepToM significantly enhances LLMs' performance, especially in false belief scenarios.
- Published
- 2024
19. Conductive-bridge interlayer contacts for two-dimensional optoelectronic devices
- Author
Jang, Jisu, Hong, Jung Pyo, Kim, Sang-Jun, Ahn, Jongtae, Yu, Byoung-Soo, Han, Jaewon, Lee, Kihyun, Ha, Aelim, Yoon, Eunki, Kim, Wonsik, Jo, Suyeon, Ko, Hyun Woo, Yoon, Seon Kyu, Taniguchi, Takashi, Watanabe, Kenji, Baek, Hogil, Kim, Dae-Yeon, Lee, Kimoon, Mun, Sungchul, Lee, Kyu Hyoung, Park, Soohyung, Kim, Kwanpyo, Song, Young Jae, Lee, Seung Ah, Kim, Hyunwoo J., Shim, Jae Won, Wang, Gunuk, Kang, Ji-Hoon, Park, Min-Chul, and Hwang, Do Kyung
- Published
- 2025
20. Longitudinal Treatment Patterns of Chorea in North American Patients with Huntington’s Disease: Data from Enroll-HD
- Author
-
Stimming, Erin Furr, Claassen, Daniel O., Sen, Ginny P., Klepitskaya, Olga, Serbin, Michael, Kim, Hyunwoo, Hinton, Sean C., and Haubenberger, Dietrich
- Published
- 2025
21. A Review of 25 Spontaneous and Dynamic Facial Expression Databases of Basic Emotions
- Author
-
Kim, Hyunwoo, Bian, Yifan, and Krumhuber, Eva G.
- Published
- 2025
22. Theoretical Modeling on Three Operation Modes of a Scramjet Isolator
- Author
-
Yun, Donggyu, Chun, Hoseok, Kim, Hyunwoo, and Sung, Hong-Gye
- Published
- 2025
23. Development of funnel-type fluidized bed reactor system using microcarriers for cultures of adherent cells
- Author
-
Park, Seohyun, Kim, Hyunwoo, and Oh, Duk Jae
- Published
- 2024
24. Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems
- Author
-
Lee, Sojin, Park, Dogyun, Kong, Inho, and Kim, Hyunwoo J. (Series Editor: Goos, Gerhard; Founding Editor: Hartmanis, Juris; Editorial Board Members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti; Editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül)
- Published
- 2025
25. Understanding Multi-compositional Learning in Vision and Language Models via Category Theory
- Author
-
Chytas, Sotirios Panagiotis, Kim, Hyunwoo J., and Singh, Vikas (Series Editor: Goos, Gerhard; Founding Editor: Hartmanis, Juris; Editorial Board Members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti; Editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül)
- Published
- 2025
26. The integration of heterogeneous resources in the CMS Submission Infrastructure for the LHC Run 3 and beyond
- Author
-
Yzquierdo, Antonio Perez-Calero, Mascheroni, Marco, Kizinevic, Edita, Khan, Farrukh Aftab, Kim, Hyunwoo, Flechas, Maria Acosta, Tsipinakis, Nikos, and Haleem, Saqib
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
While the computing landscape supporting LHC experiments is currently dominated by x86 processors at WLCG sites, this configuration will evolve in the coming years. LHC collaborations will be increasingly employing HPC and Cloud facilities to process the vast amounts of data expected during the LHC Run 3 and the future HL-LHC phase. These facilities often feature diverse compute resources, including alternative CPU architectures like ARM and IBM Power, as well as a variety of GPU specifications. Using these heterogeneous resources efficiently is thus essential for the LHC collaborations to reach their future scientific goals. The Submission Infrastructure (SI) is a central element in CMS Computing, enabling resource acquisition and exploitation by CMS data processing, simulation and analysis tasks. The SI must therefore be adapted to ensure access and optimal utilization of this heterogeneous compute capacity. Some steps in this evolution have already been taken, as CMS is currently making opportunistic use of a small pool of GPU slots provided mainly at the CMS WLCG sites. Additionally, Power9 processors have been validated for CMS production at the Marconi-100 cluster at CINECA. This note will describe the updated capabilities of the SI to continue ensuring the efficient allocation and use of computing resources by CMS, despite their increasing diversity. The next steps towards a full integration and support of heterogeneous resources according to CMS needs will also be reported., Comment: 26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS - 2023
- Published
- 2024
27. Adoption of a token-based authentication model for the CMS Submission Infrastructure
- Author
-
Yzquierdo, Antonio Perez-Calero, Mascheroni, Marco, Kizinevic, Edita, Khan, Farrukh Aftab, Kim, Hyunwoo, Flechas, Maria Acosta, Tsipinakis, Nikos, Haleem, Saqib, and Wurthwein, Frank
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
The CMS Submission Infrastructure (SI) is the main computing resource provisioning system for CMS workloads. A number of HTCondor pools are employed to manage this infrastructure, which aggregates geographically distributed resources from the WLCG and other providers. Historically, the model of authentication among the diverse components of this infrastructure has relied on the Grid Security Infrastructure (GSI), based on identities and X509 certificates. In contrast, commonly used modern authentication standards are based on capabilities and tokens. The WLCG has identified this trend and aims at a transparent replacement of GSI for all its workload management, data transfer and storage access operations, to be completed during the current LHC Run 3. As part of this effort, and within the context of CMS computing, the Submission Infrastructure group is in the process of phasing out the GSI part of its authentication layers, in favor of IDTokens and Scitokens. The use of tokens is already well integrated into the HTCondor Software Suite, which has allowed us to fully migrate the authentication between internal components of SI. Additionally, recent versions of the HTCondor-CE support tokens as well, enabling CMS resource requests to Grid sites employing this CE technology to be granted by means of token exchange. After a rollout campaign to sites, successfully completed by the third quarter of 2022, the totality of HTCondor CEs in use by CMS are already receiving Scitoken-based pilot jobs. On the ARC CE side, a parallel campaign was launched to foster the adoption of the REST interface at CMS sites (required to enable token-based job submission via HTCondor-G), which is nearing completion as well. In this contribution, the newly adopted authentication model will be described. We will then report on the migration status and final steps towards complete GSI phase out in the CMS SI., Comment: 26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS - 2023
- Published
- 2024
28. Repurposing of the Run 2 CMS High Level Trigger Infrastructure as a Cloud Resource for Offline Computing
- Author
-
Mascheroni, Marco, Yzquierdo, Antonio Perez-Calero, Kizinevic, Edita, Khan, Farrukh Aftab, Kim, Hyunwoo, Flechas, Maria Acosta, Tsipinakis, Nikos, Haleem, Saqib, Spiga, Damiele, Wissing, Christoph, and Wurthwein, Frank
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
The former CMS Run 2 High Level Trigger (HLT) farm is one of the largest contributors to CMS compute resources, providing about 25k job slots for offline computing. This CPU farm was initially employed as an opportunistic resource, exploited during inter-fill periods, in the LHC Run 2. Since then, it has become a nearly transparent extension of the CMS capacity at CERN, being located on-site at the LHC interaction point 5 (P5), where the CMS detector is installed. This resource has been configured to support the execution of critical CMS tasks, such as prompt detector data reconstruction. It can therefore be used in combination with the dedicated Tier 0 capacity at CERN, in order to process and absorb peaks in the stream of data coming from the CMS detector. The initial configuration for this resource, based on statically configured VMs, provided the required level of functionality. However, regular operations of this cluster revealed certain limitations compared to the resource provisioning and use model employed in the case of WLCG sites. A new configuration, based on a vacuum-like model, has been implemented for this resource in order to solve the detected shortcomings. This paper reports about this redeployment work on the permanent cloud for an enhanced support to CMS offline computing, comparing the former and new models' respective functionalities, along with the commissioning effort for the new setup., Comment: 26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS - 2023
- Published
- 2024
29. HPC resources for CMS offline computing: An integration and scalability challenge for the Submission Infrastructure
- Author
-
Yzquierdo, Antonio Perez-Calero, Mascheroni, Marco, Kizinevic, Edita, Khan, Farrukh Aftab, Kim, Hyunwoo, Flechas, Maria Acosta, Tsipinakis, Nikos, and Haleem, Saqib
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
The computing resource needs of LHC experiments are expected to continue growing significantly during Run 3 and into the HL-LHC era. The landscape of available resources will also evolve, as High Performance Computing (HPC) and Cloud resources will provide a comparable, or even dominant, fraction of the total compute capacity. The future years present a challenge for the experiments' resource provisioning models, both in terms of scalability and increasing complexity. The CMS Submission Infrastructure (SI) provisions computing resources for CMS workflows. This infrastructure is built on a set of federated HTCondor pools, currently aggregating 400k CPU cores distributed worldwide and supporting the simultaneous execution of over 200k computing tasks. Incorporating HPC resources into CMS computing represents firstly an integration challenge, as HPC centers are much more diverse compared to Grid sites. Secondly, evolving the present SI, dimensioned to harness the current CMS computing capacity, to reach the resource scales required for the HL-LHC phase, while maintaining global flexibility and efficiency, will represent an additional challenge for the SI. To preventively address future potential scalability limits, the SI team regularly runs tests to explore the maximum reach of our infrastructure. In this note, the integration of HPC resources into CMS offline computing is summarized, the potential concerns for the SI derived from the increased scale of operations are described, and the most recent results of scalability tests on the CMS SI are reported., Comment: 26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS - 2023
- Published
- 2024
30. CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
- Author
-
Li, Huihan, Jiang, Liwei, Hwang, Jena D., Kim, Hyunwoo, Santy, Sebastin, Sorensen, Taylor, Lin, Bill Yuchen, Dziri, Nouha, Ren, Xiang, and Choi, Yejin
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
As the utilization of large language models (LLMs) has proliferated world-wide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures. In this work, we uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations, and extract symbols from these generations that are associated with each culture by the LLM. We discover that culture-conditioned generations consist of linguistic "markers" that distinguish marginalized cultures from default cultures. We also discover that LLMs have an uneven degree of diversity in the culture symbols, and that cultures from different geographic regions have different presence in LLMs' culture-agnostic generation. Our findings promote further research in studying the knowledge and fairness of global culture perception in LLMs. Code and Data can be found here: https://github.com/huihanlhh/Culture-Gen/
- Published
- 2024
31. Retrieval-Augmented Open-Vocabulary Object Detection
- Author
-
Kim, Jooyeon, Cho, Eulrang, Kim, Sehyung, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose Retrieval-Augmented Losses and visual Features (RALF). Our method retrieves related 'negative' classes and augments loss functions. Also, visual features are augmented with 'verbalized concepts' of classes, e.g., worn on the feet, handheld music player, and sharp teeth. Specifically, RALF consists of two modules: Retrieval Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF). RAL constitutes two losses reflecting the semantic similarity with negative vocabularies. In addition, RAF augments visual features with the verbalized concepts from a large language model (LLM). Our experiments demonstrate the effectiveness of RALF on COCO and LVIS benchmark datasets. We achieve improvements of up to 3.4 box AP$_{50}^{\text{N}}$ on novel categories of the COCO dataset and 3.6 mask AP$_{\text{r}}$ gains on the LVIS dataset. Code is available at https://github.com/mlvlab/RALF., Comment: Accepted paper at CVPR 2024
- Published
- 2024
32. Learning Equi-angular Representations for Online Continual Learning
- Author
-
Seo, Minhyuk, Koh, Hyunseo, Jeung, Wonje, Lee, Minjae, Kim, San, Lee, Hankook, Cho, Sungjun, Choi, Sungik, Kim, Hyunwoo, and Choi, Jonghyun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Online continual learning suffers from an underfitted solution due to insufficient training for prompt model update (e.g., single-epoch training). To address the challenge, we propose an efficient online continual learning method using the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space so that the continuously learned model with a single epoch can better fit to the streamed data by proposing preparatory data training and residual correction in the representation space. With an extensive set of empirical validations using CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios such as disjoint and Gaussian scheduled continuous (i.e., boundary-free) data setups., Comment: CVPR 2024
- Published
- 2024
33. Prompt Learning via Meta-Regularization
- Author
-
Park, Jinyoung, Ko, Juyeon, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt the vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting since the general knowledge of the pre-trained vision-language models is forgotten while the prompts are finetuned on a small data set from a specific target task. To address this issue, we propose Prompt Meta-Regularization (ProMetaR) to improve the generalizability of prompt learning for vision-language models. Specifically, ProMetaR meta-learns both the regularizer and the soft prompts to harness the task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the vision-language models. Further, ProMetaR augments the task to generate multiple virtual tasks to alleviate the meta-overfitting. In addition, we provide an analysis of how ProMetaR improves the generalizability of prompt tuning from the perspective of gradient alignment. Our extensive experiments demonstrate that our ProMetaR improves the generalizability of conventional prompt learning methods under base-to-base/base-to-new and domain generalization settings. The code of ProMetaR is available at https://github.com/mlvlab/ProMetaR., Comment: CVPR 2024
- Published
- 2024
34. Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
- Author
-
Kim, Jongha, Park, Jihwan, Park, Jinyoung, Kim, Jinyoung, Kim, Sehyung, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Visual Relationship Detection (VRD) has seen significant advancements with Transformer-based architectures recently. However, we identify two key limitations in a conventional label assignment for training Transformer-based VRD models, which is a process of mapping a ground-truth (GT) to a prediction. Under the conventional assignment, an unspecialized query is trained since a query is expected to detect every relation, which makes it difficult for a query to specialize in specific relations. Furthermore, a query is also insufficiently trained since a GT is assigned only to a single prediction, therefore near-correct or even correct predictions are suppressed by being assigned no relation as a GT. To address these issues, we propose Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization trains a specialized query by dividing queries and relations into disjoint groups and directing a query in a specific query group solely toward relations in the corresponding relation group. Quality-Aware Multi-Assignment further facilitates the training by assigning a GT to multiple predictions that are significantly close to a GT in terms of a subject, an object, and the relation in between. Experimental results and analyses show that SpeaQ effectively trains specialized queries, which better utilize the capacity of a model, resulting in consistent performance gains with zero additional inference cost across multiple VRD models and benchmarks. Code is available at https://github.com/mlvlab/SpeaQ., Comment: CVPR 2024
- Published
- 2024
35. vid-TLDR: Training Free Token merging for Light-weight Video Transformer
- Author
-
Choi, Joonmyung, Lee, Sanghyeok, Chu, Jaewon, Choi, Minhyuk, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Video Transformers have become the prevalent solution for various video downstream tasks with superior expressive power and flexibility. However, these video transformers suffer from heavy computational costs induced by the massive number of tokens across the entire video frames, which has been the major barrier to training the model. Further, the patches irrelevant to the main contents, e.g., backgrounds, degrade the generalization performance of models. To tackle these issues, we propose training free token merging for lightweight video Transformer (vid-TLDR) that aims to enhance the efficiency of video Transformers by merging the background tokens without additional training. For vid-TLDR, we introduce a novel approach to capture the salient regions in videos only with the attention map. Further, we introduce the saliency-aware token merging strategy by dropping the background tokens and sharpening the object scores. Our experiments show that vid-TLDR significantly mitigates the computational complexity of video Transformers while achieving competitive performance compared to the base model without vid-TLDR. Code is available at https://github.com/mlvlab/vid-TLDR., Comment: Conference on Computer Vision and Pattern Recognition (CVPR), 2024
- Published
- 2024
36. Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
- Author
-
Lee, Sanghyeok, Choi, Joonmyung, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Vision Transformer (ViT) has emerged as a prominent backbone for computer vision. For more efficient ViTs, recent works lessen the quadratic cost of the self-attention layer by pruning or fusing the redundant tokens. However, these works faced the speed-accuracy trade-off caused by the loss of information. Here, we argue that token fusion needs to consider diverse relations between tokens to minimize information loss. In this paper, we propose Multi-criteria Token Fusion (MCTF), which gradually fuses the tokens based on multiple criteria (e.g., similarity, informativeness, and size of fused tokens). Further, we utilize the one-step-ahead attention, which is an improved approach to capture the informativeness of the tokens. By training the model equipped with MCTF using a token reduction consistency, we achieve the best speed-accuracy trade-off in image classification (ImageNet1K). Experimental results prove that MCTF consistently surpasses the previous reduction methods with and without training. Specifically, DeiT-T and DeiT-S with MCTF reduce FLOPs by about 44% while improving the performance (+0.5% and +0.3%) over the base model, respectively. We also demonstrate the applicability of MCTF in various Vision Transformers (e.g., T2T-ViT, LV-ViT), achieving at least 31% speedup without performance degradation. Code is available at https://github.com/mlvlab/MCTF., Comment: Conference on Computer Vision and Pattern Recognition (CVPR), 2024
- Published
- 2024
37. Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
- Author
-
Zhou, Xuhui, Su, Zhe, Eisape, Tiwalayo, Kim, Hyunwoo, and Sap, Maarten
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Recent advances in large language models (LLM) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., single LLM to generate all interlocutors), which is fundamentally at odds with the non-omniscient, information asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents., Comment: EMNLP 2024
- Published
- 2024
38. Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
- Author
-
Kassem, Aly M., Mahmoud, Omar, Mireshghallah, Niloofar, Kim, Hyunwoo, Tsvetkov, Yulia, Choi, Yejin, Saad, Sherif, and Rana, Santu
- Subjects
Computer Science - Computation and Language - Abstract
In this paper, we introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent, compared to what is revealed by prompting the target model with the training data directly, which is the dominant approach of quantifying memorization in LLMs. We use an iterative rejection-sampling optimization process to find instruction-based prompts with two main characteristics: (1) minimal overlap with the training data to avoid presenting the solution directly to the model, and (2) maximal overlap between the victim model's output and the training data, aiming to induce the victim to spit out training data. We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements. Our findings show that (1) instruction-tuned models can expose pre-training data as much as their base models, if not more so, (2) contexts other than the original training data can lead to leakage, and (3) using instructions proposed by other LLMs can open a new avenue of automated attacks that we should further study and explore. The code can be found at https://github.com/Alymostafa/Instruction_based_attack.
- Published
- 2024
39. Profiling protein–protein interactions to predict the efficacy of B-cell-lymphoma-2-homology-3 mimetics for acute myeloid leukaemia
- Author
-
Chun, Changju, Byun, Ja Min, Cha, Minkwon, Lee, Hongwon, Choi, Byungsan, Kim, Hyunwoo, Hong, Saem, Lee, Yunseo, Park, Hayoung, Koh, Youngil, and Yoon, Tae-Young
- Published
- 2024
40. Nitric oxide-releasing albumin/chondroitin sulfate bioadhesive dressing for the treatment of MRSA-infected wounds
- Author
-
Kim, Hyunwoo, Lee, Juho, Kwak, Dongmin, Kim, Jihyun, Kwon, Mina, Kim, Ki Su, and Yoo, Jin-Wook
- Published
- 2024
41. Characterization of 3D printed plates using ultrasounds
- Author
-
Kim, Hyunwoo, Cho, Younho, and Kim, Young H.
- Published
- 2024
42. Stochastic Conditional Diffusion Models for Robust Semantic Image Synthesis
- Author
-
Ko, Juyeon, Kong, Inho, Park, Dogyun, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Semantic image synthesis (SIS) is a task to generate realistic images corresponding to semantic maps (labels). However, in real-world applications, SIS often encounters noisy user inputs. To address this, we propose Stochastic Conditional Diffusion Model (SCDM), which is a robust conditional diffusion model that features novel forward and generation processes tailored for SIS with noisy labels. It enhances robustness by stochastically perturbing the semantic label maps through Label Diffusion, which diffuses the labels with discrete diffusion. Through the diffusion of labels, the noisy and clean semantic maps become similar as the timestep increases, eventually becoming identical at $t=T$. This facilitates the generation of an image close to a clean image, enabling robust generation. Furthermore, we propose a class-wise noise schedule to differentially diffuse the labels depending on the class. We demonstrate that the proposed method generates high-quality samples through extensive experiments and analyses on benchmark datasets, including a novel experimental setup simulating human errors during real-world applications. Code is available at https://github.com/mlvlab/SCDM., Comment: ICML 2024
- Published
- 2024
43. Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
- Author
-
Sicilia, Anthony, Kim, Hyunwoo, Chandu, Khyathi Raghavi, Alikhani, Malihe, and Hessel, Jack
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Effective interlocutors account for the uncertain goals, beliefs, and emotions of others. But even the best human conversationalist cannot perfectly anticipate the trajectory of a dialogue. How well can language models represent inherent uncertainty in conversations? We propose FortUne Dial, an expansion of the long-standing "conversation forecasting" task: instead of just accuracy, evaluation is conducted with uncertainty-aware metrics, effectively enabling abstention on individual instances. We study two ways in which language models potentially represent outcome uncertainty (internally, using scores and directly, using tokens) and propose fine-tuning strategies to improve calibration of both representations. Experiments on eight difficult negotiation corpora demonstrate that our proposed fine-tuning strategies (a traditional supervision strategy and an off-policy reinforcement learning strategy) can calibrate smaller open-source models to compete with pre-trained models 10x their size., Comment: 2 Figures; 7 Tables; 27 pages
- Published
- 2024
44. DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
- Author
-
Park, Dogyun, Kim, Sihyeon, Lee, Sojin, and Kim, Hyunwoo J.
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Recent studies have introduced a new class of generative models for synthesizing implicit neural representations (INRs) that capture arbitrary continuous signals in various domains. These models opened the door for domain-agnostic generative models, but they often fail to achieve high-quality generation. We observed that the existing methods generate the weights of neural networks to parameterize INRs and evaluate the network with fixed positional embeddings (PEs). Arguably, this architecture limits the expressive power of generative models and results in low-quality INR generation. To address this limitation, we propose Domain-agnostic Latent Diffusion Model for INRs (DDMI) that generates adaptive positional embeddings instead of neural networks' weights. Specifically, we develop a Discrete-to-continuous space Variational AutoEncoder (D2C-VAE), which seamlessly connects discrete data and the continuous signal functions in the shared latent space. Additionally, we introduce a novel conditioning mechanism for evaluating INRs with the hierarchically decomposed PEs to further enhance expressive power. Extensive experiments across four modalities, e.g., 2D images, 3D shapes, Neural Radiance Fields, and videos, with seven benchmark datasets, demonstrate the versatility of DDMI and its superior performance compared to the existing INR generative models.
- Published
- 2024
45. UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection
- Author
-
Kim, Bumsoo, Choi, Taeho, Kang, Jaewoo, and Kim, Hyunwoo J.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advances in deep neural networks have achieved significant progress in detecting individual objects from an image. However, object detection is not sufficient to fully understand a visual scene. Towards a deeper visual understanding, the interactions between objects, especially humans and objects are essential. Most prior works have obtained this information with a bottom-up approach, where the objects are first detected and the interactions are predicted sequentially by pairing the objects. This is a major bottleneck in HOI detection inference time. To tackle this problem, we propose UnionDet, a one-stage meta-architecture for HOI detection powered by a novel union-level detector that eliminates this additional inference stage by directly capturing the region of interaction. Our one-stage detector for human-object interaction shows a significant reduction in interaction prediction time 4x~14x while outperforming state-of-the-art methods on two public datasets: V-COCO and HICO-DET., Comment: ECCV 2020
- Published
- 2023
46. Acute myeloid leukemia and myelodysplastic neoplasms: clinical implications of myelodysplasia-related genes mutations and TP53 aberrations
- Author
-
Kim, Hyunwoo, Lee, Ja Young, Yu, Shinae, Yoo, Eunkyoung, Kim, Hye Ran, Lee, Sang Min, and Lee, Won Sik
- Published
- 2024
47. Identifying a key spot for electron mediator-interaction to tailor CO dehydrogenase’s affinity
- Author
-
Kim, Suk Min, Kang, Sung Heuck, Lee, Jinhee, Heo, Yoonyoung, Poloniataki, Eleni G., Kang, Jingu, Yoon, Hye-Jin, Kong, So Yeon, Yun, Yaejin, Kim, Hyunwoo, Ryu, Jungki, Lee, Hyung Ho, and Kim, Yong Hwan
- Published
- 2024
48. Structural effects of asymmetric magnet shape on performance of surface permanent magnet synchronous motors
- Author
-
Choe, Jungwoo, Kwon, Hyuksung, Kim, Hyunwoo, Koo, Doheon, and So, Hongyun
- Published
- 2024
49. PCB-based digital microfluidic platform for droplet mixing on an open surface
- Author
-
Kim, Hyunwoo, Chung, Sang Kug, and Lee, Jeongmin
- Published
- 2024
50. Graph Elicitation for Guiding Multi-Step Reasoning in Large Language Models
- Author
-
Park, Jinyoung, Patel, Ameen, Khan, Omar Zia, Kim, Hyunwoo J., and Kim, Joo-Kyung
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Chain-of-Thought (CoT) prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities of Large Language Models (LLMs). However, prompting the LLMs to directly generate sub-questions is suboptimal since they sometimes generate redundant or irrelevant questions. To address this, we propose a GE-Reasoning method, which directs LLMs to generate proper sub-questions and corresponding answers. Concretely, given an input question, we first prompt the LLM to generate knowledge triplets, forming a graph representation of the question. Unlike conventional knowledge triplets, our approach allows variables as head or tail entities, effectively representing a question as knowledge triplets. Second, for each triplet, the LLM generates a corresponding sub-question and answer, using knowledge retrieval. If the prediction confidence exceeds a threshold, the sub-question and prediction are incorporated into the prompt for subsequent processing. This approach encourages sub-questions to be grounded in the extracted knowledge triplets, reducing redundancy and irrelevance. Our experiments demonstrate that our approach outperforms previous CoT prompting methods and their variants on multi-hop question answering benchmark datasets., Comment: Preprint
- Published
- 2023