Author: "Luu, P." - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Luu, P."' showing total 5,572 results

1. ModeDreamer: Mode Guiding Score Distillation for Text-to-3D Generation using Reference Image Prompts

Author: Tran, Uy Dieu, Luu, Minh, Nguyen, Phong Ha, Nguyen, Khoi, and Hua, Binh-Son
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing Score Distillation Sampling (SDS)-based methods have driven significant progress in text-to-3D generation. However, 3D models produced by SDS-based methods tend to exhibit over-smoothing and low-quality outputs. These issues arise from the mode-seeking behavior of current methods, where the scores used to update the model oscillate between multiple modes, resulting in unstable optimization and diminished output quality. To address this problem, we introduce a novel image prompt score distillation loss named ISD, which employs a reference image to direct text-to-3D optimization toward a specific mode. Our ISD loss can be implemented by using IP-Adapter, a lightweight adapter for integrating image prompt capability to a text-to-image diffusion model, as a mode-selection module. A variant of this adapter, when not being prompted by a reference image, can serve as an efficient control variate to reduce variance in score estimates, thereby enhancing both output quality and optimization stability. Our experiments demonstrate that the ISD loss consistently achieves visually coherent, high-quality outputs and improves optimization speed compared to prior text-to-3D methods, as demonstrated through both qualitative and quantitative evaluations on the T3Bench benchmark suite.
Published: 2024

2. Curriculum Demonstration Selection for In-Context Learning

Author: Vu, Duc Anh, Duy, Nguyen Tran Cong, Wu, Xiaobao, Nhat, Hoang Minh, Mingzhe, Du, Thong, Nguyen Thanh, and Luu, Anh Tuan
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have shown strong in-context learning (ICL) abilities with a few demonstrations. However, one critical challenge is how to select demonstrations to elicit the full potential of LLMs. In this paper, we propose Curriculum Demonstration Selection (CDS), a novel demonstration selection method for ICL. Instead of merely using similarity, CDS additionally partitions samples by their complexity measurements. Following curriculum learning, CDS then selects demonstrations from easy to difficult. Thus the selected demonstrations cover a wide range of difficulty levels, enabling LLMs to learn from varied complexities within the training set. Experiments demonstrate that our CDS consistently outperforms baseline methods, achieving notable improvements across nine LLMs on three benchmarks. Moreover, CDS proves especially effective in enhancing LLM performance in solving challenging problems., Comment: Accepted at the 40th ACM/SIGAPP Symposium On Applied Computing (SAC 2025), Main Conference
Published: 2024

3. HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation

Author: Nguyen, Trong-Thuan, Nguyen, Pha, Cothren, Jackson, Yilmaz, Alper, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal LLMs have advanced vision-language tasks but still struggle with understanding video scenes. To bridge this gap, Video Scene Graph Generation (VidSGG) has emerged to capture multi-object relationships across video frames. However, prior methods rely on pairwise connections, limiting their ability to handle complex multi-object interactions and reasoning. To this end, we propose Multimodal LLMs on a Scene HyperGraph (HyperGLM), promoting reasoning about multi-way interactions and higher-order relationships. Our approach uniquely integrates entity scene graphs, which capture spatial relationships between objects, with a procedural graph that models their causal transitions, forming a unified HyperGraph. Significantly, HyperGLM enables reasoning by injecting this unified HyperGraph into LLMs. Additionally, we introduce a new Video Scene Graph Reasoning (VSGR) dataset featuring 1.9M frames from third-person, egocentric, and drone views and supports five tasks: Scene Graph Generation, Scene Graph Anticipation, Video Question Answering, Video Captioning, and Relation Reasoning. Empirically, HyperGLM consistently outperforms state-of-the-art methods across five tasks, effectively modeling and reasoning complex relationships in diverse video scenes.
Published: 2024

4. An Attempt to Develop a Neural Parser based on Simplified Head-Driven Phrase Structure Grammar on Vietnamese

Author: Nguyen, Duc-Vu, Phan, Thang Chau, Nguyen, Quoc-Nam, Van Nguyen, Kiet, and Nguyen, Ngan Luu-Thuy
Subjects: Computer Science - Computation and Language
Abstract: In this paper, we aimed to develop a neural parser for Vietnamese based on simplified Head-Driven Phrase Structure Grammar (HPSG). The existing corpora, VietTreebank and VnDT, had around 15% of constituency and dependency tree pairs that did not adhere to simplified HPSG rules. To attempt to address the issue of the corpora not adhering to simplified HPSG rules, we randomly permuted samples from the training and development sets to make them compliant with simplified HPSG. We then modified the first simplified HPSG Neural Parser for the Penn Treebank by replacing it with the PhoBERT or XLM-RoBERTa models, which can encode Vietnamese texts. We conducted experiments on our modified VietTreebank and VnDT corpora. Our extensive experiments showed that the simplified HPSG Neural Parser achieved a new state-of-the-art F-score of 82% for constituency parsing when using the same predicted part-of-speech (POS) tags as the self-attentive constituency parser. Additionally, it outperformed previous studies in dependency parsing with a higher Unlabeled Attachment Score (UAS). However, our parser obtained lower Labeled Attachment Score (LAS) scores, likely due to our focus on arc permutation without changing the original labels, as we did not consult with a linguistic expert. Lastly, the research findings of this paper suggest that simplified HPSG should receive more attention from linguistic experts when developing treebanks for Vietnamese natural language processing., Comment: Accepted at SoICT 2024
Published: 2024

5. Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking

Author: Nguyen, Phuc, Luu, Minh, Tran, Anh, Pham, Cuong, and Nguyen, Khoi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing 3D instance segmentation methods frequently encounter issues with over-segmentation, leading to redundant and inaccurate 3D proposals that complicate downstream tasks. This challenge arises from their unsupervised merging approach, where dense 2D instance masks are lifted across frames into point clouds to form 3D candidate proposals without direct supervision. These candidates are then hierarchically merged based on heuristic criteria, often resulting in numerous redundant segments that fail to combine into precise 3D proposals. To overcome these limitations, we propose a 3D-Aware 2D Mask Tracking module that uses robust 3D priors from a 2D mask segmentation and tracking foundation model (SAM-2) to ensure consistent object masks across video frames. Rather than merging all visible superpoints across views to create a 3D mask, our 3D Mask Optimization module leverages a dynamic programming algorithm to select an optimal set of views, refining the superpoints to produce a final 3D proposal for each object. Our approach achieves comprehensive object coverage within the scene while reducing unnecessary proposals, which could otherwise impair downstream applications. Evaluations on ScanNet200 and ScanNet++ confirm the effectiveness of our method, with improvements across Class-Agnostic, Open-Vocabulary, and Open-Ended 3D Instance Segmentation tasks., Comment: Project page: https://any3dis.github.io/
Published: 2024

6. Will an AI with Private Information Allow Itself to Be Switched Off?

Author: Garber, Andrew, Subramani, Rohan, Luu, Linus, Bedaywi, Mark, Russell, Stuart, and Emmons, Scott
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: A wide variety of goals could cause an AI to disable its off switch because "you can't fetch the coffee if you're dead" (Russell 2019). Prior theoretical work on this shutdown problem assumes that humans know everything that AIs do. In practice, however, humans have only limited information. Moreover, in many of the settings where the shutdown problem is most concerning, AIs might have vast amounts of private information. To capture these differences in knowledge, we introduce the Partially Observable Off-Switch Game (POSG), a game-theoretic model of the shutdown problem with asymmetric information. Unlike when the human has full observability, we find that in optimal play, even AI agents assisting perfectly rational humans sometimes avoid shutdown. As expected, increasing the amount of communication or information available always increases (or leaves unchanged) the agents' expected common payoff. But counterintuitively, introducing bounded communication can make the AI defer to the human less in optimal play even though communication mitigates information asymmetry. In particular, communication sometimes enables new optimal behavior requiring strategic AI deference to achieve outcomes that were previously inaccessible. Thus, designing safe artificial agents in the presence of asymmetric information requires careful consideration of the tradeoffs between maximizing payoffs (potentially myopically) and maintaining AIs' incentives to defer to humans.
Published: 2024

7. COBRA: A Continual Learning Approach to Vision-Brain Understanding

Author: Nguyen, Xuan-Bac, Choudhary, Arabinda Kumar, Sinha, Pawan, Li, Xin, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-Brain Understanding (VBU) aims to extract visual information perceived by humans from brain activity recorded through functional Magnetic Resonance Imaging (fMRI). Despite notable advancements in recent years, existing studies in VBU continue to face the challenge of catastrophic forgetting, where models lose knowledge from prior subjects as they adapt to new ones. Addressing continual learning in this field is, therefore, essential. This paper introduces a novel framework called Continual Learning for Vision-Brain (COBRA) to address continual learning in VBU. Our approach includes three novel modules: a Subject Commonality (SC) module, a Prompt-based Subject Specific (PSS) module, and a transformer-based module for fMRI, denoted as MRIFormer module. The SC module captures shared vision-brain patterns across subjects, preserving this knowledge as the model encounters new subjects, thereby reducing the impact of catastrophic forgetting. On the other hand, the PSS module learns unique vision-brain patterns specific to each subject. Finally, the MRIFormer module contains a transformer encoder and decoder that learns the fMRI features for VBU from common and specific patterns. In a continual learning setup, COBRA is trained in new PSS and MRIFormer modules for new subjects, leaving the modules of previous subjects unaffected. As a result, COBRA effectively addresses catastrophic forgetting and achieves state-of-the-art performance in both continual learning and vision-brain reconstruction tasks, surpassing previous methods.
Published: 2024

8. Construction and Preliminary Validation of a Dynamic Programming Concept Inventory

Author: Ferland, Matthew, Rao, Varun Nagaraj, Arora, Arushi, van der Poel, Drew, Luu, Michael, Huynh, Randy, Reiber, Freddy, Ossman, Sandra, Poulsen, Seth, and Shindler, Michael
Subjects: Computer Science - Data Structures and Algorithms, Computer Science - Computers and Society
Abstract: Concept inventories are standardized assessments that evaluate student understanding of key concepts within academic disciplines. While prevalent across STEM fields, their development lags for advanced computer science topics like dynamic programming (DP) -- an algorithmic technique that poses significant conceptual challenges for undergraduates. To fill this gap, we developed and validated a Dynamic Programming Concept Inventory (DPCI). We detail the iterative process used to formulate multiple-choice questions targeting known student misconceptions about DP concepts identified through prior research studies. We discuss key decisions, tradeoffs, and challenges faced in crafting probing questions to subtly reveal these conceptual misunderstandings. We conducted a preliminary psychometric validation by administering the DPCI to 172 undergraduate CS students finding our questions to be of appropriate difficulty and effectively discriminating between differing levels of student understanding. Taken together, our validated DPCI will enable instructors to accurately assess student mastery of DP. Moreover, our approach for devising a concept inventory for an advanced theoretical computer science concept can guide future efforts to create assessments for other under-evaluated areas currently lacking coverage., Comment: Accepted to SIGCSE 2025
Published: 2024

9. FLRNet: A Deep Learning Method for Regressive Reconstruction of Flow Field From Limited Sensor Measurements

Author: Nguyen, Phong C. H., Choi, Joseph B., and Luu, Quang-Trung
Subjects: Physics - Fluid Dynamics, Computer Science - Machine Learning
Abstract: Many applications in computational and experimental fluid mechanics require effective methods for reconstructing the flow fields from limited sensor data. However, this task remains a significant challenge because the measurement operator, which provides the punctual sensor measurement for a given state of the flow field, is often ill-conditioned and non-invertible. This issue impedes the feasibility of identifying the forward map, theoretically the inverse of the measurement operator, for field reconstruction purposes. While data-driven methods are available, their generalizability across different flow conditions (e.g., different Reynolds numbers) remains in question. Moreover, they frequently face the problem of spectral bias, which leads to smooth and blurry reconstructed fields, thereby decreasing the accuracy of reconstruction. We introduce FLRNet, a deep learning method for flow field reconstruction from sparse sensor measurements. FLRNet employs a variational autoencoder with Fourier feature layers and incorporates an extra perceptual loss term during training to learn a rich, low-dimensional latent representation of the flow field. The learned latent representation is then correlated to the sensor measurement using a fully connected (dense) network. We validated the reconstruction capability and the generalizability of FLRNet under various fluid flow conditions and sensor configurations, including different sensor counts and sensor layouts. Numerical experiments show that in all tested scenarios, FLRNet consistently outperformed other baselines, delivering the most accurate reconstructed flow field and being the most robust to noise.
Published: 2024

10. Transformer-Based Contextualized Language Models Joint with Neural Networks for Natural Language Inference in Vietnamese

Author: Nguyen, Dat Van-Thanh, Van Huynh, Tin, Van Nguyen, Kiet, and Nguyen, Ngan Luu-Thuy
Subjects: Computer Science - Computation and Language
Abstract: Natural Language Inference (NLI) is a task within Natural Language Processing (NLP) that holds value for various AI applications. However, there have been limited studies on Natural Language Inference in Vietnamese that explore the concept of joint models. Therefore, we conducted experiments using various combinations of contextualized language models (CLM) and neural networks. We use CLM to create contextualized word representations and use neural networks for classification. Furthermore, we have evaluated the strengths and weaknesses of each joint model and identified the model failure points in the Vietnamese context. The highest F1 score in this experiment reaches 82.78% on the benchmark dataset (ViNLI). Among the models in our experiments, the largest CLM is XLM-R (355M). That combination has consistently demonstrated superior performance compared to fine-tuning strong pre-trained language models like PhoBERT (+6.58%), mBERT (+19.08%), and XLM-R (+0.94%) in terms of F1-score. This article aims to introduce a novel approach or model that attains improved performance for Vietnamese NLI. Overall, we find that the joint approach of CLM and neural networks is simple yet capable of achieving high-quality performance, which makes it suitable for applications that require efficient resource utilization.
Published: 2024

11. Quantum-Brain: Quantum-Inspired Neural Network Approach to Vision-Brain Understanding

Author: Nguyen, Hoang-Quan, Nguyen, Xuan-Bac, Churchill, Hugh, Choudhary, Arabinda Kumar, Sinha, Pawan, Khan, Samee U., and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-brain understanding aims to extract semantic information about brain signals from human perceptions. Existing deep learning methods for vision-brain understanding are usually introduced in a traditional learning paradigm missing the ability to learn the connectivities between brain regions. Meanwhile, the quantum computing theory offers a new paradigm for designing deep learning models. Motivated by the connectivities in the brain signals and the entanglement properties in quantum computing, we propose a novel Quantum-Brain approach, a quantum-inspired neural network, to tackle the vision-brain understanding problem. To compute the connectivity between areas in brain signals, we introduce a new Quantum-Inspired Voxel-Controlling module to learn the impact of a brain voxel on others represented in the Hilbert space. To effectively learn connectivity, a novel Phase-Shifting module is presented to calibrate the value of the brain signals. Finally, we introduce a new Measurement-like Projection module to present the connectivity information from the Hilbert space into the feature space. The proposed approach can learn to find the connectivities between fMRI voxels and enhance the semantic information obtained from human perceptions. Our experimental results on the Natural Scene Dataset benchmarks illustrate the effectiveness of the proposed method with Top-1 accuracies of 95.1% and 95.6% on image and brain retrieval tasks and an Inception score of 95.3% on fMRI-to-image reconstruction task. Our proposed quantum-inspired network brings a potential paradigm to solving the vision-brain problems via the quantum computing theory.
Published: 2024

12. CLIP Unreasonable Potential in Single-Shot Face Recognition

Author: Luu, Nhan T.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Face recognition is a core task in computer vision designed to identify and authenticate individuals by analyzing facial patterns and features. This field intersects with artificial intelligence, image processing, and machine learning, with applications in security, authentication, and personalization. Traditional approaches in facial recognition focus on capturing facial features like the eyes, nose, and mouth and matching these against a database to verify identities. However, challenges such as high false positive rates have persisted, often due to the similarity among individuals' facial features. Recently, Contrastive Language Image Pretraining (CLIP), a model developed by OpenAI, has shown promising advancements by linking natural language processing with vision tasks, allowing it to generalize across modalities. Using CLIP's vision-language correspondence and single-shot finetuning, the model can achieve lower false positive rates upon deployment without the need for mass facial feature extraction. This integration demonstrates CLIP's potential to address persistent issues in face recognition model performance without complicating our training paradigm.
Published: 2024

13. ZeFaV: Boosting Large Language Models for Zero-shot Fact Verification

Author: Luu, Son T., Nguyen, Hiep, Vo, Trung, and Nguyen, Le-Minh
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: In this paper, we propose ZeFaV, a zero-shot fact-checking verification framework that enhances the performance of large language models on the fact verification task. ZeFaV leverages the in-context learning ability of large language models to extract the relations among the entities within a claim, reorganizes the information from the evidence in a relationally logical form, and combines the above information with the original evidence to generate the context from which our fact-checking model provides verdicts for the input claims. We conducted empirical experiments to evaluate our approach on two multi-hop fact-checking datasets, HoVer and FEVEROUS, and achieved results comparable to other state-of-the-art fact verification methods., Comment: This pre-print has been published in PRICAI 2024: Trends in Artificial Intelligence. The published version is available at https://doi.org/10.1007/978-981-96-0119-6_28
Published: 2024

14. Public Health Advocacy Dataset: A Dataset of Tobacco Usage Videos from Social Media

Author: Chappa, Naga VS Raviteja, McCormick, Charlotte, Gongora, Susana Rodriguez, Dobbs, Page Daniel, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Public Health Advocacy Dataset (PHAD) is a comprehensive collection of 5,730 videos related to tobacco products sourced from social media platforms like TikTok and YouTube. This dataset encompasses 4.3 million frames and includes detailed metadata such as user engagement metrics, video descriptions, and search keywords. This is the first dataset with these features providing a valuable resource for analyzing tobacco-related content and its impact. Our research employs a two-stage classification approach, incorporating a Vision-Language (VL) Encoder, demonstrating superior performance in accurately categorizing various types of tobacco products and usage scenarios. The analysis reveals significant user engagement trends, particularly with vaping and e-cigarette content, highlighting areas for targeted public health interventions. The PHAD addresses the need for multi-modal data in public health research, offering insights that can inform regulatory policies and public health strategies. This dataset is a crucial step towards understanding and mitigating the impact of tobacco usage, ensuring that public health efforts are more inclusive and effective., Comment: Under review at International Journal of Computer Vision (IJCV); 29 figures, 5 figures
Published: 2024

15. Hubbard interaction at finite $T$ on a hexagonal lattice

Author: Razmadze, Lado and Luu, Thomas
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: The temporal finite volume induces significant effects in Monte Carlo simulations of systems in low dimensions, such as graphene, a 2-D hexagonal system known for its unique electronic properties and numerous potential applications. In this work, we explore the behavior of fermions on a hexagonal sheet with a Hubbard-type interaction characterized by coupling $U$. This system exhibits zero or near zero-energy excitations that are highly sensitive to finite temperature effects. We compute corrections to the self-energy and the effective mass of low-energy excitations, arriving at a quantization condition that includes the temporal finite volume. These analyses are then conducted for both zero and finite temperatures. Our findings reveal that the first-order $\mathcal{O}(U)$ contributions are absent, leading to non-trivial corrections starting at $\mathcal{O}(U^2)$. We validate our calculations against exact and numerical results obtained from Hybrid Monte Carlo simulations on small lattices., Comment: 9 pages, 6 figures, Lattice QCD 2024
Published: 2024

16. Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models

Author: Long, Do Xuan, Yen, Duong Ngoc, Luu, Anh Tuan, Kawaguchi, Kenji, Kan, Min-Yen, and Chen, Nancy F.
Subjects: Computer Science - Computation and Language
Abstract: We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve the large language model (LLM) generation. Specifically, it guides an LLM to fulfill an input instruction by simulating multiple experts, aggregating their responses, and selecting the best among individual and aggregated responses. This process is performed in a single chain of thoughts through our seven carefully designed subtasks derived from the Nominal Group Technique (Ven and Delbecq, 1974), a well-established decision-making framework. Our evaluations demonstrate that Multi-expert Prompting significantly outperforms ExpertPrompting and comparable baselines in enhancing the truthfulness, factuality, informativeness, and usefulness of responses while reducing toxicity and hurtfulness. It further achieves state-of-the-art truthfulness by outperforming the best baseline by 8.69% with ChatGPT. Multi-expert Prompting is efficient, explainable, and highly adaptable to diverse scenarios, eliminating the need for manual prompt construction., Comment: EMNLP 2024 Main Conference
Published: 2024

17. LiGAR: LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition

Author: Chappa, Naga Venkata Sai Raviteja and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Group Activity Recognition (GAR) remains challenging in computer vision due to the complex nature of multi-agent interactions. This paper introduces LiGAR, a LIDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition. LiGAR leverages LiDAR data as a structural backbone to guide the processing of visual and textual information, enabling robust handling of occlusions and complex spatial arrangements. Our framework incorporates a Multi-Scale LIDAR Transformer, Cross-Modal Guided Attention, and an Adaptive Fusion Module to integrate multi-modal data at different semantic levels effectively. LiGAR's hierarchical architecture captures group activities at various granularities, from individual actions to scene-level dynamics. Extensive experiments on the JRDB-PAR, Volleyball, and NBA datasets demonstrate LiGAR's superior performance, achieving state-of-the-art results with improvements of up to 10.6% in F1-score on JRDB-PAR and 5.9% in Mean Per Class Accuracy on the NBA dataset. Notably, LiGAR maintains high performance even when LiDAR data is unavailable during inference, showcasing its adaptability. Our ablation studies highlight the significant contributions of each component and the effectiveness of our multi-modal, multi-scale approach in advancing the field of group activity recognition., Comment: 14 pages, 4 figures, 10 tables
Published: 2024

18. FLAASH: Flow-Attention Adaptive Semantic Hierarchical Fusion for Multi-Modal Tobacco Content Analysis

Author: Chappa, Naga VS Raviteja, Dobbs, Page Daniel, Raj, Bhiksha, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The proliferation of tobacco-related content on social media platforms poses significant challenges for public health monitoring and intervention. This paper introduces a novel multi-modal deep learning framework named Flow-Attention Adaptive Semantic Hierarchical Fusion (FLAASH) designed to analyze tobacco-related video content comprehensively. FLAASH addresses the complexities of integrating visual and textual information in short-form videos by leveraging a hierarchical fusion mechanism inspired by flow network theory. Our approach incorporates three key innovations, including a flow-attention mechanism that captures nuanced interactions between visual and textual modalities, an adaptive weighting scheme that balances the contribution of different hierarchical levels, and a gating mechanism that selectively emphasizes relevant features. This multi-faceted approach enables FLAASH to effectively process and analyze diverse tobacco-related content, from product showcases to usage scenarios. We evaluate FLAASH on the Multimodal Tobacco Content Analysis Dataset (MTCAD), a large-scale collection of tobacco-related videos from popular social media platforms. Our results demonstrate significant improvements over existing methods, outperforming state-of-the-art approaches in classification accuracy, F1 score, and temporal consistency. The proposed method also shows strong generalization capabilities when tested on standard video question-answering datasets, surpassing current models. This work contributes to the intersection of public health and artificial intelligence, offering an effective tool for analyzing tobacco promotion in digital media., Comment: Under review at International Journal of Computer Vision; 20 pages, 4 figures, 5 tables
Published: 2024

19. Overcoming Ergodicity Problems of the Hybrid Monte Carlo Method using Radial Updates

Author: Temmen, Finn, Berkowitz, Evan, Kennedy, Anthony, Luu, Thomas, Ostmeyer, Johann, and Yu, Xinhao
Subjects: Condensed Matter - Strongly Correlated Electrons
Abstract: Despite its many advantages, the sensible application of the Hybrid Monte Carlo (HMC) method is often hindered by the presence of large - or even infinite - potential barriers. These potential barriers partition the configuration space into distinct sectors, which leads to ergodicity violations and biased measurements of observables. In this work, we address this problem by augmenting the HMC method with a multiplicative Metropolis-Hastings update in a so-called "radial direction" of the fields, which enables jumps over the aforementioned potential barriers at comparably low computational cost. The effectiveness of this approach is demonstrated for the Hubbard model, formulated in a non-compact space by means of a continuous Hubbard-Stratonovich transformation. Our numerical results show that the radial updates successfully resolve the ergodicity violation, while simultaneously reducing autocorrelations., Comment: 10 pages, 5 figures, contribution to the 41st International Symposium on Lattice Field Theory (Lattice 2024), July 28th - August 3rd, 2024, Liverpool, UK
Published: 2024

20. Who's Who: Large Language Models Meet Knowledge Conflicts in Practice

Author: Pham, Quang Hieu, Ngo, Hoang, Luu, Anh Tuan, and Nguyen, Dat Quoc
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: Retrieval-augmented generation (RAG) methods are viable solutions for addressing the static memory limits of pre-trained language models. Nevertheless, encountering conflicting sources of information within the retrieval context is an inevitable practical challenge. In such situations, the language models are recommended to transparently inform users about the conflicts rather than autonomously deciding what to present based on their inherent biases. To analyze how current large language models (LLMs) align with our recommendation, we introduce WhoQA, a public benchmark dataset to examine models' behavior in knowledge conflict situations. We induce conflicts by asking about a common property among entities having the same name, resulting in questions with up to 8 distinctive answers. The WhoQA evaluation set includes 5K questions across 13 Wikidata property types and 150K Wikipedia entities. Our experiments show that despite the simplicity of WhoQA questions, knowledge conflicts significantly degrade LLMs' performance in RAG settings., Comment: Accepted to EMNLP 2024 Findings
Published: 2024

21. A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

Author: Xiao, Wenyi, Wang, Zechuan, Gan, Leilei, Zhao, Shuai, He, Wanggui, Tuan, Luu Anh, Chen, Long, Jiang, Hao, Zhao, Zhou, and Wu, Fei
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO's current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community.
Published: 2024

22. Are LLMs Good Zero-Shot Fallacy Classifiers?

Author: Pan, Fengjun, Wu, Xiaobao, Li, Zongrui, and Luu, Anh Tuan
Subjects: Computer Science - Computation and Language
Abstract: Fallacies are defective arguments with faulty reasoning. Detecting and classifying them is a crucial NLP task to prevent misinformation, manipulative claims, and biased decisions. However, existing fallacy classifiers are limited by the requirement for sufficient labeled data for training, which hinders their out-of-distribution (OOD) generalization abilities. In this paper, we focus on leveraging Large Language Models (LLMs) for zero-shot fallacy classification. To elicit fallacy-related knowledge and reasoning abilities of LLMs, we propose diverse single-round and multi-round prompting schemes, applying different task-specific instructions such as extraction, summarization, and Chain-of-Thought reasoning. With comprehensive experiments on benchmark datasets, we suggest that LLMs could be potential zero-shot fallacy classifiers. In general, LLMs under single-round prompting schemes have achieved acceptable zero-shot performances compared to the best full-shot baselines and can outperform them in all OOD inference scenarios and some open-domain tasks. Our novel multi-round prompting schemes can effectively bring about more improvements, especially for small LLMs. Our analysis further underlines the future research on zero-shot fallacy classification. Codes and data are available at: https://github.com/panFJCharlotte98/Fallacy_Detection., Comment: Accepted to EMNLP2024 main conference
Published: 2024

23. Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation

Author: Zhao, Shuai, Wu, Xiaobao, Nguyen, Cong-Duy, Jia, Meihuizi, Feng, Yichao, and Tuan, Luu Anh
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Parameter-efficient fine-tuning (PEFT) can bridge the gap between large language models (LLMs) and downstream tasks. However, PEFT has been proven vulnerable to malicious attacks. Research indicates that poisoned LLMs, even after PEFT, retain the capability to activate internalized backdoors when input samples contain predefined triggers. In this paper, we introduce a novel weak-to-strong unlearning algorithm to defend against backdoor attacks based on feature alignment knowledge distillation, named W2SDefense. Specifically, we first train a small-scale language model through full-parameter fine-tuning to serve as the clean teacher model. Then, this teacher model guides the large-scale poisoned student model in unlearning the backdoor, leveraging PEFT. Theoretical analysis suggests that W2SDefense has the potential to enhance the student model's ability to unlearn backdoor features, preventing the activation of the backdoor. We conduct experiments on text classification tasks involving three state-of-the-art language models and three different backdoor attack algorithms. Our empirical results demonstrate the outstanding performance of W2SDefense in defending against backdoor attacks without compromising model performance.
Published: 2024

24. ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering

Author: Nguyen, Nghia Hieu, Quan, Tho Thanh, and Nguyen, Ngan Luu-Thuy
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Text-based VQA is a challenging task that requires machines to use scene texts in given images to yield the most appropriate answer for the given question. The main challenge of text-based VQA is exploiting the meaning and information from scene texts. Recent studies tackled this challenge by considering the spatial information of scene texts in images via embedding 2D coordinates of their bounding boxes. In this study, we follow the definition of meaning from linguistics to introduce a novel method that effectively exploits the information from scene texts written in Vietnamese. Experimental results show that our proposed method obtains state-of-the-art results on two large-scale Vietnamese Text-based VQA datasets. The implementation can be found at this link., Comment: PACLIC 2024
Published: 2024

25. DINTR: Tracking via Diffusion-based Interpolation

Author: Nguyen, Pha, Le, Ngan, Cothren, Jackson, Yilmaz, Alper, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Object tracking is a fundamental task in computer vision, requiring the localization of objects of interest across video frames. Diffusion models have shown remarkable capabilities in visual generation, making them well-suited for addressing several requirements of the tracking problem. This work proposes a novel diffusion-based methodology to formulate the tracking task. Firstly, their conditional process allows for injecting indications of the target object into the generation process. Secondly, diffusion mechanics can be developed to inherently model temporal correspondences, enabling the reconstruction of actual frames in video. However, existing diffusion models rely on extensive and unnecessary mapping to a Gaussian noise domain, which can be replaced by a more efficient and stable interpolation process. Our proposed interpolation mechanism draws inspiration from classic image-processing techniques, offering a more interpretable, stable, and faster approach tailored specifically for the object tracking task. By leveraging the strengths of diffusion models while circumventing their limitations, our Diffusion-based INterpolation TrackeR (DINTR) presents a promising new paradigm and achieves a superior multiplicity on seven benchmarks across five indicator representations., Comment: Accepted at NeurIPS 2024
Published: 2024

26. Stability criteria for rough systems

Author: Duc, Luu Hoang, Hong, Phan Thanh, and Cong, Nguyen Dinh
Subjects: Mathematics - Dynamical Systems, Mathematics - Probability, 60G15, 60G18, 60H05, 60H10, 62J10, 62P05, 91B28
Abstract: We propose a quantitative direct method of proving the local stability for the trivial solution of a rough differential equation and of its regular discretization scheme. Using Doss-Sussmann technique and stopping time analysis, we prove that the trivial solution of the rough system is exponentially stable as long as the noise is small. The same conclusions hold for the regular discretization scheme with small noise and small step size. Our results are significantly stronger than \cite[Theorem 14]{garrido-atienzaetal} and \cite[Theorem 18]{GABSch18} and can be applied to non-flat bounded or linear noises.
Published: 2024

27. Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths

Author: Chia, Yew Ken, Chen, Guizhen, Xu, Weiwen, Tuan, Luu Anh, Poria, Soujanya, and Bing, Lidong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Advanced models such as OpenAI o1 exhibit impressive problem-solving capabilities through step-by-step reasoning. However, they may still falter on more complex problems, making errors that disrupt their reasoning paths. We attribute this to the expansive solution space, where each step has the risk of diverging into mistakes. To enhance language model reasoning, we introduce a specialized training framework called Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths. Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance. Reasoning Paths Optimization does not rely on large-scale human-annotated rationales or outputs from closed-source models, making it scalable and data-efficient. We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions. The experiments demonstrate that our framework significantly enhances the reasoning performance of large language models, with up to 3.1% and 4.3% improvement on GSM8K and MMLU (STEM) respectively. Our data and code can be found at https://reasoning-paths.github.io., Comment: EMNLP 2024 camera ready version
Published: 2024

28. As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

Author: Mao, Xin, Li, Feng-Lin, Xu, Huimin, Zhang, Wei, Chen, Wang, and Luu, Anh Tuan
Subjects: Computer Science - Computation and Language
Abstract: Direct Preference Optimization (DPO) has emerged as a more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO), eliminating the need for reward models and online sampling. Despite these benefits, DPO and its variants remain sensitive to hyper-parameters and prone to instability, particularly on mathematical datasets. We argue that these issues arise from the unidirectional likelihood-derivative negative feedback inherent in the log-likelihood loss function. To address this, we propose a novel LLM alignment loss that establishes a stable Bidirectional Negative Feedback (BNF) during optimization. Our proposed BNF loss eliminates the need for pairwise contrastive losses and does not require any extra tunable hyper-parameters or pairwise preference data, streamlining the alignment pipeline to be as simple as supervised fine-tuning. We conduct extensive experiments across two challenging QA benchmarks and four reasoning benchmarks. The experimental results show that BNF achieves comparable performance to the best methods on QA benchmarks, while its performance decrease on the four reasoning benchmarks is significantly lower compared to the best methods, thus striking a better balance between value alignment and reasoning ability. In addition, we further validate the performance of BNF on non-pairwise datasets, and conduct in-depth analysis of log-likelihood and logit shifts across different preference optimization methods., Comment: 20 pages, 9 figures
Published: 2024

29. Is Gibbs sampling faster than Hamiltonian Monte Carlo on GLMs?

Author: Luu, Son, Xu, Zuheng, Surjanovic, Nikola, Biron-Lattes, Miguel, Campbell, Trevor, and Bouchard-Côté, Alexandre
Subjects: Statistics - Computation, Statistics - Methodology
Abstract: The Hamiltonian Monte Carlo (HMC) algorithm is often lauded for its ability to effectively sample from high-dimensional distributions. In this paper we challenge the presumed domination of HMC for the Bayesian analysis of GLMs. By utilizing the structure of the compute graph rather than the graphical model, we reduce the time per sweep of a full-scan Gibbs sampler from $O(d^2)$ to $O(d)$, where $d$ is the number of GLM parameters. Our simple changes to the implementation of the Gibbs sampler allow us to perform Bayesian inference on high-dimensional GLMs that are practically infeasible with traditional Gibbs sampler implementations. We empirically demonstrate a substantial increase in effective sample size per time when comparing our Gibbs algorithms to state-of-the-art HMC algorithms. While Gibbs is superior in terms of dimension scaling, neither Gibbs nor HMC dominate the other: we provide numerical and theoretical evidence that HMC retains an edge in certain circumstances thanks to its advantageous condition number scaling. Interestingly, for GLMs of fixed data size, we observe that increasing dimensionality can stabilize or even decrease condition number, shedding light on the empirical advantage of our efficient Gibbs sampler.
Published: 2024

30. Predictive Coding for Decision Transformer

Author: Luu, Tung M., Lee, Donghoon, and Yoo, Chang D.
Subjects: Computer Science - Machine Learning
Abstract: Recent work in offline reinforcement learning (RL) has demonstrated the effectiveness of formulating decision-making as return-conditioned supervised learning. Notably, the decision transformer (DT) architecture has shown promise across various domains. However, despite its initial success, DTs have underperformed on several challenging datasets in goal-conditioned RL. This limitation stems from the inefficiency of return conditioning for guiding policy learning, particularly in unstructured and suboptimal datasets, resulting in DTs failing to effectively learn temporal compositionality. Moreover, this problem might be further exacerbated in long-horizon sparse-reward tasks. To address this challenge, we propose the Predictive Coding for Decision Transformer (PCDT) framework, which leverages generalized future conditioning to enhance DT methods. PCDT utilizes an architecture that extends the DT framework, conditioned on predictive codings, enabling decision-making based on both past and future factors, thereby improving generalization. Through extensive experiments on eight datasets from the AntMaze and FrankaKitchen environments, our proposed method achieves performance on par with or surpassing existing popular value-based and transformer-based methods in offline goal-conditioned RL. Furthermore, we also evaluate our method on a goal-reaching task with a physical robot., Comment: 8 pages, IROS 2024 (Code: https://github.com/tunglm2203/pcdt)
Published: 2024

31. Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization

Author: Luu, Tung M., Nguyen, Thanh, Jin, Tee Joshua Tian, Kim, Sungwoon, and Yoo, Chang D.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Recent studies reveal that well-performing reinforcement learning (RL) agents in training often lack resilience against adversarial perturbations during deployment. This highlights the importance of building a robust agent before deploying it in the real world. Most prior works focus on developing robust training-based procedures to tackle this problem, including enhancing the robustness of the deep neural network component itself or adversarially training the agent on strong attacks. In this work, we instead study an input transformation-based defense for RL. Specifically, we propose using a variant of vector quantization (VQ) as a transformation for input observations, which is then used to reduce the space of adversarial attacks during testing, resulting in the transformed observations being less affected by attacks. Our method is computationally efficient and seamlessly integrates with adversarial training, further enhancing the robustness of RL agents against adversarial attacks. Through extensive experiments in multiple environments, we demonstrate that using VQ as the input transformation effectively defends against adversarial attacks on the agent's observations., Comment: 8 pages, IROS 2024 (Code: https://github.com/tunglm2203/vq_robust_rl)
Published: 2024

32. Stochastic variance-reduced Gaussian variational inference on the Bures-Wasserstein manifold

Author: Luu, Hoang Phuc Hau, Yu, Hanlin, Williams, Bernardo, Hartmann, Marcelo, and Klami, Arto
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Optimization in the Bures-Wasserstein space has been gaining popularity in the machine learning community since it draws connections between variational inference and Wasserstein gradient flows. The variational inference objective function of Kullback-Leibler divergence can be written as the sum of the negative entropy and the potential energy, making forward-backward Euler the method of choice. Notably, the backward step admits a closed-form solution in this case, facilitating the practicality of the scheme. However, the forward step is no longer exact since the Bures-Wasserstein gradient of the potential energy involves "intractable" expectations. Recent approaches propose using the Monte Carlo method -- in practice a single-sample estimator -- to approximate these terms, resulting in high variance and poor performance. We propose a novel variance-reduced estimator based on the principle of control variates. We theoretically show that this estimator has a smaller variance than the Monte-Carlo estimator in scenarios of interest. We also prove that variance reduction helps improve the optimization bounds of the current analysis. We demonstrate that the proposed estimator gains order-of-magnitude improvements over the previous Bures-Wasserstein methods.
Published: 2024

33. Improvement of Spiking Neural Network with Bit Planes and Color Models

Author: Luu, Nhan T., Luu, Duong T., Pham, Nam N., and Truong, Thang C.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Spiking neural networks (SNNs) have emerged as a promising paradigm in computational neuroscience and artificial intelligence, offering advantages such as low energy consumption and small memory footprint. However, their practical adoption is constrained by several challenges, prominent among them being performance optimization. In this study, we present a novel approach to enhance the performance of SNNs for images through a new coding method that exploits bit plane representation. Our proposed technique is designed to improve the accuracy of SNNs without increasing model size. We also investigate the impact of color models on the proposed coding process. Through extensive experimental validation, we demonstrate the effectiveness of our coding strategy in achieving performance gains across multiple datasets. To the best of our knowledge, this is the first research that considers bit planes and color models in the context of SNNs. By leveraging the unique characteristics of bit planes, we hope to unlock new potential in SNN performance, potentially paving the way for more efficient and effective SNN models in future research and applications.
Published: 2024

34. Universal Quantum Tomography With Deep Neural Networks

Author: Luu, Nhan T., Truong, Thang C., and Luu, Duong T.
Subjects: Quantum Physics, Computer Science - Artificial Intelligence
Abstract: Quantum state tomography is a crucial technique for characterizing the state of a quantum system, which is essential for many applications in quantum technologies. In recent years, there has been growing interest in leveraging neural networks to enhance the efficiency and accuracy of quantum state tomography. Still, many of them did not include mixed quantum states, since pure states are arguably less common in practical situations. In this research paper, we present two neural-network-based approaches for both pure and mixed quantum state tomography, the Restricted Feature Based Neural Network and the Mixed States Conditional Generative Adversarial Network, and evaluate their effectiveness in comparison to existing neural-based methods. We demonstrate that our proposed methods can achieve state-of-the-art results in reconstructing mixed quantum states from experimental data. Our work highlights the potential of neural networks in revolutionizing quantum state tomography and facilitating the development of quantum technologies., Comment: 10 pages, 5 figures, 17 illustrations, 1 table
Published: 2024

35. Determined Factors and Effective Strategies for Developing English Speaking Fluency among Vietnamese University Students

Author: Luu Thi Mai Vy, Tran Le Thu Huong, Tran Ngoc Quy, Vo Quoc Cuong, and Nguyen Truc Anh
Abstract: Fluency development is one of the prominent components of a well-balanced language course. In particular, speaking fluency interests second language (L2) researchers who have continuously tried to look for the finest approach for helping L2 learners attain a certain level of fluency to achieve effective communication. To this end, the primary purpose of this research is to explore what factors affect English-speaking fluency and what strategies can effectively boost its development among a cohort of Vietnamese university students. The participants were 142 English majors who filled out a questionnaire measuring their perceived self-efficacy concerning English-speaking fluency. Four teachers and six students joined the semi-structured interviews. Quantitative and qualitative data revealed that the most influential factors were linguistic elements, followed by performance, and affective factors. The most effective strategy for enhancing fluency was task repetition. Notably, the findings revealed a mismatch between teachers' and students' understandings of speaking fluency, which may negatively impact the achievement of fluent speech. Based on these results, pedagogical implications are discussed for English teachers and students regarding fluency development.
Published: 2024

36. The Orbitofrontal Cortex Is Required for Learned Modulation of Innate Olfactory Behavior.

Author: Miyamoto, Kiana, Stark, Jeremy, Kathrotia, Mayuri, Luu, Amanda, Victoriano, Joelle, Chan, Chung, Lee, Donghyung, and Root, Cory
Subjects: aversion, innate, olfactory, orbitofrontal, Animals, Prefrontal Cortex, Male, Optogenetics, Mice, Odorants, Olfactory Perception, Mice, Inbred C57BL, Instinct, Smell, Mice, Transgenic, Proto-Oncogene Proteins c-fos, Behavior, Animal
Abstract: Animals have evolved innate responses to cues including social, food, and predator odors. In the natural environment, animals are faced with choices that involve balancing risk and reward where innate significance may be at odds with internal need. The ability to update the value of a cue through learning is essential for navigating changing and uncertain environments. However, the mechanisms involved in this modulation are not well defined in mammals. We have established a new olfactory assay that challenges a thirsty mouse to choose an aversive odor over an attractive odor in foraging for water, thus overriding their innate behavioral response to odor. Innately, mice prefer the attractive odor port over the aversive odor port. However, decreasing the probability of water at the attractive port leads mice to prefer the aversive port, reflecting a learned override of the innate response to the odors. The orbitofrontal cortex (OFC) is a fourth-order olfactory brain area, involved in flexible value association, with behaviorally relevant outputs throughout the limbic system. We performed optogenetic and chemogenetic silencing experiments that demonstrate the OFC is necessary for this learned modulation of innate aversion to odor. Further, we characterized odor evoked c-fos expression in learned and control mice and found significant suppression of activity in the bed nucleus of the stria terminalis, lateral septum, and central and medial amygdala. These findings reveal that the OFC is necessary for the learned override of innate behavior and may signal to limbic structures to modulate innate response to odor.
Published: 2024

37. Nonparametric Inference Framework for Time-dependent Epidemic Models

Author: Luu, Son, Susko, Edward, and Ho, Lam Si Tung
Subjects: Statistics - Methodology
Abstract: Compartmental models, especially the Susceptible-Infected-Removed (SIR) model, have long been used to understand the behaviour of various diseases. Allowing parameters, such as the transmission rate, to be time-dependent functions makes it possible to adjust for and make inferences about changes in the process due to mitigation strategies or evolutionary changes of the infectious agent. In this article, we attempt to build a nonparametric inference framework for stochastic SIR models with time dependent infection rate. The framework includes three main steps: likelihood approximation, parameter estimation and confidence interval construction. The likelihood function of the stochastic SIR model, which is often intractable, can be approximated using methods such as diffusion approximation or tau leaping. The infection rate is modelled by a B-spline basis whose knot location and number of knots are determined by a fast knot placement method followed by a criterion-based model selection procedure. Finally, a point-wise confidence interval is built using a parametric bootstrap procedure. The performance of the framework is observed through various settings for different epidemic patterns. The model is then applied to the Ontario COVID-19 data across multiple waves.
Published: 2024

38. Weak-to-Strong Backdoor Attack for Large Language Models

Author: Zhao, Shuai, Gan, Leilei, Guo, Zhongliang, Wu, Xiaobao, Xiao, Luwei, Xu, Xiaoyu, Nguyen, Cong-Duy, and Tuan, Luu Anh
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Despite being widely applied due to their exceptional capabilities, Large Language Models (LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce targeted vulnerabilities into LLMs by poisoning training samples and full-parameter fine-tuning. However, this kind of backdoor attack is limited since it requires significant computational resources, especially as the size of LLMs increases. Parameter-efficient fine-tuning (PEFT) offers an alternative, but the restricted parameter updating may impede the alignment of triggers with target labels. In this study, we first verify that backdoor attacks with PEFT may encounter challenges in achieving feasible performance. To address these issues and improve the effectiveness of backdoor attacks with PEFT, we propose a novel backdoor attack algorithm from weak to strong based on feature alignment-enhanced knowledge distillation (W2SAttack). Specifically, we poison small-scale language models through full-parameter fine-tuning to serve as the teacher model. The teacher model then covertly transfers the backdoor to the large-scale student model through feature alignment-enhanced knowledge distillation, which employs PEFT. Theoretical analysis reveals that W2SAttack has the potential to augment the effectiveness of backdoor attacks. We demonstrate the superior performance of W2SAttack on classification tasks across four language models, four backdoor attack algorithms, and two different architectures of teacher models. Experimental results indicate success rates close to 100% for backdoor attacks targeting PEFT.
Published: 2024

39. SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA

Author: Zhang, Siyue, Luu, Anh Tuan, and Zhao, Chen
Subjects: Computer Science - Computation and Language
Abstract: Text-to-SQL parsing and end-to-end question answering (E2E TQA) are two main approaches for the Table-based Question Answering task. Despite success on multiple benchmarks, they have yet to be compared and their synergy remains unexplored. In this paper, we identify different strengths and weaknesses through evaluating state-of-the-art models on benchmark datasets: Text-to-SQL demonstrates superiority in handling questions involving arithmetic operations and long tables; E2E TQA excels in addressing ambiguous questions, non-standard table schema, and complex table contents. To combine both strengths, we propose a Synergistic Table-based Question Answering approach that integrates different models via answer selection, which is agnostic to any model types. Further experiments validate that ensembling models by either feature-based or LLM-based answer selector significantly improves the performance over individual models., Comment: EMNLP 2024
Published: 2024

40. RAMBO: Enhancing RAG-based Repository-Level Method Body Completion

Author: Bui, Tuan-Dung, Luu-Van, Duc-Thieu, Nguyen, Thanh-Phat, Nguyen, Thu-Trang, Nguyen, Son, and Vo, Hieu Dinh
Subjects: Computer Science - Software Engineering, Computer Science - Machine Learning
Abstract: Code completion is essential in software development, helping developers by predicting code snippets based on context. Among completion tasks, Method Body Completion (MBC) is particularly challenging as it involves generating complete method bodies based on their signatures and context. This task becomes significantly harder in large repositories, where method bodies must integrate repository-specific elements such as custom APIs, inter-module dependencies, and project-specific conventions. In this paper, we introduce RAMBO, a novel RAG-based approach for repository-level MBC. Instead of retrieving similar method bodies, RAMBO identifies essential repository-specific elements, such as classes, methods, and variables/fields, and their relevant usages. By incorporating these elements and their relevant usages into the code generation process, RAMBO ensures more accurate and contextually relevant method bodies. Our experimental results with leading code LLMs across 40 Java projects show that RAMBO significantly outperformed the state-of-the-art repository-level MBC approaches, with the improvements of up to 46% in BLEU, 57% in CodeBLEU, 36% in Compilation Rate, and up to 3X in Exact Match. Notably, RAMBO surpassed RepoCoder Oracle method by up to 12% in Exact Match, setting a new benchmark for repository-level MBC.
Published: 2024

41. Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels

Author: Liu, Chaoqun, Chao, Qin, Zhang, Wenxuan, Wu, Xiaobao, Li, Boyang, Luu, Anh Tuan, and Bing, Lidong
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels. However, this paradigm is limited by the availability of gold labels, while in certain scenarios, LLMs may need to perform tasks that are too complex for humans to provide such labels. To tackle this challenge, this study explores whether solely utilizing unlabeled data can elicit strong model capabilities. We propose a new paradigm termed zero-to-strong generalization. We iteratively prompt LLMs to annotate unlabeled data and retain high-quality labels by filtering. Surprisingly, we observe that this iterative process gradually unlocks LLMs' potential on downstream tasks. Our experiments on extensive classification and reasoning tasks confirm the effectiveness of our proposed framework. Our analysis indicates that this paradigm is effective for both in-context learning and fine-tuning, and for various model sizes., Comment: 15 pages
Published: 2024

42. SoVAR: Building Generalizable Scenarios from Accident Reports for Autonomous Driving Testing

Author: Guo, An, Zhou, Yuan, Tian, Haoxiang, Fang, Chunrong, Sun, Yunjian, Sun, Weisong, Gao, Xinyu, Luu, Anh Tuan, Liu, Yang, and Chen, Zhenyu
Subjects: Computer Science - Software Engineering
Abstract: Autonomous driving systems (ADSs) have undergone remarkable development and are increasingly employed in safety-critical applications. However, recently reported data on fatal accidents involving ADSs suggests that the desired level of safety has not yet been fully achieved. Consequently, there is a growing need for more comprehensive and targeted testing approaches to ensure safe driving. Scenarios from real-world accident reports provide valuable resources for ADS testing, including critical scenarios and high-quality seeds. However, existing scenario reconstruction methods from accident reports often exhibit limited accuracy in information extraction. Moreover, due to the diversity and complexity of road environments, matching current accident information with the simulation map data for reconstruction poses significant challenges. In this paper, we design and implement SoVAR, a tool for automatically generating road-generalizable scenarios from accident reports. SoVAR utilizes well-designed prompts with linguistic patterns to guide the large language model in extracting accident information from textual data. Subsequently, it formulates and solves accident-related constraints in conjunction with the extracted accident information to generate accident trajectories. Finally, SoVAR reconstructs accident scenarios on various map structures and converts them into test scenarios to evaluate its capability to detect defects in industrial ADSs. We experiment with SoVAR, using accident reports from the National Highway Traffic Safety Administration's database to generate test scenarios for the industrial-grade ADS Apollo. The experimental findings demonstrate that SoVAR can effectively generate generalized accident scenarios across different road structures. Furthermore, the results confirm that SoVAR identified 5 distinct safety violation types that contributed to the crash of Baidu Apollo.
Published: 2024

43. Volatile-rich Sub-Neptunes as Hydrothermal Worlds: The Case of K2-18 b

Author: Luu, Cindy N., Yu, Xinting, Glein, Christopher R., Innes, Hamish, Aguichine, Artyom, Krissansen-Totton, Joshua, Moses, Julianne I., Tsai, Shang-Min, Zhang, Xi, Truong, Ngoc, and Fortney, Jonathan J.
Subjects: Astrophysics - Earth and Planetary Astrophysics
Abstract: Temperate exoplanets between the sizes of Earth and Neptune, known as "sub-Neptunes", have emerged as intriguing targets for astrobiology. It is unknown whether these planets resemble Earth-like terrestrial worlds with a habitable surface, Neptune-like giant planets with deep atmospheres and no habitable surface, or something exotic in between. Recent JWST transmission spectroscopy observations of the canonical sub-Neptune K2-18 b revealed ~1% CH4, ~1% CO2, and a non-detection of CO in the atmosphere. While previous studies have proposed that the observed atmospheric composition could help constrain the lower atmosphere conditions and determine the interior structure of sub-Neptunes like K2-18 b, the possible interactions between the atmosphere and a hot, supercritical water ocean at its base remain unexplored. In this work, we investigate whether a global supercritical water ocean, resembling a planetary-scale hydrothermal system, can explain these observations on K2-18 b-like sub-Neptunes through equilibrium aqueous geochemical calculations. We find that the observed atmospheric CH4/CO2 ratio implies a minimum ocean temperature of ~715 K, whereas the corresponding CO/CO2 ratio allows ocean temperatures up to ~1060 K. These results indicate that a global supercritical water ocean on K2-18 b is plausible. While life cannot survive in this ocean, this work represents the first step towards understanding how a global supercritical water ocean may influence observable atmospheric characteristics on volatile-rich sub-Neptunes. Future observations with better constrained NH3 and CO mixing ratios could further help distinguish between possible interior compositions of K2-18 b.
Comment: 15 pages, 5 figures, 1 table
Published: 2024

44. A Novel Dataset for Video-Based Autism Classification Leveraging Extra-Stimulatory Behavior

Author: Serna-Aguilera, Manuel, Nguyen, Xuan Bac, Seo, Han-Seok, and Luu, Khoa
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Autism Spectrum Disorder (ASD) can affect individuals at varying degrees of intensity, from challenges in overall health, communication, and sensory processing, and this often begins at a young age. Thus, it is critical for medical professionals to be able to accurately diagnose ASD in young children, but doing so is difficult. Deep learning can be responsibly leveraged to improve productivity in addressing this task. The availability of data, however, remains a considerable obstacle. Hence, in this work, we introduce the Video ASD dataset--a dataset that contains video frame convolutional and attention map feature data--to foster further progress in the task of ASD classification. The original videos showcase children reacting to chemo-sensory stimuli, among auditory, touch, and vision. This dataset contains the features of the frames spanning 2,467 videos, for a total of approximately 1.4 million frames. Additionally, head pose angles are included to account for head movement noise, as well as full-sentence text labels for the taste and smell videos that describe how the facial expression changes before, immediately after, and long after interaction with the stimuli. In addition to providing features, we also test foundation models on this data to showcase how movement noise affects performance and the need for more data and more complex labels.
Published: 2024

45. Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities

Author: Lu, Wei, Luu, Rachel K., and Buehler, Markus J.
Subjects: Computer Science - Computation and Language, Condensed Matter - Materials Science, Computer Science - Artificial Intelligence
Abstract: The advancement of Large Language Models (LLMs) for domain applications in fields such as materials science and engineering depends on the development of fine-tuning strategies that adapt models for specialized, technical capabilities. In this work, we explore the effects of Continued Pretraining (CPT), Supervised Fine-Tuning (SFT), and various preference-based optimization approaches, including Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), on fine-tuned LLM performance. Our analysis shows how these strategies influence model outcomes and reveals that the merging of multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models. We find that model merging leads to new functionalities that neither parent model could achieve alone, leading to improved performance in domain-specific assessments. Experiments with different model architectures are presented, including Llama 3.1 8B and Mistral 7B models, where similar behaviors are observed. Exploring whether the results hold also for much smaller models, we use a tiny LLM with 1.7 billion parameters and show that very small LLMs do not necessarily feature emergent capabilities under model merging, suggesting that model scaling may be a key component. In open-ended yet consistent chat conversations between a human and AI models, our assessment reveals detailed insights into how different model variants perform and shows that the smallest model achieves a high intelligence score across key criteria including reasoning depth, creativity, clarity, and quantitative precision. Other experiments include the development of image generation prompts based on disparate biological material design concepts, to create new microstructures, architectural concepts, and urban design based on biological materials-inspired construction principles.
Published: 2024

46. LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Author: Hu, Zhiyuan, Liu, Yuliang, Zhao, Jinman, Wang, Suyuchen, Wang, Yan, Shen, Wei, Gu, Qing, Luu, Anh Tuan, Ng, See-Kiong, Jiang, Zhiwei, and Hooi, Bryan
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training strategy for extending the context window of LLMs, including impactful token analysis, position index transformation, and training optimization strategies. It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies. Experiments on three types of LLMs show that LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full sequence training. Furthermore, LongRecipe also preserves the original LLM's capabilities in general tasks. Ultimately, we can extend the effective context window of open-source LLMs from 8k to 128k, achieving performance close to GPT-4 with just one day of dedicated training using a single GPU with 80G memory. Our code is released at https://github.com/zhiyuanhubj/LongRecipe.
Comment: Work in Progress
Published: 2024

47. Accurate Performance Characterization, Reporting, and Benchmarking for Indoor Photovoltaics

Author: Jailani, Javith Mohammed, Luu, Amanda, Salvosa, Elizabeth, Clegg, Charlotte, Kamalon, Vishnupriya P., Nasrollahi, Bahareh, Valitova, Irina, Meier, Sebastian B., Shore, Andrew M., Hamadani, Behrang H., and Pecunia, Vincenzo
Subjects: Physics - Applied Physics
Abstract: Indoor photovoltaics (IPVs) provide an increasingly promising solution for powering Internet-of-Things smart devices, which has led to a surge in IPV research and the development of new IPV technologies. However, the diverse lighting scenarios adopted in IPV research pose unique challenges in characterization, reporting, and benchmarking, which may obscure genuine performance improvements and result in inaccurate conclusions due to characterization errors. Spectral variations among artificial light sources further complicate benchmarking. This study provides a comprehensive, quantitative analysis of these challenges, investigating them through experimental characterization of IPVs covering a broad performance parameter space, including c-Si, a-Si:H, perovskite, and organic devices. We reveal that many of these challenges can lead to unacceptable error levels. A particularly critical issue is the angular interplay among the light source, measuring device, and IPV device, which compromises accuracy under diffuse illumination. To address these challenges, we evaluate practical protocols to overcome angular issues and enable benchmarking against standardized spectral conditions. To facilitate the implementation of our findings, we provide comprehensive checklists for accurate IPV characterization, reporting, and benchmarking. We anticipate that our analyses and guidelines will stimulate further advancements in IPV technologies by ensuring reliable performance evaluation, thereby facilitating the realization of IPVs' full potential.
Published: 2024

48. Open-Ended 3D Point Cloud Instance Segmentation

Author: Nguyen, Phuc D. A., Luu, Minh, Tran, Anh, Pham, Cuong, and Nguyen, Khoi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Open-Vocab 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated their ability to generalize to unseen objects. However, these methods still depend on predefined class names during testing, restricting the autonomy of agents. To mitigate this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the necessity for predefined class names during testing. Moreover, we contribute a comprehensive set of strong baselines, derived from OV-3DIS approaches and leveraging 2D Multimodal Large Language Models. To assess the performance of our OE-3DIS system, we introduce a novel Open-Ended score, evaluating both the semantic and geometric quality of predicted masks and their associated class names, alongside the standard AP score. Our approach demonstrates significant performance improvements over the baselines on the ScanNet200 and ScanNet++ datasets. Remarkably, our method surpasses the performance of Open3DIS, the current state-of-the-art method in OV-3DIS, even in the absence of ground-truth object class names.
Published: 2024

49. In-Flight Performance of Spider's 280 GHz Receivers

Author: Shaw, Elle C., Ade, P. A. R., Akers, S., Amiri, M., Austermann, J., Beall, J., Becker, D. T., Benton, S. J., Bergman, A. S., Bock, J. J., Bond, J. R., Bryan, S. A., Chiang, H. C., Contaldi, C. R., Domagalski, R. S., Doré, O., Duff, S. M., Duivenvoorden, A. J., Eriksen, H. K., Farhang, M., Filippini, J. P., Fissel, L. M., Fraisse, A. A., Freese, K., Galloway, M., Gambrel, A. E., Gandilo, N. N., Ganga, K., Gibbs, S. M., Gourapura, S., Grigorian, A., Gualtieri, R., Gudmundsson, J. E., Halpern, M., Hartley, J., Hasselfield, M., Hilton, G., Holmes, W., Hristov, V. V., Huang, Z., Hubmayr, J., Irwin, K. D., Jones, W. C., Kahn, A., Kermish, Z. D., King, C., Kuo, C. L., Lennox, A. R., Leung, J. S. -Y., Li, S., Luu, T. V., Mason, P. V., May, J., Megerian, K., Moncelsi, L., Morford, T. A., Nagy, J. M., Nie, R., Netterfield, C. B., Nolta, M., Osherson, B., Padilla, I. L., Rahlin, A. S., Redmond, S., Reintsema, C., Romualdez, L. J., Ruhl, J. E., Runyan, M. C., Shariff, J. A., Shiu, C., Soler, J. D., Song, X., Tartakovsky, S., Thommesen, H., Trangsrud, A., Tucker, C., Tucker, R. S., Turner, A. D., Ullom, J., van der List, J. F., Van Lanen, J., Vissers, M. R., Weber, A. C., Wehus, I. K., Wen, S., Wiebe, D. V., and Young, E. Y.
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Cosmology and Nongalactic Astrophysics
Abstract: SPIDER is a balloon-borne instrument designed to map the cosmic microwave background at degree-angular scales in the presence of Galactic foregrounds. SPIDER has mapped a large sky area in the Southern Hemisphere using more than 2000 transition-edge sensors (TESs) during two NASA Long Duration Balloon flights above the Antarctic continent. During its first flight in January 2015, SPIDER observed in the 95 GHz and 150 GHz frequency bands, setting constraints on the B-mode signature of primordial gravitational waves. Its second flight in the 2022-23 season added new receivers at 280 GHz, each using an array of TESs coupled to the sky through feedhorns formed from stacks of silicon wafers. These receivers are optimized to produce deep maps of polarized Galactic dust emission over a large sky area, providing a unique data set with lasting value to the field. In this work, we describe the instrument's performance during SPIDER's second flight.
Comment: Submitted to SPIE Astronomical Telescopes + Instrumentation 2024, JATIS
Published: 2024

50. Hierarchical Quantum Control Gates for Functional MRI Understanding

Author: Nguyen, Xuan-Bac, Nguyen, Hoang-Quan, Churchill, Hugh, Khan, Samee U., and Luu, Khoa
Subjects: Quantum Physics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Quantum computing has emerged as a powerful tool for solving complex problems intractable for classical computers, particularly in popular fields such as cryptography, optimization, and neurocomputing. In this paper, we present a new quantum-based approach named the Hierarchical Quantum Control Gates (HQCG) method for efficient understanding of Functional Magnetic Resonance Imaging (fMRI) data. This approach includes two novel modules: the Local Quantum Control Gate (LQCG) and the Global Quantum Control Gate (GQCG), which are designed to extract local and global features of fMRI signals, respectively. Our method operates end-to-end on a quantum machine, leveraging quantum mechanics to learn patterns within extremely high-dimensional fMRI signals, such as 30,000 samples, which is a challenge for classical computers. Empirical results demonstrate that our approach significantly outperforms classical methods. Additionally, we found that the proposed quantum model is more stable and less prone to overfitting than the classical methods.
Comment: Accepted to IEEE Workshop on Signal Processing Systems (SiPS 2024)
Published: 2024
