63,694 results for "Li, Bo"
Search Results
2. Self-Branding through NFL Team Fanship: Fans’ Desired Self-Image and Its Implications for Branding Practices
- Author
-
Wang, Jerred Junqi, Braunstein-Minkove, Jessica R., Baker, Thomas A., Li, Bo, and Zhang, James J.
- Published
- 2024
3. Does Star Power Boost Soccer Match Attendance? Empirical Evidence from the Chinese Soccer League
- Author
-
Li, Bo, Liu, Yuanyang, Wang, Jerred Junqi, Scott, Olan K.M., and Stokowski, Sarah
- Published
- 2024
4. Translating Forensic Science in Detective Stories in Early Hong Kong Chinese Newspapers
- Author
-
Li, Bo
- Published
- 2021
5. A Complete Landscape of EFX Allocations of Mixed Manna on Graphs
- Author
-
Zhou, Yu, Wei, Tianze, Li, Minming, and Li, Bo
- Subjects
Computer Science - Computer Science and Game Theory - Abstract
We study envy-free up to any item (EFX) allocations on graphs where vertices and edges represent agents and items respectively. An agent is only interested in items that are incident to her and all other items have zero marginal values to her. Christodoulou et al. [EC, 2023] first proposed this setting and studied the case of goods. We extend this setting to the case of mixed manna where an item may be liked or disliked by its endpoint agents. In our problem, an agent has an arbitrary valuation over her incident items such that the items she likes have non-negative marginal values to her and those she dislikes have non-positive marginal values. We provide a complete study of the four notions of EFX for mixed manna in the literature, which differ by whether the removed item can have zero marginal value. We prove that an allocation that satisfies the notion of EFX where the virtually-removed item could always have zero marginal value may not exist and determining its existence is NP-complete, while one that satisfies any of the other three notions always exists and can be computed in polynomial time. We also prove that an orientation (i.e., a special allocation where each edge must be allocated to one of its endpoint agents) that satisfies any of the four notions may not exist, and determining its existence is NP-complete., Comment: Accepted in IJCAI 2024
- Published
- 2024
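For additive valuations, the EFX condition described in the abstract above has a compact form; a minimal checker, assuming additive values and the goods variant in which envy must vanish after removing any single item from the other agent's bundle (the paper itself treats mixed manna and several finer-grained EFX notions), might look like:

```python
def is_efx(agents, alloc, value):
    """Check EFX for an additive valuation.

    alloc:  dict agent -> set of items (a partition of the items)
    value:  dict (agent, item) -> number; items not incident to an
            agent default to 0, matching the graph setting where an
            agent only values her incident edges.
    """
    v = lambda i, bundle: sum(value.get((i, g), 0) for g in bundle)
    for i in agents:
        for j in agents:
            if i == j or not alloc[j]:
                continue
            # i must not envy j after removing ANY single item from j's bundle
            if any(v(i, alloc[i]) < v(i, alloc[j] - {g}) for g in alloc[j]):
                return False
    return True
```

On a two-edge path graph (agents 1-2-3, edge a between 1 and 2, edge b between 2 and 3), allocating a to agent 1 and b to agent 3 is EFX even though agent 2 receives nothing, whereas giving both edges to agent 2 violates EFX for agents 1 and 3.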
6. Recent Decade's Power Outage Data Reveals the Increasing Vulnerability of U.S. Power Infrastructure
- Author
-
Li, Bo, Ma, Junwei, Omitaomu, Femi, and Mostafavi, Ali
- Subjects
Physics - Physics and Society - Abstract
Despite significant anecdotal evidence regarding the vulnerability of the U.S. power infrastructure, there is a dearth of longitudinal, nation-level characterization of the spatial and temporal patterns in the frequency and extent of power outages. A data-driven, national-level characterization of power outage vulnerability is essential for understanding the urgency of the problem and formulating policies to promote the resilience of power infrastructure systems. Recognizing this, we retrieved 179,053,397 county-level power outage records at a 15-minute interval across 3,022 U.S. counties during 2014-2023 to capture power outage characteristics. We focus on three dimensions--power outage intensity, frequency, and duration--and develop multiple metrics to quantify each dimension of power outage vulnerability. The results show that over the past ten years, the vulnerability of the U.S. power system has consistently increased. Counties experienced an average of 999.4 outages over the decade, affecting an average of more than 540,000 customers per county, with disruptions occurring approximately every week. Coastal areas, particularly in California, Florida and New Jersey, faced more frequent and prolonged outages, while inland regions showed higher outage rates. A concerning increase in outage frequency and intensity was noted, especially after 2017, with a sharp rise in prolonged outages since 2019. The research also found a positive association between social vulnerability and outage metrics, with the association becoming stronger over the years under study. Areas with higher social vulnerability experienced more severe and frequent outages, exacerbating challenges in these regions. These findings provide much-needed empirical evidence for stakeholders to inform policy formulation and program development for enhancing the resilience of the U.S. power infrastructure.
- Published
- 2024
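Assuming each county's data is a chronological series of 15-minute customers-out snapshots, the three vulnerability dimensions named in the abstract above (frequency, duration, intensity) could be computed per county roughly as follows; the metric definitions here are illustrative, not the paper's exact ones:

```python
def outage_metrics(records, threshold=0):
    """records: chronologically ordered customers-out counts, one per
    15-minute snapshot for a single county.
    Returns (frequency, total_hours, peak_customers): the number of
    distinct outage spells, their combined duration in hours, and the
    worst single snapshot (a simple intensity proxy)."""
    freq, hours, peak, in_spell = 0, 0.0, 0, False
    for c in records:
        if c > threshold:
            if not in_spell:       # a new outage spell begins
                freq += 1
                in_spell = True
            hours += 0.25          # each snapshot covers 15 minutes
            peak = max(peak, c)
        else:
            in_spell = False
    return freq, hours, peak
```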
7. Dynamics of Meta-learning Representation in the Teacher-student Scenario
- Author
-
Wang, Hui, Yip, Cho Tung, and Li, Bo
- Subjects
Computer Science - Machine Learning ,Condensed Matter - Disordered Systems and Neural Networks - Abstract
Gradient-based meta-learning algorithms have gained popularity for their ability to train models on new tasks using limited data. Empirical observations indicate that such algorithms are able to learn a shared representation across tasks, which is regarded as a key factor in their success. However, in-depth theoretical understanding of the learning dynamics and the origin of the shared representation remains underdeveloped. In this work, we investigate the meta-learning dynamics of non-linear two-layer neural networks trained on streaming tasks in the teacher-student scenario. Through the lens of statistical physics analysis, we characterize the macroscopic behavior of the meta-training process, the formation of the shared representation, and the generalization ability of the model on new tasks. The analysis also points to the importance of the choice of certain hyper-parameters of the learning algorithms.
- Published
- 2024
8. Robust Principal Component Analysis via Discriminant Sample Weight Learning
- Author
-
Deng, Yingzhuo, Hu, Ke, Li, Bo, and Zhang, Yao
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Principal component analysis (PCA) is a classical feature extraction method, but it may be adversely affected by outliers, resulting in inaccurate learning of the projection matrix. This paper proposes a robust method to estimate both the data mean and the PCA projection matrix by learning discriminant sample weights from data containing outliers. Each sample in the dataset is assigned a weight, and the proposed algorithm iteratively learns the weights, the mean, and the projection matrix in turn. Specifically, when the mean and the projection matrix are available, via fine-grained analysis of outliers, a weight for each sample is learned hierarchically so that outliers have small weights while normal samples have large weights. With the learned weights available, a weighted optimization problem is solved to estimate both the data mean and the projection matrix. Because the learned weights discriminate outliers from normal samples, the adverse influence of outliers is mitigated due to the corresponding small weights. Experiments on toy data, a UCI dataset, and a face dataset demonstrate the effectiveness of the proposed method in estimating the mean and the projection matrix from data containing outliers.
- Published
- 2024
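The alternating scheme in the abstract above (weights, then mean and projection, then weights again) can be illustrated with a generic iteratively-reweighted PCA sketch. The weighting rule here, exponential damping of reconstruction residuals, is an assumption for illustration and is not the paper's hierarchical, fine-grained weight learning:

```python
import numpy as np

def weighted_pca(X, k, n_iter=10, beta=5.0):
    """Generic iteratively-reweighted PCA sketch (NOT the paper's
    exact algorithm). X: (n, d) data; k: number of components.
    beta controls how aggressively large-residual samples (likely
    outliers) are down-weighted."""
    n, d = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        wn = w / w.sum()
        mu = wn @ X                                  # weighted mean
        Xc = X - mu
        C = (Xc * wn[:, None]).T @ Xc                # weighted covariance
        _, vecs = np.linalg.eigh(C)                  # ascending eigenvalues
        P = vecs[:, -k:]                             # top-k projection (d, k)
        # per-sample reconstruction residual under the current subspace
        r = np.linalg.norm(Xc - Xc @ P @ P.T, axis=1)
        # small weights for large residuals, large weights for inliers
        w = np.exp(-beta * r / (r.mean() + 1e-12))
    return mu, P, w
```

On data lying near a line plus one far-off outlier, the outlier's weight collapses toward zero while inliers keep substantial weight, so the final mean and subspace are driven by the inliers.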
9. LLM-PBE: Assessing Data Privacy in Large Language Models
- Author
-
Li, Qinbin, Hong, Junyuan, Xie, Chulin, Tan, Jeffrey, Xin, Rachel, Hou, Junyi, Yin, Xavier, Wang, Zhun, Hendrycks, Dan, Wang, Zhangyang, Li, Bo, He, Bingsheng, and Song, Dawn
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence - Abstract
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue, no existing work has offered a comprehensive assessment of data privacy risks in LLMs. Addressing this gap, our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs. LLM-PBE is designed to analyze privacy across the entire lifecycle of LLMs, incorporating diverse attack and defense strategies, and handling various data types and metrics. Through detailed experimentation with multiple LLMs, LLM-PBE facilitates an in-depth exploration of data privacy concerns, shedding light on influential factors such as model size, data characteristics, and evolving temporal dimensions. This study not only enriches the understanding of privacy issues in LLMs but also serves as a vital resource for future research in the field. Aimed at enhancing the breadth of knowledge in this area, the findings, resources, and our full technical report are made available at https://llm-pbe.github.io/, providing an open platform for academic and practical advancements in LLM privacy assessment.
- Published
- 2024
10. Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge
- Author
-
Raju, Ravi, Jain, Swayambhoo, Li, Bo, Li, Jonathan, and Thakker, Urmish
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Large Language Models (LLMs) have revolutionized the landscape of machine learning, yet current benchmarks often fall short in capturing the diverse behavior of these models in real-world applications. A benchmark's usefulness is determined by its ability to clearly differentiate between models of varying capabilities (separability) and closely align with human preferences. Existing frameworks like Alpaca-Eval 2.0 LC \cite{dubois2024lengthcontrolledalpacaevalsimpleway} and Arena-Hard v0.1 \cite{li2024crowdsourced} are limited by their focus on general-purpose queries and lack of diversity across domains such as law, medicine, and multilingual contexts. In this paper, we address these limitations by introducing a novel data pipeline that curates diverse, domain-specific evaluation sets tailored for LLM-as-a-Judge frameworks. Our approach leverages a combination of manual curation, semi-supervised learning to generate clusters, and stratified sampling to ensure balanced representation across a wide range of domains and languages. The resulting evaluation set, which includes 1573 samples across 14 categories, demonstrates high separability (84\%) across ten top-ranked models, 84\% agreement with Chatbot Arena, and a 0.915 Spearman correlation. The agreement values are 9\% better than Arena Hard and 20\% better than AlpacaEval 2.0 LC, while the Spearman coefficient is 0.7 higher than that of the next best benchmark, showcasing a significant improvement in the usefulness of the benchmark. We further provide an open-source evaluation tool that enables fine-grained analysis of model performance across user-defined categories, offering valuable insights for practitioners. This work contributes to the ongoing effort to enhance the transparency, diversity, and effectiveness of LLM evaluation methodologies., Comment: 14 pages, 8 figures, Under review
- Published
- 2024
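The Spearman correlation quoted in the abstract above as an agreement statistic compares how two benchmarks rank the same set of models; a minimal implementation, assuming no tied scores (real benchmark comparisons would need tie handling, e.g. via `scipy.stats.spearmanr`), is:

```python
def spearman(scores_a, scores_b):
    """Spearman rank correlation between two benchmarks' scores for
    the same models, assuming no ties."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((a - b) ** 2 for a, b in zip(ra, rb))
    # classic closed form: rho = 1 - 6 * sum(d^2) / (n (n^2 - 1))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Identical rankings give 1.0 and fully reversed rankings give -1.0, so a value of 0.915 indicates two benchmarks that order models almost identically.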
11. Ultrabright-entanglement-based quantum key distribution over a 404-km-long optical fiber
- Author
-
Zhuang, Shi-Chang, Li, Bo, Zheng, Ming-Yang, Zeng, Yi-Xi, Wu, Hui-Nan, Li, Guang-Bing, Yao, Quan, Xie, Xiu-Ping, Li, Yu-Huai, Qin, Hao, You, Li-Xing, Xu, Fei-Hu, Yin, Juan, Cao, Yuan, Zhang, Qiang, Peng, Cheng-Zhi, and Pan, Jian-Wei
- Subjects
Quantum Physics - Abstract
Entangled photons are crucial resources for quantum communication and networking. Here, we present an ultra-bright polarization-entangled photon source based on a periodically poled lithium niobate waveguide designed for practical quantum communication networks. Using a 780 nm pump laser, the source achieves a pair generation rate of 2.4 $\times 10^{10}$ pairs/s/mW. With a 3.2 mW pump power, the source achieves a directly measured entangled-photon power of 17.9 nW. Based on this, we demonstrate the practicality of the source by conducting quantum key distribution experiments over long-distance fiber links, achieving applicable secure key rates of up to 440.80 bits/s over 200 km with 62 dB loss and reaching a maximum secure key generation distance of 404 km. These results demonstrate the potential of wavelength-multiplexed polarization-entangled photon sources for high-speed, long-distance quantum communication, positioning them as key components for future large-scale quantum networks., Comment: 18 pages, 6 figures
- Published
- 2024
12. InstantStyleGaussian: Efficient Art Style Transfer with 3D Gaussian Splatting
- Author
-
Yu, Xin-Yi, Yu, Jun-Xin, Zhou, Li-Bo, Wei, Yan, and Ou, Lin-Lin
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We present InstantStyleGaussian, an innovative 3D style transfer method based on the 3D Gaussian Splatting (3DGS) scene representation. By inputting a target-style image, it quickly generates new 3D GS scenes. Our method operates on pre-reconstructed GS scenes, combining diffusion models with an improved iterative dataset update strategy. It utilizes diffusion models to generate target style images, adds these new images to the training dataset, and uses this dataset to iteratively update and optimize the GS scenes, significantly accelerating the style editing process while ensuring the quality of the generated scenes. Extensive experimental results demonstrate that our method ensures high-quality stylized scenes while offering significant advantages in style transfer speed and consistency.
- Published
- 2024
13. LLaVA-OneVision: Easy Visual Task Transfer
- Author
-
Li, Bo, Zhang, Yuanhan, Guo, Dong, Zhang, Renrui, Li, Feng, Zhang, Hao, Zhang, Kaichen, Li, Yanwei, Liu, Ziwei, and Li, Chunyuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video scenarios. Importantly, the design of LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos., Comment: Project Homepage: https://llava-vl.github.io/blog/2024-08-05-llava-onevision/
- Published
- 2024
14. Single-photon interference over 8.4 km urban atmosphere: towards testing quantum effects in curved spacetime with photons
- Author
-
Wu, Hui-Nan, Li, Yu-Huai, Li, Bo, You, Xiang, Liu, Run-Ze, Ren, Ji-Gang, Yin, Juan, Lu, Chao-Yang, Cao, Yuan, Peng, Cheng-Zhi, and Pan, Jian-Wei
- Subjects
Quantum Physics ,General Relativity and Quantum Cosmology ,Physics - Optics - Abstract
The emergence of quantum mechanics and general relativity has transformed our understanding of the natural world significantly. However, integrating these two theories presents immense challenges, and their interplay remains untested. Recent theoretical studies suggest that single-photon interference covering huge distances can effectively probe the interface between quantum mechanics and general relativity. To address this, we developed an alternative design using unbalanced Michelson interferometers and validated its feasibility over an 8.4 km free-space channel. Using a high-brightness single-photon source based on quantum dots, we demonstrated single-photon interference along this long-distance baseline. We achieved a phase measurement precision of 16.2 mrad, which satisfied the measurement requirements for a gravitational redshift at the geosynchronous orbit by five times the standard deviation. Our results confirm the feasibility of the single-photon version of the Colella-Overhauser-Werner experiment for testing quantum effects in curved spacetime., Comment: 22 pages, 6 figures
- Published
- 2024
15. From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future
- Author
-
Jin, Haolin, Huang, Linghan, Cai, Haipeng, Yan, Jun, Li, Bo, and Chen, Huaming
- Subjects
Computer Science - Software Engineering ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
With the rise of large language models (LLMs), researchers are increasingly exploring their applications in various vertical domains, such as software engineering. LLMs have achieved remarkable success in areas including code generation and vulnerability detection. However, they also exhibit numerous limitations and shortcomings. LLM-based agents, a novel technology with the potential for Artificial General Intelligence (AGI), combine LLMs as the core for decision-making and action-taking, addressing some of the inherent limitations of LLMs such as lack of autonomy and self-improvement. Despite numerous studies and surveys exploring the possibility of using LLMs in software engineering, the literature lacks a clear distinction between LLMs and LLM-based agents, and a unified standard and benchmark for qualifying an LLM solution as an LLM-based agent in its domain is still in its early stages. In this survey, we broadly investigate the current practice and solutions for LLMs and LLM-based agents in software engineering. In particular, we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance. We review and differentiate the work on LLMs and LLM-based agents across these six topics, examining their differences and similarities in tasks, benchmarks, and evaluation metrics. Finally, we discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering. We anticipate this work will shed some light on pushing the boundaries of LLM-based agents in software engineering for future research.
- Published
- 2024
16. Tamper-Resistant Safeguards for Open-Weight LLMs
- Author
-
Tamirisa, Rishub, Bharathi, Bhrugu, Phan, Long, Zhou, Andy, Gatti, Alice, Suresh, Tarun, Lin, Maxwell, Wang, Justin, Wang, Rowan, Arel, Ron, Zou, Andy, Song, Dawn, Li, Bo, Hendrycks, Dan, and Mazeika, Mantas
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Rapid advances in the capabilities of large language models (LLMs) have raised widespread concerns regarding their potential for malicious use. Open-weight LLMs present unique challenges, as existing safeguards lack robustness to tampering attacks that modify model weights. For example, recent works have demonstrated that refusal and unlearning safeguards can be trivially removed with a few steps of fine-tuning. These vulnerabilities necessitate new approaches for enabling the safe release of open-weight LLMs. We develop a method, called TAR, for building tamper-resistant safeguards into open-weight LLMs such that adversaries cannot remove the safeguards even after thousands of steps of fine-tuning. In extensive evaluations and red teaming analyses, we find that our method greatly improves tamper-resistance while preserving benign capabilities. Our results demonstrate that tamper-resistance is a tractable problem, opening up a promising new avenue to improve the safety and security of open-weight LLMs., Comment: Website: https://www.tamper-resistant-safeguards.com
- Published
- 2024
17. Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2
- Author
-
Tang, Lv and Li, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The Segment Anything Model (SAM), introduced by Meta AI Research as a generic object segmentation model, quickly garnered widespread attention and significantly influenced the academic community. To extend its application to video, Meta further develops Segment Anything Model 2 (SAM2), a unified model capable of both video and image segmentation. SAM2 shows notable improvements over its predecessor in terms of applicable domains, promptable segmentation accuracy, and running speed. However, this report reveals a decline in SAM2's ability to perceive different objects in images without prompts in its auto mode, compared to SAM. Specifically, we employ the challenging task of camouflaged object detection to assess this performance decrease, hoping to inspire further exploration of the SAM model family by researchers. The results of this paper are provided in \url{https://github.com/luckybird1994/SAMCOD}.
- Published
- 2024
18. OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance
- Author
-
Yao, Yongqiang, Tan, Jingru, Hu, Jiahao, Zhang, Feizhao, Jin, Xin, Li, Bo, Gong, Ruihao, and Liu, Pengfei
- Subjects
Computer Science - Artificial Intelligence - Abstract
Recently, vision-language instruct-tuning models have made significant progress due to their more comprehensive understanding of the world. In this work, we discovered that large-scale 3D parallel training on those models leads to an imbalanced computation load across different devices. The vision and language parts are inherently heterogeneous: their data distribution and model architecture differ significantly, which affects distributed training efficiency. We rebalanced the computational loads from data, model, and memory perspectives to address this issue, achieving more balanced computation across devices. These three components are not independent but are closely connected, forming an omniverse balanced training framework. Specifically, for the data, we grouped instances into new balanced mini-batches within and across devices. For the model, we employed a search-based method to achieve a more balanced partitioning. For memory optimization, we adaptively adjusted the re-computation strategy for each partition to utilize the available memory fully. We conducted extensive experiments to validate the effectiveness of our method. Compared with the open-source training code of InternVL-Chat, we significantly reduced GPU days, achieving about 1.8x speed-up. Our method's efficacy and generalizability were further demonstrated across various models and datasets. Codes will be released at https://github.com/ModelTC/OmniBal.
- Published
- 2024
19. Improved Bounds for Pure Private Agnostic Learning: Item-Level and User-Level Privacy
- Author
-
Li, Bo, Wang, Wei, and Ye, Peng
- Subjects
Computer Science - Machine Learning - Abstract
Machine Learning has made remarkable progress in a wide range of fields. In many scenarios, learning is performed on datasets involving sensitive information, in which privacy protection is essential for learning algorithms. In this work, we study pure private learning in the agnostic model -- a framework reflecting the learning process in practice. We examine the number of users required under item-level (where each user contributes one example) and user-level (where each user contributes multiple examples) privacy and derive several improved upper bounds. For item-level privacy, our algorithm achieves a near optimal bound for general concept classes. We extend this to the user-level setting, rendering a tighter upper bound than the one proved by Ghazi et al. (2023). Lastly, we consider the problem of learning thresholds under user-level privacy and present an algorithm with a nearly tight user complexity.
- Published
- 2024
20. Observation of robust intrinsic C points generation with magneto-optical bound states in the continuum
- Author
-
Lv, Wenjing, Qin, Haoye, Su, Zengping, Zhang, Chengzhi, Huang, Jiongpeng, Shi, Yuzhi, Li, Bo, Genevet, Patrice, and Song, Qinghua
- Subjects
Physics - Optics - Abstract
C points, characterized by circular polarization in momentum space, play crucial roles in chiral wave manipulations. However, conventional approaches of achieving intrinsic C points using photonic crystals with broken symmetries suffer from low Q factor and are highly sensitive to structural geometry, rendering them fragile and susceptible to perturbations and disorders. In this letter, we report the realization of magneto-optical (MO) bound states in the continuum (BICs) using a symmetry-preserved planar photonic crystal, achieving intrinsic at-{\Gamma} C points that are robust against variation in structural geometry and external magnetic field. MO coupling between two dipole modes induces Zeeman splitting of the eigenfrequencies, leading to MO BICs and quasi-BICs with circular eigenstates for high-Q chiral responses. Furthermore, switchable C point handedness and circular dichroism are enabled by reversing the magnetic field. These findings unveil a new type of BICs with circular eigenstates and on-demand control of C points, paving the way for advanced chiral wave manipulation with enhanced light-matter interaction., Comment: 13 pages, 4 figures
- Published
- 2024
21. Balancing Complementarity and Consistency via Delayed Activation in Incomplete Multi-view Clustering
- Author
-
Li, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper studies a challenging issue in incomplete multi-view clustering, where valuable complementary information from other views is often ignored. Specifically, we propose a framework that effectively balances Complementarity and Consistency information in Incomplete Multi-view Clustering (CoCo-IMC): a dual network with delayed activation that achieves a balance of complementarity and consistency among different views. The delayed activation enriches the complementary information that would otherwise be ignored during consistency learning. We then recover the incomplete information and enhance consistency learning by minimizing the conditional entropy and maximizing the mutual information across different views. To our knowledge, this is the first theoretical attempt to incorporate delayed activation into incomplete data recovery and the balancing of complementarity and consistency. We demonstrate the effectiveness of CoCo-IMC in extensive comparative experiments against 12 state-of-the-art baselines on four publicly available datasets.
- Published
- 2024
22. SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging
- Author
-
Kong, Lingtong, Li, Bo, Xiong, Yike, Zhang, Hao, Gu, Hong, and Chen, Jinwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Multi-exposure High Dynamic Range (HDR) imaging is a challenging task when facing truncated texture and complex motion. Existing deep learning-based methods have achieved great success by either following the alignment and fusion pipeline or utilizing attention mechanisms. However, the large computation cost and inference delay hinder their deployment on resource-limited devices. In this paper, to achieve better efficiency, a novel Selective Alignment Fusion Network (SAFNet) for HDR imaging is proposed. After extracting pyramid features, it jointly refines valuable area masks and cross-exposure motion in selected regions with shared decoders, and then fuses a high-quality HDR image in an explicit way. This approach can focus the model on finding valuable regions while estimating their easily detectable and meaningful motion. For further detail enhancement, a lightweight refine module is introduced which enjoys privileges from previous optical flow, selection masks and initial prediction. Moreover, to facilitate learning on samples with large motion, a new window partition cropping method is presented during training. Experiments on public and newly developed challenging datasets show that the proposed SAFNet not only exceeds previous SOTA competitors quantitatively and qualitatively, but also runs an order of magnitude faster. Code and dataset are available at https://github.com/ltkong218/SAFNet., Comment: Accepted by ECCV 2024
- Published
- 2024
23. White matter tract crossing and bottleneck regions in the fetal brain
- Author
-
Calixto, Camilo, Soldatelli, Matheus D., Li, Bo, Pierotich, Lana, Gholipour, Ali, Warfield, Simon K., and Karimi, Davood
- Subjects
Quantitative Biology - Neurons and Cognition - Abstract
There is a growing interest in using diffusion MRI to study the white matter tracts and structural connectivity of the fetal brain. Recent progress in data acquisition and processing suggests that this imaging modality has a unique role in elucidating the normal and abnormal patterns of neurodevelopment in utero. However, there have been no efforts to quantify the prevalence of crossing tracts and bottleneck regions, important issues that have been extensively researched for adult brains. In this work, we determined the brain regions with crossing tracts and bottlenecks between 23 and 36 gestational weeks. We performed probabilistic tractography on 59 fetal brain scans and extracted a set of 51 distinct white tracts, which we grouped into 10 major tract bundle groups. We analyzed the results to determine the patterns of tract crossings and bottlenecks. Our results showed that 20-25% of the white matter voxels included two or three crossing tracts. Bottlenecks were more prevalent. Between 75-80% of the voxels were characterized as bottlenecks, with more than 40% of the voxels involving four or more tracts. The results of this study highlight the challenge of fetal brain tractography and structural connectivity assessment and call for innovative image acquisition and analysis methods to mitigate these problems.
- Published
- 2024
24. Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
- Author
-
Liu, Jinfeng, Kong, Lingtong, Li, Bo, Wang, Zerong, Gu, Hong, and Chen, Jinwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Self-supervised monocular depth estimation has gathered notable interest since it can liberate training from dependency on depth annotations. In the monocular video training case, recent methods only conduct view synthesis between existing camera views, leading to insufficient guidance. To tackle this, we synthesize more virtual camera views by flow-based video frame interpolation (VFI), termed temporal augmentation. For multi-frame inference, to sidestep the problem of dynamic objects encountered by explicit geometry-based methods like ManyDepth, we return to the feature fusion paradigm and design a VFI-assisted multi-frame fusion module to align and aggregate multi-frame features, using motion and occlusion information obtained by the flow-based VFI model. Finally, we construct a unified self-supervised learning framework, named Mono-ViFI, to bilaterally connect single- and multi-frame depth. In this framework, spatial data augmentation through image affine transformation is incorporated for data diversity, along with a triplet depth consistency loss for regularization. The single- and multi-frame models can share weights, making our framework compact and memory-efficient. Extensive experiments demonstrate that our method can bring significant improvements to current advanced architectures. Source code is available at https://github.com/LiuJF1226/Mono-ViFI., Comment: 27 pages, accepted by ECCV 2024
- Published
- 2024
25. Combining Climate Models using Bayesian Regression Trees and Random Paths
- Author
-
Yannotty, John C., Santner, Thomas J., Li, Bo, and Pratola, Matthew T.
- Subjects
Statistics - Methodology - Abstract
Climate models, also known as general circulation models (GCMs), are essential tools for climate studies. Each climate model may have varying accuracy across the input domain, but no single model is uniformly better than the others. One strategy for improving climate model prediction performance is to integrate multiple model outputs using input-dependent weights. Along with this concept, weight functions modeled using Bayesian Additive Regression Trees (BART) were recently shown to be useful for integrating multiple Effective Field Theories in nuclear physics applications. However, a restriction of this approach is that the weights could only be modeled as piecewise constant functions. To smoothly integrate multiple climate models, we propose a new tree-based model, Random Path BART (RPBART), that incorporates random path assignments into the BART model to produce smooth weight functions and smooth predictions of the physical system, all in a matrix-free formulation. The smoothness feature of RPBART requires a more complex prior specification, for which we introduce a semivariogram to guide its hyperparameter selection. This approach is easy to interpret, computationally cheap, and avoids an expensive cross-validation study. Finally, we propose a posterior projection technique to enable detailed analysis of the fitted posterior weight functions. This allows us to identify a sparse set of climate models that can largely recover the underlying system within a given spatial region, as well as to quantify model discrepancy within the model set under consideration. Our method is demonstrated on an ensemble of 8 GCMs modeling the average monthly surface temperature., Comment: 52 pages, 18 figures
- Published
- 2024
26. AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
- Author
-
Chen, Zhaorun, Xiang, Zhen, Xiao, Chaowei, Song, Dawn, and Li, Bo
- Subjects
Computer Science - Machine Learning ,Computer Science - Cryptography and Security ,Computer Science - Information Retrieval - Abstract
LLM agents have demonstrated remarkable performance across various applications, primarily due to their advanced capabilities in reasoning, utilizing external knowledge and tools, calling APIs, and executing actions to interact with environments. Current agents typically utilize a memory module or a retrieval-augmented generation (RAG) mechanism, retrieving past knowledge and instances with similar embeddings from knowledge bases to inform task planning and execution. However, the reliance on unverified knowledge bases raises significant concerns about their safety and trustworthiness. To uncover such vulnerabilities, we propose a novel red teaming approach AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base. In particular, we formulate the trigger generation process as a constrained optimization that optimizes backdoor triggers by mapping the triggered instances to a unique embedding space, so as to ensure that whenever a user instruction contains the optimized backdoor trigger, the malicious demonstrations are retrieved from the poisoned memory or knowledge base with high probability. In the meantime, benign instructions without the trigger will still maintain normal performance. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning, and the optimized backdoor trigger exhibits superior transferability, in-context coherence, and stealthiness. Extensive experiments demonstrate AgentPoison's effectiveness in attacking three types of real-world LLM agents: RAG-based autonomous driving agent, knowledge-intensive QA agent, and healthcare EHRAgent. On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance (less than 1%) with a poison rate less than 0.1%., Comment: 22 pages, 13 figures, 7 tables
- Published
- 2024
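The retrieval-poisoning mechanism the AgentPoison abstract describes can be sketched with a toy 2-D embedding space (hypothetical throughout — the demo texts, vectors, and trigger offset are illustrative, and the real attack optimizes the trigger rather than assuming its embedding effect):

```python
import numpy as np

def retrieve(query_vec, memory, k=1):
    """Return the k memory entries most cosine-similar to the query
    (a toy version of the RAG retrieval step an agent performs)."""
    sims = []
    for text, vec in memory:
        sim = vec @ query_vec / (np.linalg.norm(vec) * np.linalg.norm(query_vec))
        sims.append((sim, text))
    sims.sort(reverse=True)
    return [t for _, t in sims[:k]]

# Hypothetical memory: benign demos cluster near (1, 0); a poisoned
# demonstration is planted in a distinct region near (0, 1).
memory = [
    ("benign demo A", np.array([1.0, 0.1])),
    ("benign demo B", np.array([0.9, 0.0])),
    ("malicious demo", np.array([0.0, 1.0])),
]

benign_query = np.array([1.0, 0.05])     # ordinary instruction
trigger_offset = np.array([-1.0, 2.0])   # assumed effect of the trigger
triggered_query = benign_query + trigger_offset

benign_hit = retrieve(benign_query, memory)[0]
triggered_hit = retrieve(triggered_query, memory)[0]
```

The point is that benign queries keep retrieving benign demonstrations, while the trigger shifts the query into the poisoned region, so stealth and attack success can coexist.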
27. LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
- Author
-
Zhang, Kaichen, Li, Bo, Zhang, Peiyuan, Pu, Fanyi, Cahyono, Joshua Adrian, Hu, Kairui, Liu, Shuai, Zhang, Yuanhan, Yang, Jingkang, Li, Chunyuan, and Liu, Ziwei
- Subjects
Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
The advances of large foundation models necessitate wide-coverage, low-cost, and zero-contamination benchmarks. Despite continuous exploration of language model evaluations, comprehensive studies on the evaluation of Large Multi-modal Models (LMMs) remain limited. In this work, we introduce LMMS-EVAL, a unified and standardized multimodal benchmark framework with over 50 tasks and more than 10 models to promote transparent and reproducible evaluations. Although LMMS-EVAL offers comprehensive coverage, we find it still falls short in achieving low cost and zero contamination. To approach this evaluation trilemma, we further introduce LMMS-EVAL LITE, a pruned evaluation toolkit that emphasizes both coverage and efficiency. Additionally, we present Multimodal LIVEBENCH that utilizes continuously updating news and online forums to assess models' generalization abilities in the wild, featuring a low-cost and zero-contamination evaluation approach. In summary, our work highlights the importance of considering the evaluation trilemma and provides practical solutions to navigate the trade-offs in evaluating large multi-modal models, paving the way for more effective and reliable benchmarking of LMMs. We open-source our codebase and maintain the leaderboard of LIVEBENCH at https://github.com/EvolvingLMMs-Lab/lmms-eval and https://huggingface.co/spaces/lmms-lab/LiveBench., Comment: Code and leaderboard are available at https://github.com/EvolvingLMMs-Lab/lmms-eval and https://huggingface.co/spaces/lmms-lab/LiveBench
- Published
- 2024
28. BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
- Author
-
Lin, Haohong, Ding, Wenhao, Chen, Jian, Shi, Laixi, Zhu, Jiacheng, Li, Bo, and Zhao, Ding
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, its performance often suffers from the objective mismatch between model and policy learning, resulting in inferior performance despite accurate model predictions. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data for MBRL. Subsequently, we introduce \textbf{B}ilin\textbf{E}ar \textbf{CAUS}al r\textbf{E}presentation~(BECAUSE), an algorithm to capture causal representation for both states and actions to reduce the influence of the distribution shift, thus mitigating the objective mismatch problem. Comprehensive evaluations on 18 tasks that vary in data quality and environment context demonstrate the superior performance of BECAUSE over existing offline RL algorithms. We show the generalizability and robustness of BECAUSE under fewer samples or larger numbers of confounders. Additionally, we offer theoretical analysis of BECAUSE to prove its error bound and sample efficiency when integrating causal representation into offline MBRL.
- Published
- 2024
29. AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies
- Author
-
Zeng, Yi, Yang, Yu, Zhou, Andy, Tan, Jeffrey Ziwei, Tu, Yuheng, Mai, Yifan, Klyman, Kevin, Pan, Minzhou, Jia, Ruoxi, Song, Dawn, Liang, Percy, and Li, Bo
- Subjects
Computer Science - Computers and Society ,Computer Science - Artificial Intelligence - Abstract
Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in recent regulations and policies, which makes it challenging to evaluate and compare FMs across these benchmarks. To bridge this gap, we introduce AIR-Bench 2024, the first AI safety benchmark aligned with emerging government regulations and company policies, following the regulation-based safety categories grounded in our AI risks study, AIR 2024. AIR 2024 decomposes 8 government regulations and 16 company policies into a four-tiered safety taxonomy with 314 granular risk categories in the lowest tier. AIR-Bench 2024 contains 5,694 diverse prompts spanning these categories, with manual curation and human auditing to ensure quality. We evaluate leading language models on AIR-Bench 2024, uncovering insights into their alignment with specified safety concerns. By bridging the gap between public benchmarks and practical AI risks, AIR-Bench 2024 provides a foundation for assessing model safety across jurisdictions, fostering the development of safer and more responsible AI systems.
- Published
- 2024
30. Chiral dynamics in soft triaxial nuclei
- Author
-
Li, Bo, Zhao, Pengwei, and Meng, Jie
- Subjects
Nuclear Theory - Abstract
The chirality in soft triaxial nuclei is investigated for the first time with the time-dependent and tilted axis cranking covariant density functional theories on a three-dimensional space lattice in a microscopic and self-consistent way. Taking the puzzling chiral nucleus $^{106}$Ag as an example, the experimental energies of the observed nearly degenerate bands are well reproduced without any free parameters beyond the well-defined density functional. A novel chiral mode in soft triaxial nuclei is newly revealed from the microscopic dynamics of the total angular momentum. This opens a new research area for the study of chirality, particularly in relation to soft nuclear shapes., Comment: 6 pages, 4 figures. arXiv admin note: text overlap with arXiv:2202.03043
- Published
- 2024
31. LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
- Author
-
Li, Feng, Zhang, Renrui, Zhang, Hao, Zhang, Yuanhan, Li, Bo, Li, Wei, Ma, Zejun, and Li, Chunyuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Visual instruction tuning has made considerable strides in enhancing the capabilities of Large Multimodal Models (LMMs). However, existing open LMMs largely focus on single-image tasks, and their applications to multi-image scenarios remain less explored. Additionally, prior LMM research tackles different scenarios separately, making it impossible to generalize across scenarios with new emerging capabilities. To this end, we introduce LLaVA-NeXT-Interleave, which simultaneously tackles Multi-image, Multi-frame (video), Multi-view (3D), and Multi-patch (single-image) scenarios in LMMs. To enable these capabilities, we regard the interleaved data format as a general template and compile the M4-Instruct dataset with 1,177.6k samples, spanning 4 primary domains with 14 tasks and 41 datasets. We also curate the LLaVA-Interleave Bench to comprehensively evaluate the multi-image performance of LMMs. Through extensive experiments, LLaVA-NeXT-Interleave achieves leading results in multi-image, video, and 3D benchmarks, while maintaining the performance of single-image tasks. Besides, our model also exhibits several emerging capabilities, e.g., transferring tasks across different settings and modalities. Code is available at https://github.com/LLaVA-VL/LLaVA-NeXT, Comment: Project Page: https://llava-vl.github.io/blog/2024-06-16-llava-next-interleave/
- Published
- 2024
32. $R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
- Author
-
Kang, Mintong and Li, Bo
- Subjects
Computer Science - Artificial Intelligence - Abstract
As LLMs become increasingly prevalent across various applications, it is critical to establish safety guardrails to moderate input/output content of LLMs. Existing guardrail models treat various safety categories independently and fail to explicitly capture the intercorrelations among them. This has led to limitations such as ineffectiveness due to inadequate training on long-tail data from correlated safety categories, susceptibility to jailbreaking attacks, and inflexibility regarding new safety categories. To address these limitations, we propose $R^2$-Guard, a robust reasoning enabled LLM guardrail via knowledge-enhanced logical reasoning. Specifically, $R^2$-Guard comprises two parts: data-driven category-specific learning and reasoning components. The data-driven guardrail models provide unsafety probabilities of moderated content on different safety categories. We then encode safety knowledge among different categories as first-order logical rules and embed them into a probabilistic graphical model (PGM) based reasoning component. The unsafety probabilities of different categories from data-driven guardrail models are sent to the reasoning component for final inference. We employ two types of PGMs: Markov logic networks (MLNs) and probabilistic circuits (PCs), and optimize PCs to achieve precision-efficiency balance via improved graph structure. To further perform stress tests for guardrail models, we employ a pairwise construction method to construct a new safety benchmark TwinSafety, which features principled categories. We demonstrate the effectiveness of $R^2$-Guard by comparisons with eight strong guardrail models on six safety benchmarks, and demonstrate the robustness of $R^2$-Guard against four SOTA jailbreaking attacks. $R^2$-Guard significantly surpasses SOTA method LlamaGuard by 30.2% on ToxicChat and by 59.5% against jailbreaking attacks.
- Published
- 2024
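The idea of combining per-category unsafety probabilities with first-order rules in a PGM can be sketched with a tiny Markov-logic-style model (a toy illustration, not the $R^2$-Guard implementation — the categories, rule, and weight below are hypothetical, and real MLN/PC inference does not enumerate worlds):

```python
from itertools import product
import math

def reason(unsafe_probs, rules, weight=2.0):
    """Combine independent per-category unsafety probabilities with
    weighted logical rules by enumerating all binary label worlds.

    unsafe_probs: dict category -> data-driven unsafety probability.
    rules: list of (premise, conclusion) pairs, read premise -> conclusion.
    Returns posterior marginals P(category = unsafe)."""
    cats = list(unsafe_probs)
    worlds = []
    for assignment in product([0, 1], repeat=len(cats)):
        world = dict(zip(cats, assignment))
        # likelihood under the independent data-driven guardrail models
        p = 1.0
        for c in cats:
            p *= unsafe_probs[c] if world[c] else 1 - unsafe_probs[c]
        # up-weight worlds that satisfy each rule (MLN-style soft logic)
        for premise, conclusion in rules:
            if (not world[premise]) or world[conclusion]:
                p *= math.exp(weight)
        worlds.append((world, p))
    z = sum(p for _, p in worlds)
    return {c: sum(p for w, p in worlds if w[c]) / z for c in cats}

probs = {"weapon": 0.9, "violence": 0.3}
post = reason(probs, rules=[("weapon", "violence")])
```

Here the rule "weapon implies violence" pulls the violence marginal above its independent estimate of 0.3, which is exactly the kind of cross-category correlation that independent guardrail models miss.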
33. Role of membrane lipid hydrolysis genes in the aroma formation of Chinese white pear ‘Xiang Mian Li’
- Author
-
Yi, Xingkai, Gao, Zhenghui, Zhang, Jinyun, Zhang, Xiaoling, Pan, Haifa, Qi, Yongjie, Qin, Gaihua, Liu, Chunyan, Chen, Zhengfeng, Li, Bo, and Xu, Yiliu
- Published
- 2020
- Full Text
- View/download PDF
34. Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness
- Author
-
Li, Yiquan, Chen, Zhongzhu, Jin, Kun, Wang, Jiongxiao, Li, Bo, and Xiao, Chaowei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Diffusion Purification, purifying noised images with diffusion models, has been widely used for enhancing certified robustness via randomized smoothing. However, existing frameworks often grapple with the balance between efficiency and effectiveness. While the Denoising Diffusion Probabilistic Model (DDPM) offers an efficient single-step purification, it falls short in ensuring purified images reside on the data manifold. Conversely, the Stochastic Diffusion Model effectively places purified images on the data manifold but demands solving cumbersome stochastic differential equations, while its derivative, the Probability Flow Ordinary Differential Equation (PF-ODE), though solving simpler ordinary differential equations, still requires multiple computational steps. In this work, we argue that an ideal purification pipeline should, in a single step (for efficiency), generate purified images that lie on the data manifold and are semantically aligned with the original images (for effectiveness). Therefore, we introduce Consistency Purification, a purifier that is Pareto-superior to previous work in the efficiency-effectiveness trade-off. Consistency Purification employs the consistency model, a one-step generative model distilled from the PF-ODE, and thus can generate on-manifold purified images with a single network evaluation. However, the consistency model is not designed for purification, so it does not inherently ensure semantic alignment between purified and original images. To resolve this issue, we further refine it through Consistency Fine-tuning with an LPIPS loss, which enables more aligned semantic meaning while keeping the purified images on the data manifold. Our comprehensive experiments demonstrate that our Consistency Purification framework achieves state-of-the-art certified robustness and efficiency compared to baseline methods.
- Published
- 2024
35. Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules
- Author
-
Pan, Xinglin, Lin, Wenxiang, Shi, Shaohuai, Chu, Xiaowen, Sun, Weinong, and Li, Bo
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Machine Learning - Abstract
Sparsely-activated Mixture-of-Expert (MoE) layers have found practical applications in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands. Despite the wide adoption of hybrid parallel paradigms like model parallelism, expert parallelism, and expert-sharding parallelism (i.e., MP+EP+ESP) to support MoE model training on GPU clusters, the training efficiency is hindered by communication costs introduced by these parallel paradigms. To address this limitation, we propose Parm, a system that accelerates MP+EP+ESP training by designing two dedicated schedules for placing communication tasks. The proposed schedules eliminate redundant computations and communications and enable overlaps between intra-node and inter-node communications, ultimately reducing the overall training time. As the two schedules are not mutually exclusive, we provide comprehensive theoretical analyses and derive an automatic and accurate solution to determine which schedule should be applied in different scenarios. Experimental results on an 8-GPU server and a 32-GPU cluster demonstrate that Parm outperforms the state-of-the-art MoE training system, DeepSpeed-MoE, achieving 1.13$\times$ to 5.77$\times$ speedup on 1296 manually configured MoE layers and approximately 3$\times$ improvement on two real-world MoE models based on BERT and GPT-2.
- Published
- 2024
36. AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies
- Author
-
Zeng, Yi, Klyman, Kevin, Zhou, Andy, Yang, Yu, Pan, Minzhou, Jia, Ruoxi, Song, Dawn, Liang, Percy, and Li, Bo
- Subjects
Computer Science - Computers and Society ,Computer Science - Artificial Intelligence - Abstract
We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation. We identify 314 unique risk categories organized into a four-tiered taxonomy. At the highest level, this taxonomy encompasses System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks. The taxonomy establishes connections between various descriptions and approaches to risk, highlighting the overlaps and discrepancies between public and private sector conceptions of risk. By providing this unified framework, we aim to advance AI safety through information sharing across sectors and the promotion of best practices in risk mitigation for generative AI models and systems.
- Published
- 2024
37. Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
- Author
-
Liu, Deyuan, Qin, Zhanyue, Wang, Hairu, Yang, Zhao, Wang, Zecheng, Rong, Fangying, Liu, Qingbin, Hao, Yanchao, Chen, Xi, Fan, Cunhang, Lv, Zhao, Tu, Zhiying, Chu, Dianhui, Li, Bo, and Sui, Dianbo
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach that uses manifold learning and the Normalized Pairwise Information Bottleneck (NPIB) measure to merge similar layers, reducing model size while preserving essential performance. We evaluate MKA on multiple benchmark datasets and various LLMs. Our findings show that MKA not only preserves model performance but also achieves substantial compression ratios, outperforming traditional pruning methods. Moreover, when coupled with quantization, MKA delivers even greater compression. Specifically, on the MMLU dataset using the Llama3-8B model, MKA achieves a compression ratio of 43.75% with a minimal performance decrease of only 2.82%. The proposed MKA method offers a resource-efficient and performance-preserving model compression technique for LLMs.
- Published
- 2024
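The layer-merging idea behind MKA can be sketched in a few lines (a toy illustration, not the paper's method — it substitutes plain cosine similarity of activations for the manifold/NPIB measure, and simple weight averaging for the alignment-based merge):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_similarity(h_a, h_b):
    """Cosine similarity between flattened layer activations; a
    stand-in for the paper's manifold/NPIB similarity measure."""
    a, b = h_a.ravel(), h_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def merge_similar_layers(weights, activations, threshold=0.95):
    """Greedily merge adjacent layers whose activations are nearly
    identical, replacing both weight matrices with their average."""
    merged = [weights[0]]
    for i in range(1, len(weights)):
        if layer_similarity(activations[i - 1], activations[i]) > threshold:
            merged[-1] = (merged[-1] + weights[i]) / 2  # fold into previous
        else:
            merged.append(weights[i])
    return merged

# Three hypothetical layers; layers 1 and 2 behave almost identically.
w = [rng.standard_normal((4, 4)) for _ in range(3)]
h_shared = rng.standard_normal((8, 4))
acts = [rng.standard_normal((8, 4)), h_shared, h_shared + 1e-3]
compressed = merge_similar_layers(w, acts)
```

The compressed model keeps layer 0 and a single merged replacement for layers 1 and 2, which is the sense in which merging "utilizes the knowledge from pruned parameters" rather than discarding them.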
38. BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
- Author
-
Zeng, Yi, Sun, Weiyu, Huynh, Tran Ngoc, Song, Dawn, Li, Bo, and Jia, Ruoxi
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence - Abstract
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions. The high dimensionality of potential triggers in the token space and the diverse range of malicious behaviors make this a critical challenge. We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space. Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations. Experiments show BEEAR reduces the success rate of RLHF time backdoor attacks from >95% to <1% and from 47% to 0% for instruction-tuning time backdoors targeting malicious code generation, without compromising model utility. Requiring only defender-defined safe and unwanted behaviors, BEEAR represents a step towards practical defenses against safety backdoors in LLMs, providing a foundation for further advancements in AI safety and security.
- Published
- 2024
39. Long Context Transfer from Language to Vision
- Author
-
Zhang, Peiyuan, Zhang, Kaichen, Li, Bo, Zeng, Guangtao, Yang, Jingkang, Zhang, Yuanhan, Wang, Ziyue, Tan, Haoran, Li, Chunyuan, and Liu, Ziwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively, in this paper, we approach this problem from the perspective of the language model. By simply extrapolating the context length of the language backbone, we enable LMMs to comprehend orders of magnitude more visual tokens without any video training. We call this phenomenon long context transfer and carefully ablate its properties. To effectively measure LMMs' ability to generalize to long contexts in the vision modality, we develop V-NIAH (Visual Needle-In-A-Haystack), a purely synthetic long vision benchmark inspired by the language model's NIAH test. Our proposed Long Video Assistant (LongVA) can process 2000 frames or over 200K visual tokens without additional complexities. With its extended context length, LongVA achieves state-of-the-art performance on Video-MME among 7B-scale models by densely sampling more input frames. Our work is open-sourced at https://github.com/EvolvingLMMs-Lab/LongVA., Comment: Code, demo, and models are available at https://github.com/EvolvingLMMs-Lab/LongVA
- Published
- 2024
40. SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
- Author
-
Xie, Tinghao, Qi, Xiangyu, Zeng, Yi, Huang, Yangsibo, Sehwag, Udari Madhushani, Huang, Kaixuan, He, Luxi, Wei, Boyi, Li, Dacheng, Sheng, Ying, Jia, Ruoxi, Li, Bo, Li, Kai, Chen, Danqi, Henderson, Peter, and Mittal, Prateek
- Subjects
Computer Science - Artificial Intelligence - Abstract
Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face three limitations that we address with SORRY-Bench, our proposed benchmark. First, existing methods often use coarse-grained taxonomies of unsafe topics, and over-represent some fine-grained topics. For example, among the ten existing datasets that we evaluated, tests for refusals of self-harm instructions are over 3x less represented than tests for fraudulent activities. SORRY-Bench improves on this by using a fine-grained taxonomy of 45 potentially unsafe topics, and 450 class-balanced unsafe instructions, compiled through human-in-the-loop methods. Second, linguistic characteristics and formatting of prompts are often overlooked, like different languages, dialects, and more -- which are only implicitly considered in many evaluations. We supplement SORRY-Bench with 20 diverse linguistic augmentations to systematically examine these effects. Third, existing evaluations rely on large LLMs (e.g., GPT-4) for evaluation, which can be computationally expensive. We investigate design choices for creating a fast, accurate automated safety evaluator. By collecting 7K+ human annotations and conducting a meta-evaluation of diverse LLM-as-a-judge designs, we show that fine-tuned 7B LLMs can achieve accuracy comparable to GPT-4 scale LLMs, with lower computational cost. Putting these together, we evaluate over 40 proprietary and open-source LLMs on SORRY-Bench, analyzing their distinctive refusal behaviors. We hope our effort provides a building block for systematic evaluations of LLMs' safety refusal capabilities, in a balanced, granular, and efficient manner.
- Published
- 2024
41. GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
- Author
-
Xiang, Zhen, Zheng, Linzhi, Li, Yanjie, Hong, Junyuan, Li, Qinbin, Xie, Han, Zhang, Jiawei, Xiong, Zidi, Xie, Chulin, Yang, Carl, Song, Dawn, and Li, Bo
- Subjects
Computer Science - Machine Learning - Abstract
The rapid advancement of large language models (LLMs) has catalyzed the deployment of LLM-powered agents across numerous applications, raising new concerns regarding their safety and trustworthiness. Existing methods for enhancing the safety of LLMs are not directly transferable to LLM-powered agents due to their diverse objectives and output modalities. In this paper, we propose GuardAgent, the first LLM agent as a guardrail to other LLM agents. Specifically, GuardAgent oversees a target LLM agent by checking whether its inputs/outputs satisfy a set of given guard requests defined by the users. GuardAgent comprises two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing the code by calling APIs or using external engines. In both steps, an LLM is utilized as the core reasoning component, supplemented by in-context demonstrations retrieved from a memory module. Such knowledge-enabled reasoning allows GuardAgent to understand various textual guard requests and accurately "translate" them into executable code that provides reliable guardrails. Furthermore, GuardAgent is equipped with an extendable toolbox containing functions and APIs and requires no additional LLM training, which underscores its generalization capabilities and low operational overhead. Additionally, we propose two novel benchmarks: an EICU-AC benchmark for assessing privacy-related access control for healthcare agents and a Mind2Web-SC benchmark for safety evaluation for web agents. We show the effectiveness of GuardAgent on these two benchmarks with 98.7% and 90.0% accuracy in moderating invalid inputs and outputs for the two types of agents, respectively. We also show that GuardAgent is able to define novel functions in adaptation to emergent LLM agents and guard requests, which underscores its strong generalization capabilities.
- Published
- 2024
42. Where to place a mosquito trap for West Nile Virus surveillance?
- Author
-
Chakravarti, Anwesha, Li, Bo, Bartlett, Dan, Irwin, Patrick, and Smith, Rebecca
- Subjects
Statistics - Applications - Abstract
The rapid spread of West Nile Virus (WNV) is a growing concern. With no vaccines or specific medications available, prevention through mosquito control is the only solution to curb the spread. Mosquito traps, used to detect viral presence in mosquito populations, are essential tools for WNV surveillance. But how do we decide where to place a mosquito trap? And what makes a good trap location, anyway? We present a robust statistical approach to determine a mosquito trap's ability to predict human WNV cases in the Chicago metropolitan area and its suburbs. We then use this value to detect the landscape, demographic, and socioeconomic factors associated with a mosquito trap's predictive ability. This approach enables resource-limited mosquito control programs to identify better trap locations while reducing trap numbers to increase trap-based surveillance efficiency. The approach can also be applied to a wide range of different environmental surveillance programs., Comment: 22 pages, 9 figures
- Published
- 2024
43. Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing
- Author
-
Shu, Youwei, Xiao, Xi, Wang, Derui, Cao, Yuxin, Chen, Siji, Xue, Jason, Li, Linyi, and Li, Bo
- Subjects
Computer Science - Machine Learning - Abstract
Randomized Smoothing (RS) is currently a scalable certified defense method providing robustness certification against adversarial examples. Although significant progress has been achieved in providing defenses against $\ell_p$ adversaries, the interaction between the smoothing distribution and the robustness certification still remains vague. In this work, we comprehensively study the effect of two families of distributions, named Exponential Standard Gaussian (ESG) and Exponential General Gaussian (EGG) distributions, on Randomized Smoothing and Double Sampling Randomized Smoothing (DSRS). We derive an analytic formula for ESG's certified radius, which converges to the original formula of RS as the dimension $d$ increases. Additionally, we prove that EGG can provide tighter constant factors than DSRS in providing $\Omega(\sqrt{d})$ lower bounds of $\ell_2$ certified radius, and thus further addresses the curse of dimensionality in RS. Our experiments on real-world datasets confirm our theoretical analysis of the ESG distributions, that they provide almost the same certification under different exponents $\eta$ for both RS and DSRS. In addition, EGG brings a significant improvement to the DSRS certification, but the mechanism can be different when the classifier properties are different. Compared to the primitive DSRS, the increase in certified accuracy provided by EGG is prominent, up to 6.4% on ImageNet., Comment: ICML 2024 Poster
- Published
- 2024
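For context, the "original formula of RS" that the ESG radius converges to is the standard Gaussian certificate of Cohen et al.: with $p_A$ a lower confidence bound on the top-class probability under noise (and $p_B = 1 - p_A$), the $\ell_2$ radius simplifies to $R = \sigma\,\Phi^{-1}(p_A)$. A minimal sketch of that baseline (not the paper's ESG/EGG generalization):

```python
from statistics import NormalDist

def certified_radius(p_a, sigma):
    """l2 certified radius of standard Gaussian randomized smoothing:
    R = sigma * Phi^{-1}(p_A), valid when p_A > 1/2 (with the runner-up
    probability bounded by 1 - p_A)."""
    if p_a <= 0.5:
        return 0.0  # no certificate when the top class is not a majority
    return sigma * NormalDist().inv_cdf(p_a)

# With sigma = 0.5 and p_A = 0.99 the certified radius is about 1.16.
r = certified_radius(0.99, 0.5)
```

The ESG/EGG work studies how replacing the Gaussian with exponential-family smoothing distributions changes this radius, especially its dependence on the input dimension $d$.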
44. Certifiably Byzantine-Robust Federated Conformal Prediction
- Author
-
Kang, Mintong, Lin, Zhen, Sun, Jimeng, Xiao, Cao, and Li, Bo
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Conformal prediction has shown impressive capacity in constructing statistically rigorous prediction sets for machine learning models with exchangeable data samples. The siloed datasets, coupled with the escalating privacy concerns related to local data sharing, have inspired recent innovations extending conformal prediction into federated environments with distributed data samples. However, this framework for distributed uncertainty quantification is susceptible to Byzantine failures. A minor subset of malicious clients can significantly compromise the practicality of coverage guarantees. To address this vulnerability, we introduce a novel framework Rob-FCP, which executes robust federated conformal prediction, effectively countering malicious clients capable of reporting arbitrary statistics with the conformal calibration process. We theoretically provide the conformal coverage bound of Rob-FCP in the Byzantine setting and show that the coverage of Rob-FCP is asymptotically close to the desired coverage level. We also propose a malicious client number estimator to tackle a more challenging setting where the number of malicious clients is unknown to the defender and theoretically show its effectiveness. We empirically demonstrate the robustness of Rob-FCP against diverse proportions of malicious clients under a variety of Byzantine attacks on five standard benchmark and real-world healthcare datasets., Comment: Accepted to ICML 2024
- Published
- 2024
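The calibration step that Byzantine clients can corrupt is ordinary split conformal prediction. A minimal single-client sketch (toy scores and the 1 − p nonconformity choice are illustrative; Rob-FCP's contribution is robustly aggregating such statistics across clients, which this sketch does not show):

```python
import numpy as np

def conformal_quantile(cal_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration nonconformity score. Statistics of this kind are what
    clients report in federated conformal prediction, so an arbitrary
    (Byzantine) report can break the coverage guarantee."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return float(np.sort(cal_scores)[k - 1])

def prediction_set(softmax_probs, qhat):
    """Include every label whose nonconformity score 1 - p is within
    the calibrated threshold."""
    return [i for i, p in enumerate(softmax_probs) if 1 - p <= qhat]

rng = np.random.default_rng(1)
cal = rng.uniform(0, 0.5, size=99)  # toy scores for 99 calibration points
qhat = conformal_quantile(cal, alpha=0.1)
labels = prediction_set([0.7, 0.2, 0.1], qhat)
```

Under exchangeability this construction covers the true label with probability at least 1 − α, which is exactly the guarantee a malicious calibration report can void.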
45. ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning
- Author
-
Xu, Zhangchen, Jiang, Fengqing, Niu, Luyao, Jia, Jinyuan, Li, Bo, and Poovendran, Radha
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
In Federated Learning (FL), a set of clients collaboratively train a machine learning model (called global model) without sharing their local training data. The local training data of clients is typically non-i.i.d. and heterogeneous, resulting in varying contributions from individual clients to the final performance of the global model. In response, many contribution evaluation methods were proposed, where the server could evaluate the contribution made by each client and incentivize the high-contributing clients to sustain their long-term participation in FL. Existing studies mainly focus on developing new metrics or algorithms to better measure the contribution of each client. However, the security of contribution evaluation methods of FL operating in adversarial environments is largely unexplored. In this paper, we propose the first model poisoning attack on contribution evaluation methods in FL, termed ACE. Specifically, we show that any malicious client utilizing ACE could manipulate the parameters of its local model such that it is evaluated to have a high contribution by the server, even when its local training data is indeed of low quality. We perform both theoretical analysis and empirical evaluations of ACE. Theoretically, we show our design of ACE can effectively boost the malicious client's perceived contribution when the server employs the widely-used cosine distance metric to measure contribution. Empirically, our results show ACE effectively and efficiently deceives five state-of-the-art contribution evaluation methods. In addition, ACE preserves the accuracy of the final global models on testing inputs. We also explore six countermeasures to defend ACE. Our results show they are inadequate to thwart ACE, highlighting the urgent need for new defenses to safeguard the contribution evaluation methods in FL., Comment: To appear in the 33rd USENIX Security Symposium, 2024
- Published
- 2024
46. RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
- Author
-
Yuan, Mingqi, Castanyer, Roger Creus, Li, Bo, Jin, Xin, Berseth, Glen, and Zeng, Wenjun
- Subjects
Computer Science - Machine Learning - Abstract
Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and annotation. This limitation underscores the necessity for intrinsic rewards, which offer auxiliary and dense signals and can enable agents to learn in an unsupervised manner. Although various intrinsic reward formulations have been proposed, their implementation and optimization details are insufficiently explored and lack standardization, thereby hindering research progress. To address this gap, we introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward algorithms. Furthermore, we conduct an in-depth study that identifies critical implementation details and establishes well-justified standard practices in intrinsically-motivated RL. The source code for RLeXplore is available at https://github.com/RLE-Foundation/RLeXplore., Comment: 25 pages, 19 figures
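As a flavor of the kind of dense, unsupervised signal such a framework standardizes, here is a toy tabular count-based exploration bonus. It is not one of RLeXplore's eight algorithms (those use learned models such as RND or ICM); the class name and decay schedule are illustrative only.

```python
from collections import defaultdict

class CountBonus:
    """Toy count-based intrinsic reward: r_int(s) = beta / sqrt(N(s)).

    Novel states earn a large bonus; repeated visits decay it, which
    pushes the agent to explore even when extrinsic rewards are sparse.
    """

    def __init__(self, beta: float = 1.0):
        self.beta = beta
        self.counts = defaultdict(int)  # visit counter N(s)

    def reward(self, state) -> float:
        self.counts[state] += 1
        return self.beta / self.counts[state] ** 0.5

bonus = CountBonus()
print(bonus.reward("s0"))  # first visit: 1.0
print(bonus.reward("s0"))  # revisits decay: ~0.707
```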
- Published
- 2024
47. Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives
- Author
-
Yuan, Mingqi, Wang, Huijiang, Chu, Kai-Fung, Iida, Fumiya, Li, Bo, and Zeng, Wenjun
- Subjects
Computer Science - Robotics, Computer Science - Machine Learning - Abstract
Advances in artificial intelligence (AI) have been propelling the evolution of human-robot interaction (HRI) technologies. However, significant challenges remain in achieving seamless interactions, particularly in tasks requiring physical contact with humans. These challenges arise from the need for accurate real-time perception of human actions, adaptive control algorithms for robots, and effective coordination between human and robotic movements. In this paper, we propose an approach to enhancing physical HRI with a focus on dynamic robot-assisted hand-object interaction (HOI). Our methodology integrates hand pose estimation, adaptive robot control, and motion primitives to facilitate human-robot collaboration. Specifically, we employ a transformer-based algorithm to perform real-time 3D modeling of human hands from single RGB images, based on which a motion primitives model (MPM) is designed to translate human hand motions into robotic actions. The robot's action implementation is dynamically fine-tuned using the continuously updated 3D hand models. Experimental validations, including a ring-wearing task, demonstrate the system's effectiveness in adapting to real-time movements and assisting in precise task execution., Comment: 8 pages, 10 figures
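The hand-motion-to-robot-action stage can be pictured with a deliberately simple rule: classify the wrist's recent 3D displacement into a discrete primitive. The paper's MPM is built on full 3D hand models, so this threshold-based function, its name, and its primitives are hypothetical; the sketch only illustrates that pipeline stage.

```python
# Hypothetical sketch: map the wrist's 3D displacement (meters) between two
# hand-pose estimates to a discrete motion primitive for the robot.
def select_primitive(prev_wrist, curr_wrist, eps=0.01):
    """Return 'hold' if the hand is steady, else a translation primitive
    along the dominant axis of motion."""
    dx, dy, dz = (c - p for c, p in zip(curr_wrist, prev_wrist))
    if max(abs(dx), abs(dy), abs(dz)) < eps:
        return "hold"  # hand is steady: keep the current robot pose
    # pick the axis with the largest absolute displacement
    axis = max(enumerate((dx, dy, dz)), key=lambda t: abs(t[1]))[0]
    sign = "+" if (dx, dy, dz)[axis] > 0 else "-"
    return f"translate_{'xyz'[axis]}{sign}"

print(select_primitive((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))    # hold
print(select_primitive((0.0, 0.0, 0.0), (0.05, 0.0, 0.01)))  # translate_x+
```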
- Published
- 2024
48. AI Risk Management Should Incorporate Both Safety and Security
- Author
-
Qi, Xiangyu, Huang, Yangsibo, Zeng, Yi, Debenedetti, Edoardo, Geiping, Jonas, He, Luxi, Huang, Kaixuan, Madhushani, Udari, Sehwag, Vikash, Shi, Weijia, Wei, Boyi, Xie, Tinghao, Chen, Danqi, Chen, Pin-Yu, Ding, Jeffrey, Jia, Ruoxi, Ma, Jiaqi, Narayanan, Arvind, Su, Weijie J, Wang, Mengdi, Xiao, Chaowei, Li, Bo, Song, Dawn, Henderson, Peter, and Mittal, Prateek
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence - Abstract
The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise the most effective and holistic risk mitigation approaches. Unfortunately, this vision is often obscured, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management becoming increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.
- Published
- 2024
49. Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis
- Author
-
Li, Hui, Wang, Hongyu, Chen, Zhijin, Sun, Bohan, and Li, Bo
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Singing voice conversion aims to convert a source singing voice into the voice of a target singer while preserving the linguistic content. Flow-based models can already perform voice conversion, but in singing voice conversion, a task that is more rhythmically rich and emotionally expressive, they struggle to extract latent variables effectively and also suffer from low efficiency in speech processing. In this paper, we propose a high-fidelity flow-based model based on multi-decoupling feature constraints, which enhances the capture of vocal details by integrating multiple encoders. We also use iSTFT to speed up speech processing by replacing some layers of the vocoder. We compare the synthesized singing voice with other models along multiple dimensions, and our proposed model is highly competitive with the current state of the art; a demo is available at \url{https://lazycat1119.github.io/RASVC-demo/}, Comment: 5 pages, 4 figures
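The appeal of an iSTFT head is that, unlike learned upsampling layers, the inverse transform is deterministic and cheap while still reconstructing a waveform faithfully. A minimal round trip with SciPy shows this; the sample rate, FFT size, and test tone are arbitrary, and this is not the paper's vocoder, only an illustration of the transform it relies on.

```python
import numpy as np
from scipy.signal import stft, istft

# Round-trip a waveform through STFT -> iSTFT. With a COLA-satisfying
# window (the default Hann at 50% overlap), reconstruction is exact up
# to numerical precision, which is why a deterministic iSTFT can stand
# in for learned synthesis layers.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t).astype(np.float32)  # 1 s, 440 Hz tone

_, _, Z = stft(x, fs=fs, nperseg=512)   # complex spectrogram
_, x_rec = istft(Z, fs=fs, nperseg=512)  # back to a waveform

print(np.allclose(x, x_rec[: len(x)], atol=1e-4))  # near-perfect reconstruction
```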
- Published
- 2024
50. Scalable Visual State Space Model with Fractal Scanning
- Author
-
Tang, Lv, Xiao, HaoKe, Jiang, Peng-Tao, Zhang, Hao, Chen, Jinwei, and Li, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Foundation models have advanced significantly in natural language processing (NLP) and computer vision (CV), with the Transformer architecture becoming a standard backbone. However, the Transformer's quadratic complexity poses challenges for handling longer sequences and higher-resolution images. To address this challenge, State Space Models (SSMs) such as Mamba have emerged as efficient alternatives, initially matching Transformer performance in NLP tasks and later surpassing Vision Transformers (ViTs) in various CV tasks. One crucial aspect of improving the performance of SSMs is the effective serialization of image patches. Existing methods, which rely on linear scanning curves, often fail to capture complex spatial relationships and produce repetitive patterns, leading to biases. To address these limitations, we propose using fractal scanning curves for patch serialization. Fractal curves maintain high spatial proximity and adapt to different image resolutions, avoiding redundancy and enhancing SSMs' ability to model complex patterns accurately. We validate our method on image classification, detection, and segmentation tasks, where its superior performance confirms its effectiveness., Comment: This paper is a work in progress
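The idea of a fractal scan can be made concrete with a Hilbert curve, the canonical space-filling fractal. This sketch, adapted from the classic xy2d routine, serializes a 4x4 patch grid so that every pair of consecutive patches in the sequence is spatially adjacent, unlike a raster scan that jumps across the image at each row boundary. The grid size is illustrative, and the paper's actual choice of fractal curves may differ.

```python
def hilbert_index(n, x, y):
    """Map grid cell (x, y) in an n x n grid (n a power of two) to its
    position along a Hilbert curve covering the grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the recursion stays self-similar
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Serialize a 4x4 grid of patch coordinates in Hilbert order: successive
# entries are always neighbors, preserving spatial locality for the SSM.
n = 4
cells = sorted(((x, y) for x in range(n) for y in range(n)),
               key=lambda c: hilbert_index(n, *c))
print(cells[:4])  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```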
- Published
- 2024