Author: "Li, Haoran" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Li, Haoran"' showing total 5,873 results

Start Over Author "Li, Haoran"

5,873 results on '"Li, Haoran"'

1. CPEG: Leveraging Consistency Policy with Consensus Guidance for Multi-agent Exploration

Author: Fu, Yuqian, Zhu, Yuanheng, Li, Haoran, Zhao, Zijie, Chai, Jiajun, and Zhao, Dongbin
Subjects: Computer Science - Multiagent Systems
Abstract: Efficient exploration is crucial in cooperative multi-agent reinforcement learning (MARL), especially in sparse-reward settings. However, due to the reliance on the unimodal policy, existing methods are prone to falling into the local optima, hindering the effective exploration of better policies. Furthermore, tackling multi-agent tasks in complex environments requires cooperation during exploration, posing substantial challenges for MARL methods. To address these issues, we propose a Consistency Policy with consEnsus Guidance (CPEG), with two primary components: (a) introducing a multimodal policy to enhance exploration capabilities, and (b) sharing the consensus among agents to foster agent cooperation. For component (a), CPEG incorporates a Consistency model as the policy, leveraging its multimodal nature and stochastic characteristics to facilitate exploration. Regarding component (b), CPEG introduces a Consensus Learner to deduce the consensus on the global state from local observations. This consensus then serves as a guidance for the Consistency Policy, promoting cooperation among agents. The proposed method is evaluated in multi-agent particle environments (MPE) and multi-agent MuJoCo (MAMuJoCo), and empirical results indicate that CPEG not only achieves improvements in sparse-reward settings but also matches the performance of baselines in dense-reward environments.
Published: 2024

2. Defense Against Prompt Injection Attack by Leveraging Attack Techniques

Author: Chen, Yulin, Li, Haoran, Zheng, Zihao, Song, Yangqiu, Wu, Dekai, and Hooi, Bryan
Subjects: Computer Science - Cryptography and Security
Abstract: With the advancement of technology, large language models (LLMs) have achieved remarkable performance across various natural language processing (NLP) tasks, powering LLM-integrated applications like Microsoft Copilot. However, as LLMs continue to evolve, new vulnerabilities, especially prompt injection attacks arise. These attacks trick LLMs into deviating from the original input instructions and executing the attacker's instructions injected in data content, such as retrieved results. Recent attack methods leverage LLMs' instruction-following abilities and their inabilities to distinguish instructions injected in the data content, and achieve a high attack success rate (ASR). When comparing the attack and defense methods, we interestingly find that they share similar design goals, of inducing the model to ignore unwanted instructions and instead to execute wanted instructions. Therefore, we raise an intuitive question: Could these attack techniques be utilized for defensive purposes? In this paper, we invert the intention of prompt injection methods to develop novel defense methods based on previous training-free attack methods, by repeating the attack process but with the original input instruction rather than the injected instruction. Our comprehensive experiments demonstrate that our defense techniques outperform existing training-free defense approaches, achieving state-of-the-art results., Comment: 9 pages
Published: 2024

3. MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Author: Xia, Peng, Zhu, Kangyu, Li, Haoran, Wang, Tianze, Shi, Weijia, Wang, Sheng, Zhang, Linjun, Zou, James, and Yao, Huaxiu
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning. Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. However, these models often suffer from factual hallucination, which can lead to incorrect diagnoses. Fine-tuning and retrieval-augmented generation (RAG) have emerged as methods to address these issues. However, the amount of high-quality data and distribution shifts between training data and deployment data limit the application of fine-tuning methods. Although RAG is lightweight and effective, existing RAG-based approaches are not sufficiently general to different medical domains and can potentially cause misalignment issues, both between modalities and between the model and the ground truth. In this paper, we propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs. Our approach introduces a domain-aware retrieval mechanism, an adaptive retrieved contexts selection method, and a provable RAG-based preference fine-tuning strategy. These innovations make the RAG process sufficiently general and reliable, significantly improving alignment when introducing retrieved contexts. Experimental results across five medical datasets (involving radiology, ophthalmology, pathology) on medical VQA and report generation demonstrate that MMed-RAG can achieve an average improvement of 43.8% in the factual accuracy of Med-LVLMs. Our data and code are available in https://github.com/richard-peng-xia/MMed-RAG.
Published: 2024

4. Persona Knowledge-Aligned Prompt Tuning Method for Online Debate

Author: Chan, Chunkit, Jiayang, Cheng, Liu, Xin, Yim, Yauwai, Jiang, Yuxin, Deng, Zheye, Li, Haoran, Song, Yangqiu, Wong, Ginny Y., and See, Simon
Subjects: Computer Science - Computation and Language
Abstract: Debate is the process of exchanging viewpoints or convincing others on a particular issue. Recent research has provided empirical evidence that the persuasiveness of an argument is determined not only by language usage but also by communicator characteristics. Researchers have paid much attention to aspects of languages, such as linguistic features and discourse structures, but combining argument persuasiveness and impact with the social personae of the audience has not been explored due to the difficulty and complexity. We have observed the impressive simulation and personification capability of ChatGPT, indicating a giant pre-trained language model may function as an individual to provide personae and exert unique influences based on diverse background knowledge. Therefore, we propose a persona knowledge-aligned framework for argument quality assessment tasks from the audience side. This is the first work that leverages the emergence of ChatGPT and injects such audience personae knowledge into smaller language models via prompt tuning. The performance of our pipeline demonstrates significant and consistent improvement compared to competitive architectures., Comment: Accepted to ECAI 2024
Published: 2024

5. SELU: Self-Learning Embodied MLLMs in Unknown Environments

Author: Li, Boyu, Jiang, Haobin, Ding, Ziluo, Xu, Xinrun, Li, Haoran, Zhao, Dongbin, and Lu, Zongqing
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, multimodal large language models (MLLMs) have demonstrated strong visual understanding and decision-making capabilities, enabling the exploration of autonomously improving MLLMs in unknown environments. However, external feedback like human or environmental feedback is not always available. To address this challenge, existing methods primarily focus on enhancing the decision-making capabilities of MLLMs through voting and scoring mechanisms, while little effort has been paid to improving the environmental comprehension of MLLMs in unknown environments. To fully unleash the self-learning potential of MLLMs, we propose a novel actor-critic self-learning paradigm, dubbed SELU, inspired by the actor-critic paradigm in reinforcement learning. The critic employs self-asking and hindsight relabeling to extract knowledge from interaction trajectories collected by the actor, thereby augmenting its environmental comprehension. Simultaneously, the actor is improved by the self-feedback provided by the critic, enhancing its decision-making. We evaluate our method in the AI2-THOR and VirtualHome environments, and SELU achieves critic improvements of approximately 28% and 30%, and actor improvements of about 20% and 24% via self-learning.
Published: 2024

6. Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization

Author: Li, Haoran, Jiang, Zhennan, Chen, Yuhui, and Zhao, Dongbin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and training stability. As a time-efficient diffusion model, although consistency models have been validated in online state-based RL, it is still an open question whether it can be extended to visual RL. In this paper, we investigate the impact of non-stationary distribution and the actor-critic framework on consistency policy in online RL, and find that consistency policy was unstable during the training, especially in visual RL with the high-dimensional state space. To this end, we suggest sample-based entropy regularization to stabilize the policy training, and propose a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art (SOTA) performance in 21 tasks across DeepMind control suite and Meta-world. To our knowledge, CP3ER is the first method to apply diffusion/consistency models to visual RL and demonstrates the potential of consistency models in visual RL. More visualization results are available at https://jzndd.github.io/CP3ER-Page/., Comment: Accepted at the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS2024)
Published: 2024

7. Advancing Event Causality Identification via Heuristic Semantic Dependency Inquiry Network

Author: Li, Haoran, Gao, Qiang, Wu, Hongmei, and Huang, Li
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
Abstract: Event Causality Identification (ECI) focuses on extracting causal relations between events in texts. Existing methods for ECI primarily rely on causal features and external knowledge. However, these approaches fall short in two dimensions: (1) causal features between events in a text often lack explicit clues, and (2) external knowledge may introduce bias, while specific problems require tailored analyses. To address these issues, we propose SemDI - a simple and effective Semantic Dependency Inquiry Network for ECI. SemDI captures semantic dependencies within the context using a unified encoder. Then, it utilizes a Cloze Analyzer to generate a fill-in token based on comprehensive context understanding. Finally, this fill-in token is used to inquire about the causal relation between two events. Extensive experiments demonstrate the effectiveness of SemDI, surpassing state-of-the-art methods on three widely used benchmarks. Code is available at https://github.com/hrlics/SemDI., Comment: EMNLP 2024 camera-ready version. Code is released at https://github.com/hrlics/SemDI
Published: 2024

8. DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency

Author: Chen, Yang, Jia, Yuhang, Zhao, Shiwan, Jiang, Ziyue, Li, Haoran, Kang, Jiarong, and Qin, Yong
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: As text-based speech editing becomes increasingly prevalent, the demand for unrestricted free-text editing continues to grow. However, existing speech editing techniques encounter significant challenges, particularly in maintaining intelligibility and acoustic consistency when dealing with out-of-domain (OOD) text. In this paper, we introduce, DiffEditor, a novel speech editing model designed to enhance performance in OOD text scenarios through semantic enrichment and acoustic consistency. To improve the intelligibility of the edited speech, we enrich the semantic information of phoneme embeddings by integrating word embeddings extracted from a pretrained language model. Furthermore, we emphasize that interframe smoothing properties are critical for modeling acoustic consistency, and thus we propose a first-order loss function to promote smoother transitions at editing boundaries and enhance the overall fluency of the edited speech. Experimental results demonstrate that our model achieves state-of-the-art performance in both in-domain and OOD text scenarios.
Published: 2024

9. Spectroscopy of electric dipole and quadrupole transitions in $^{224}$Ra$^+$

Author: Kofford, Spencer, Li, Haoran, Kwapisz, Robert, Ready, Roy A., Sawhney, Akshay, Cheung, Oi Chee, Fan, Mingyu, and Jayich, Andrew M.
Subjects: Physics - Atomic Physics
Abstract: We report on spectroscopy of the low-lying electronic transitions in $^{224}$Ra$^+$. The ion's low charge to mass ratio and convenient wavelengths make $^{224}$Ra$^+$ a promising optical clock candidate. We measured the frequencies of the the $^2{S}_{1/2} \ $$\leftrightarrow$$\ ^2{P}_{1/2}$ cooling transition, the $^2{S}_{1/2}\ $$\leftrightarrow$$\ ^2{D}_{5/2}$ clock transition, the $^2{D}_{3/2} \ $$\leftrightarrow$$\ ^2{P}_{3/2}$ electric dipole transition, and the $^2{D}_{5/2} \ $$\leftrightarrow$$\ ^2{P}_{3/2}$ cleanout transition. From these measurements we calculate the frequencies of the $^2{D}_{3/2}\ $$\leftrightarrow$$\ ^2{P}_{1/2}$ repump transition, the $^2{S}_{1/2} \ $$\leftrightarrow$$\ ^2{D}_{3/2}$ electric quadrupole transition, and the $^2{S}_{1/2} \ $$\leftrightarrow$$\ ^2{P}_{3/2}$ electric dipole transition., Comment: 5 pages, 8 figures
Published: 2024

10. Distributed Invariant Kalman Filter for Object-level Multi-robot Pose SLAM

Author: Li, Haoying, Zeng, Qingcheng, Li, Haoran, Zhang, Yanglin, and Wu, Junfeng
Subjects: Computer Science - Robotics
Abstract: Cooperative localization and target tracking are essential for multi-robot systems to implement high-level tasks. To this end, we propose a distributed invariant Kalman filter based on covariance intersection for effective multi-robot pose estimation. The paper utilizes the object-level measurement models, which have condensed information further reducing the communication burden. Besides, by modeling states on special Lie groups, the better linearity and consistency of the invariant Kalman filter structure can be stressed. We also use a combination of CI and KF to avoid overly confident or conservative estimates in multi-robot systems with intricate and unknown correlations, and some level of robot degradation is acceptable through multi-robot collaboration. The simulation and real data experiment validate the practicability and superiority of the proposed algorithm.
Published: 2024

11. Metric learning guided sinogram denoising for cone beam CT enhancement.

Author: Li, Haoran, Tsai, Yun-Han, Liu, Hengjie, and Ruan, Dan
Subjects: cone beam computed tomography, deep learning, metric learning, Other Physical Sciences, Biomedical Engineering, Oncology and Carcinogenesis, Nuclear Medicine & Medical Imaging, Biomedical engineering, Medical and biological physics
Abstract: Cone beam computed tomography (CBCT) is a widely available modality, but its clinical utility has been limited by low detail conspicuity and quantitative accuracy. Convenient post-reconstruction denoising is subject to back projected patterned residual, but joint denoise-reconstruction is typically computationally expensive and complex. In this study, we develop and evaluate a novel Metric-learning guided wavelet transform reconstruction (MEGATRON) approach to enhance image domain quality with projection-domain processing. Projection domain based processing has the benefit of being simple, efficient, and compatible with various reconstruction toolkit and vendor platforms. However, they also typically show inferior performance in the final reconstructed image, because the denoising goals in projection and image domains do not necessarily align. Motivated by these observations, this work aims to translate the demand for quality enhancement from the quantitative image domain to the more easily operable projection domain. Specifically, the proposed paradigm consists of a metric learning module and a denoising network module. Via metric learning, enhancement objectives on the wavelet encoded sinogram domain data are defined to reflect post-reconstruction image discrepancy. The denoising network maps measured cone-beam projection to its enhanced version, driven by the learnt objective. In doing so, the denoiser operates in the convenient sinogram to sinogram fashion but reflects improvement in reconstructed image as the final goal. Implementation-wise, metric learning was formalized as optimizing the weighted fitting of wavelet subbands, and a res-Unet, which is a Unet structure with residual blocks, was used for denoising. To access quantitative reference, cone-beam projections were simulated using the X-ray based Cancer Imaging Simulation Toolkit (XCIST). In both learning modules, a data set of 123 human thoraxes, which was from Open-Source Imaging Consortium (OSIC) Pulmonary Fibrosis Progression challenge, was used. Reconstructed CBCT thoracic images were compared against ground truth FB and performance was assessed in root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). MEGATRON achieved RMSE in HU value, PSNR, and SSIM were 30.97 ± 4.25, 37.45 ± 1.78, and 93.23 ± 1.62, respectively. These values are on par with reported results from sophisticated physics-driven CBCT enhancement, demonstrating promise and utility of the proposed MEGATRON method. We have demonstrated that incorporating the proposed metric learning into sinogram denoising introduces awareness of reconstruction goal and improves final quantitative performance. The proposed approach is compatible with a wide range of denoiser network structures and reconstruction modules, to suit customized need or further improve performance.
Published: 2024

12. SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning

Author: Luo, Wang, Li, Haoran, Zhang, Zicheng, Han, Congying, Lv, Jiayu, and Guo, Tiande
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Model-based Offline Reinforcement Learning trains policies based on offline datasets and model dynamics, without direct real-world environment interactions. However, this method is inherently challenged by distribution shift. Previous approaches have primarily focused on tackling this issue directly leveraging off-policy mechanisms and heuristic uncertainty in model dynamics, but they resulted in inconsistent objectives and lacked a unified theoretical foundation. This paper offers a comprehensive analysis that disentangles the problem into two key components: model bias and policy shift. We provide both theoretical insights and empirical evidence to demonstrate how these factors lead to inaccuracies in value function estimation and impose implicit restrictions on policy learning. To address these challenges, we derive adjustment terms for model bias and policy shift within a unified probabilistic inference framework. These adjustments are seamlessly integrated into the vanilla reward function to create a novel Shifts-aware Reward (SAR), aiming at refining value learning and facilitating policy training. Furthermore, we introduce Shifts-aware Model-based Offline Reinforcement Learning (SAMBO-RL), a practical framework that efficiently trains classifiers to approximate the SAR for policy optimization. Empirically, we show that SAR effectively mitigates distribution shift, and SAMBO-RL demonstrates superior performance across various benchmarks, underscoring its practical effectiveness and validating our theoretical analysis.
Published: 2024

13. Analysis of The Limiting Spectral Distribution of Large Random Matrices of The Mar\v{c}enko-Pastur Type

Author: Li, Haoran
Subjects: Mathematics - Probability, Mathematics - Statistics Theory, 15B52
Abstract: Consider the random matrix $\bW_n = \bB_n + n^{-1}\bX_n^*\bA_n\bX_n$, where $\bA_n$ and $\bB_n$ are Hermitian matrices of dimensions $p \times p$ and $n \times n$, respectively, and $\bX_n$ is a $p \times n$ random matrix with independent and identically distributed entries of mean 0 and variance 1. Assume that $p$ and $n$ grow to infinity proportionally, and that the spectral measures of $\bA_n$ and $\bB_n$ converge as $p, n \to \infty$ towards two probability measures $\calA$ and $\calB$. Building on the groundbreaking work of \cite{marchenko1967distribution}, which demonstrated that the empirical spectral distribution of $\bW_n$ converges towards a probability measure $F$ characterized by its Stieltjes transform, this paper investigates the properties of $F$ when $\calB$ is a general measure. We show that $F$ has an analytic density at the region near where the Stieltjes transform of $\calB$ is bounded. The density closely resembles $C\sqrt{|x - x_0|}$ near certain edge points $x_0$ of its support for a wide class of $\calA$ and $\calB$. We provide a complete characterization of the support of $F$. Moreover, we show that $F$ can exhibit discontinuities at points where $\calB$ is discontinuous.
Published: 2024

14. Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory

Author: Li, Haoran, Fan, Wei, Chen, Yulin, Cheng, Jiayang, Chu, Tianshu, Zhou, Xuebing, Hu, Peizhao, and Song, Yangqiu
Subjects: Computer Science - Computation and Language, Computer Science - Cryptography and Security
Abstract: Privacy research has attracted wide attention as individuals worry that their private data can be easily leaked during interactions with smart devices, social platforms, and AI applications. Computer science researchers, on the other hand, commonly study privacy issues through privacy attacks and defenses on segmented fields. Privacy research is conducted on various sub-fields, including Computer Vision (CV), Natural Language Processing (NLP), and Computer Networks. Within each field, privacy has its own formulation. Though pioneering works on attacks and defenses reveal sensitive privacy issues, they are narrowly trapped and cannot fully cover people's actual privacy concerns. Consequently, the research on general and human-centric privacy research remains rather unexplored. In this paper, we formulate the privacy issue as a reasoning problem rather than simple pattern matching. We ground on the Contextual Integrity (CI) theory which posits that people's perceptions of privacy are highly correlated with the corresponding social context. Based on such an assumption, we develop the first comprehensive checklist that covers social identities, private attributes, and existing privacy regulations. Unlike prior works on CI that either cover limited expert annotated norms or model incomplete social context, our proposed privacy checklist uses the whole Health Insurance Portability and Accountability Act of 1996 (HIPAA) as an example, to show that we can resort to large language models (LLMs) to completely cover the HIPAA's regulations. Additionally, our checklist also gathers expert annotations across multiple ontologies to determine private information including but not limited to personally identifiable information (PII). We use our preliminary results on the HIPAA to shed light on future context-centric privacy research to cover more privacy regulations, social norms and standards.
Published: 2024

15. BaThe: Defense against the Jailbreak Attack in Multimodal Large Language Models by Treating Harmful Instruction as Backdoor Trigger

Author: Chen, Yulin, Li, Haoran, Zheng, Zihao, and Song, Yangqiu
Subjects: Computer Science - Cryptography and Security
Abstract: Multimodal Large Language Models (MLLMs) have showcased impressive performance in a variety of multimodal tasks. On the other hand, the integration of additional image modality may allow the malicious users to inject harmful content inside the images for jailbreaking. Unlike text-based LLMs, where adversaries need to select discrete tokens to conceal their malicious intent using specific algorithms, the continuous nature of image signals provides a direct opportunity for adversaries to inject harmful intentions. In this work, we propose $\textbf{BaThe}$ ($\textbf{Ba}$ckdoor $\textbf{T}$rigger S$\textbf{h}$i$\textbf{e}$ld), a simple yet effective jailbreak defense mechanism. Our work is motivated by recent research on jailbreak backdoor attack and virtual prompt backdoor attack in generative language models. Jailbreak backdoor attack uses harmful instructions combined with manually crafted strings as triggers to make the backdoored model generate prohibited responses. We assume that harmful instructions can function as triggers, and if we alternatively set rejection responses as the triggered response, the backdoored model then can defend against jailbreak attacks. We achieve this by utilizing virtual rejection prompt, similar to the virtual prompt backdoor attack. We embed the virtual rejection prompt into the soft text embeddings, which we call ``wedge''. Our comprehensive experiments demonstrate that BaThe effectively mitigates various types of jailbreak attacks and is adaptable to defend against unseen attacks, with minimal impact on MLLMs' performance.
Published: 2024

16. Node Level Graph Autoencoder: Unified Pretraining for Textual Graph Learning

Author: Hu, Wenbin, Jing, Huihao, Hu, Qi, Li, Haoran, and Song, Yangqiu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Textual graphs are ubiquitous in real-world applications, featuring rich text information with complex relationships, which enables advanced research across various fields. Textual graph representation learning aims to generate low-dimensional feature embeddings from textual graphs that can improve the performance of downstream tasks. A high-quality feature embedding should effectively capture both the structural and the textual information in a textual graph. However, most textual graph dataset benchmarks rely on word2vec techniques to generate feature embeddings, which inherently limits their capabilities. Recent works on textual graph representation learning can be categorized into two folds: supervised and unsupervised methods. Supervised methods finetune a language model on labeled nodes, which have limited capabilities when labeled data is scarce. Unsupervised methods, on the other hand, extract feature embeddings by developing complex training pipelines. To address these limitations, we propose a novel unified unsupervised learning autoencoder framework, named Node Level Graph AutoEncoder (NodeGAE). We employ language models as the backbone of the autoencoder, with pretraining on text reconstruction. Additionally, we add an auxiliary loss term to make the feature embeddings aware of the local graph structure. Our method maintains simplicity in the training process and demonstrates generalizability across diverse textual graphs and downstream tasks. We evaluate our method on two core graph representation learning downstream tasks: node classification and link prediction. Comprehensive experiments demonstrate that our approach substantially enhances the performance of diverse graph neural networks (GNNs) across multiple textual graph datasets.
Published: 2024

17. CrystalTac: 3D-Printed Vision-Based Tactile Sensor Family through Rapid Monolithic Manufacturing Technique

Author: Fan, Wen, Li, Haoran, and Zhang, Dandan
Subjects: Computer Science - Robotics, Electrical Engineering and Systems Science - Signal Processing
Abstract: Recently, vision-based tactile sensors (VBTSs) have gained popularity in robotics systems. The sensing mechanisms of most VBTSs can be categorised based on the type of tactile features they capture. Each category requires specific structural designs to convert physical contact into optical information. The complex architectures of VBTSs pose challenges for traditional manufacturing techniques in terms of design flexibility, cost-effectiveness, and quality stability. Previous research has shown that monolithic manufacturing using multi-material 3D printing technology can partially address these challenges. This study introduces the CrystalTac family, a series of VBTSs designed with a unique sensing mechanism and fabricated through rapid monolithic manufacturing. Case studies on CrystalTac-type sensors demonstrate their effective performance in tasks involving tactile perception, along with impressive cost-effectiveness and design flexibility. The CrystalTac family aims to highlight the potential of monolithic manufacturing in VBTS development and inspire further research in tactile sensing and manipulation., Comment: 32 pages, 12 figures
Published: 2024

18. Table-Filling via Mean Teacher for Cross-domain Aspect Sentiment Triplet Extraction

Author: Peng, Kun, Jiang, Lei, Li, Qian, Li, Haoran, Yu, Xiaoyan, Sun, Li, Sun, Shuo, Bi, Yanxian, and Peng, Hao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Cross-domain Aspect Sentiment Triplet Extraction (ASTE) aims to extract fine-grained sentiment elements from target domain sentences by leveraging the knowledge acquired from the source domain. Due to the absence of labeled data in the target domain, recent studies tend to rely on pre-trained language models to generate large amounts of synthetic data for training purposes. However, these approaches entail additional computational costs associated with the generation process. Different from them, we discover a striking resemblance between table-filling methods in ASTE and two-stage Object Detection (OD) in computer vision, which inspires us to revisit the cross-domain ASTE task and approach it from an OD standpoint. This allows the model to benefit from the OD extraction paradigm and region-level alignment. Building upon this premise, we propose a novel method named \textbf{T}able-\textbf{F}illing via \textbf{M}ean \textbf{T}eacher (TFMT). Specifically, the table-filling methods encode the sentence into a 2D table to detect word relations, while TFMT treats the table as a feature map and utilizes a region consistency to enhance the quality of those generated pseudo labels. Additionally, considering the existence of the domain gap, a cross-domain consistency based on Maximum Mean Discrepancy is designed to alleviate domain shift problems. Our method achieves state-of-the-art performance with minimal parameters and computational costs, making it a strong baseline for cross-domain ASTE., Comment: Accepted by CIKM2024
Published: 2024

19. Adaptive Differentially Private Structural Entropy Minimization for Unsupervised Social Event Detection

Author: Yang, Zhiwei, Wei, Yuecen, Li, Haoran, Li, Qian, Jiang, Lei, Sun, Li, Yu, Xiaoyan, Hu, Chunming, and Peng, Hao
Subjects: Computer Science - Social and Information Networks, Computer Science - Artificial Intelligence
Abstract: Social event detection refers to extracting relevant message clusters from social media data streams to represent specific events in the real world. Social event detection is important in numerous areas, such as opinion analysis, social safety, and decision-making. Most current methods are supervised and require access to large amounts of data. These methods need prior knowledge of the events and carry a high risk of leaking sensitive information in the messages, making them less applicable in open-world settings. Therefore, conducting unsupervised detection while fully utilizing the rich information in the messages and protecting data privacy remains a significant challenge. To this end, we propose a novel social event detection framework, ADP-SEMEvent, an unsupervised social event detection method that prioritizes privacy. Specifically, ADP-SEMEvent is divided into two stages, i.e., the construction stage of the private message graph and the clustering stage of the private message graph. In the first stage, an adaptive differential privacy approach is used to construct a private message graph. In this process, our method can adaptively apply differential privacy based on the events occurring each day in an open environment to maximize the use of the privacy budget. In the second stage, to address the reduction in data utility caused by noise, a novel 2-dimensional structural entropy minimization algorithm based on optimal subgraphs is used to detect events in the message graph. The highlight of this process is unsupervised and does not compromise differential privacy. Extensive experiments on two public datasets demonstrate that ADP-SEMEvent can achieve detection performance comparable to state-of-the-art methods while maintaining reasonable privacy budget parameters., Comment: Accepted to ACM CIKM 2024
Published: 2024
Full Text: View/download PDF

20. Laser Cooling of Radium-225 Ions

Author: Ready, Roy, Li, Haoran, Kofford, Spencer, Kwapisz, Robert, Dan, Huaxu, Sawhney, Akshay, Fan, Mingyu, Holliman, Craig, Shi, Xiaoyang, Sever-Walter, Luka, Gaiser, A. N., Griswold, J. R., and Jayich, A. M.
Subjects: Physics - Atomic Physics
Abstract: Radium-225 (nuclear spin $I=1/2$) ions possess electronic hyperfine transitions that are first-order insensitive to magnetic field noise, which is advantageous for optical clocks and quantum information science. We report on laser cooling and trapping of radium-225 ions and hyperfine splitting measurements of the ion's $7s$ $^2S_{1/2}$, $7p$ $^2P_{1/2}$, and $6d$ $^2D_{3/2}$ states. We measured the ground state hyperfine constant, $A(^2S_{1/2}) = -27.684511056(9)\ \mathrm{GHz}$, and the quadratic Zeeman coefficient, $C_2 = 142.3(10)\ \mathrm{Hz\ G}^{-2}$, of the $^2S_{1/2} (F=0, m_F = 0) \leftrightarrow~^2S_{1/2} (F=1, m_{F} = 0)$ transition. We also measured the hyperfine constants of the $^2P_{1/2}$ state, $A(^2P_{1/2}) = -5.447(4)\ \mathrm{GHz}$, and the $^2D_{3/2}$ state, $A(^2D_{3/2}) = -619.7(11)\ \mathrm{MHz}$., Comment: 5 pages, 4 figures
Published: 2024

21. A review of graph neural network applications in mechanics-related domains

Author: Zhao, Yingxue, Li, Haoran, Zhou, Haosu, Attar, Hamid Reza, Pfaff, Tobias, and Li, Nan
Subjects: Computer Science - Machine Learning, Mathematical Physics
Abstract: Mechanics-related problems often present unique challenges in achieving accurate geometric and physical representations, particularly for non-uniform structures. Graph neural networks (GNNs) have emerged as a promising tool to tackle these challenges by adeptly learning from graph data with irregular underlying structures. Consequently, recent years have witnessed a surge in complex mechanics-related applications inspired by the advancements of GNNs. Despite this process, there is a notable absence of a systematic review addressing the recent advancement of GNNs in solving mechanics-related problems. To bridge this gap, this review article aims to provide an in-depth overview of the GNN applications in mechanics-related domains while identifying key challenges and outlining potential future research directions. In this review article, we begin by introducing the fundamental algorithms of GNNs that are widely employed in mechanics-related applications. We provide a concise explanation of their underlying principles to establish a solid understanding that will serve as a basis for exploring the applications of GNNs in mechanics-related domains. The scope of this paper is intended to cover the categorisation of literature into solid mechanics, fluid mechanics, and interdisciplinary mechanics-related domains, providing a comprehensive summary of graph representation methodologies, GNN architectures, and further discussions in their respective subdomains. Additionally, open data and source codes relevant to these applications are summarised for the convenience of future researchers. This article promotes an interdisciplinary integration of GNNs and mechanics and provides a guide for researchers interested in applying GNNs to solve complex mechanics-related problems., Comment: 28 pages, 10 figures, 4 tables
Published: 2024

22. RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Author: Xia, Peng, Zhu, Kangyu, Li, Haoran, Zhu, Hongtu, Li, Yun, Li, Gang, Zhang, Linjun, and Yao, Huaxiu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computers and Society
Abstract: The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, limited retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references, interfering with the model's generation. Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the calibrated selection of the number of retrieved contexts. Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model, balancing its dependence on inherent knowledge and retrieved contexts for generation. We demonstrate the effectiveness of RULE on medical VQA and report generation tasks across three datasets, achieving an average improvement of 47.4% in factual accuracy. We publicly release our benchmark and code in https://github.com/richard-peng-xia/RULE., Comment: EMNLP 2024 main
Published: 2024

23. Vox-UDA: Voxel-wise Unsupervised Domain Adaptation for Cryo-Electron Subtomogram Segmentation with Denoised Pseudo Labeling

Author: Li, Haoran, Li, Xingjian, Shi, Jiahua, Chen, Huaming, Du, Bo, Kihara, Daisuke, Barthelemy, Johan, Shen, Jun, and Xu, Min
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology facilitating the study of macromolecular structures at near-atomic resolution. Recent volumetric segmentation approaches on cryo-ET images have drawn widespread interest in biological sector. However, existing methods heavily rely on manually labeled data, which requires highly professional skills, thereby hindering the adoption of fully-supervised approaches for cryo-ET images. Some unsupervised domain adaptation (UDA) approaches have been designed to enhance the segmentation network performance using unlabeled data. However, applying these methods directly to cryo-ET images segmentation tasks remains challenging due to two main issues: 1) the source data, usually obtained through simulation, contain a certain level of noise, while the target data, directly collected from raw-data from real-world scenario, have unpredictable noise levels. 2) the source data used for training typically consists of known macromoleculars, while the target domain data are often unknown, causing the model's segmenter to be biased towards these known macromolecules, leading to a domain shift problem. To address these challenges, in this work, we introduce the first voxel-wise unsupervised domain adaptation approach, termed Vox-UDA, specifically for cryo-ET subtomogram segmentation. Vox-UDA incorporates a noise generation module to simulate target-like noises in the source dataset for cross-noise level adaptation. Additionally, we propose a denoised pseudo-labeling strategy based on improved Bilateral Filter to alleviate the domain shift problem. Experimental results on both simulated and real cryo-ET subtomogram datasets demonstrate the superiority of our proposed approach compared to state-of-the-art UDA methods., Comment: 11 pages
Published: 2024

24. LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Author: Du, Jiangshu, Wang, Yibo, Zhao, Wenting, Deng, Zhongfen, Liu, Shuaiqi, Lou, Renze, Zou, Henry Peng, Venkit, Pranav Narayanan, Zhang, Nan, Srinath, Mukund, Zhang, Haoran Ranran, Gupta, Vipul, Li, Yinghui, Li, Tao, Wang, Fei, Liu, Qin, Liu, Tianlin, Gao, Pengzhi, Xia, Congying, Xing, Chen, Cheng, Jiayang, Wang, Zhaowei, Su, Ying, Shah, Raj Sanjay, Guo, Ruohao, Gu, Jing, Li, Haoran, Wei, Kangda, Wang, Zihao, Cheng, Lu, Ranathunga, Surangika, Fang, Meng, Fu, Jie, Liu, Fei, Huang, Ruihong, Blanco, Eduardo, Cao, Yixin, Zhang, Rui, Yu, Philip S., and Yin, Wenpeng
Subjects: Computer Science - Computation and Language
Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload? This study focuses on the topic of LLMs assist NLP Researchers, particularly examining the effectiveness of LLM in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with "deficiency" labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers", how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers", how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis., Comment: Accepted by EMNLP 2024 main conference
Published: 2024

25. Tactile SoftHand-A: 3D-Printed, Tactile, Highly-underactuated, Anthropomorphic Robot Hand with an Antagonistic Tendon Mechanism

Author: Li, Haoran, Ford, Christopher J., Lu, Chenghua, Lin, Yijiong, Bianchi, Matteo, Catalano, Manuel G., Psomopoulou, Efi, and Lepora, Nathan F.
Subjects: Computer Science - Robotics
Abstract: For tendon-driven multi-fingered robotic hands, ensuring grasp adaptability while minimizing the number of actuators needed to provide human-like functionality is a challenging problem. Inspired by the Pisa/IIT SoftHand, this paper introduces a 3D-printed, highly-underactuated, five-finger robotic hand named the Tactile SoftHand-A, which features only two actuators. The dual-tendon design allows for the active control of specific (distal or proximal interphalangeal) joints to adjust the hand's grasp gesture. We have also developed a new design of fully 3D-printed tactile sensor that requires no hand assembly and is printed directly as part of the robotic finger. This sensor is integrated into the fingertips and combined with the antagonistic tendon mechanism to develop a human-hand-guided tactile feedback grasping system. The system can actively mirror human hand gestures, adaptively stabilize grasp gestures upon contact, and adjust grasp gestures to prevent object movement after detecting slippage. Finally, we designed four different experiments to evaluate the novel fingers coupled with the antagonistic mechanism for controlling the robotic hand's gestures, adaptive grasping ability, and human-hand-guided tactile feedback grasping capability. The experimental results demonstrate that the Tactile SoftHand-A can adaptively grasp objects of a wide range of shapes and automatically adjust its gripping gestures upon detecting contact and slippage. Overall, this study points the way towards a class of low-cost, accessible, 3D-printable, underactuated human-like robotic hands, and we openly release the designs to facilitate others to build upon this work. This work is Open-sourced at github.com/SoutheastWind/Tactile_SoftHand_A, Comment: 17 pages, 13 figures
Published: 2024

26. GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory

Author: Fan, Wei, Li, Haoran, Deng, Zheye, Wang, Weiqi, and Song, Yangqiu
Subjects: Computer Science - Computation and Language, Computer Science - Cryptography and Security
Abstract: Privacy issues arise prominently during the inappropriate transmission of information between entities. Existing research primarily studies privacy by exploring various privacy attacks, defenses, and evaluations within narrowly predefined patterns, while neglecting that privacy is not an isolated, context-free concept limited to traditionally sensitive data (e.g., social security numbers), but intertwined with intricate social contexts that complicate the identification and analysis of potential privacy violations. The advent of Large Language Models (LLMs) offers unprecedented opportunities for incorporating the nuanced scenarios outlined in privacy laws to tackle these complex privacy issues. However, the scarcity of open-source relevant case studies restricts the efficiency of LLMs in aligning with specific legal statutes. To address this challenge, we introduce a novel framework, GoldCoin, designed to efficiently ground LLMs in privacy laws for judicial assessing privacy violations. Our framework leverages the theory of contextual integrity as a bridge, creating numerous synthetic scenarios grounded in relevant privacy statutes (e.g., HIPAA), to assist LLMs in comprehending the complex contexts for identifying privacy risks in the real world. Extensive experimental results demonstrate that GoldCoin markedly enhances LLMs' capabilities in recognizing privacy risks across real court cases, surpassing the baselines on different judicial tasks., Comment: Accepted by EMNLP 2024
Published: 2024

27. On the Role of Entity and Event Level Conceptualization in Generalizable Reasoning: A Survey of Tasks, Methods, Applications, and Future Directions

Author: Wang, Weiqi, Fang, Tianqing, Shi, Haochen, Xu, Baixuan, Ding, Wenxuan, Zhang, Liyu, Fan, Wei, Bai, Jiaxin, Li, Haoran, Liu, Xin, and Song, Yangqiu
Subjects: Computer Science - Computation and Language
Abstract: Entity- and event-level conceptualization, as fundamental elements of human cognition, plays a pivotal role in generalizable reasoning. This process involves abstracting specific instances into higher-level concepts and forming abstract knowledge that can be applied in unfamiliar or novel situations, which can enhance models' inferential capabilities and support the effective transfer of knowledge across various domains. Despite its significance, there is currently a lack of a systematic overview that comprehensively examines existing works in the definition, execution, and application of conceptualization to enhance reasoning tasks. In this paper, we address this gap by presenting the first comprehensive survey of 150+ papers, categorizing various definitions, resources, methods, and downstream applications related to conceptualization into a unified taxonomy, with a focus on the entity and event levels. Furthermore, we shed light on potential future directions in this field and hope to garner more attention from the community.
Published: 2024

28. TacShade A New 3D-printed Soft Optical Tactile Sensor Based on Light, Shadow and Greyscale for Shape Reconstruction

Author: Lu, Zhenyu, Yang, Jialong, Li, Haoran, Li, Yifan, Si, Weiyong, Lepora, Nathan, and Yang, Chenguang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Robotics
Abstract: In this paper, we present the TacShade a newly designed 3D-printed soft optical tactile sensor. The sensor is developed for shape reconstruction under the inspiration of sketch drawing that uses the density of sketch lines to draw light and shadow, resulting in the creation of a 3D-view effect. TacShade, building upon the strengths of the TacTip, a single-camera tactile sensor of large in-depth deformation and being sensitive to edge and surface following, improves the structure in that the markers are distributed within the gap of papillae pins. Variations in light, dark, and grey effects can be generated inside the sensor through external contact interactions. The contours of the contacting objects are outlined by white markers, while the contact depth characteristics can be indirectly obtained from the distribution of black pins and white markers, creating a 2.5D visualization. Based on the imaging effect, we improve the Shape from Shading (SFS) algorithm to process tactile images, enabling a coarse but fast reconstruction for the contact objects. Two experiments are performed. The first verifies TacShade s ability to reconstruct the shape of the contact objects through one image for object distinction. The second experiment shows the shape reconstruction capability of TacShade for a large panel with ridged patterns based on the location of robots and image splicing technology., Comment: This paper has been accepted by ICRA 2024
Published: 2024

29. Positive Attitudes Toward LGBTQ People in Mainland China (Chinese)

Author: Meyer, Ilan H., Li, Haoran, Bouton, Lauren J.A., Hong, Chenglin, and Pachankis, John E.
Subjects: International, LGBTQ, China, public opinion, stigma, sexual minorities, sexual orientation, gender identity
Published: 2024

30. Positive Attitudes toward LGBTQ People in Mainland China

Author: Meyer, Ilan H., Li, Haoran, Bouton, Lauren J.A., Hong, Chenglin, and Pachankis, John E.
Subjects: International, LGBTQ, China, public opinion, stigma, sexual minorities, sexual orientation, gender identit
Published: 2024

31. Analytic Federated Learning

Author: Zhuang, Huiping, He, Run, Tong, Kai, Fang, Di, Sun, Han, Li, Haoran, Chen, Tianyi, and Zeng, Ziqian
Subjects: Computer Science - Machine Learning
Abstract: In this paper, we introduce analytic federated learning (AFL), a new training paradigm that brings analytical (i.e., closed-form) solutions to the federated learning (FL) community. Our AFL draws inspiration from analytic learning -- a gradient-free technique that trains neural networks with analytical solutions in one epoch. In the local client training stage, the AFL facilitates a one-epoch training, eliminating the necessity for multi-epoch updates. In the aggregation stage, we derive an absolute aggregation (AA) law. This AA law allows a single-round aggregation, removing the need for multiple aggregation rounds. More importantly, the AFL exhibits a \textit{weight-invariant} property, meaning that regardless of how the full dataset is distributed among clients, the aggregated result remains identical. This could spawn various potentials, such as data heterogeneity invariance, client-number invariance, absolute convergence, and being hyperparameter-free (our AFL is the first hyperparameter-free method in FL history). We conduct experiments across various FL settings including extremely non-IID ones, and scenarios with a large number of clients (e.g., $\ge 1000$). In all these settings, our AFL constantly performs competitively while existing FL techniques encounter various obstacles. Code is available at \url{https://github.com/ZHUANGHP/Analytic-federated-learning}
Published: 2024

32. Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data

Author: Li, Haoran, Zhao, Xinyuan, Guo, Dadi, Gu, Hanlin, Zeng, Ziqian, Han, Yuxing, Song, Yangqiu, Fan, Lixin, and Yang, Qiang
Subjects: Computer Science - Cryptography and Security, Computer Science - Computation and Language
Abstract: As large language models (LLMs) demonstrate unparalleled performance and generalization ability, LLMs are widely used and integrated into various applications. When it comes to sensitive domains, as commonly described in federated learning scenarios, directly using external LLMs on private data is strictly prohibited by stringent data security and privacy regulations. For local clients, the utilization of LLMs to improve the domain-specific small language models (SLMs), characterized by limited computational resources and domain-specific data, has attracted considerable research attention. By observing that LLMs can empower domain-specific SLMs, existing methods predominantly concentrate on leveraging the public data or LLMs to generate more data to transfer knowledge from LLMs to SLMs. However, due to the discrepancies between LLMs' generated data and clients' domain-specific data, these methods cannot yield substantial improvements in the domain-specific tasks. In this paper, we introduce a Federated Domain-specific Knowledge Transfer (FDKT) framework, which enables domain-specific knowledge transfer from LLMs to SLMs while preserving clients' data privacy. The core insight is to leverage LLMs to augment data based on domain-specific few-shot demonstrations, which are synthesized from private domain data using differential privacy. Such synthetic samples share similar data distribution with clients' private data and allow the server LLM to generate particular knowledge to improve clients' SLMs. The extensive experimental results demonstrate that the proposed FDKT framework consistently and greatly improves SLMs' task performance by around 5\% with a privacy budget of less than 10, compared to local training on private data.
Published: 2024

33. Backdoor Removal for Generative Large Language Models

Author: Li, Haoran, Chen, Yulin, Zheng, Zihao, Hu, Qi, Chan, Chunkit, Liu, Heshan, and Song, Yangqiu
Subjects: Computer Science - Cryptography and Security, Computer Science - Computation and Language
Abstract: With rapid advances, generative large language models (LLMs) dominate various Natural Language Processing (NLP) tasks from understanding to reasoning. Yet, language models' inherent vulnerabilities may be exacerbated due to increased accessibility and unrestricted model training on massive textual data from the Internet. A malicious adversary may publish poisoned data online and conduct backdoor attacks on the victim LLMs pre-trained on the poisoned data. Backdoored LLMs behave innocuously for normal queries and generate harmful responses when the backdoor trigger is activated. Despite significant efforts paid to LLMs' safety issues, LLMs are still struggling against backdoor attacks. As Anthropic recently revealed, existing safety training strategies, including supervised fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), fail to revoke the backdoors once the LLM is backdoored during the pre-training stage. In this paper, we present Simulate and Eliminate (SANDE) to erase the undesired backdoored mappings for generative LLMs. We initially propose Overwrite Supervised Fine-tuning (OSFT) for effective backdoor removal when the trigger is known. Then, to handle the scenarios where the trigger patterns are unknown, we integrate OSFT into our two-stage framework, SANDE. Unlike previous works that center on the identification of backdoors, our safety-enhanced LLMs are able to behave normally even when the exact triggers are activated. We conduct comprehensive experiments to show that our proposed SANDE is effective against backdoor attacks while bringing minimal harm to LLMs' powerful capability without any additional access to unbackdoored clean models. We will release the reproducible code.
Published: 2024

34. AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch

Author: Yang, Max, Lu, Chenghua, Church, Alex, Lin, Yijiong, Ford, Chris, Li, Haoran, Psomopoulou, Efi, Barton, David A. W., and Lepora, Nathan F.
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Human hands are capable of in-hand manipulation in the presence of different hand motions. For a robot hand, harnessing rich tactile information to achieve this level of dexterity still remains a significant challenge. In this paper, we present AnyRotate, a system for gravity-invariant multi-axis in-hand object rotation using dense featured sim-to-real touch. We tackle this problem by training a dense tactile policy in simulation and present a sim-to-real method for rich tactile sensing to achieve zero-shot policy transfer. Our formulation allows the training of a unified policy to rotate unseen objects about arbitrary rotation axes in any hand direction. In our experiments, we highlight the benefit of capturing detailed contact information when handling objects of varying properties. Interestingly, we found rich multi-fingered tactile sensing can detect unstable grasps and provide a reactive behavior that improves the robustness of the policy. The project website can be found at https://maxyang27896.github.io/anyrotate/., Comment: Project website can be found at https://maxyang27896.github.io/anyrotate/
Published: 2024

35. DrugLLM: Open Large Language Model for Few-shot Molecule Generation

Author: Liu, Xianggen, Guo, Yan, Li, Haoran, Liu, Jin, Huang, Shudong, Ke, Bowen, and Lv, Jiancheng
Subjects: Quantitative Biology - Biomolecules, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) have made great strides in areas such as language processing and computer vision. Despite the emergence of diverse techniques to improve few-shot learning capacity, current LLMs fall short in handling the languages in biology and chemistry. For example, they are struggling to capture the relationship between molecule structure and pharmacochemical properties. Consequently, the few-shot learning capacity of small-molecule drug modification remains impeded. In this work, we introduced DrugLLM, a LLM tailored for drug design. During the training process, we employed Group-based Molecular Representation (GMR) to represent molecules, arranging them in sequences that reflect modifications aimed at enhancing specific molecular properties. DrugLLM learns how to modify molecules in drug discovery by predicting the next molecule based on past modifications. Extensive computational experiments demonstrate that DrugLLM can generate new molecules with expected properties based on limited examples, presenting a powerful few-shot molecule generation capacity., Comment: 17 pages, 3 figures
Published: 2024

36. Converting High-Performance and Low-Latency SNNs through Explicit Modelling of Residual Error in ANNs

Author: Huang, Zhipeng, Ding, Jianhao, Pan, Zhiyu, Li, Haoran, Fang, Ying, Yu, Zhaofei, and Liu, Jian K.
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: Spiking neural networks (SNNs) have garnered interest due to their energy efficiency and superior effectiveness on neuromorphic chips compared with traditional artificial neural networks (ANNs). One of the mainstream approaches to implementing deep SNNs is the ANN-SNN conversion, which integrates the efficient training strategy of ANNs with the energy-saving potential and fast inference capability of SNNs. However, under extreme low-latency conditions, the existing conversion theory suggests that the problem of misrepresentation of residual membrane potentials in SNNs, i.e., the inability of IF neurons with a reset-by-subtraction mechanism to respond to residual membrane potentials beyond the range from resting potential to threshold, leads to a performance gap in the converted SNNs compared to the original ANNs. This severely limits the possibility of practical application of SNNs on delay-sensitive edge devices. Existing conversion methods addressing this problem usually involve modifying the state of the conversion spiking neurons. However, these methods do not consider their adaptability and compatibility with neuromorphic chips. We propose a new approach based on explicit modeling of residual errors as additive noise. The noise is incorporated into the activation function of the source ANN, which effectively reduces the residual error. Our experiments on the CIFAR10/100 dataset verify that our approach exceeds the prevailing ANN-SNN conversion methods and directly trained SNNs concerning accuracy and the required time steps. Overall, our method provides new ideas for improving SNN performance under ultra-low-latency conditions and is expected to promote practical neuromorphic hardware applications for further development.
Published: 2024

37. NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding

Author: Chan, Chunkit, Jiayang, Cheng, Yim, Yauwai, Deng, Zheye, Fan, Wei, Li, Haoran, Liu, Xin, Zhang, Hongming, Wang, Weiqi, and Song, Yangqiu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) have sparked substantial interest and debate concerning their potential emergence of Theory of Mind (ToM) ability. Theory of mind evaluations currently focuses on testing models using machine-generated data or game settings prone to shortcuts and spurious correlations, which lacks evaluation of machine ToM ability in real-world human interaction scenarios. This poses a pressing demand to develop new real-world scenario benchmarks. We introduce NegotiationToM, a new benchmark designed to stress-test machine ToM in real-world negotiation surrounding covered multi-dimensional mental states (i.e., desires, beliefs, and intentions). Our benchmark builds upon the Belief-Desire-Intention (BDI) agent modeling theory and conducts the necessary empirical experiments to evaluate large language models. Our findings demonstrate that NegotiationToM is challenging for state-of-the-art LLMs, as they consistently perform significantly worse than humans, even when employing the chain-of-thought (CoT) method., Comment: Accepted to EMNLP 2024 findings. Dataset: https://github.com/HKUST-KnowComp/NegotiationToM
Published: 2024

38. DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Author: Li, Haoran, Shi, Haolin, Zhang, Wenli, Wu, Wenjun, Liao, Yong, Wang, Lin, Lee, Lik-hang, and Zhou, Pengyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework, to tackle the aforementioned three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to form fast, semantically rich, and high-quality representations. FPS uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at https://dreamscene-project.github.io .
Published: 2024

39. LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models

Author: Li, Haoran, Liu, Junqi, Wang, Zexian, Luo, Shiyuan, Jia, Xiaowei, and Yao, Huaxiu
Subjects: Computer Science - Computation and Language
Abstract: The modeling of environmental ecosystems plays a pivotal role in the sustainable management of our planet. Accurate prediction of key environmental variables over space and time can aid in informed policy and decision-making, thus improving people's livelihood. Recently, deep learning-based methods have shown promise in modeling the spatial-temporal relationships for predicting environmental variables. However, these approaches often fall short in handling incomplete features and distribution shifts, which are commonly observed in environmental data due to the substantial cost of data collection and malfunctions in measuring instruments. To address these issues, we propose LITE -- a multimodal large language model for environmental ecosystems modeling. Specifically, LITE unifies different environmental variables by transforming them into natural language descriptions and line graph images. Then, LITE utilizes unified encoders to capture spatial-temporal dynamics and correlations in different modalities. During this step, the incomplete features are imputed by a sparse Mixture-of-Experts framework, and the distribution shift is handled by incorporating multi-granularity information from past observations. Finally, guided by domain instructions, a language model is employed to fuse the multimodal representations for the prediction. Our experiments demonstrate that LITE significantly enhances performance in environmental spatial-temporal prediction across different domains compared to the best baseline, with a 41.25% reduction in prediction error. This justifies its effectiveness. Our data and code are available at https://github.com/hrlics/LITE., Comment: COLM camera-ready version. Code is released at https://github.com/hrlics/LITE
Published: 2024

40. A Lane Usage Strategy for General Traffic Access on Bus Lanes under Mixed Traffic Environment

Author: Li, Haoran, Yuan, Zhenzhou, Yue, Rui, Yang, Guangchuan, Zhu, Chuang, and Chen, Siyuan
Subjects: Mathematics - Optimization and Control
Abstract: The strategy of permitting general traffic to use the bus lane for improved utilization while ensuring bus priority has gained increasingly attention, particularly with the support of vehicle-to-everything technology. In this study, we propose a novel lane usage strategy called Dynamic Spatial-Temporal Priority (DSTP) to ensure bus priority and optimize bus lane usage in a mixed traffic environment. DSTP leverages dynamic methods to identify available spatial-temporal resources in the lane, utilizing signal timing, road information, and vehicle data. A Right-of-Way assignment optimization model is then developed based on these resources to determine which vehicles can enter the bus lane. The model is dynamically enacted using a rolling horizon scheme to accommodate time-varying traffic conditions. Numerical studies have validated the advantages of DSTP, showing maintained bus priority, improved traffic efficiency, reduced fuel consumption, and lower CO2 emissions, especially during periods of high traffic demand and concentrated bus arrivals., Comment: 16 pages, 22 figures
Published: 2024

41. SDPL: Shifting-Dense Partition Learning for UAV-View Geo-Localization

Author: Chen, Quan, Wang, Tingyu, Yang, Zihao, Li, Haoran, Lu, Rongfeng, Sun, Yaoqi, Zheng, Bolun, and Yan, Chenggang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Cross-view geo-localization aims to match images of the same target from different platforms, e.g., drone and satellite. It is a challenging task due to the changing appearance of targets and environmental content from different views. Most methods focus on obtaining more comprehensive information through feature map segmentation, while inevitably destroying the image structure, and are sensitive to the shifting and scale of the target in the query. To address the above issues, we introduce simple yet effective part-based representation learning, shifting-dense partition learning (SDPL). We propose a dense partition strategy (DPS), dividing the image into multiple parts to explore contextual information while explicitly maintaining the global structure. To handle scenarios with non-centered targets, we further propose the shifting-fusion strategy, which generates multiple sets of parts in parallel based on various segmentation centers, and then adaptively fuses all features to integrate their anti-offset ability. Extensive experiments show that SDPL is robust to position shifting, and performs com-petitively on two prevailing benchmarks, University-1652 and SUES-200. In addition, SDPL shows satisfactory compatibility with a variety of backbone networks (e.g., ResNet and Swin). https://github.com/C-water/SDPL release., Comment: IEEE TCSVT 2024
Published: 2024
Full Text: View/download PDF

42. Real-World Primary Resistance to First-Line Immune-Based Combinations in Patients with Advanced Renal Cell Carcinoma (ARON-1)

Author: Santini, Daniele, Li, Haoran, Roviello, Giandomenico, Park, Se Hoon, Grande, Enrique, Kucharz, Jakub, Basso, Umberto, Fiala, Ondrej, Monteiro, Fernando Sabino Marques, Poprach, Alexandr, Buti, Sebastiano, Molina-Cerrillo, Javier, Catalano, Martina, Buchler, Tomas, Seront, Emmanuel, Ansari, Jawaher, Myint, Zin W., Ghosn, Marwan, Calabrò, Fabio, Kopp, Ray Manneh, Bhuva, Dipen, Bourlon, Maria T., Roberto, Michela, Di Civita, Mattia Alberto, Mollica, Veronica, Marchetti, Andrea, Soares, Andrey, Battelli, Nicola, Ricci, Marco, Kanesvaran, Ravindran, Bamias, Aristotelis, Porta, Camillo, Massari, Francesco, and Santoni, Matteo
Published: 2024
Full Text: View/download PDF

43. Aerial-view geo-localization based on multi-layer local pattern cross-attention network

Author: Li, Haoran, Wang, Tingyu, Chen, Quan, Zhao, Qiang, Jiang, Shaowei, Yan, Chenggang, and Zheng, Bolun
Published: 2024
Full Text: View/download PDF

44. Research on active control of rigid-flexible coupling piezoelectric rectangular thin plate

Author: Li, Haoran, Zhang, Jie, Fan, Mu, and Xiao, Zhongmin
Published: 2024
Full Text: View/download PDF

45. Model selection of GLMMs in the analysis of count data in single-case studies: A Monte Carlo simulation

Author: Li, Haoran
Published: 2024
Full Text: View/download PDF

46. 3D Analysis of Breast Morphological Changes after Vertical-Scar Reduction Mammoplasty: A Prospective Study

Author: Li, Haoran, Lin, Yan, Zhang, Xiaoyu, Li, Zhengyao, and Mu, Dali
Published: 2024
Full Text: View/download PDF

47. Rapid tetracycline degradation by S-scheme Se/g-C3N4 heterostructure

Author: Li, Haoran, Guo, Zhijun, Azimi, Hassanali, Ebadi, Mehdi, Shirmardi, Abbas, and Yousefi, Ramin
Published: 2024
Full Text: View/download PDF

48. Aerodynamic Prediction of a CRM High-lift Configuration using a modified three equation turbulence mode

Author: Zhang, Shaoguang, Li, Haoran, and Zhang, Yufei
Subjects: Physics - Fluid Dynamics
Abstract: Aerodynamic simulations were carried out in the study presented in this paper focusing on the stall performance of the High-Lift Common Research Model obtained from the fourth AIAA High-Lift Prediction Workshop. Various turbulence models of Reynolds-average Navier-Stokes simulations are analyzed. A modified version of the transitional k-(v^2 )-{\omega} model developed to enhance stall prediction accuracy for high-lift configurations with a nacelle chine. The vortex generator, three-element airfoil, and high-lift model are numerically simulated. The results reveal that implementing a k-(v^2 )-{\omega} model with separation shear layer fixed notably enhances the stall prediction behavior for both the three-element airfoil and high-lift configuration without affecting the prediction of the vortex strength of a vortex generator. Moreover, incorporating rotation correction into the SPF k-(v^2 )-{\omega} model improves the prediction of vortex strength and further enhances stall prediction for the high-lift configuration. The relative error in predicting the maximum lift coefficient is less than 5% of the experimental data. The study also investigated the impact of the nacelle chine on the stall behavior of the high-lift configuration. The results demonstrate that the chine vortex can mitigate the adverse effects of the nacelle/pylon vortex system and increase the maximum lift coefficient., Comment: 40 pages, 34 figures
Published: 2024
Full Text: View/download PDF

49. Federated Neural Graph Databases

Author: Hu, Qi, Jiang, Weifeng, Li, Haoran, Wang, Zihao, Bai, Jiaxin, Mao, Qianren, Song, Yangqiu, Fan, Lixin, and Li, Jianxin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Databases
Abstract: The increasing demand for large-scale language models (LLMs) has highlighted the importance of efficient data retrieval mechanisms. Neural graph databases (NGDBs) have emerged as a promising approach to storing and querying graph-structured data in neural space, enabling the retrieval of relevant information for LLMs. However, existing NGDBs are typically designed to operate on a single graph, limiting their ability to reason across multiple graphs. Furthermore, the lack of support for multi-source graph data in existing NGDBs hinders their ability to capture the complexity and diversity of real-world data. In many applications, data is distributed across multiple sources, and the ability to reason across these sources is crucial for making informed decisions. This limitation is particularly problematic when dealing with sensitive graph data, as directly sharing and aggregating such data poses significant privacy risks. As a result, many applications that rely on NGDBs are forced to choose between compromising data privacy or sacrificing the ability to reason across multiple graphs. To address these limitations, we propose Federated Neural Graph Database (FedNGDB), a novel framework that enables reasoning over multi-source graph-based data while preserving privacy. FedNGDB leverages federated learning to collaboratively learn graph representations across multiple sources, enriching relationships between entities and improving the overall quality of the graph data. Unlike existing methods, FedNGDB can handle complex graph structures and relationships, making it suitable for various downstream tasks.
Published: 2024

50. Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Author: Li, Haoran, Dong, Qingxiu, Tang, Zhengyang, Wang, Chaojun, Zhang, Xingxing, Huang, Haoyang, Huang, Shaohan, Huang, Xiaolong, Huang, Zeqiang, Zhang, Dongdong, Gu, Yuxian, Cheng, Xin, Wang, Xun, Chen, Si-Qing, Dong, Li, Lu, Wei, Sui, Zhifang, Wang, Benyou, Lam, Wai, and Wei, Furu
Subjects: Computer Science - Computation and Language
Abstract: We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs). Unlike prior work that relies on seed examples or existing datasets to construct instruction tuning data, GLAN exclusively utilizes a pre-curated taxonomy of human knowledge and capabilities as input and generates large-scale synthetic instruction data across all disciplines. Specifically, inspired by the systematic structure in human education system, we build the taxonomy by decomposing human knowledge and capabilities to various fields, sub-fields and ultimately, distinct disciplines semi-automatically, facilitated by LLMs. Subsequently, we generate a comprehensive list of subjects for every discipline and proceed to design a syllabus tailored to each subject, again utilizing LLMs. With the fine-grained key concepts detailed in every class session of the syllabus, we are able to generate diverse instructions with a broad coverage across the entire spectrum of human knowledge and skills. Extensive experiments on large language models (e.g., Mistral) demonstrate that GLAN excels in multiple dimensions from mathematical reasoning, coding, academic exams, logical reasoning to general instruction following without using task-specific training data of these tasks. In addition, GLAN allows for easy customization and new fields or skills can be added by simply incorporating a new node into our taxonomy., Comment: Work in progress
Published: 2024

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

5,873 results on '"Li, Haoran"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources