Author: "Ma, Wei" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ma, Wei"' showing total 16,825 results

Start Over Author "Ma, Wei"

16,825 results on '"Ma, Wei"'

1. Can Multimodal Large Language Model Think Analogically?

Author: Guo, Diandian, Cao, Cong, Yuan, Fangfang, Wang, Dakui, Ma, Wei, Liu, Yanbing, and Fu, Jianhui
Subjects: Computer Science - Computation and Language
Abstract: Analogical reasoning, particularly in multimodal contexts, is the foundation of human perception and creativity. Multimodal Large Language Model (MLLM) has recently sparked considerable discussion due to its emergent capabilities. In this paper, we delve into the multimodal analogical reasoning capability of MLLM. Specifically, we explore two facets: \textit{MLLM as an explainer} and \textit{MLLM as a predictor}. In \textit{MLLM as an explainer}, we primarily focus on whether MLLM can deeply comprehend multimodal analogical reasoning problems. We propose a unified prompt template and a method for harnessing the comprehension capabilities of MLLM to augment existing models. In \textit{MLLM as a predictor}, we aim to determine whether MLLM can directly solve multimodal analogical reasoning problems. The experiments show that our approach outperforms existing methods on popular datasets, providing preliminary evidence for the analogical reasoning capability of MLLM.
Published: 2024

2. Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization

Author: He, Junlin, Du, Jinxiao, and Ma, Wei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Self-supervised learning (SSL) has rapidly advanced in recent years, approaching the performance of its supervised counterparts through the extraction of representations from unlabeled data. However, dimensional collapse, where a few large eigenvalues dominate the eigenspace, poses a significant obstacle for SSL. When dimensional collapse occurs on features (e.g. hidden features and representations), it prevents features from representing the full information of the data; when dimensional collapse occurs on weight matrices, their filters are self-related and redundant, limiting their expressive power. Existing studies have predominantly concentrated on the dimensional collapse of representations, neglecting whether this can sufficiently prevent the dimensional collapse of the weight matrices and hidden features. To this end, we first time propose a mitigation approach employing orthogonal regularization (OR) across the encoder, targeting both convolutional and linear layers during pretraining. OR promotes orthogonality within weight matrices, thus safeguarding against the dimensional collapse of weight matrices, hidden features, and representations. Our empirical investigations demonstrate that OR significantly enhances the performance of SSL methods across diverse benchmarks, yielding consistent gains with both CNNs and Transformer-based architectures., Comment: accepted by NeurIPS 2024 as a poster
Published: 2024

3. Preventing Model Collapse in Deep Canonical Correlation Analysis by Noise Regularization

Author: He, Junlin, Du, Jinxiao, Xu, Susu, and Ma, Wei
Subjects: Computer Science - Machine Learning
Abstract: Multi-View Representation Learning (MVRL) aims to learn a unified representation of an object from multi-view data. Deep Canonical Correlation Analysis (DCCA) and its variants share simple formulations and demonstrate state-of-the-art performance. However, with extensive experiments, we observe the issue of model collapse, {\em i.e.}, the performance of DCCA-based methods will drop drastically when training proceeds. The model collapse issue could significantly hinder the wide adoption of DCCA-based methods because it is challenging to decide when to early stop. To this end, we develop NR-DCCA, which is equipped with a novel noise regularization approach to prevent model collapse. Theoretical analysis shows that the Correlation Invariant Property is the key to preventing model collapse, and our noise regularization forces the neural network to possess such a property. A framework to construct synthetic data with different common and complementary information is also developed to compare MVRL methods comprehensively. The developed NR-DCCA outperforms baselines stably and consistently in both synthetic and real-world datasets, and the proposed noise regularization approach can also be generalized to other DCCA-based methods such as DGCCA., Comment: Accepted by NeurIPS 2024 as a poster
Published: 2024

4. Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification

Author: Kuo, Hsun-Yu, Liao, Yin-Hsiang, Chao, Yu-Chieh, Ma, Wei-Yun, and Cheng, Pu-Jen
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Synthetic data augmentation via large language models (LLMs) allows researchers to leverage additional training data, thus enhancing the performance of downstream tasks, especially when real-world data is scarce. However, the generated data can deviate from the real-world data, and this misalignment can bring deficient outcomes while applying the trained model to applications. Therefore, we proposed efficient weighted-loss approaches to align synthetic data with real-world distribution by emphasizing high-quality and diversified data generated by LLMs with using merely a little real-world data. We empirically assessed the effectiveness of our method on multiple text classification tasks, and the results showed leveraging our approaches on a BERT-level model robustly outperformed standard cross-entropy and other data weighting approaches, providing potential solutions to effectively leveraging synthetic data from any suitable data generator for model training., Comment: 12 pages, 7 figures
Published: 2024

5. Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning

Author: Tang, Yihong, Qu, Ao, Wang, Zhaokai, Zhuang, Dingyi, Wu, Zhaofeng, Ma, Wei, Wang, Shenhao, Zheng, Yunhan, Zhao, Zhan, and Zhao, Jinhua
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Vision language models (VLMs) have demonstrated impressive performance across a wide range of downstream tasks. However, their proficiency in spatial reasoning remains limited, despite its crucial role in tasks involving navigation and interaction with physical environments. Specifically, much of the spatial reasoning in these tasks occurs in two-dimensional (2D) environments, and our evaluation reveals that state-of-the-art VLMs frequently generate implausible and incorrect responses to composite spatial reasoning problems, including simple pathfinding tasks that humans can solve effortlessly at a glance. To address this, we explore an effective approach to enhance 2D spatial reasoning within VLMs by training the model on basic spatial capabilities. We begin by disentangling the key components of 2D spatial reasoning: direction comprehension, distance estimation, and localization. Our central hypothesis is that mastering these basic spatial capabilities can significantly enhance a model's performance on composite spatial tasks requiring advanced spatial understanding and combinatorial problem-solving. To investigate this hypothesis, we introduce Sparkle, a framework that fine-tunes VLMs on these three basic spatial capabilities by synthetic data generation and targeted supervision to form an instruction dataset for each capability. Our experiments demonstrate that VLMs fine-tuned with Sparkle achieve significant performance gains, not only in the basic tasks themselves but also in generalizing to composite and out-of-distribution spatial reasoning tasks (e.g., improving from 13.5% to 40.0% on the shortest path problem). These findings underscore the effectiveness of mastering basic spatial capabilities in enhancing composite spatial problem-solving, offering insights for improving VLMs' spatial reasoning capabilities.
Published: 2024

6. Large Language Model-Enhanced Reinforcement Learning for Generic Bus Holding Control Strategies

Author: Yu, Jiajie, Wang, Yuhong, and Ma, Wei
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Bus holding control is a widely-adopted strategy for maintaining stability and improving the operational efficiency of bus systems. Traditional model-based methods often face challenges with the low accuracy of bus state prediction and passenger demand estimation. In contrast, Reinforcement Learning (RL), as a data-driven approach, has demonstrated great potential in formulating bus holding strategies. RL determines the optimal control strategies in order to maximize the cumulative reward, which reflects the overall control goals. However, translating sparse and delayed control goals in real-world tasks into dense and real-time rewards for RL is challenging, normally requiring extensive manual trial-and-error. In view of this, this study introduces an automatic reward generation paradigm by leveraging the in-context learning and reasoning capabilities of Large Language Models (LLMs). This new paradigm, termed the LLM-enhanced RL, comprises several LLM-based modules: reward initializer, reward modifier, performance analyzer, and reward refiner. These modules cooperate to initialize and iteratively improve the reward function according to the feedback from training and test results for the specified RL-based task. Ineffective reward functions generated by the LLM are filtered out to ensure the stable evolution of the RL agents' performance over iterations. To evaluate the feasibility of the proposed LLM-enhanced RL paradigm, it is applied to various bus holding control scenarios, including a synthetic single-line system and a real-world multi-line system. The results demonstrate the superiority and robustness of the proposed paradigm compared to vanilla RL strategies, the LLM-based controller, and conventional space headway-based feedback control. This study sheds light on the great potential of utilizing LLMs in various smart mobility applications., Comment: 41 pages, 15 figures
Published: 2024

7. UniGEM: A Unified Approach to Generation and Property Prediction for Molecules

Author: Feng, Shikun, Ni, Yuyan, Lu, Yan, Ma, Zhi-Ming, Ma, Wei-Ying, and Lan, Yanyan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Biomolecules
Abstract: Molecular generation and molecular property prediction are both crucial for drug discovery, but they are often developed independently. Inspired by recent studies, which demonstrate that diffusion model, a prominent generative approach, can learn meaningful data representations that enhance predictive tasks, we explore the potential for developing a unified generative model in the molecular domain that effectively addresses both molecular generation and property prediction tasks. However, the integration of these tasks is challenging due to inherent inconsistencies, making simple multi-task learning ineffective. To address this, we propose UniGEM, the first unified model to successfully integrate molecular generation and property prediction, delivering superior performance in both tasks. Our key innovation lies in a novel two-phase generative process, where predictive tasks are activated in the later stages, after the molecular scaffold is formed. We further enhance task balance through innovative training strategies. Rigorous theoretical analysis and comprehensive experiments demonstrate our significant improvements in both tasks. The principles behind UniGEM hold promise for broader applications, including natural language processing and computer vision., Comment: 11 pages, 5 figures
Published: 2024

8. Exploring the Diversity of Nuclear Density through Information Entropy

Author: Ma, Wei-Hu and Ma, Yu-Gang
Subjects: Nuclear Theory, Nuclear Experiment
Abstract: This study explores the role of information entropy in understanding nuclear density distributions, including both stable configurations and non-traditional structures such as neutron halos and $\alpha$-clustering. By quantifying the uncertainty and disorder inherent in nucleon distributions in nuclear many-body systems, information entropy provides a macroscopic measure of the physical properties of the system. A more dispersed and disordered density distribution results in a higher value of information entropy. This intrinsic relationship between information entropy and system complexity allows us to quantify uncertainty and disorder in nuclear structures by analyzing various geometric parameters such as nuclear radius, diffuseness, neutron skin, and cluster structural features., Comment: 10 pages, 4 figures
Published: 2024
Full Text: View/download PDF

9. PathSeeker: Exploring LLM Security Vulnerabilities with a Reinforcement Learning-Based Jailbreak Approach

Author: Lin, Zhihao, Ma, Wei, Zhou, Mingyi, Zhao, Yanjie, Wang, Haoyu, Liu, Yang, Wang, Jun, and Li, Li
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
Abstract: In recent years, Large Language Models (LLMs) have gained widespread use, raising concerns about their security. Traditional jailbreak attacks, which often rely on the model internal information or have limitations when exploring the unsafe behavior of the victim model, limiting their reducing their general applicability. In this paper, we introduce PathSeeker, a novel black-box jailbreak method, which is inspired by the game of rats escaping a maze. We think that each LLM has its unique "security maze", and attackers attempt to find the exit learning from the received feedback and their accumulated experience to compromise the target LLM's security defences. Our approach leverages multi-agent reinforcement learning, where smaller models collaborate to guide the main LLM in performing mutation operations to achieve the attack objectives. By progressively modifying inputs based on the model's feedback, our system induces richer, harmful responses. During our manual attempts to perform jailbreak attacks, we found that the vocabulary of the response of the target model gradually became richer and eventually produced harmful responses. Based on the observation, we also introduce a reward mechanism that exploits the expansion of vocabulary richness in LLM responses to weaken security constraints. Our method outperforms five state-of-the-art attack techniques when tested across 13 commercial and open-source LLMs, achieving high attack success rates, especially in strongly aligned commercial models like GPT-4o-mini, Claude-3.5, and GLM-4-air with strong safety alignment. This study aims to improve the understanding of LLM security vulnerabilities and we hope that this sturdy can contribute to the development of more robust defenses., Comment: update the abstract and cite a new related work
Published: 2024

10. Incorporating external data for analyzing randomized clinical trials: A transfer learning approach

Author: Gu, Yujia, Liu, Hanzhong, and Ma, Wei
Subjects: Statistics - Methodology
Abstract: Randomized clinical trials are the gold standard for analyzing treatment effects, but high costs and ethical concerns can limit recruitment, potentially leading to invalid inferences. Incorporating external trial data with similar characteristics into the analysis using transfer learning appears promising for addressing these issues. In this paper, we present a formal framework for applying transfer learning to the analysis of clinical trials, considering three key perspectives: transfer algorithm, theoretical foundation, and inference method. For the algorithm, we adopt a parameter-based transfer learning approach to enhance the lasso-adjusted stratum-specific estimator developed for estimating treatment effects. A key component in constructing the transfer learning estimator is deriving the regression coefficient estimates within each stratum, accounting for the bias between source and target data. To provide a theoretical foundation, we derive the $l_1$ convergence rate for the estimated regression coefficients and establish the asymptotic normality of the transfer learning estimator. Our results show that when external trial data resembles current trial data, the sample size requirements can be reduced compared to using only the current trial data. Finally, we propose a consistent nonparametric variance estimator to facilitate inference. Numerical studies demonstrate the effectiveness and robustness of our proposed estimator across various scenarios.
Published: 2024

11. Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach

Author: Nie, Tong, He, Junlin, Mei, Yuewen, Qin, Guoyang, Li, Guilong, Sun, Jian, and Ma, Wei
Subjects: Computer Science - Machine Learning
Abstract: The proliferation of e-commerce and urbanization has significantly intensified delivery operations in urban areas, boosting the volume and complexity of delivery demand. Data-driven predictive methods, especially those utilizing machine learning techniques, have emerged to handle these complexities in urban delivery demand management problems. One particularly pressing problem that has not yet been sufficiently studied is the joint estimation and prediction of city-wide delivery demand. To this end, we formulate this problem as a graph-based spatiotemporal learning task. First, a message-passing neural network model is formalized to capture the interaction between demand patterns of associated regions. Second, by exploiting recent advances in large language models, we extract general geospatial knowledge encodings from the unstructured locational data and integrate them into the demand predictor. Last, to encourage the cross-city transferability of the model, an inductive training scheme is developed in an end-to-end routine. Extensive empirical results on two real-world delivery datasets, including eight cities in China and the US, demonstrate that our model significantly outperforms state-of-the-art baselines in these challenging tasks.
Published: 2024

12. Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning

Author: He, Junlin, Nie, Tong, and Ma, Wei
Subjects: Computer Science - Artificial Intelligence
Abstract: In the geospatial domain, universal representation models are significantly less prevalent than their extensive use in natural language processing and computer vision. This discrepancy arises primarily from the high costs associated with the input of existing representation models, which often require street views and mobility data. To address this, we develop a novel, training-free method that leverages large language models (LLMs) and auxiliary map data from OpenStreetMap to derive geolocation representations (LLMGeovec). LLMGeovec can represent the geographic semantics of city, country, and global scales, which acts as a generic enhancer for spatio-temporal learning. Specifically, by direct feature concatenation, we introduce a simple yet effective paradigm for enhancing multiple spatio-temporal tasks including geographic prediction (GP), long-term time series forecasting (LTSF), and graph-based spatio-temporal forecasting (GSTF). LLMGeovec can seamlessly integrate into a wide spectrum of spatio-temporal learning models, providing immediate enhancements. Experimental results demonstrate that LLMGeovec achieves global coverage and significantly boosts the performance of leading GP, LTSF, and GSTF models.
Published: 2024

13. Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Author: Liu, Benlin, Dong, Yuhao, Wang, Yiqin, Rao, Yongming, Tang, Yansong, Ma, Wei-Chiu, and Krishna, Ranjay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Multimodal language models (MLLMs) are increasingly being implemented in real-world environments, necessitating their ability to interpret 3D spaces and comprehend temporal dynamics. Despite their potential, current top models within our community still fall short in adequately understanding spatial and temporal dimensions. We introduce Coarse Correspondence, a simple, training-free, effective, and general-purpose visual prompting method to elicit 3D and temporal understanding in multimodal LLMs. Our method uses a lightweight tracking model to find object correspondences between frames in a video or between sets of image viewpoints. It selects the most frequent object instances and visualizes them with markers with unique IDs in the image. With this simple approach, we achieve state-of-the-art results on 3D understanding benchmarks including ScanQA (+20.5\%) and a subset of OpenEQA (+9.7\%), and on long-form video benchmarks such as EgoSchema (+6.0\%). We also curate a small diagnostic dataset to evaluate whether MLLMs can reason about space from a described viewpoint other than the camera viewpoint. Again, Coarse Correspondence improves spatial perspective-taking abilities but we highlight that MLLMs struggle with this task. Together, we demonstrate that our simple prompting method can significantly aid downstream tasks that require 3D or temporal reasoning., Comment: project page: https://coarse-correspondence.github.io
Published: 2024

14. Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems

Author: Liao, Qionghua, Li, Guilong, Yu, Jiajie, Gu, Ziyuan, and Ma, Wei
Subjects: Computer Science - Computational Engineering, Finance, and Science
Abstract: With the proliferation of electric vehicles (EVs), the transportation network and power grid become increasingly interdependent and coupled via charging stations. The concomitant growth in charging demand has posed challenges for both networks, highlighting the importance of charging coordination. Existing literature largely overlooks the interactions between power grid security and traffic efficiency. In view of this, we study the en-route charging station (CS) recommendation problem for EVs in dynamically coupled transportation-power systems. The system-level objective is to maximize the overall traffic efficiency while ensuring the safety of the power grid. This problem is for the first time formulated as a constrained Markov decision process (CMDP), and an online prediction-assisted safe reinforcement learning (OP-SRL) method is proposed to learn the optimal and secure policy by extending the PPO method. To be specific, we mainly address two challenges. First, the constrained optimization problem is converted into an equivalent unconstrained optimization problem by applying the Lagrangian method. Second, to account for the uncertain long-time delay between performing CS recommendation and commencing charging, we put forward an online sequence-to-sequence (Seq2Seq) predictor for state augmentation to guide the agent in making forward-thinking decisions. Finally, we conduct comprehensive experimental studies based on the Nguyen-Dupuis network and a large-scale real-world road network, coupled with IEEE 33-bus and IEEE 69-bus distribution systems, respectively. Results demonstrate that the proposed method outperforms baselines in terms of road network efficiency, power grid safety, and EV user satisfaction. The case study on the real-world network also illustrates the applicability in the practical context., Comment: 33 pages, 31 figures
Published: 2024

15. Channel-Aware Low-Rank Adaptation in Time Series Forecasting

Author: Nie, Tong, Mei, Yuewen, Qin, Guoyang, Sun, Jian, and Ma, Wei
Subjects: Computer Science - Machine Learning
Abstract: The balance between model capacity and generalization has been a key focus of recent discussions in long-term time series forecasting. Two representative channel strategies are closely associated with model expressivity and robustness, including channel independence (CI) and channel dependence (CD). The former adopts individual channel treatment and has been shown to be more robust to distribution shifts, but lacks sufficient capacity to model meaningful channel interactions. The latter is more expressive for representing complex cross-channel dependencies, but is prone to overfitting. To balance the two strategies, we present a channel-aware low-rank adaptation method to condition CD models on identity-aware individual components. As a plug-in solution, it is adaptable for a wide range of backbone architectures. Extensive experiments show that it can consistently and significantly improve the performance of both CI and CD models with demonstrated efficiency and flexibility. The code is available at https://github.com/tongnie/C-LoRA., Comment: Accepted by CIKM 2024, short research paper track
Published: 2024
Full Text: View/download PDF

16. Pre-training with Fractional Denoising to Enhance Molecular Property Prediction

Author: Ni, Yuyan, Feng, Shikun, Hong, Xin, Sun, Yuancheng, Ma, Wei-Ying, Ma, Zhi-Ming, Ye, Qiwei, and Lan, Yanyan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Physics - Chemical Physics
Abstract: Deep learning methods have been considered promising for accelerating molecular screening in drug discovery and material design. Due to the limited availability of labelled data, various self-supervised molecular pre-training methods have been presented. While many existing methods utilize common pre-training tasks in computer vision (CV) and natural language processing (NLP), they often overlook the fundamental physical principles governing molecules. In contrast, applying denoising in pre-training can be interpreted as an equivalent force learning, but the limited noise distribution introduces bias into the molecular distribution. To address this issue, we introduce a molecular pre-training framework called fractional denoising (Frad), which decouples noise design from the constraints imposed by force learning equivalence. In this way, the noise becomes customizable, allowing for incorporating chemical priors to significantly improve molecular distribution modeling. Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties, and binding affinity tasks. The refined noise design enhances force accuracy and sampling coverage, which contribute to the creation of physically consistent molecular representations, ultimately leading to superior predictive performance.
Published: 2024

17. Treatment effect estimation under covariate-adaptive randomization with heavy-tailed outcomes

Author: Li, Hongzi, Ma, Wei, Ma, Yingying, and Liu, Hanzhong
Subjects: Statistics - Methodology
Abstract: Randomized experiments are the gold standard for investigating causal relationships, with comparisons of potential outcomes under different treatment groups used to estimate treatment effects. However, outcomes with heavy-tailed distributions pose significant challenges to traditional statistical approaches. While recent studies have explored these issues under simple randomization, their application in more complex randomization designs, such as stratified randomization or covariate-adaptive randomization, has not been adequately addressed. To fill the gap, this paper examines the properties of the estimated influence function-based M-estimator under covariate-adaptive randomization with heavy-tailed outcomes, demonstrating its consistency and asymptotic normality. Yet, the existing variance estimator tends to overestimate the asymptotic variance, especially under more balanced designs, and lacks universal applicability across randomization methods. To remedy this, we introduce a novel stratified transformed difference-in-means estimator to enhance efficiency and propose a universally applicable variance estimator to facilitate valid inferences. Additionally, we establish the consistency of kernel-based density estimation in the context of covariate-adaptive randomization. Numerical results demonstrate the effectiveness of the proposed methods in finite samples.
Published: 2024

18. Generalized Gouy Rotation of Electron Vortex beams in uniform magnetic fields

Author: Meng, Qi, Liu, Xuan, Ma, Wei, Yang, Zhen, Lu, Liang, Silenko, Alexander J., Zhang, Pengming, and Zou, Liping
Subjects: Quantum Physics, Physics - Accelerator Physics, Physics - Optics
Abstract: The rotation of electron vortex beams (EVBs) presents a complex interplay of the Gouy phase characterizing free-space behavior and Landau states or Larmor rotation observed in magnetic fields. Despite being studied separately, these phenomena manifest within a single beam during its propagation in magnetic fields, lacking a comprehensive description. We address this by utilizing exact solutions of the relativistic paraxial equation in magnetic fields, termed "paraxial Landau modes". The paraxial Landau modes describe the quantum states of EVBs in magnetic fields. Our study of rotation angles demonstrates consistency with experimental data, supporting the practical presence of these modes. We provide a unified description of different regimes under generalized Gouy rotation, linking the Gouy phase to EVB rotation angles. This connection enhances our understanding of the Gouy phase and can be extended to nonuniform magnetic fields. Our theoretical analysis is validated through numerical simulations using the Chebyshev method. This work offers new insights into the dynamics of EVBs in magnetic fields and suggests practical applications in beam manipulation and beam optics of vortex particles.
Published: 2024

19. Task Me Anything

Author: Zhang, Jieyu, Huang, Weikai, Ma, Zixian, Michel, Oscar, He, Dong, Gupta, Tanmay, Ma, Wei-Chiu, Farhadi, Ali, Kembhavi, Aniruddha, and Krishna, Ranjay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their specific use case. This paper introduces Task-Me-Anything, a benchmark generation engine which produces a benchmark tailored to a user's needs. Task-Me-Anything maintains an extendable taxonomy of visual assets and can programmatically generate a vast number of task instances. Additionally, it algorithmically addresses user queries regarding MLM performance efficiently within a computational budget. It contains 113K images, 10K videos, 2K 3D object assets, over 365 object categories, 655 attributes, and 335 relationships. It can generate 750M image/video question-answering pairs, which focus on evaluating MLM perceptual capabilities. Task-Me-Anything reveals critical insights: open-source MLMs excel in object and attribute recognition but lack spatial and temporal understanding; each model exhibits unique strengths and weaknesses; larger models generally perform better, though exceptions exist; and GPT4o demonstrates challenges in recognizing rotating/moving objects and distinguishing colors., Comment: website: https://www.task-me-anything.org
Published: 2024

20. Fudan Multi-purpose Active TArget Time Projection Chamber (fMeta-TPC) for Photonnuclear Reaction Experiments

Author: Wu, Huang-Kai, Wang, Xi-Yang, Wang, Yu-Miao, Wang, You-Jing, Fang, De-Qing, He, Wan-Bing, Ma, Wei-Hu, Cao, Xi-Guang, Fu, Chang-Bo, Deng, Xian-Gai, and Ma, Yu-Gang
Subjects: Physics - Instrumentation and Detectors, Nuclear Experiment, Nuclear Theory
Abstract: Active Target Time Projection Chambers (AT-TPCs) are state-of-the-art tools in the field of low-energy nuclear physics, particularly suitable for experiments using low-intensity radioactive ion beams or gamma rays. The Fudan Multi-purpose Active Target Time Projection Chamber (fMeta-TPC) with 2048 channels has been developed to study $\alpha$-clustering nuclei. {\fcb In this work, the focus is on the study of the photonuclear reaction with the Laser Compton Scattering (LCS) gamma source, especially for the decay of the highly excited $\alpha$-cluster state.} The design of fMeta-TPC is described and a comprehensive evaluation of its offline performance is performed by ultraviolet (UV) laser and $^{241}$Am $\alpha$ source. The result shows that the intrinsic angular resolution of the detector is within 0.30$^{\circ}$ and has an energy resolution of 6.85\% for 3.0 MeV $\alpha$ particles. The gain uniformity of the detector is about 10\% (RMS/Mean), tested by the $^{55}$Fe X-ray source., Comment: 10 pages, 12 figures
Published: 2024

21. From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

Author: Gao, Bowen, Tan, Haichuan, Huang, Yanwen, Ren, Minsi, Huang, Xiao, Ma, Wei-Ying, Zhang, Ya-Qin, and Lan, Yanyan
Subjects: Quantitative Biology - Biomolecules, Computer Science - Machine Learning
Abstract: Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability of the Vina docking score, the current standard for assessing binding abilities, is increasingly questioned due to its susceptibility to overfitting. To address these limitations, we propose a comprehensive evaluation framework that includes assessing the similarity of generated molecules to known active compounds, introducing a virtual screening-based metric for practical deployment capabilities, and re-evaluating binding affinity more rigorously. Our experiments reveal that while current SBDD models achieve high Vina scores, they fall short in practical usability metrics, highlighting a significant gap between theoretical predictions and real-world applicability. Our proposed metrics and dataset aim to bridge this gap, enhancing the practical applicability of future SBDD models and aligning them more closely with the needs of pharmaceutical research and development.
Published: 2024

22. SIU: A Million-Scale Structural Small Molecule-Protein Interaction Dataset for Unbiased Bioactivity Prediction

Author: Huang, Yanwen, Gao, Bowen, Jia, Yinjun, Ma, Hongbo, Ma, Wei-Ying, Zhang, Ya-Qin, and Lan, Yanyan
Subjects: Quantitative Biology - Biomolecules, Computer Science - Machine Learning
Abstract: Small molecules play a pivotal role in modern medicine, and scrutinizing their interactions with protein targets is essential for the discovery and development of novel, life-saving therapeutics. The term "bioactivity" encompasses various biological effects resulting from these interactions, including both binding and functional responses. The magnitude of bioactivity dictates the therapeutic or toxic pharmacological outcomes of small molecules, rendering accurate bioactivity prediction crucial for the development of safe and effective drugs. However, existing structural datasets of small molecule-protein interactions are often limited in scale and lack systematically organized bioactivity labels, thereby impeding our understanding of these interactions and precise bioactivity prediction. In this study, we introduce a comprehensive dataset of small molecule-protein interactions, consisting of over a million binding structures, each annotated with real biological activity labels. This dataset is designed to facilitate unbiased bioactivity prediction. We evaluated several classical models on this dataset, and the results demonstrate that the task of unbiased bioactivity prediction is challenging yet essential.
Published: 2024

23. Preserving Identity with Variational Score for General-purpose 3D Editing

Author: Le, Duong H., Pham, Tuan, Kembhavi, Aniruddha, Mandt, Stephan, Ma, Wei-Chiu, and Lu, Jiasen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models. Specifically, our approach is inspired by the recently proposed method for 2D image editing - Delta Denoising Score (DDS). We pinpoint the limitations in DDS for 2D and 3D editing, which causes detail loss and over-saturation. To address this, we propose an additional score distillation term that enforces identity preservation. This results in a more stable editing process, gradually optimizing NeRF models to match target prompts while retaining crucial input characteristics. We demonstrate the effectiveness of our approach in zero-shot image and neural field editing. Our method successfully alters visual attributes, adds both subtle and substantial structural elements, translates shapes, and achieves competitive results on standard 2D and 3D editing benchmarks. Additionally, our method imposes no constraints like masking or pre-training, making it compatible with a wide range of pre-trained diffusion models. This allows for versatile editing without needing neural field-to-mesh conversion, offering a more user-friendly experience., Comment: 22 pages, 14 figures
Published: 2024

24. MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis

Author: Feng, Shikun, Zheng, Jiaxin, Jia, Yinjun, Huang, Yanwen, Zhou, Fengfeng, Ma, Wei-Ying, and Lan, Yanyan
Subjects: Physics - Chemical Physics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Molecular representation learning is pivotal for various molecular property prediction tasks related to drug discovery. Robust and accurate benchmarks are essential for refining and validating current methods. Existing molecular property benchmarks derived from wet experiments, however, face limitations such as data volume constraints, unbalanced label distribution, and noisy labels. To address these issues, we construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules, meticulously designed to capture an extensive array of chemical, physical, and biological properties, derived through a robust computational ligand-target binding analysis pipeline. We conduct extensive experiments on various deep learning models, demonstrating that our dataset offers significant physicochemical interpretability to guide model development and design. Notably, the dataset's properties are linked to binding affinity metrics, providing additional insights into model performance in drug-target interaction tasks. We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning, thereby expediting progress in the field of artificial intelligence-driven drug discovery.
Published: 2024

25. Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner

Author: Nie, Tong, Qin, Guoyang, Ma, Wei, and Sun, Jian
Subjects: Computer Science - Machine Learning
Abstract: $\textbf{This is the conference version of our paper: Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner}$. Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system. Existing methods aim to reconstruct STTD using low-dimensional models. However, they are limited to data-specific dimensions or source-dependent patterns, restricting them from unifying representations. Here, we present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation. To discern the underlying dynamics in low-dimensional regimes, coordinate-based neural networks that can encode high-frequency structures are employed to directly map coordinates to traffic variables. To unravel the entangled spatial-temporal interactions, the variability is decomposed into separate processes. We further enable modeling in irregular spaces such as sensor graphs using spectral embedding. Through continuous representations, our approach enables the modeling of a variety of STTD with a unified input, thereby serving as a generalized learner of the underlying traffic dynamics. It is also shown that it can learn implicit low-rank priors and smoothness regularization from the data, making it versatile for learning different dominating data patterns. We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales. Empirical results not only indicate that our model has significant superiority over conventional low-rank models, but also highlight that the versatility of the approach. We anticipate that this pioneering modeling perspective could lay the foundation for universal representation of STTD in various real-world tasks. $\textbf{The full version can be found at:}$ https://doi.org/10.48550/arXiv.2405.03185., Comment: Accepted by the Conference in Emerging Technologies in Transportation Systems (TRC-30). arXiv admin note: substantial text overlap with arXiv:2405.03185
Published: 2024

26. ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models

Author: Shih, Meng-Li, Ma, Wei-Chiu, Boyice, Lorenzo, Holynski, Aleksander, Cole, Forrester, Curless, Brian L., and Kontkanen, Janne
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose ExtraNeRF, a novel method for extrapolating the range of views handled by a Neural Radiance Field (NeRF). Our main idea is to leverage NeRFs to model scene-specific, fine-grained details, while capitalizing on diffusion models to extrapolate beyond our observed data. A key ingredient is to track visibility to determine what portions of the scene have not been observed, and focus on reconstructing those regions consistently with diffusion models. Our primary contributions include a visibility-aware diffusion-based inpainting module that is fine-tuned on the input imagery, yielding an initial NeRF with moderate quality (often blurry) inpainted regions, followed by a second diffusion model trained on the input imagery to consistently enhance, notably sharpen, the inpainted imagery from the first pass. We demonstrate high-quality results, extrapolating beyond a small number of (typically six or fewer) input views, effectively outpainting the NeRF as well as inpainting newly disoccluded regions inside the original viewing volume. We compare with related work both quantitatively and qualitatively and show significant gains over prior art., Comment: 8 pages, 8 figures, CVPR2024
Published: 2024

27. Inference under covariate-adaptive randomization with many strata

Author: Xin, Jiahui, Liu, Hanzhong, and Ma, Wei
Subjects: Statistics - Methodology, Mathematics - Statistics Theory
Abstract: Covariate-adaptive randomization is widely employed to balance baseline covariates in interventional studies such as clinical trials and experiments in development economics. Recent years have witnessed substantial progress in inference under covariate-adaptive randomization with a fixed number of strata. However, concerns have been raised about the impact of a large number of strata on its design and analysis, which is a common scenario in practice, such as in multicenter randomized clinical trials. In this paper, we propose a general framework for inference under covariate-adaptive randomization, which extends the seminal works of Bugni et al. (2018, 2019) by allowing for a diverging number of strata. Furthermore, we introduce a novel weighted regression adjustment that ensures efficiency improvement. On top of establishing the asymptotic theory, practical algorithms for handling situations involving an extremely large number of strata are also developed. Moreover, by linking design balance and inference robustness, we highlight the advantages of stratified block randomization, which enforces better covariate balance within strata compared to simple randomization. This paper offers a comprehensive landscape of inference under covariate-adaptive randomization, spanning from fixed to diverging to extremely large numbers of strata.
Published: 2024

28. Multilingual Diversity Improves Vision-Language Representations

Author: Nguyen, Thao, Wallingford, Matthew, Santy, Sebastin, Ma, Wei-Chiu, Oh, Sewoong, Schmidt, Ludwig, Koh, Pang Wei, and Krishna, Ranjay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Massive web-crawled image-text datasets lay the foundation for recent progress in multimodal learning. These datasets are designed with the goal of training a model to do well on standard computer vision benchmarks, many of which, however, have been shown to be English-centric (e.g., ImageNet). Consequently, existing data curation techniques gravitate towards using predominantly English image-text pairs and discard many potentially useful non-English samples. Our work questions this practice. Multilingual data is inherently enriching not only because it provides a gateway to learn about culturally salient concepts, but also because it depicts common concepts differently from monolingual data. We thus conduct a systematic study to explore the performance benefits of using more samples of non-English origins with respect to English vision tasks. By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set. Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet, ImageNet distribution shifts, image-English-text retrieval and on average across 38 tasks from the DataComp benchmark. On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa. In addition, we quantitatively show that English and non-English data are significantly different in both image and (translated) text space. We hope that our findings motivate future work to be more intentional about including multicultural and multilingual data, not just when non-English or geographically diverse tasks are involved, but to enhance model capabilities at large., Comment: NeurIPS 2024 Spotlight paper
Published: 2024

29. UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

Author: Feng, Shikun, Ni, Yuyan, Li, Minghao, Huang, Yanwen, Ma, Zhi-Ming, Ma, Wei-Ying, and Lan, Yanyan
Subjects: Quantitative Biology - Biomolecules, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. However, for molecular pre-training, there lacks a universal model capable of effectively applying to various categories of molecular tasks, since existing prevalent pre-training methods exhibit effectiveness for specific types of downstream tasks. Furthermore, the lack of profound understanding of existing pre-training methods, including 2D graph masking, 2D-3D contrastive learning, and 3D denoising, hampers the advancement of molecular foundation models. In this work, we provide a unified comprehension of existing pre-training methods through the lens of contrastive learning. Thus their distinctions lie in clustering different views of molecules, which is shown beneficial to specific downstream tasks. To achieve a complete and general-purpose molecular representation, we propose a novel pre-training framework, named UniCorn, that inherits the merits of the three methods, depicting molecular views in three different levels. SOTA performance across quantum, physicochemical, and biological tasks, along with comprehensive ablation study, validate the universality and effectiveness of UniCorn.
Published: 2024

30. Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner

Author: Nie, Tong, Qin, Guoyang, Ma, Wei, and Sun, Jian
Subjects: Computer Science - Machine Learning
Abstract: Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system. Existing methods aim to reconstruct STTD using low-dimensional models. However, they are limited to data-specific dimensions or source-dependent patterns, restricting them from unifying representations. Here, we present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation. To discern the underlying dynamics in low-dimensional regimes, coordinate-based neural networks that can encode high-frequency structures are employed to directly map coordinates to traffic variables. To unravel the entangled spatial-temporal interactions, the variability is decomposed into separate processes. We further enable modeling in irregular spaces such as sensor graphs using spectral embedding. Through continuous representations, our approach enables the modeling of a variety of STTD with a unified input, thereby serving as a generalized learner of the underlying traffic dynamics. It is also shown that it can learn implicit low-rank priors and smoothness regularization from the data, making it versatile for learning different dominating data patterns. We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales. Empirical results not only indicate that our model has significant superiority over conventional low-rank models, but also highlight that the versatility of the approach extends to different data domains, output resolutions, and network topologies. Comprehensive model analyses provide further insight into the inductive bias of STTD. We anticipate that this pioneering modeling perspective could lay the foundation for universal representation of STTD in various real-world tasks. Code is available at https://github.com/tongnie/traffic_dynamics., Comment: Accepted at TR-Part C
Published: 2024
Full Text: View/download PDF

31. Potential Paradigm Shift in Hazard Risk Management: AI-Based Weather Forecast for Tropical Cyclone Hazards

Author: Feng, Kairui, Xi, Dazhi, Ma, Wei, Wang, Cao, Li, Yuanlong, and Chen, Xuanhong
Subjects: Physics - Atmospheric and Oceanic Physics, Astrophysics - Earth and Planetary Astrophysics, Computer Science - Machine Learning, Physics - Computational Physics
Abstract: The advents of Artificial Intelligence (AI)-driven models marks a paradigm shift in risk management strategies for meteorological hazards. This study specifically employs tropical cyclones (TCs) as a focal example. We engineer a perturbation-based method to produce ensemble forecasts using the advanced Pangu AI weather model. Unlike traditional approaches that often generate fewer than 20 scenarios from Weather Research and Forecasting (WRF) simulations for one event, our method facilitates the rapid nature of AI-driven model to create thousands of scenarios. We offer open-source access to our model and evaluate its effectiveness through retrospective case studies of significant TC events: Hurricane Irma (2017), Typhoon Mangkhut (2018), and TC Debbie (2017), affecting regions across North America, East Asia, and Australia. Our findings indicate that the AI-generated ensemble forecasts align closely with the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble predictions up to seven days prior to landfall. This approach could substantially enhance the effectiveness of weather forecast-driven risk analysis and management, providing unprecedented operational speed, user-friendliness, and global applicability.
Published: 2024

32. Generating Attractive and Authentic Copywriting from Customer Reviews

Author: Lin, Yu-Xiang and Ma, Wei-Yun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The goal of product copywriting is to capture the interest of potential buyers by emphasizing the features of products through text descriptions. As e-commerce platforms offer a wide range of services, it's becoming essential to dynamically adjust the styles of these auto-generated descriptions. Typical approaches to copywriting generation often rely solely on specified product attributes, which may result in dull and repetitive content. To tackle this issue, we propose to generate copywriting based on customer reviews, as they provide firsthand practical experiences with products, offering a richer source of information than just product attributes. We have developed a sequence-to-sequence framework, enhanced with reinforcement learning, to produce copywriting that is attractive, authentic, and rich in information. Our framework outperforms all existing baseline and zero-shot large language models, including LLaMA-2-chat-7B and GPT-3.5, in terms of both attractiveness and faithfulness. Furthermore, this work features the use of LLMs for aspect-based summaries collection and argument allure assessment. Experiments demonstrate the effectiveness of using LLMs for marketing domain corpus construction. The code and the dataset is publicly available at: https://github.com/YuXiangLin1234/Copywriting-Generation., Comment: NAACL 2024 main conference paper
Published: 2024

33. BLINK: Multimodal Large Language Models Can See but Not Perceive

Author: Fu, Xingyu, Hu, Yushi, Li, Bangzheng, Feng, Yu, Wang, Haoyu, Lin, Xudong, Roth, Dan, Smith, Noah A., Ma, Wei-Chiu, and Krishna, Ranjay
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations. Most of the Blink tasks can be solved by humans "within a blink" (e.g., relative depth estimation, visual correspondence, forensics detection, and multi-view reasoning). However, we find these perception-demanding tasks cast significant challenges for current multimodal LLMs because they resist mediation through natural language. Blink reformats 14 classic computer vision tasks into 3,807 multiple-choice questions, paired with single or multiple images and visual prompting. While humans get 95.70% accuracy on average, Blink is surprisingly challenging for existing multimodal LLMs: even the best-performing GPT-4V and Gemini achieve accuracies of 51.26% and 45.72%, only 13.17% and 7.63% higher than random guessing, indicating that such perception abilities have not "emerged" yet in recent multimodal LLMs. Our analysis also highlights that specialist CV models could solve these problems much better, suggesting potential pathways for future improvements. We believe Blink will stimulate the community to help multimodal LLMs catch up with human-level visual perception., Comment: Multimodal Benchmark, Project Url: https://zeyofu.github.io/blink/, ECCV 2024
Published: 2024

34. MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space

Author: Qu, Yanru, Qiu, Keyue, Song, Yuxuan, Gong, Jingjing, Han, Jiawei, Zheng, Mingyue, Zhou, Hao, and Ma, Wei-Ying
Subjects: Quantitative Biology - Biomolecules, Computer Science - Machine Learning
Abstract: Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on how to generate molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and resulting in false positives. We conduct thorough studies on key factors of ill-conformational problems when applying autoregressive methods and diffusion to SBDD, including mode collapse and hybrid continuous-discrete space. In this paper, we introduce MolCRAFT, the first SBDD model that operates in the continuous parameter space, together with a novel noise reduced sampling strategy. Empirical results show that our model consistently achieves superior performance in binding affinity with more stable 3D structure, demonstrating our ability to accurately model interatomic interactions. To our best knowledge, MolCRAFT is the first to achieve reference-level Vina Scores (-6.59 kcal/mol) with comparable molecular size, outperforming other strong baselines by a wide margin (-0.84 kcal/mol). Code is available at https://github.com/AlgoMole/MolCRAFT., Comment: Accepted to ICML 2024
Published: 2024

35. Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Author: Xia, Hongchi, Lin, Zhi-Hao, Ma, Wei-Chiu, and Wang, Shenlong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components:(i) a neural radiance fields (NeRF) module that effectively captures the geometry and visual appearance of the scene; (ii) a mesh module that distills the knowledge from NeRF for faster rendering; and (iii) a physics module that models the interactions and physical dynamics among the objects. By following the carefully designed pipeline, one can construct an interactable and actionable digital replica of the real world. We benchmark our system on both indoor and large-scale outdoor scenes. We show that we can not only produce highly-realistic renderings in real-time, but also build interactive games on top., Comment: CVPR 2024. Project page (with code): https://video2game.github.io/
Published: 2024

36. Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation

Author: Liu, Shangqing, Ma, Wei, Wang, Jian, Xie, Xiaofei, Feng, Ruitao, and Liu, Yang
Subjects: Computer Science - Cryptography and Security
Abstract: Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task for example determining whether it is vulnerable or not. This poses a challenge for a single deep learning-based model to effectively learn the wide array of vulnerability characteristics. Furthermore, due to the challenges associated with collecting large-scale vulnerability data, these detectors often overfit limited training datasets, resulting in lower model generalization performance. To address the aforementioned challenges, in this work, we introduce a fine-grained vulnerability detector namely FGVulDet. Unlike previous approaches, FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability. Each classifier is designed to learn type-specific vulnerability semantics. Additionally, to address the scarcity of data for some vulnerability types and enhance data diversity for learning better vulnerability semantics, we propose a novel vulnerability-preserving data augmentation technique to augment the number of vulnerabilities. Taking inspiration from recent advancements in graph neural networks for learning program semantics, we incorporate a Gated Graph Neural Network (GGNN) and extend it to an edge-aware GGNN to capture edge-type information. FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities. Extensive experiments compared with static-analysis-based approaches and learning-based approaches have demonstrated the effectiveness of FGVulDet.
Published: 2024

37. Evaluation and Improvement of Fault Detection for Large Language Models

Author: Hu, Qiang, Wen, Jin, Cordy, Maxime, Huang, Yuheng, Ma, Wei, Xie, Xiaofei, and Ma, Lei
Subjects: Computer Science - Software Engineering, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Large language models (LLMs) have recently achieved significant success across various application domains, garnering substantial attention from different communities. Unfortunately, even for the best LLM, many \textit{faults} still exist that LLM cannot properly predict. Such faults will harm the usability of LLMs in general and could introduce safety issues in reliability-critical systems such as autonomous driving systems. How to quickly reveal these faults in real-world datasets that LLM could face is important, but challenging. The major reason is that the ground truth is necessary but the data labeling process is heavy considering the time and human effort. To handle this problem, in the conventional deep learning testing field, test selection methods have been proposed for efficiently evaluating deep learning models by prioritizing faults. However, despite their importance, the usefulness of these methods on LLMs is unclear, and lack of exploration. In this paper, we conduct the first empirical study to investigate the effectiveness of existing fault detection methods for LLMs. Experimental results on four different tasks~(including both code tasks and natural language processing tasks) and four LLMs~(e.g., LLaMA3 and GPT4) demonstrated that simple methods such as Margin perform well on LLMs but there is still a big room for improvement. Based on the study, we further propose \textbf{MuCS}, a prompt \textbf{Mu}tation-based prediction \textbf{C}onfidence \textbf{S}moothing framework to boost the fault detection capability of existing methods. Concretely, multiple prompt mutation techniques have been proposed to help collect more diverse outputs for confidence smoothing. The results show that our proposed framework significantly enhances existing methods with the improvement of test relative coverage by up to 70.53\%.
Published: 2024

38. Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning

Author: Lin, Zhihao, Ma, Wei, Lin, Tao, Zheng, Yaowen, Ge, Jingquan, Wang, Jun, Klein, Jacques, Bissyande, Tegawende, Liu, Yang, and Li, Li
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showcasing their efficacy in code understanding and beyond. Like traditional SE tools, open-source collaboration is key in realising the excellent products. However, with AI models, the essential need is in data. The collaboration of these AI-based SE models hinges on maximising the sources of high-quality data. However, data especially of high quality, often holds commercial or sensitive value, making it less accessible for open-source AI-based SE projects. This reality presents a significant barrier to the development and enhancement of AI-based SE tools within the software engineering community. Therefore, researchers need to find solutions for enabling open-source AI-based SE models to tap into resources by different organisations. Addressing this challenge, our position paper investigates one solution to facilitate access to diverse organizational resources for open-source AI models, ensuring privacy and commercial sensitivities are respected. We introduce a governance framework centered on federated learning (FL), designed to foster the joint development and maintenance of open-source AI code models while safeguarding data privacy and security. Additionally, we present guidelines for developers on AI-based SE tool collaboration, covering data requirements, model architecture, updating strategies, and version control. Given the significant influence of data characteristics on FL, our research examines the effect of code data heterogeneity on FL performance.
Published: 2024

39. An Empirical Study of Automated Vulnerability Localization with Large Language Models

Author: Zhang, Jian, Wang, Chong, Li, Anran, Sun, Weisong, Zhang, Cen, Ma, Wei, and Liu, Yang
Subjects: Computer Science - Software Engineering, Computer Science - Cryptography and Security
Abstract: Recently, Automated Vulnerability Localization (AVL) has attracted much attention, aiming to facilitate diagnosis by pinpointing the lines of code responsible for discovered vulnerabilities. Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in vulnerability localization remains underexplored. In this work, we perform the first comprehensive study of LLMs for AVL. Our investigation encompasses 10+ leading LLMs suitable for code analysis, including ChatGPT and various open-source models, across three architectural types: encoder-only, encoder-decoder, and decoder-only, with model sizes ranging from 60M to 16B parameters. We explore the efficacy of these LLMs using 4 distinct paradigms: zero-shot learning, one-shot learning, discriminative fine-tuning, and generative fine-tuning. Our evaluation framework is applied to the BigVul-based dataset for C/C++, and an additional dataset comprising smart contract vulnerabilities. The results demonstrate that discriminative fine-tuning of LLMs can significantly outperform existing learning-based methods for AVL, while other paradigms prove less effective or unexpectedly ineffective for the task. We also identify challenges related to input length and unidirectional context in fine-tuning processes for encoders and decoders. We then introduce two remedial strategies: the sliding window and the right-forward embedding, both of which substantially enhance performance. Furthermore, our findings highlight certain generalization capabilities of LLMs across Common Weakness Enumerations (CWEs) and different projects, indicating a promising pathway toward their practical application in vulnerability localization.
Published: 2024

40. SoK: Comprehensive Analysis of Rug Pull Causes, Datasets, and Detection Tools in DeFi

Author: Sun, Dianxiang, Ma, Wei, Nie, Liming, and Liu, Yang
Subjects: Computer Science - Software Engineering
Abstract: Rug pulls pose a grave threat to the cryptocurrency ecosystem, leading to substantial financial loss and undermining trust in decentralized finance (DeFi) projects. With the emergence of new rug pull patterns, research on rug pull is out of state. To fill this gap, we first conducted an extensive analysis of the literature review, encompassing both scholarly and industry sources. By examining existing academic articles and industrial discussions on rug pull projects, we present a taxonomy inclusive of 34 root causes, introducing six new categories inspired by industry sources: burn, hidden owner, ownership transfer, unverified contract, external call, and fake LP lock. Based on the developed taxonomy, we evaluated current rug pull datasets and explored the effectiveness and limitations of existing detection mechanisms. Our evaluation indicates that the existing datasets, which document 2,448 instances, address only 7 of the 34 root causes, amounting to a mere 20% coverage. It indicates that existing open-source datasets need to be improved to study rug pulls. In response, we have constructed a more comprehensive dataset containing 2,360 instances, expanding the coverage to 54% with the best effort. In addition, the examination of 14 detection tools showed that they can identify 25 of the 34 root causes, achieving a coverage of 73.5%. There are nine root causes (Fake LP Lock, Hidden Fee, and Destroy Token, Fake Money Transfer, Ownership Transfer, Liquidity Pool Block, Freeze Account, Wash-Trading, Hedge) that the existing tools cannot cover. Our work indicates that there is a significant gap between current research and detection tools, and the actual situation of rug pulls.
Published: 2024

41. Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications

Author: Ma, Wei, Wu, Daoyuan, Sun, Yuqiang, Wang, Tianwen, Liu, Shangqing, Zhang, Jian, Xue, Yue, and Liu, Yang
Subjects: Computer Science - Software Engineering
Abstract: Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent research has shown that large language models (LLMs) have potential in auditing smart contracts, but the state-of-the-art indicates that even GPT-4 can achieve only 30% precision (when both decision and justification are correct). This is likely because off-the-shelf LLMs were primarily pre-trained on a general text/code corpus and not fine-tuned on the specific domain of Solidity smart contract auditing. In this paper, we propose iAudit, a general framework that combines fine-tuning and LLM-based agents for intuitive smart contract auditing with justifications. Specifically, iAudit is inspired by the observation that expert human auditors first perceive what could be wrong and then perform a detailed analysis of the code to identify the cause. As such, iAudit employs a two-stage fine-tuning approach: it first tunes a Detector model to make decisions and then tunes a Reasoner model to generate causes of vulnerabilities. However, fine-tuning alone faces challenges in accurately identifying the optimal cause of a vulnerability. Therefore, we introduce two LLM-based agents, the Ranker and Critic, to iteratively select and debate the most suitable cause of vulnerability based on the output of the fine-tuned Reasoner model. To evaluate iAudit, we collected a balanced dataset with 1,734 positive and 1,810 negative samples to fine-tune iAudit. We then compared it with traditional fine-tuned models (CodeBERT, GraphCodeBERT, CodeT5, and UnixCoder) as well as prompt learning-based LLMs (GPT4, GPT-3.5, and CodeLlama-13b/34b). On a dataset of 263 real smart contract vulnerabilities, iAudit achieves an F1 score of 91.21% and an accuracy of 91.11%. The causes generated by iAudit achieved a consistency of about 38% compared to the ground truth causes., Comment: Accepted for the 47th International Conference on Software Engineering (ICSE 2025)
Published: 2024

42. Desiderata of evidence for representation in neuroscience

Author: Pohl, Stephan, Walker, Edgar Y., Barack, David L., Lee, Jennifer, Denison, Rachel N., Block, Ned, Meyniel, Florent, and Ma, Wei Ji
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: This paper develops a systematic framework for the evidence neuroscientists use to establish whether a neural response represents a feature. Researchers try to establish that the neural response is (1) sensitive and (2) specific to the feature, (3) invariant to other features, and (4) functional, which means that it is used downstream in the brain. We formalize these desiderata in information-theoretic terms. This formalism allows us to precisely state the desiderata while unifying the different analysis methods used in neuroscience under one framework. We discuss how common methods such as correlational analyses, decoding and encoding models, representational similarity analysis, and tests of statistical dependence are used to evaluate the desiderata. In doing so, we provide a common terminology to researchers that helps to clarify disagreements, to compare and integrate results across studies and research groups, and to identify when evidence might be missing and when evidence for some representational conclusion is strong. We illustrate the framework with several canonical examples, including the representation of orientation, numerosity, faces, and spatial location. We end by discussing how the framework can be extended to cover models of the neural code, multi-stage models, and other domains., Comment: 50 pages, 11 figures
Published: 2024

43. Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks

Author: Song, Yuxuan, Gong, Jingjing, Qu, Yanru, Zhou, Hao, Zheng, Mingyue, Liu, Jingjing, and Ma, Wei-Ying
Subjects: Physics - Chemical Physics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Quantitative Biology - Biomolecules
Abstract: Advanced generative model (e.g., diffusion model) derived from simplified continuity assumptions of data distribution, though showing promising progress, has been difficult to apply directly to geometry generation applications due to the multi-modality and noise-sensitive nature of molecule geometry. This work introduces Geometric Bayesian Flow Networks (GeoBFN), which naturally fits molecule geometry by modeling diverse modalities in the differentiable parameter space of distributions. GeoBFN maintains the SE-(3) invariant density modeling property by incorporating equivariant inter-dependency modeling on parameters of distributions and unifying the probabilistic modeling of different modalities. Through optimized training and sampling techniques, we demonstrate that GeoBFN achieves state-of-the-art performance on multiple 3D molecule generation benchmarks in terms of generation quality (90.87% molecule stability in QM9 and 85.6% atom stability in GEOM-DRUG. GeoBFN can also conduct sampling with any number of steps to reach an optimal trade-off between efficiency and quality (e.g., 20-times speedup without sacrificing performance)., Comment: ICLR 2024
Published: 2024

44. Exploring Hilbert-Space Fragmentation on a Superconducting Processor

Author: Wang, Yong-Yi, Shi, Yun-Hao, Sun, Zheng-Hang, Chen, Chi-Tong, Wang, Zheng-An, Zhao, Kui, Liu, Hao-Tian, Ma, Wei-Guo, Wang, Ziting, Li, Hao, Zhang, Jia-Chi, Liu, Yu, Deng, Cheng-Lin, Li, Tian-Ming, He, Yang, Liu, Zheng-He, Peng, Zhen-Yu, Song, Xiaohui, Xue, Guangming, Yu, Haifeng, Huang, Kaixuan, Xiang, Zhongcheng, Zheng, Dongning, Xu, Kai, and Fan, Heng
Subjects: Quantum Physics, Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics
Abstract: Isolated interacting quantum systems generally thermalize, yet there are several counterexamples for the breakdown of ergodicity, such as many-body localization and quantum scars. Recently, ergodicity breaking has been observed in systems subjected to linear potentials, termed Stark many-body localization. This phenomenon is closely associated with Hilbert-space fragmentation, characterized by a strong dependence of dynamics on initial conditions. Here, we experimentally explore initial-state dependent dynamics using a ladder-type superconducting processor with up to 24 qubits, which enables precise control of the qubit frequency and initial state preparation. In systems with linear potentials, we observe distinct non-equilibrium dynamics for initial states with the same quantum numbers and energy, but with varying domain wall numbers. This distinction becomes increasingly pronounced as the system size grows, in contrast with disordered interacting systems. Our results provide convincing experimental evidence of the fragmentation in Stark systems, enriching our understanding of the weak breakdown of ergodicity., Comment: main text: 7 pages, 4 figures; supplementary: 13 pages, 14 figures
Published: 2024

45. ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling

Author: Zheng, Kangjie, Long, Siyu, Lu, Tianyu, Yang, Junwei, Dai, Xinyu, Zhang, Ming, Nie, Zaiqing, Ma, Wei-Ying, and Zhou, Hao
Subjects: Quantitative Biology - Biomolecules, Computer Science - Computational Engineering, Finance, and Science, Computer Science - Machine Learning
Abstract: Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small molecules. In this paper, we propose ESM-AA (ESM All-Atom), a novel approach that enables atom-scale and residue-scale unified molecular modeling. ESM-AA achieves this by pre-training on multi-scale code-switch protein sequences and utilizing a multi-scale position encoding to capture relationships among residues and atoms. Experimental results indicate that ESM-AA surpasses previous methods in protein-molecule tasks, demonstrating the full utilization of protein language models. Further investigations reveal that through unified molecular modeling, ESM-AA not only gains molecular knowledge but also retains its understanding of proteins. The source codes of ESM-AA are publicly released at https://github.com/zhengkangjie/ESM-AA., Comment: ICML2024 camera-ready, update some experimental results, add github url, fix some typos
Published: 2024

46. Rethinking Specificity in SBDD: Leveraging Delta Score and Energy-Guided Diffusion

Author: Gao, Bowen, Ren, Minsi, Ni, Yuyan, Huang, Yanwen, Qiang, Bo, Ma, Zhi-Ming, Ma, Wei-Ying, and Lan, Yanyan
Subjects: Quantitative Biology - Biomolecules, Computer Science - Machine Learning
Abstract: In the field of Structure-based Drug Design (SBDD), deep learning-based generative models have achieved outstanding performance in terms of docking score. However, further study shows that the existing molecular generative methods and docking scores both have lacked consideration in terms of specificity, which means that generated molecules bind to almost every protein pocket with high affinity. To address this, we introduce the Delta Score, a new metric for evaluating the specificity of molecular binding. To further incorporate this insight for generation, we develop an innovative energy-guided approach using contrastive learning, with active compounds as decoys, to direct generative models toward creating molecules with high specificity. Our empirical results show that this method not only enhances the delta score but also maintains or improves traditional docking scores, successfully bridging the gap between SBDD and real-world needs.
Published: 2024

47. Fluorine/bromine/selenium multi-heteroatoms substituted dual-asymmetric electron acceptors for o-xylene processed organic solar cells with 19.12% efficiency

Author: Zhou, Yibo, Qi, Guangyu, Liu, Han, Bai, Hairui, Li, Tengfei, Maqsood, Muhammad Hamza, Liu, Chang, Song, Bohao, Chen, Na, Lu, Guanghao, Gao, Chao, Liu, Yuhang, Su, Wenyan, Du, Huiling, Ma, Ruijie, Ma, Wei, and Fan, Qunping
Published: 2024
Full Text: View/download PDF

48. Black phosphorus nanodots-modified Pt/C electrocatalyst for methanol-tolerant oxygen reduction in direct methanol fuel cells

Author: Zhang, Li-Li, Lu, Pan-Pan, Yin, Ming-Ming, Li, Ruo-Nan, Wang, Bing, Ma, Xian-Di, Jiao, Meng-Gai, Ma, Wei, and Zhou, Zhen
Published: 2024
Full Text: View/download PDF

49. Positioning error compensation method for industrial robots based on stacked ensemble learning

Author: Chen, Qizhi, Zhang, Chengrui, Ma, Wei, and Yang, Chen
Published: 2024
Full Text: View/download PDF

50. Effect of Hydrogen Content and Microstructure Inhomogeneity on Fatigue Crack Growth Behavior of Ti-0.3Mo-0.8Ni Alloy Argon-Arc Welded Joints

Author: Liu, Quanming, Xiao, Junfeng, Gao, Sifeng, Tang, Wenshu, Yang, Haiying, Li, Yongjun, Zhang, Jiong, Nan, Qing, Ma, Wei, and Xu, Xiaobo
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

16,825 results on '"Ma, Wei"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources