Author: "Li, Jiawei" / Publication Year Range: Last 3 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Li, Jiawei"' showing total 3,074 results

1. Delta-Influence: Unlearning Poisons via Influence Functions

Author: Li, Wenjie, Li, Jiawei, de Witt, Christian Schroeder, Prabhu, Ameya, and Sanyal, Amartya
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Addressing data integrity challenges, such as unlearning the effects of data poisoning after model training, is necessary for the reliable deployment of machine learning models. State-of-the-art influence functions, such as EK-FAC, often fail to accurately attribute abnormal model behavior to the specific poisoned training data responsible for the data poisoning attack. In addition, traditional unlearning algorithms often struggle to effectively remove the influence of poisoned samples, particularly when only a few affected examples can be identified. To address these challenges, we introduce Δ-Influence, a novel approach that leverages influence functions to trace abnormal model behavior back to the responsible poisoned training data using as few as one poisoned test example. Δ-Influence applies data transformations that sever the link between poisoned training data and compromised test points without significantly affecting clean data. This allows Δ-Influence to detect large negative shifts in influence scores following data transformations, a phenomenon we term influence collapse, thereby accurately identifying poisoned training data. Unlearning this subset, e.g. through retraining, effectively eliminates the data poisoning. We validate our method across three vision-based poisoning attacks and three datasets, benchmarking against four detection algorithms and five unlearning strategies. We show that Δ-Influence consistently achieves the best unlearning across all settings, showing the promise of influence functions for corrective unlearning. Our code is publicly available at https://github.com/andyisokay/delta-influence.
Comment: Accepted at NeurIPS Workshop on Attributing Model Behavior at Scale (ATTRIB @ NeurIPS 2024)
Published: 2024

2. METEOR: Evolutionary Journey of Large Language Models from Guidance to Self-Growth

Author: Li, Jiawei, Feng, Chong, and Gao, Yang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Model evolution enables learning from feedback to refine experiences and update skills, transforming models from having no domain knowledge to becoming domain experts. However, there is currently no unified and effective method for guiding this evolutionary process. To address this gap, we propose the Meteor method, which includes three training phases: weak-to-strong data distillation, iterative training, and self-evolution strategies. Each phase maximizes the model's inherent domain capabilities, allowing it to autonomously refine its domain knowledge and enhance performance. Experiments demonstrate that our approach significantly improves accuracy, completeness, relevance, coherence, and reliability across domain-specific tasks.
Published: 2024

3. PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment

Author: Li, Jiawei, Liang, Xinyue, Yang, Yizhe, Feng, Chong, and Gao, Yang
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Process supervision enhances the performance of large language models in reasoning tasks by providing feedback at each step of chain-of-thought reasoning. However, due to the lack of effective process supervision methods, even advanced large language models are prone to logical errors and redundant reasoning. We claim that the effectiveness of process supervision significantly depends on both the accuracy and the length of reasoning chains. Moreover, we identify that these factors exhibit a nonlinear relationship with the overall reward score of the reasoning process. Inspired by these insights, we propose a novel process supervision paradigm, PSPO*, which systematically outlines the workflow from reward model training to policy optimization, and highlights the importance of nonlinear rewards in process supervision. Based on PSPO*, we develop the PSPO-WRS, which considers the number of reasoning steps in determining reward scores and utilizes an adjusted Weibull distribution for nonlinear reward shaping. Experimental results on six mathematical reasoning datasets demonstrate that PSPO-WRS consistently outperforms current mainstream models.
Published: 2024

4. An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We?

Author: Suh, Hyunjae, Tafreshipour, Mahan, Li, Jiawei, Bhattiprolu, Adithya, and Ahmed, Iftekhar
Subjects: Computer Science - Software Engineering
Abstract: Artificial Intelligence (AI) techniques, especially Large Language Models (LLMs), have started gaining popularity among researchers and software developers for generating source code. However, LLMs have been shown to generate code with quality issues and to incur copyright/licensing infringements. Therefore, detecting whether a piece of source code is written by humans or AI has become necessary. This study first presents an empirical analysis to investigate the effectiveness of the existing AI detection tools in detecting AI-generated code. The results show that they all perform poorly and lack sufficient generalizability to be practically deployed. Then, to improve the performance of AI-generated code detection, we propose a range of approaches, including fine-tuning the LLMs and machine learning-based classification with static code metrics or code embeddings generated from the Abstract Syntax Tree (AST). Our best model outperforms the state-of-the-art AI-generated code detector (GPTSniffer) and achieves an F1 score of 82.55. We also conduct an ablation study on our best-performing model to investigate the impact of different source code features on its performance.
Comment: Accepted at The 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025)
Published: 2024

5. Quantum Communication Advantage in TFNP

Author: Göös, Mika, Gur, Tom, Jain, Siddhartha, and Li, Jiawei
Subjects: Quantum Physics, Computer Science - Computational Complexity
Abstract: We exhibit a total search problem whose communication complexity in the quantum SMP (simultaneous message passing) model is exponentially smaller than in the classical two-way randomized model. Moreover, the quantum protocol is computationally efficient and its solutions are classically verifiable, that is, the problem lies in the communication analogue of the class TFNP. Our problem is a bipartite version of a query complexity problem recently introduced by Yamakawa and Zhandry (JACM 2024). We prove the classical lower bound using the structure-vs-randomness paradigm for analyzing communication protocols.
Published: 2024

6. A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?

Author: Chen, QiHong, Li, Jiawei, Deng, Jiecheng, Yu, Jiachen, Chen, Justin Tian Jin, and Ahmed, Iftekhar
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: Recent advancements in Large Language Models (LLMs) have led to their widespread application in automated code generation. However, these models can still generate defective code that deviates from the specification. Previous research has mainly focused on the mistakes in LLM-generated standalone functions, overlooking real-world software development situations where the successful generation of the code requires software contexts such as external dependencies. In this paper, we considered both of these code generation situations and identified a range of non-syntactic mistakes arising from LLMs' misunderstandings of coding question specifications. Seven categories of non-syntactic mistakes were identified through extensive manual analyses, four of which were missed by previous works. To better understand these mistakes, we proposed six reasons behind them from various perspectives. Moreover, we explored the effectiveness of LLMs in detecting mistakes and their reasons. Our evaluation demonstrated that GPT-4 with the ReAct prompting technique can achieve an F1 score of up to 0.65 when identifying reasons for LLMs' mistakes, such as misleading function signatures. We believe that these findings offer valuable insights into enhancing the quality of LLM-generated code.
Published: 2024

7. AI as a Bridge Across Ages: Exploring The Opportunities of Artificial Intelligence in Supporting Inter-Generational Communication in Virtual Reality

Author: Du, Qiuxin, Wei, Xiaoying, Li, Jiawei, Kuang, Emily, Hao, Jie, Weng, Dongdong, and Fan, Mingming
Subjects: Computer Science - Human-Computer Interaction
Abstract: Inter-generational communication is essential for bridging generational gaps and fostering mutual understanding. However, maintaining it is complex due to cultural, communicative, and geographical differences. Recent research indicated that while Virtual Reality (VR) creates a relaxed atmosphere and promotes companionship, it inadequately addresses the complexities of inter-generational dialogue, including variations in values and relational dynamics. To address this gap, we explored the opportunities of Artificial Intelligence (AI) in supporting inter-generational communication in VR. We developed three technology probes (i.e., Content Generator, Communication Facilitator, and Info Assistant) in VR and employed them in a probe-based participatory design study with twelve inter-generational pairs. Our results show that AI-powered VR facilitates inter-generational communication by enhancing mutual understanding, fostering conversation fluency, and promoting active participation. We also identify several challenges in using AI-powered VR to support inter-generational communication and derive design implications for future VR platforms, aiming to improve inter-generational communication.
Published: 2024

8. TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis

Author: Wang, Shiyu, Li, Jiawei, Shi, Xiaoming, Ye, Zhou, Mo, Baichuan, Lin, Wenze, Ju, Shengtong, Chu, Zhixuan, and Jin, Ming
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Time series analysis plays a critical role in numerous applications, supporting tasks such as forecasting, classification, anomaly detection, and imputation. In this work, we present the time series pattern machine (TSPM), a model designed to excel in a broad range of time series tasks through powerful representation and pattern extraction capabilities. Traditional time series models often struggle to capture universal patterns, limiting their effectiveness across diverse tasks. To address this, we define multiple scales in the time domain and various resolutions in the frequency domain, employing various mixing strategies to extract intricate, task-adaptive time series patterns. Specifically, we introduce a general-purpose TSPM that processes multi-scale time series using (1) multi-resolution time imaging (MRTI), (2) time image decomposition (TID), (3) multi-scale mixing (MCM), and (4) multi-resolution mixing (MRM) to extract comprehensive temporal patterns. MRTI transforms multi-scale time series into multi-resolution time images, capturing patterns across both temporal and frequency domains. TID leverages dual-axis attention to extract seasonal and trend patterns, while MCM hierarchically aggregates these patterns across scales. MRM adaptively integrates all representations across resolutions. This method achieves state-of-the-art performance across 8 time series analytical tasks, consistently surpassing both general-purpose and task-specific models. Our work marks a promising step toward the next generation of TSPMs, paving the way for further advancements in time series analysis.
Published: 2024

9. ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization

Author: Li, Jiawei, Zhang, Fanrui, Zhu, Jiaying, Sun, Esther, Zhang, Qiang, and Zha, Zheng-Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of Image Forgery Detection and Localization (IFDL). Moreover, existing IFDL methods are typically limited to the learning of low-level semantic-agnostic clues and merely provide a single outcome judgment. To tackle these issues, we propose ForgeryGPT, a novel framework that advances the IFDL task by capturing high-order forensics knowledge correlations of forged images from diverse linguistic feature spaces, while enabling explainable generation and interactive dialogue through a newly customized Large Language Model (LLM) architecture. Specifically, ForgeryGPT enhances traditional LLMs by integrating the Mask-Aware Forgery Extractor, which enables the extraction of precise forgery mask information from input images and facilitates pixel-level understanding of tampering artifacts. The Mask-Aware Forgery Extractor consists of a Forgery Localization Expert (FL-Expert) and a Mask Encoder, where the FL-Expert is augmented with an Object-agnostic Forgery Prompt and a Vocabulary-enhanced Vision Encoder, allowing for effective capture of multi-scale fine-grained forgery details. To enhance its performance, we implement a three-stage training strategy, supported by our designed Mask-Text Alignment and IFDL Task-Specific Instruction Tuning datasets, which align vision-language modalities and improve forgery detection and instruction-following capabilities. Extensive experiments demonstrate the effectiveness of the proposed method.
Comment: 16 pages, 14 figures
Published: 2024

10. Does the Order of Fine-tuning Matter and Why?

Author: Chen, Qihong, Li, Jiawei, Suh, Hyunjae, Jiang, Lianghao, Zhou, Zheng, Chen, Jingze, Gesi, Jiri, and Ahmed, Iftekhar
Subjects: Computer Science - Software Engineering
Abstract: To improve the performance on a target task, researchers have fine-tuned language models with an intermediate task before the target task of interest. However, previous works have focused on the pre-trained language models and downstream tasks in Natural Language Processing (NLP) and considered only one intermediate task. The effect of fine-tuning on multiple intermediate tasks and their ordering on target task performance has not been fully explored in Software Engineering. In this study, we perform the first empirical study analyzing the impact of task ordering on target task performance. Experimental results show that task ordering affects target task performance, with up to a 6% performance gain and up to a 4% performance loss. To explain such an impact, we consider a variety of potential factors, including the characteristics of the dataset (syntactic similarity and semantic similarity analysis, dataset size), model (probing task and attention analysis), and task (task affinity analysis). Our study provides Software Engineering researchers and practitioners with insights into the effect of task orderings and how to select the one that is cost-effective while achieving the best performance gain.
Published: 2024

11. NeDF: neural deflection fields for sparse-view tomographic background oriented Schlieren

Author: Li, Jiawei, Meng, Xuhui, Xiong, Yuan, Jia, Tong, Pan, Chong, and Wang, Jinjun
Subjects: Physics - Fluid Dynamics, Physics - Optics
Abstract: Three-dimensional (3D) density-varying turbulent flows are widely encountered in high-speed aerodynamics, combustion, and heterogeneous mixing processes. Multi-camera-based tomographic background-oriented Schlieren (TBOS) has emerged as a powerful technique for revealing 3D flow density structures. However, dozens of cameras are typically required to obtain high-quality reconstructed density fields. Limited by the number of available optical windows and the confined space in harsh experimental environments, TBOS with only sparse views and limited viewing angles often becomes a practical necessity, rendering the inverse problem for TBOS reconstruction severely ill-posed and resulting in degraded tomography quality. In this study, we propose a novel TBOS reconstruction method, neural deflection field (NeDF), utilizing deep neural networks (DNNs) to represent the density gradient fields without using any pretrained neural network models. In particular, state-of-the-art positional encoding techniques and hierarchical sampling strategies are incorporated to capture the density structures of high spatial frequencies. The background images required for TBOS reconstructions are synthesized using a high-fidelity nonlinear ray-tracing method, with the ground-truth flows obtained from LES simulations of premixed turbulent flames. Using these synthesized BOS images, the superiority of the proposed method over the classical TBOS reconstruction methods is quantitatively verified, and the specific contributions of the positional encoding and the hierarchical sampling strategy are also elucidated.
Comment: 17 pages, 4 figures, 1 table (paper); 3 pages, 2 figures, 3 tables (supplementary material)
Published: 2024

12. Evolution of two-magnon bound states in a higher-spin ferromagnetic chain with single-ion anisotropy: A complete solution

Author: Lou, Xinlan, Li, Jiawei, and Wu, Ning
Subjects: Condensed Matter - Strongly Correlated Electrons, Quantum Physics
Abstract: Few-magnon bound states in quantum spin chains have long been studied and have attracted much recent attention. For a higher-spin ferromagnetic XXZ chain with single-ion anisotropy, several features regarding the evolution of the low-lying two-magnon bound states with varying wave number were observed in the literature. However, most of these observations are only qualitatively understood due to the lack of analytical tools. By combining a set of exact two-magnon Bloch states and a plane-wave ansatz, we achieve a complete solution of the two-magnon problem in such a system. We identify parameter regions that support different types of two-magnon bound states, with the boundaries defined by algebraic equations. We discover for the first time a narrow region in which two single-ion bound states coexist. We show that the phase diagrams for distinct wave numbers are similar to each other, which enables us to map the evolution of the bound states to the rectilinear movement of a representative point for given parameters in a rescaled phase diagram. This dynamic picture provides quantitative interpretations of the observed features.
Comment: 6 pages, 4 figures, accepted for publication as a Letter in Physical Review B
Published: 2024

13. Pattern based learning and optimisation through pricing for bin packing problem

Author: Zhang, Huayan, Bai, Ruibin, Liu, Tie-Yan, Li, Jiawei, Lin, Bingchen, and Ren, Jianfeng
Subjects: Mathematics - Optimization and Control, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: As a popular form of knowledge and experience, patterns and their identification have been critical tasks in most data mining applications. However, as far as we are aware, no study has systematically examined the dynamics of pattern values and their reuse under varying conditions. We argue that when problem conditions such as the distributions of random variables change, the patterns that performed well in previous circumstances may become less effective and adoption of these patterns would result in sub-optimal solutions. In response, we make a connection between data mining and the duality theory in operations research and propose a novel scheme to efficiently identify patterns and dynamically quantify their values for each specific condition. Our method quantifies the value of patterns based on their ability to satisfy stochastic constraints and their effects on the objective value, allowing high-quality patterns and their combinations to be detected. We use the online bin packing problem to evaluate the effectiveness of the proposed scheme and illustrate the online packing procedure with the guidance of patterns that address the inherent uncertainty of the problem. Results show that the proposed algorithm significantly outperforms the state-of-the-art methods. We also analysed in detail the distinctive features of the proposed methods that lead to performance improvement and the special cases where our method can be further improved.
Published: 2024

14. New horizon in the statistical physics of earthquakes: Dragon-king theory and dragon-king earthquakes

Author: Li, Jiawei, Sornette, Didier, Wu, Zhongliang, and Li, Hangwei
Subjects: Physics - Geophysics, Physics - Data Analysis, Statistics and Probability
Abstract: A systematic quantitative investigation into whether the mechanisms of large earthquakes are unique could significantly deepen our understanding of fault rupture and seismicity patterns. This research holds the potential to advance our ability to predict large earthquakes and enhance the effectiveness of disaster prevention and mitigation strategies. In 2009, one of us introduced the dragon-king theory, offering a quantitative framework for identifying and testing extreme outliers (referred to as dragon-king events) that are endogenously generated. This theory provides valuable tools for explaining, predicting, and managing the risks associated with these rare but highly impactful events. The present paper discusses the feasibility of applying this theory to seismology, proposing that dragon-king earthquake events can be identified as outliers to the Gutenberg-Richter law. It also examines several seismological mechanisms that may contribute to the occurrence of these extraordinary events. Although applying the dragon-king theory to seismology presents practical challenges, it offers the potential to significantly enrich statistical seismology. By reexamining the classification of earthquake rupture types through a statistical testing lens and integrating these insights with underlying physical mechanisms, this approach can greatly enhance the analytical tools and depth of research in the field of statistical seismology.
Comment: 21 pages, 2 figures
Published: 2024

15. Deep learning for predicting the occurrence of tipping points

Author: Zhuge, Chengzuo, Li, Jiawei, and Chen, Wei
Subjects: Computer Science - Machine Learning, Mathematics - Dynamical Systems
Abstract: Tipping points occur in many real-world systems, at which the system shifts suddenly from one state to another. The ability to predict the occurrence of tipping points from time series data remains an outstanding challenge and a major interest in a broad range of research fields. In particular, the widely used methods based on bifurcation theory are neither reliable in prediction accuracy nor applicable to irregularly-sampled time series, which are commonly observed in real-world systems. Here we address this challenge by developing a deep learning algorithm for predicting the occurrence of tipping points in untrained systems, by exploiting information about normal forms. Our algorithm not only outperforms traditional methods for regularly-sampled model time series but also achieves accurate predictions for irregularly-sampled model time series and empirical time series. Our ability to predict tipping points for complex systems paves the way for mitigating risks, preventing catastrophic failures, and restoring degraded systems, with broad applications in social science, engineering, and biology.
Published: 2024

16. One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

Author: Fang, Hao, Kong, Jiawei, Yu, Wenbo, Chen, Bin, Li, Jiawei, Xia, Shutao, and Xu, Ke
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Cryptography and Security
Abstract: Vision-Language Pre-training (VLP) models have exhibited unprecedented capability in many applications by taking full advantage of the multimodal alignment. However, previous studies have shown they are vulnerable to maliciously crafted adversarial samples. Despite their recent success, existing attack methods are generally instance-specific and require generating perturbations for each input sample. In this paper, we reveal that VLP models are also vulnerable to the instance-agnostic universal adversarial perturbation (UAP). Specifically, we design a novel Contrastive-training Perturbation Generator with Cross-modal conditions (C-PGC) to achieve the attack. Given that the pivotal multimodal alignment is achieved through the advanced contrastive learning technique, we turn this powerful weapon against the models themselves, i.e., we employ a malicious version of contrastive learning to train the C-PGC on our carefully crafted positive and negative image-text pairs, essentially destroying the alignment relationship learned by VLP models. In addition, C-PGC fully utilizes the characteristics of Vision-and-Language (V+L) scenarios by incorporating both unimodal and cross-modal information as effective guidance. Extensive experiments show that C-PGC successfully forces adversarial samples to move away from their original area in the VLP model's feature space, thus essentially enhancing attacks across various victim models and V+L tasks. The GitHub repository is available at https://github.com/ffhibnese/CPGC_VLP_Universal_Attacks.
Published: 2024

17. UniCL: A Universal Contrastive Learning Framework for Large Time Series Models

Author: Li, Jiawei, Peng, Jingshu, Li, Haoyang, and Chen, Lei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Time-series analysis plays a pivotal role across a range of critical applications, from finance to healthcare, which involves various tasks, such as forecasting and classification. To handle the inherent complexities of time-series data, such as high dimensionality and noise, traditional supervised learning methods first annotate extensive labels for time-series data in each task, which is very costly and impractical in real-world applications. In contrast, pre-trained foundation models offer a promising alternative by leveraging unlabeled data to capture general time series patterns, which can then be fine-tuned for specific tasks. However, existing approaches to pre-training such models typically suffer from high-bias and low-generality issues due to the use of predefined and rigid augmentation operations and domain-specific data training. To overcome these limitations, this paper introduces UniCL, a universal and scalable contrastive learning framework designed for pretraining time-series foundation models across cross-domain datasets. Specifically, we propose a unified and trainable time-series augmentation operation to generate pattern-preserved, diverse, and low-bias time-series data by leveraging spectral information. Besides, we introduce a scalable augmentation algorithm capable of handling datasets with varying lengths, facilitating cross-domain pretraining. Extensive experiments on two benchmark datasets across eleven domains validate the effectiveness of UniCL, demonstrating its high generalization on time-series analysis across various fields.
Published: 2024

18. DeepDamageNet: A two-step deep-learning model for multi-disaster building damage segmentation and classification using satellite imagery

Author: Alisjahbana, Irene, Li, Jiawei, Ben, Strong, and Zhang, Yue
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Satellite imagery has played an increasingly important role in post-disaster building damage assessment. Unfortunately, current methods still rely on manual visual interpretation, which is often time-consuming and can result in very low accuracy. To address the limitations of manual interpretation, there has been a significant increase in efforts to automate the process. We present a solution that performs the two most important tasks in building damage assessment, segmentation and classification, through deep-learning models. We show our results submitted as part of the xView2 Challenge, a competition to design better models for identifying buildings and their damage level after exposure to multiple kinds of natural disasters. Our best model couples a building identification semantic segmentation convolutional neural network (CNN) to a building damage classification CNN, with a combined F1 score of 0.66, surpassing the xView2 challenge baseline F1 score of 0.28. We find that although our model was able to identify buildings with relatively high accuracy, building damage classification across various disaster types is a difficult task due to the visual similarity between different damage levels and the differing damage distributions across disaster types. This highlights that a probabilistic prior estimate of disaster damage may be important for obtaining accurate predictions.
Published: 2024

19. Revisiting Seismicity Criticality: A New Framework for Bias Correction of Statistical Seismology Model Calibrations

Author: Li, Jiawei, Sornette, Didier, Wu, Zhongliang, Zhuang, Jiancang, and Jiang, Changsheng
Subjects: Physics - Geophysics
Abstract: The Epidemic-Type Aftershock Sequences (ETAS) model and its variants effectively capture the space-time clustering of seismicity, setting the standard for earthquake forecasting. Accurate unbiased ETAS calibration is thus crucial. However, we identify three sources of bias, (i) boundary effects, (ii) finite-size effects, and (iii) censorship, which are often overlooked or misinterpreted, causing errors in seismic analysis and predictions. By employing an ETAS model variant with variable spatial background rates, we propose a method to correct for these biases, focusing on the branching ratio n, a key indicator of earthquake triggering potential. Our approach quantifies the variation in the apparent branching ratio (napp) with increased cut-off magnitude (Mco) above the optimal cut-off (Mcobest). The napp(Mco) function yields insights superior to traditional point estimates. We validate our method using synthetic earthquake catalogs, accurately recovering the true branching ratio (ntrue) after correcting biases with napp(Mco). Additionally, our method introduces a refined estimation of the minimum triggering magnitude (m0), a crucial parameter in the ETAS model. Applying our framework to the earthquake catalogs of California, New Zealand, and the China Seismic Experimental Site (CSES) in Sichuan and Yunnan provinces, we find that seismicity hovers away from the critical point, nc = 1, remaining distinctly subcritical, although with values tending to be larger than recent reports that do not consider the above biases. Interestingly, m0 is found to be around 4 for California, 3 for New Zealand, and 2 for CSES, suggesting that many small triggered earthquakes may not be fertile. Understanding seismicity's critical state significantly enhances our comprehension of seismic patterns and aftershock predictability, and informs earthquake risk mitigation and management strategies.
Comment: 36 pages, 7 figures, 5 tables
Published: 2024

20. Mean field equations arising from random vortex dynamics

Author: Li, Jiawei and Qian, Zhongmin
Subjects: Mathematics - Probability
Abstract: We consider McKean-Vlasov type stochastic differential equations with multiplicative noise arising from the random vortex method. Such an equation can be viewed as the mean-field limit of interacting particle systems with singular interacting kernels such as the Biot-Savart kernel. A new estimate for the transition probability density of diffusion processes will be formulated to handle the singularity of the interacting kernel. The existence and uniqueness of the weak solution of such SDEs will be established as the main result.
Published: 2024

21. The Death of Feature Engineering? BERT with Linguistic Features on SQuAD 2.0

Author: Li, Jiawei and Zhang, Yue
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Machine reading comprehension is an essential natural language processing task, which takes a context-query pair as input and predicts the corresponding answer to the query. In this project, we developed an end-to-end question answering model incorporating BERT and additional linguistic features. We conclude that the BERT base model is improved by incorporating these features. The EM and F1 scores improve by 2.17 and 2.14, respectively, compared with BERT(base). Our best single model reaches an EM score of 76.55 and an F1 score of 79.97 on the hidden test set. Our error analysis also shows that the linguistic architecture can help the model understand the context better, in that it can locate answers in cases where the BERT-only model wrongly predicted "No Answer".
Published: 2024

22. Enhancing Out-of-Distribution Detection with Multitesting-based Layer-wise Feature Fusion

Author: Li, Jiawei, Li, Sitong, Wang, Shanshan, Zeng, Yicheng, Tan, Falong, and Xie, Chuanlong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Deploying machine learning in open environments presents the challenge of encountering diverse test inputs that differ significantly from the training data. These out-of-distribution samples may exhibit shifts in local or global features compared to the training distribution. The machine learning (ML) community has responded with a number of methods aimed at distinguishing anomalous inputs from original training data. However, the majority of previous studies have primarily focused on the output layer or penultimate layer of pre-trained deep neural networks. In this paper, we propose a novel framework, Multitesting-based Layer-wise Out-of-Distribution (OOD) Detection (MLOD), to identify distributional shifts in test samples at different levels of features through a rigorous multiple testing procedure. Our approach distinguishes itself from existing methods as it does not require modifying the structure or fine-tuning of the pre-trained classifier. Through extensive experiments, we demonstrate that our proposed framework can seamlessly integrate with any existing distance-based inspection method while efficiently utilizing feature extractors of varying depths. Our scheme effectively enhances the performance of out-of-distribution detection when compared to baseline methods. In particular, MLOD-Fisher achieves superior performance in general. When trained using KNN on CIFAR10, MLOD-Fisher significantly lowers the false positive rate (FPR) from 24.09% to 7.47% on average compared to merely utilizing the features of the last layer.
Published: 2024

23. An Adaptive Dimension Reduction Estimation Method for High-dimensional Bayesian Optimization

Author: Hu, Shouri, Li, Jiawei, and Cai, Zhibo
Subjects: Statistics - Machine Learning, Statistics - Methodology
Abstract: Bayesian optimization (BO) has shown impressive results in a variety of applications within low-to-moderate dimensional Euclidean spaces. However, extending BO to high-dimensional settings remains a significant challenge. We address this challenge by proposing a two-step optimization framework. Initially, we identify the effective dimension reduction (EDR) subspace for the objective function using the minimum average variance estimation (MAVE) method. Subsequently, we construct a Gaussian process model within this EDR subspace and optimize it using the expected improvement criterion. Our algorithm offers the flexibility to operate these steps either concurrently or in sequence. In the sequential approach, we meticulously balance the exploration-exploitation trade-off by distributing the sampling budget between subspace estimation and function optimization, and the convergence rate of our algorithm in high-dimensional contexts has been established. Numerical experiments validate the efficacy of our method in challenging scenarios., Comment: First draft
Published: 2024
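
Note on record 23: the abstract mentions optimizing a Gaussian process model with the expected improvement criterion. As a rough illustration only, the standard closed-form expected improvement for a Gaussian posterior can be sketched as below. The abstract does not say whether the objective is minimized or maximized, so minimization is an assumption here, and the function name and arguments (`mu`, `sigma`, `best`) are hypothetical. The posterior mean and standard deviation would come from a Gaussian process fitted in the reduced subspace, which is not shown.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form expected improvement for minimization.

    mu, sigma: posterior mean and standard deviation at a candidate point
               (assumed to come from a fitted Gaussian process; not shown).
    best:      smallest objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (best - mu) * cdf + sigma * pdf


# A candidate whose mean equals the incumbent still has positive EI
# because of posterior uncertainty; more uncertainty gives more EI.
ei_narrow = expected_improvement(mu=0.0, sigma=1.0, best=0.0)
ei_wide = expected_improvement(mu=0.0, sigma=2.0, best=0.0)
```

The first term rewards candidates predicted to beat the incumbent, and the second rewards uncertain candidates, which is the exploration-exploitation balance the abstract refers to.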

24. Structurally Aware Robust Model Selection for Mixtures

Author: Li, Jiawei and Huggins, Jonathan H.
Subjects: Statistics - Methodology, Mathematics - Statistics Theory
Abstract: Mixture models are often used to identify meaningful subpopulations (i.e., clusters) in observed data such that the subpopulations have a real-world interpretation (e.g., as cell types). However, when used for subpopulation discovery, mixture model inference is usually ill-defined a priori because the assumed observation model is only an approximation to the true data-generating process. Thus, as the number of observations increases, rather than obtaining better inferences, the opposite occurs: the data is explained by adding spurious subpopulations that compensate for the shortcomings of the observation model. However, there are two important sources of prior knowledge that we can exploit to obtain well-defined results no matter the dataset size: known causal structure (e.g., knowing that the latent subpopulations cause the observed signal but not vice-versa) and a rough sense of how wrong the observation model is (e.g., based on small amounts of expert-labeled data or some understanding of the data-generating process). We propose a new model selection criterion that, while model-based, uses this available knowledge to obtain mixture model inferences that are robust to misspecification of the observation model. We provide theoretical support for our approach by proving a first-of-its-kind consistency result under intuitive assumptions. Simulation studies and an application to flow cytometry data demonstrate that our model selection criterion consistently finds the correct number of subpopulations.
Published: 2024

25. Whole tumour- and subregion-based radiomics of contrast-enhanced mammography in differentiating HER2 expression status of invasive breast cancers: A double-centre pilot study

Author: Wang, Simin, Wang, Ting, Guo, Sailing, Zhu, Shuangshuang, Chen, Ruchuan, Zheng, Jinlong, Jiang, Tingting, Li, Ruimin, Li, Jinhui, Li, Jiawei, Shen, Xigang, Qian, Min, Yang, Meng, Yu, Shengnan, You, Chao, and Gu, Yajia
Published: 2024
Full Text: View/download PDF

26. Enhanced thermal constant B of diamond films for ultrahigh sensitivity negative temperature coefficient thermistors

Author: Chen, Qiao, Zhao, Yimeng, Li, Jiawei, Liu, Xiyuan, Wang, Xinyue, Zhang, Wenxi, and Zhu, Hongwei
Published: 2024
Full Text: View/download PDF

27. InSAR-derived surface deformation characteristics and mining subsidence parameters in mountain coal mines

Author: Jiang, Xiaowei, Shi, Wenbing, Liang, Feng, Gui, Jingjing, and Li, Jiawei
Published: 2024
Full Text: View/download PDF

28. Giant Anomalous Hall and Nernst Effects in a Heavy Fermion Ferromagnet

Author: Li, Longfei, Guan, Shuyue, Chi, Shengwei, Li, Jiawei, Lin, Xinxuan, Xu, Gang, and Jia, Shuang
Subjects: Condensed Matter - Strongly Correlated Electrons
Abstract: The anomalous Hall and Nernst effects describe the voltage drop perpendicular to an applied current and temperature gradient due to the magnetization of a magnetic material. These effects can be utilized to measure the Berry curvature at the Fermi energy, and have potential applications in future electronic devices and thermoelectric energy conversion. In this paper, we report giant anomalous Hall conductivity and anomalous Nernst coefficient, as high as about 1000 $\Omega^{-1}$ cm$^{-1}$ and 10 $\mu$V K$^{-1}$, respectively, in a heavy fermion ferromagnet, CeCrGe$_3$. This compound uniquely manifests strong hybridization between the 4$f$ and conduction electrons, leading to a Kondo lattice state in the presence of ferromagnetic order. Unlike conventional topological semimetals in which the electron correlation is weak, CeCrGe$_3$ manifests a strong Berry curvature field of the heavy fermion with an extremely low Fermi energy. Our findings pave the way for exploring correlation-driven topological responses in a ferromagnetic Kondo lattice environment., Comment: 22 pages, 5 figures
Published: 2024

29. Few-magnon excitations in a frustrated spin-$S$ ferromagnetic chain with single-ion anisotropy

Author: Li, Jiawei, Cao, Ye, and Wu, Ning
Subjects: Condensed Matter - Strongly Correlated Electrons, Quantum Physics
Abstract: We study few-magnon excitations in a finite-size spin-$S$ chain with ferromagnetic nearest-neighbor (NN) interaction $J>0$ and antiferromagnetic next-nearest-neighbor (NNN) interaction $J'<0$, in the presence of the single-ion (SI) anisotropy $D$. We first reveal the condition for the emergence of zero-excitation-energy states. In the isotropic case with $\Delta=\Delta'=1$ ($\Delta$ and $\Delta'$ are the corresponding anisotropy parameters), a threshold of $J/|J'|$ above which the ground state is ferromagnetic is determined by exact diagonalization for short chains up to $12$ sites. Using a set of exact two-magnon Bloch states, we then map the two-magnon problem to a single-particle one on an effective open chain with both NN and NNN hoppings. The whole two-magnon excitation spectrum is calculated for large systems and the commensurate-incommensurate transition in the lowest-lying mode is found to exhibit different behaviors between $S=1/2$ and higher spins due to the interplay of the SI anisotropy and the NNN interaction. For the commensurate momentum $k=-\pi$, the effective lattice is decoupled into two NN open chains that can be exactly solved via a plane-wave ansatz. Based on this, we analytically identify in the $\Delta'-D/|J'|$ plane the regions supporting the SI or NNN exchange two-magnon bound states near the edge of the band. In particular, we prove that there always exists a lower-lying NN exchange two-magnon bound state near the band edge for arbitrary $S\geq 1/2$. Finally, we numerically calculate the $n$-magnon spectra for $S=1/2$ with $n\leq5$ by using a spin-operator matrix element method. The corresponding $n$-magnon commensurate instability regions are determined for finite chains and consistent results with prior literature are observed., Comment: 18 pages, 11 figures, to appear in Physical Review B
Published: 2024
Full Text: View/download PDF

30. On Pigeonhole Principles and Ramsey in TFNP

Author: Jain, Siddhartha, Li, Jiawei, Robere, Robert, and Xun, Zhiyang
Subjects: Computer Science - Computational Complexity
Abstract: We show that the TFNP problem RAMSEY is not black-box reducible to PIGEON, refuting a conjecture of Goldberg and Papadimitriou in the black-box setting. We prove this by giving reductions to RAMSEY from a new family of TFNP problems that correspond to generalized versions of the pigeonhole principle, and then proving that these generalized versions cannot be reduced to PIGEON. Formally, we define t-PPP as the class of total NP-search problems reducible to finding a t-collision in a mapping from (t-1)N+1 pigeons to N holes. These classes are closely related to multi-collision resistant hash functions in cryptography. We show that the generalized pigeonhole classes form a hierarchy as t increases, and also give a natural condition on the parameters t1, t2 that captures exactly when t1-PPP and t2-PPP collapse in the black-box setting. Finally, we prove other inclusion and separation results between these generalized PIGEON problems and other previously studied TFNP subclasses, such as PLS, PPA and PLC. Our separation results rely on new lower bounds in propositional proof complexity based on pseudoexpectation operators, which may be of independent interest.
Published: 2024
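
Note on record 30: the abstract defines t-PPP through the problem of finding a t-collision in a mapping from (t-1)N+1 pigeons to N holes. The sketch below only illustrates why such a collision always exists and how a brute-force scan finds one. It assumes the map is given as an ordinary Python function over a small explicit range; in the TFNP setting the map is given succinctly and the number of pigeons is exponentially large, so this linear scan is not an efficient algorithm there. The names `find_t_collision` and `f` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, List, Optional


def find_t_collision(f: Callable[[int], int], num_pigeons: int, t: int) -> Optional[List[int]]:
    """Return t distinct pigeons that f sends to the same hole, or None."""
    buckets = defaultdict(list)  # hole -> pigeons seen so far
    for pigeon in range(num_pigeons):
        hole = f(pigeon)
        buckets[hole].append(pigeon)
        if len(buckets[hole]) == t:
            return buckets[hole]
    return None


# Toy instance: N holes and (t-1)*N + 1 pigeons, so by the generalized
# pigeonhole principle some hole must receive at least t pigeons.
N, t = 5, 3
num_pigeons = (t - 1) * N + 1
f = lambda p: (7 * p + 3) % N  # arbitrary example map into N holes
collision = find_t_collision(f, num_pigeons, t)
```

With one pigeon fewer, that is (t-1)N pigeons, a map can place exactly t-1 pigeons in every hole, so a t-collision is no longer guaranteed.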

31. High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering

Author: Ming, Xin, Li, Jiawei, Ling, Jingwang, Zhang, Libo, and Xu, Feng
Subjects: Computer Science - Graphics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Readily editable mesh blendshapes have been widely used in animation pipelines, while recent advancements in neural geometry and appearance representations have enabled high-quality inverse rendering. Building upon these observations, we introduce a novel technique that reconstructs mesh-based blendshape rigs from single or sparse multi-view videos, leveraging state-of-the-art neural inverse rendering. We begin by constructing a deformation representation that parameterizes vertex displacements into differential coordinates with tetrahedral connections, allowing for high-quality vertex deformation on high-resolution meshes. By constructing a set of semantic regulations in this representation, we achieve joint optimization of blendshapes and expression coefficients. Furthermore, to enable a user-friendly multi-view setup with unsynchronized cameras, we propose a neural regressor to model time-varying motion parameters. This approach implicitly considers the time difference across multiple cameras, enhancing the accuracy of motion modeling. Experiments demonstrate that, with the flexible input of single or sparse multi-view videos, we reconstruct personalized high-fidelity blendshapes. These blendshapes are both geometrically and semantically accurate, and they are compatible with industrial animation pipelines. Code and data are available at https://github.com/grignarder/high-quality-blendshape-generation.
Published: 2024

32. Bis-cyclometalated Ir(III) complexes with carbazole/triphenylamine donor fragment for oxygen sensing

Author: Yu, Hongcui, Yu, Bo, Song, Yajiao, and Li, Jiawei
Published: 2024
Full Text: View/download PDF

33. DAS coupling noise suppression based on MCA–FK

Author: Xu, Yankai, Zhu, Hongduo, Cao, Siyuan, Chen, Siyuan, Li, Jiawei, and Liu, Hongwei
Published: 2024
Full Text: View/download PDF

34. MFFNet: multimodal feature fusion network for point cloud semantic segmentation

Author: Ren, Dayong, Li, Jiawei, Wu, Zhengyi, Guo, Jie, Wei, Mingqiang, and Guo, Yanwen
Published: 2024
Full Text: View/download PDF

35. Study on the Unconfined Compressive Strength Property and Mechanism of Soda Residue Soil

Author: Zhao, Xiaoqing, Yang, Tianfeng, Yu, Zhilong, Zong, Zhongling, and Li, Jiawei
Published: 2024
Full Text: View/download PDF

36. PSST: A Benchmark for Evaluation-driven Text Public-Speaking Style Transfer

Author: Sun, Huashan, Wu, Yixiao, Ye, Yuhao, Yang, Yizhe, Li, Yinghao, Li, Jiawei, and Gao, Yang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, I.2.7
Abstract: Language style is necessary for AI systems to understand and generate diverse human language accurately. However, previous text style transfer primarily focused on sentence-level data-driven approaches, limiting exploration of potential problems in large language models (LLMs) and the ability to meet complex application needs. To overcome these limitations, we introduce a novel task called Public-Speaking Style Transfer (PSST), which aims to simulate humans to transform passage-level, official texts into a public-speaking style. Grounded in the analysis of real-world data from a linguistic perspective, we decompose public-speaking style into key sub-styles to pose challenges and quantify the style modeling capability of LLMs. For such intricate text style transfer, we further propose a fine-grained evaluation framework to analyze the characteristics and identify the problems of stylized texts. Comprehensive experiments suggest that current LLMs struggle to generate public speaking texts that align with human preferences, primarily due to excessive stylization and loss of semantic information., Comment: EMNLP 2024 Findings
Published: 2023

37. MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

Author: Yang, Yizhe, Sun, Huashan, Li, Jiawei, Liu, Runheng, Li, Yinghao, Liu, Yuhang, Huang, Heyan, and Gao, Yang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is leveraged by developing increasingly large-scale models, there could be another branch to develop lightweight custom models that better serve certain domains, taking into account the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of experiences accrued during large model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications. Such insights are hopefully valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models., Comment: Work in progress
Published: 2023

38. Learning Agility and Adaptive Legged Locomotion via Curricular Hindsight Reinforcement Learning

Author: Li, Sicen, Pang, Yiming, Bai, Panju, Liu, Zhaojin, Li, Jiawei, Hu, Shihao, Wang, Liquan, and Wang, Gang
Subjects: Computer Science - Robotics
Abstract: Agile and adaptive maneuvers such as fall recovery, high-speed turning, and sprinting in the wild are challenging for legged systems. We propose Curricular Hindsight Reinforcement Learning (CHRL), which learns an end-to-end tracking controller that achieves powerful agility and adaptation for the legged robot. The two key components are (i) a novel automatic curriculum strategy on task difficulty and (ii) a Hindsight Experience Replay strategy adapted to legged locomotion tasks. We demonstrated successful agile and adaptive locomotion on a real quadruped robot that performed fall recovery autonomously, coherent trotting, sustained outdoor speeds up to 3.45 m/s, and turning speeds up to 3.2 rad/s. This system produces adaptive behaviours responding to changing situations and unexpected disturbances on natural terrains like grass and dirt.
Published: 2023

39. Test Smell: A Parasitic Energy Consumer in Software Testing

Author: Misu, Md Rakib Hossain, Li, Jiawei, Bhattiprolu, Adithya, Liu, Yang, Almeida, Eduardo, and Ahmed, Iftekhar
Subjects: Computer Science - Software Engineering
Abstract: Traditionally, energy efficiency research has focused on reducing energy consumption at the hardware level and, more recently, in the design and coding phases of the software development life cycle. However, software testing's impact on energy consumption has not received attention from the research community. Specifically, how test code design quality and test smell (e.g., sub-optimal design and bad practices in test code) impact energy consumption has not been investigated yet. This study examined 12 Apache projects to analyze the association between test smell and its effects on energy consumption in software testing. We conducted a mixed-method empirical analysis from two dimensions: software (data mining in Apache projects) and developers' views (a survey of 62 software practitioners). Our findings show that: 1) test smell is associated with energy consumption in software testing; specifically, the smelly part of a test case consumes 10.92% more energy compared to the non-smelly part, 2) certain test smells are more energy-hungry than others, 3) refactored test cases tend to consume less energy than their smelly counterparts, and 4) most developers lack knowledge about test smells' impact on energy consumption. We conclude the paper with several observations that can direct future research and developments.
Published: 2023

40. Automated Repair of Declarative Software Specifications in the Era of Large Language Models

Author: Hasan, Md Rashedul, Li, Jiawei, Ahmed, Iftekhar, and Bagheri, Hamid
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The growing adoption of declarative software specification languages, coupled with their inherent difficulty in debugging, has underscored the need for effective and automated repair techniques applicable to such languages. Researchers have recently explored various methods to automatically repair declarative software specifications, such as template-based repair, feedback-driven iterative repair, and bounded exhaustive approaches. The latest developments in large language models provide new opportunities for the automatic repair of declarative specifications. In this study, we assess the effectiveness of utilizing OpenAI's ChatGPT to repair software specifications written in the Alloy declarative language. Unlike imperative languages, specifications in Alloy are not executed but rather translated into logical formulas and evaluated using backend constraint solvers to identify specification instances and counterexamples to assertions. Our evaluation focuses on ChatGPT's ability to improve the correctness and completeness of Alloy declarative specifications through automatic repairs. We analyze the results produced by ChatGPT and compare them with those of leading automatic Alloy repair methods. Our study revealed that while ChatGPT falls short in comparison to existing techniques, it was able to successfully repair bugs that no other technique could address. Our analysis also identified errors in ChatGPT's generated repairs, including improper operator usage, type errors, higher-order logic misuse, and relational arity mismatches. Additionally, we observed instances of hallucinations in ChatGPT-generated repairs and inconsistency in its results. Our study provides valuable insights for software practitioners, researchers, and tool builders considering ChatGPT for declarative specification repairs., Comment: 13 Pages with reference, 4 Tables, 2 Figures, 2 Listings
Published: 2023

41. Adversarial Attacks on Combinatorial Multi-Armed Bandits

Author: Balasubramanian, Rishab, Li, Jiawei, Tadepalli, Prasad, Wang, Huazheng, Wu, Qingyun, and Zhao, Haoyu
Subjects: Computer Science - Machine Learning, Computer Science - Data Structures and Algorithms, Statistics - Machine Learning
Abstract: We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a sufficient and necessary condition for the attackability of CMAB, a notion to capture the vulnerability and robustness of CMAB. The attackability condition depends on the intrinsic properties of the corresponding CMAB instance such as the reward distributions of super arms and outcome distributions of base arms. Additionally, we devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals a surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice and a general attack strategy for any CMAB instance does not exist since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications including probabilistic maximum covering problem, online minimum spanning tree, cascading bandits for online ranking, and online shortest path., Comment: 28 pages, Accepted to ICML 2024
Published: 2023

42. Neural2Speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction

Author: Li, Jiawei, Guo, Chunxu, Fu, Li, Fan, Lu, Chang, Edward F., and Li, Yuanning
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Quantitative Biology - Neurons and Cognition
Abstract: Reconstructing natural speech from neural activity is vital for enabling direct communication via brain-computer interfaces. Previous efforts have explored the conversion of neural recordings into speech using complex deep neural network (DNN) models trained on extensive neural recording data, which is resource-intensive under regular clinical constraints. However, achieving satisfactory performance in reconstructing speech from limited-scale neural recordings has been challenging, mainly due to the complexity of speech representations and the neural data constraints. To overcome these challenges, we propose a novel transfer learning framework for neural-driven speech reconstruction, called Neural2Speech, which consists of two distinct training phases. First, a speech autoencoder is pre-trained on readily available speech corpora to decode speech waveforms from the encoded speech representations. Second, a lightweight adaptor is trained on the small-scale neural recordings to align the neural activity and the speech representation for decoding. Remarkably, our proposed Neural2Speech demonstrates the feasibility of neural-driven speech reconstruction even with only 20 minutes of intracranial data, which significantly outperforms existing baseline methods in terms of speech fidelity and intelligibility., Comment: To appear in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing
Published: 2023

43. A Novel Approach for Effective Multi-View Clustering with Information-Theoretic Perspective

Author: Cui, Chenhang, Ren, Yazhou, Pu, Jingyu, Li, Jiawei, Pu, Xiaorong, Wu, Tianyi, Shi, Yutao, and He, Lifang
Subjects: Computer Science - Machine Learning
Abstract: Multi-view clustering (MVC) is a popular technique for improving clustering performance using various data sources. However, existing methods primarily focus on acquiring consistent information while often neglecting the issue of redundancy across multiple views. This study presents a new approach called Sufficient Multi-View Clustering (SUMVC) that examines the multi-view clustering framework from an information-theoretic standpoint. Our proposed method consists of two parts. Firstly, we develop a simple and reliable multi-view clustering method SCMVC (simple consistent multi-view clustering) that employs variational analysis to generate consistent information. Secondly, we propose a sufficient representation lower bound to enhance consistent information and minimise unnecessary information among views. The proposed SUMVC method offers a promising solution to the problem of multi-view clustering and provides a new perspective for analyzing multi-view data. To verify the effectiveness of our model, we conducted a theoretical analysis based on the Bayes Error Rate, and experiments on multiple multi-view datasets demonstrate the superior performance of SUMVC.
Published: 2023

44. Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study

Author: Deiner, Michael S, Honcharov, Vlad, Li, Jiawei, Mackey, Tim K, Porco, Travis C, and Sarkar, Urmimala
Subjects: Health Services and Systems, Health Sciences, Social Media, Humans, Natural Language Processing, generative large language model, generative pretrained transformer, GPT, Claude, Twitter, X formerly known as Twitter, inductive content analysis, COVID-19, vaccine hesitancy, infodemiology
Abstract: Background: Manually analyzing public health-related content from social media provides valuable insights into the beliefs, attitudes, and behaviors of individuals, shedding light on trends and patterns that can inform public understanding, policy decisions, targeted interventions, and communication strategies. Unfortunately, the time and effort needed from well-trained human subject matter experts makes extensive manual social media listening unfeasible. Generative large language models (LLMs) can potentially summarize and interpret large amounts of text, but it is unclear to what extent LLMs can glean subtle health-related meanings in large sets of social media posts and reasonably report health-related themes. Objective: We aimed to assess the feasibility of using LLMs for topic model selection or inductive thematic analysis of large contents of social media posts by attempting to answer the following question: Can LLMs conduct topic model selection and inductive thematic analysis as effectively as humans did in a prior manual study, or at least reasonably, as judged by subject matter experts? Methods: We asked the same research question and used the same set of social media content for both the LLM selection of relevant topics and the LLM analysis of themes as was conducted manually in a published study about vaccine rhetoric. We used the results from that study as background for this LLM experiment by comparing the results from the prior manual human analyses with the analyses from 3 LLMs: GPT4-32K, Claude-instant-100K, and Claude-2-100K. We also assessed if multiple LLMs had equivalent ability and assessed the consistency of repeated analysis from each LLM. Results: The LLMs generally gave high rankings to the topics chosen previously by humans as most relevant. We reject a null hypothesis (P
Published: 2024

45. Building domain lexicon oriented to behavioral features in depression

Author: ZHOU Ruotong, ZHU Guangli, LI Shuyu, DUAN Wenjie, and LI Jiawei
Subjects: depression, domain lexicon, behavioral feature, WoBERT, label propagation algorithm, Electronic computers. Computer science, QA75.5-76.95
Abstract: Behavioral representations of patients with depression reflect the patients' clinical features and condition, and are therefore beneficial for disease diagnosis. However, in the construction of current depression lexicons, the correlation between behavioral features and the condition of patients in depression texts is overlooked, resulting in incomplete lexicon information. To address this problem, a domain lexicon construction method oriented to behavioral features in depression was proposed, which aimed to extend the domain lexicon's coverage of emotional expressions. Firstly, the seed word sets of sentiment and behavior were each constructed by the TF-IDF algorithm, and the word set of sentiment was obtained by calculating the PMI similarity between the seed word set of sentiment and the existing sentiment lexicon. Secondly, the behavioral seed words were labeled based on the correspondence between behavioral features and the condition of patients, and then input into WoBERT together with depression texts to separately generate dynamic word vectors. In addition, the candidate word set was acquired by calculating the similarity between the behavioral seed word set and depression texts. Then, based on the similarity between words, a semantic graph was constructed to obtain the word set of behavioral features by the label propagation algorithm. Finally, the emoticons with negative emotions on Weibo were collected to build the word set of emoticons. The word set of sentiment, the word set of behavioral features and the word set of emoticons were integrated into the Chinese Depression Domain Lexicon. Experimental results show that the constructed lexicon can improve the effect of depression text classification.
Published: 2024
Full Text: View/download PDF
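
Note on the PMI step in record 45: the abstract does not say how PMI similarity is computed, so the following is only a minimal sketch under stated assumptions. It assumes document-level co-occurrence counts and keeps a seed word when its best PMI score against any word of an existing lexicon exceeds a threshold. The names `pmi_table` and `expand_seeds`, the threshold, and the English toy tokens are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """PMI (base 2) for every unordered word pair that co-occurs in a document.

    Probabilities are document frequencies divided by the number of documents.
    Keys are alphabetically sorted (word_a, word_b) tuples.
    """
    n_docs = len(docs)
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    table = {}
    for (a, b), count in pair_df.items():
        p_ab = count / n_docs
        p_a = word_df[a] / n_docs
        p_b = word_df[b] / n_docs
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def expand_seeds(seeds, lexicon, docs, threshold=0.5):
    """Keep seed words whose best PMI with any lexicon word exceeds threshold.

    Pairs that never co-occur are treated as having PMI of negative infinity.
    """
    table = pmi_table(docs)
    kept = []
    for seed in seeds:
        scores = [
            table.get(tuple(sorted((seed, word))), float("-inf"))
            for word in lexicon
            if word != seed
        ]
        if scores and max(scores) > threshold:
            kept.append(seed)
    return kept


# Hypothetical toy corpus: each document is a list of tokens.
toy_docs = [
    ["sad", "hopeless", "tired"],
    ["sad", "hopeless"],
    ["happy", "walk"],
    ["tired", "walk"],
]
print(expand_seeds(["hopeless", "walk"], {"sad"}, toy_docs))
```

In this toy corpus "hopeless" co-occurs with the lexicon word "sad" more often than chance would predict, so it is retained, while "walk" never co-occurs with "sad" and is dropped. A real pipeline would work on segmented Chinese text and would likely use sliding-window rather than whole-document co-occurrence.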

46. Persistence of Monoclinic Crystal Structure in Three-Dimensional Second-Order Topological Insulator Candidate 1T'-MoTe2 Thin Flake without Structural Phase transition

Author: Su, Bo, Huang, Yuan, Hou, Yan Hui, Li, Jiawei, Yang, Rong, Ma, Yongchang, Yang, Yang, Zhang, Guangyu, Zhou, Xingjiang, Luo, Jianlin, and Chen, Zhi-Guo
Subjects: Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: MoTe2, a van der Waals material with a monoclinic 1T' crystal structure, is a candidate for three-dimensional (3D) second-order topological insulators (SOTIs) hosting gapless hinge states and insulating surface states. However, due to the temperature-induced structural phase transition, the monoclinic 1T' structure of MoTe2 transforms into the orthorhombic Td structure as the temperature is lowered, which hinders the experimental verification and the electronic applications of the predicted SOTI state at low temperatures. Here, we present systematic Raman spectroscopy studies of exfoliated MoTe2 thin flakes with variable thicknesses at different temperatures. As a spectroscopic signature of the orthorhombic Td structure of MoTe2, the out-of-plane vibration mode D at ~125 cm^-1 is always visible below a certain temperature in the multilayer flakes thicker than ~27.7 nm, but vanishes in the temperature range from 80 K to 320 K when the flake thickness becomes lower than ~19.5 nm. The absence of the out-of-plane vibration mode D in the Raman spectra here demonstrates not only the disappearance of the monoclinic-to-orthorhombic phase transition but also the persistence of the monoclinic 1T' structure in the MoTe2 thin flakes thinner than ~19.5 nm at low temperatures down to 80 K, which may be caused by the sufficiently high density of holes introduced during the gold-enhanced exfoliation process and exposure to air. The MoTe2 thin flakes with the low-temperature monoclinic 1T' structure provide a material platform for realizing SOTI states in van der Waals materials at low temperatures, which paves the way for developing a new generation of electronic devices based on SOTIs. Comment: 20 pages, 5 figures
Published: 2023
Full Text: View/download PDF

47. ALIP: Adaptive Language-Image Pre-training with Synthetic Caption

Author: Yang, Kaicheng, Deng, Jiankang, An, Xiang, Li, Jiawei, Feng, Ziyong, Guo, Jia, Yang, Jing, and Liu, Tongliang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Contrastive Language-Image Pre-training (CLIP) has significantly boosted the performance of various vision-language tasks by scaling up the dataset with image-text pairs collected from the web. However, the presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning. To address this issue, we first utilize the OFA model to generate synthetic captions that focus on the image content. The generated captions contain complementary information that is beneficial for pre-training. Then, we propose Adaptive Language-Image Pre-training (ALIP), a bi-path model that integrates supervision from both raw text and synthetic captions. As the core components of ALIP, the Language Consistency Gate (LCG) and Description Consistency Gate (DCG) dynamically adjust the weights of samples and image-text/caption pairs during the training process. Meanwhile, the adaptive contrastive loss can effectively reduce the impact of noisy data and enhance the efficiency of pre-training data. We validate ALIP with experiments on different scales of models and pre-training datasets. Experimental results show that ALIP achieves state-of-the-art performance on multiple downstream tasks including zero-shot image-text retrieval and linear probe. To facilitate future research, the code and pre-trained models are released at https://github.com/deepglint/ALIP. Comment: 15 pages, 10 figures, ICCV 2023
Published: 2023

48. Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey

Author: Wang, Liping, Li, Jiawei, Zhao, Lifan, Kou, Zhizhuo, Wang, Xiaohan, Zhu, Xinyi, Wang, Hao, Shen, Yanyan, and Chen, Lei
Subjects: Quantitative Finance - Statistical Finance, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market. In recent years, knowledge-enhanced stock price prediction methods have shown groundbreaking results by utilizing external knowledge to understand the stock market. Despite the importance of these methods, there is a scarcity of scholarly works that systematically synthesize previous studies from the perspective of external knowledge types. Specifically, the external knowledge can be modeled in different data structures, which we group into non-graph-based formats and graph-based formats: 1) non-graph-based knowledge captures contextual information and multimedia descriptions specifically associated with an individual stock; 2) graph-based knowledge captures interconnected and interdependent information in the stock market. This survey paper aims to provide a systematic and comprehensive description of methods for acquiring external knowledge from various unstructured data sources and then incorporating it into stock price prediction models. We also explore fusion methods for combining external knowledge with historical price features. Moreover, this paper includes a compilation of relevant datasets and delves into potential future research directions in this domain.
Published: 2023

49. Learning a Graph Neural Network with Cross Modality Interaction for Image Fusion

Author: Li, Jiawei, Chen, Jiansheng, Liu, Jinyuan, and Ma, Huimin
Subjects: Computer Science - Computer Vision and Pattern Recognition, I.4, I.2
Abstract: Infrared and visible image fusion has gradually proved to be a vital branch of multi-modality imaging technologies. In recent developments, researchers not only focus on the quality of fused images but also evaluate their performance in downstream tasks. Nevertheless, most methods seldom consider mutual learning between different modalities, resulting in fused images that lack significant details and textures. To overcome this issue, we propose an interactive graph neural network (GNN)-based architecture across modalities for fusion, called IGNet. Specifically, we first apply a multi-scale extractor to obtain shallow features, which are employed as the necessary input to build graph structures. Then, the graph interaction module constructs the extracted intermediate features of the infrared/visible branch into graph structures. Meanwhile, the graph structures of the two branches interact for cross-modality and semantic learning, so that fused images can maintain the important feature expressions and enhance the performance of downstream tasks. Besides, the proposed leader nodes can improve information propagation in the same modality. Finally, we merge all graph features to get the fusion result. Extensive experiments on different datasets (TNO, MFNet and M3FD) demonstrate that our IGNet can generate visually appealing fused images while scoring on average 2.59% mAP@.5 and 7.77% mIoU higher in detection and segmentation than the compared state-of-the-art methods. The source code of the proposed IGNet is available at https://github.com/lok-18/IGNet. Comment: 9 pages, 10 figures, ACM MM 2023
Published: 2023

50. Complexity Matters: Rethinking the Latent Space for Generative Modeling

Author: Hu, Tianyang, Chen, Fei, Wang, Haonan, Li, Jiawei, Wang, Wenjia, Sun, Jiacheng, and Li, Zhenguo
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion models the latent space induced by an encoder and generates images through a paired decoder. Although the selection of the latent space is empirically pivotal, determining the optimal choice and the process of identifying it remain unclear. In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity. Our investigation starts with the classic generative adversarial networks (GANs). Inspired by the GAN training objective, we propose a novel "distance" between the latent and data distributions, whose minimization coincides with that of the generator complexity. The minimizer of this distance is characterized as the optimal data-dependent latent that most effectively capitalizes on the generator's capacity. Then, we consider parameterizing such a latent distribution by an encoder network and propose a two-stage training strategy called Decoupled Autoencoder (DAE), where the encoder is only updated in the first stage with an auxiliary decoder and then frozen in the second stage while the actual decoder is being trained. DAE can improve the latent distribution and as a result, improve the generative performance. Our theoretical analyses are corroborated by comprehensive experiments on various models such as VQGAN and Diffusion Transformer, where our modifications yield significant improvements in sample quality with decreased model complexity. Comment: Accepted to NeurIPS 2023 (Spotlight)
Published: 2023
