18,600 results on '"Jiang, Bo"'
Search Results
2. AlterMOMA: Fusion Redundancy Pruning for Camera-LiDAR Fusion Models with Alternative Modality Masking
- Author
-
Sun, Shiqi, Lu, Yantao, Liu, Ning, Jiang, Bo, Chen, JinChao, and Zhang, Ying
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Camera-LiDAR fusion models significantly enhance perception performance in autonomous driving. The fusion mechanism leverages the strengths of each modality while minimizing their weaknesses. Moreover, in practice, camera-LiDAR fusion models utilize pre-trained backbones for efficient training. However, we argue that directly loading single-modal pre-trained camera and LiDAR backbones into camera-LiDAR fusion models introduces similar feature redundancy across modalities due to the nature of the fusion mechanism. Unfortunately, existing pruning methods are developed explicitly for single-modal models, and thus they struggle to effectively identify these specific redundant parameters in camera-LiDAR fusion models. In this paper, to address this issue, we propose a novel pruning framework, Alternative Modality Masking Pruning (AlterMOMA), which employs alternative masking on each modality to identify the redundant parameters. Specifically, when one modality's parameters are masked (deactivated), the absence of features from the masked backbone compels the model to reactivate previously redundant features of the other modality's backbone. These redundant features and the corresponding redundant parameters can therefore be identified via the reactivation process and pruned using our proposed importance score evaluation function, Alternative Evaluation (AlterEva), which is based on observing the loss changes when the parameters of a given modality are activated and deactivated. Extensive experiments on the nuScenes and KITTI datasets, encompassing diverse tasks, baseline models, and pruning algorithms, show that AlterMOMA outperforms existing pruning methods, attaining state-of-the-art performance., Comment: 17 pages, 3 figures, Accepted by NeurIPS 2024
- Published
- 2024
3. CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
- Author
-
Liu, Wentao, Pan, Qianjun, Zhang, Yi, Liu, Zhuo, Wu, Ji, Zhou, Jie, Zhou, Aimin, Chen, Qin, Jiang, Bo, and He, Liang
- Subjects
Computer Science - Computation and Language - Abstract
Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs on textual math reasoning datasets (e.g., MATH, GSM8K). Recently, a few researchers have released English multimodal math datasets (e.g., MATHVISTA and MATH-V) to evaluate the effectiveness of large multimodal models (LMMs). In this paper, we release a Chinese multimodal math (CMM-Math) dataset, including benchmark and training parts, to evaluate and enhance the mathematical reasoning of LMMs. CMM-Math contains over 28,000 high-quality samples, featuring a variety of problem types (e.g., multiple-choice, fill-in-the-blank, and so on) with detailed solutions across 12 grade levels, from elementary to high school in China. Notably, the visual context may appear in either the questions or the answer options, which makes this dataset more challenging. Through comprehensive analysis, we find that state-of-the-art LMMs face challenges on the CMM-Math dataset, emphasizing the necessity for further improvements in LMM development. We also propose a Multimodal Mathematical LMM (Math-LMM) to handle problems with mixed input of multiple images and text segments. We train our model in three stages: foundational pre-training, foundational fine-tuning, and mathematical fine-tuning. Extensive experiments indicate that our model effectively improves math reasoning performance when compared with SOTA LMMs on three multimodal mathematical datasets.
- Published
- 2024
4. Solving Integrated Process Planning and Scheduling Problem via Graph Neural Network Based Deep Reinforcement Learning
- Author
-
Li, Hongpei, Zhang, Han, He, Ziyan, Jia, Yunkai, Jiang, Bo, Huang, Xiang, and Ge, Dongdong
- Subjects
Mathematics - Optimization and Control ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
The Integrated Process Planning and Scheduling (IPPS) problem combines process route planning and shop scheduling to achieve high efficiency in manufacturing and maximize resource utilization, which is crucial for modern manufacturing systems. Traditional methods using Mixed Integer Linear Programming (MILP) and heuristic algorithms cannot adequately balance solution quality and speed when solving IPPS. In this paper, we propose a novel end-to-end Deep Reinforcement Learning (DRL) method. We model the IPPS problem as a Markov Decision Process (MDP) and employ a Heterogeneous Graph Neural Network (GNN) to capture the complex relationships among operations, machines, and jobs. To optimize the scheduling strategy, we use Proximal Policy Optimization (PPO). Experimental results show that, compared to traditional methods, our approach significantly improves solution efficiency and quality on large-scale IPPS instances, providing superior scheduling strategies for modern intelligent manufacturing systems., Comment: 24 pages, 13 figures
- Published
- 2024
5. Broad-line Region of the Quasar PG 2130+099. II. Doubling the Size Over Four Years?
- Author
-
Yao, Zhu-Heng, Yang, Sen, Guo, Wei-Jian, Chen, Yong-Jie, Songsheng, Yu-Yang, Bao, Dong-Wei, Jiang, Bo-Wei, Wang, Yi-Lin, Zhang, Hao, Hu, Chen, Li, Yan-Rong, Du, Pu, Xiao, Ming, Bai, Jin-Ming, Ho, Luis C., Brotherton, Michael S., Aceituno, Jesús, Winkler, Hartmut, and Wang, Jian-Min
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
Over the past three decades, multiple reverberation mapping (RM) campaigns conducted for the quasar PG 2130+099 have exhibited inconsistent findings, with time delays ranging from $\sim$10 to $\sim$200 days. To achieve a comprehensive understanding of the geometry and dynamics of the broad-line region (BLR) in PG 2130+099, we continued an ongoing high-cadence RM monitoring campaign using the Calar Alto Observatory 2.2m optical telescope for an extra four years, from 2019 to 2022. We measured the time lags of several broad emission lines (including He II, He I, H$\beta$, and Fe II) with respect to the 5100 {\AA} continuum, and their time lags varied continuously through the years. In particular, the H$\beta$ time lags increased by approximately a factor of two in the last two years. Additionally, the velocity-resolved time delays of the broad H$\beta$ emission line reveal a back-and-forth change between signatures of virial motion and inflow in the BLR. The combination of negligible ($\sim$10%) continuum change and substantial time-lag variation (more than a factor of two) results in significant scatter in the intrinsic $R_{\rm H\beta}-L_{\rm 5100}$ relationship for PG 2130+099. Taking into account the consistent changes in the continuum variability time scale and the size of the BLR, we tentatively propose that the changes in the measured BLR size may be affected by 'geometric dilution'., Comment: 21 pages, 13 figures, 7 tables; accepted for publication in ApJ
- Published
- 2024
6. FG-SAT: Efficient Flow Graph for Encrypted Traffic Classification under Environment Shifts
- Author
-
Cui, Susu, Han, Xueying, Han, Dongqi, Wang, Zhiliang, Wang, Weihang, Li, Yun, Jiang, Bo, Liu, Baoxu, and Lu, Zhigang
- Subjects
Computer Science - Cryptography and Security - Abstract
Encrypted traffic classification plays a critical role in network security and management. Currently, mining deep patterns from side-channel contents and plaintext fields through neural networks is a major solution. However, existing methods have two major limitations: (1) They fail to recognize the critical link between transport layer mechanisms and applications, missing the opportunity to learn internal structure features for accurate traffic classification. (2) They assume network traffic in an unrealistically stable and singular environment, making it difficult to effectively classify real-world traffic under environment shifts. In this paper, we propose FG-SAT, the first end-to-end method for encrypted traffic analysis under environment shifts. We propose a key abstraction, the Flow Graph, to represent flow internal relationship structures and rich node attributes, which enables robust and generalized representation. Additionally, to address the problem of inconsistent data distribution under environment shifts, we introduce a novel feature selection algorithm based on Jensen-Shannon divergence (JSD) to select robust node attributes. Finally, we design a classifier, GraphSAT, which integrates GraphSAGE and GAT to deeply learn Flow Graph features, enabling accurate encrypted traffic identification. FG-SAT exhibits both efficient and robust classification performance under environment shifts and outperforms state-of-the-art methods in encrypted attack detection and application classification., Comment: Ready to submit to IEEE Transactions on Information Forensics and Security (TIFS)
- Published
- 2024
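The Jensen-Shannon-divergence-based attribute selection described in the FG-SAT abstract above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function names, the histogram binning, and the threshold value are all assumptions. The idea is to keep only those node attributes whose distributions stay stable across network environments.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two discrete distributions
    # (symmetric, bounded by ln 2).
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def select_robust_attributes(env_a, env_b, bins=20, threshold=0.2):
    """Keep the indices of node attributes whose histograms stay
    consistent between two traffic environments (low JSD)."""
    robust = []
    for j in range(env_a.shape[1]):
        lo = min(env_a[:, j].min(), env_b[:, j].min())
        hi = max(env_a[:, j].max(), env_b[:, j].max())
        h_a, _ = np.histogram(env_a[:, j], bins=bins, range=(lo, hi))
        h_b, _ = np.histogram(env_b[:, j], bins=bins, range=(lo, hi))
        if js_divergence(h_a.astype(float), h_b.astype(float)) < threshold:
            robust.append(j)
    return robust
```

An attribute whose distribution shifts between environments (e.g., a packet-size feature that differs between lab and production traffic) would exceed the threshold and be discarded.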
7. Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm
- Author
-
Wang, Xiao, Rong, Yao, Wang, Fuling, Li, Jianing, Zhu, Lin, Jiang, Bo, and Wang, Yaowei
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Neural and Evolutionary Computing - Abstract
Sign Language Translation (SLT) is a core task in AI-assisted accessibility. Unlike traditional SLT based on visible-light videos, which is easily affected by factors such as lighting, rapid hand movements, and privacy breaches, this paper proposes the use of high-definition Event streams for SLT, effectively mitigating the aforementioned issues. This is primarily because Event streams have a high dynamic range and dense temporal signals, which withstand low illumination and motion blur well. Additionally, due to their spatial sparsity, they effectively protect the privacy of the target person. More specifically, we propose a new high-resolution Event stream sign language dataset, termed Event-CSL, which effectively fills the data gap in this area of research. It contains 14,827 videos, 14,821 glosses, and 2,544 Chinese words in the text vocabulary. These samples are collected in a variety of indoor and outdoor scenes, encompassing multiple angles, light intensities, and camera movements. We have benchmarked existing mainstream SLT works to enable fair comparison for future efforts. Based on this dataset and several other large-scale datasets, we propose a novel baseline method that fully leverages the Mamba model's ability to integrate temporal information from CNN features, resulting in improved sign language translation outcomes. Both the benchmark dataset and source code will be released on https://github.com/Event-AHU/OpenESL, Comment: First Large-scale and High-Definition Benchmark Dataset for Event-based Sign Language Translation
- Published
- 2024
8. MambaEVT: Event Stream based Visual Object Tracking using State Space Model
- Author
-
Wang, Xiao, Wang, Chao, Wang, Shiao, Wang, Xixi, Zhao, Zhicheng, Zhu, Lin, and Jiang, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Event camera-based visual tracking has drawn increasing attention in recent years due to its unique imaging principle and the advantages of low energy consumption, high dynamic range, and dense temporal resolution. Current event-based tracking algorithms are gradually hitting their performance bottlenecks due to the use of vision Transformers and a static template for target object localization. In this paper, we propose a novel Mamba-based visual tracking framework that adopts a state space model with linear complexity as its backbone network. The search regions and target template are fed into the vision Mamba network for simultaneous feature extraction and interaction. The output tokens of the search regions are then fed into the tracking head for target localization. More importantly, we introduce a dynamic template update strategy into the tracking framework using a Memory Mamba network. By considering the diversity of samples in the target template library and making appropriate adjustments to the template memory module, a more effective dynamic template can be integrated. The effective combination of dynamic and static templates allows our Mamba-based tracking algorithm to achieve a good balance between accuracy and computational cost on multiple large-scale datasets, including EventVOT, VisEvent, and FE240hz. The source code will be released on https://github.com/Event-AHU/MambaEVT, Comment: In Peer Review
- Published
- 2024
9. Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms
- Author
-
Wang, Xiao, Wang, Shiao, Shao, Pengpeng, Jiang, Bo, Zhu, Lin, and Tian, Yonghong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Neural and Evolutionary Computing - Abstract
Human Action Recognition (HAR) stands as a pivotal research domain in both computer vision and artificial intelligence, with RGB cameras dominating as the preferred tool for investigation and innovation in this field. However, in real-world applications, RGB cameras encounter numerous challenges, including lighting conditions, fast motion, and privacy concerns. Consequently, bio-inspired event cameras have garnered increasing attention due to their advantages of low energy consumption, high dynamic range, etc. Nevertheless, most existing event-based HAR datasets are low resolution ($346 \times 260$). In this paper, we propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera, termed CeleX-HAR. It encompasses 150 commonly occurring action categories, comprising a total of 124,625 video sequences. Various factors such as multi-view, illumination, action speed, and occlusion are considered when recording these data. To build a more comprehensive benchmark, we report results for over 20 mainstream HAR models for future works to compare against. In addition, we also propose a novel Mamba vision backbone network for event-stream-based HAR, termed EVMamba, which is equipped with spatial-plane multi-directional scanning and a novel voxel temporal scanning mechanism. By encoding and mining the spatio-temporal information of event streams, our EVMamba achieves favorable results across multiple datasets. Both the dataset and source code will be released on \url{https://github.com/Event-AHU/CeleX-HAR}, Comment: In Peer Review
- Published
- 2024
10. R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation
- Author
-
Wang, Xiao, Li, Yuehang, Wang, Fuling, Wang, Shiao, Li, Chuanfu, and Jiang, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Inspired by the tremendous success of Large Language Models (LLMs), existing X-ray medical report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image and then feed them into the LLM for text generation. How to extract more effective information for the LLM to improve the final results is an urgent problem that needs to be solved. Additionally, the use of visual Transformer models also brings high computational complexity. To address these issues, this paper proposes a novel context-guided efficient X-ray medical report generation framework. Specifically, we introduce the Mamba as the vision backbone with linear complexity, and the performance obtained is comparable to that of a strong Transformer model. More importantly, we perform context retrieval from the training set for samples within each mini-batch during the training phase, utilizing both positively and negatively related samples to enhance feature representation and discriminative learning. Subsequently, we feed the vision tokens, context information, and prompt statements to invoke the LLM for generating high-quality medical reports. Extensive experiments on three X-ray report generation datasets (i.e., IU-Xray, MIMIC-CXR, CheXpert Plus) fully validate the effectiveness of our proposed model. The source code of this work will be released on \url{https://github.com/Event-AHU/Medical_Image_Analysis}., Comment: In Peer Review
- Published
- 2024
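The per-mini-batch context retrieval described in the R2GenCSR abstract above can be sketched as follows. This is an illustrative reconstruction under assumptions: the function name, the use of cosine similarity, and returning the top-k/bottom-k training samples as positive/negative context are not taken from the paper.

```python
import numpy as np

def retrieve_context(batch_feats, train_feats, k=3):
    """Sketch: for each mini-batch sample, retrieve the k most similar
    (positively related) and k least similar (negatively related)
    training samples by cosine similarity of their feature vectors."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    # (B, N) similarity matrix between batch and training set
    sim = normalize(batch_feats) @ normalize(train_feats).T
    order = np.argsort(-sim, axis=1)     # most similar first
    return order[:, :k], order[:, -k:]   # positive indices, negative indices
```

The retrieved indices would then select training reports/features to feed to the LLM alongside the vision tokens and prompt.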
11. Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining
- Author
-
Wang, Xixi, Wang, Zitian, Jiang, Jingtao, Chen, Lan, Wang, Xiao, and Jiang, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Current works address the remote sensing change detection task using bi-temporal images. Although good performance can be achieved, few of them consider motion cues, which may also be vital. In this work, we revisit the widely adopted bi-temporal-image-based framework and propose a novel Coarse-grained Temporal Mining Augmented (CTMA) framework. To be specific, given the bi-temporal images, we first transform them into a video using interpolation operations. Then, a set of temporal encoders is adopted to extract motion features from the obtained video for coarse-grained changed-region prediction. Subsequently, we design a novel Coarse-grained Foregrounds Augmented Spatial Encoder module to integrate both global and local information. We also introduce a motion-augmented strategy that leverages motion cues as an additional output to aggregate with the spatial features for improved results. Meanwhile, we feed the input image pairs into a ResNet to obtain the difference features, and into the spatial blocks for fine-grained feature learning. More importantly, we propose a mask-augmented strategy that utilizes the coarse-grained changed regions, incorporating them into the decoder blocks to enhance the final change prediction. Extensive experiments conducted on multiple benchmark datasets fully validated the effectiveness of our proposed framework for remote sensing image change detection. The source code of this paper will be released on https://github.com/Event-AHU/CTM_Remote_Sensing_Change_Detection, Comment: In Peer Review
- Published
- 2024
12. AC thermal conductivity as a tool for solution mapping from diffusive to ballistic regime
- Author
-
Li, Tao, Jiang, Bo, and Chen, Zhen
- Subjects
Condensed Matter - Materials Science ,Physics - Applied Physics - Abstract
Although the Boltzmann transport equation (BTE) has been exploited to investigate non-diffusive phonon transport for decades, due to the challenges of solving this integro-differential equation, most standard techniques for thermal measurements still rely on solutions to the diffusion equation, causing inconsistency between measured non-diffusive effects and the diffusion-equation-based techniques. With the AC thermal conductivity, a concept analogous to the AC electrical conductivity in solid state physics, we transform the BTE under the relaxation time approximation into the form of the diffusion equation. This transformation maps any analytical solution of the diffusion equation under periodic heating to that of the BTE, with the nonlocal effect captured by the jump boundary condition. After investigating the validity of this framework, we apply it to generalize the 3$\omega$ method from the diffusive to the quasi-ballistic regime, and propose an experimental scheme to address the inconsistency problem above.
- Published
- 2024
13. RepoMasterEval: Evaluating Code Completion via Real-World Repositories
- Author
-
Wu, Qinyun, Peng, Chao, Gao, Pengfei, Hu, Ruida, Gan, Haoyu, Jiang, Bo, Tang, Jinhe, Deng, Zhiwen, Guan, Zhanming, Gao, Cuiyun, Liu, Xia, and Yang, Ping
- Subjects
Computer Science - Software Engineering ,Computer Science - Artificial Intelligence - Abstract
With the growing reliance on automated code completion tools in software development, the need for robust evaluation benchmarks has become critical. However, existing benchmarks focus more on code generation tasks at the function and class level and provide rich textual descriptions to prompt the model. By contrast, such descriptive prompts are commonly unavailable in real development, and code completion can occur in a wider range of situations, such as in the middle of a function or a code block. These limitations make the evaluation align poorly with the practical scenarios of code completion tools. In this paper, we propose RepoMasterEval, a novel benchmark for evaluating code completion models, constructed from real-world Python and TypeScript repositories. Each benchmark datum is generated by masking a code snippet (ground truth) from one source code file with existing test suites. To improve the test accuracy of model-generated code, we employ mutation testing to measure the effectiveness of the test cases, and we manually crafted new test cases for those test suites with low mutation scores. Our empirical evaluation on 6 state-of-the-art models shows that test augmentation is critical in improving the accuracy of the benchmark, and RepoMasterEval is able to report differences in model performance in real-world scenarios. The deployment of RepoMasterEval at a collaborating company for one month also revealed that the benchmark is useful for giving accurate feedback during model training, and the score correlates highly with the model's performance in practice. Based on our findings, we call for the software engineering community to build more LLM benchmarks tailored for code generation tools, taking the practical and complex development environment into consideration.
- Published
- 2024
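The masking step described in the RepoMasterEval abstract above can be sketched as follows. This is a minimal illustration, not the benchmark's actual tooling: the function name and the line-range interface are assumptions. A snippet is cut from a source file; the surrounding code becomes the completion prompt and the cut snippet is the ground truth, to be checked against the file's test suite.

```python
def make_completion_task(source: str, start: int, end: int) -> dict:
    """Mask lines [start, end) of a source file: the surrounding code
    becomes the prompt context (prefix + suffix) and the masked snippet
    is kept as the ground truth for evaluation."""
    lines = source.splitlines(keepends=True)
    return {
        "prefix": "".join(lines[:start]),        # code before the hole
        "ground_truth": "".join(lines[start:end]),  # the masked snippet
        "suffix": "".join(lines[end:]),          # code after the hole
    }
```

A model's fill-in would replace `ground_truth` between `prefix` and `suffix`, and the repository's (mutation-hardened) tests would judge correctness.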
14. CoEdPilot: Recommending Code Edits with Learned Prior Edit Relevance, Project-wise Awareness, and Interactive Nature
- Author
-
Liu, Chenyan, Cai, Yufan, Lin, Yun, Huang, Yuhuan, Pei, Yunrui, Jiang, Bo, Yang, Ping, Dong, Jin Song, and Mei, Hong
- Subjects
Computer Science - Software Engineering - Abstract
Recent years have seen the development of LLM-based code generation. Compared to generating code in a software project, incremental code edits are empirically observed to be more frequent. The emerging code editing approaches usually formulate the problem as generating an edit based on known relevant prior edits and context. However, practical code edits can be more complicated. First, an editing session can include multiple (ir)relevant edits to the code under edit. Second, the inference of the subsequent edits is non-trivial as the scope of its ripple effect can be the whole project. In this work, we propose CoEdPilot, an LLM-driven solution to recommend code edits by discriminating the relevant edits, exploring their interactive natures, and estimating its ripple effect in the project. Specifically, CoEdPilot orchestrates multiple neural transformers to identify what and how to edit in the project regarding both edit location and edit content. When a user accomplishes an edit with an optional editing description, a Subsequent Edit Analysis first reports the most relevant files in the project with what types of edits (e.g., keep, insert, and replace) can happen for each line of their code. Next, an Edit-content Generator generates concrete edit options for the lines of code, regarding its relevant prior changes reported by an Edit-dependency Analyzer. Lastly, both the Subsequent Edit Analysis and the Edit-content Generator capture relevant prior edits as feedback to readjust their recommendations. We train our models by collecting over 180K commits from 471 open-source projects in 5 programming languages. Our extensive experiments show that CoEdPilot can well predict the edits (i.e., predicting edit location with an accuracy of 70.8%-85.3%, and the edit content with an exact match rate of 41.8% and BLEU4 score of 60.7)..., Comment: 13 pages, 7 figures
- Published
- 2024
15. Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching
- Author
-
Ding, Yuyang, Hu, Hanglei, Zhou, Jie, Chen, Qin, Jiang, Bo, and He, Liang
- Subjects
Computer Science - Computation and Language - Abstract
With the introduction of large language models (LLMs), automatic math reasoning has seen tremendous success. However, current methods primarily focus on providing solutions or using techniques like Chain-of-Thought to enhance problem-solving accuracy. In this paper, we focus on improving the capability of mathematics teaching via a Socratic teaching-based LLM (\texttt{SocraticLLM}), which guides learners toward profound thinking with clarity and self-discovery via conversation. We collect and release a high-quality mathematical teaching dataset, named \texttt{SocraticMATH}, which provides Socratic-style conversations of problems with extra knowledge. Also, we propose a knowledge-enhanced LLM as a strong baseline to generate reliable responses with review, guidance/heuristic, rectification, and summarization. Experimental results show the great advantages of \texttt{SocraticLLM} by comparing it with several strong generative models. The codes and datasets are available on \url{https://github.com/ECNU-ICALK/SocraticMath}., Comment: Accepted By CIKM 2024
- Published
- 2024
16. Application of Sunflower Seed Oil in Preparation of Novel Low-Fat Kefir
- Author
-
Meriem, Bensmira and Jiang, Bo
- Published
- 2021
17. Enhancing Explainability of Knowledge Learning Paths: Causal Knowledge Networks
- Author
-
Wei, Yuang, Zhou, Yizhou, Jiang, Yuan-Hao, and Jiang, Bo
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Social and Information Networks - Abstract
A reliable knowledge structure is a prerequisite for building effective adaptive learning systems and intelligent tutoring systems. Pursuing an explainable and trustworthy knowledge structure, we propose a method for constructing causal knowledge networks. This approach leverages Bayesian networks as a foundation and incorporates causal relationship analysis to derive a causal network. Additionally, we introduce a dependable knowledge-learning path recommendation technique built upon this framework, improving teaching and learning quality while maintaining transparency in the decision-making process., Comment: 8 pages, 3 figures, Educational Data Mining 2024, Human-Centric eXplainable AI in Education
- Published
- 2024
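The prerequisite-respecting path recommendation described in the causal knowledge network abstract above can be sketched as follows. This is an illustrative simplification under assumptions (function name, edge representation): once a causal DAG over knowledge concepts is built, any topological order of the DAG (cause before effect) yields a learning path that never presents a concept before its prerequisites.

```python
from graphlib import TopologicalSorter

def learning_path(causal_edges):
    """Given (cause, effect) pairs over knowledge concepts, return a
    learning order in which every prerequisite precedes its dependents."""
    ts = TopologicalSorter()
    for cause, effect in causal_edges:
        ts.add(effect, cause)  # learning `effect` depends on `cause`
    return list(ts.static_order())
```

The causal-discovery step itself (deriving the edges from learner data via Bayesian networks) is the hard part and is not shown here.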
18. Expansive Synthesis: Generating Large-Scale Datasets from Minimal Samples
- Author
-
Jebraeeli, Vahid, Jiang, Bo, Krim, Hamid, and Cansever, Derya
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
The challenge of limited availability of data for training in machine learning arises in many applications and the impact on performance and generalization is serious. Traditional data augmentation methods aim to enhance training with a moderately sufficient data set. Generative models like Generative Adversarial Networks (GANs) often face problematic convergence when generating significant and diverse data samples. Diffusion models, though effective, still struggle with high computational cost and long training times. This paper introduces an innovative Expansive Synthesis model that generates large-scale, high-fidelity datasets from minimal samples. The proposed approach exploits expander graph mappings and feature interpolation to synthesize expanded datasets while preserving the intrinsic data distribution and feature structural relationships. The rationale of the model is rooted in the non-linear property of neural networks' latent space and in its capture by a Koopman operator to yield a linear space of features to facilitate the construction of larger and enriched consistent datasets starting with a much smaller dataset. This process is optimized by an autoencoder architecture enhanced with self-attention layers and further refined for distributional consistency by optimal transport. We validate our Expansive Synthesis by training classifiers on the generated datasets and comparing their performance to classifiers trained on larger, original datasets. Experimental results demonstrate that classifiers trained on synthesized data achieve performance metrics on par with those trained on full-scale datasets, showcasing the model's potential to effectively augment training data. This work represents a significant advancement in data generation, offering a robust solution to data scarcity and paving the way for enhanced data availability in machine learning applications., Comment: 14 pages. arXiv admin note: text overlap with arXiv:2405.13866
- Published
- 2024
19. Graph Edge Representation via Tensor Product Graph Convolutional Representation
- Author
-
Jiang, Bo, Ge, Sheng, Zhang, Ziyan, Wang, Beibei, Tang, Jin, and Luo, Bin
- Subjects
Computer Science - Machine Learning - Abstract
Graph Convolutional Networks (GCNs) have been widely studied. The core of GCNs is the definition of convolution operators on graphs. However, existing Graph Convolution (GC) operators are mainly defined on the adjacency matrix and node features and generally focus on obtaining effective node embeddings, which cannot be utilized to address graphs with (high-dimensional) edge features. To address this problem, by leveraging tensor contraction representation and tensor product graph diffusion theories, this paper analogously defines an effective convolution operator on graphs with edge features, named Tensor Product Graph Convolution (TPGC). The proposed TPGC aims to obtain effective edge embeddings. It provides a complementary model to traditional graph convolutions (GCs) to address more general graph data analysis with both node and edge features. Experimental results on several graph learning tasks demonstrate the effectiveness of the proposed TPGC.
- Published
- 2024
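The tensor-product graph diffusion that the TPGC abstract above builds on can be sketched as follows. This is an illustrative sketch only; the paper's exact TPGC operator may differ. Per feature channel, the edge-feature tensor is diffused along both endpoints via $A X_c A^\top$ (the natural convolution on the tensor-product graph), followed by a learned channel-mixing map.

```python
import numpy as np

def tpgc_layer(A, E, W):
    """Simplified tensor-product graph convolution sketch.
    A: (n, n) adjacency, E: (n, n, d) edge features, W: (d, d') weights.
    Each channel c diffuses as A @ E[:, :, c] @ A.T, then W mixes channels."""
    diffused = np.einsum('ip,pqc,jq->ijc', A, E, A)  # A X_c A^T per channel
    out = diffused @ W                               # channel mixing
    return np.maximum(out, 0.0)                      # ReLU nonlinearity
```

With `A` set to the identity, the layer reduces to a per-edge linear map, which is a quick sanity check on the contraction.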
20. A Unified Graph Selective Prompt Learning for Graph Neural Networks
- Author
-
Jiang, Bo, Wu, Hao, Zhang, Ziyan, Wang, Beibei, and Tang, Jin
- Subjects
Computer Science - Machine Learning ,Computer Science - Social and Information Networks - Abstract
In recent years, graph prompt learning/tuning has garnered increasing attention for adapting pre-trained models to graph representation learning. As a universal graph prompt learning method, Graph Prompt Feature (GPF) has achieved remarkable success in adapting pre-trained models for Graph Neural Networks (GNNs). By fixing the parameters of a pre-trained GNN model, the aim of GPF is to modify the input graph data by adding (learnable) prompt vectors to graph node features to better align with the downstream tasks on the smaller dataset. However, existing GPFs generally suffer from two main limitations. First, they focus on node prompt learning and ignore prompting for graph edges. Second, they conduct prompt learning on all nodes equally, which fails to capture the importance of different nodes and may be sensitive to noisy nodes when aligning with the downstream tasks. To address these issues, in this paper, we propose a new unified Graph Selective Prompt Feature learning (GSPF) method for GNN fine-tuning. The proposed GSPF integrates prompt learning on both graph nodes and edges, thus providing a unified prompt model for the graph data. Moreover, it conducts prompt learning selectively on nodes and edges by concentrating on the important ones for prompting, which makes our model more reliable and compact. Experimental results on many benchmark datasets demonstrate the effectiveness and advantages of the proposed GSPF method.
- Published
- 2024
21. Label Smoothing Improves Machine Unlearning
- Author
-
Di, Zonglin, Zhu, Zhaowei, Jia, Jinghan, Liu, Jiancheng, Takhirov, Zafar, Jiang, Bo, Yao, Yuanshun, Liu, Sijia, and Liu, Yang
- Subjects
Computer Science - Machine Learning - Abstract
The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of label smoothing. This work introduces UGradSL, a simple, plug-and-play MU approach that uses smoothed labels. We provide theoretical analyses demonstrating why properly introducing label smoothing improves MU performance. We conducted extensive experiments on six datasets of various sizes and modalities, demonstrating the effectiveness and robustness of our proposed method. The consistent improvement in MU performance comes at only a marginal cost in additional computation. For instance, UGradSL improves unlearning accuracy over the gradient ascent MU baseline by 66% without sacrificing unlearning efficiency.
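As a point of reference for the "inverse process of label smoothing": standard label smoothing moves a fraction eps of the true-class probability onto all classes, and flipping the sign of eps sharpens the label instead. The sketch below illustrates only this standard mechanism; the sign convention and values are assumptions, not UGradSL's exact formulation.

```python
def smooth_label(one_hot, eps, num_classes):
    """Standard label smoothing: redistribute eps of the probability mass
    uniformly over all classes. A negative eps sharpens the label instead
    (one way to picture an 'inverse' smoothing)."""
    return [(1.0 - eps) * y + eps / num_classes for y in one_hot]

y = [0.0, 1.0, 0.0]
y_smooth = smooth_label(y, 0.1, 3)    # softened label for ordinary training
y_sharp = smooth_label(y, -0.1, 3)    # sharpened label (inverse direction)
```

Both outputs still sum to one; the sharpened label pushes the true class above 1 and the others below 0, exaggerating confidence rather than tempering it.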
- Published
- 2024
22. VersiCode: Towards Version-controllable Code Generation
- Author
-
Wu, Tongtong, Wu, Weigang, Wang, Xingyu, Xu, Kang, Ma, Suyu, Jiang, Bo, Yang, Ping, Xing, Zhenchang, Li, Yuan-Fang, and Haffari, Gholamreza
- Subjects
Computer Science - Software Engineering ,Computer Science - Computation and Language - Abstract
Significant research has focused on improving the performance of large language models on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of \emph{version}, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comprehensive dataset designed to assess the ability of large language models to generate verifiable code for specific library versions. VersiCode encompasses 300 libraries across more than 2,000 versions spanning 9 years. We design two dedicated evaluation tasks: version-specific code completion (VSCC) and version-aware code editing (VACE). Comprehensive experiments benchmark the performance of LLMs, revealing the challenging nature of these tasks and of VersiCode: even state-of-the-art LLMs struggle to generate version-correct code. This dataset, together with the proposed tasks, sheds light on LLMs' capabilities and limitations in handling version-specific code generation, and opens up an important new area of research for further investigation. The resources can be found at https://github.com/wutong8023/VersiCode.
- Published
- 2024
23. Leveraging Pedagogical Theories to Understand Student Learning Process with Graph-based Reasonable Knowledge Tracing
- Author
-
Cui, Jiajun, Qian, Hong, Jiang, Bo, and Zhang, Wei
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computers and Society ,Computer Science - Machine Learning - Abstract
Knowledge tracing (KT) is a crucial task in intelligent education, focusing on predicting students' performance on given questions to trace their evolving knowledge. The advancement of deep learning in this field has led to deep-learning knowledge tracing (DLKT) models that prioritize high predictive accuracy. However, many existing DLKT methods overlook the fundamental goal of tracking students' dynamic knowledge mastery: they either do not explicitly model the knowledge mastery tracing process or yield unreasonable results that educators find difficult to comprehend and apply in real teaching scenarios. In response, our research conducts a preliminary analysis of mainstream KT approaches to highlight and explain such unreasonableness. We introduce GRKT, a graph-based reasonable knowledge tracing method, to address these issues. By leveraging graph neural networks, our approach delves into the mutual influences of knowledge concepts, offering a more accurate representation of how knowledge mastery evolves throughout the learning process. Additionally, we propose a fine-grained, psychologically grounded three-stage modeling process, comprising knowledge retrieval, memory strengthening, and knowledge learning/forgetting, to conduct a more reasonable knowledge tracing process. Comprehensive experiments demonstrate that GRKT outperforms eleven baselines across three datasets, not only enhancing predictive accuracy but also generating more reasonable knowledge tracing results. This makes our model a promising advancement for practical implementation in educational settings. The source code is available at https://github.com/JJCui96/GRKT., Comment: Preprint, accepted to appear in SIGKDD 2024, 12 pages. The source code is available at https://github.com/JJCui96/GRKT. Keywords: interpretable knowledge tracing, student behavior modeling, intelligence education
- Published
- 2024
24. CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework
- Author
-
Zhao, Yiyang, Liu, Yunzhuo, Jiang, Bo, and Guo, Tian
- Subjects
Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Signal Processing - Abstract
This work presents a novel approach to neural architecture search (NAS) that aims to increase the carbon efficiency of the model design process. The proposed framework, CE-NAS, addresses the key challenge of the high carbon cost associated with NAS by exploiting variations in carbon intensity over time and the differing energy demands of different NAS algorithms. At a high level, CE-NAS leverages a reinforcement-learning agent to dynamically adjust GPU resources based on carbon intensity, predicted by a time-series transformer, to balance energy-efficient sampling and energy-intensive evaluation tasks. Furthermore, CE-NAS leverages a recently proposed multi-objective optimizer to effectively reduce the NAS search space. We demonstrate the efficacy of CE-NAS in lowering carbon emissions while achieving SOTA results on both NAS datasets and open-domain NAS tasks. For example, on the HW-NasBench dataset, CE-NAS reduces carbon emissions by up to 7.22X while maintaining search efficiency comparable to vanilla NAS. For open-domain NAS tasks, CE-NAS achieves SOTA results with 97.35% top-1 accuracy on CIFAR-10 with only 1.68M parameters and a carbon consumption of 38.53 lbs of CO2. On ImageNet, our searched model achieves 80.6% top-1 accuracy with a 0.78 ms TensorRT latency using FP16 on an NVIDIA V100, consuming only 909.86 lbs of CO2, making it comparable to other one-shot-based NAS baselines., Comment: arXiv admin note: text overlap with arXiv:2307.04131
- Published
- 2024
25. Koopcon: A new approach towards smarter and less complex learning
- Author
-
Jebraeeli, Vahid, Jiang, Bo, Cansever, Derya, and Krim, Hamid
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
In the era of big data, the sheer volume and complexity of datasets pose significant challenges in machine learning, particularly in image processing tasks. This paper introduces an innovative Autoencoder-based Dataset Condensation Model backed by Koopman operator theory that effectively packs large datasets into compact, information-rich representations. Inspired by the predictive coding mechanisms of the human brain, our model leverages a novel approach to encode and reconstruct data, maintaining essential features and label distributions. The condensation process utilizes an autoencoder neural network architecture, coupled with Optimal Transport theory and Wasserstein distance, to minimize the distributional discrepancies between the original and synthesized datasets. We present a two-stage implementation strategy: first, condensing the large dataset into a smaller synthesized subset; second, evaluating the synthesized data by training a classifier and comparing its performance with a classifier trained on an equivalent subset of the original data. Our experimental results demonstrate that the classifiers trained on condensed data exhibit comparable performance to those trained on the original datasets, thus affirming the efficacy of our condensation model. This work not only contributes to the reduction of computational resources but also paves the way for efficient data handling in constrained environments, marking a significant step forward in data-efficient machine learning., Comment: 7 pages, 3 figures
- Published
- 2024
26. CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models
- Author
-
Zhou, Qilin, Wei, Zhengyuan, Wang, Haipeng, Jiang, Bo, and Chan, W. K.
- Subjects
Computer Science - Software Engineering ,Computer Science - Artificial Intelligence ,Computer Science - Cryptography and Security - Abstract
Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. They aim, respectively, to correctly label malicious samples with provable guarantees and to issue warnings for malicious samples predicted as non-benign labels with provable guarantees. However, the labels produced by existing certified detection defenders remain subject to manipulation, and existing certified recovery defenders cannot systematically warn samples about their labels. A certified defense that simultaneously offers robust labels and systematic warning protection against patch attacks is therefore desirable. This paper proposes a novel certified defense technique called CrossCert. CrossCert formulates a novel approach that cross-checks two certified recovery defenders to provide unwavering certification and detection certification. Unwavering certification ensures that a certified sample, when subjected to a patched perturbation, will always be returned with a benign label without triggering any warnings, with a provable guarantee. To our knowledge, CrossCert is the first certified detection technique to offer this guarantee. Our experiments show that, with slightly lower performance than ViP and performance comparable to PatchCensor in terms of detection certification, CrossCert certifies a significant proportion of samples with the guarantee of unwavering certification., Comment: 23 pages, 2 figures, accepted by FSE 2024 (The ACM International Conference on the Foundations of Software Engineering)
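The cross-checking idea can be pictured with a toy stand-in: run two certified recovery defenders on the same sample and stay silent (no warning) only when their recovered labels agree. The defenders below are hypothetical lookup tables for illustration, not the paper's actual defenders or its certification conditions.

```python
def cross_check(defender_a, defender_b, sample):
    """Return (label, warning): warn whenever the two recovery defenders
    disagree on the recovered label."""
    label_a = defender_a(sample)
    label_b = defender_b(sample)
    if label_a == label_b:
        return label_a, False   # defenders agree: release label, no warning
    return label_a, True        # disagreement: release a label with a warning

# Stand-in defenders that just look up precomputed predictions.
preds_a = {"img1": "cat", "img2": "dog"}
preds_b = {"img1": "cat", "img2": "cat"}
label1, warn1 = cross_check(preds_a.get, preds_b.get, "img1")
label2, warn2 = cross_check(preds_a.get, preds_b.get, "img2")
```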
- Published
- 2024
27. Riemannian Accelerated Zeroth-order Algorithm: Improved Robustness and Lower Query Complexity
- Author
-
He, Chang, Pan, Zhaoye, Wang, Xiao, and Jiang, Bo
- Subjects
Mathematics - Optimization and Control - Abstract
Optimization problems with access to only zeroth-order information of the objective function on Riemannian manifolds arise in various applications, spanning from statistical learning to robot learning. While various zeroth-order algorithms have been proposed in Euclidean space, they are not inherently designed to handle the challenging constraints imposed by Riemannian manifolds. The proper adaptation of zeroth-order techniques to Riemannian manifolds remained unknown until the pioneering work of \cite{li2023stochastic}. However, zeroth-order algorithms are widely observed to converge slowly and be unstable in practice. To alleviate these issues, we propose a Riemannian accelerated zeroth-order algorithm with improved robustness. Regarding efficiency, our accelerated algorithm has a function query complexity of $\mathcal{O}(\epsilon^{-7/4}d)$ for finding an $\epsilon$-approximate first-order stationary point. By introducing a small perturbation, it exhibits a function query complexity of $\tilde{\mathcal{O}}(\epsilon^{-7/4}d)$ for seeking a second-order stationary point with high probability, matching the state-of-the-art result in Euclidean space. Moreover, we further establish almost sure convergence in the asymptotic sense through the Stable Manifold Theorem. Regarding robustness, our algorithm requires larger smoothing parameters, on the order of $\tilde{\mathcal{O}}(\epsilon^{7/8}d^{-1/2})$, improving the existing result by a factor of $\tilde{\mathcal{O}}(\epsilon^{3/4})$., Comment: Accepted by ICML 2024
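For readers unfamiliar with zeroth-order optimization, the basic ingredient is a gradient estimate built purely from function evaluations. Below is a minimal Euclidean two-point estimator as a point of reference (the Riemannian method additionally works in tangent spaces with retractions); `mu` plays the role of the smoothing parameter whose admissible size the abstract's robustness result concerns, and the concrete setup is illustrative only.

```python
import random

def zo_gradient(f, x, mu, rng):
    """Two-point zeroth-order gradient estimate along a random Gaussian direction."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    scale = (f([xi + mu * ui for xi, ui in zip(x, u)]) - f(x)) / mu
    return [scale * ui for ui in u]

f = lambda x: sum(xi * xi for xi in x)   # f(x) = ||x||^2, so grad f(x) = 2x
rng = random.Random(0)
g = zo_gradient(f, [1.0, -2.0], mu=1e-5, rng=rng)   # noisy estimate of [2.0, -4.0]
```

A single estimate is very noisy, but averaging many independent estimates recovers the gradient up to an O(mu) bias.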
- Published
- 2024
28. Mamba-FETrack: Frame-Event Tracking via State Space Model
- Author
-
Huang, Ju, Wang, Shiao, Wang, Shuai, Wu, Zhe, Wang, Xiao, and Jiang, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
RGB-Event based tracking is an emerging research topic focusing on how to effectively integrate heterogeneous multi-modal data (synchronized exposure video frames and asynchronous pulse Event streams). Existing works typically employ Transformer based networks to handle these modalities and achieve decent accuracy through input-level or feature-level fusion on multiple datasets. However, these trackers incur significant memory consumption and computational complexity due to their use of the self-attention mechanism. This paper proposes a novel RGB-Event tracking framework, Mamba-FETrack, based on the State Space Model (SSM), which achieves high-performance tracking while effectively reducing computational costs. Specifically, we adopt two modality-specific Mamba backbone networks to extract the features of RGB frames and Event streams, and further propose to boost the interactive learning between the RGB and Event features using the Mamba network. The fused features are fed into the tracking head for target object localization. Extensive experiments on the FELT and FE108 datasets fully validate the efficiency and effectiveness of our proposed tracker. Specifically, our Mamba-based tracker achieves 43.5/55.6 on the SR/PR metric, while the ViT-S based tracker (OSTrack) obtains 40.0/50.9. The GPU memory costs of our tracker and the ViT-S based tracker are 13.98 GB and 15.44 GB, a decrease of about $9.5\%$. The FLOPs and parameters of ours/the ViT-S based OSTrack are 59G/1076G and 7M/60M, decreases of about $94.5\%$ and $88.3\%$, respectively. We hope this work can bring new insights to the tracking field and greatly promote the application of the Mamba architecture in tracking. The source code of this work will be released at \url{https://github.com/Event-AHU/Mamba_FETrack}., Comment: In Peer Review
- Published
- 2024
29. Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition
- Author
-
Wang, Xiao, Zhu, Qian, Jin, Jiandong, Zhu, Jun, Wang, Futian, Jiang, Bo, Wang, Yaowei, and Tian, Yonghong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed for static images; however, their performance is unreliable in challenging scenarios such as heavy occlusion and motion blur. In this work, we propose to understand human attributes using video frames, which can fully exploit temporal information, by efficiently fine-tuning a pre-trained multi-modal foundation model. Specifically, we formulate video-based PAR as a vision-language fusion problem and adopt the pre-trained foundation model CLIP to extract the visual features. More importantly, we propose a novel spatiotemporal side-tuning strategy to achieve parameter-efficient optimization of the pre-trained vision foundation model. To better utilize the semantic information, we take the full list of attributes to be recognized as another input and transform the attribute words/phrases into corresponding sentences via split, expand, and prompt operations. The text encoder of CLIP is then utilized to embed the processed attribute descriptions. The averaged visual tokens and text tokens are concatenated and fed into a fusion Transformer for multi-modal interactive learning, and the enhanced tokens are fed into a classification head for pedestrian attribute prediction. Extensive experiments on two large-scale video-based PAR datasets fully validate the effectiveness of our proposed framework. The source code of this paper is available at https://github.com/Event-AHU/OpenPAR., Comment: Parameter Efficient Fine-Tuning Strategy for Video-based Pedestrian Attribute Recognition
- Published
- 2024
30. Pre-training on High Definition X-ray Images: An Experimental Study
- Author
-
Wang, Xiao, Li, Yuehang, Wu, Wentao, Jin, Jiandong, Rong, Yao, Jiang, Bo, Li, Chuanfu, and Tang, Jin
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Existing X-ray based pre-trained vision models are usually trained on relatively small-scale datasets (fewer than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and maintaining high resolution in the field of X-ray imaging is essential for effective solutions to difficult miscellaneous diseases. In this paper, we address these issues by proposing the first high-definition (1280 $\times$ 1280) X-ray based pre-trained foundation vision model, trained on our newly collected large-scale dataset of more than 1 million X-ray images. Our model follows the masked auto-encoder framework, which takes the tokens remaining after mask processing (with a high masking rate) as input, and the masked image patches are reconstructed by the Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy that utilizes the chest contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks: X-ray report generation and disease recognition. Extensive experiments demonstrate that our pre-trained medical foundation vision model achieves comparable or even new state-of-the-art performance on downstream benchmark datasets. The source code and pre-trained models of this paper will be released at https://github.com/Event-AHU/Medical_Image_Analysis., Comment: Technology Report
- Published
- 2024
31. Ultrasound SAM Adapter: Adapting SAM for Breast Lesion Segmentation in Ultrasound Images
- Author
-
Tu, Zhengzheng, Gu, Le, Wang, Xixi, and Jiang, Bo
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
The Segment Anything Model (SAM) has recently achieved amazing results in the field of natural image segmentation. However, it is not effective for medical image segmentation, owing to the large domain gap between natural and medical images. In this paper, we focus on ultrasound image segmentation, where training a foundation model is very difficult due to the lack of large-scale annotated ultrasound image data. To address these issues, we develop a novel Breast Ultrasound SAM Adapter, termed Breast Ultrasound Segment Anything Model (BUSSAM), which migrates SAM to the field of breast ultrasound image segmentation via the adapter technique. To be specific, we first design a novel CNN image encoder, which is fully trained on the BUS dataset. Our CNN image encoder is more lightweight and focuses more on features of the local receptive field, providing complementary information to the ViT branch in SAM. Then, we design a novel Cross-Branch Adapter to allow the CNN image encoder to fully interact with the ViT image encoder in the SAM module. Finally, we add both a Position Adapter and a Feature Adapter to the ViT branch to fine-tune the original SAM. The experimental results on the AMUBUS and BUSI datasets demonstrate that our proposed model significantly outperforms other medical image segmentation models. Our code will be available at: https://github.com/bscs12/BUSSAM.
- Published
- 2024
32. State Space Model for New-Generation Network Alternative to Transformers: A Survey
- Author
-
Wang, Xiao, Wang, Shiao, Ding, Yuhe, Li, Yuehang, Wu, Wentao, Rong, Yao, Kong, Weizhe, Huang, Ju, Li, Shihao, Yang, Haoxiang, Wang, Ziwen, Jiang, Bo, Li, Chenglong, Wang, Yaowei, Tian, Yonghong, and Tang, Jin
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Multimedia - Abstract
In the post-deep learning era, the Transformer architecture has demonstrated powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn more and more attention in recent years. In this paper, we give the first comprehensive review of these works and provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM. Specifically, we first give a detailed description of its principles to help readers quickly grasp the key ideas of SSM. After that, we dive into reviews of existing SSMs and their various applications, including natural language processing, computer vision, graphs, multi-modal and multi-media data, point clouds/event streams, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models, hoping they help readers understand the effectiveness of different structures on various tasks. Finally, we propose possible research directions to better promote the development of SSM theory and applications. More related works will be continuously updated at the following GitHub repository: https://github.com/Event-AHU/Mamba_State_Space_Model_Paper_List., Comment: The First review of State Space Model (SSM)/Mamba and their applications in artificial intelligence, 33 pages
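The linear recurrence at the heart of SSMs can be stated in a few lines: a hidden state is updated by `h_t = A*h_{t-1} + B*x_t` and read out as `y_t = C*h_t`. A minimal scalar sketch (the parameter values are arbitrary illustrative choices; real SSM layers such as Mamba use learned matrices and input-dependent parameterizations):

```python
def ssm_scan(xs, A, B, C, h0=0.0):
    """Run a scalar linear state space model over an input sequence:
    h_t = A*h_{t-1} + B*x_t,  y_t = C*h_t."""
    h, ys = h0, []
    for x in xs:
        h = A * h + B * x   # linear state update
        ys.append(C * h)    # linear readout
    return ys

# An impulse input decays geometrically through the state (A = 0.5).
ys = ssm_scan([1.0, 0.0, 0.0], A=0.5, B=1.0, C=2.0)   # -> [2.0, 1.0, 0.5]
```

Because the recurrence is linear, the same computation admits a parallel scan or convolutional form, which is what gives SSMs their efficiency advantage over quadratic self-attention.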
- Published
- 2024
33. A Spatial-Temporal Progressive Fusion Network for Breast Lesion Segmentation in Ultrasound Videos
- Author
-
Tu, Zhengzheng, Zhu, Zigang, Duan, Yayang, Jiang, Bo, Wang, Qishun, and Zhang, Chaoxue
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Ultrasound video-based breast lesion segmentation provides valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation in ultrasound breast images, and these usually cannot be adapted well to obtain desirable results on ultrasound videos. The main challenge for ultrasound video-based breast lesion segmentation is how to exploit both intra-frame and inter-frame lesion cues simultaneously. To address this problem, we propose a novel Spatial-Temporal Progressive Fusion Network (STPFNet) for the video-based breast lesion segmentation problem. The main aspects of the proposed STPFNet are threefold. First, we adopt a unified network architecture to capture both spatial dependencies within each ultrasound frame and temporal correlations between different frames for ultrasound data representation. Second, we propose a new fusion module, termed Multi-Scale Feature Fusion (MSFF), to fuse spatial and temporal cues for lesion detection. MSFF helps determine the boundary contour of the lesion region, overcoming the issue of lesion boundary blurring. Third, we propose to exploit the segmentation result of the previous frame as prior knowledge to suppress the noisy background and learn a more robust representation. In particular, we introduce a new publicly available ultrasound video breast lesion segmentation dataset, termed UVBLS200, which is specifically dedicated to breast lesion segmentation. It contains 200 videos: 80 videos of benign lesions and 120 videos of malignant lesions. Experiments on the proposed dataset demonstrate that STPFNet achieves better breast lesion detection performance than state-of-the-art methods.
- Published
- 2024
34. An Empirical Study on JIT Defect Prediction Based on BERT-style Model
- Author
-
Guo, Yuxiang, Gao, Xiaopeng, and Jiang, Bo
- Subjects
Computer Science - Software Engineering - Abstract
Previous works on Just-In-Time (JIT) defect prediction have primarily applied pre-trained models directly, neglecting the configuration of their fine-tuning process. In this study, we perform a systematic empirical study to understand the impact of the settings of the fine-tuning process on BERT-style pre-trained models for JIT defect prediction. Specifically, we explore the impact of different parameter freezing settings, parameter initialization settings, and optimizer strategies on the performance of BERT-style models for JIT defect prediction. Our findings reveal the crucial role of the first encoder layer in the BERT-style model and the per-project sensitivity to parameter initialization settings. Another notable finding is that adding a weight decay strategy to the Adam optimizer can slightly improve model performance. Additionally, we compare performance using different feature extractors (FCN, CNN, LSTM, transformer) and find that a simple network can achieve great performance. These results offer new insights for fine-tuning pre-trained models for JIT defect prediction. We combine these findings into a cost-effective fine-tuning method based on LoRA, which achieves comparable performance with only one-third of the memory consumption of the original fine-tuning process.
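The parameter-freezing settings studied here come down to toggling trainability by parameter name. A framework-agnostic sketch of that pattern (the parameter names mimic a BERT-style model but are stand-ins; in PyTorch one would iterate over `model.named_parameters()` and set `requires_grad` accordingly):

```python
def set_trainable(named_params, freeze_prefixes):
    """Mark parameters whose name starts with a frozen prefix as not trainable."""
    flags = {}
    for name in named_params:
        flags[name] = not any(name.startswith(p) for p in freeze_prefixes)
    return flags

params = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.11.output.dense.weight",
    "classifier.weight",
]
# Freeze the embeddings and the last encoder layer; keep the first encoder
# layer (highlighted by the study) and the classifier head trainable.
flags = set_trainable(params, freeze_prefixes=["embeddings.", "encoder.layer.11"])
```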
- Published
- 2024
35. Budget Recycling Differential Privacy
- Author
-
Jiang, Bo, Du, Jian, Sharma, Sagar, and Yan, Qiang
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Data Structures and Algorithms ,Electrical Engineering and Systems Science - Signal Processing - Abstract
Differential Privacy (DP) mechanisms usually force a reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within a predefined error boundary, thereby improving utility while maintaining privacy. The core of BR-DP consists of two components: a DP kernel responsible for generating a noisy answer per iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We delve into the privacy accounting of BR-DP, culminating in a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. Furthermore, we introduce algorithms for tight BR-DP accounting in composition scenarios; our findings indicate that BR-DP achieves reduced privacy leakage post-composition compared to DP. Additionally, we explore privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment with real data, and the results demonstrate BR-DP's effectiveness in lifting the utility-privacy tradeoff provided by DP mechanisms.
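The kernel-plus-recycler loop can be caricatured as follows: draw Laplace noise, release the noisy answer if it lands inside the error boundary, and otherwise recycle the draw. The attempt limit here is a crude stand-in for the recycler's probabilistic budget accounting, and every constant is illustrative rather than a privacy-calibrated value from the paper.

```python
import math
import random

def br_dp_release(true_value, scale, boundary, max_attempts, rng):
    """Release a Laplace-noised value, recycling draws outside the boundary."""
    noisy = true_value
    for _ in range(max_attempts):
        u = rng.random() - 0.5                      # uniform in [-0.5, 0.5)
        noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        noisy = true_value + noise                  # the DP kernel's draw
        if abs(noisy - true_value) <= boundary:
            return noisy                            # inside boundary: release
    return noisy                                    # budget spent: release last draw

rng = random.Random(42)
released = br_dp_release(10.0, scale=2.0, boundary=1.0, max_attempts=5, rng=rng)
```

Compared with a single Laplace draw, retrying sharply raises the fraction of outputs that fall inside the boundary, which is the "soft-bounded" behavior the framework formalizes with proper accounting.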
- Published
- 2024
36. Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization
- Author
-
Xu, Qin, Li, Sitong, Wang, Jiahui, Jiang, Bo, and Tang, Jinhui
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Exploring and mining subtle yet distinctive features between sub-categories with similar appearances is crucial for fine-grained visual categorization (FGVC). However, less effort has been devoted to assessing the quality of extracted visual representations. Intuitively, a network may struggle to capture discriminative features from low-quality samples, leading to a significant decline in FGVC performance. To tackle this challenge, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC. In this network, to model the spatial contextual relationship between rich part descriptors and global semantics and thereby capture more discriminative details within the object, we design a novel multi-part and multi-scale cross-attention (MPMSCA) module. Before features are fed to the MPMSCA module, a part navigator is developed to address scale confusion and accurately identify local distinctive regions. Furthermore, we propose a generic multi-level semantic quality evaluation module (MLSQE) to progressively supervise and enhance hierarchical semantics from different levels of the backbone network. Finally, context-aware features from MPMSCA and semantically enhanced features from MLSQE are fed into corresponding quality probing classifiers to evaluate their quality in real time, thus boosting the discriminability of the feature representations. Comprehensive experiments on four popular and highly competitive FGVC datasets demonstrate the superiority of the proposed CSQA-Net over state-of-the-art methods.
- Published
- 2024
37. Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline
- Author
-
Wang, Xiao, Huang, Ju, Wang, Shiao, Tang, Chuanming, Jiang, Bo, Tian, Yonghong, Tang, Jin, and Luo, Bin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Neural and Evolutionary Computing - Abstract
Current event-/frame-event based trackers are evaluated on short-term tracking datasets; however, real-world scenarios involve long-term tracking, and the performance of existing tracking algorithms in these scenarios remains unclear. In this paper, we first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT. It contains 742 videos and 1,594,474 RGB frame and event stream pairs, making it the largest frame-event tracking dataset to date. We re-train and evaluate 15 baseline trackers on our dataset for future works to compare against. More importantly, we find that the RGB frames and event streams are naturally incomplete due to the influence of challenging factors and the spatially sparse event flow. In response, we propose a novel associative memory Transformer network as a unified backbone by introducing modern Hopfield layers into multi-head self-attention blocks to fuse both RGB and event data. Extensive experiments on RGB-Event (FELT), RGB-Thermal (RGBT234, LasHeR), and RGB-Depth (DepthTrack) datasets fully validate the effectiveness of our model. The dataset and source code can be found at \url{https://github.com/Event-AHU/FELT_SOT_Benchmark}., Comment: In Peer Review
- Published
- 2024
38. Which Model to Transfer? A Survey on Transferability Estimation
- Author
-
Ding, Yuhe, Jiang, Bo, Yu, Aijing, Zheng, Aihua, and Liang, Jian
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Transfer learning methods endeavor to leverage relevant knowledge from existing pre-trained source models or datasets to solve downstream target tasks. With the increase in the scale and quantity of available pre-trained models, it becomes critical to assess in advance whether they are suitable for a specific target task. Model transferability estimation is an emerging and growing area of interest that aims to propose a metric quantifying this suitability without training the models individually, which is computationally prohibitive. Despite extensive recent advances devoted to this area, existing works adopt custom terminological definitions and experimental settings. In this survey, we present the first review of existing advances in this area and categorize them into two separate realms: source-free model transferability estimation and source-dependent model transferability estimation. Each category is systematically defined, accompanied by a comprehensive taxonomy. Besides, we address challenges and outline future research directions, intending to provide a comprehensive guide that aids researchers and practitioners.
- Published
- 2024
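As a concrete instance of the source-free realm the survey covers, a LEEP-style score (Nguyen et al., 2020) estimates transferability from nothing but the source model's soft predictions on target data. A minimal NumPy sketch, not taken from the survey's code:

```python
import numpy as np

def leep_score(probs, labels, n_target_classes):
    """LEEP-style source-free transferability estimate: higher means the
    source model's soft predictions are more informative about the target
    labels. `probs` is the (n, n_source_classes) softmax output of the
    pre-trained model on the target data; `labels` are target labels."""
    n = probs.shape[0]
    # Empirical joint P(y, z) over target labels y and source classes z
    joint = np.zeros((n_target_classes, probs.shape[1]))
    for y, p in zip(labels, probs):
        joint[y] += p
    joint /= n
    marginal_z = joint.sum(axis=0, keepdims=True)    # P(z)
    cond = joint / np.maximum(marginal_z, 1e-12)     # P(y | z)
    # Expected empirical prediction of each sample's true label
    eep = (probs @ cond.T)[np.arange(n), labels]
    return np.log(np.maximum(eep, 1e-12)).mean()
```

A perfectly predictive source model yields a score of 0 (the maximum); uninformative predictions drive the score negative, so candidate models can be ranked without any fine-tuning.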
39. VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
- Author
-
Chen, Shaoyu, Jiang, Bo, Gao, Hao, Liao, Bencheng, Xu, Qing, Zhang, Qian, Huang, Chang, Liu, Wenyu, and Wang, Xinggang
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Robotics - Abstract
Learning a human-like driving policy from large-scale driving demonstrations is promising, but the uncertainty and non-deterministic nature of planning make it challenging. In this work, to cope with the uncertainty problem, we propose VADv2, an end-to-end driving model based on probabilistic planning. VADv2 takes multi-view image sequences as input in a streaming manner, transforms sensor data into environmental token embeddings, outputs a probability distribution over actions, and samples one action to control the vehicle. Using only camera sensors, VADv2 achieves state-of-the-art closed-loop performance on the CARLA Town05 benchmark, significantly outperforming all existing methods. It runs stably in a fully end-to-end manner, even without a rule-based wrapper. Closed-loop demos are presented at https://hgao-cv.github.io/VADv2., Comment: Project Page: https://hgao-cv.github.io/VADv2
- Published
- 2024
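The "output a distribution over actions, then sample one" step the abstract describes can be sketched generically as softmax sampling over a discretized action vocabulary. The names and the toy (steer, acceleration) vocabulary below are illustrative assumptions, not VADv2's API:

```python
import numpy as np

def sample_action(logits, action_vocab, rng=None, temperature=1.0):
    """Probabilistic planning head sketch: turn per-action logits into a
    categorical distribution over a discretized action vocabulary and
    sample one action to execute."""
    rng = rng or np.random.default_rng(0)
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                                  # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    idx = rng.choice(len(action_vocab), p=probs)
    return action_vocab[idx], probs

# Hypothetical vocabulary of (steer, acceleration) pairs
vocab = [(-0.2, 0.0), (0.0, 1.0), (0.2, 0.0)]
action, probs = sample_action([0.1, 3.0, 0.1], vocab)
```

Sampling (rather than taking the argmax) is what lets such a model represent the multi-modal, non-deterministic nature of planning the abstract emphasizes.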
40. Inexact and Implementable Accelerated Newton Proximal Extragradient Method for Convex Optimization
- Author
-
Huang, Ziyu, Jiang, Bo, and Jiang, Yuntian
- Subjects
Mathematics - Optimization and Control - Abstract
In this paper, we investigate the convergence behavior of the Accelerated Newton Proximal Extragradient (A-NPE) method when employing inexact Hessian information. The exact A-NPE method was the pioneering near-optimal second-order approach, exhibiting an oracle complexity of $\tilde{O}(\epsilon^{-2/7})$ for convex optimization. Despite its theoretical optimality, insufficient attention has been given to its inexact version and efficient implementation. We introduce the inexact A-NPE method (IA-NPE), which is shown to maintain the near-optimal oracle complexity. In particular, we design a dynamic approach to balance the computational cost of constructing the Hessian matrix against the progress of the convergence. Moreover, we show the robustness of the line-search procedure, a subroutine of IA-NPE, in the face of Hessian inexactness. These properties enable the use of highly effective machine learning techniques, such as sub-sampling and various heuristics, within the method. Extensive numerical results illustrate that IA-NPE compares favorably with state-of-the-art second-order methods, including Newton's method with cubic regularization and trust-region methods.
- Published
- 2024
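To make the sub-sampling idea concrete, here is a plain sub-sampled (inexact-Hessian) Newton iteration on a least-squares problem. This is a generic sketch in the spirit of the heuristics the abstract mentions, not the A-NPE/IA-NPE scheme itself; all names are illustrative:

```python
import numpy as np

def subsampled_newton(grad, hess_sample, x0, n_samples, batch=32,
                      steps=20, lr=1.0, reg=1e-3, seed=0):
    """Second-order iteration with an inexact Hessian built from a random
    sub-sample of the data; `reg` damps the sub-sampled Hessian so the
    linear solve stays well conditioned."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(steps):
        idx = rng.choice(n_samples, size=min(batch, n_samples), replace=False)
        H = hess_sample(x, idx) + reg * np.eye(x.size)  # inexact Hessian
        x -= lr * np.linalg.solve(H, grad(x))           # Newton-type step
    return x

# Usage on least squares, where the exact Hessian is A.T @ A / n
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
grad = lambda x: A.T @ (A @ x - b) / 200
hess = lambda x, idx: A[idx].T @ A[idx] / len(idx)
x_hat = subsampled_newton(grad, hess, np.zeros(3),
                          n_samples=200, batch=64, steps=30)
```

Because the gradient stays exact while only the Hessian is sub-sampled, the iteration still converges here; balancing the sub-sample size against progress is exactly the cost/accuracy trade-off the paper formalizes.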
41. A Comprehensive Exploration of Personalized Learning in Smart Education: From Student Modeling to Personalized Recommendations
- Author
-
Wu, Siyu, Cao, Yang, Cui, Jiajun, Li, Runze, Qian, Hong, Jiang, Bo, and Zhang, Wei
- Subjects
Computer Science - Computers and Society ,A.1 - Abstract
With the development of artificial intelligence, personalized learning has attracted much attention as an integral part of intelligent education. In recent years, China, the United States, the European Union, and others have emphasized the importance of personalized learning, stressing the organic combination of large-scale education and personalized training; the development of personalized learning systems oriented to learners' preferences and suited to their needs should therefore be accelerated. This review provides a comprehensive analysis of the current state of personalized learning and its key role in education. It discusses the research from multiple perspectives, combining definitions, goals, and related educational theories to provide an in-depth understanding of personalized learning from an educational standpoint, analyzing the implications of different theories, and highlighting the potential of personalized learning to meet individual needs and enhance learners' abilities. Data applications and assessment indicators in personalized learning are described in detail, providing a solid data foundation and evaluation system for subsequent research. We then examine both student modeling and recommendation algorithms, analyzing cognitive and non-cognitive perspectives and the contribution of personalized recommendations to personalized learning. Finally, we explore the challenges and future trajectories of personalized learning. Through this comprehensive, multidimensional analysis, the review offers academics and practitioners a cutting-edge exploration to promote continuous progress in the field., Comment: 82 pages, 5 figures
- Published
- 2024
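A standard building block of the cognitive student modeling the review discusses is item response theory (IRT). A minimal sketch of the two-parameter logistic IRT model (the review itself covers many richer models; this is only an illustrative example):

```python
import math

def irt_prob(theta, difficulty, discrimination=1.0):
    """2-parameter logistic IRT model: probability that a student with
    ability `theta` answers an item of the given difficulty correctly.
    Higher discrimination makes the probability curve steeper."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
```

A student whose ability matches the item difficulty has a 50% success probability; fitted abilities and difficulties of this kind feed into the personalized recommendation stage.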
42. Unifying Graph Contrastive Learning via Graph Message Augmentation
- Author
-
Zhang, Ziyan, Jiang, Bo, Tang, Jin, and Luo, Bin
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Graph contrastive learning is usually performed by first conducting Graph Data Augmentation (GDA) and then employing a contrastive learning pipeline to train GNNs. GDA is thus a central issue for graph contrastive learning. Various GDAs have been developed recently, mainly involving dropping or perturbing edges, nodes, node attributes, and edge attributes. However, to our knowledge, a universal and effective augmentor suitable for different types of graph data is still lacking. To address this issue, we first introduce the graph message representation of graph data. Based on it, we then propose Graph Message Augmentation (GMA), a universal scheme for reformulating many existing GDAs. The proposed unified GMA not only gives a new perspective for understanding many existing GDAs but also provides a universal and more effective graph data augmentation for graph self-supervised learning tasks. Moreover, GMA introduces an easy way to implement the mixup augmentor, which is natural for images but usually challenging for graphs. Based on the proposed GMA, we then propose a unified graph contrastive learning framework, termed Graph Message Contrastive Learning (GMCL), that employs attribution-guided universal GMA for graph contrastive learning. Experiments on many graph learning tasks demonstrate the effectiveness and benefits of the proposed GMA and GMCL approaches.
- Published
- 2024
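The message-level view can be illustrated with a toy augmentor: build one message per directed edge, randomly drop a fraction of messages, then aggregate at each destination. This is a generic sketch of message dropping (the paper's GMA is more general); names are illustrative:

```python
import numpy as np

def message_drop_aggregate(x, edges, drop_rate=0.2, rng=None):
    """Message-level augmentation sketch: each directed edge (src, dst)
    carries the source node's features as its message; a random fraction
    of messages is dropped, then surviving messages are mean-aggregated
    at each destination. Edge dropping is the special case where a whole
    edge's message is removed; attribute masking corresponds to zeroing
    message coordinates instead."""
    rng = rng or np.random.default_rng(0)
    n, d = x.shape
    out = np.zeros((n, d))
    counts = np.zeros(n)
    keep = rng.random(len(edges)) >= drop_rate
    for k, (src, dst) in enumerate(edges):
        if keep[k]:
            out[dst] += x[src]          # message = source node features
            counts[dst] += 1
    counts = np.maximum(counts, 1)      # avoid division by zero
    return out / counts[:, None]
```

Viewing node, edge, and attribute perturbations as operations on this common message representation is what lets one scheme subsume the various GDAs the abstract lists.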
43. Understanding Representation Learnability of Nonlinear Self-Supervised Learning
- Author
-
Yang, Ruofeng, Li, Xiangyuan, Jiang, Bo, and Li, Shuai
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Self-supervised learning (SSL) has empirically shown its data representation learnability in many downstream tasks. There are only a few theoretical works on data representation learnability, and many of those focus on the final data representation, treating the nonlinear neural network as a ``black box''. However, the accurate learning results of neural networks are crucial for describing the data distribution features learned by SSL models. Our paper is the first to accurately analyze the learning results of a nonlinear SSL model. We consider a toy data distribution that contains two features: a label-related feature and a hidden feature. Unlike previous linear-setting work that depends on closed-form solutions, we use gradient descent to train a one-layer nonlinear SSL model from a certain initialization region and prove that the model converges to a local minimum. Furthermore, instead of a complex iterative analysis, we propose a new analysis process that uses the exact version of the Inverse Function Theorem to accurately describe the features learned at the local minimum. With this local minimum, we prove that the nonlinear SSL model captures the label-related feature and the hidden feature simultaneously, whereas the nonlinear supervised learning (SL) model learns only the label-related feature. We also present the learning processes and results of the nonlinear SSL and SL models via simulation experiments.
- Published
- 2024
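The two-feature toy distribution the abstract describes can be sketched as follows; the exact parametrization (coordinate directions, noise scale) is an illustrative assumption, not the paper's construction:

```python
import numpy as np

def toy_two_feature_data(n, d=10, rng=None):
    """Toy distribution in the spirit of the paper's setup: each sample
    combines a label-related direction e1 (sign tied to the label y) and
    a hidden direction e2 (sign independent of y), plus small noise. An
    SL model needs only the e1 component; an SSL model may recover both."""
    rng = rng or np.random.default_rng(0)
    y = rng.choice([-1, 1], size=n)      # labels
    h = rng.choice([-1, 1], size=n)      # hidden feature sign, independent of y
    e1, e2 = np.eye(d)[0], np.eye(d)[1]
    x = (y[:, None] * e1 + h[:, None] * e2
         + 0.1 * rng.standard_normal((n, d)))
    return x, y
```

By construction the first coordinate is strongly correlated with the label while the second is not, which is exactly the separation that lets one test whether a trained model has captured the hidden feature.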
44. CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras
- Author
-
Zhu, Yabin, Wang, Xiao, Li, Chenglong, Jiang, Bo, Zhu, Lin, Huang, Zhixiang, Tian, Yonghong, and Tang, Jin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Neural and Evolutionary Computing - Abstract
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, whose resolution ($346 \times 260$) is too low for practical applications. In practice, many systems deploy only visible-light cameras, and newly designed neuromorphic cameras may have different resolutions. The latest neuromorphic sensors can output high-definition event streams, but it is very difficult to achieve strict spatial and temporal alignment between events and frames. Therefore, how to achieve accurate tracking with unaligned neuromorphic and visible sensors is a valuable but under-explored problem. In this work, we formally propose the task of object tracking using unaligned neuromorphic and visible cameras. We build CRSOT, the first unaligned frame-event dataset, collected with a specially built data acquisition system; it contains 1,030 high-definition RGB-Event video pairs and 304,974 video frames. In addition, we propose a novel unaligned object tracking framework that achieves robust tracking even with loosely aligned RGB-Event data. Specifically, we extract the template and search regions of the RGB and Event data and feed them into a unified ViT backbone for feature embedding. We then propose uncertainty perception modules to encode the RGB and Event features, respectively, and a modality uncertainty fusion module to aggregate the two modalities. These three branches are jointly optimized during training. Extensive experiments demonstrate that our tracker can exploit the two modalities collaboratively for high-performance tracking even without strict temporal and spatial alignment. The source code, dataset, and pre-trained models will be released at https://github.com/Event-AHU/Cross_Resolution_SOT., Comment: In Peer Review
- Published
- 2024
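One common way to realize an uncertainty-weighted modality fusion of the kind the abstract names is to weight each modality's features by its predicted precision (inverse variance). A minimal sketch under that assumption; the names are hypothetical, not CRSOT's code:

```python
import numpy as np

def uncertainty_fusion(feat_rgb, feat_event, logvar_rgb, logvar_event):
    """Precision-weighted fusion sketch: each modality predicts a
    log-variance for its features; features are combined with weights
    proportional to exp(-logvar), so the less certain modality
    contributes less to the fused representation."""
    w_rgb = np.exp(-logvar_rgb)        # precision = 1 / variance
    w_event = np.exp(-logvar_event)
    total = w_rgb + w_event
    return (w_rgb * feat_rgb + w_event * feat_event) / total
```

With equal predicted variances this reduces to a plain average; when one modality (say, a misaligned event stream) reports high uncertainty, the fused output falls back toward the other modality.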
45. Microstructural Analysis and Hardening Mechanism of High-Strength Low-Carbon Weathering Steel: Effect of Coiling Temperature
- Author
-
Dai, Bowen, Guo, Shuo, Liu, Chenxuan, He, Jianzhong, Liu, Zhouli, Yang, Feng, Zhou, Leyu, and Jiang, Bo
- Published
- 2024
46. A practical contrast-enhanced ultrasound risk prediction of gallbladder polyp: differentiation of adenoma from cholesterol polyp lesion
- Author
-
Fei, Xiang, Cheng, Zhihao, Zhu, Lianhua, Han, Peng, Li, Nan, Jiao, Ziyu, Liang, Shuyuan, Jiang, Bo, Li, Miao, Li, Hongtian, and Lv, Wenping
- Published
- 2024
47. WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation
- Author
-
Zhu, Lianghui, Wang, Xinggang, Feng, Jiapei, Cheng, Tianheng, Li, Yingyue, Jiang, Bo, Zhang, Dingwen, and Han, Junwei
- Published
- 2024
48. Effects of Marangoni Flow on the Composition Distribution of Laser-Cladded 316L on 45 Steel
- Author
-
Ge, Honghao, Zhang, Pengzhi, Jiang, Bo, Liu, Yunfeng, Zhang, Qunli, and Yao, Jianhua
- Published
- 2024
49. Synergistic effect of chimeric antigen receptor modified with Bcl-2 on enhanced solid tumour targeting
- Author
-
Wang, Xiaoyan, Liu, Guodong, Huan, Tian, Wang, Yuxing, Jiang, Bo, Liu, Wei, Dai, Anran, Zhang, Xiangzhi, and Yu, Feng
- Published
- 2024
50. MutualFormer: Multi-modal Representation Learning via Cross-Diffusion Attention
- Author
-
Wang, Xixi, Wang, Xiao, Jiang, Bo, Tang, Jin, and Luo, Bin
- Published
- 2024