72,031 results for "Liu, Li"
Search Results
2. Step-wise Distribution Alignment Guided Style Prompt Tuning for Source-free Cross-domain Few-shot Learning
- Author
- Xu, Huali, Liu, Yongxiang, Liu, Li, Zhi, Shuaifeng, Sun, Shuzhou, Liu, Tianpeng, and Cheng, MingMing
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Existing cross-domain few-shot learning (CDFSL) methods, which develop source-domain training strategies to enhance model transferability, face challenges with large-scale pre-trained models (LMs) due to inaccessible source data and training strategies. Moreover, fine-tuning LMs for CDFSL demands substantial computational resources, limiting practicality. This paper addresses the source-free CDFSL (SF-CDFSL) problem, tackling few-shot learning (FSL) in the target domain using only pre-trained models and a few target samples, without source data or strategies. To overcome the challenge of inaccessible source data, this paper introduces Step-wise Distribution Alignment Guided Style Prompt Tuning (StepSPT), which implicitly narrows domain gaps through prediction distribution optimization. StepSPT proposes a style prompt to align target samples with the desired distribution and adopts a dual-phase optimization process. In the external process, a step-wise distribution alignment strategy factorizes prediction distribution optimization into a multi-step alignment problem to tune the style prompt. In the internal process, the classifier is updated using standard cross-entropy loss. Evaluations on five datasets demonstrate that StepSPT outperforms existing prompt tuning-based methods and other state-of-the-art approaches. Ablation studies further verify its effectiveness. Code will be made publicly available at https://github.com/xuhuali-mxj/StepSPT.
- Comment
- 15 pages, 12 figures, 7 tables
- Published
- 2024
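The dual-phase optimization described in the StepSPT abstract can be illustrated with a minimal, hedged sketch. Everything concrete below is an assumption for illustration, not the authors' implementation: the style prompt is modeled as a learnable per-channel affine transform on frozen-backbone features, and the step-wise distribution alignment is collapsed into a single KL-divergence step.

```python
# Hypothetical sketch of a dual-phase prompt-tuning loop in the spirit of StepSPT.
import torch
import torch.nn.functional as F

class StylePrompt(torch.nn.Module):
    """Assumed form: a learnable per-channel affine transform on features."""
    def __init__(self, dim):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.ones(dim))
        self.shift = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, feats):
        return feats * self.scale + self.shift

def dual_phase_step(backbone, prompt, classifier, x, y,
                    opt_prompt, opt_clf, target_dist):
    with torch.no_grad():              # frozen pre-trained backbone (source-free)
        feats = backbone(x)
    # External phase: tune the style prompt so the prediction distribution
    # moves toward a desired target distribution (one alignment step here;
    # the paper factorizes this into multiple steps).
    log_probs = F.log_softmax(classifier(prompt(feats)), dim=-1)
    align_loss = F.kl_div(log_probs, target_dist.expand_as(log_probs),
                          reduction="batchmean")
    opt_prompt.zero_grad(); align_loss.backward(); opt_prompt.step()
    # Internal phase: update the classifier with standard cross-entropy.
    ce_loss = F.cross_entropy(classifier(prompt(feats).detach()), y)
    opt_clf.zero_grad(); ce_loss.backward(); opt_clf.step()
    return align_loss.item(), ce_loss.item()
```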
3. Cross Space and Time: A Spatio-Temporal Unitized Model for Traffic Flow Forecasting
- Author
- Ruan, Weilin, Wang, Wenzhuo, Zhong, Siru, Chen, Wei, Liu, Li, and Liang, Yuxuan
- Subjects
- Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Predicting spatio-temporal traffic flow presents significant challenges due to complex interactions between spatial and temporal factors. Existing approaches often address these dimensions in isolation, neglecting their critical interdependencies. In this paper, we introduce the Spatio-Temporal Unitized Model (STUM), a unified framework designed to capture both spatial and temporal dependencies while addressing spatio-temporal heterogeneity through techniques such as distribution alignment and feature fusion. It also ensures both predictive accuracy and computational efficiency. Central to STUM is the Adaptive Spatio-temporal Unitized Cell (ASTUC), which utilizes low-rank matrices to seamlessly store, update, and interact with space, time, as well as their correlations. Our framework is also modular, allowing it to integrate with various spatio-temporal graph neural networks through components such as backbone models, feature extractors, residual fusion blocks, and predictive modules to collectively enhance forecasting outcomes. Experimental results across multiple real-world datasets demonstrate that STUM consistently improves prediction performance with minimal computational cost. These findings are further supported by hyperparameter optimization, pre-training analysis, and result visualization. We provide our source code for reproducibility at https://anonymous.4open.science/r/STUM-E4F0.
- Published
- 2024
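As a rough illustration of the low-rank idea attributed to ASTUC in the abstract above, the toy cell below stores a spatial factor U and a temporal factor V whose product modulates the input; the actual ASTUC design, shapes, and update rule in the paper may differ.

```python
# Hedged sketch of a low-rank spatio-temporal cell; names and shapes are assumptions.
import torch

class LowRankSTCell(torch.nn.Module):
    def __init__(self, n_nodes, n_steps, rank=8):
        super().__init__()
        self.U = torch.nn.Parameter(torch.randn(n_nodes, rank) * 0.1)  # spatial factor
        self.V = torch.nn.Parameter(torch.randn(rank, n_steps) * 0.1)  # temporal factor

    def forward(self, x):                # x: (batch, n_nodes, n_steps)
        interaction = self.U @ self.V    # (n_nodes, n_steps) spatio-temporal field
        return x + x * interaction       # residual modulation by the low-rank field

x = torch.randn(4, 207, 12)              # e.g., 207 sensors, 12 time steps
print(LowRankSTCell(207, 12)(x).shape)   # torch.Size([4, 207, 12])
```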
4. MaDiNet: Mamba Diffusion Network for SAR Target Detection
- Author
- Zhou, Jie, Xiao, Chao, Peng, Bowen, Liu, Tianpeng, Liu, Zhen, Liu, Yongxiang, and Liu, Li
- Subjects
- Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
The fundamental challenge in SAR target detection lies in developing discriminative, efficient, and robust representations of target characteristics within intricate non-cooperative environments. However, accurate target detection is impeded by factors including the sparse distribution and discrete features of the targets, as well as complex background interference. In this study, we propose a Mamba Diffusion Network (MaDiNet) for SAR target detection. Specifically, MaDiNet conceptualizes SAR target detection as the task of generating the position (center coordinates) and size (width and height) of the bounding boxes in the image space. Furthermore, we design a MambaSAR module to capture intricate spatial structural information of targets and enhance the capability of the model to differentiate between targets and complex backgrounds. Experiments on extensive SAR target detection datasets show that MaDiNet achieves state-of-the-art results, proving the effectiveness of the proposed network. Code is available at https://github.com/JoyeZLearning/MaDiNet.
- Published
- 2024
5. UEVAVD: A Dataset for Developing UAV's Eye View Active Object Detection
- Author
- Jiang, Xinhua, Liu, Tianpeng, Liu, Li, Liu, Zhen, and Liu, Yongxiang
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
- Abstract
Occlusion is a longstanding difficulty that challenges UAV-based object detection. Many works address this problem by adapting the detection model. However, few of them exploit the fact that the UAV could fundamentally improve detection performance by changing its viewpoint. Active Object Detection (AOD) offers an effective way to achieve this purpose. Through Deep Reinforcement Learning (DRL), AOD endows the UAV with the ability to plan its path autonomously in search of observations that are more conducive to target identification. Unfortunately, no dataset has been available for developing UAV AOD methods. To fill this gap, we release a UAV's eye view active vision dataset named UEVAVD and hope it can facilitate research on the UAV AOD problem. Additionally, we improve the existing DRL-based AOD method by incorporating inductive bias when learning the state representation. First, due to the partial observability, we use a gated recurrent unit to extract state representations from the observation sequence instead of the single-view observation. Second, we pre-decompose the scene with the Segment Anything Model (SAM) and filter out irrelevant information with the derived masks. With these practices, the agent can learn an active viewing policy with better generalization capability. The effectiveness of our innovations is validated by experiments on the UEVAVD dataset. Our dataset will soon be available at https://github.com/Leo000ooo/UEVAVD_dataset.
- Published
- 2024
6. Advances in Photoacoustic Imaging Reconstruction and Quantitative Analysis for Biomedical Applications
- Author
- Wang, Lei, Zeng, Weiming, Long, Kai, Lan, Rongfeng, Liu, Li, Siok, Wai Ting, and Wang, Nizhuan
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Photoacoustic imaging (PAI) represents an innovative biomedical imaging modality that harnesses the advantages of optical resolution and acoustic penetration depth while ensuring enhanced safety. Despite its promising potential across a diverse array of preclinical and clinical applications, the clinical implementation of PAI faces significant challenges, including the trade-off between penetration depth and spatial resolution, as well as the demand for faster imaging speeds. This paper explores the fundamental principles underlying PAI, with a particular emphasis on three primary implementations: photoacoustic computed tomography (PACT), photoacoustic microscopy (PAM), and photoacoustic endoscopy (PAE). We undertake a critical assessment of their respective strengths and practical limitations. Furthermore, recent developments in utilizing conventional or deep learning (DL) methodologies for image reconstruction and artefact mitigation across PACT, PAM, and PAE are outlined, demonstrating considerable potential to enhance image quality and accelerate imaging processes. In addition, this paper examines the recent developments in quantitative analysis within PAI, including the quantification of haemoglobin concentration, oxygen saturation, and other physiological parameters within tissues. Finally, our discussion encompasses current trends and future directions in PAI research while emphasizing the transformative impact of deep learning on advancing PAI.
- Published
- 2024
7. Graph-based Confidence Calibration for Large Language Models
- Author
- Li, Yukun, Wang, Sijia, Huang, Lifu, and Liu, Li-Ping
- Subjects
- Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning
- Abstract
One important approach to improving the reliability of large language models (LLMs) is to provide accurate confidence estimations regarding the correctness of their answers. However, developing a well-calibrated confidence estimation model is challenging, as mistakes made by LLMs can be difficult to detect. We propose a novel method combining the LLM's self-consistency with labeled data and training an auxiliary model to estimate the correctness of its responses to questions. This auxiliary model predicts the correctness of responses based solely on the consistency information among them. To set up the learning problem, we use a weighted graph to represent the consistency among the LLM's multiple responses to a question. Correctness labels are assigned to these responses based on their similarity to the correct answer. We then train a graph neural network to estimate the probability of correct responses. Experiments demonstrate that the proposed approach substantially outperforms several of the most recent methods in confidence calibration across multiple widely adopted benchmark datasets. Furthermore, the proposed approach significantly improves the generalization capability of confidence calibration on out-of-domain (OOD) data.
- Published
- 2024
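The graph construction step described in this abstract is easy to sketch. The snippet below builds a weighted consistency graph over multiple sampled responses; the token-overlap similarity is an illustrative stand-in for whatever similarity the paper uses, and the GNN training stage is omitted.

```python
# Hedged sketch: responses to one question become graph nodes, edges encode
# pairwise consistency; a GNN would then be trained on such graphs.
import itertools
import numpy as np

def consistency_graph(responses):
    """Return a symmetric adjacency matrix of pairwise response consistency."""
    def jaccard(a, b):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(len(ta | tb), 1)
    n = len(responses)
    A = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        A[i, j] = A[j, i] = jaccard(responses[i], responses[j])
    return A

responses = ["The capital of France is Paris.",
             "Paris is the capital of France.",
             "The capital of France is Lyon."]
A = consistency_graph(responses)
# A node's mean edge weight is a crude consistency-based confidence signal;
# the paper instead trains a graph neural network on these graphs.
print(A.mean(axis=1))
```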
8. Right this way: Can VLMs Guide Us to See More to Answer Questions?
- Author
- Liu, Li, Yang, Diji, Zhong, Sijia, Tholeti, Kalyana Suma Sree, Ding, Lei, Zhang, Yi, and Gilpin, Leilani H.
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
In question-answering scenarios, humans can assess whether the available information is sufficient and seek additional information if necessary, rather than providing a forced answer. In contrast, Vision Language Models (VLMs) typically generate direct, one-shot responses without evaluating the sufficiency of the information. To investigate this gap, we identify a critical and challenging task in the Visual Question Answering (VQA) scenario: can VLMs indicate how to adjust an image when the visual information is insufficient to answer a question? This capability is especially valuable for assisting visually impaired individuals who often need guidance to capture images correctly. To evaluate this capability of current VLMs, we introduce a human-labeled dataset as a benchmark for this task. Additionally, we present an automated framework that generates synthetic training data by simulating "where to know" scenarios. Our empirical results show significant performance improvements in mainstream VLMs when fine-tuned with this synthetic data. This study demonstrates the potential to narrow the gap between information assessment and acquisition in VLMs, bringing their performance closer to humans.
- Comment
- NeurIPS 2024
- Published
- 2024
9. MassSpecGym: A benchmark for the discovery and identification of molecules
- Author
- Bushuiev, Roman, Bushuiev, Anton, de Jonge, Niek F., Young, Adamo, Kretschmer, Fleming, Samusevich, Raman, Heirman, Janne, Wang, Fei, Zhang, Luke, Dührkop, Kai, Ludwig, Marcus, Haupt, Nils A., Kalia, Apurva, Brungs, Corinna, Schmid, Robin, Greiner, Russell, Wang, Bo, Wishart, David S., Liu, Li-Ping, Rousu, Juho, Bittremieux, Wout, Rost, Hannes, Mak, Tytus D., Hassoun, Soha, Huber, Florian, van der Hooft, Justin J. J., Stravs, Michael A., Böcker, Sebastian, Sivic, Josef, and Pluskal, Tomáš
- Subjects
- Quantitative Biology - Quantitative Methods, Computer Science - Machine Learning
- Abstract
The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym -- the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality labeled MS/MS spectra and defines three MS/MS annotation challenges: de novo molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, therefore standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at https://github.com/pluskal-lab/MassSpecGym.
- Published
- 2024
10. Artificial Intelligence of Things: A Survey
- Author
- Siam, Shakhrul Iman, Ahn, Hyunho, Liu, Li, Alam, Samiul, Shen, Hui, Cao, Zhichao, Shroff, Ness, Krishnamachari, Bhaskar, Srivastava, Mani, and Zhang, Mi
- Subjects
- Computer Science - Networking and Internet Architecture, Computer Science - Artificial Intelligence
- Abstract
The integration of the Internet of Things (IoT) and modern Artificial Intelligence (AI) has given rise to a new paradigm known as the Artificial Intelligence of Things (AIoT). In this survey, we provide a systematic and comprehensive review of AIoT research. We examine AIoT literature related to sensing, computing, and networking & communication, which form the three key components of AIoT. In addition to advancements in these areas, we review domain-specific AIoT systems that are designed for various important application domains. We have also created an accompanying GitHub repository, where we compile the papers included in this survey: https://github.com/AIoT-MLSys-Lab/AIoT-Survey. This repository will be actively maintained and updated with new research as it becomes available. As both IoT and AI become increasingly critical to our society, we believe AIoT is emerging as an essential research field at the intersection of IoT and modern AI. We hope this survey will serve as a valuable resource for those engaged in AIoT research and act as a catalyst for future explorations to bridge gaps and drive advancements in this exciting field.
- Comment
- Accepted in ACM Transactions on Sensor Networks (TOSN)
- Published
- 2024
11. New Paradigm of Adversarial Training: Breaking Inherent Trade-Off between Accuracy and Robustness via Dummy Classes
- Author
- Wang, Yanyun, Liu, Li, Liang, Zi, Ye, Qingqing, and Hu, Haibo
- Subjects
- Computer Science - Machine Learning, I.2.6
- Abstract
Adversarial Training (AT) is one of the most effective methods to enhance the robustness of DNNs. However, existing AT methods suffer from an inherent trade-off between adversarial robustness and clean accuracy, which seriously hinders their real-world deployment. While this problem has been widely studied within the current AT paradigm, existing AT methods still typically experience a reduction in clean accuracy of over 10%, without significant improvements in robustness compared with simple baselines like PGD-AT. This inherent trade-off raises a question: whether the current AT paradigm, which assumes that corresponding benign and adversarial samples should be learned as the same class, inappropriately combines clean and robust objectives that may be essentially inconsistent. In this work, we surprisingly reveal that up to 40% of CIFAR-10 adversarial samples consistently fail to satisfy such an assumption across various AT methods and robust models, explicitly indicating the room for improvement in the current AT paradigm. Accordingly, to relax the tension between clean and robust learning derived from this overly strict assumption, we propose a new AT paradigm that introduces an additional dummy class for each original class, aiming to accommodate hard adversarial samples whose distribution shifts after perturbation. Robustness w.r.t. these adversarial samples can be achieved by runtime recovery from the predicted dummy classes to their corresponding original ones, eliminating the compromise with clean learning. Building on this new paradigm, we propose a novel plug-and-play AT technology named DUmmy Classes-based Adversarial Training (DUCAT). Extensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that DUCAT concurrently improves clean accuracy and adversarial robustness compared with state-of-the-art benchmarks, effectively breaking the existing inherent trade-off.
- Comment
- Preprint. Work in progress. The code is available at https://github.com/FlaAI/DUCAT
- Published
- 2024
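The dummy-class mechanism is straightforward to sketch: widen a C-way head to 2C logits, supervise adversarial examples with the dummy twin of their label, and recover at inference. The loss weighting and attack generation below are assumptions, not the paper's exact recipe.

```python
# Hedged sketch of the dummy-class idea described in the abstract above.
import torch
import torch.nn.functional as F

C = 10                                   # original number of classes
head = torch.nn.Linear(512, 2 * C)       # widened head: class c has dummy c + C

def dummy_class_loss(feats_clean, feats_adv, y):
    loss_clean = F.cross_entropy(head(feats_clean), y)      # clean -> original class
    loss_adv = F.cross_entropy(head(feats_adv), y + C)      # adversarial -> dummy twin
    return loss_clean + loss_adv

def predict(feats):
    pred = head(feats).argmax(dim=-1)
    return torch.where(pred >= C, pred - C, pred)            # runtime recovery

feats = torch.randn(4, 512)
print(predict(feats))                                        # labels in [0, C)
```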
12. S$^4$ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack
- Author
- Liu, Yongxiang, Peng, Bowen, Liu, Li, and Li, Xiang
- Subjects
- Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
- Abstract
Transferable targeted adversarial attacks (TTAs) against deep neural networks have been proven significantly more challenging than untargeted ones, yet they remain relatively underexplored. This paper sheds new light on performing highly efficient yet transferable targeted attacks leveraging the simple gradient-based baseline. Our research underscores the critical importance of image transformations within gradient calculations, marking a shift from the prevalent emphasis on loss functions to address the gradient vanishing problem. Moreover, we have developed two effective blind estimators that facilitate the design of transformation strategies to enhance targeted transferability under black-box conditions. The adversarial examples' self-transferability to geometric transformations has been identified as strongly correlated with their black-box transferability, featuring these basic operations as potent yet overlapped proxies for facilitating targeted transferability. The surrogate self-alignment assessments further highlight simple scaling transformation's exceptional efficacy, which rivals that of most advanced methods. Building on these insights, we introduce a scaling-centered transformation strategy termed Strong, Self-transferable, faSt, and Simple Scale Transformation (S4ST) to enhance transferable targeted attacks. In experiments conducted on the ImageNet-Compatible benchmark dataset, our proposed S4ST attains a SOTA average targeted transfer success rate across various challenging black-box models, outperforming the previous leading method by over 14% while requiring only 25% of the execution time. Additionally, our approach eclipses SOTA attacks considerably and exhibits remarkable effectiveness against real-world APIs. This work marks a significant leap forward in TTAs, revealing the realistic threats they pose and providing a practical generation method for future research.
- Comment
- 16 pages, 18 figures
- Published
- 2024
13. Enhanced Multi-Robot SLAM System with Cross-Validation Matching and Exponential Threshold Keyframe Selection
- Author
- He, Ang, Wu, Xi-mei, Guo, Xiao-bin, and Liu, Li-bin
- Subjects
- Computer Science - Robotics
- Abstract
The evolving field of mobile robotics has increased the demand for simultaneous localization and mapping (SLAM) systems. To improve the localization accuracy and mapping efficacy of SLAM, we refined the core modules of the SLAM system. Within the feature matching phase, we introduced cross-validation matching to filter out mismatches. In the keyframe selection strategy, an exponential threshold function is constructed to quantify the keyframe selection process. Compared with a single robot, a multi-robot collaborative SLAM (CSLAM) system substantially improves task execution efficiency and robustness. By employing a centralized structure, we formulate a multi-robot SLAM system and design a coarse-to-fine matching approach for multi-map point cloud registration. Our system, built upon ORB-SLAM3, underwent extensive evaluation on the TUM RGB-D, EuRoC MAV, and TUM_VI datasets. The experimental results demonstrate a significant improvement in positioning accuracy and mapping quality compared to ORB-SLAM3, with a 12.90% reduction in absolute trajectory error.
- Published
- 2024
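Cross-validation (mutual-consistency) matching as described here corresponds to keeping only matches that are nearest neighbours in both directions, which OpenCV exposes via crossCheck=True. The exponential keyframe threshold below is an assumed form, not the paper's exact function.

```python
# Hedged sketch of cross-checked feature matching and an exponential keyframe rule.
import numpy as np
import cv2

def mutual_matches(img1, img2):
    """Keep only ORB matches that are nearest neighbours in both directions."""
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # cross-validation
    return matcher.match(des1, des2)

def keyframe_threshold(frame_gap, n_max=200, n_min=50, tau=10.0):
    """Assumed exponential rule: demand more matches for nearby frames,
    decaying toward a floor as the temporal gap grows."""
    return n_min + (n_max - n_min) * np.exp(-frame_gap / tau)

# Usage on two hypothetical grayscale frames:
# matches = mutual_matches(frame_a, frame_b)
# insert_keyframe = len(matches) < keyframe_threshold(frame_gap=5)
```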
14. Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2
- Author
- Zhang, Chunhui, Liu, Li, Huang, Guanjie, Wen, Hao, Zhou, Xi, and Wang, Yanfeng
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Over the past decade, significant progress has been made in visual object tracking, largely due to the availability of large-scale training datasets. However, existing tracking datasets are primarily focused on open-air scenarios, which greatly limits the development of object tracking in underwater environments. To address this issue, we take a step forward by proposing the first large-scale underwater camouflaged object tracking dataset, namely UW-COT. Based on the proposed dataset, this paper presents an experimental evaluation of several advanced visual object tracking methods and the latest advancements in image and video segmentation. Specifically, we compare the performance of the Segment Anything Model (SAM) and its updated version, SAM 2, in challenging underwater environments. Our findings highlight the improvements in SAM 2 over SAM, demonstrating its enhanced capability to handle the complexities of underwater camouflaged objects. Compared to current advanced visual object tracking methods, the latest video segmentation foundation model SAM 2 also exhibits significant advantages, providing valuable insights into the development of more effective tracking technologies for underwater scenarios. The dataset will be accessible at https://github.com/983632847/Awesome-Multimodal-Object-Tracking.
- Comment
- Preprint. Work in Progress
- Published
- 2024
15. Numerical Approximation Capacity of Neural Networks with Bounded Parameters: Do Limits Exist, and How Can They Be Measured?
- Author
- Liu, Li, Yu, Tengchao, and Yong, Heng
- Subjects
- Computer Science - Machine Learning
- Abstract
The Universal Approximation Theorem posits that neural networks can theoretically possess unlimited approximation capacity with a suitable activation function and a freely chosen or trained set of parameters. However, a more practical scenario arises when these neural parameters, especially the nonlinear weights and biases, are bounded. This leads us to question: does the approximation capacity of a neural network remain universal, or does it have a limit when the parameters are practically bounded? And if it has a limit, how can it be measured? Our theoretical study indicates that while universal approximation is theoretically feasible, in practical numerical scenarios, Deep Neural Networks (DNNs) with any analytic activation functions (such as Tanh and Sigmoid) can only be approximated by a finite-dimensional vector space under a bounded nonlinear parameter space (NP space), whether in a continuous or discrete sense. Based on this study, we introduce the concepts of $\epsilon$ outer measure and Numerical Span Dimension (NSdim) to quantify the approximation capacity limit of a family of networks both theoretically and practically. Furthermore, drawing on our new theoretical study and adopting a fresh perspective, we strive to understand the relationship between back-propagation neural networks and random parameter networks (such as the Extreme Learning Machine (ELM)) with both finite and infinite width. We also aim to provide fresh insights into regularization, the trade-off between width and depth, parameter space, width redundancy, condensation, and other related important issues.
- Comment
- Universal Approximation; Bounded Weights; Analytic Function; Numerical Span Dimension; Infinite Width Neural Network
- Published
- 2024
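A hedged numerical experiment in the spirit of this abstract: sample many bounded-parameter tanh networks, evaluate them on a grid, and count singular values above a relative tolerance. The saturation of that count as the family grows illustrates a finite numerical span dimension; the paper's formal definitions of $\epsilon$ outer measure and NSdim may differ.

```python
# Hedged sketch: numerical rank of a family of bounded-parameter tanh features.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)           # evaluation grid

def numerical_span_dim(n_networks, width=64, bound=5.0, eps=1e-6):
    feats = []
    for _ in range(n_networks):
        w = rng.uniform(-bound, bound, width)   # bounded nonlinear weights
        b = rng.uniform(-bound, bound, width)   # bounded biases
        feats.append(np.tanh(np.outer(x, w) + b))
    F = np.hstack(feats)                   # all hidden outputs, columns = neurons
    s = np.linalg.svd(F, compute_uv=False)
    return int((s > eps * s[0]).sum())     # relative-epsilon numerical rank

for n in (1, 4, 16, 64):
    print(n, numerical_span_dim(n))        # the count saturates as n grows
```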
16. Coexistence of positive and negative information in information-epidemic dynamics on multiplex networks
- Author
- Liu, Li-Ying, Cai, Chao-Ran, Zhang, Si-Ping, and Li, Bin-Quan
- Subjects
- Physics - Physics and Society
- Abstract
This paper investigates the coexistence of positive and negative information in the context of information-epidemic dynamics on multiplex networks. In accordance with the tenets of mean field theory, we present not only the analytic solution of the prevalence threshold, but also the coexistence conditions of two distinct forms of information (i.e., the two phase transition points at which a single form of information becomes extinct). In regions where multiple forms of information coexist, two completely distinct patterns emerge: monotonic and non-monotonic. The physical mechanisms that give rise to these different patterns have also been elucidated. The theoretical results are robust with regard to the network structure and show a high degree of agreement with the findings of the Monte Carlo simulation.
- Published
- 2024
17. Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework
- Author
- Ying, Xinyi, Liu, Li, Lin, Zaipin, Shi, Yangsi, Wang, Yingqian, Li, Ruojing, Cao, Xu, Li, Boyang, and Zhou, Shilin
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Multi-frame infrared small target (MIRST) detection in satellite videos has been a long-standing, fundamental yet challenging task for decades, and the challenges can be summarized as: First, extremely small target size, highly complex clutters & noises, and various satellite motions result in limited feature representation, high false alarms, and difficult motion analyses. Second, the lack of a large-scale publicly available MIRST dataset in satellite videos greatly hinders algorithm development. To address the aforementioned challenges, in this paper, we first build a large-scale dataset for MIRST detection in satellite videos (namely IRSatVideo-LEO), and then develop a recurrent feature refinement (RFR) framework as the baseline method. Specifically, IRSatVideo-LEO is a semi-simulated dataset with synthesized satellite motion, target appearance, trajectory and intensity, which can provide a standard toolbox for satellite video generation and a reliable evaluation platform to facilitate algorithm development. For the baseline method, RFR is proposed to be equipped with existing powerful CNN-based methods for long-term temporal dependency exploitation and integrated motion compensation & MIRST detection. Specifically, a pyramid deformable alignment (PDA) module and a temporal-spatial-frequency modulation (TSFM) module are proposed to achieve effective and efficient feature alignment, propagation, aggregation and refinement. Extensive experiments have been conducted to demonstrate the effectiveness and superiority of our scheme. The comparative results show that ResUNet equipped with RFR outperforms the state-of-the-art MIRST detection methods. Dataset and code are released at https://github.com/XinyiYing/RFR.
- Published
- 2024
18. Roles of the scalar $f_0(500)$ and $f_0(980)$ in the process $D^0\to \pi^0\pi^0 \bar{K}^0$
- Author
- Zhang, Xiao-Hui, Zhang, Han, Ke, Bai-Cian, Liu, Li-Juan, Li, De-Min, and Wang, En
- Subjects
- High Energy Physics - Phenomenology
- Abstract
Motivated by the near-threshold enhancement and the dip structure around 1 GeV in the $\pi^0\pi^0$ invariant mass distribution of the process $D^0\to \pi^0\pi^0\bar{K}^0$ observed by the CLEO Collaboration, we have investigated this process by taking into account the contribution from the $S$-wave pseudoscalar meson-pseudoscalar meson interactions within the chiral unitary approach, and also the one from the intermediate resonance $K^{*}(892)$. Our results are in good agreement with the CLEO measurements, which implies that the near-threshold enhancement near the $\pi^0\pi^0$ threshold is mainly due to the contributions from the scalar meson $f_0(500)$ and the intermediate $K^*$, and the cusp structure around 1 GeV in the $\pi^0\pi^0$ invariant mass distribution should be associated with the scalar meson $f_0(980)$.
- Comment
- 6 pages, 5 figures
- Published
- 2024
19. Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation
- Author
- Liu, Li, Zhu, Ruijie, Deng, Jiacheng, Song, Ziyang, Yang, Wenfei, and Zhang, Tianzhu
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Monocular depth estimation aims to infer a dense depth map from a single image, which is a fundamental and prevalent task in computer vision. Many previous works have shown impressive depth estimation results through carefully designed network structures, but they usually ignore the planar information and therefore perform poorly in low-texture areas of indoor scenes. In this paper, we propose Plane2Depth, which adaptively utilizes plane information to improve depth prediction within a hierarchical framework. Specifically, in the proposed plane guided depth generator (PGDG), we design a set of plane queries as prototypes to softly model planes in the scene and predict per-pixel plane coefficients. Then the predicted plane coefficients can be converted into metric depth values with the pinhole camera model. In the proposed adaptive plane query aggregation (APGA) module, we introduce a novel feature interaction approach to improve the aggregation of multi-scale plane features in a top-down manner. Extensive experiments show that our method can achieve outstanding performance, especially in low-texture or repetitive areas. Furthermore, under the same backbone network, our method outperforms the state-of-the-art methods on the NYU-Depth-v2 dataset, achieves competitive results with state-of-the-art methods on the KITTI dataset, and can be generalized to unseen scenes effectively.
- Comment
- 14 pages, 12 figures, 8 tables
- Published
- 2024
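The conversion from plane coefficients to metric depth under the pinhole model mentioned in this abstract follows from n^T X = d with X = z K^{-1} p, giving z = d / (n^T K^{-1} p). The per-pixel parameterization below is an assumption for illustration.

```python
# Hedged sketch of plane-coefficient-to-depth conversion under a pinhole camera.
import numpy as np

def plane_to_depth(normals, offsets, K):
    """normals: (H, W, 3), offsets: (H, W), K: (3, 3) intrinsics -> depth (H, W)."""
    H, W = offsets.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T           # back-projected rays K^{-1} p
    denom = (normals * rays).sum(-1)          # n^T K^{-1} p per pixel
    denom = np.where(np.abs(denom) < 1e-8, 1e-8, denom)  # guard near-zero
    return offsets / denom

K = np.array([[500.0, 0, 160], [0, 500.0, 120], [0, 0, 1]])
n = np.tile([0.0, 0.0, 1.0], (240, 320, 1))   # fronto-parallel plane at z = 2 m
depth = plane_to_depth(n, np.full((240, 320), 2.0), K)
print(depth.mean())                            # ~2.0 everywhere
```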
20. Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
- Author
- Rong, Yan and Liu, Li
- Subjects
- Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Face-based Voice Conversion (FVC) is a novel task that leverages facial images to generate the target speaker's voice style. Previous work has two shortcomings: (1) difficulty in obtaining facial embeddings that are well aligned with the speaker's voice identity information, and (2) inadequate decoupling of content and speaker identity information from the audio input. To address these issues, we present a novel FVC method, Identity-Disentanglement Face-based Voice Conversion (ID-FaceVC), which overcomes the above two limitations. More precisely, we propose an Identity-Aware Query-based Contrastive Learning (IAQ-CL) module to extract speaker-specific facial features, and a Mutual Information-based Dual Decoupling (MIDD) module to purify content features from audio, ensuring clear and high-quality voice conversion. Besides, unlike prior works, our method can accept either audio or text inputs, offering controllable speech generation with adjustable emotional tone and speed. Extensive experiments demonstrate that ID-FaceVC achieves state-of-the-art performance across various metrics, with qualitative and user study results confirming its effectiveness in naturalness, similarity, and diversity. Project website with audio samples and code can be found at https://id-facevc.github.io.
- Published
- 2024
21. 100luoy
- Author
- Luo, Yufa, Liu, Li-Juan, and Pensoft Publishers
- Subjects
- Biogeography, co-evolution, Grasses, phylogenetics, species distribution
- Published
- 2024
22. A Pilot Study on Mandarin Chinese Cued Speech
- Author
- Liu, Li and Feng, Gang
- Published
- 2019
23. Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation
- Author
- Lin, Weilin, Liu, Li, Li, Jianze, and Xiong, Hui
- Subjects
- Computer Science - Cryptography and Security, Computer Science - Machine Learning
- Abstract
Backdoor attacks present a serious security threat to deep neural networks (DNNs). Although numerous effective defense techniques have been proposed in recent years, they inevitably rely on the availability of either clean or poisoned data. In contrast, data-free defense techniques have evolved slowly and still lag significantly in performance. To address this issue, different from the traditional approach of pruning followed by fine-tuning, we propose a novel data-free defense method named Optimal Transport-based Backdoor Repairing (OTBR) in this work. This method, based on our findings on neuron weight changes (NWCs) of random unlearning, uses optimal transport (OT)-based model fusion to combine the advantages of both pruned and backdoored models. Specifically, we first demonstrate our findings that the NWCs of random unlearning are positively correlated with those of poison unlearning. Based on this observation, we propose a random-unlearning NWC pruning technique to eliminate the backdoor effect and obtain a backdoor-free pruned model. Then, motivated by the OT-based model fusion, we propose the pruned-to-backdoored OT-based fusion technique, which fuses pruned and backdoored models to combine the advantages of both, resulting in a model that demonstrates high clean accuracy and a low attack success rate. To our knowledge, this is the first work to apply OT and model fusion techniques to backdoor defense. Extensive experiments show that our method successfully defends against all seven backdoor attacks across three benchmark datasets, outperforming both state-of-the-art (SOTA) data-free and data-dependent methods. The code implementation and Appendix are provided in the Supplementary Material.
- Published
- 2024
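The neuron weight change (NWC) statistic that drives the pruning step can be sketched as follows: compare weights before and after a brief random-unlearning phase and zero the output weights of the most-changed neurons. The exact NWC definition, unlearning procedure, and pruning ratio are assumptions here; the OT-based fusion stage is omitted.

```python
# Hedged sketch of NWC-based pruning; not the paper's exact procedure.
import torch

def neuron_weight_changes(model_before, model_after):
    """Per-neuron L1 change of output weights, keyed by parameter name."""
    changes = {}
    for (name, w0), (_, w1) in zip(model_before.named_parameters(),
                                   model_after.named_parameters()):
        if w0.dim() >= 2:                        # weight matrices/kernels only
            changes[name] = (w1 - w0).abs().flatten(1).sum(dim=1)
    return changes

def prune_top_changing(model, changes, ratio=0.05):
    """Zero out the output weights of the most-changed neurons per layer."""
    for name, param in model.named_parameters():
        if name in changes:
            k = max(1, int(ratio * changes[name].numel()))
            idx = changes[name].topk(k).indices
            with torch.no_grad():
                param[idx] = 0.0                 # assumed pruning rule
    return model
```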
24. Prior-free Balanced Replay: Uncertainty-guided Reservoir Sampling for Long-Tailed Continual Learning
- Author
- Liu, Lei, Liu, Li, and Cui, Yawen
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Even in the era of large models, one of the well-known issues in continual learning (CL) is catastrophic forgetting, which is significantly challenging when the continual data stream exhibits a long-tailed distribution, termed Long-Tailed Continual Learning (LTCL). Existing LTCL solutions generally require the label distribution of the data stream to achieve re-balanced training. However, obtaining such prior information is often infeasible in real scenarios, since the model should learn without pre-identifying the majority and minority classes. To this end, we propose a novel Prior-free Balanced Replay (PBR) framework to learn from a long-tailed data stream with less forgetting. Concretely, motivated by our experimental finding that minority classes are more likely to be forgotten due to their higher uncertainty, we newly design an uncertainty-guided reservoir sampling strategy to prioritize rehearsing minority data without using any prior information, which is based on the mutual dependence between the model and samples. Additionally, we incorporate two prior-free components to further reduce the forgetting issue: (1) a boundary constraint to preserve uncertain boundary-supporting samples for continually re-estimating task boundaries, and (2) a prototype constraint to maintain the consistency of learned class prototypes during training. Our approach is evaluated on three standard long-tailed benchmarks, demonstrating superior performance to existing CL methods and the previous SOTA LTCL approach in both task- and class-incremental learning settings, as well as ordered- and shuffled-LTCL settings.
- Published
- 2024
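Uncertainty-guided reservoir sampling can be sketched by weighting the classic k/n acceptance probability with normalized predictive entropy, so uncertain (likely minority-class) samples are preferentially kept. The specific weighting rule below is an assumption, not the paper's criterion.

```python
# Hedged sketch of entropy-weighted reservoir sampling for a rehearsal buffer.
import math
import random

def entropy(probs):
    return -sum(p * math.log(p + 1e-12) for p in probs)

def uncertainty_reservoir(stream, k, n_classes):
    """stream yields (sample, predicted_probs); keeps a buffer of size <= k."""
    max_ent = math.log(n_classes)               # normalizer for entropy
    buffer, n = [], 0
    for sample, probs in stream:
        n += 1
        u = entropy(probs) / max_ent            # uncertainty in [0, 1]
        if len(buffer) < k:
            buffer.append(sample)
        elif random.random() < (k / n) * (0.5 + u):
            # classic k/n acceptance, boosted for uncertain samples (assumed rule)
            buffer[random.randrange(k)] = sample
    return buffer

# Toy usage: alternating confident and uncertain predictions.
stream = [(i, [0.9, 0.05, 0.05] if i % 2 else [0.34, 0.33, 0.33])
          for i in range(1000)]
print(len(uncertainty_reservoir(stream, k=32, n_classes=3)))  # 32
```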
25. Segment Anything for Videos: A Systematic Survey
- Author
- Zhang, Chunhui, Cui, Yawen, Lin, Weilin, Huang, Guanjie, Rong, Yan, Liu, Li, and Shan, Shiguang
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) having sparked a passion for exploring task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is currently challenging numerous traditional paradigms in CV, delivering extraordinary performance not only in various image segmentation and multi-modal segmentation (e.g., text-to-mask) tasks, but also in the video domain. Additionally, the latest released SAM 2 is once again sparking research enthusiasm in the realm of promptable visual segmentation for both images and videos. However, existing surveys mainly focus on SAM in various image processing tasks; a comprehensive and in-depth review in the video domain is notably absent. To address this gap, this work conducts a systematic review on SAM for videos in the era of foundation models. As the first to review the progress of SAM for videos, this work focuses on its applications to various tasks by discussing its recent advances, and innovation opportunities of developing foundation models on broad applications. We begin with a brief introduction to the background of SAM and video-related research domains. Subsequently, we present a systematic taxonomy that categorizes existing methods into three key areas: video understanding, video generation, and video editing, analyzing and summarizing their advantages and limitations. Furthermore, comparative results of SAM-based and current state-of-the-art methods on representative benchmarks, as well as insightful analysis, are offered. Finally, we discuss the challenges faced by current research and envision several future research directions in the field of SAM for video and beyond.
- Comment
- https://github.com/983632847/SAM-for-Videos
- Published
- 2024
26. BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning
- Author
- Wu, Baoyuan, Chen, Hongrui, Zhang, Mingda, Zhu, Zihao, Wei, Shaokui, Yuan, Danni, Zhu, Mingli, Wang, Ruotong, Liu, Li, and Shen, Chao
- Subjects
- Computer Science - Machine Learning, Computer Science - Cryptography and Security
- Abstract
As an emerging approach to explore the vulnerability of deep neural networks (DNNs), backdoor learning has attracted increasing interest in recent years, and many seminal backdoor attack and defense algorithms are being developed successively or concurrently, in a rapid arms race. However, mainly due to the diverse settings, and the difficulties of implementation and reproducibility of existing works, there is a lack of a unified and standardized benchmark of backdoor learning, causing unfair comparisons or unreliable conclusions (e.g., misleading, biased or even false conclusions). Consequently, it is difficult to evaluate the current progress and design the future development roadmap of this literature. To alleviate this dilemma, we build a comprehensive benchmark of backdoor learning called BackdoorBench. Our benchmark makes three valuable contributions to the research community. 1) We provide an integrated implementation of state-of-the-art (SOTA) backdoor learning algorithms (currently including 20 attack and 32 defense algorithms), based on an extensible modular-based codebase. 2) We conduct comprehensive evaluations with 5 poisoning ratios, based on 4 models and 4 datasets, leading to 11,492 pairs of attack-against-defense evaluations in total. 3) Based on the above evaluations, we present abundant analysis from 10 perspectives via 18 useful analysis tools, and provide several inspiring insights about backdoor learning. We hope that our efforts could build a solid foundation of backdoor learning to facilitate researchers to investigate existing algorithms, develop more innovative algorithms, and explore the intrinsic mechanism of backdoor learning. Finally, we have created a user-friendly website at http://backdoorbench.com, which collects all important information of BackdoorBench, including codebase, docs, leaderboard, and model zoo.
- Comment
- Substantial extensions based on our previous conference version "Backdoorbench: A comprehensive benchmark of backdoor learning" published at NeurIPS D&B Track 2022. 20 backdoor attack algorithms, 32 backdoor defense algorithms, 11000+ pairs of attack-against-defense evaluations, 10 analyses, 18 analysis tools
- Published
- 2024
27. Enhancing Transferability of Targeted Adversarial Examples: A Self-Universal Perspective
- Author
- Peng, Bowen, Liu, Li, Liu, Tianpeng, Liu, Zhen, and Liu, Yongxiang
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Transfer-based targeted adversarial attacks against black-box deep neural networks (DNNs) have been proven to be significantly more challenging than untargeted ones. The impressive transferability of current SOTA, the generative methods, comes at the cost of requiring massive amounts of additional data and time-consuming training for each targeted label. This results in limited efficiency and flexibility, significantly hindering their deployment in practical applications. In this paper, we offer a self-universal perspective that unveils the great yet underexplored potential of input transformations in pursuing this goal. Specifically, transformations universalize gradient-based attacks with intrinsic but overlooked semantics inherent within individual images, exhibiting similar scalability and comparable results to time-consuming learning over massive additional data from diverse classes. We also contribute a surprising empirical insight that one of the most fundamental transformations, simple image scaling, is highly effective, scalable, sufficient, and necessary in enhancing targeted transferability. We further augment simple scaling with orthogonal transformations and block-wise applicability, resulting in the Simple, faSt, Self-universal yet Strong Scale Transformation (S$^4$ST) for self-universal TTA. On the ImageNet-Compatible benchmark dataset, our method achieves a 19.8% improvement in the average targeted transfer success rate against various challenging victim models over existing SOTA transformation methods while only consuming 36% time for attacking. It also outperforms resource-intensive attacks by a large margin in various challenging settings.
- Comment
- 8 pages and 9 figures
- Published
- 2024
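The core idea of folding simple image scaling into a gradient-based targeted attack can be sketched as follows: at each step, gradients are averaged over several randomly rescaled copies of the adversarial image. Step sizes, the scale range, and the loss are illustrative assumptions rather than the paper's exact S$^4$ST configuration.

```python
# Hedged sketch of a scaling-augmented targeted attack, not the authors' method.
import torch
import torch.nn.functional as F

def targeted_attack_with_scaling(model, x, target, eps=16/255, alpha=2/255,
                                 steps=10, n_scales=4):
    """x: (B, 3, H, W) in [0, 1]; target: (B,) desired labels."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x_adv)
        for _ in range(n_scales):
            s = float(torch.empty(1).uniform_(0.5, 1.0))     # random scale factor
            x_s = F.interpolate(x_adv, scale_factor=s,
                                mode="bilinear", align_corners=False)
            loss = F.cross_entropy(model(x_s), target)       # targeted loss
            grad = grad + torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()              # move toward target
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```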
28. Minimum tracking linear response Hubbard and Hund corrected Density Functional Theory in CP2K
- Author
- Chai, Ziwei, Si, Rutong, Chen, Mingyang, Teobaldi, Gilberto, O'Regan, David D., and Liu, Li-Min
- Subjects
- Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Materials Science, Physics - Chemical Physics, Physics - Computational Physics, Quantum Physics
- Abstract
We present the implementation of the Hubbard ($U$) and Hund ($J$) corrected Density Functional Theory (DFT+$U$+$J$) functionality in the Quickstep program, which is part of the CP2K suite. The tensorial and Löwdin subspace representations are implemented and compared. Full analytical DFT+$U$+$J$ forces are implemented and benchmarked for the tensorial and Löwdin representations. We also present the implementation of the recently proposed minimum-tracking linear-response method that enables the $U$ and $J$ parameters to be calculated on a first-principles basis without reference to the Kohn-Sham eigensystem. These implementations are benchmarked against recent results for different materials properties including DFT+$U$ band gap opening in NiO, the relative stability of various polaron distributions in TiO$_2$, the dependence of the calculated TiO$_2$ band gap on +$J$ corrections, and, finally, the role of the +$U$ and +$J$ corrections for the computed properties of a series of the hexahydrated transition metals. Our implementation provides results consistent with those already reported in the literature from comparable methods. We conclude the contribution with tests on the influence of the Löwdin orthonormalization on the occupancies, calculated parameters, and derived properties.
- Published
- 2024
29. A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights
- Author
- Lei, Wentao, Wang, Jinting, Ma, Fengji, Huang, Guanjie, and Liu, Li
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is critical. Recent advancements in generative models have laid a solid foundation for the growing interest in this area. Despite the significant progress, the task of human video generation remains challenging due to the need for character consistency, the complexity of human motion, and the difficulty of modeling the relationship between humans and their environment. This survey provides a comprehensive review of the current state of human video generation, marking, to the best of our knowledge, the first extensive literature review in this domain. We start with an introduction to the fundamentals of human video generation and the evolution of generative models that have facilitated the field's growth. We then examine the main methods employed for three key sub-tasks within human video generation: text-driven, audio-driven, and pose-driven motion generation. These areas are explored concerning the conditions that guide the generation process. Furthermore, we offer a collection of the most commonly utilized datasets and the evaluation metrics that are crucial in assessing the quality and realism of generated videos. The survey concludes with a discussion of the current challenges in the field and suggests possible directions for future research. The goal of this survey is to offer the research community a clear and holistic view of the advancements in human video generation, highlighting the milestones achieved and the challenges that lie ahead.
- Published
- 2024
30. ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation
- Author
- Zhu, Ruijie, Wang, Chuxin, Song, Ziyang, Liu, Li, Zhang, Tianzhu, and Zhang, Yongdong
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Estimating depth from a single image is a challenging visual task. Compared to relative depth estimation, metric depth estimation attracts more attention due to its practical physical significance and critical applications in real-life scenarios. However, existing metric depth estimation methods are typically trained on specific datasets with similar scenes, facing challenges in generalizing across scenes with significant scale variations. To address this challenge, we propose a novel monocular depth estimation method called ScaleDepth. Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction (SASP) module and an adaptive relative depth estimation (ARDE) module, respectively. The proposed ScaleDepth enjoys several merits. First, the SASP module can implicitly combine structural and semantic features of the images to predict precise scene scales. Second, the ARDE module can adaptively estimate the relative depth distribution of each image within a normalized depth space. Third, our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework, without the need to set the depth range or fine-tune the model. Extensive experiments demonstrate that our method attains state-of-the-art performance across indoor, outdoor, unconstrained, and unseen scenes. Project page: https://ruijiezhu94.github.io/ScaleDepth
- Comment
- 14 pages, 11 figures, 13 tables
- Published
- 2024
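The scale/relative-depth decomposition is simple to sketch: a per-image scene scale multiplies a per-pixel relative depth predicted in a normalized space. The two heads below are placeholders for the paper's SASP and ARDE modules.

```python
# Hedged sketch of the decomposition metric_depth = scene_scale * relative_depth.
import torch

class ScaleDecomposedDepth(torch.nn.Module):     # hypothetical stand-in
    def __init__(self, feat_dim=256):
        super().__init__()
        self.scale_head = torch.nn.Linear(feat_dim, 1)               # scene scale
        self.rel_head = torch.nn.Conv2d(feat_dim, 1, kernel_size=1)  # relative depth

    def forward(self, feat_map):                  # feat_map: (B, C, H, W)
        pooled = feat_map.mean(dim=(2, 3))        # global image descriptor
        scale = torch.nn.functional.softplus(self.scale_head(pooled))
        rel = torch.sigmoid(self.rel_head(feat_map))   # normalized to (0, 1)
        return scale.view(-1, 1, 1, 1) * rel           # metric depth map

feats = torch.randn(2, 256, 48, 64)
print(ScaleDecomposedDepth()(feats).shape)        # torch.Size([2, 1, 48, 64])
```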
31. Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher
- Author
- Chen, Jiangming, Liu, Li, Deng, Wanxia, Liu, Zhen, Liu, Yu, Wei, Yingmei, and Liu, Yongxiang
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Cross domain object detection learns an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain. Promising results have been achieved via Mean Teacher; however, pseudo labeling, which is the bottleneck of mutual learning, remains to be further explored. In this study, we find that confidence misalignment of the predictions, including category-level overconfidence, instance-level task confidence inconsistency, and image-level confidence misfocusing, injects noisy pseudo labels into the training process and brings suboptimal performance on the target domain. To tackle this issue, we present a novel general framework termed Multi-Granularity Confidence Alignment Mean Teacher (MGCAMT) for cross domain object detection, which alleviates confidence misalignment across category-, instance-, and image-levels simultaneously to obtain high quality pseudo supervision for better teacher-student learning. Specifically, to align confidence with accuracy at the category level, we propose Classification Confidence Alignment (CCA) to model category uncertainty based on Evidential Deep Learning (EDL) and filter out the incorrect category labels via an uncertainty-aware selection strategy. Furthermore, to mitigate the instance-level misalignment between classification and localization, we design Task Confidence Alignment (TCA) to enhance the interaction between the two task branches and allow each classification feature to adaptively locate the optimal feature for the regression. Finally, we develop imagery Focusing Confidence Alignment (FCA), adopting another way of pseudo label learning, i.e., we use the original outputs from the Mean Teacher network for supervised learning without label assignment to concentrate on holistic information in the target image. These three procedures benefit from each other from a cooperative learning perspective.
- Published
- 2024
32. Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines
- Author
- Ying, Xinyi, Xiao, Chao, Li, Ruojing, He, Xu, Li, Boyang, Li, Zhaoxu, Wang, Yingqian, Hu, Mingyuan, Xu, Qingyu, Lin, Zaiping, Li, Miao, Zhou, Shilin, An, Wei, Sheng, Weidong, and Liu, Li
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either the visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, their insufficient quantity, limited categories, misaligned images, and large target sizes cannot provide an impartial benchmark to evaluate multi-category visible-thermal small object detection (RGBT SOD) algorithms. In this paper, we build the first large-scale benchmark with high diversity for RGBT SOD (namely RGBT-Tiny), including 115 paired sequences, 93K frames and 1.2M manual annotations. RGBT-Tiny contains abundant targets (7 categories) and high-diversity scenes (8 types that cover different illumination and density variations). Note that over 81% of targets are smaller than 16x16, and we provide paired bounding box annotations with tracking ID to offer an extremely challenging benchmark with wide-range applications, such as RGBT fusion, detection and tracking. In addition, we propose a scale adaptive fitness (SAFit) measure that exhibits high robustness on both small and large targets. The proposed SAFit can provide reasonable performance evaluation and promote detection performance. Based on the proposed RGBT-Tiny dataset and SAFit measure, extensive evaluations have been conducted, including 23 recent state-of-the-art algorithms that cover four different types (i.e., visible generic detection, visible SOD, thermal SOD and RGBT object detection). Project is available at https://github.com/XinyiYing24/RGBT-Tiny.
- Published
- 2024
33. New QEC codes and EAQEC codes from repeated-root cyclic codes of length $2^rp^s$
- Author
- Li, Lanqiang, Cao, Ziwen, Wu, Tingting, and Liu, Li
- Subjects
- Computer Science - Information Theory, 94B15 (Primary), 94B05, 11T71 (Secondary)
- Abstract
Let $p$ be an odd prime and $r,s,m$ be positive integers. In this study, we initiate our exploration by delving into the intricate structure of all repeated-root cyclic codes and their duals with a length of $2^rp^s$ over the finite field $\mathbb{F}_{p^m}$. Through the utilization of CSS and Steane's constructions, a series of new quantum error-correcting (QEC) codes are constructed with parameters distinct from all previous constructions. Furthermore, we provide all maximum distance separable (MDS) cyclic codes of length $2^rp^s$, which are further utilized in the construction of QEC MDS codes. Finally, we introduce a significant number of novel entanglement-assisted quantum error-correcting (EAQEC) codes derived from these repeated-root cyclic codes. Notably, these newly constructed codes exhibit parameters distinct from those of previously known constructions.
- Published
- 2024
34. Low Duty Cycle Pulsed UV Technique for Spectroscopy of Aluminum Monochloride
- Author
- Liu, Li-Ren, Kendrick, Brian K., and Hemmerling, Boerge
- Subjects
- Physics - Atomic Physics, Physics - Optics
- Abstract
We present a novel technique to minimize UV-induced damage in experiments that employ second-harmonic generation cavities. The principle of our approach is to reduce the duty cycle of the UV light as much as possible to prolong the lifetime of the optics used. The low duty cycle is achieved by ramping the cavity into resonance for a short time during the experimental cycle when the light is used and tuning it to an off-resonant state otherwise. The necessary fast ramp and length-stabilization control of the cavity is implemented with the FPGA-based STEMlab platform. We demonstrate the utility of this method by measuring the isotope shift of the electronic transition ($X^1\Sigma \leftarrow A^1\Pi$) in AlCl at 261.5 nm in a pulsed molecular beam experiment.
- Published
- 2024
35. TwinS: Revisiting Non-Stationarity in Multivariate Time Series Forecasting
- Author
- Hu, Jiaxi, Wen, Qingsong, Ruan, Sijie, Liu, Li, and Liang, Yuxuan
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Recently, multivariate time series forecasting tasks have garnered increasing attention due to their significant practical applications, leading to the emergence of various deep forecasting models. However, real-world time series exhibit pronounced non-stationary distribution characteristics. These characteristics are not solely limited to the time-varying statistical properties highlighted by the non-stationary Transformer, but also encompass three key aspects: nested periodicity, absence of periodic distributions, and hysteresis among time variables. In this paper, we begin by validating this theory through wavelet analysis and propose the Transformer-based TwinS model, which consists of three modules to address the non-stationary periodic distributions: Wavelet Convolution, Period-Aware Attention, and Channel-Temporal Mixed MLP. Specifically, the Wavelet Convolution models nested periods by scaling the convolution kernel size like a wavelet transform. The Period-Aware Attention guides attention computation by generating period relevance scores through a convolutional sub-network. The Channel-Temporal Mixed MLP captures the overall relationships between time series through channel-time mixing learning. TwinS achieves SOTA performance compared to mainstream TS models, with a maximum improvement in MSE of 25.8% over PatchTST.
- Published
- 2024
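The Wavelet Convolution component above is described only at a high level; a minimal PyTorch sketch of the underlying idea, parallel kernels whose sizes grow dyadically over the time axis, might look as follows (the module name, branch count, and sum fusion are our assumptions, not the authors' code):
    import torch
    import torch.nn as nn

    class MultiScaleConv(nn.Module):
        # Parallel 1D convolutions whose kernel sizes grow dyadically
        # (3, 5, 9, ...), loosely mimicking wavelet scales so that nested
        # periods of different lengths are covered by different branches.
        def __init__(self, channels: int, num_scales: int = 3):
            super().__init__()
            self.branches = nn.ModuleList()
            for s in range(num_scales):
                k = 2 ** (s + 1) + 1                  # odd kernel per scale
                self.branches.append(
                    nn.Conv1d(channels, channels, kernel_size=k, padding=k // 2))

        def forward(self, x):                         # x: (batch, vars, time)
            return sum(branch(x) for branch in self.branches)

    y = MultiScaleConv(channels=7)(torch.randn(8, 7, 96))  # time length preserved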
36. A tri-light warning system for hospitalized COVID-19 patients: Credibility-based risk stratification for future pandemic preparedness
- Author
-
Xu, Chuanjun, Xu, Qinmei, Liu, Li, Zhou, Mu, Xing, Zijian, Zhou, Zhen, Ren, Danyang, Zhou, Changsheng, Zhang, Longjiang, Li, Xiao, Zhan, Xianghao, Gevaert, Olivier, and Lu, Guangming
- Subjects
Biomedical and Clinical Sciences ,Clinical Sciences ,Genetics ,Lung ,Infectious Diseases ,Coronaviruses ,Emerging Infectious Diseases ,Patient Safety ,Good Health and Well Being ,COVID-19 pandemic ,Multi-modal artificial intelligence ,Risk stratification ,Conformal prediction ,Multi-center study - Abstract
Purpose: The novel coronavirus pneumonia (COVID-19) has continually spread and mutated, requiring a patient risk stratification system to optimize medical resources and improve pandemic response. We aimed to develop a conformal prediction-based tri-light warning system for stratifying COVID-19 patients, applicable to both original and emerging variants. Methods: We retrospectively collected data from 3646 patients across multiple centers in China. The dataset was divided into a training set (n = 1451), a validation set (n = 662), an external test set from Huoshenshan Field Hospital (n = 1263), and a specific test set for Delta and Omicron variants (n = 544). The tri-light warning system extracts radiomic features from CT (computed tomography) and integrates clinical records to classify patients into high-risk (red), uncertain-risk (yellow), and low-risk (green) categories. Models were built to predict ICU (intensive care unit) admissions (adverse cases in the training/validation/Huoshenshan/variant test sets: n = 39/21/262/11) and were evaluated using AUROC (area under the receiver operating characteristic curve) and AUPRC (area under the precision-recall curve) metrics. Results: The dataset included 1830 men (50.2%) and 1816 women (49.8%), with a median age of 53.7 years (IQR [interquartile range]: 42-65 years). The system demonstrated strong performance under data distribution shifts, with an AUROC of 0.89 and an AUPRC of 0.42 for original strains, and an AUROC of 0.77-0.85 and an AUPRC of 0.51-0.60 for variants. Conclusion: The tri-light warning system can enhance pandemic responses by effectively stratifying COVID-19 patients under varying conditions and data shifts.
- Published
- 2024
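The red/yellow/green logic above rests on conformal prediction: the model outputs a prediction set with a coverage guarantee, and the set, not a point estimate, determines the colour. A minimal split-conformal sketch for a binary ICU-risk outcome (the threshold, score choice, and function names are illustrative assumptions, not the paper's pipeline):
    import numpy as np

    def conformal_pvalue(cal_scores, test_score):
        # Split-conformal p-value: how extreme the test nonconformity score
        # is relative to a held-out calibration set.
        return (np.sum(cal_scores >= test_score) + 1) / (len(cal_scores) + 1)

    def tri_light(p_low_risk, p_high_risk, alpha=0.1):
        # Keep each class whose p-value exceeds alpha; map the resulting
        # prediction set to a colour, deferring ambiguous patients.
        keep = {c for c, p in (("low", p_low_risk), ("high", p_high_risk)) if p > alpha}
        if keep == {"high"}:
            return "red"       # credibly high risk
        if keep == {"low"}:
            return "green"     # credibly low risk
        return "yellow"        # empty or two-class set: uncertain risk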
37. Reform of China’s Science and Technology System in the Xi Jinping Era
- Author
-
Cao, Cong, Li, Ning, Li, Xia, and Liu, Li
- Published
- 2018
- Full Text
- View/download PDF
38. Inhibition of lysine acetyltransferase KAT6 in ER+HER2- metastatic breast cancer: a phase 1 trial.
- Author
-
Mukohara, Toru, Park, Yeon, Sommerhalder, David, Yonemori, Kan, Hamilton, Erika, Kim, Sung-Bae, Kim, Jee, Iwata, Hiroji, Yamashita, Toshinari, Layman, Rachel, Mita, Monica, Clay, Timothy, Chae, Yee, Oakman, Catherine, Yan, Fengting, Kim, Gun, Im, Seock-Ah, Lindeman, Geoffrey, Rugo, Hope, Liyanage, Marlon, Saul, Michelle, Le Corre, Christophe, Skoura, Athanasia, Liu, Li, Li, Meng, and LoRusso, Patricia
- Subjects
Humans ,Female ,Breast Neoplasms ,Histone Acetyltransferases ,Middle Aged ,Receptor, ErbB-2 ,Receptors, Estrogen ,Fulvestrant ,Aged ,Adult ,Neoplasm Metastasis ,Antineoplastic Combined Chemotherapy Protocols - Abstract
Inhibition of histone lysine acetyltransferases (KATs) KAT6A and KAT6B has shown antitumor activity in estrogen receptor-positive (ER+) breast cancer preclinical models. PF-07248144 is a selective catalytic inhibitor of KAT6A and KAT6B. In the present study, we report the safety, pharmacokinetics (PK), pharmacodynamics, efficacy and biomarker results from the first-in-human, phase 1 dose escalation and dose expansion study (n = 107) of PF-07248144 monotherapy and fulvestrant combination in heavily pretreated ER+ human epidermal growth factor receptor 2-negative (HER2-) metastatic breast cancer (mBC). The primary objectives of assessing the safety and tolerability and determining the recommended dose for expansion of PF-07248144, as monotherapy and in combination with fulvestrant, were met. Secondary endpoints included characterization of PK and evaluation of antitumor activity, including objective response rate (ORR) and progression-free survival (PFS). Common treatment-related adverse events (any grade; grades 3-4) included dysgeusia (83.2%, 0%), neutropenia (59.8%, 35.5%) and anemia (48.6%, 13.1%). Exposure was approximately dose proportional. Antitumor activity was observed as monotherapy. For the PF-07248144-fulvestrant combination (n = 43), the ORR was 30.2% (95% confidence interval (CI) = 17.2-46.1%) and the median PFS was 10.7 (5.3-not evaluable) months. PF-07248144 demonstrated a tolerable safety profile and durable antitumor activity in heavily pretreated ER+HER2- mBC. These findings establish KAT6A and KAT6B as druggable cancer targets, provide clinical proof of concept and reveal a potential avenue to treat mBC. ClinicalTrials.gov registration: NCT04606446.
- Published
- 2024
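The combination-arm response rate above corresponds to 13 responders out of 43 patients; assuming the quoted interval is the exact (Clopper-Pearson) binomial CI, the endpoint numbers can be reproduced in a few lines (a back-of-envelope check, not the trial's statistical code):
    from scipy.stats import beta

    def clopper_pearson(x, n, alpha=0.05):
        # Exact two-sided binomial confidence interval for x successes in n trials.
        lo = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
        hi = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
        return lo, hi

    lo, hi = clopper_pearson(13, 43)       # 13/43 = 30.2% observed ORR
    print(f"95% CI = {lo:.1%}-{hi:.1%}")   # expected to land near 17.2%-46.1%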
39. Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness
- Author
-
Lin, Weilin, Liu, Li, Wei, Shaokui, Li, Jianze, and Xiong, Hui
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
The security threat of backdoor attacks is a central concern for deep neural networks (DNNs). Recently, without poisoned data, unlearning models with clean data and then learning a pruning mask have contributed to backdoor defense. Additionally, vanilla fine-tuning with those clean data can help recover the lost clean accuracy. However, the behavior of clean unlearning is still under-explored, and vanilla fine-tuning unintentionally reintroduces the backdoor effect. In this work, we first investigate model unlearning from the perspective of weight changes and gradient norms, and find two interesting observations in the backdoored model: 1) the weight changes between poison and clean unlearning are positively correlated, making it possible for us to identify the backdoor-related neurons without using poisoned data; 2) the neurons of the backdoored model are more active (i.e., larger changes in gradient norm) than those in the clean model, suggesting the need to suppress the gradient norm during fine-tuning. Then, we propose an effective two-stage defense method. In the first stage, an efficient Neuron Weight Change (NWC)-based Backdoor Reinitialization is proposed based on observation 1). In the second stage, based on observation 2), we design an Activeness-Aware Fine-Tuning to replace the vanilla fine-tuning. Extensive experiments, involving eight backdoor attacks on three benchmark datasets, demonstrate the superior performance of our proposed method compared to recent state-of-the-art backdoor defense approaches.
- Published
- 2024
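A minimal sketch of the first-stage idea above: rank each layer's output neurons by how far their weights moved during clean unlearning and reinitialize the most-changed ones (the per-neuron granularity, ratio, and reinitialization distribution are our simplifications of the paper's NWC-based Backdoor Reinitialization):
    import torch

    @torch.no_grad()
    def nwc_reinitialize(model_before, model_after, ratio=0.05):
        # Rank output neurons of each weight matrix by the L2 norm of their
        # change during clean unlearning, then reinitialize the top fraction,
        # which observation 1) ties to backdoor-related neurons.
        for (_, w0), (_, w1) in zip(model_before.named_parameters(),
                                    model_after.named_parameters()):
            if w0.dim() < 2:                       # skip biases / norm layers
                continue
            change = (w1 - w0).flatten(1).norm(dim=1)   # one score per neuron
            k = max(1, int(ratio * change.numel()))
            idx = change.topk(k).indices
            w1[idx] = torch.randn_like(w1[idx]) * 0.01  # fresh random weights
        return model_after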
40. WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark
- Author
-
Zhang, Chunhui, Liu, Li, Huang, Guanjie, Wen, Hao, Zhou, Xi, and Wang, Yanfeng
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Underwater object tracking (UOT) is a foundational task for identifying and tracing submerged entities in underwater video sequences. However, current UOT datasets suffer from limitations in scale, diversity of target categories and scenarios covered, hindering the training and evaluation of modern tracking algorithms. To bridge this gap, we take the first step and introduce WebUOT-1M, i.e., the largest public UOT benchmark to date, sourced from complex and realistic underwater environments. It comprises 1.1 million frames across 1,500 video clips filtered from 408 target categories, largely surpassing previous UOT datasets, e.g., UVOT400. Through meticulous manual annotation and verification, we provide high-quality bounding boxes for underwater targets. Additionally, WebUOT-1M includes language prompts for video sequences, expanding its application areas, e.g., underwater vision-language tracking. Most existing trackers are tailored for open-air environments, leading to performance degradation when applied to UOT due to domain gaps. Retraining and fine-tuning these trackers are challenging due to sample imbalances and limited real-world underwater datasets. To tackle these challenges, we propose a novel omni-knowledge distillation framework based on WebUOT-1M, incorporating various strategies to guide the learning of the student Transformer. To the best of our knowledge, this framework is the first to effectively transfer open-air domain knowledge to the UOT model through knowledge distillation, as demonstrated by results on both existing UOT datasets and the newly proposed WebUOT-1M. Furthermore, we comprehensively evaluate WebUOT-1M using 30 deep trackers, showcasing its value as a benchmark for UOT research by presenting new challenges and opportunities for future studies. The complete dataset, codes, and tracking results will be made publicly available., Comment: GitHub project: https://github.com/983632847/Awesome-Multimodal-Object-Tracking
- Published
- 2024
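The abstract leaves the omni-knowledge distillation losses unspecified; for orientation, the canonical response-based distillation objective that such open-air-teacher/underwater-student setups typically build on is sketched below (the temperature and blend weight are illustrative; the paper combines several richer strategies):
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
        # Soft targets from the (open-air pre-trained) teacher, blended with
        # the ordinary supervised loss on underwater labels.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * T * T
        hard = F.cross_entropy(student_logits, targets)
        return alpha * soft + (1 - alpha) * hard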
41. TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability
- Author
-
Ma, Fengji, Liu, Li, and Cheng, Hei Victor
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
This work addresses the challenge of achieving zero-shot adversarial robustness while preserving zero-shot generalization in large-scale foundation models, with a focus on the popular Contrastive Language-Image Pre-training (CLIP). Although foundation models are reported to have exceptional zero-shot generalization, they are highly vulnerable to adversarial perturbations. Existing methods achieve a reasonably good tradeoff between zero-shot adversarial robustness and generalization under small adversarial perturbations. However, they fail to achieve a good tradeoff under large adversarial perturbations. To this end, we propose a novel Text-Image Mutual Awareness (TIMA) method that strikes a balance between zero-shot adversarial robustness and generalization. More precisely, we propose an Image-Aware Text (IAT) tuning mechanism that increases the inter-class distance of text embeddings by incorporating the Minimum Hyperspherical Energy (MHE). Simultaneously, fixed pre-trained image embeddings are used as cross-modal auxiliary supervision to maintain the similarity between the MHE-tuned and original text embeddings via knowledge distillation, preserving semantic information between different classes. In addition, we introduce a Text-Aware Image (TAI) tuning mechanism, which increases the inter-class distance between image embeddings during the training stage through a Text-distance based Adaptive Margin (TAM). Similarly, knowledge distillation is utilized to retain the similarity between fine-tuned and pre-trained image embeddings. Extensive experimental results demonstrate the effectiveness of our approach, showing impressive zero-shot performance against a wide range of adversarial perturbations while preserving the zero-shot generalization capabilities of the original CLIP model.
- Published
- 2024
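As a sketch of the MHE regularizer the IAT mechanism minimizes: hyperspherical energy penalizes class embeddings that sit close together on the unit sphere, so minimizing it enlarges inter-class distances (the s = 1 Riesz kernel below is one common choice; the paper's exact kernel may differ):
    import torch
    import torch.nn.functional as F

    def hyperspherical_energy(class_embeddings, eps=1e-6):
        # Mean pairwise inverse distance between L2-normalized class
        # embeddings; lower energy = classes spread farther apart.
        e = F.normalize(class_embeddings, dim=-1)
        dist = torch.cdist(e, e)
        mask = ~torch.eye(e.shape[0], dtype=torch.bool, device=e.device)
        return (1.0 / (dist[mask] + eps)).mean()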
42. Awesome Multi-modal Object Tracking
- Author
-
Zhang, Chunhui, Liu, Li, Wen, Hao, Zhou, Xi, and Wang, Yanfeng
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Multi-modal object tracking (MMOT) is an emerging field that combines data from various modalities, e.g., vision (RGB), depth, thermal infrared, event, language, and audio, to estimate the state of an arbitrary object in a video sequence. It is of great significance for many applications such as autonomous driving and intelligent surveillance. In recent years, MMOT has received increasing attention. However, existing MMOT algorithms mainly focus on two modalities (e.g., RGB+depth, RGB+thermal infrared, and RGB+language). To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality. Additionally, some large-scale multi-modal tracking benchmarks have been established by simultaneously providing more than two modalities, such as vision-language-audio (e.g., WebUAV-3M) and vision-depth-language (e.g., UniMod1K). To track the latest progress in MMOT, we conduct a comprehensive investigation in this report. Specifically, we first divide existing MMOT tasks into five main categories, i.e., RGBL tracking, RGBE tracking, RGBD tracking, RGBT tracking, and miscellaneous (RGB+X), where X can be any modality, such as language, depth, and event. Then, we analyze and summarize each MMOT task, focusing on widely used datasets and mainstream tracking algorithms based on their technical paradigms (e.g., self-supervised learning, prompt learning, knowledge distillation, generative models, and state space models). Finally, we maintain a continuously updated paper list for MMOT at https://github.com/983632847/Awesome-Multimodal-Object-Tracking., Comment: A continuously updated project to track the latest progress in multi-modal object tracking
- Published
- 2024
43. SARatrX: Towards Building A Foundation Model for SAR Target Recognition
- Author
-
Li, Weijie, Yang, Wei, Hou, Yuenan, Liu, Li, Liu, Yongxiang, and Li, Xiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Despite the remarkable progress in synthetic aperture radar automatic target recognition (SAR ATR), recent efforts have concentrated on detecting or classifying a specific, coarse category, e.g., vehicles, ships, airplanes, or buildings. A fundamental limitation of the top-performing SAR ATR methods is that the learning paradigm is supervised, task-specific, limited-category, closed-world learning, which depends on massive amounts of accurately annotated samples that are expensively labeled by expert SAR analysts, and which has limited generalization capability and scalability. In this work, we make the first attempt towards building a foundation model for SAR ATR, termed SARatrX. SARatrX learns generalizable representations via self-supervised learning (SSL) and provides a basis for label-efficient model adaptation to generic SAR target detection and classification tasks. Specifically, SARatrX is trained on 0.18 M unlabelled SAR target samples, which are curated by combining contemporary benchmarks and constitute the largest publicly available dataset to date. Considering the characteristics of SAR images, a backbone tailored for SAR ATR is carefully designed, and a two-step SSL method endowed with multi-scale gradient features is applied to ensure the feature diversity and model scalability of SARatrX. The capabilities of SARatrX are evaluated on classification under few-shot and robustness settings and on detection across various categories and scenes, and impressive performance is achieved, often competitive with or even superior to prior fully supervised, semi-supervised, or self-supervised algorithms. Our SARatrX and the curated dataset are released at https://github.com/waterdisappear/SARatrX to foster research into foundation models for SAR ATR and SAR image interpretation.
- Published
- 2024
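The abstract does not detail the multi-scale gradient features; one plausible reading, gradient-magnitude maps computed at several resolutions as speckle-robust self-supervision targets, is sketched below (the kernel choice and scale set are speculative assumptions on our part, not the released code):
    import torch
    import torch.nn.functional as F

    def multi_scale_gradient_features(img, scales=(1, 2, 4)):
        # Gradient-magnitude maps of a single-channel SAR image (B, 1, H, W)
        # at several resolutions; such maps can serve as regression targets
        # for a masked-image-modeling SSL step.
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        ky = kx.t().contiguous()
        kx, ky = kx.view(1, 1, 3, 3), ky.view(1, 1, 3, 3)
        feats = []
        for s in scales:
            x = F.avg_pool2d(img, s) if s > 1 else img   # downsample by s
            gx = F.conv2d(x, kx, padding=1)
            gy = F.conv2d(x, ky, padding=1)
            feats.append(torch.sqrt(gx ** 2 + gy ** 2 + 1e-12))
        return feats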
44. Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model
- Author
-
Lei, Wentao, Liu, Li, and Wang, Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings, enabling people with hearing impairments to communicate efficiently. CS video generation aims to produce the specific lip and gesture movements of CS from audio or text inputs. The main challenge is that, given limited CS data, we strive to simultaneously generate fine-grained hand and finger movements as well as lip movements, while the two kinds of movements need to be asynchronously aligned. Existing CS generation methods are fragile and prone to poor performance due to template-based statistical models and careful hand-crafted pre-processing to fit the models. Therefore, we propose a novel Gloss-prompted Diffusion-based CS Gesture generation framework (called GlossDiff). Specifically, to integrate additional linguistic rule knowledge into the model, we first introduce a bridging instruction called Gloss, an automatically generated descriptive text that establishes a direct and more delicate semantic connection between spoken language and CS gestures. Moreover, we are the first to suggest rhythm as an important paralinguistic feature of CS that improves communication efficacy. Therefore, we propose a novel Audio-driven Rhythmic Module (ARM) to learn rhythm that matches the audio speech. Finally, we design, record, and publish the first Chinese CS dataset with four CS cuers. Extensive experiments demonstrate that our method quantitatively and qualitatively outperforms current state-of-the-art (SOTA) methods. We release the code and data at https://glossdiff.github.io/.
- Published
- 2024
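The gesture generator above is diffusion-based, but the abstract does not give the objective; a standard conditional denoising-diffusion training step, with the gloss text embedding as the condition, would look like this (every name here is illustrative rather than the released code):
    import torch

    def diffusion_step(model, x0, gloss_emb, alphas_cumprod):
        # One DDPM training step: corrupt the gesture sequence x0 to a random
        # timestep t, then train the model to predict the injected noise,
        # conditioned on the gloss embedding.
        b = x0.shape[0]
        t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
        a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
        noise = torch.randn_like(x0)
        x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
        pred = model(x_t, t, gloss_emb)
        return torch.nn.functional.mse_loss(pred, noise)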
45. Voice Attribute Editing with Text Prompt
- Author
-
Sheng, Zhengyan, Ai, Yang, Liu, Li-Juan, Pan, Jia, and Ling, Zhen-Hua
- Subjects
Computer Science - Sound ,Computer Science - Artificial Intelligence ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Despite recent advancements in speech generation with text prompts providing control over speech style, voice attributes in synthesized speech remain elusive and challenging to control. This paper introduces a novel task: voice attribute editing with text prompt, with the goal of making relative modifications to voice attributes according to the actions described in the text prompt. To solve this task, VoxEditor, an end-to-end generative model, is proposed. In VoxEditor, to address the insufficiency of the text prompt, a Residual Memory (ResMem) block is designed that efficiently maps voice attributes and their text descriptors into a shared feature space. Additionally, the ResMem block is enhanced with a voice attribute degree prediction (VADP) block to align voice attributes with the corresponding descriptors, addressing the imprecision of text prompts caused by non-quantitative descriptions of voice attributes. We also establish the open-source VCTK-RVA dataset, the first with manual annotations detailing voice-characteristic differences among speakers. Extensive experiments demonstrate the effectiveness and generalizability of our proposed method in terms of both objective and subjective metrics. The dataset and audio samples are available on the website.
- Published
- 2024
46. From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives
- Author
-
Fan, Shuxian, Visokay, Adam, Hoffman, Kentaro, Salerno, Stephen, Liu, Li, Leek, Jeffrey T., and McCormick, Tyler H.
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent's COD. Turning VAs into actionable insights for researchers and policymakers requires two steps: (i) predicting the likely COD using the VA interview and (ii) performing inference with predicted CODs (e.g., modeling the breakdown of causes by demographic factors using a sample of deaths). In this paper, we develop a method for valid inference using outcomes (in our case COD) predicted from free-form text using state-of-the-art NLP techniques. This method, which we call multiPPI++, extends recent work in "prediction-powered inference" to multinomial classification. We leverage a suite of NLP techniques for COD prediction and, through empirical analysis of VA data, demonstrate the effectiveness of our approach in handling transportability issues. multiPPI++ recovers ground truth estimates regardless of which NLP model produced the predictions, whether a more accurate predictor like GPT-4-32k or a less accurate one like KNN. Our findings demonstrate the practical importance of inference correction for public health decision-making and suggest that, if inference tasks are the end goal, having a small amount of contextually relevant, high-quality labeled data is essential regardless of the NLP algorithm., Comment: 12 pages, 7 figures
- Published
- 2024
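The core of prediction-powered inference, which multiPPI++ extends to the multinomial COD setting, fits in a few lines for a single cause's proportion: the predicted share on the unlabeled set plus a rectifier measured on the labeled set (a sketch of the point estimate only; the paper's estimator and its confidence intervals may differ in detail):
    import numpy as np

    def ppi_proportion(pred_unlabeled, pred_labeled, true_labeled, cause):
        # Prediction-powered estimate of P(COD == cause): the NLP-predicted
        # share on the large unlabeled sample, corrected by the predictor's
        # measured bias on the small labeled sample.
        naive = np.mean(np.asarray(pred_unlabeled) == cause)
        rectifier = (np.mean(np.asarray(true_labeled) == cause)
                     - np.mean(np.asarray(pred_labeled) == cause))
        return naive + rectifier
Because the rectifier is estimated from ground-truth labels, the corrected estimate remains consistent even when the underlying predictor is biased, which is consistent with the abstract's observation that ground truth is recovered for both GPT-4-32k and KNN predictions.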
47. SceneTracker: Long-term Scene Flow Estimation Network
- Author
-
Wang, Bo, Li, Jian, Yu, Yang, Liu, Li, Sun, Zhenping, and Hu, Dewen
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Scene flow estimation offers fine-grained focus in the spatial domain, while 3D object tracking offers coherence in the temporal domain; exploiting this complementarity, this study addresses a comprehensive new task that can simultaneously capture fine-grained and long-term 3D motion in an online manner: long-term scene flow estimation (LSFE). We introduce SceneTracker, a novel learning-based LSFE network that adopts an iterative approach to approximate the optimal trajectory. Besides, it dynamically indexes and constructs appearance and depth correlation features simultaneously and employs the Transformer to explore and utilize long-range connections within and between trajectories. With detailed experiments, SceneTracker shows superior capabilities in handling 3D spatial occlusion and depth noise interference, highly tailored to the LSFE task's needs. Finally, we build the first real-world evaluation dataset, LSFDriving, further substantiating SceneTracker's commendable generalization capacity. The code and data for SceneTracker are available at https://github.com/wwsource/SceneTracker.
- Published
- 2024
48. Convert laser light into single photons via interference
- Author
-
Li, Yanfeng, Wang, Manman, Huang, Guoqi, Liu, Li, Wang, Wenyan, Ji, Weijie, Liu, Hanqing, Su, Xiangbin, Li, Shulun, Dai, Deyan, Shang, Xiangjun, Ni, Haiqiao, Niu, Zhichuan, and Hu, Chengyong
- Subjects
Quantum Physics ,Condensed Matter - Mesoscale and Nanoscale Physics ,Physics - Atomic Physics ,Physics - Optics - Abstract
Laser light possesses perfect coherence but cannot be attenuated to single photons via linear optics. An elegant route to convert laser light into single photons is based on photon blockade in a cavity with a single atom in the strong coupling regime. However, the single-photon purity achieved by this method remains relatively low. Here we propose an interference-based approach where laser light can be transformed into single photons by destructively interfering with a weak but super-bunched incoherent field emitted from a cavity coupled to a single quantum emitter. We demonstrate this idea by measuring the reflected light of a laser field which drives a double-sided optical microcavity containing a single artificial atom, a quantum dot (QD), in the Purcell regime. The reflected light consists of a superposition of the driving field with the cavity output field. We achieve a second-order autocorrelation g^(2)(0) = 0.030 ± 0.002 and a two-photon interference visibility of 94.3% ± 0.2%. By separating the coherent and incoherent fields in the reflected light, we observe that the incoherent field from the cavity exhibits super-bunching with g^(2)(0) = 41 ± 2 while the coherent field retains Poissonian statistics. By controlling the relative amplitude of the coherent and incoherent fields, we verify that the photon statistics of the reflected light are tunable from perfect anti-bunching to super-bunching, in agreement with our predictions. Our results demonstrate the photon statistics of light as a quantum interference phenomenon: a single QD can scatter two photons simultaneously at low driving fields, in contrast to the common picture that a single two-level quantum emitter can only scatter (or absorb and emit) single photons. This work opens the door to tailoring the photon statistics of laser light via cavity or waveguide quantum electrodynamics and interference., Comment: Comments are welcome
- Published
- 2024
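For reference, the second-order autocorrelation quoted above is the standard quantum-optics quantity
    $$g^{(2)}(\tau) = \frac{\langle a^{\dagger}(t)\, a^{\dagger}(t+\tau)\, a(t+\tau)\, a(t) \rangle}{\langle a^{\dagger}(t)\, a(t) \rangle^{2}},$$
with $g^{(2)}(0) = 0$ for an ideal single-photon source, $1$ for coherent (Poissonian) light, and $2$ for thermal light; values above 2, such as the $41 \pm 2$ measured here for the cavity's incoherent field, are termed super-bunching.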
49. Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review
- Author
-
Wang, Jinge, Cheng, Zien, Yao, Qiuming, Liu, Li, Xu, Dong, and Hu, Gangqing
- Subjects
Quantitative Biology - Other Quantitative Biology ,Computer Science - Artificial Intelligence - Abstract
The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments., Comment: Peer-reviewed and accepted by Quantitative Biology
- Published
- 2024
50. SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
- Author
-
Liu, Lizhe, Wang, Bohua, Xie, Hongwei, Liu, Daqi, Liu, Li, Tian, Zhiqiang, Yang, Kuiyuan, and Wang, Bing
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Vision-centric 3D environment understanding is both vital and challenging for autonomous driving systems. Recently, object-free methods have attracted considerable attention. Such methods perceive the world by predicting the semantics of discrete voxel grids but fail to construct continuous and accurate obstacle surfaces. To this end, we propose SurroundSDF to implicitly predict the signed distance field (SDF) and semantic field for continuous perception from surround images. Specifically, we introduce a query-based approach and utilize an SDF constrained by the Eikonal formulation to accurately describe the surfaces of obstacles. Furthermore, considering the absence of precise SDF ground truth, we propose a novel weakly supervised paradigm for the SDF, referred to as the Sandwich Eikonal formulation, which emphasizes applying correct and dense constraints on both sides of the surface, thereby enhancing the perceptual accuracy of the surface. Experiments suggest that our method achieves SOTA performance for both occupancy prediction and 3D scene reconstruction tasks on the nuScenes dataset.
- Published
- 2024
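The Eikonal constraint invoked above is the defining property of a signed distance field, usually imposed as a soft penalty on the predicted field $f$ (the standard formulation; the paper's Sandwich Eikonal variant additionally applies dense constraints on both sides of the surface):
    $$\lVert \nabla f(\mathbf{x}) \rVert = 1 \quad\Longrightarrow\quad \mathcal{L}_{\mathrm{eik}} = \mathbb{E}_{\mathbf{x}} \big[ (\lVert \nabla f(\mathbf{x}) \rVert - 1)^{2} \big],$$
so that $|f(\mathbf{x})|$ measures the distance to the nearest obstacle surface and the sign of $f$ distinguishes free space from occupied space.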