Author: "Li, Yunsong" / Publication Year Range: Last 10 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Li, Yunsong"' showing total 1,146 results

Start Over Author "Li, Yunsong" Publication Year Range Last 10 years

1,146 results on '"Li, Yunsong"'

1. SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection

Author: Dong, Shuhan, Li, Yunsong, Xie, Weiying, Zhang, Jiaqing, Tian, Jiayuan, Yang, Danian, and Lei, Jie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors. By learning long-term dependencies, Transformer can effectively integrate multimodal features in the feature extraction stage, which greatly improves the performance of multimodal object detection. However, current methods merely stack Transformer-guided fusion techniques without exploring their capability to extract features at various depth layers of network, thus limiting the improvements in detection performance. In this paper, we introduce an accurate and efficient object detection method named SeaDATE. Initially, we propose a novel dual attention Feature Fusion (DTF) module that, under Transformer's guidance, integrates local and global information through a dual attention mechanism, strengthening the fusion of modal features from orthogonal perspectives using spatial and channel tokens. Meanwhile, our theoretical analysis and empirical validation demonstrate that the Transformer-guided fusion method, treating images as sequences of pixels for fusion, performs better on shallow features' detail information compared to deep semantic information. To address this, we designed a contrastive learning (CL) module aimed at learning features of multimodal samples, remedying the shortcomings of Transformer-guided fusion in extracting deep semantic features, and effectively utilizing cross-modal information. Extensive experiments and ablation studies on the FLIR, LLVIP, and M3FD datasets have proven our method to be effective, achieving state-of-the-art detection performance.
Published: 2024

2. AMPO: Automatic Multi-Branched Prompt Optimization

Author: Yang, Sheng, Wu, Yurong, Gao, Yan, Zhou, Zineng, Zhu, Bin Benjamin, Sun, Xiaodi, Lou, Jian-Guang, Ding, Zhiming, Hu, Anbang, Fang, Yuan, Li, Yunsong, Chen, Junyan, and Yang, Linjun
Subjects: Computer Science - Computation and Language
Abstract: Prompt engineering is very important to enhance the performance of large language models (LLMs). When dealing with complex issues, prompt engineers tend to distill multiple patterns from examples and inject relevant solutions to optimize the prompts, achieving satisfying results. However, existing automatic prompt optimization techniques are only limited to producing single flow instructions, struggling with handling diverse patterns. In this paper, we present AMPO, an automatic prompt optimization method that can iteratively develop a multi-branched prompt using failure cases as feedback. Our goal is to explore a novel way of structuring prompts with multi-branches to better handle multiple patterns in complex tasks, for which we introduce three modules: Pattern Recognition, Branch Adjustment, and Branch Pruning. In experiments across five tasks, AMPO consistently achieves the best results. Additionally, our approach demonstrates significant optimization efficiency due to our adoption of a minimal search strategy., Comment: 13 pages, 7 figures, 6 tables
Published: 2024

3. Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection

Author: Zhang, Mingjin, Zhang, Chi, Zhang, Qiming, Li, Yunsong, Gao, Xinbo, and Zhang, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in deep learning have greatly advanced the field of infrared small object detection (IRSTD). Despite their remarkable success, a notable gap persists between these IRSTD methods and generic segmentation approaches in natural image domains. This gap primarily arises from the significant modality differences and the limited availability of infrared data. In this study, we aim to bridge this divergence by investigating the adaptation of generic segmentation models, such as the Segment Anything Model (SAM), to IRSTD tasks. Our investigation reveals that many generic segmentation models can achieve comparable performance to state-of-the-art IRSTD methods. However, their full potential in IRSTD remains untapped. To address this, we propose a simple, lightweight, yet effective baseline model for segmenting small infrared objects. Through appropriate distillation strategies, we empower smaller student models to outperform state-of-the-art methods, even surpassing fine-tuned teacher results. Furthermore, we enhance the model's performance by introducing a novel query design comprising dense and sparse queries to effectively encode multi-scale features. Through extensive experimentation across four popular IRSTD datasets, our model demonstrates significantly improved performance in both accuracy and throughput compared to existing approaches, surpassing SAM and Semantic-SAM by over 14 IoU on NUDT and 4 IoU on IRSTD1k. The source code and models will be released at https://github.com/O937-blip/SimIR., Comment: ACM MM'24
Published: 2024
Full Text: View/download PDF

4. FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation

Author: Li, Daixun, Xie, Weiying, Cao, Mingxiang, Wang, Yunke, Zhang, Jiaqing, Li, Yunsong, Fang, Leyuan, and Xu, Chang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal image fusion and segmentation enhance scene understanding in autonomous driving by integrating data from various sensors. However, current models struggle to efficiently segment densely packed elements in such scenes, due to the absence of comprehensive fusion features that can guide mid-process fine-tuning and focus attention on relevant areas. The Segment Anything Model (SAM) has emerged as a transformative segmentation method. It provides more effective prompts through its flexible prompt encoder, compared to transformers lacking fine-tuned control. Nevertheless, SAM has not been extensively studied in the domain of multimodal fusion for natural images. In this paper, we introduce SAM into multimodal image segmentation for the first time, proposing a novel framework that combines Latent Space Token Generation (LSTG) and Fusion Mask Prompting (FMP) modules to enhance SAM's multimodal fusion and segmentation capabilities. Specifically, we first obtain latent space features of the two modalities through vector quantization and embed them into a cross-attention-based inter-domain fusion module to establish long-range dependencies between modalities. Then, we use these comprehensive fusion features as prompts to guide precise pixel-level segmentation. Extensive experiments on several public datasets demonstrate that the proposed method significantly outperforms SAM and SAM2 in multimodal autonomous driving scenarios, achieving at least 3.9$\%$ higher segmentation mIoU than the state-of-the-art approaches.
Published: 2024

5. FedFQ: Federated Learning with Fine-Grained Quantization

Author: Li, Haowei, Xie, Weiying, Ye, Hangyu, Ma, Jitao, Ma, Shuran, and Li, Yunsong
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Federated learning (FL) is a decentralized approach, enabling multiple participants to collaboratively train a model while ensuring the protection of data privacy. The transmission of updates from numerous edge clusters to the server creates a significant communication bottleneck in FL. Quantization is an effective compression technology, showcasing immense potential in addressing this bottleneck problem. The Non-IID nature of FL renders it sensitive to quantization. Existing quantized FL frameworks inadequately balance high compression ratios and superior convergence performance by roughly employing a uniform quantization bit-width on the client-side. In this work, we propose a communication-efficient FL algorithm with a fine-grained adaptive quantization strategy (FedFQ). FedFQ addresses the trade-off between achieving high communication compression ratios and maintaining superior convergence performance by introducing parameter-level quantization. Specifically, we have designed a Constraint-Guided Simulated Annealing algorithm to determine specific quantization schemes. We derive the convergence of FedFQ, demonstrating its superior convergence performance compared to existing quantized FL algorithms. We conducted extensive experiments on multiple benchmarks and demonstrated that, while maintaining lossless performance, FedFQ achieves a compression ratio of 27 times to 63 times compared to the baseline experiment.
Published: 2024

6. Reducing Spurious Correlation for Federated Domain Generalization

Author: Ma, Shuran, Xie, Weiying, Li, Daixun, Li, Haowei, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The rapid development of multimedia has provided a large amount of data with different distributions for visual tasks, forming different domains. Federated Learning (FL) can efficiently use this diverse data distributed on different client media in a decentralized manner through model sharing. However, in open-world scenarios, there is a challenge: global models may struggle to predict well on entirely new domain data captured by certain media, which were not encountered during training. Existing methods still rely on strong statistical correlations between samples and labels to address this issue, which can be misleading, as some features may establish spurious short-cut correlations with the predictions. To comprehensively address this challenge, we introduce FedCD (Cross-Domain Invariant Federated Learning), an overall optimization framework at both the local and global levels. We introduce the Spurious Correlation Intervener (SCI), which employs invariance theory to locally generate interventers for features in a self-supervised manner to reduce the model's susceptibility to spurious correlated features. Our approach requires no sharing of data or features, only the gradients related to the model. Additionally, we develop the simple yet effective Risk Extrapolation Aggregation strategy (REA), determining aggregation coefficients through mathematical optimization to facilitate global causal invariant predictions. Extensive experiments and ablation studies highlight the effectiveness of our approach. In both classification and object detection generalization tasks, our method outperforms the baselines by an average of at least 1.45% in Acc, 4.8% and 1.27% in mAP50., Comment: 10 pages, 4 figures
Published: 2024

7. FoRA: Low-Rank Adaptation Model beyond Multimodal Siamese Network

Author: Xie, Weiying, Zhang, Yusi, Hui, Tianlin, Zhang, Jiaqing, Lei, Jie, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Multimodal object detection offers a promising prospect to facilitate robust detection in various visual conditions. However, existing two-stream backbone networks are challenged by complex fusion and substantial parameter increments. This is primarily due to large data distribution biases of multimodal homogeneous information. In this paper, we propose a novel multimodal object detector, named Low-rank Modal Adaptors (LMA) with a shared backbone. The shared parameters enhance the consistency of homogeneous information, while lightweight modal adaptors focus on modality unique features. Furthermore, we design an adaptive rank allocation strategy to adapt to the varying heterogeneity at different feature levels. When applied to two multimodal object detection datasets, experiments validate the effectiveness of our method. Notably, on DroneVehicle, LMA attains a 10.4% accuracy improvement over the state-of-the-art method with a 149M-parameters reduction. The code is available at https://github.com/zyszxhy/FoRA. Our work was submitted to ACM MM in April 2024, but was rejected. We will continue to refine our work and paper writing next, mainly including proof of theory and multi-task applications of FoRA.
Published: 2024

8. High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

Author: Song, Juan, He, Jiaxiang, Feng, Mingtao, Wang, Keyan, Li, Yunsong, and Mian, Ajmal
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality and low distortion remains challenging in image compression applications. To address this issue, we propose an efficient Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high frequency compression via the wavelet transform, since high frequency components are crucial for reconstructing image details. We introduce a wavelet conditional diffusion model for high frequency prediction, followed by a residual codec that compresses and transmits prediction residuals to the decoder. This diffusion prediction-then-residual compression paradigm effectively addresses the low fidelity issue common in direct reconstructions by existing diffusion models. Considering the uncertainty from the random sampling of the diffusion model, we further design an uncertainty-weighted rate-distortion (R-D) loss tailored for residual compression, providing a more rational trade-off between rate and distortion. Comprehensive experiments on two benchmark datasets validate the effectiveness of UGDiff, surpassing state-of-the-art image compression methods in R-D performance, perceptual quality, subjective quality, and inference time. Our code is available at: https://github.com/hejiaxiang1/Wavelet-Diffusion/tree/main
Published: 2024

9. IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

Author: Zhang, Mingjin, Wang, Yuchun, Guo, Jie, Li, Yunsong, Gao, Xinbo, and Zhang, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The recent Segment Anything Model (SAM) is a significant advancement in natural image segmentation, exhibiting potent zero-shot performance suitable for various downstream image segmentation tasks. However, directly utilizing the pretrained SAM for Infrared Small Target Detection (IRSTD) task falls short in achieving satisfying performance due to a notable domain gap between natural and infrared images. Unlike a visible light camera, a thermal imager reveals an object's temperature distribution by capturing infrared radiation. Small targets often show a subtle temperature transition at the object's boundaries. To address this issue, we propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representation of infrared small objects. Specifically, we design a Perona-Malik diffusion (PMD)-based block and incorporate it into multiple levels of SAM's encoder to help it capture essential structural features while suppressing noise. Additionally, we devise a Granularity-Aware Decoder (GAD) to fuse the multi-granularity feature from the encoder to capture structural information that may be lost in long-distance modeling. Extensive experiments on the public datasets, including NUAA-SIRST, NUDT-SIRST, and IRSTD-1K, validate the design choice of IRSAM and its significant superiority over representative state-of-the-art methods. The source code are available at: github.com/IPIC-Lab/IRSAM., Comment: 18 pages, 8 figures, to be published in ECCV2024
Published: 2024

10. Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer

Author: Xu, Kepeng, Xu, Li, He, Gang, Yu, Wenxin, and Li, Yunsong
Subjects: Computer Science - Multimedia, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Multiple complex degradations are coupled in low-quality video faces in the real world. Therefore, blind video face restoration is a highly challenging ill-posed problem, requiring not only hallucinating high-fidelity details but also enhancing temporal coherence across diverse pose variations. Restoring each frame independently in a naive manner inevitably introduces temporal incoherence and artifacts from pose changes and keypoint localization errors. To address this, we propose the first blind video face restoration approach with a novel parsing-guided temporal-coherent transformer (PGTFormer) without pre-alignment. PGTFormer leverages semantic parsing guidance to select optimal face priors for generating temporally coherent artifact-free results. Specifically, we pre-train a temporal-spatial vector quantized auto-encoder on high-quality video face datasets to extract expressive context-rich priors. Then, the temporal parse-guided codebook predictor (TPCP) restores faces in different poses based on face parsing context cues without performing face pre-alignment. This strategy reduces artifacts and mitigates jitter caused by cumulative errors from face pre-alignment. Finally, the temporal fidelity regulator (TFR) enhances fidelity through temporal feature interaction and improves video temporal consistency. Extensive experiments on face videos show that our method outperforms previous face restoration baselines. The code will be released on \href{https://github.com/kepengxu/PGTFormer}{https://github.com/kepengxu/PGTFormer}., Comment: 9 pages
Published: 2024

11. Hyperspectral Anomaly Detection with Self-Supervised Anomaly Prior

Author: Liu, Yidan, Xie, Weiying, Jiang, Kai, Zhang, Jiaqing, Li, Yunsong, and Fang, Leyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: The majority of existing hyperspectral anomaly detection (HAD) methods use the low-rank representation (LRR) model to separate the background and anomaly components, where the anomaly component is optimized by handcrafted sparse priors (e.g., $\ell_{2,1}$-norm). However, this may not be ideal since they overlook the spatial structure present in anomalies and make the detection result largely dependent on manually set sparsity. To tackle these problems, we redefine the optimization criterion for the anomaly component in the LRR model with a self-supervised network called self-supervised anomaly prior (SAP). This prior is obtained by the pretext task of self-supervised learning, which is customized to learn the characteristics of hyperspectral anomalies. Specifically, this pretext task is a classification task to distinguish the original hyperspectral image (HSI) and the pseudo-anomaly HSI, where the pseudo-anomaly is generated from the original HSI and designed as a prism with arbitrary polygon bases and arbitrary spectral bands. In addition, a dual-purified strategy is proposed to provide a more refined background representation with an enriched background dictionary, facilitating the separation of anomalies from complex backgrounds. Extensive experiments on various hyperspectral datasets demonstrate that the proposed SAP offers a more accurate and interpretable solution than other advanced HAD methods.
Published: 2024

12. E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection

Author: Zhang, Jiaqing, Cao, Mingxiang, Yang, Xue, Xie, Weiying, Lei, Jie, Li, Daixun, Huang, Wenbo, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal image fusion and object detection are crucial for autonomous driving. While current methods have advanced the fusion of texture details and semantic information, their complex training processes hinder broader applications. Addressing this challenge, we introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection. E2E-MFD streamlines the process, achieving high performance with a single training phase. It employs synchronous joint optimization across components to avoid suboptimal solutions tied to individual tasks. Furthermore, it implements a comprehensive optimization strategy in the gradient matrix for shared parameters, ensuring convergence to an optimal fusion detection configuration. Our extensive testing on multiple public datasets reveals E2E-MFD's superior capabilities, showcasing not only visually appealing image fusion but also impressive detection outcomes, such as a 3.9% and 2.0% mAP50 increase on horizontal object detection dataset M3FD and oriented object detection dataset DroneVehicle, respectively, compared to state-of-the-art approaches. The code is released at https://github.com/icey-zhang/E2E-MFD.
Published: 2024

13. Domain Adaptation for Large-Vocabulary Object Detectors

Author: Jiang, Kai, Huang, Jiaxing, Xie, Weiying, Lei, Jie, Li, Yunsong, Shao, Ling, and Lu, Shijian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large-vocabulary object detectors (LVDs) aim to detect objects of many categories, which learn super objectness features and can locate objects accurately while applied to various downstream data. However, LVDs often struggle in recognizing the located objects due to domain discrepancy in data distribution and object vocabulary. At the other end, recent vision-language foundation models such as CLIP demonstrate superior open-vocabulary recognition capability. This paper presents KGD, a Knowledge Graph Distillation technique that exploits the implicit knowledge graphs (KG) in CLIP for effectively adapting LVDs to various downstream domains. KGD consists of two consecutive stages: 1) KG extraction that employs CLIP to encode downstream domain data as nodes and their feature distances as edges, constructing KG that inherits the rich semantic relations in CLIP explicitly; and 2) KG encapsulation that transfers the extracted KG into LVDs to enable accurate cross-domain object classification. In addition, KGD can extract both visual and textual KG independently, providing complementary vision and language knowledge for object localization and object classification in detection tasks over various downstream domains. Experiments over multiple widely adopted detection benchmarks show that KGD outperforms the state-of-the-art consistently by large margins.
Published: 2024

14. DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception

Author: Jiang, Kai, Huang, Jiaxing, Xie, Weiying, Li, Yunsong, Shao, Ling, and Lu, Shijian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space. However, most existing studies were conducted under a supervised setup which cannot scale well while handling various new data. Unsupervised domain adaptive BEV, which effective learning from various unlabelled target data, is far under-explored. In this work, we design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features. DA-BEV introduces the idea of query into the domain adaptation framework to derive useful information from image-view and BEV features. It consists of two query-based designs, namely, query-based adversarial learning (QAL) and query-based self-training (QST), which exploits image-view features or BEV features to regularize the adaptation of the other. Extensive experiments show that DA-BEV achieves superior domain adaptive BEV perception performance consistently across multiple datasets and tasks such as 3D object detection and 3D scene segmentation.
Published: 2024
Full Text: View/download PDF

15. SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image

Author: Tian, Jiayuan, Lei, Jie, Zhang, Jiaqing, Xie, Weiying, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: With recent advancements in aerospace technology, the volume of unlabeled remote sensing image (RSI) data has increased dramatically. Effectively leveraging this data through self-supervised learning (SSL) is vital in the field of remote sensing. However, current methodologies, particularly contrastive learning (CL), a leading SSL method, encounter specific challenges in this domain. Firstly, CL often mistakenly identifies geographically adjacent samples with similar semantic content as negative pairs, leading to confusion during model training. Secondly, as an instance-level discriminative task, it tends to neglect the essential fine-grained features and complex details inherent in unstructured RSIs. To overcome these obstacles, we introduce SwiMDiff, a novel self-supervised pre-training framework designed for RSIs. SwiMDiff employs a scene-wide matching approach that effectively recalibrates labels to recognize data from the same scene as false negatives. This adjustment makes CL more applicable to the nuances of remote sensing. Additionally, SwiMDiff seamlessly integrates CL with a diffusion model. Through the implementation of pixel-level diffusion constraints, we enhance the encoder's ability to capture both the global semantic information and the fine-grained features of the images more comprehensively. Our proposed framework significantly enriches the information available for downstream tasks in remote sensing. Demonstrating exceptional performance in change detection and land-cover classification tasks, SwiMDiff proves its substantial utility and value in the field of remote sensing.
Published: 2024

16. Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image

Author: Zhang, Jiaqing, Lei, Jie, Xie, Weiying, Jiang, Kai, Cao, Mingxiang, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Accurate cloud recognition and warning are crucial for various applications, including in-flight support, weather forecasting, and climate research. However, recent deep learning algorithms have predominantly focused on detecting cloud regions in satellite imagery, with insufficient attention to the specificity required for accurate cloud recognition. This limitation inspired us to develop the novel FY-4A-Himawari-8 (FYH) dataset, which includes nine distinct cloud categories and uses precise domain adaptation methods to align 70,419 image-label pairs in terms of projection, temporal resolution, and spatial resolution, thereby facilitating the training of supervised deep learning networks. Given the complexity and diversity of cloud formations, we have thoroughly analyzed the challenges inherent to cloud recognition tasks, examining the intricate characteristics and distribution of the data. To effectively address these challenges, we designed a Distribution-aware Interactive-Attention Network (DIAnet), which preserves pixel-level details through a high-resolution branch and a parallel multi-resolution cross-branch. We also integrated a distribution-aware loss (DAL) to mitigate the imbalance across cloud categories. An Interactive Attention Module (IAM) further enhances the robustness of feature extraction combined with spatial and channel information. Empirical evaluations on the FYH dataset demonstrate that our method outperforms other cloud recognition networks, achieving superior performance in terms of mean Intersection over Union (mIoU). The code for implementing DIAnet is available at https://github.com/icey-zhang/DIAnet.
Published: 2024

17. Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification

Author: Zhang, Jiaqing, Lei, Jie, Xie, Weiying, Yang, Geng, Li, Daixun, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In multimodal land cover classification (MLCC), a common challenge is the redundancy in data distribution, where irrelevant information from multiple modalities can hinder the effective integration of their unique features. To tackle this, we introduce the Multimodal Informative Vit (MIVit), a system with an innovative information aggregate-distributing mechanism. This approach redefines redundancy levels and integrates performance-aware elements into the fused representation, facilitating the learning of semantics in both forward and backward directions. MIVit stands out by significantly reducing redundancy in the empirical distribution of each modality's separate and fused features. It employs oriented attention fusion (OAF) for extracting shallow local features across modalities in horizontal and vertical dimensions, and a Transformer feature extractor for extracting deep global features through long-range attention. We also propose an information aggregation constraint (IAC) based on mutual information, designed to remove redundant information and preserve complementary information within embedded features. Additionally, the information distribution flow (IDF) in MIVit enhances performance-awareness by distributing global classification information across different modalities' feature maps. This architecture also addresses missing modality challenges with lightweight independent modality classifiers, reducing the computational load typically associated with Transformers. Our results show that MIVit's bidirectional aggregate-distributing mechanism between modalities is highly effective, achieving an average overall accuracy of 95.56% across three multimodal datasets. This performance surpasses current state-of-the-art methods in MLCC. The code for MIVit is accessible at https://github.com/icey-zhang/MIViT.
Published: 2024

18. Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector

Author: Ma, Jitao, Xie, Weiying, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Hyperspectral anomaly detection (HAD) aims to localize pixel points whose spectral features differ from the background. HAD is essential in scenarios of unknown or camouflaged target features, such as water quality monitoring, crop growth monitoring and camouflaged target detection, where prior information of targets is difficult to obtain. Existing HAD methods aim to objectively detect and distinguish background and anomalous spectra, which can be achieved almost effortlessly by human perception. However, the underlying processes of human visual perception are thought to be quite complex. In this paper, we analyze hyperspectral image (HSI) features under human visual perception, and transfer the solution process of HAD to the more robust feature space for the first time. Specifically, we propose a small target aware detector (STAD), which introduces saliency maps to capture HSI features closer to human visual perception. STAD not only extracts more anomalous representations, but also reduces the impact of low-confidence regions through a proposed small target filter (STF). Furthermore, considering the possibility of HAD algorithms being applied to edge devices, we propose a full connected network to convolutional network knowledge distillation strategy. It can learn the spectral and spatial features of the HSI while lightening the network. We train the network on the HAD100 training set and validate the proposed method on the HAD100 test set. Our method provides a new solution space for HAD that is closer to human visual perception with high confidence. Sufficient experiments on real HSI with multiple method comparisons demonstrate the excellent performance and unique potential of the proposed method. The code is available at https://github.com/majitao-xd/STAD-HAD.
Published: 2024

19. RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation

Author: Xie, Weiying, Wang, Zixuan, Ma, Jitao, Li, Daixun, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Distributed deep learning has recently been attracting more attention in remote sensing (RS) applications due to the challenges posed by the increased amount of open data that are produced daily by Earth observation programs. However, the high communication costs of sending model updates among multiple nodes are a significant bottleneck for scalable distributed learning. Gradient sparsification has been validated as an effective gradient compression (GC) technique for reducing communication costs and thus accelerating the training speed. Existing state-of-the-art gradient sparsification methods are mostly based on the "larger-absolute-more-important" criterion, ignoring the importance of small gradients, which is generally observed to affect the performance. Inspired by informative representation of manifold structures from neighborhood information, we propose a simple yet effective dynamic gradient compression scheme leveraging neighborhood statistics indicator for RS image interpretation, termed RS-DGC. We first enhance the interdependence between gradients by introducing the gradient neighborhood to reduce the effect of random noise. The key component of RS-DGC is a Neighborhood Statistical Indicator (NSI), which can quantify the importance of gradients within a specified neighborhood on each node to sparsify the local gradients before gradient transmission in each iteration. Further, a layer-wise dynamic compression scheme is proposed to track the importance changes of each layer in real time. Extensive downstream tasks validate the superiority of our method in terms of intelligent interpretation of RS images. For example, we achieve an accuracy improvement of 0.51% with more than 50 times communication compression on the NWPU-RESISC45 dataset using VGG-19 network.
Published: 2023

20. Multi-scale direction-aware SAR object detection network via global information fusion

Author: Cao, Mingxiang, Xie, Weiying, Lei, Jie, Zhang, Jiaqing, Li, Daixun, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning has driven significant progress in object detection using Synthetic Aperture Radar (SAR) imagery. Existing methods, while achieving promising results, often struggle to effectively integrate local and global information, particularly direction-aware features. This paper proposes SAR-Net, a novel framework specifically designed for global fusion of direction-aware information in SAR object detection. SAR-Net leverages two key innovations: the Unity Compensation Mechanism (UCM) and the Direction-aware Attention Module (DAM). UCM facilitates the establishment of complementary relationships among features across different scales, enabling efficient global information fusion and transmission. Additionally, DAM, through bidirectional attention polymerization, captures direction-aware information, effectively eliminating background interference. Extensive experiments demonstrate the effectiveness of SAR-Net, achieving state-of-the-art results on aircraft (SAR-AIRcraft-1.0) and ship datasets (SSDD, HRSID), confirming its generalization capability and robustness.
Published: 2023

21. Physics Inspired Criterion for Pruning-Quantization Joint Learning

Author: Xie, Weiying, Fan, Xiaoyi, Zhang, Xin, Li, Yunsong, Lei, Jie, and Fang, Leyuan
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Pruning-quantization joint learning always facilitates the deployment of deep neural networks (DNNs) on resource-constrained edge devices. However, most existing methods do not jointly learn a global criterion for pruning and quantization in an interpretable way. In this paper, we propose a novel physics inspired criterion for pruning-quantization joint learning (PIC-PQ), which is explored from an analogy we first draw between elasticity dynamics (ED) and model compression (MC). Specifically, derived from Hooke's law in ED, we establish a linear relationship between the filters' importance distribution and the filter property (FP) by a learnable deformation scale in the physics inspired criterion (PIC). Furthermore, we extend PIC with a relative shift variable for a global view. To ensure feasibility and flexibility, available maximum bitwidth and penalty factor are introduced in quantization bitwidth assignment. Experiments on benchmarks of image classification demonstrate that PIC-PQ yields a good trade-off between accuracy and bit-operations (BOPs) compression ratio e.g., 54.96X BOPs compression ratio in ResNet56 on CIFAR10 with 0.10% accuracy drop and 53.24X in ResNet18 on ImageNet with 0.61% accuracy drop). The code will be available at https://github.com/fanxxxxyi/PIC-PQ.
Published: 2023

22. Traffic flow prediction based on graph convolutional networks with a parallel attention network and stacked gate recurrent units

Author: Xia, Dawen, Ao, Yuce, Wei, Xiaoduo, Li, Yunsong, Chen, Yan, Hu, Yang, Li, Yantao, and Li, Huaqing
Published: 2024
Full Text: View/download PDF

23. Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Author: Zhang, Xin, Du, Jiawei, Li, Yunsong, Xie, Weiying, and Zhou, Joey Tianyi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting in poor generalization across various pruning and cross-architecture scenarios. Recent studies have addressed this issue by expanding the scope of training dynamics considered, including factors such as forgetting event and probability change, typically using an averaging approach. However, these works struggle to integrate a broader range of training dynamics without overlooking well-generalized samples, which may not be sufficiently highlighted in an averaging manner. In this study, we propose a novel dataset pruning method termed as Temporal Dual-Depth Scoring (TDDS), to tackle this problem. TDDS utilizes a dual-depth strategy to achieve a balance between incorporating extensive training dynamics and identifying representative samples for dataset pruning. In the first depth, we estimate the series of each sample's individual contributions spanning the training progress, ensuring comprehensive integration of training dynamics. In the second depth, we focus on the variability of the sample-wise contributions identified in the first depth to highlight well-generalized samples. Extensive experiments conducted on CIFAR and ImageNet datasets verify the superiority of TDDS over previous SOTA methods. Specifically on CIFAR-100, our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%., Comment: Accepted by CVPR2024
Published: 2023

24. FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients

Author: Li, DaiXun, Xie, Weiying, Wang, ZiXuan, Lu, YiBing, Li, Yunsong, and Fang, Leyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: With the rapid development of imaging sensor technology in the field of remote sensing, multi-modal remote sensing data fusion has emerged as a crucial research direction for land cover classification tasks. While diffusion models have made great progress in generative models and image classification tasks, existing models primarily focus on single-modality and single-client control, that is, the diffusion process is driven by a single modal in a single computing node. To facilitate the secure fusion of heterogeneous data from clients, it is necessary to enable distributed multi-modal control, such as merging the hyperspectral data of organization A and the LiDAR data of organization B privately on each base station client. In this study, we propose a multi-modal collaborative diffusion federated learning framework called FedDiff. Our framework establishes a dual-branch diffusion model feature extraction setup, where the two modal data are inputted into separate branches of the encoder. Our key insight is that diffusion models driven by different modalities are inherently complementary in terms of potential denoising steps on which bilateral connections can be built. Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure, and introduce a lightweight communication module. Qualitative and quantitative experiments validate the superiority of our framework in terms of image quality and conditional consistency.
Published: 2023

25. FedFusion: Manifold Driven Federated Learning for Multi-satellite and Multi-modality Fusion

Author: Li, DaiXun, Xie, Weiying, Li, Yunsong, and Fang, Leyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multi-satellite, multi-modality in-orbit fusion is a challenging task as it explores the fusion representation of complex high-dimensional data under limited computational resources. Deep neural networks can reveal the underlying distribution of multi-modal remote sensing data, but the in-orbit fusion of multimodal data is more difficult because of the limitations of different sensor imaging characteristics, especially when the multimodal data follows non-independent identically distribution (Non-IID) distributions. To address this problem while maintaining classification performance, this paper proposes a manifold-driven multi-modality fusion framework, FedFusion, which randomly samples local data on each client to jointly estimate the prominent manifold structure of shallow features of each client and explicitly compresses the feature matrices into a low-rank subspace through cascading and additive approaches, which is used as the feature input of the subsequent classifier. Considering the physical space limitations of the satellite constellation, we developed a multimodal federated learning module designed specifically for manifold data in a deep latent space. This module achieves iterative updating of the sub-network parameters of each client through global weighted averaging, constructing a framework that can represent compact representations of each client. The proposed framework surpasses existing methods in terms of performance on three multimodal datasets, achieving a classification average accuracy of 94.35$\%$ while compressing communication costs by a factor of 4. Furthermore, extensive numerical evaluations of real-world satellite images were conducted on the orbiting edge computing architecture based on Jetson TX2 industrial modules, which demonstrated that FedFusion significantly reduced training time by 48.4 minutes (15.18%) while optimizing accuracy.}
Published: 2023
Full Text: View/download PDF

26. MDFL: Multi-domain Diffusion-driven Feature Learning

Author: Li, Daixun, Xie, Weiying, Zhang, Jiaqing, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: High-dimensional images, known for their rich semantic information, are widely applied in remote sensing and other fields. The spatial information in these images reflects the object's texture features, while the spectral information reveals the potential spectral representations across different bands. Currently, the understanding of high-dimensional images remains limited to a single-domain perspective with performance degradation. Motivated by the masking texture effect observed in the human visual system, we present a multi-domain diffusion-driven feature learning network (MDFL) , a scheme to redefine the effective information domain that the model really focuses on. This method employs diffusion-based posterior sampling to explicitly consider joint information interactions between the high-dimensional manifold structures in the spectral, spatial, and frequency domains, thereby eliminating the influence of masking texture effects in visual models. Additionally, we introduce a feature reuse mechanism to gather deep and raw features of high-dimensional data. We demonstrate that MDFL significantly improves the feature extraction performance of high-dimensional data, thereby providing a powerful aid for revealing the intrinsic patterns and structures of such data. The experimental results on three multi-modal remote sensing datasets show that MDFL reaches an average overall accuracy of 98.25%, outperforming various state-of-the-art baseline schemes. The code will be released, contributing to the computer vision community.
Published: 2023

27. BSDM: Background Suppression Diffusion Model for Hyperspectral Anomaly Detection

Author: Ma, Jitao, Xie, Weiying, Li, Yunsong, and Fang, Leyuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Hyperspectral anomaly detection (HAD) is widely used in Earth observation and deep space exploration. A major challenge for HAD is the complex background of the input hyperspectral images (HSIs), resulting in anomalies confused in the background. On the other hand, the lack of labeled samples for HSIs leads to poor generalization of existing HAD methods. This paper starts the first attempt to study a new and generalizable background learning problem without labeled samples. We present a novel solution BSDM (background suppression diffusion model) for HAD, which can simultaneously learn latent background distributions and generalize to different datasets for suppressing complex background. It is featured in three aspects: (1) For the complex background of HSIs, we design pseudo background noise and learn the potential background distribution in it with a diffusion model (DM). (2) For the generalizability problem, we apply a statistical offset module so that the BSDM adapts to datasets of different domains without labeling samples. (3) For achieving background suppression, we innovatively improve the inference process of DM by feeding the original HSIs into the denoising network, which removes the background as noise. Our work paves a new background suppression way for HAD that can improve HAD performance without the prerequisite of manually labeled data. Assessments and generalization experiments of four HAD methods on several real HSI datasets demonstrate the above three unique properties of the proposed method. The code is available at https://github.com/majitao-xd/BSDM-HAD.
Published: 2023

28. NaMemo2: Facilitating Teacher-Student Interaction with Theory-Based Design and Student Autonomy Consideration

Author: Jiang, Guang, Zhu, Jiahui, Li, Yunsong, An, Pengcheng, and Wang, Yunlong
Subjects: Computer Science - Human-Computer Interaction
Abstract: Teacher-student interaction (TSI) is essential for learning efficiency and harmonious teacher-student interpersonal relationships. However, studies on TSI support tools often focus on teacher needs while neglecting student needs and autonomy. To enhance both lecturer competence in delivering interpersonal interaction and student autonomy in TSI, we developed NaMemo2, a novel augmented-reality system that allows students to express their willingness to TSI and displays student information to teachers during lectures. The design and evaluation process follows a new framework, STUDIER, which can facilitate the development of theory-based ethnics-aware TSI support tools in general. The quantitative results of our four-week field study with four classes in a university suggested that NaMemo2 can improve 1) TSI in the classroom from both teacher and student perspectives, 2) student attitudes and willingness to TSI, and 3) student attitudes to the deployment of NaMemo2. The qualitative feedback from students and teachers indicated that improving TSI may be responsible for improved attention in students and a better classroom atmosphere during lectures., Comment: This paper has been accepted in July 2023 for publication in Education and Information Technologies
Published: 2023

29. Dual Inverse Degradation Network for Real-World SDRTV-to-HDRTV Conversion

Author: Xu, Kepeng, Xu, Li, He, Gang, Wu, Xianyun, Zhang, Zhiqiang, Yu, Wenxin, and Li, Yunsong
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Multimedia
Abstract: In this study, we address the emerging necessity of converting Standard Dynamic Range Television (SDRTV) content into High Dynamic Range Television (HDRTV) in light of the limited number of native HDRTV content. A principal technical challenge in this conversion is the exacerbation of coding artifacts inherent in SDRTV, which detrimentally impacts the quality of the resulting HDRTV. To address this issue, our method introduces a novel approach that conceptualizes the SDRTV-to-HDRTV conversion as a composite task involving dual degradation restoration. This encompasses inverse tone mapping in conjunction with video restoration. We propose Dual Inversion Downgraded SDRTV to HDRTV Network (DIDNet), which can accurately perform inverse tone mapping while preventing encoding artifacts from being amplified, thereby significantly improving visual quality. DIDNet integrates an intermediate auxiliary loss function to effectively separate the dual degradation restoration tasks and efficient learning of both artifact reduction and inverse tone mapping during end-to-end training. Additionally, DIDNet introduces a spatio-temporal feature alignment module for video frame fusion, which augments texture quality and reduces artifacts. The architecture further includes a dual-modulation convolution mechanism for optimized inverse tone mapping. Recognizing the richer texture and high-frequency information in HDRTV compared to SDRTV, we further introduce a wavelet attention module to enhance frequency features. Our approach demonstrates marked superiority over existing state-of-the-art techniques in terms of quantitative performance and visual quality.
Published: 2023

30. An examination of the interplay of message framing and vaccine safety information sources on COVID-19 vaccination promotion

Author: Yang, Xiaodong, Xu, Yimeng, Guo, Yu, and Li, Yunsong
Published: 2024
Full Text: View/download PDF

31. NaMemo2: Facilitating Teacher-Student Interaction with Theory-Based Design and Student Autonomy Consideration

Author: Jiang, Guang, Zhu, Jiahui, Li, Yunsong, An, Pengcheng, and Wang, Yunlong
Published: 2024
Full Text: View/download PDF

32. SAR-to-Optical Image Translation via Thermodynamics-inspired Network

Author: Zhang, Mingjin, Xu, Jiamin, He, Chengyu, Shang, Wenteng, Li, Yunsong, and Gao, Xinbo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Synthetic aperture radar (SAR) is prevalent in the remote sensing field but is difficult to interpret in human visual perception. Recently, SAR-to-optical (S2O) image conversion methods have provided a prospective solution for interpretation. However, since there is a huge domain difference between optical and SAR images, they suffer from low image quality and geometric distortion in the produced optical images. Motivated by the analogy between pixels during the S2O image translation and molecules in a heat field, Thermodynamics-inspired Network for SAR-to-Optical Image Translation (S2O-TDN) is proposed in this paper. Specifically, we design a Third-order Finite Difference (TFD) residual structure in light of the TFD equation of thermodynamics, which allows us to efficiently extract inter-domain invariant features and facilitate the learning of the nonlinear translation mapping. In addition, we exploit the first law of thermodynamics (FLT) to devise an FLT-guided branch that promotes the state transition of the feature values from the unstable diffusion state to the stable one, aiming to regularize the feature diffusion and preserve image structures during S2O image translation. S2O-TDN follows an explicit design principle derived from thermodynamic theory and enjoys the advantage of explainability. Experiments on the public SEN1-2 dataset show the advantages of the proposed S2O-TDN over the current methods with more delicate textures and higher quantitative results.
Published: 2023

33. Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank

Author: Huang, Shirui, Wang, Keyan, Liu, Huan, Chen, Jun, and Li, Yunsong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite the remarkable achievement of recent underwater image restoration techniques, the lack of labeled data has become a major hurdle for further progress. In this work, we propose a mean-teacher based Semi-supervised Underwater Image Restoration (Semi-UIR) framework to incorporate the unlabeled data into network training. However, the naive mean-teacher method suffers from two main problems: (1) The consistency loss used in training might become ineffective when the teacher's prediction is wrong. (2) Using L1 distance may cause the network to overfit wrong labels, resulting in confirmation bias. To address the above problems, we first introduce a reliable bank to store the "best-ever" outputs as pseudo ground truth. To assess the quality of outputs, we conduct an empirical analysis based on the monotonicity property to select the most trustworthy NR-IQA method. Besides, in view of the confirmation bias problem, we incorporate contrastive regularization to prevent the overfitting on wrong labels. Experimental results on both full-reference and non-reference underwater benchmarks demonstrate that our algorithm has obvious improvement over SOTA methods quantitatively and qualitatively. Code has been released at https://github.com/Huang-ShiRui/Semi-UIR., Comment: CVPR2023
Published: 2023

34. Guided Hybrid Quantization for Object detection in Multimodal Remote Sensing Imagery via One-to-one Self-teaching

Author: Zhang, Jiaqing, Lei, Jie, Xie, Weiying, Li, Yunsong, and Jia, Xiuping
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Considering the computation complexity, we propose a Guided Hybrid Quantization with One-to-one Self-Teaching (GHOST}) framework. More concretely, we first design a structure called guided quantization self-distillation (GQSD), which is an innovative idea for realizing lightweight through the synergy of quantization and distillation. The training process of the quantization model is guided by its full-precision model, which is time-saving and cost-saving without preparing a huge pre-trained model in advance. Second, we put forward a hybrid quantization (HQ) module to obtain the optimal bit width automatically under a constrained condition where a threshold for distribution distance between the center and samples is applied in the weight value search space. Third, in order to improve information transformation, we propose a one-to-one self-teaching (OST) module to give the student network a ability of self-judgment. A switch control machine (SCM) builds a bridge between the student network and teacher network in the same location to help the teacher to reduce wrong guidance and impart vital knowledge to the student. This distillation method allows a model to learn from itself and gain substantial improvement without any additional supervision. Extensive experiments on a multimodal dataset (VEDAI) and single-modality datasets (DOTA, NWPU, and DIOR) show that object detection based on GHOST outperforms the existing detectors. The tiny parameters (<9.7 MB) and Bit-Operations (BOPs) (<2158 G) compared with any remote sensing-based, lightweight or distillation-based algorithms demonstrate the superiority in the lightweight design domain. Our code and model will be released at https://github.com/icey-zhang/GHOST., Comment: This article has been delivered to TRGS and is under review
Published: 2022
Full Text: View/download PDF

35. SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery

Author: Zhang, Jiaqing, Lei, Jie, Xie, Weiying, Fang, Zhenman, Li, Yunsong, and Du, Qian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Accurately and timely detecting multiscale small objects that contain tens of pixels from remote sensing images (RSI) remains challenging. Most of the existing solutions primarily design complex deep neural networks to learn strong feature representations for objects separated from the background, which often results in a heavy computation burden. In this article, we propose an accurate yet fast object detection method for RSI, named SuperYOLO, which fuses multimodal data and performs high-resolution (HR) object detection on multiscale objects by utilizing the assisted super resolution (SR) learning and considering both the detection accuracy and computation cost. First, we utilize a symmetric compact multimodal fusion (MF) to extract supplementary information from various data for improving small object detection in RSI. Furthermore, we design a simple and flexible SR branch to learn HR feature representations that can discriminate small objects from vast backgrounds with low-resolution (LR) input, thus further improving the detection accuracy. Moreover, to avoid introducing additional computation, the SR branch is discarded in the inference stage, and the computation of the network model is reduced due to the LR input. Experimental results show that, on the widely used VEDAI RS dataset, SuperYOLO achieves an accuracy of 75.09% (in terms of mAP50 ), which is more than 10% higher than the SOTA large models, such as YOLOv5l, YOLOv5x, and RS designed YOLOrs. Meanwhile, the parameter size and GFLOPs of SuperYOLO are about 18 times and 3.8 times less than YOLOv5x. Our proposed model shows a favorable accuracy and speed tradeoff compared to the state-of-the-art models. The code will be open-sourced at https://github.com/icey-zhang/SuperYOLO., Comment: The article is accepted by IEEE Transactions on Geoscience and Remote Sensing
Published: 2022
Full Text: View/download PDF

36. Dual-branch spectral–spatial feature extraction network for multispectral image compression

Author: Kong, Fanqiang, Tang, Jiahui, Li, Yunsong, Li, Dan, and Hu, Kedi
Published: 2023
Full Text: View/download PDF

37. MTLSC-Diff: Multitask learning with diffusion models for hyperspectral image super-resolution and classification

Author: Qu, Jiahui, Xiao, Liusheng, Dong, Wenqian, and Li, Yunsong
Published: 2024
Full Text: View/download PDF

38. HPRN: Holistic Prior-embedded Relation Network for Spectral Super-Resolution

Author: Wu, Chaoxiong, Li, Jiaojiao, Song, Rui, Li, Yunsong, and Du, Qian
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Spectral super-resolution (SSR) refers to the hyperspectral image (HSI) recovery from an RGB counterpart. Due to the one-to-many nature of the SSR problem, a single RGB image can be reprojected to many HSIs. The key to tackle this ill-posed problem is to plug into multi-source prior information such as the natural spatial context-prior of RGB images, deep feature-prior or inherent statistical-prior of HSIs, etc., so as to effectively alleviate the degree of ill-posedness. However, most current approaches only consider the general and limited priors in their customized convolutional neural networks (CNNs), which leads to the inability to guarantee the confidence and fidelity of reconstructed spectra. In this paper, we propose a novel holistic prior-embedded relation network (HPRN) to integrate comprehensive priors to regularize and optimize the solution space of SSR. Basically, the core framework is delicately assembled by several multi-residual relation blocks (MRBs) that fully facilitate the transmission and utilization of the low-frequency content prior of RGBs. Innovatively, the semantic prior of RGB inputs is introduced to mark category attributes, and a semantic-driven spatial relation module (SSRM) is invented to perform the feature aggregation of clustered similar range for refining recovered characteristics. Additionally, we develop a transformer-based channel relation module (TCRM), which breaks the habit of employing scalars as the descriptors of channel-wise relations in the previous deep feature-prior, and replaces them with certain vectors to make the mapping function more robust and smoother. In order to maintain the mathematical correlation and spectral consistency between hyperspectral bands, the second-order prior constraints (SOPC) are incorporated into the loss function to guide the HSI reconstruction.
Published: 2021

39. Transcoded Video Restoration by Temporal Spatial Auxiliary Network

Author: Xu, Li, He, Gang, Zhou, Jinjia, Lei, Jie, Xie, Weiying, Li, Yunsong, and Tai, Yu-Wing
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: In most video platforms, such as Youtube, and TikTok, the played videos usually have undergone multiple video encodings such as hardware encoding by recording devices, software encoding by video editing apps, and single/multiple video transcoding by video application servers. Previous works in compressed video restoration typically assume the compression artifacts are caused by one-time encoding. Thus, the derived solution usually does not work very well in practice. In this paper, we propose a new method, temporal spatial auxiliary network (TSAN), for transcoded video restoration. Our method considers the unique traits between video encoding and transcoding, and we consider the initial shallow encoded videos as the intermediate labels to assist the network to conduct self-supervised attention training. In addition, we employ adjacent multi-frame information and propose the temporal deformable alignment and pyramidal spatial fusion for transcoded video restoration. The experimental results demonstrate that the performance of the proposed method is superior to that of the previous techniques. The code is available at https://github.com/icecherylXuli/TSAN., Comment: Accepted by AAAI2022
Published: 2021

40. Experimental study of the effect of dedicated mechanical subcooling on the performance of a single multiparallel evaporator

Author: Sun, Huan, Li, Yunsong, Sun, Zhili, Wu, Dongxia, and Lei, Zhuoya
Published: 2024
Full Text: View/download PDF

41. Left Versus Right Destroyed Lung Pneumonectomy: Long Term Prognosis and Key Factors Associated With Poor Treatment Outcomes

Author: Li, YunSong, Wang, Heng, Wang, Chunmao, Zhang, Li, Gong, Changfan, Yan, Dongjie, Liu, Fangchao, and Ruan, Hongyun
Published: 2024
Full Text: View/download PDF

42. Transformer-based visual object tracking via fine–coarse concatenated attention and cross concatenated MLP

Author: Gao, Long, Chen, Langkun, Liu, Pan, Jiang, Yan, Li, Yunsong, and Ning, Jifeng
Published: 2024
Full Text: View/download PDF

43. AdaptiveWeighted Attention Network with Camera Spectral Sensitivity Prior for Spectral Reconstruction from RGB Images

Author: Li, Jiaojiao, Wu, Chaoxiong, Song, Rui, Li, Yunsong, and Liu, Fei
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent promising effort for spectral reconstruction (SR) focuses on learning a complicated mapping through using a deeper and wider convolutional neural networks (CNNs). Nevertheless, most CNN-based SR algorithms neglect to explore the camera spectral sensitivity (CSS) prior and interdependencies among intermediate features, thus limiting the representation ability of the network and performance of SR. To conquer these issues, we propose a novel adaptive weighted attention network (AWAN) for SR, whose backbone is stacked with multiple dual residual attention blocks (DRAB) decorating with long and short skip connections to form the dual residual learning. Concretely, we investigate an adaptive weighted channel attention (AWCA) module to reallocate channel-wise feature responses via integrating correlations between channels. Furthermore, a patch-level second-order non-local (PSNL) module is developed to capture long-range spatial contextual information by second-order non-local operations for more powerful feature representations. Based on the fact that the recovered RGB images can be projected by the reconstructed hyperspectral image (HSI) and the given CSS function, we incorporate the discrepancies of the RGB images and HSIs as a finer constraint for more accurate reconstruction. Experimental results demonstrate the effectiveness of our proposed AWAN network in terms of quantitative comparison and perceptual quality over other state-of-the-art SR methods. In the NTIRE 2020 Spectral Reconstruction Challenge, our entries obtain the 1st ranking on the Clean track and the 3rd place on the Real World track. Codes are available at https://github.com/Deep-imagelab/AWAN., Comment: The 1st ranking on the Clean track and the 3rd place only 1.59106e-4 more than the 1st on the Real World track of the NTIRE 2020 Spectral Reconstruction Challenge
Published: 2020

44. V-shaped neural network structure based on multi-scale features for image denoising

Author: Zhang, Jing, Sang, Liu, Zhang, Zhicheng, Shao, Minhao, and Li, Yunsong
Published: 2023
Full Text: View/download PDF

45. MPCNet: Compressed multi-view video restoration via motion-parallax complementation network

Author: Wu, Chang, He, Gang, Lai, Xinquan, and Li, Yunsong
Published: 2023
Full Text: View/download PDF

46. Mechanistic study in sulfur-carbon co-modification on Li2FeSiO4 for lithium-ion batteries

Author: Luo, Xiaoying, Li, Yunsong, and Cheng, Xuan
Published: 2023
Full Text: View/download PDF

47. Ultra-high voltage solid-state Li metal batteries enabled by in-situ construction of cathode electrolyte interphase through synergistic dual-anion decomposition

Author: Jiang, Haolong, Han, Yu, Li, Cong, Sun, Weiwei, Zheng, Jiayi, Zhu, Yuhao, Zheng, Chunman, Li, Yunsong, Lin, Yuxiao, Guo, Qingpeng, Wang, Hui, and Xie, Kai
Published: 2023
Full Text: View/download PDF

48. An ASM-CF model for anomalous trajectory detection with mobile trajectory big data

Author: Xia, Dawen, Jiang, Shunying, Li, Yunsong, Yang, Nan, Hu, Yang, Li, Yantao, and Li, Huaqing
Published: 2023
Full Text: View/download PDF

49. Weakly supervised adversarial learning via latent space for hyperspectral target detection

Author: Qin, Haonan, Xie, Weiying, Li, Yunsong, Jiang, Kai, Lei, Jie, and Du, Qian
Published: 2023
Full Text: View/download PDF

50. Reconsidering the Effectiveness of Fear Appeals: An Experimental Study of Interactive Fear Messaging to Promote Positive Actions on Climate Change

Author: Tang, Hongjie, primary, Chen, Liang, additional, Liu, Sijia, additional, Tan, Xinying, additional, and Li, Yunsong, additional
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

1,146 results on '"Li, Yunsong"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources