148 results for "feature refinement"
Search Results
2. OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection
- Author
-
Jin, Dongkwon, Kim, Chang-Su, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Enhanced Feature Refinement Network Based on Depthwise Separable Convolution for Lightweight Image Super-Resolution.
- Author
-
Sun, Weizhe, Ke, Ran, Liu, Zhen, Lu, Haoran, Li, Dong, Yang, Fei, and Zhang, Lei
- Abstract
Image super-resolution (SR) techniques aim to enhance the clarity and realism of images. Recently, a wide range of excellent SR algorithms with powerful characterization capabilities have emerged and are widely used. However, there are still challenges and room for improvement in designing lighter, more edge-friendly SR networks for hardware devices. In this paper, we propose a lightweight enhanced feature refinement network (EFRN) based on depthwise separable convolution for SR reconstruction. The core network components consist of multiple enhanced feature refinement blocks (EFRB), which fully fuse channel features to extract more accurate low-frequency information based on the attention of different channels. In addition, a lightweight residual block (LRB) and a lightweight dual attention block (LDAB) are designed to enhance network information extraction with minimal parameter cost. We improve the feature refinement by using 1 × 1 convolution instead of a channel selection operation to reduce the dimensionality of the features and extract the refined features more efficiently. Finally, to achieve better reconstruction performance, the depth and number of channels of the network are expanded while keeping the total number of parameters at a low level. Extensive experiments demonstrate the superiority of our EFRN over other mainstream SR algorithms in terms of reconstruction results and the number of parameters. [ABSTRACT FROM AUTHOR] (A hedged code sketch of the depthwise separable convolution and 1 × 1 refinement ideas follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
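The abstract above names two concrete, generic mechanisms: depthwise separable convolution and a 1 × 1 convolution used in place of channel selection to reduce feature dimensionality. Below is a minimal PyTorch sketch of those two patterns; `DepthwiseSeparableConv` and `RefineBlock` are illustrative names of my own, not the paper's EFRB/LRB/LDAB modules, and the layer layout is an assumption.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class RefineBlock(nn.Module):
    """Refinement via a 1x1 conv that reduces channels, standing in for an
    explicit channel-selection operation (the substitution the abstract
    describes), followed by a depthwise separable body."""
    def __init__(self, in_ch, reduced_ch):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, reduced_ch, 1)  # replaces channel selection
        self.body = DepthwiseSeparableConv(reduced_ch, reduced_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(self.reduce(x)))

x = torch.randn(1, 64, 32, 32)
print(RefineBlock(64, 32)(x).shape)  # torch.Size([1, 32, 32, 32])
```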
4. MDH-Net: advancing 3D brain MRI registration with multi-stage transformer and dual-stream feature refinement hybrid network.
- Author
-
Liu, Chenou, He, Kangjian, Xu, Dan, and Shi, Hongzhen
- Abstract
Since the advent of registration methods based on deep learning, they have demonstrated a time-efficiency advantage several orders of magnitude greater than traditional methods. However, current deep networks have not fully explored the potential to capture spatial relationships comprehensively. Faced with the complex anatomical structures and detailed deformations in the brain, existing methods often run into difficulties. To address this challenging issue, we propose an unsupervised registration network for 3D brain magnetic resonance imaging (MRI). The framework adopts a hybrid CNN-Transformer structure, gradually refining the deformation field using a pyramid structure and multi-level strategies, introducing a local salient position transformer to focus on local details of the deformation, and a hierarchical feature fusion module to further merge features of multi-layer deformation fields. These strategies help preserve high-level semantic information in the brain, thereby improving the quality of the deformation field and achieving deformable registration more effectively. We evaluated the proposed method on two publicly available brain MRI datasets, and the results show that our method outperforms all advanced competing methods in terms of accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
5. Feature Refinement and Multi-scale Attention for Transformer Image Denoising Network
- Author
-
YUAN Heng, GENG Yikun
- Subjects
image denoising, feature refinement, multi-scale attention, transformer, real noise, Electronic computers. Computer science, QA75.5-76.95
- Abstract
To enhance the relevance of global context information, strengthen attention to multi-scale features, and improve image denoising while preserving detail as much as possible, a Transformer-based feature refinement and multi-scale attention image denoising network (TFRADNet) is proposed. The network not only uses a Transformer in the codec part to solve the long-term dependence problem of large-scale images and improve the efficiency of noise reduction, but also adds a position awareness layer after the upsampling operation to enhance the network's perception of pixel positions in the feature map. To cope with the Transformer's neglect of spatial relationships among pixels, which may result in local detail distortion, a feature refinement block (FRB) is designed at the feature reconstruction stage. A serial structure introduces nonlinear transformations layer by layer to enhance the recognition of local image features with complex noise levels. Meanwhile, a multi-scale attention block (MAB) is designed, which adopts a parallel double-branch structure to jointly model spatial attention and channel attention, effectively capturing and weighting image features of different scales and improving the model's perception of multi-scale features. Experimental results on the real-noise datasets SIDD, DND and RNI15 show that TFRADNet can take into account both global information and local details, and has stronger noise suppression ability and robustness than other advanced methods. (A hedged sketch of a parallel spatial-plus-channel attention block in this spirit follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
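Entry 5's MAB is described only as a parallel double-branch structure that jointly models spatial and channel attention. The sketch below shows the generic pattern behind that description, assuming a squeeze-and-excitation-style channel branch and a CBAM-style spatial branch; `DualBranchAttention` and all hyperparameters are invented for illustration, not taken from TFRADNet.

```python
import torch
import torch.nn as nn

class DualBranchAttention(nn.Module):
    """Parallel channel-attention and spatial-attention branches whose
    reweighted outputs are summed."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        # Channel branch: global pooling followed by a bottleneck MLP gate.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )
        # Spatial branch: a 7x7 conv over pooled per-pixel channel statistics.
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        ca = x * self.channel(x)
        stats = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        sa = x * self.spatial(stats)
        return ca + sa

x = torch.randn(2, 32, 16, 16)
print(DualBranchAttention(32)(x).shape)  # torch.Size([2, 32, 16, 16])
```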
6. Multi-View Feature Fusion and Rich Information Refinement Network for Semantic Segmentation of Remote Sensing Images.
- Author
-
Liu, Jiang, Cheng, Shuli, and Du, Anyu
- Subjects
CONVOLUTIONAL neural networks, REMOTE sensing, TRANSFORMER models, INFORMATION networks, IMAGE processing
- Abstract
Semantic segmentation is currently a hot topic in remote sensing image processing, with extensive applications in land planning and surveying. Many current studies combine Convolutional Neural Networks (CNNs), which extract local information, with Transformers, which capture global information, to obtain richer information. However, the fused feature information is not sufficiently enriched and often lacks detailed refinement. To address this issue, we propose a novel method called the Multi-View Feature Fusion and Rich Information Refinement Network (MFRNet). Our model is equipped with the Multi-View Feature Fusion Block (MAFF) to merge various types of information, including local, non-local, channel, and positional information. Within MAFF, we introduce two innovative methods. The Sliding Heterogeneous Multi-Head Attention (SHMA) extracts local, non-local, and positional information using a sliding window, while the Multi-Scale Hierarchical Compressed Channel Attention (MSCA) leverages bar-shaped pooling kernels and stepwise compression to obtain reliable channel information. Additionally, we introduce the Efficient Feature Refinement Module (EFRM), which enhances segmentation accuracy by interacting the results of the Long-Range Information Perception Branch and the Local Semantic Information Perception Branch. We evaluated our model on the ISPRS Vaihingen and Potsdam datasets, conducted extensive comparison experiments with state-of-the-art models, and verified that MFRNet outperforms them. [ABSTRACT FROM AUTHOR] (A hedged sketch of bar-shaped pooling for channel attention follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
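MSCA's one concrete detail is channel attention built from bar-shaped pooling kernels with stepwise compression. The sketch below shows the generic strip-pooling pattern that description suggests; the fusion of the two bar directions, the compression schedule, and the `StripPoolChannelAttention` name are my assumptions, not MFRNet's definition.

```python
import torch
import torch.nn as nn

class StripPoolChannelAttention(nn.Module):
    """Channel attention from bar-shaped (1xW and Hx1) average pooling,
    compressed stepwise to per-channel scalars, then gated by an MLP."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),
        )

    def forward(self, x):
        # Bar-shaped pooling: average over rows (Hx1 bars) and columns (1xW bars).
        row = x.mean(dim=3)  # (B, C, H): one value per horizontal bar
        col = x.mean(dim=2)  # (B, C, W): one value per vertical bar
        # Stepwise compression: bars -> per-channel descriptor -> gate.
        desc = row.mean(dim=2) + col.mean(dim=2)            # (B, C)
        gate = self.mlp(desc).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return x * gate

x = torch.randn(2, 32, 16, 16)
print(StripPoolChannelAttention(32)(x).shape)  # torch.Size([2, 32, 16, 16])
```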
7. RPV-CASNet: range-point-voxel integration with channel self-attention network for lidar point cloud segmentation.
- Author
-
Li, Jiajiong, Wang, Chuanxu, Wang, Chenyang, Zhao, Min, and Jiang, Zitai
- Subjects
POINT cloud, NETWORK performance, LIDAR, PYRAMIDS, ALGORITHMS
- Abstract
Maximizing the advantages of different views and mitigating their respective disadvantages in fine-grained segmentation tasks is an important challenge in the field of point cloud multi-view fusion. Traditional multi-view fusion methods ignore two fatal problems: (1) the loss of depth and quantization information due to mapping and voxelization operations, resulting in "anomalies" in the extracted features; (2) how to attend to the large differences in object sizes among different views during point cloud learning and fine-tune the fusion efficiency to improve network performance. In this paper, we propose a new algorithm that uses channel self-attention to fuse range, point and voxel views, abbreviated as RPV-CASNet. RPV-CASNet integrates the three different views (range, point and voxel) in a more subtle way through an interactive structure (the range-point-voxel cross-adaptive layer, RPVLayer for short), to take full advantage of the differences among them. The RPVLayer contains two key designs: the Feature Refinement Module (FRM) and the Multi-Fine-Grained Feature Self-Attention Module (MFGFSAM). Specifically, the FRM allows a re-inference representation of points carrying anomalous features, correcting the features. The MFGFSAM addresses two challenges: efficiently aggregating tokens from distant regions and preserving multiscale features within a single attention layer. In addition, we design a Dynamic Feature Pyramid Extractor (DFPE) for network deployment, which is used to extract rich features from spherical range images. Our method achieves impressive mIoU scores of 69.8% and 77.1% on the SemanticKITTI and nuScenes datasets, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Multi-branch feature fusion and refinement network for salient object detection.
- Author
-
Yang, Jinyu, Shi, Yanjiao, Zhang, Jin, Guo, Qianqian, Zhang, Qing, and Cui, Liu
- Abstract
With the development of convolutional neural networks (CNNs), salient object detection methods have made great progress in performance. Most methods are designed with complex structures to aggregate the multi-level feature maps, with the goal of filtering noise and obtaining rich information. However, multi-level features are generally treated uniformly, with no differentiation between levels. Based on the above considerations, in this paper we propose a multi-branch feature fusion and refinement network (MFFRNet), a framework that treats low-level features and high-level features differently and effectively fuses the information of multi-level features to make the results more accurate. We propose a detail optimization module (DOM) designed for the rich detail information in low-level features and a pyramid feature extraction module (PFEM) designed for the rich semantic information in high-level features, as well as a feature optimization module (FOM) for refining the fused features of multiple levels. Extensive experiments are conducted on six benchmark datasets, and the results show that our approach outperforms the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Feature Refinement and Multi-scale Attention for Transformer Image Denoising Network (in Chinese).
- Author
-
YUAN Heng and GENG Yikun
- Published
- 2024
- Full Text
- View/download PDF
10. SFERNet: Student Facial Expression Recognition Using Superpixel-Assisted Global Semantic Enhancement and Fine-Grained Features
- Author
-
Rong, Yan, Liu, Jiawen, Li, Xinlei, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Zhang, Xiankun, editor, and Guo, Jiayang, editor
- Published
- 2024
- Full Text
- View/download PDF
11. CCMFRNet: A Real-Time Semantic Segmentation Network with Context Cascade and Multi-scale Feature Refinement
- Author
-
Hua, Shuai, Cheng, Jieren, Han, Wenbao, Xu, Wenhang, Sheng, Victor S., Howlett, Robert J., Series Editor, Jain, Lakhmi C., Series Editor, Qiu, Xuesong, editor, Xiao, Yang, editor, Wu, Zhiqiang, editor, Zhang, Yudong, editor, Tian, Yuan, editor, and Liu, Bo, editor
- Published
- 2024
- Full Text
- View/download PDF
12. Boosting adversarial robustness via feature refinement, suppression, and alignment.
- Author
-
Wu, Yulun, Guo, Yanming, Chen, Dongmei, Yu, Tianyuan, Xiao, Huaxin, Guo, Yuanhao, and Bai, Liang
- Subjects
ARTIFICIAL neural networks, RSA algorithm
- Abstract
Deep neural networks are vulnerable to adversarial attacks, bringing high risk to numerous security-critical applications. Existing adversarial defense algorithms primarily concentrate on optimizing adversarial training strategies to improve the robustness of neural networks, but ignore that the misguided decisions are essentially made by the activation values. Besides, such conventional strategies normally result in a great decline in clean accuracy. To address the above issues, we propose a novel RSA algorithm to counteract adversarial perturbations while maintaining clean accuracy. Specifically, RSA comprises three distinct modules: feature refinement, activation suppression, and alignment modules. First, the feature refinement module refines malicious activation values in the feature space. Subsequently, the feature activation suppression module mitigates redundant activation values induced by adversarial perturbations across both channel and spatial dimensions. Finally, to avoid an excessive performance drop on clean samples, RSA incorporates a consistency constraint and a knowledge distillation constraint for feature alignment. Extensive experiments on five public datasets and three backbone networks demonstrate that our proposed algorithm achieves consistently superior performance in both adversarial robustness and clean accuracy over the state-of-the-art. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. BCT-OFD: bridging CNN and transformer via online feature distillation for COVID-19 image recognition.
- Author
-
Zhang, Hongbin, Hu, Lang, Liang, Weinan, Li, Zhijie, Yuan, Meng, Ye, Yiyuan, Wang, Zelin, Ren, Yafeng, and Li, Xiong
- Abstract
Computer-aided systems can assist radiologists in improving the efficiency of COVID-19 diagnosis. Existing work adopts complex structures, and the correlations between convolutional neural networks (CNNs) and Transformers have not been explored. We propose a novel model named bridging CNN and Transformer via online feature distillation (BCT-OFD). First, the lightweight MPViT-tiny and MobileNetV3-small are chosen as the teacher and student networks, respectively. Sufficient pathological knowledge is smoothly transferred from the teacher to the student using OFD. Then, an adaptive feature fusion module is designed to efficiently fuse the heterogeneous CNN and Transformer features; the implicit correlations between the two networks are fully mined to generate more discriminative fused features. Coordinate attention is then adopted for further feature refinement. The accuracies on three publicly available datasets reach 97.76%, 98.12% and 96.96%, respectively, validating that BCT-OFD outperforms state-of-the-art baselines in terms of effectiveness and generalization ability. Notably, BCT-OFD is relatively lightweight and easy to deploy on resource-constrained devices, making it a bridge that links theory to application and narrows the gap between them. This study provides an innovative approach in the field of COVID-19 image recognition, offering valuable insights for further performance improvements. [ABSTRACT FROM AUTHOR] (A hedged sketch of a logit-distillation loss follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
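Entry 13 rests on knowledge distillation. As a point of reference, here is the standard softened-logit distillation term (Hinton et al.); BCT-OFD's online feature distillation also transfers intermediate features, which this sketch does not attempt, and the temperature value is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Softened-logit distillation: KL divergence between temperature-scaled
    distributions, rescaled by T^2 so gradients stay comparable across T."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

s = torch.randn(8, 3)   # student logits (e.g., 3 diagnostic classes)
t = torch.randn(8, 3)   # teacher logits
print(distillation_loss(s, t))
```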
14. Adaptive Road Scene Semantic Segmentation Network with Refined Multi-Scale Perception and Contour Optimization (in Chinese).
- Author
-
Sima Haifeng, Xu Yushuang, Wang Jing, and Xu Mingliang
- Published
- 2024
- Full Text
- View/download PDF
15. Unsupervised person re-identification of adversarial disentangling learning guided by refined features.
- Author
-
Chen Yuanmei, Wang Fengsui, and Wang Luyao
- Abstract
Unsupervised person re-identification aims to identify the same person across non-overlapping cameras under unsupervised settings. To address the problems that existing unsupervised person re-identification networks cannot fully extract pedestrian features and that differences between cameras lead to retrieval errors, we propose unsupervised person re-identification with adversarial disentangling learning guided by refined features. A feature refinement information fusion module is designed and embedded into different layers of a ResNet50 network to enhance the network's ability to extract key information. A disentangled feature learning method is designed to minimize the mutual information between pedestrian features and camera features and reduce the negative impact of camera differences on the network. At the same time, an adversarial disentangling loss function is designed for unsupervised joint learning. We tested the proposed method on the Market-1501 and DukeMTMC-reID public datasets; the mean average precision increased by 4.6% and 3.1%, respectively. Compared with the baseline algorithm, the method is more robust and meets the needs of pedestrian recognition in unsupervised settings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Noise-Aware Extended U-Net With Split Encoder and Feature Refinement Module for Robust Speaker Verification in Noisy Environments
- Author
-
Chan-Yeong Lim, Jungwoo Heo, Ju-Ho Kim, Hyun-Seo Shin, and Ha-Jin Yu
- Subjects
Noise-aware extended U-Net, split encoder, feature refinement, feature enhancement, joint training, noisy environments, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Speech data gathered from real-world environments typically contain noise, a significant element that undermines the performance of deep neural network-based speaker verification (SV) systems. To mitigate performance degradation due to noise and develop noise-robust SV systems, several researchers have integrated speech enhancement (SE) and SV systems. We previously proposed the extended U-Net (ExU-Net), which achieved state-of-the-art SV performance in noisy environments by jointly training SE and SV systems. In the SE field, some studies have shown that recognizing noise components within speech can improve the system's performance. Inspired by these approaches, we propose a noise-aware ExU-Net (NA-ExU-Net) that acknowledges noise information in the SE process based on the ExU-Net architecture. The proposed system comprises a Split Encoder and a feature refinement module (FRM). The Split Encoder handles the speech and noise separately by dividing the encoder blocks, whereas the FRM is designed to inhibit the propagation of irrelevant data via skip connections. To validate the effectiveness of our proposed framework in noisy conditions, we evaluated the models on the VoxCeleb1 test set with added noise from the MUSAN corpus. The experimental results demonstrate that NA-ExU-Net outperforms ExU-Net and other baseline systems under all evaluation conditions. Furthermore, evaluations in out-of-domain noise environments indicate that NA-ExU-Net significantly surpasses existing frameworks, highlighting its robustness and generalization capabilities. The code used in our experiments can be accessed at https://github.com/chan-yeong0519/NA-ExU-Net. (A hedged sketch of filtering a skip connection follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
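The FRM above is said to inhibit the propagation of irrelevant information through skip connections. One simple way to do that, sketched below under my own assumptions (the `GatedSkip` name, the 1 x 1 gate, and the choice of gating signal are not from the paper), is a sigmoid gate computed from the decoder feature and applied to the encoder feature before it crosses the skip.

```python
import torch
import torch.nn as nn

class GatedSkip(nn.Module):
    """Sigmoid-gated skip connection: the decoder feature produces a mask
    that filters the encoder feature before it is passed across."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, enc_feat, dec_feat):
        # Suppress encoder channels the decoder deems irrelevant.
        return enc_feat * self.gate(dec_feat)

enc = torch.randn(1, 32, 40, 40)
dec = torch.randn(1, 32, 40, 40)
print(GatedSkip(32)(enc, dec).shape)  # torch.Size([1, 32, 40, 40])
```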
17. SiamTAR: Enhancing Object Tracking With Joint Template Updating and Relocation Mechanisms
- Author
-
Jia Wen and Kejun Ren
- Subjects
Object tracking, Siamese network, template update, feature refinement, relocation, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Siamese network-based trackers have demonstrated competitive performance in the domain of single object tracking. However, their effectiveness is significantly hindered when the target undergoes challenges such as deformation and illumination changes, due to the fixed template features from the first frame. To address this issue, we propose a novel tracking framework called SiamTAR. This framework adaptively updates the current frame's template by fusing template features from the first frame with updated template features and tracking box features from the previous frame, thereby effectively improving tracking accuracy. Additionally, to reduce the tracker's attention to redundant information such as similar shapes, colors, and textures near the target, we designed a feature refinement module. This module integrates three attention mechanisms through two parallel branches to capture critical target information, allowing the tracker to ignore some redundant information. To tackle issues of tracking box drift and inaccurate scale estimation during online tracking, we introduce a relocation mechanism. This mechanism corrects the tracking box position by merging the output tracking box features with the template features. Extensive experiments on multiple datasets validate the superior tracking performance of SiamTAR. Specifically, on the GOT-10K dataset, SiamTAR surpasses the current leading Siamese tracker, SiamPW-RBO, by 1.5% in AO and 7.5% in SR0.75 metrics, achieving a tracking speed of 26.23 FPS. Source code is available at https://github.com/rkj12345/SiamTAR.
- Published
- 2024
- Full Text
- View/download PDF
18. Boosting adversarial robustness via feature refinement, suppression, and alignment
- Author
-
Yulun Wu, Yanming Guo, Dongmei Chen, Tianyuan Yu, Huaxin Xiao, Yuanhao Guo, and Liang Bai
- Subjects
Adversarial example, Adversarial defense, Feature refinement, Feature suppression, Feature alignment, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
Deep neural networks are vulnerable to adversarial attacks, bringing high risk to numerous security-critical applications. Existing adversarial defense algorithms primarily concentrate on optimizing adversarial training strategies to improve the robustness of neural networks, but ignore that the misguided decisions are essentially made by the activation values. Besides, such conventional strategies normally result in a great decline in clean accuracy. To address the above issues, we propose a novel RSA algorithm to counteract adversarial perturbations while maintaining clean accuracy. Specifically, RSA comprises three distinct modules: feature refinement, activation suppression, and alignment modules. First, the feature refinement module refines malicious activation values in the feature space. Subsequently, the feature activation suppression module mitigates redundant activation values induced by adversarial perturbations across both channel and spatial dimensions. Finally, to avoid an excessive performance drop on clean samples, RSA incorporates a consistency constraint and a knowledge distillation constraint for feature alignment. Extensive experiments on five public datasets and three backbone networks demonstrate that our proposed algorithm achieves consistently superior performance in both adversarial robustness and clean accuracy over the state-of-the-art.
- Published
- 2024
- Full Text
- View/download PDF
19. A Remote Sensing Target Detection Model Based on Lightweight Feature Enhancement and Feature Refinement Extraction
- Author
-
Dongen Guo, Zhuoke Zhou, Fengshuo Guo, Chaoxin Jia, Xiaohong Huang, Jiangfan Feng, and Zhen Shen
- Subjects
Feature refinement, feature fusion, remote sensing image (RSI), target detection, Ocean engineering, TC1501-1800, Geophysics. Cosmic physics, QC801-809
- Abstract
Remote sensing image (RSI) target detection methods based on traditional multiscale feature fusion (MSFF) have achieved great success. However, the traditional MSFF method significantly increases the computational cost during model training and inference, and its simple fusion operation may lead to semantic confusion in the feature map, preventing refined feature extraction. To reduce the computational effort associated with the MSFF operation and to give the features in the feature map an accurate, fine-grained distribution, we propose a single-stage detection model (RS-YOLO). Our main additions to RS-YOLO are a computationally smaller and faster Quick and Small E-ELEN (QS-E-ELEN) module and a feature refinement extraction (FRE) module. In the QS-E-ELEN module, we utilize QSBlock, jump-join, and convolution operations to fuse features at different scales and reduce the model's computational effort by exploiting the similarity of RSI feature map channels. So that the model can better utilize the enhanced features, we designed the FRE module to make the locations of the enhanced features more accurate and fine-grained. Experiments on the popular NWPU-VHR-10 and SSDD datasets show that RS-YOLO outperforms most mainstream models in terms of the tradeoff between accuracy and speed. Specifically, it improves accuracy by 1.6% and 1.7% over the current state-of-the-art models, respectively, while reducing the number of parameters and computational effort.
- Published
- 2024
- Full Text
- View/download PDF
20. Colposcopic Image Segmentation Based on Feature Refinement and Attention
- Author
-
Yuxi He, Liping Liu, Jinliang Wang, Nannan Zhao, and Hangyu He
- Subjects
Image segmentation, colposcopy image, feature refinement, lightweight upsampling, loss function, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
The current computer-aided diagnosis for cervical cancer screening encounters issues with missing detailed information during colposcopic image segmentation and incomplete edge delineation. To overcome these challenges, this study introduces the RUC-U2Net architecture, which enhances image segmentation through feature refinement and upsampling connections. Two variants are developed: RUC-U2Net and the lightweight RUC+-U2Net. First, a feature refinement module that leverages an attention mechanism is proposed to improve detail capture by the model's fundamental unit during downsampling. Second, the integration of diagonal attention in connecting peer-level encoders and decoders supplements finer semantic details to the decoder's feature maps, addressing the problem of incomplete edge segmentation. Finally, the Focal Tversky loss function allows the model to concentrate on difficult samples, mitigating the challenges posed by imbalanced distributions of positive and negative samples in training datasets. Experimental evaluations on three publicly available datasets demonstrate that the proposed models significantly outperform existing methods across seven performance metrics, evidencing their superior segmentation accuracy. (A hedged sketch of the Focal Tversky loss follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
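The Focal Tversky loss is a published, well-defined function, so it can be written out concretely. Below is one common parameterization for binary segmentation (alpha weights false negatives, beta weights false positives, and a focal exponent gamma below 1 emphasizes hard examples); the specific hyperparameter values are illustrative, not those used by RUC-U2Net.

```python
import torch

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for binary segmentation.

    pred:   (N, 1, H, W) probabilities in [0, 1] (apply sigmoid beforehand).
    target: (N, 1, H, W) binary ground truth.
    """
    pred, target = pred.flatten(1), target.flatten(1)
    tp = (pred * target).sum(dim=1)
    fn = ((1 - pred) * target).sum(dim=1)
    fp = (pred * (1 - target)).sum(dim=1)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return ((1 - tversky) ** gamma).mean()

pred = torch.sigmoid(torch.randn(2, 1, 64, 64))
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(focal_tversky_loss(pred, target))
```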
21. Enhanced Feature Refinement Network Based on Depthwise Separable Convolution for Lightweight Image Super-Resolution
- Author
-
Weizhe Sun, Ran Ke, Zhen Liu, Haoran Lu, Dong Li, Fei Yang, and Lei Zhang
- Subjects
image super-resolution, enhanced lightweight network, depthwise separable convolution, feature refinement, Mathematics, QA1-939
- Abstract
Image super-resolution (SR) techniques aim to enhance the clarity and realism of images. Recently, a wide range of excellent SR algorithms with powerful characterization capabilities have emerged and are widely used. However, there are still challenges and room for improvement in designing lighter, more edge-friendly SR networks for hardware devices. In this paper, we propose a lightweight enhanced feature refinement network (EFRN) based on depthwise separable convolution for SR reconstruction. The core network components consist of multiple enhanced feature refinement blocks (EFRB), which fully fuse channel features to extract more accurate low-frequency information based on the attention of different channels. In addition, a lightweight residual block (LRB) and a lightweight dual attention block (LDAB) are designed to enhance network information extraction with minimal parameter cost. We improve the feature refinement by using 1 × 1 convolution instead of a channel selection operation to reduce the dimensionality of the features and extract the refined features more efficiently. Finally, to achieve better reconstruction performance, the depth and number of channels of the network are expanded while keeping the total number of parameters at a low level. Extensive experiments demonstrate the superiority of our EFRN over other mainstream SR algorithms in terms of reconstruction results and the number of parameters. (See the code sketch after entry 3.)
- Published
- 2024
- Full Text
- View/download PDF
22. An adaptive guidance fusion network for RGB-D salient object detection.
- Author
-
Sun, Haodong, Wang, Yu, and Ma, Xinpeng
- Abstract
RGB-D salient object detection (RGB-D SOD) has attracted much attention for its prospect of broad application. Building on the "encoder-decoder" paradigm of the fully convolutional network (FCN), many FCN-based strategies have emerged and achieved huge progress, but they underestimate the potential of level-specific characteristics of multi-modal features. In this paper, we propose the adaptive guided fusion network (AGFNet) to further mine the potential information between the depth image and the RGB image, and design an adaptive fusion and coarse-to-fine decoding strategy to achieve high-precision detection of salient objects. Specifically, we first use a two-stream encoder to extract the multi-level features of the RGB image and depth image, but refrain from the previous practice of using depth features at every layer. Second, a simple but effective multi-modal selective fusion strategy is designed to fuse the multi-level features. Third, to adaptively enhance the contextual information at each level, an adaptive cross fusion module (ACFM) fuses the features at all levels and outputs a coarse saliency map. Finally, a guided attention refinement module (GARM) utilizes the coarse saliency map to guide the final features from the ACFM, realizing the enhancement and obtaining a refined saliency map. Our method is compared with other state-of-the-art RGB-D SOD methods through extensive experiments, and the results demonstrate the superiority of our proposed AGFNet. The source code of this project is available at https://github.com/HaodongSun809/my_AGFNet.git. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. UAV Small Object Detection Algorithm Based on Contextual Information and Feature Refinement (in Chinese).
- Author
-
Peng Yanfei, Zhao Tao, Chen Yankang, and Yuan Xiaolong
- Published
- 2024
- Full Text
- View/download PDF
24. Adaptive Point-Line Fusion: A Targetless LiDAR–Camera Calibration Method with Scheme Selection for Autonomous Driving.
- Author
-
Zhou, Yingtong, Han, Tiansi, Nie, Qiong, Zhu, Yuxuan, Li, Minghu, Bian, Ning, and Li, Zhiheng
- Subjects
CALIBRATION, LIDAR
- Abstract
Accurate calibration between LiDAR and camera sensors is crucial for autonomous driving systems to perceive and understand the environment effectively. Typically, LiDAR–camera extrinsic calibration requires feature alignment and overlapping fields of view. Aligning features from different modalities can be challenging due to noise influence. Therefore, this paper proposes a targetless extrinsic calibration method for monocular cameras and LiDAR sensors that have a non-overlapping field of view. The proposed solution uses pose transformation to establish data association across different modalities. This conversion turns the calibration problem into an optimization problem within a visual SLAM system without requiring overlapping views. To improve performance, line features serve as constraints in visual SLAM. Accurate positions of line segments are obtained by utilizing an extended photometric error optimization method. Moreover, a strategy is proposed for selecting appropriate calibration methods from among several alternative optimization schemes. This adaptive calibration method selection strategy ensures robust calibration performance in urban autonomous driving scenarios with varying lighting and environmental textures while avoiding failures and excessive bias that may result from relying on a single approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. A Lightweight Context-Aware Feature Transformer Network for Human Pose Estimation.
- Author
-
Ma, Yanli, Shi, Qingxuan, and Zhang, Fan
- Subjects
CONVOLUTIONAL neural networks
- Abstract
We propose a Context-aware Feature Transformer Network (CaFTNet), a novel network for human pose estimation. To address the limited modeling of global dependencies in convolutional neural networks, we design the Transformerneck to strengthen the expressive power of features. The Transformerneck directly substitutes the 3 × 3 convolution in the bottleneck of HRNet with a Contextual Transformer (CoT) block while reducing the complexity of the network. Specifically, the CoT first produces keys with static contextual information through a 3 × 3 convolution. Then, relying on the query and the contextualized keys, dynamic contexts are generated through two concatenated 1 × 1 convolutions. Static and dynamic contexts are finally fused as the output. Additionally, for multi-scale networks, in order to further refine the features of the fusion output, we propose an Attention Feature Aggregation Module (AFAM). Technically, given an intermediate input, the AFAM successively deduces attention maps along the channel and spatial dimensions. Then, an adaptive refinement module (ARM) is exploited to activate the obtained attention maps. Finally, the input undergoes adaptive feature refinement through multiplication with the activated attention maps. Through the above procedures, our lightweight network provides powerful clues for the detection of keypoints. Experiments are performed on the COCO and MPII datasets. The model achieves 76.2 AP on the COCO val2017 dataset. Compared to other methods with a CNN backbone, CaFTNet reduces the number of parameters by 72.9%. On the MPII dataset, our method uses only 60.7% of the parameters while acquiring results similar to other methods with a CNN backbone. [ABSTRACT FROM AUTHOR] (A hedged sketch of a CoT-style block follows this entry.)
- Published
- 2024
- Full Text
- View/download PDF
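The abstract spells out the CoT recipe: a 3 × 3 convolution for static-context keys, then two stacked 1 × 1 convolutions over the query and those keys for dynamic context, then fusion. The sketch below follows that description literally; the real CoT block (Li et al.) adds details such as grouped convolutions and a local attention matrix that are omitted here, so treat this as an approximation.

```python
import torch
import torch.nn as nn

class CoTSketch(nn.Module):
    """Rough sketch of a Contextual Transformer (CoT) style block."""
    def __init__(self, ch):
        super().__init__()
        self.key_embed = nn.Conv2d(ch, ch, 3, padding=1)  # static-context keys
        # Two stacked 1x1 convs map [query; static keys] to a dynamic context.
        self.dynamic = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 1),
        )

    def forward(self, x):
        static = self.key_embed(x)                              # static context
        dynamic = self.dynamic(torch.cat([x, static], dim=1))   # query + keys
        return static + dynamic                                 # fuse both

x = torch.randn(1, 64, 14, 14)
print(CoTSketch(64)(x).shape)  # torch.Size([1, 64, 14, 14])
```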
26. MBA-Net: multi-branch attention network for occluded person re-identification.
- Author
-
Hong, Xing, Zhang, Langwen, Yu, Xiaoyuan, Xie, Wei, and Xie, Yumin
- Abstract
Occluded person re-identification (ReID) aims to retrieve the same pedestrian from partially occluded pedestrian images across non-overlapping cameras. Current state-of-the-art methods generally use auxiliary models to obtain non-occluded regions, which not only results in more complex models but also cannot effectively handle the more generalized ReID task. To this end, a Multi-Branch Attention Network (MBA-Net) is proposed to achieve multi-level refinement of features through an end-to-end multi-branch framework with attention mechanisms. Specifically, we first achieve preliminary feature refinement through a backbone network with a non-local attention mechanism. Then, a two-level multi-branch architecture with two levels of feature refinement is proposed to obtain aware local discriminative features from the self-attention branch, non-occluded local complementary features from the cross-attention branch, and global features from the global branch. Finally, we obtain retrieval features that are robust to occlusion by concatenating all the above features. Experimental results show that our MBA-Net achieves state-of-the-art performance on the occluded person ReID dataset Occluded-Duke and simultaneously achieves competitive performance on two general person ReID datasets, Market-1501 and DukeMTMC-ReID. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. DHFNet: dual-decoding hierarchical fusion network for RGB-thermal semantic segmentation.
- Author
-
Cai, Yuqi, Zhou, Wujie, Zhang, Liting, Yu, Lu, and Luo, Ting
- Subjects
LINEAR network coding, THERMOGRAPHY, DEEP learning, PYRAMIDS
- Abstract
Recently, red-green-blue (RGB) and thermal (RGB-T) data have attracted considerable interest for semantic segmentation because they provide robust imaging under the complex lighting conditions of urban roads. Most existing RGB-T semantic segmentation methods adopt an encoder-decoder structure, and repeated upsampling causes semantic information loss during decoding. Moreover, using simple cross-modality fusion neither completely mines complementary information from different modalities nor removes noise from the extracted features. To address these problems, we developed a dual-decoding hierarchical fusion network (DHFNet) to extract RGB and thermal information for RGB-T Semantic Segmentation. DHFNet uses a novel two-layer decoder and implements boundary refinement and boundary-guided foreground/background enhancement modules. The modules process features from different levels to achieve the global guidance and local refinement of the segmentation prediction. In addition, an adaptive attention-filtering fusion module filters and extracts complementary information from the RGB and thermal modalities. Further, we introduce a graph convolutional network and an atrous spatial pyramid pooling module to obtain multiscale features and deepen the extracted semantic information. Experimental results on two benchmark datasets showed that the proposed DHFNet performed well relative to state-of-the-art semantic segmentation methods in terms of different evaluation metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Multi-View Feature Fusion and Rich Information Refinement Network for Semantic Segmentation of Remote Sensing Images
- Author
-
Jiang Liu, Shuli Cheng, and Anyu Du
- Subjects
semantic segmentation, feature refinement, remote sensing, multi-view feature fusion, Science
- Abstract
Semantic segmentation is currently a hot topic in remote sensing image processing, with extensive applications in land planning and surveying. Many current studies combine Convolutional Neural Networks (CNNs), which extract local information, with Transformers, which capture global information, to obtain richer information. However, the fused feature information is not sufficiently enriched and often lacks detailed refinement. To address this issue, we propose a novel method called the Multi-View Feature Fusion and Rich Information Refinement Network (MFRNet). Our model is equipped with the Multi-View Feature Fusion Block (MAFF) to merge various types of information, including local, non-local, channel, and positional information. Within MAFF, we introduce two innovative methods. The Sliding Heterogeneous Multi-Head Attention (SHMA) extracts local, non-local, and positional information using a sliding window, while the Multi-Scale Hierarchical Compressed Channel Attention (MSCA) leverages bar-shaped pooling kernels and stepwise compression to obtain reliable channel information. Additionally, we introduce the Efficient Feature Refinement Module (EFRM), which enhances segmentation accuracy by interacting the results of the Long-Range Information Perception Branch and the Local Semantic Information Perception Branch. We evaluated our model on the ISPRS Vaihingen and Potsdam datasets, conducted extensive comparison experiments with state-of-the-art models, and verified that MFRNet outperforms other models.
- Published
- 2024
- Full Text
- View/download PDF
29. Edge-Node Refinement for Weakly-Supervised Point Cloud Segmentation
- Author
-
Wang, Yufan, Zhao, Qunfei, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Tan, Kay Chen, Series Editor, and Deng, Zhidong, editor
- Published
- 2023
- Full Text
- View/download PDF
30. Detection and localization of multi-scale and oriented objects using an enhanced feature refinement algorithm
- Author
-
Deepika Roselind Johnson and Rhymend Uthariaraj Vaidhyanathan
- Subjects
object detection, single-stage rotation detection, feature refinement, oriented object detection, progressive approach, loss discontinuity, Biotechnology, TP248.13-248.65, Mathematics, QA1-939
- Abstract
Object detection is a fundamental aspect of computer vision, with numerous generic object detectors proposed by various researchers. The proposed work presents a novel single-stage rotation detector that can accurately detect oriented and multi-scale objects in diverse scenarios. This detector addresses the challenges faced by current rotation detectors, such as detecting arbitrary orientations and densely arranged objects, and the issue of loss discontinuity. First, the detector adopts a progressive regression form (a coarse-to-fine-grained approach) that uses both horizontal anchors (for speed and higher recall) and rotating anchors (for oriented objects) in cluttered backgrounds. Second, the proposed detector includes a feature refinement module that helps minimize problems related to feature angulation and reduces the number of bounding boxes generated. Finally, to address the issue of loss discontinuity, the proposed detector utilizes a newly formulated adjustable loss function that can be extended to both single-stage and two-stage detectors. The proposed detector shows outstanding performance on benchmark datasets and significantly outperforms other state-of-the-art methods in terms of speed and accuracy.
- Published
- 2023
- Full Text
- View/download PDF
31. Feature alignment and refinement for remote sensing images change detection.
- Author
-
Yikun Liu, Mingsong Li, Tao Xiao, Yuwen Huang, and Gongping Yang
- Subjects
REMOTE sensing, DEEP learning, FEATURE extraction, SURFACE dynamics, TIME-varying networks, REMOTE-sensing images
- Abstract
Change detection (CD) plays a critical role in extracting ground changes from bi-temporal remote sensing (RS) images and is instrumental in understanding surface dynamics. In recent years, deep learning has made significant breakthroughs in CD. However, typical CD methods that employ a Siamese network for temporal feature extraction lack feature alignment ability for bi-temporal heterogeneous RS images, resulting in inadequate temporal discriminative capability. Moreover, deep learning-based CD methods are still susceptible to missing minor changes due to scale variation in deeper network layers. In this article, we propose a bi-temporal feature alignment and refinement network (FARNet). To improve the discriminative capability of the Siamese network, an adversarial learning-based temporal discriminatory loss function is designed to align temporal-level features and eliminate bi-temporal domain shift, and a cosine similarity-based loss function is employed to measure feature distance at the pixel level. To address the problem of missing minor changes, we adopt a dilated convolution-based Siamese network to prevent feature map size reduction, and a multi-level feature detail supplement (MFDS) module is designed to supplement the deep-layer features with shallow-layer features. Additionally, we construct a change map refinement (CMR) module that refines the coarse change map into a fine-grained change map. Furthermore, we design a cross-temporal feature interaction (CFI) module to learn more fine-grained change features by combining features across time. Comprehensive experimental results on two popular CD datasets demonstrate the effectiveness and efficiency of FARNet compared with state-of-the-art (SOTA) methods. [ABSTRACT FROM AUTHOR] (A hedged sketch of a pixel-level cosine-similarity loss follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
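FARNet's pixel-level loss is described only as cosine-similarity-based. Below is one plausible formulation under my own assumptions: unchanged pixels are pulled toward similarity 1 and changed pixels toward -1, with the sign convention, weighting, and function name (`pixel_cosine_change_loss`) all invented for illustration rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def pixel_cosine_change_loss(feat_a, feat_b, change_mask):
    """Pixel-level cosine loss over bi-temporal feature maps.

    feat_a, feat_b: (N, C, H, W) features from the two dates.
    change_mask:    (N, H, W) float mask, 1 = changed pixel.
    """
    sim = F.cosine_similarity(feat_a, feat_b, dim=1)  # (N, H, W) in [-1, 1]
    unchanged = (1 - change_mask) * (1 - sim)         # push sim -> 1
    changed = change_mask * (1 + sim)                 # push sim -> -1
    return (unchanged + changed).mean()

fa, fb = torch.randn(2, 16, 8, 8), torch.randn(2, 16, 8, 8)
mask = (torch.rand(2, 8, 8) > 0.5).float()
print(pixel_cosine_change_loss(fa, fb, mask))
```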
32. Oriented Object Detection in Remote Sensing Images with Adaptive Feature Refinement (in Chinese).
- Author
-
Liu Enhai, Xu Jiayin, Li Yan, and Fan Shiyan
- Published
- 2023
- Full Text
- View/download PDF
33. PVFAN: Point-view fusion attention network for 3D shape recognition.
- Author
-
Cao, Jiangzhong and Liao, Siyi
- Subjects
RECOGNITION (Psychology), VISUAL fields, POINT set theory, COMPUTER vision, POINT cloud
- Abstract
3D shape recognition is a critical research topic in the field of computer vision, attracting substantial attention. Existing approaches mainly focus on extracting distinctive 3D shape features; however, they often neglect the model's robustness and lack refinement in deep features. To address these limitations, we propose the point-view fusion attention network that aims to extract a concise, informative, and robust 3D shape descriptor. Initially, our approach combines multi-view features with point cloud features to obtain accurate and distinguishable fusion features. To effectively handle these fusion features, we design a dual-attention convolutional network which consists of a channel attention module and a spatial attention module. This dual-attention mechanism greatly enhances the generalization ability and robustness of 3D recognition models. Notably, we introduce a strip-pooling layer in the channel attention module to refine the features, resulting in improved fusion features that are more compact. Finally, a classification process is performed on the refined features to assign appropriate 3D shape labels. Our extensive experiments on the ModelNet10 and ModelNet40 datasets for 3D shape recognition and retrieval demonstrate the remarkable accuracy and robustness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
34. Dilated Convolution-based Feature Refinement Network for Crowd Localization.
- Author
-
XINGYU GAO, JINYANG XIE, ZHENYU CHEN, AN-AN LIU, ZHENAN SUN, and LEI LYU
- Abstract
As an emerging computer vision task, crowd localization has received increasing attention due to its ability to produce spatially more accurate predictions. However, continuous scale variations in complex crowd scenes lead to tiny individuals at the edges, so existing methods cannot achieve precise crowd localization. To alleviate these problems, we propose a novel Dilated Convolution-based Feature Refinement Network (DFRNet) to enhance the representation learning capability. Specifically, DFRNet is built with three branches that can capture the information of each individual in crowd scenes more precisely. More specifically, we introduce a Feature Perception Module to model long-range contextual information at different scales by adopting multiple dilated convolutions, thus providing sufficient feature information to perceive tiny individuals at the edges of images. Afterwards, a Feature Refinement Module is deployed at multiple stages of the three branches to facilitate the mutual refinement of feature information at different scales, further improving the expressive capability of multi-scale contextual information. By incorporating the above modules, DFRNet can locate individuals in complex scenes more precisely. Extensive experiments on multiple datasets demonstrate that the proposed method outperforms existing methods and adapts more accurately to complex crowd scenes. [ABSTRACT FROM AUTHOR] (A hedged sketch of a multi-rate dilated convolution module follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
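Modeling long-range context with multiple dilated convolutions, as the Feature Perception Module above does, is a generic pattern that can be sketched concretely. The dilation rates, the concatenate-then-fuse layout, and the `MultiDilationContext` name below are illustrative assumptions, not DFRNet's actual configuration.

```python
import torch
import torch.nn as nn

class MultiDilationContext(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation rates, concatenated
    and fused by a 1x1 conv to aggregate context at several receptive fields."""
    def __init__(self, ch, rates=(1, 2, 4, 8)):
        super().__init__()
        # padding = dilation keeps spatial size constant for 3x3 kernels.
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 32, 64, 64)
print(MultiDilationContext(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```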
35. Even-Order Taylor Approximation-Based Feature Refinement and Dynamic Aggregation Model for Video Object Detection.
- Author
-
Chen, Liule, Li, Jianqiang, Li, Yunyu, and Zhao, Qing
- Subjects
DYNAMIC models, ENCYCLOPEDIAS & dictionaries, VIDEOS
- Abstract
Video object detection (VOD) is a sophisticated visual task. It is a consensus that finding effective supportive information in correlated frames is key to boosting model performance in VOD tasks. In this paper, we not only improve the method of finding supportive information from correlated frames but also improve the quality of the features extracted from those frames, further strengthening their fusion so that the model achieves better performance. The feature refinement module (FRM) in our model refines features through a key-value encoding dictionary based on an even-order Taylor series, and the refined features are used to guide the fusion of features at different stages. In the correlated-frame fusion stage, a generative MLP is applied in the feature aggregation module (DFAM) to fuse the refined features extracted from the correlated frames. Experiments adequately demonstrate the effectiveness of our proposed approach. Our YOLOX-based model achieves 83.3% AP50 on the ImageNet VID dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
36. Substation rotational object detection based on multi-scale feature fusion and refinement
- Author
-
Bin Li, Yalin Li, Xinshan Zhu, Luyao Qu, Shuai Wang, Yangyang Tian, and Dan Xu
- Subjects
Substation, Rotated device, Object detection, Feature fusion, Feature refinement, Electrical engineering. Electronics. Nuclear engineering, TK1-9971, Computer software, QA76.75-76.765
- Abstract
In modern energy systems, substations are the core of electricity transmission and distribution. However, similar appearance and small size pose significant challenges for automatic identification of electrical devices. To address these issues, we collect and annotate the substation rotated device dataset (SRDD). Further, a feature fusion and feature refinement network (F3RNet) is constructed based on the classic backbone-neck-head structure. Considering the similar appearance of electrical devices, a deconvolution fusion module (DFM) is designed to enhance the expression of feature information, and a balanced feature pyramid (BFP) is embedded to aggregate global features. The feature refinement module adjusts the original feature maps by considering the feature alignment between anchors and devices, generating more accurate feature vectors. To address sample imbalance between electrical devices, the gradient harmonized mechanism (GHM) loss is utilized to adjust the weight of each sample. Ablation experiments are conducted on the SRDD dataset. F3RNet achieves the best detection performance compared with classical object detection networks, and it is verified that features from global feature maps can effectively recognize similar and small devices.
- Published
- 2023
- Full Text
- View/download PDF
37. Feature refinement with multi-level context for object detection.
- Author
-
Ma, Yingdong and Wang, Yanan
- Abstract
Robust multi-scale object detection is challenging, as it requires both spatial details and semantic knowledge to deal with problems such as high scale variation and cluttered backgrounds. Appropriate fusion of high-resolution features with deep semantic features is the key to better performance. Different approaches have been developed to extract and combine deep features with shallow-layer spatial features, such as the feature pyramid network. However, high-resolution feature maps contain noisy and distracting features, and directly combining shallow features with semantic features might degrade detection accuracy. Besides, contextual information is also important for multi-scale object detection. In this work, we present a feature refinement scheme to tackle the feature fusion problem. The proposed feature refinement module increases feature resolution and refines feature maps progressively with guidance from deep features. Meanwhile, we propose a context extraction method to capture global and local contextual information. The method utilizes a multi-level cross-pooling unit to extract global context and a cascaded context module to extract local context. The proposed object detection framework has been evaluated on the PASCAL VOC and MS COCO datasets. Experimental results demonstrate that the proposed method performs favorably against state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
38. CACFNet: Fabric defect detection via context-aware attention cascaded feedback network.
- Author
-
Liu, Zhoufeng, Tian, Bo, Li, Chunlei, Ding, Shumin, and Xi, Jiangtao
- Subjects
CASCADE connections, TEXTILES, QUALITY control, TEXTILE industry, LOW vision, PSYCHOLOGICAL feedback, MANUFACTURING industries
- Abstract
Fabric defect detection plays an irreplaceable role in quality control for the textile manufacturing industry, but it remains a challenging task due to the diversity and complexity of defects and environmental factors. Visual saliency models imitating the human vision system can quickly pick out defect regions from a complex textured background. However, most visual saliency-based methods still suffer from incomplete predictions owing to the variability of fabric defects and their low contrast with the background. In this paper, we develop a context-aware attention cascaded feedback network for fabric defect detection to achieve more accurate predictions, in which a parallel context extractor is designed to characterize multi-scale contextual information. Moreover, a top-down attention cascaded feedback module is devised to adaptively select the important multi-scale complementary information and transmit it to an adjacent shallower layer, compensating for the inconsistency of information among layers and enabling accurate localization. Finally, a multi-level loss function guides the model to generate more accurate predictions by optimizing multiple side-output predictions. Experimental results on two fabric datasets, under six widely used evaluation metrics, demonstrate that our proposed framework outperforms state-of-the-art models remarkably. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
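The parallel context extractor described above can be approximated with parallel dilated convolutions whose rates grow to cover increasingly large fabric neighborhoods; the dilation rates below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ParallelContext(nn.Module):
    """Parallel multi-rate context extractor: each branch sees a larger
    receptive field, and a 1x1 projection fuses the concatenated branches."""
    def __init__(self, ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)
        self.project = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```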
39. FRCD: Feature Refine Change Detection Network for Remote Sensing Images.
- Author
-
Wang, Zhewei, Pan, Zongxu, Hu, Yuxin, and Lei, Bin
- Abstract
Change detection (CD) plays an important role in Earth surface analysis. Current CD methods achieve good performance in large flat areas, but CD of detailed parts is still a great challenge, and the loss of detail causes many errors around change boundaries and on small objects. By analyzing the feature maps of the widely used U-Net architecture in existing methods, we ascribe this detail loss to the depletion of detailed features during top-down delivery in the U-Net architecture. We propose the feature refine CD (FRCD) model, in which detection results are predicted directly from the multiscale features instead of through the U-Net architecture. Direct prediction enhances the representation of details and thus improves the detection accuracy of boundaries and small objects. Moreover, the normal upsampling in direct prediction is replaced with deformable upsampling, which delivers detailed information from low-level to high-level features via deformable convolution, allowing the results to fit boundaries even more closely. Experimental results on two datasets confirm the effectiveness of FRCD compared to state-of-the-art methods; the CD results on boundaries and small objects are improved significantly by the proposed method. Code will be available after acceptance of the letter at https://github.com/ijnokml/cdfr. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
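Deformable upsampling in the sense of this abstract can be sketched with torchvision's DeformConv2d: a plain bilinear upsample followed by a deformable 3x3 convolution whose offsets let the sampling grid bend toward change boundaries. Predicting the offsets from the upsampled feature itself is an assumption of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class DeformableUpsample(nn.Module):
    """Bilinear 2x upsample, then a deformable 3x3 conv with learned
    per-location offsets (2 coordinates per kernel tap: 2 * 3 * 3 = 18)."""
    def __init__(self, ch):
        super().__init__()
        self.offset = nn.Conv2d(ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.deform(x, self.offset(x))
```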
40. MCRF: Enhancing CTR Prediction Models via Multi-channel Feature Refinement Framework
- Author
-
Wang, Fangye, Gu, Hansu, Li, Dongsheng, Lu, Tun, Zhang, Peng, Gu, Ning, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bhattacharya, Arnab, editor, Lee Mong Li, Janice, editor, Agrawal, Divyakant, editor, Reddy, P. Krishna, editor, Mohania, Mukesh, editor, Mondal, Anirban, editor, Goyal, Vikram, editor, and Uday Kiran, Rage, editor
- Published
- 2022
- Full Text
- View/download PDF
41. Improved YOLOv5 for Small Object Detection Algorithm.
- Author
-
YU Jun and JIA Yinshan
- Abstract
Although deep learning has made impressive progress in large and medium object detection, small object detection remains challenging due to the limited size of small objects and the limitations of convolutional networks. Based on the You Only Look Once version 5 (YOLOv5) algorithm, this research proposes a YOLO-S model that is friendly to small objects. Firstly, on top of the original three output layers, a dedicated output layer for small object detection is added using a cascade network. Secondly, to supplement context information and suppress multi-scale feature fusion conflicts, a context information supplement module (CFM) and a channel and spatial feature refinement module (FSM) are designed. Finally, the original linear-interpolation upsampling is replaced with deconvolution. The VisDrone2019 dataset, designed specifically for small objects, is used to verify the effectiveness of the algorithm. Experimental results show that the mAP@0.5 of YOLO-S is 6.9 percentage points higher than that of YOLOv5. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
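The final modification, swapping interpolation upsampling for deconvolution, is a one-line change in PyTorch terms; the 256-channel width below is an assumption for illustration.

```python
import torch
import torch.nn as nn

# Original YOLOv5-style upsampling: fixed interpolation, no learnable weights.
interp_up = nn.Upsample(scale_factor=2, mode="nearest")

# Replacement: a learnable 2x deconvolution that can adapt its kernel.
deconv_up = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2)

x = torch.randn(1, 256, 20, 20)
assert interp_up(x).shape == deconv_up(x).shape == (1, 256, 40, 40)
```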
42. Improving the Performance of the Single Shot Multibox Detector for Steel Surface Defects with Context Fusion and Feature Refinement.
- Author
-
Li, Yiming, He, Lixin, Zhang, Min, Cheng, Zhi, Liu, Wangwei, and Wu, Zijun
- Subjects
SURFACE defects, STEEL, DETECTORS, DEEP learning - Abstract
Strip steel surface defects exhibit large intraclass and small interclass differences, so available detection techniques have either low accuracy or poor real-time performance. To better capture steel surface defects, our context fusion structure introduces the local information of shallow layers and the semantic information of deep layers into the multiscale feature maps. In addition, to filter the semantic conflicts and redundancies arising from context fusion, a feature refinement module is introduced, which further improves detection accuracy. Experimental results show that these designs significantly improve performance: our method achieves 79.5% mAP at 71 FPS on the public NEU-DET dataset, a higher detection accuracy than competing techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
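The context fusion plus refinement pipeline described above can be sketched as: bring a shallow (local) map and a deep (semantic) map to a reference scale, concatenate, then apply a 1x1 refinement to filter the redundancy fusion introduces. This is a hedged sketch, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextFusion(nn.Module):
    """Fuse shallow detail and deep semantics into a reference-scale map,
    then refine channels with a 1x1 conv to suppress fusion conflicts."""
    def __init__(self, shallow_ch, ref_ch, deep_ch):
        super().__init__()
        self.squeeze_shallow = nn.Conv2d(shallow_ch, ref_ch, 1)
        self.squeeze_deep = nn.Conv2d(deep_ch, ref_ch, 1)
        self.refine = nn.Sequential(
            nn.Conv2d(3 * ref_ch, ref_ch, 1),  # channel-wise refinement
            nn.BatchNorm2d(ref_ch),
            nn.ReLU(inplace=True))

    def forward(self, shallow, ref, deep):
        h, w = ref.shape[-2:]
        s = F.adaptive_max_pool2d(self.squeeze_shallow(shallow), (h, w))
        d = F.interpolate(self.squeeze_deep(deep), size=(h, w),
                          mode="bilinear", align_corners=False)
        return self.refine(torch.cat([s, ref, d], dim=1))
```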
43. Feature Aggregation and Refinement Network for 2D Anatomical Landmark Detection.
- Author
-
Ao, Yueyuan and Wu, Hong
- Subjects
SPINE radiography, HAND radiography, DEEP learning, HUMAN body, AUTOMATION, CEPHALOMETRY, STATISTICAL models, MEDICAL coding - Abstract
Localization of anatomical landmarks is essential for clinical diagnosis, treatment planning, and research. This paper proposes a novel deep network named the feature aggregation and refinement network (FARNet) for automatically detecting anatomical landmarks. FARNet employs an encoder-decoder architecture. To alleviate the problem of limited training data in the medical domain, we adopt a backbone network pre-trained on natural images as the encoder. The decoder includes a multi-scale feature aggregation module for multi-scale feature fusion and a feature refinement module for high-resolution heatmap regression. Coarse-to-fine supervision is applied to the two modules to facilitate end-to-end training. We further propose a novel loss function named the Exponential Weighted Center loss for accurate heatmap regression, which focuses on the losses from pixels near landmarks and suppresses those from far away. We evaluate FARNet on three publicly available anatomical landmark detection datasets, comprising cephalometric, hand, and spine radiographs. Our network achieves state-of-the-art performance on all three datasets. Code is available at https://github.com/JuvenileInWind/FARNet. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
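The Exponential Weighted Center loss is described only at a high level here; one plausible reading, weighting each pixel's regression error exponentially by its ground-truth heatmap value so that landmark centers dominate, is sketched below. The weighting form and the alpha value are assumptions, not the authors' code.

```python
import torch

def exp_weighted_heatmap_loss(pred, gt, alpha=10.0):
    """Per-pixel squared error weighted by exp(alpha * gt), with gt in [0, 1]:
    pixels near the landmark (gt ~ 1) weigh ~e^alpha times more than
    background pixels (gt ~ 0), suppressing far-away losses."""
    w = torch.exp(alpha * gt)
    return (w * (pred - gt) ** 2).sum() / w.sum()
```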
44. Adaptive Point-Line Fusion: A Targetless LiDAR–Camera Calibration Method with Scheme Selection for Autonomous Driving
- Author
-
Yingtong Zhou, Tiansi Han, Qiong Nie, Yuxuan Zhu, Minghu Li, Ning Bian, and Zhiheng Li
- Subjects
LiDAR–camera extrinsic calibration, autonomous vehicles, targetless calibration, point-line fusion, deep learning, feature refinement, Chemical technology, TP1-1185 - Abstract
Accurate calibration between LiDAR and camera sensors is crucial for autonomous driving systems to perceive and understand the environment effectively. Typically, LiDAR–camera extrinsic calibration requires feature alignment and overlapping fields of view, and aligning features from different modalities is challenging in the presence of noise. This paper therefore proposes a targetless extrinsic calibration method for monocular cameras and LiDAR sensors with non-overlapping fields of view. The proposed solution uses pose transformations to establish data association across modalities, turning the calibration problem into an optimization problem within a visual SLAM system without requiring overlapping views. To improve performance, line features serve as constraints in visual SLAM, and accurate line-segment positions are obtained with an extended photometric error optimization method. Moreover, a strategy is proposed for selecting the appropriate calibration method from among several alternative optimization schemes. This adaptive scheme selection ensures robust calibration in urban autonomous driving scenarios with varying lighting and environmental textures, while avoiding the failures and excessive bias that may result from relying on a single approach.
- Published
- 2024
- Full Text
- View/download PDF
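Using pose transformations for cross-modal data association, as this entry describes, is structurally the classic hand-eye problem A_i X = X B_i. The sketch below recovers only the extrinsic rotation from paired relative motions via a Procrustes step; the paper's full pipeline (visual SLAM, line features, scheme selection) is much richer, so treat this as background math, not the method.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def extrinsic_rotation_from_motions(cam_rel, lidar_rel):
    """Estimate the camera-LiDAR extrinsic rotation X from paired relative
    rotations A_i (camera) and B_i (LiDAR). A_i X = X B_i implies that the
    rotation vectors satisfy a_i = X b_i, so aligning the two vector sets
    with a Kabsch/SVD step yields X. Needs >= 2 non-parallel rotation axes."""
    a = np.stack([R.from_matrix(m).as_rotvec() for m in cam_rel])
    b = np.stack([R.from_matrix(m).as_rotvec() for m in lidar_rel])
    H = b.T @ a
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T  # 3x3 rotation mapping LiDAR frame into camera frame
```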
45. Focus on local: transmission line defect detection via feature refinement.
- Author
-
Li, Yufeng, Dai, Longgang, Ni, Hongxia, Kong, Caihua, and Chen, Xiang
- Abstract
Unlike object detection in natural imagery, which has made great progress, detecting defects in critical parts of transmission line images acquired by UAVs poses its own challenges, such as object scale variation and small defect targets. In this paper, we construct an effective architecture, called FOLO, to improve the accuracy of defect detection in critical parts of transmission lines. To capture defect features of critical parts, a local contextual feature pyramid network (LCFPN) is proposed to refine local contextual information and perform multi-scale learning. In LCFPN, we introduce a channel feature refinement block (CFRB) and multiple spatial feature refinement blocks (SFRBs) to further improve the network's focus on local features. Besides, a local adaptive feature network (LAFN) is designed, which makes it possible to adaptively locate defective components in critical areas of different shapes. Since existing transmission line datasets cover only a single category, we create a new defect detection dataset named IVB, containing insulators, anti-vibration hammers, and bird nests. Experimental results on IVB show that the proposed FOLO yields promising performance against other approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
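The CFRB and SFRB are not specified in detail here; squeeze-and-excitation-style channel gating and a per-pixel spatial gate are natural stand-ins, sketched below under those assumptions.

```python
import torch
import torch.nn as nn

class CFRB(nn.Module):
    """Channel feature refinement: SE-style gate re-weights channels
    (reduction ratio r is assumed; requires ch >= r)."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)

class SFRB(nn.Module):
    """Spatial feature refinement: a per-pixel sigmoid gate emphasizes
    local regions likely to contain defects."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)
```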
46. R2-DDI: relation-aware feature refinement for drug–drug interaction prediction.
- Author
-
Lin, Jiacheng, Wu, Lijun, Zhu, Jinhua, Liang, Xiaobo, Xia, Yingce, Xie, Shufang, Qin, Tao, and Liu, Tie-Yan
- Subjects
DRUG interactions, DEEP learning, DRUG discovery, MACHINE learning, PREDICTION models, FORECASTING - Abstract
Precisely predicting drug–drug interactions (DDI) is an important application and a hot research topic in drug discovery, especially for avoiding adverse effects when using drug combination treatments. Machine learning and deep learning methods have achieved great success in DDI prediction. However, we notice that most works ignore the importance of the relation type when building DDI prediction models. In this work, we propose a novel R$^2$-DDI framework, which introduces a relation-aware feature refinement module for drug representation learning. The relation feature is integrated into the drug representation and refined within the framework. With the refined features, we also incorporate consistency training to regularize the multi-branch predictions for better generalization. Through extensive experiments and studies, we demonstrate that our R$^2$-DDI approach significantly improves DDI prediction performance over multiple real-world datasets and settings, and shows better generalization ability thanks to the feature refinement design. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
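The relation-aware refinement idea, injecting a learned embedding of the interaction relation type into each drug representation before scoring the pair, can be sketched as follows. The dimensions and MLP design are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RelationAwareRefine(nn.Module):
    """Refine each drug vector jointly with a relation-type embedding,
    then score the refined pair for interaction prediction."""
    def __init__(self, drug_dim, n_relations, rel_dim=64):
        super().__init__()
        self.rel_emb = nn.Embedding(n_relations, rel_dim)
        self.refine = nn.Sequential(
            nn.Linear(drug_dim + rel_dim, drug_dim), nn.ReLU(),
            nn.Linear(drug_dim, drug_dim))
        self.score = nn.Linear(2 * drug_dim, 1)

    def forward(self, drug_a, drug_b, relation):
        r = self.rel_emb(relation)                       # (batch, rel_dim)
        a = self.refine(torch.cat([drug_a, r], dim=-1))  # relation-aware drug A
        b = self.refine(torch.cat([drug_b, r], dim=-1))  # relation-aware drug B
        return self.score(torch.cat([a, b], dim=-1))     # interaction logit
```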
47. AFL-Net: Attentional Feature Learning Network for Building Extraction from Remote Sensing Images.
- Author
-
Qiu, Yue, Wu, Fang, Qian, Haizhong, Zhai, Renjian, Gong, Xianyong, Yin, Jichong, Liu, Chengyi, and Wang, Andong
- Subjects
REMOTE sensing, CONVOLUTIONAL neural networks, EYE tracking - Abstract
Convolutional neural networks (CNNs) perform well in segmenting buildings from remote sensing images. However, the intraclass heterogeneity of buildings in images is high, while the interclass heterogeneity between buildings and other nonbuilding objects is low, leading to inaccurate distinction between buildings and complex backgrounds. To overcome this challenge, we propose an Attentional Feature Learning Network (AFL-Net) that can accurately extract buildings from remote sensing images. We design an attentional multiscale feature fusion (AMFF) module and a shape feature refinement (SFR) module to improve building recognition accuracy in complex environments. The AMFF module adaptively adjusts the weights of multi-scale features through an attention mechanism, which enhances global perception and ensures the integrity of building segmentation results. The SFR module captures the shape features of buildings, which strengthens the network's ability to distinguish building edges from surrounding nonbuilding objects and reduces over-segmentation. An ablation study with both qualitative and quantitative analyses verifies the effectiveness of the AMFF and SFR modules. The proposed AFL-Net achieves 91.37, 82.10, 73.27, and 79.81% intersection over union (IoU) on the WHU Building Aerial Imagery, Inria Aerial Image Labeling, Massachusetts Buildings, and Building Instances of Typical Cities in China datasets, respectively. Thus, AFL-Net offers promise for the successful extraction of buildings from remote sensing images. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
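The AMFF module's stated mechanism, attention weights that adaptively re-weight multi-scale features, might look like the following sketch, where per-scale softmax weights are predicted from the pooled features themselves. This is an illustrative design, not the published module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMFF(nn.Module):
    """Upsample all scales to the finest resolution, predict one attention
    weight per scale from pooled statistics, then form a weighted sum."""
    def __init__(self, ch, n_scales):
        super().__init__()
        self.att = nn.Linear(ch * n_scales, n_scales)

    def forward(self, feats):  # list of (N, ch, Hi, Wi), finest scale first
        size = feats[0].shape[-2:]
        up = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
              for f in feats]
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in up], dim=1)  # (N, ch*S)
        w = torch.softmax(self.att(pooled), dim=1)                   # (N, S)
        return sum(w[:, i, None, None, None] * up[i] for i in range(len(up)))
```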
48. A context-aware progressive attention aggregation network for fabric defect detection.
- Author
-
Liu, Zhoufeng, Tian, Bo, Li, Chunlei, Li, Xiao, and Wang, Kaihua
- Abstract
Fabric defect detection plays a critical role in quality control for the textile manufacturing industry. Deep learning-based saliency models can quickly spot the most interesting regions that attract human attention in a complex background and have been successfully applied to fabric defect detection. However, most previous methods adopt multi-level feature aggregation yet ignore the complementary relationships among different features, resulting in poor representation of tiny and slender defects. To remedy these issues, we propose a novel saliency-based fabric defect detection network, which exploits the complementary information between different layers to enhance feature representation ability and the discrimination of defects. Specifically, a multiscale feature aggregation unit (MFAU) is proposed to effectively characterize multi-scale contextual features. Besides, a feature fusion refinement module (FFR), composed of an attention fusion unit (AFU) and an auxiliary refinement unit (ARU), is designed to exploit complementary information and further refine the input features, enhancing the discriminative ability for defect features. Finally, multi-level deep supervision (MDS) guides the model to generate more accurate saliency maps. Under different evaluation metrics, our proposed method outperforms most state-of-the-art methods on our developed fabric datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
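One way the AFU could exploit complementary information between adjacent layers is to let the deeper level gate, per channel, how much shallow detail passes through, keeping thin and slender defect cues while suppressing background texture. A sketch under that assumption (equal channel counts assumed):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusionUnit(nn.Module):
    """Deeper features produce a per-channel gate over the shallower level,
    then the gated detail is added back onto the upsampled deep features."""
    def __init__(self, ch):
        super().__init__()
        self.vote = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, shallow, deep):
        deep_up = F.interpolate(deep, size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        return deep_up + shallow * self.vote(deep_up)
```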
49. Self-contrastive Feature Guidance Based Multidimensional Collaborative Network of metadata and image features for skin disease classification.
- Author
-
Li, Feng, Li, Min, Zuo, Enguang, Chen, Chen, Chen, Cheng, and Lv, Xiaoyi
- Subjects
FEATURE extraction, NOSOLOGY, SKIN cancer, IMAGE representation, SKIN imaging - Abstract
Both clinical images and metadata are the foundation of clinical diagnosis, and effectively fusing these two resources is a major difficulty in skin cancer detection. Although existing fusion methods produce better fusion outcomes, they carry out only single-level fusion before decision-making and extract features separately for each modality. This fusion strategy diminishes inter-modal synergy and yields coarse fusion features. To enhance the multidimensional representation of images, we propose a Self-contrastive Feature Guidance Based Multidimensional Collaborative Network (SGMC Net). Specifically, we split the fusion process into three steps: spatial dimension fusion, channel dimension fusion, and adaptive corrective outputting, establishing multidimensional collaboration between metadata and image features during feature extraction. Accordingly, we build three blocks: a channel fusion block, a spatial fusion block, and a feature rectification block. On this basis, we propose a Self-contrastive Feature Guidance method that uses the contrast loss between shallow and deep features of an image as a supervisory signal, in a non-enhanced manner, to optimize the shallow features. Finally, extensive experiments were conducted on the PAD-UFES-20 and Derm7pt datasets; our method achieved an accuracy of 83.3%, surpassing other state-of-the-art models. We further validated the effectiveness of the feature guidance method, showing a 5.2% improvement in accuracy for SGMC18. • Implements multidimensional collaboration between metadata and image features. • Designs a non-enhanced way to promote convergence based on contrastive learning. • Introduces a spatial-wise fusion method between metadata and image features. • SGMC achieves state-of-the-art results on the PAD-UFES-20 dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
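The self-contrastive guidance signal, shallow features pulled toward stop-gradient deep features of the same image without augmented views, admits a compact sketch; the pooling, projection head, and cosine objective below are assumptions about how "non-enhanced" contrast could be realized.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def self_contrastive_guidance(shallow, deep, proj):
    """Pull pooled, projected shallow features toward the (detached) deep
    features of the same image: no augmented views, only self-contrast.
    `proj` maps shallow channels to deep channels, e.g. nn.Linear(Cs, Cd)."""
    s = F.normalize(proj(shallow.mean(dim=(2, 3))), dim=-1)   # (N, Cd)
    d = F.normalize(deep.mean(dim=(2, 3)).detach(), dim=-1)   # supervisory signal
    return (1 - (s * d).sum(dim=-1)).mean()                   # cosine distance
```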
50. MLP-AIR: An effective MLP-based module for actor interaction relation learning in group activity recognition.
- Author
-
Xu, Guoliang, Yin, Jianqin, Zhang, Shaojie, and Gong, Moonjun
- Abstract
Modeling actor interaction relations is crucial for group activity recognition. Previous approaches often adopt a fixed paradigm that calculates an affinity matrix to model these interaction relations, yielding strong performance. On the one hand, the affinity matrix introduces an inductive bias that actor interaction relations should be computed dynamically from the input actor features. On the other hand, MLPs with static parameterization, whose parameters are fixed after training, can represent arbitrary functions. It is therefore an open question whether this inductive bias is necessary for modeling actor interaction relations. To explore its impact, we propose an affinity-matrix-free paradigm that directly uses statically parameterized MLPs to model actor interaction relations, an approach we term MLP-AIR. This paradigm overcomes the limitations of the inductive bias and enhances the capture of implicit actor interaction relations. Specifically, MLP-AIR consists of two sub-modules: an MLP-based Interaction relation modeling module (MLP-I) and an MLP-based Relation refining module (MLP-R). MLP-I models spatial–temporal interaction relations by emphasizing cross-actor and cross-frame feature learning, while MLP-R refines the relation between different channels of each relation feature, enhancing the expressive ability of the features. MLP-AIR is a plug-and-play module. To evaluate it, we applied MLP-AIR to replicate three representative methods and conducted extensive experiments on two widely used benchmarks, the Volleyball and Collective Activity datasets. The experiments demonstrate that MLP-AIR achieves favorable results. The code is available at https://github.com/Xuguoliang12/MLP-AIR. • We propose an MLP-based module that models interaction relations implicitly. • Our module is built entirely from MLPs, simplifying its structure. • Our module is universal and can be used with other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
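The affinity-matrix-free idea, statically parameterized MLPs mixing features across the actor axis and the frame axis, can be sketched in a few lines; the residual form and layer sizes below are assumptions.

```python
import torch
import torch.nn as nn

class MLPInteraction(nn.Module):
    """Mix actor features across actors (cross-actor) and across frames
    (cross-frame) with fixed-parameter MLPs: no dynamic affinity matrix."""
    def __init__(self, n_actors, n_frames, dim):
        super().__init__()
        self.actor_mix = nn.Sequential(
            nn.Linear(n_actors, n_actors), nn.GELU(), nn.Linear(n_actors, n_actors))
        self.frame_mix = nn.Sequential(
            nn.Linear(n_frames, n_frames), nn.GELU(), nn.Linear(n_frames, n_frames))

    def forward(self, x):  # x: (batch, frames, actors, dim)
        # Cross-actor mixing: apply the MLP along the actor axis.
        x = x + self.actor_mix(x.transpose(2, 3)).transpose(2, 3)
        # Cross-frame mixing: apply the MLP along the frame axis.
        x = x + self.frame_mix(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        return x
```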