Descriptor: "feature pyramid" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "feature pyramid" returned 400 results in total.

1. Rethinking Features-Fused-Pyramid-Neck for Object Detection

Author: Li, Hulin, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025

2. ChartLine: Automatic Detection and Tracing of Curves in Scientific Line Charts Using Spatial-Sequence Feature Pyramid Network.

Author: Yang, Wenjin, He, Jie, and Li, Qian
Abstract: Line charts are prevalent in scientific documents and commercial data visualization, serving as essential tools for conveying data trends. Automatic detection and tracing of line paths in these charts is crucial for downstream tasks such as data extraction, chart quality assessment, plagiarism detection, and visual question answering. However, line graphs present unique challenges due to their complex backgrounds and diverse curve styles, including solid, dashed, and dotted lines. Existing curve detection algorithms struggle to address these challenges effectively. In this paper, we propose ChartLine, a novel network designed for detecting and tracing curves in line graphs. Our approach integrates a Spatial-Sequence Attention Feature Pyramid Network (SSA-FPN) in both the encoder and decoder to capture rich hierarchical representations of curve structures and boundary features. The model incorporates a Spatial-Sequence Fusion (SSF) module and a Channel Multi-Head Attention (CMA) module to enhance intra-class consistency and inter-class distinction. We evaluate ChartLine on four line chart datasets and compare its performance against state-of-the-art curve detection, edge detection, and semantic segmentation methods. Extensive experiments demonstrate that our method significantly outperforms existing algorithms, achieving an F-measure of 94% on a synthetic dataset. [ABSTRACT FROM AUTHOR]
Published: 2024
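ChartLine's headline result is an F-measure of 94%. The abstract does not say which variant or matching tolerance is used, so the sketch below assumes the standard F-beta score computed from true-positive, false-positive and false-negative counts (beta = 1 gives the usual harmonic mean of precision and recall). The function name and the counts are illustrative and not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from detection counts; beta=1 is the harmonic mean of P and R."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: precision = recall = 0.94, so F1 = 0.94.
print(round(f_measure(tp=940, fp=60, fn=60), 2))  # 0.94
```

Edge and curve detection benchmarks often report dataset-level (ODS) or image-level (OIS) variants over a threshold sweep; the function above only covers a single operating point.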

3. YOLO-ML: A Slide Rail Defect Detection Method Based on a Multi-Scale Feature Layer Attention Mechanism.

Author: 王月, 刘永旭, 王鹏, 银兴行, and 杨欢
Subjects: ON-site evaluation, PYRAMIDS, NOISE, AUTOMOBILES, PULLEYS
Published: 2024

4. Improving YOLOX network for multi-scale fire detection.

Author: Wang, Taofang, Wang, Jun, Wang, Chao, Lei, Yi, Cao, Rui, and Wang, Li
Subjects: *FALSE alarms, *CONVOLUTIONAL neural networks, *FIRE detectors, *FOREST fires, *FOREST protection, *DATA augmentation, *NATURAL disasters
Abstract: Forest fire is a severe natural disaster that destroys forest ecosystems. At present, fire detection technology based on convolutional neural networks is widely used in forest resource protection because it enables rapid analysis. However, in forest flame and smoke detection tasks, the continuously expanding range of target scales prevents satisfactory detection performance. This paper proposes an improved YOLOX method for multi-scale forest fire detection. The method introduces a novel feature pyramid model that reduces the information loss of high-level forest fire feature maps and enhances the representation ability of feature pyramids. Moreover, the method applies a small-object data augmentation strategy to enrich the forest fire dataset, making it better suited to real forest fire scenes. According to the experimental results, the mAP of the proposed model reaches 79.64%, about 4.89% higher than that of the baseline YOLOX network. The method improves the accuracy of forest fire detection, reduces false alarms, and is suitable for real forest fire scenarios. [ABSTRACT FROM AUTHOR]
Published: 2024

5. Research on an Intelligent Control Method for Nursing Beds Based on a Feature Pyramid Network and Raspberry Pi.

Author: 杜特 and 宋扬
Published: 2024

6. Enhanced feature pyramid for multi-view stereo with adaptive correlation cost volume.

Author: Han, Ming, Yin, Hui, Chong, Aixin, and Du, Qianqian
Subjects: CASCADE connections, REFERENCE sources, PYRAMIDS, INTENTION, COST
Abstract: Multi-level features are commonly employed in the cascade network, which is currently the dominant framework in multi-view stereo (MVS). However, a potential issue is that recently popular multi-level feature extractor networks overlook the significance of fine-grained structural features for coarse depth inference in the MVS task. Discriminative structural features play an important part in matching and help boost the performance of depth inference. In this work, we propose an effective cascade-structured MVS model named FANet, in which an enhanced feature pyramid is built to predict reliable initial depth values. Specifically, the features from deep layers are enhanced with rich spatial structure information from shallow layers by a bottom-up feature enhancement path. For the enhanced topmost features, an attention mechanism is additionally employed to suppress redundant information and select important features for subsequent matching. To keep the entire model lightweight while performing well, an efficient module constructs a lightweight and effective cost volume that reliably represents viewpoint correspondence: it uses the average similarity metric to calculate feature correlations between the reference view and the source views, and then adaptively aggregates them into a unified correlation cost volume. Extensive quantitative and qualitative comparisons on the DTU and Tanks and Temples benchmarks show that the proposed model achieves better reconstruction quality than state-of-the-art MVS methods. [ABSTRACT FROM AUTHOR]
Published: 2024
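The bottom-up feature enhancement path described for FANet can be pictured with a minimal sketch. It assumes single-channel maps stored as nested lists, a pyramid ordered from the shallowest (largest) level to the deepest, and 2x2 average pooling followed by element-wise addition as the fusion step. FANet's actual downsampling layers, channel handling and the attention applied to the topmost level are not described in the abstract and are not reproduced here; all names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, rows x cols


def avg_pool2x(x: Grid) -> Grid:
    """Halve the resolution with non-overlapping 2x2 average pooling (even sizes assumed)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps with the same shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def bottom_up_enhance(levels: List[Grid]) -> List[Grid]:
    """levels[0] is the shallowest (largest) map; each deeper level is half the size.

    Each deeper level is enhanced by adding the downsampled, already-enhanced
    level just below it, so shallow spatial detail propagates to the top.
    """
    enhanced = [levels[0]]
    for deeper in levels[1:]:
        enhanced.append(add(deeper, avg_pool2x(enhanced[-1])))
    return enhanced


levels = [
    [[1.0] * 4 for _ in range(4)],   # shallow, 4x4
    [[10.0] * 2 for _ in range(2)],  # middle, 2x2
    [[100.0]],                       # deepest, 1x1
]
print(bottom_up_enhance(levels)[-1])  # [[111.0]]
```

Learned networks typically use a strided 3x3 convolution instead of fixed average pooling; pooling is used here only to keep the example dependency-free.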

7. Global remote feature modulation end-to-end detection

Author: XiaoAn Bao, WenJing Yi, XiaoMei Tu, Na Zhang, QingQi Zhang, YuTing Jin, and Biao Wu
Subjects: Object detection, Attention mechanism, Feature pyramid, Pig detection, Medicine, Science
Abstract: Object detectors based on fully convolutional networks achieve excellent performance. However, existing detection algorithms still face challenges such as low detection accuracy in dense scenes and occlusion among dense targets. To address these two challenges, we propose a Global Remote Feature Modulation End-to-End (GRFME2E) detection algorithm. In the feature extraction phase of our algorithm, we introduce the Concentric Attention Feature Pyramid Network (CAFPN). By combining Coordinate Attention and a Multilayer Perceptron, the CAFPN captures direction-aware and position-sensitive information, as well as global remote dependencies of features in deep layers. These features are used to modulate the front-end shallow features, enhancing inter-layer feature adjustment to obtain comprehensive and distinctive feature representations. In the detector part, we introduce the Two-Stage Detection Head (TS Head). This head employs the First-One-to-Few (F-O2F) module to detect slightly occluded or unobstructed objects. It then uses masks to suppress already detected instances and feeds the result to the Second-One-to-Few (S-O2F) module to identify heavily occluded objects. The results from both detection stages are merged to produce the final output, so that objects are detected whether they are slightly obscured, unobstructed, or heavily occluded. Experimental results on the pig detection dataset show that GRFME2E achieves an accuracy of 98.4%. In addition, more extensive experiments show that on the CrowdHuman dataset GRFME2E achieves 91.8% and outperforms other methods.
Published: 2024

8. Occluded Face Recognition Based on Segmentation and Multi-stage Mask Learning

Author: ZHANG Zheng, LU Tianliang, CAO Jinxuan
Subjects: occluded face recognition, multi-stage mask learning, occlusion detection and segmentation, feature pyramid, Electronic computers. Computer science, QA75.5-76.95
Abstract: Existing face recognition methods cannot effectively eliminate the influence of corrupted features caused by occlusion. As features flow deeper, the corrupted features become entangled with the effective features used for identity classification, which degrades the recognition results. To address this problem, this paper designs an occluded face recognition method based on segmentation and a multi-stage mask learning strategy. The model consists of three components: occlusion detection and segmentation, feature extraction, and mask learning units. The proposed method needs only one end-to-end process to learn feature masks and deep occlusion-robust features, without relying on additional occlusion detectors. The mask learning units take occlusion segmentation representations of different sizes and facial features from different stages as input, generate corresponding feature masks for the different stages of feature extraction, and, through mask operations, effectively eliminate the influence of corrupted features caused by occlusion at each stage. Finally, a feature pyramid is constructed to fuse features from different stages for identity classification. Experimental results show that the proposed method effectively improves the accuracy of occluded face recognition. The accuracies on the occluded LFW dataset and the real masked datasets MFR2 and Mask_whn reach 98.77%, 96.70% and 81.53%, respectively, improvements of 2.04, 0.48 and 4.44 percentage points over existing mainstream methods.
Published: 2024

10. Few-Shot Steel Defect Detection Based on a Fine-Tuned Network with Serial Multi-Scale Attention.

Author: Liu, Xiangpeng, Jiao, Lei, Peng, Yulin, An, Kang, Wang, Danning, Lu, Wei, and Han, Jianjiao
Subjects: MACHINE learning, STEEL, SUPERVISED learning, DEEP learning, SURFACE defects, VIRTUAL networks
Abstract: Detecting defects on a steel surface is crucial for improving steel quality, but its effectiveness is impeded by the limited number of high-quality samples, diverse defect types, and interference factors such as dirt spots. Therefore, this article proposes a fine-tuned deep learning approach to overcome these obstacles in unstructured few-shot settings. First, to address the complexity of steel surface defects, we integrated a serial multi-scale attention mechanism, concatenating attention and spatial modules, to generate feature maps that contain both channel and spatial information. Next, a pseudo-label semi-supervised learning (SSL) algorithm based on a variant of the locally linear embedding (LLE) algorithm was proposed, enhancing the model's generalization capability with information from unlabeled data. Afterwards, the refined model was merged into a fine-tuned few-shot object detection network, which used abundant base-class samples for initial training and sparse new-class samples for fine-tuning. Finally, specialized datasets accounting for defect diversity and pixel scales were constructed and tested. Compared with conventional methods, our approach improved accuracy by 5.93% in 7-shot detection tasks, markedly reducing manual workload and signifying a leap forward for practical applications in steel defect detection. [ABSTRACT FROM AUTHOR]
Published: 2024

11. MI-RPN: Integrating multi-modalities and multi-scales information for region proposal.

Author: Tian, Shishun, Chen, Ruifeng, Zou, Wenbin, and Li, Xia
Subjects: ADDITION (Mathematics), PYRAMIDS
Abstract: Region proposal is crucial for two-stage object detectors. Recently, RGB-based region proposal approaches have made impressive progress. However, they still suffer from two problems: (1) RGB images contain only the texture information of objects, while 3D geometric structure information, which is also important for detection, is neglected. (2) In a typical Feature Pyramid Network (FPN), the upsampling operation models only the correspondence between adjacent locations and does not take the texture structure into consideration. Besides, the addition operation in FPN ignores the importance of different channels, which may affect the propagation of semantic information. In this paper, we propose a Region Proposal Network using Multi-modalities and multi-scales Information (named MI-RPN). Firstly, we propose a Gate-guided Fusion Module (GFM) to fuse the RGB and depth features, which contain the texture and geometric information, respectively. Secondly, we propose a Flow-guided Upsample Feature Pyramid Network (FUFPN) to optimize the multi-scale feature fusion in typical FPN by taking features of an adjacent layer into consideration. Experimental results on SUNRGBD, NYUv2, and KITTI show that MI-RPN achieves superior results compared to current state-of-the-art methods. Besides, we replace the RPN in typical two-stage object detection models with MI-RPN to test its effectiveness. The results show that MI-RPN can significantly improve the accuracy of two-stage object detection models. [ABSTRACT FROM AUTHOR]
Published: 2024
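The "typical FPN" that MI-RPN critiques merges levels top-down: the coarser map is upsampled and added element-wise to the finer lateral map. The sketch below assumes single-channel maps as nested lists and nearest-neighbour 2x upsampling, and it omits the 1x1 lateral and 3x3 output convolutions of a full FPN. It makes the two behaviours the abstract objects to visible: the upsampling only copies neighbouring values, and the addition weighs every position (and, with real tensors, every channel) equally. Function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, rows x cols


def upsample_nearest2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: each value is copied into a 2x2 block."""
    return [[v for v in row for _ in (0, 1)] for row in x for _ in (0, 1)]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum; no learned weighting of positions or channels."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(laterals: List[Grid]) -> List[Grid]:
    """laterals[0] is the finest level; each later level has half the resolution.

    Starting at the coarsest map, upsample the running result and add it to the
    next finer lateral map. Outputs are returned fine-to-coarse, like the input.
    """
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        merged.append(add(lateral, upsample_nearest2x(merged[-1])))
    return merged[::-1]


laterals = [[[0.0] * 4 for _ in range(4)], [[1.0, 2.0], [3.0, 4.0]], [[10.0]]]
print(fpn_top_down(laterals)[1])  # [[11.0, 12.0], [13.0, 14.0]]
```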

12. Optimal path for automated pedestrian detection: image deblurring algorithm based on generative adversarial network.

Author: Xiujuan Dong and Jianping Lan
Subjects: *GENERATIVE adversarial networks, *PEDESTRIANS, *SIGNAL-to-noise ratio, *ALGORITHMS
Abstract: Pedestrian detection for automated driving still faces challenges. To deblur specific targets in an image, this research built a pedestrian detection deblurring model based on a generative adversarial network and multi-scale convolution. First, it designs an image deblurring algorithm based on a generative adversarial network. Then, building on this image deblurring, a pedestrian deblurring algorithm based on multi-scale convolution is designed to focus on deblurring the pedestrians in the image. The results show that the GAN-based image deblurring algorithm achieves the highest peak signal-to-noise ratio (29.7 dB) and structural similarity index (0.943), and the shortest run time (0.50 s). The pedestrian deblurring algorithm based on multi-scale convolution achieves the highest peak signal-to-noise ratio (PSNR) and structural similarity on the HIDE test set and the GoPro dataset: 29.4 dB and 0.925 on HIDE, and 40.45 dB and 0.992 on GoPro. The resulting restored image is the clearest and has the best visual quality. The enlarged face region reveals more detail and is closest to the real sharp image. The deblurring effect does not depend on the size of the pedestrians in the image. In summary, the model constructed in this study performs well in image deblurring and pedestrian detection and can help advance autonomous driving technology. [ABSTRACT FROM AUTHOR]
Published: 2024
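The deblurring results are reported as PSNR, which is measured in decibels, and SSIM, which is a unitless index. PSNR for images whose peak pixel value is MAX is 10 * log10(MAX^2 / MSE). A minimal sketch over flattened pixel lists, assuming 8-bit images (MAX = 255); the function name is illustrative.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized, flattened images."""
    if not reference or len(reference) != len(restored):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by 1 gives MSE = 1, so PSNR = 20*log10(255), about 48.13 dB.
print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))  # 48.13
```

Because PSNR is logarithmic in MSE, a 3 dB gain corresponds roughly to halving the mean squared error.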

13. AP-Net: a metallic surface defect detection approach with lightweight adaptive attention and enhanced feature pyramid.

Author: Chen, Faquan, Deng, Miaolei, Gao, Hui, Yang, Xiaoya, and Zhang, Dexian
Subjects: *SURFACE defects, *METALLIC surfaces, *PYRAMIDS, *OBJECT recognition (Computer vision), *DETECTORS
Abstract: Surface defect detection is essential for ensuring the quality of metallic products. Many excellent surface defect detectors have been designed in recent years. Most detection methods achieve success by using attention and feature pyramid modules. However, most attention modules consider only simple global information in channel features and incur heavy computational costs. Furthermore, the existing feature pyramids fail to effectively utilize the information from all multi-level features. To alleviate these issues, we design a metallic surface defect detector, named AP-Net. Specifically, we propose a lightweight adaptive attention module (LAA) and an enhanced feature pyramid module (EFP). We perform extensive experiments on three datasets. The experimental results show that the detection accuracies of the representative detectors can be significantly improved using our LAA and EFP. [ABSTRACT FROM AUTHOR]
Published: 2024

14. WH-DETR: An Efficient Network Architecture for Wheat Spike Detection in Complex Backgrounds.

Author: Yang, Zhenlin, Yang, Wanhong, Yi, Jizheng, and Liu, Rong
Subjects: WHEAT, DATA augmentation, TRANSFORMER models, PRECISION farming, PYRAMIDS
Abstract: Wheat spike detection is crucial for estimating wheat yields and has a significant impact on the modernization of wheat cultivation and the advancement of precision agriculture. This study explores the application of the DETR (Detection Transformer) architecture in wheat spike detection, introducing a new perspective to this task. We propose a high-precision end-to-end network named WH-DETR, which is based on an enhanced RT-DETR architecture. Initially, we employ data augmentation techniques such as image rotation, scaling, and random occlusion on the GWHD2021 dataset to improve the model's generalization across various scenarios. A lightweight feature pyramid, GS-BiFPN, is implemented in the network's neck section to effectively extract the multi-scale features of wheat spikes in complex environments, such as those with occlusions, overlaps, and extreme lighting conditions. Additionally, the introduction of GSConv enhances the network precision while reducing the computational costs, thereby controlling the detection speed. Furthermore, the EIoU metric is integrated into the loss function, refined to better focus on partially occluded or overlapping spikes. The testing results on the dataset demonstrate that this method achieves an Average Precision (AP) of 95.7%, surpassing current state-of-the-art object detection methods in both precision and speed. These findings confirm that our approach more closely meets the practical requirements for wheat spike detection compared to existing methods. [ABSTRACT FROM AUTHOR]
Published: 2024
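WH-DETR integrates the EIoU metric into its loss. Assuming the commonly cited EIoU formulation (1 - IoU, plus the squared centre distance normalised by the squared diagonal of the smallest enclosing box, plus the squared width and height differences normalised by that box's squared width and height), a plain-Python sketch for a single pair of boxes follows. The paper describes its version as refined, so its exact loss may differ; the (x1, y1, x2, y2) box format and the names are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def eiou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Width, height and squared diagonal of the smallest enclosing box.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # Squared distance between the two box centres.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    return (
        1.0 - iou
        + rho2 / (c2 + eps)                 # centre-distance penalty
        + (pw - gw) ** 2 / (cw ** 2 + eps)  # width-difference penalty
        + (ph - gh) ** 2 / (ch ** 2 + eps)  # height-difference penalty
    )


# Disjoint unit boxes two units apart: 1 - 0 + 4/10 = 1.4.
print(round(eiou_loss((0, 0, 1, 1), (2, 0, 3, 1)), 3))  # 1.4
```

Unlike plain IoU loss, the extra terms still give a useful signal when the boxes do not overlap, which is why distance-aware IoU variants are popular for crowded or occluded targets.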

15. Gradient Guided Co-Retention Feature Pyramid Network for LDCT Image Denoising

Author: Zhou, Li, Wang, Dayang, Xu, Yongshun, Han, Shuo, Morovati, Bahareh, Fan, Shuyi, Yu, Hengyong, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Linguraru, Marius George, editor, Dou, Qi, editor, Feragen, Aasa, editor, Giannarou, Stamatia, editor, Glocker, Ben, editor, Lekadir, Karim, editor, and Schnabel, Julia A., editor
Published: 2024

16. YOLO-BS: A Better Object Detection Model for Real-Time Driver Behavior Detection

Author: Xi, Yang, Guo, Jinxin, Ma, Ming, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Pan, Yijie, editor, and Guo, Jiayang, editor
Published: 2024

17. Handheld Knife Stick Detection Based on Dual-Path Multi-layer Residuals

Author: Jin, Liuhui, Lu, Quanli, Sui, Chenchen, Chen, Jiyang, Yi, Changle, Jiang, Jiaxuan, Shi, Yanhua, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Si, Zhanjun, editor, and Guo, Jiayang, editor
Published: 2024

18. Multi-layer Cross-Scale Coupling Feature Pyramid Network for Food Logo Detection

Author: Zhang, Baisong, Hou, Sujuan, Zhao, Songhui, Hou, Qiang, Li, Xiaojie, Yan, Wuxia, Tsihrintzis, George A., Series Editor, Virvou, Maria, Series Editor, Jain, Lakhmi C., Series Editor, Su, Jianbo, editor, and Qiao, Xiuquan, editor
Published: 2024

19. Dfp-Unet: A Biomedical Image Segmentation Method Based on Deformable Convolution and Feature Pyramid

Author: Yang, Zengzhi, Wei, Yubin, Yu, Xiao, Guan, Jinting, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Yang, De-Nian, editor, Xie, Xing, editor, Tseng, Vincent S., editor, Pei, Jian, editor, Huang, Jen-Wei, editor, and Lin, Jerry Chun-Wei, editor
Published: 2024

20. Improving Pedestrian Attribute Recognition with Dense Feature Pyramid and Mixed Pooling

Author: Xiao, He, Zou, Chen, Chen, Yaosheng, Gong, Sujia, Dong, Siwen, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Wu, Celimuge, editor, Chen, Xianfu, editor, Feng, Jie, editor, and Wu, Zhen, editor
Published: 2024

21. MsF-HigherHRNet: Multi-scale Feature Fusion for Human Pose Estimation in Crowded Scenes

Author: Yu, Cuihong, Han, Cheng, Zhang, Qi, Zhang, Chao, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Hu, Shi-Min, editor, Cai, Yiyu, editor, and Rosin, Paul, editor
Published: 2024

22. Multi-scale Context Aggregation for Video-Based Person Re-Identification

Author: Wu, Lei, Zhang, Canlong, Li, Zhixin, Hu, Liaojie, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
Published: 2024

23. Active phase recognition method of hydrogenation catalyst based on multi-feature fusion Mask CenterNet.

Author: Wang, Zhujun, Sun, Tianhe, Li, Haobin, Cui, Ailin, and Bao, Song
Subjects: *IMAGE recognition (Computer vision), *ELECTRON microscopes, *FEATURE extraction, *CATALYSTS, *LEAKAGE
Abstract: To realize intelligent recognition and statistics of hydrogenation catalyst image information, this paper presents a new image-recognition method for judging the active phase, which differs from traditional methods. Firstly, considering that hydrogenation catalyst image targets are small and prone to stacking, the feature extraction network in the CenterNet model is optimized by adding a multi-feature fusion module to improve the network's accuracy in edge positioning. Secondly, given the linear shape of the hydrogenation catalyst, a mask branch is added to the CenterNet model to train on hydrogenation catalyst stripes with unclear targets, reducing the leakage rate of the hydrogenation catalyst. The experimental results show that the detection accuracy of the improved CenterNet network is 91%, 7% higher than that of the original network, with a 12% decline in detection rate. The proposed method can accurately identify and segment the hydrogenation catalyst in electron microscope images, providing technical support for the statistics and analysis of hydrogenation catalyst images. [ABSTRACT FROM AUTHOR]
Published: 2024

24. U-Net with Coordinate Attention and VGGNet: A Grape Image Segmentation Algorithm Based on Fusion Pyramid Pooling and the Dual-Attention Mechanism.

Author: Yi, Xiaomei, Zhou, Yue, Wu, Peng, Wang, Guoying, Mo, Lufeng, Chola, Musenge, Fu, Xinyun, and Qian, Pengxiang
Subjects: *IMAGE segmentation, *PYRAMIDS, *ALGORITHMS, *LEAF spots, *GRAPE yields
Abstract: Currently, the classification of grapevine black rot disease relies on assessing the percentage of affected spots in the total area, with a primary focus on accurately segmenting these spots in images. Particularly challenging are cases in which lesion areas are small and boundaries are ill-defined, hampering precise segmentation. In our study, we introduce an enhanced U-Net network tailored for segmenting black rot spots on grape leaves. Leveraging VGG as the U-Net's backbone, we strategically position the atrous spatial pyramid pooling (ASPP) module at the base of the U-Net to serve as a link between the encoder and decoder. Additionally, channel and spatial dual-attention modules are integrated into the decoder, alongside a feature pyramid network aimed at fusing diverse levels of feature maps to enhance the segmentation of diseased regions. Our model outperforms traditional plant disease semantic segmentation approaches like DeeplabV3+, U-Net, and PSPNet, achieving impressive pixel accuracy (PA) and mean intersection over union (MIoU) scores of 94.33% and 91.09%, respectively. Demonstrating strong performance across various levels of spot segmentation, our method showcases its efficacy in enhancing the segmentation accuracy of black rot spots on grapevines. [ABSTRACT FROM AUTHOR]
Published: 2024
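Pixel accuracy (PA) and mean intersection over union (MIoU), the two scores reported for the grape-leaf model, are usually computed from a per-pixel confusion matrix. A minimal sketch, assuming integer class labels over flattened masks; skipping classes absent from both masks is one common convention, not necessarily the paper's. Names are illustrative.

```python
from typing import List

Matrix = List[List[int]]


def confusion_matrix(gt: List[int], pred: List[int], num_classes: int) -> Matrix:
    """Rows index the ground-truth class, columns the predicted class."""
    m = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        m[g][p] += 1
    return m


def pixel_accuracy(m: Matrix) -> float:
    """Fraction of pixels whose predicted class matches the ground truth."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def mean_iou(m: Matrix) -> float:
    """Average over classes of TP / (TP + FP + FN)."""
    ious = []
    for c in range(len(m)):
        tp = m[c][c]
        fp = sum(m[r][c] for r in range(len(m))) - tp  # predicted c, truly another class
        fn = sum(m[c]) - tp                             # truly c, predicted another class
        if tp + fp + fn:  # skip classes absent from both masks
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


m = confusion_matrix(gt=[0, 0, 1, 1], pred=[0, 1, 1, 1], num_classes=2)
print(pixel_accuracy(m))      # 0.75
print(round(mean_iou(m), 4))  # 0.5833
```

PA is dominated by large classes such as healthy leaf area, whereas MIoU weighs the small lesion class equally, which is why segmentation papers usually report both.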

25. POD-YOLO Object Detection Model Based on Bi-directional Dynamic Cross-level Pyramid Network.

Author: Yu Zhang, Ming Ma, Zhongxiang Wang, Jing Li, and Yan Sun
Subjects: *OBJECT recognition (Computer vision), *PYRAMIDS, *SPINE, *INFORMATION networks, *IMAGE processing
Abstract: Existing heavy-backbone object detection models overlook the crucial role of cross-level interactive fusion of feature information in pyramid networks, and as a result fail to detect occluded or small objects in complex scenes. In this paper, we present a new heavy-neck object detection model called POD-YOLO, based on YOLOv5s. Firstly, we propose the POD-RepC3 module to increase the model's capability to obtain multi-layer features. Additionally, to address the large span of object sizes, we propose a bidirectional partial dynamic fusion module (Bi-PDC) as the detection neck of the pyramid network. This module preserves accurate positioning signals and facilitates cross-level interactive fusion of feature information. Finally, we design the Reparameterized Bi-directional Dynamic Feature Pyramid Network (RepBi-DFPN), a deep feature fusion network that integrates contextual information and enhances both the feature expression and fusion capabilities of our model. Experimental results on the PASCAL VOC dataset are positive: mAP@0.5 and mAP@0.5:0.95 reach 81.3% and 58.2%, respectively, increases of 2.4% and 4.1% over the original YOLOv5s algorithm. Furthermore, the experimental results demonstrate that the model's performance is competitive with SOTA object detection models. The algorithm optimizes the feature fusion capability of the pyramid network to effectively reduce false and missed detections, and it significantly improves the model's ability to accurately detect multi-scale targets. [ABSTRACT FROM AUTHOR]
Published: 2024

26. Yaru3DFPN: a lightweight modified 3D UNet with feature pyramid network and combine thresholding for brain tumor segmentation.

Author: Akbar, Agus Subhan, Fatichah, Chastine, Suciati, Nanik, and Za'in, Choiru
Subjects: *BRAIN tumors, *DEEP learning, *PYRAMIDS, *MAGNETIC resonance imaging, *SURVIVAL rate, *THREE-dimensional imaging
Abstract: Gliomas are the most common and aggressive form of brain tumor, with a median survival of less than two years, especially for patients with the highest-grade glioma. Accurate and reproducible brain tumor segmentation is essential for effective treatment planning and diagnosis to reduce the risk of further spread. Automated brain tumor segmentation is challenging because tumors vary in shape, size, and position from one patient to another. Several deep learning architectures have been created to handle automatic segmentation with good performance on 3D MRI images. However, these architectures are generally large and require high hardware specifications and large amounts of memory and storage. This paper proposes a lightweight modified 3D UNet architecture with outstanding performance, called Yaru3DFPN. The architecture is based on the UNet. Its blocks are ResNet blocks, modified to use pre-activation and Group Normalization in place of batch normalization. In the expanding section, features are arranged into pyramid features. The final output is thresholded using the combined thresholding method. This architecture is light and fast. The proposal was tested on BraTS datasets, with the highest Dice scores of 80.90%, 86.27%, and 92.02% for the ET, TC, and WT regions, respectively. This result outperformed all compared architectures and shows promise for further development toward clinical application. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
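
A Dice score measures the overlap between a predicted mask and the ground-truth mask; in BraTS it is computed separately for the enhancing tumor (ET), tumor core (TC), and whole tumor (WT) regions, each treated as a binary mask. The sketch below is a minimal illustration, assuming masks are flattened into lists of 0/1 voxel labels; `dice_score` is a hypothetical helper, not code from the Yaru3DFPN paper.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps avoids 0/0 when both masks are empty; in that case the score is 1.0
    return (2.0 * intersection + eps) / (total + eps)


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(round(dice_score(pred, target), 4))  # 2*2 / (3 + 3) -> 0.6667
```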

27. CamGNN: Cascade Graph Neural Network for Camera Re-Localization.

Author: Wang, Li, Jia, Jiale, Dai, Hualin, and Li, Guoyan
Subjects: GRAPH neural networks, CAMERAS, FEATURE extraction, LOCALIZATION (Mathematics), IMAGE representation
Abstract: To address the inaccurate positioning of traditional camera relocalization methods in large-scale scenes or under severe viewpoint changes, this study proposes a camera relocalization method based on a cascaded graph neural network to achieve accurate scene relocalization. First, the NetVLAD retrieval method, which has advantages in image feature representation and similarity calculation, is used to retrieve the images most similar to a given query image. Next, a feature pyramid is used to extract features of these images at different scales, and features at the same scale are treated as nodes of a graph neural network to build a single-layer graph neural network structure. Then, top-down connections cascade the single-layer graph structures, fusing the node information of the previous graph into a message node to improve the accuracy of camera pose estimation. To better capture the topological relationships and spatial geometric constraints between images, an attention mechanism is introduced in the single-layer graph structure, which helps propagate information effectively to the next graph during cascading and thereby enhances the robustness of camera relocalization. Experimental results on the public 7-Scenes dataset demonstrate that the proposed method effectively improves the accuracy of absolute camera pose localization, with average translation and rotation errors of 0.19 m and 6.9°, respectively. Compared with other deep learning-based methods, the proposed method improves both average translation and rotation accuracy by more than 10%, demonstrating highly competitive localization precision. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. A Serial Multi-Scale Feature Fusion and Enhancement Network for Amur Tiger Re-Identification.

Author: Xu, Nuo, Ma, Zhibin, Xia, Yi, Dong, Yanqi, Zi, Jiali, Xu, Delong, Xu, Fu, Su, Xiaohui, Zhang, Haiyan, and Chen, Feixiang
Abstract: Simple Summary: The Amur tiger is an endangered species, and gathering reliable statistics on its individuals and populations through re-identification will contribute to ecological diversity surveys and assessment. Because the fur texture of the Amur tiger carries genetic information, Amur tigers are mainly identified by their fur and facial features. This paper proposes a serial multi-scale feature fusion and enhancement network for Amur tiger re-identification, with a global inverted pyramid multi-scale feature fusion module and a local dual-domain attention feature enhancement module. We aim to improve the learning of fine-grained features and differences in fur texture by better fusing and enhancing global and local features. Our proposed network and modules achieve excellent results on the public ATRW dataset. The Amur tiger is an important endangered species, and its re-identification (re-ID) plays an important role in regional biodiversity assessment and wildlife resource statistics. This paper focuses on Amur tiger re-ID based on visible-light images from surveillance video screenshots or camera traps, aiming to solve the problem of low accuracy caused by camera perspective, background noise, changes in motion posture, and deformation of the tiger's body patterns during re-ID. To overcome these challenges, we propose a serial multi-scale feature fusion and enhancement re-ID network for the Amur tiger, with a global branch and a local branch. Specifically, in the global branch we design a global inverted pyramid multi-scale feature fusion method to effectively fuse multi-scale global features and preserve high-level, fine-grained, deep semantic features.
We also design a local dual-domain attention feature enhancement method in the local branch, which further enhances local feature extraction and fusion by dividing the features into local blocks. We evaluated the effectiveness and feasibility of the model on the public Amur Tiger Re-identification in the Wild (ATRW) dataset and achieved good results on mAP, Rank-1, and Rank-5, demonstrating competitive performance. In addition, because the proposed model requires no additional expensive annotations and incorporates no other pre-training modules, it offers important advantages such as strong transferability and simple training. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
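
Rank-1 and Rank-5 are points on the cumulative matching characteristic (CMC) curve: the fraction of query images whose correct identity appears among the top-k most similar gallery images. A minimal sketch, assuming a precomputed query-by-gallery distance matrix; `rank_k_accuracy` is a hypothetical helper, and it ignores ATRW's evaluation-protocol details (such as which gallery images are excluded for each query).

```python
def rank_k_accuracy(dist, query_ids, gallery_ids, k):
    """Fraction of queries whose true identity is among the k nearest gallery items.

    dist[i][j] is the distance between query i and gallery item j (smaller = more similar).
    """
    hits = 0
    for i, row in enumerate(dist):
        # Sort gallery indices by ascending distance and keep the k closest.
        top_k = sorted(range(len(row)), key=lambda j: row[j])[:k]
        if any(gallery_ids[j] == query_ids[i] for j in top_k):
            hits += 1
    return hits / len(dist)


query_ids = ["a", "b", "c"]
gallery_ids = ["a", "b", "c", "a"]
dist = [
    [0.2, 0.1, 0.9, 0.5],  # query "a": nearest match is "b", a miss at rank 1
    [0.8, 0.3, 0.4, 0.7],  # query "b": nearest match is "b"
    [0.6, 0.5, 0.2, 0.9],  # query "c": nearest match is "c"
]
print(rank_k_accuracy(dist, query_ids, gallery_ids, 1))  # 2 of 3 queries correct at rank 1
print(rank_k_accuracy(dist, query_ids, gallery_ids, 5))  # 1.0
```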

29. Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition.

Author: He, Li, Wang, Qingxiang, Liu, Jie, Duan, Jianyong, and Wang, Hao
Subjects: COLLABORATIVE learning
Abstract: The goal of multimodal named entity recognition (MNER) is to detect entity spans in given image–text pairs and classify them into corresponding entity types. Despite the success of existing works that leverage cross-modal attention mechanisms to integrate textual and visual representations, we observe three key issues. Firstly, models are prone to misguidance when fusing unrelated text and images. Secondly, most existing visual features are not enhanced or filtered. Finally, due to the independent encoding strategies employed for text and images, a noticeable semantic gap exists between them. To address these challenges, we propose a framework called visual clue guidance and consistency matching (GMF). To tackle the first issue, we introduce a visual clue guidance (VCG) module designed to hierarchically extract visual information from multiple scales. This information is utilized as an injectable visual clue guidance sequence to steer text representations for error-insensitive prediction decisions. Furthermore, by incorporating a cross-scale attention (CSA) module, we successfully mitigate interference across scales, enhancing the image's capability to capture details. To address the third issue of semantic disparity between text and images, we employ a consistency matching (CM) module based on the idea of multimodal contrastive learning, facilitating the collaborative learning of multimodal data. To validate the effectiveness of our proposed framework, we conducted comprehensive experimental studies, including extensive comparative experiments, ablation studies, and case studies, on two widely used benchmark datasets, demonstrating the efficacy of the framework. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
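
The abstract says the consistency matching (CM) module is based on multimodal contrastive learning but does not give its loss. A common contrastive objective for paired image-text data is the symmetric InfoNCE loss (as used in CLIP), sketched below under that assumption; it is not necessarily the loss used in GMF, and the temperature value and helper name are illustrative.

```python
import math


def symmetric_info_nce(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of text_emb and row i of image_emb form a positive pair;
    all other rows in the batch act as negatives."""

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    t = [normalize(v) for v in text_emb]
    im = [normalize(v) for v in image_emb]
    n = len(t)
    # logits[i][j]: cosine similarity of text i and image j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(t[i], im[j])) / temperature for j in range(n)]
              for i in range(n)]

    def diagonal_cross_entropy(mat):
        total = 0.0
        for i, row in enumerate(mat):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]  # -log softmax probability of the positive pair
        return total / len(mat)

    image_to_text = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(image_to_text))


text = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(symmetric_info_nce(text, aligned) < symmetric_info_nce(text, swapped))  # True
```

Minimizing this loss pulls each text representation toward its paired image and away from the other images in the batch, which is one way to narrow the text-image semantic gap the abstract describes.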

30. Joint condition monitoring framework of wind turbines based on multi-task learning with poor-quality data.

Author: Ding, Jiawen, Deng, Lei, Li, Qikang, Gu, Xinyu, and Tang, Baoping
Subjects: INFORMATION sharing, PYRAMIDS
Abstract: Effective condition monitoring can improve the reliability of the turbine and reduce its downtime. However, due to complex operating conditions, monitoring data are always mixed with poor-quality data. Poor-quality data mixed into monitoring tasks disrupt long-term dependencies in the data, which makes it difficult for traditional condition monitoring methods to work. To solve this, a joint reparameterization feature pyramid network (JRFPN) is proposed. First, three different reparameterization tricks are designed to reform temporal information and exchange cross-temporal information, alleviating the damage to long-term dependencies. Second, a joint condition monitoring framework is designed to suppress feature confounding between poor-quality data and faulty data. The auxiliary task is trained to extract the degradation trend. The main task counteracts feature confounding and dynamically delineates the failure threshold. The degradation trend and failure threshold decisions correct each other to produce the final joint state inference. In addition, considering the differing quality of the monitoring variables, a channel weighting mechanism is designed to strengthen JRFPN. Measured data show that JRFPN is more effective than other methods. • A dynamic channel attention unit (DCAU) weighs the differing contributions of the monitoring variables. • Adaptive data repair through pixel-level (Re-Param block), scale-level (RepDCConv), and field-level (modified FPS) reparameterization tricks adaptively adjusts the parameters to alleviate the damage that poor-quality data do to long-term dependency patterns. • A main-auxiliary adversarial correction training mode is designed to dynamically delineate the failure threshold and make the joint state inference. • A joint condition monitoring framework maintains very high accuracy and very low FNR and FPR in the presence of large amounts of poor-quality data.
In addition, the degradation trend of the device can be observed through PH. The model's results are interpretable. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Pyramid Progressive Fusion Network for Low-Light Image Enhancement.

Author: 余映, 徐超越, 李淼, 何鹏浩, and 杨昊
Abstract: Copyright of Journal of National University of Defense Technology / Guofang Keji Daxue Xuebao is the property of NUDT Press and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2024
Full Text: View/download PDF

32. A Lightweight Method for Small Object Detection Models on Unmanned Aerial Vehicles Based on L-FPN

Author: Wei Haokun, Liu Jingyi, Chen Jinyong, Chu Boce, Sun Yuxin, Zhu Jin
Subjects: object detection, feature pyramid, model lightweight, remote sensing images, uav, Motor vehicles. Aeronautics. Astronautics, TL1-4050
Abstract: Oriented object detection in remote sensing images is a current research hotspot. Because remote sensing images are captured from different heights with different equipment, the ground sampling distance (GSD) varies from image to image, causing many small objects to be easily overlooked. Existing rotated object detection algorithms mainly target multi-scale object detection in general scenarios. The fusion computations of the feature pyramid network (FPN) are complex and time-consuming, which still makes deployment on edge devices such as UAVs very challenging. Therefore, this paper proposes a lightweight method for small object detection on UAVs based on L-FPN. First, the image scale is normalized according to the image's GSD. Second, redundant high-level feature maps are removed from the FPN. Finally, the anchor box sizes are adjusted for small object detection. The method is trained and validated on the DOTA dataset. Results show that, compared with traditional models, the proposed L-FPN-based lightweight method maintains consistent recognition accuracy with 2.7% fewer model parameters, a 28% smaller model size, and 13.24% faster inference.
Published: 2024
Full Text: View/download PDF
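
The first L-FPN step, normalizing image scale by ground sampling distance (GSD), comes down to simple arithmetic: an object of physical length L metres spans L / GSD pixels, so resizing an image by the factor GSD_image / GSD_reference makes it span L / GSD_reference pixels, as if every image had been captured at the same reference GSD. A minimal sketch, assuming each image's GSD (metres per pixel) is known from its metadata; the reference GSD and the helper names are hypothetical, since the abstract does not state the target value.

```python
def gsd_resize_factor(image_gsd, reference_gsd):
    """Scale factor that maps an image captured at image_gsd (m/pixel) to reference_gsd."""
    if image_gsd <= 0 or reference_gsd <= 0:
        raise ValueError("GSD values must be positive")
    return image_gsd / reference_gsd


def normalized_size(width, height, image_gsd, reference_gsd):
    """Target (width, height) in pixels after GSD normalization, at least 1 pixel each."""
    f = gsd_resize_factor(image_gsd, reference_gsd)
    return max(1, round(width * f)), max(1, round(height * f))


# A 1024x1024 tile at 0.5 m/pixel, normalized to a hypothetical 0.25 m/pixel reference,
# is upsampled 2x, so a 5 m vehicle grows from 10 to 20 pixels long.
print(normalized_size(1024, 1024, 0.5, 0.25))  # (2048, 2048)
```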

33. Integrated Neural Network-Based Pupil Tracking Technology for Wearable Gaze Tracking Devices in Flight Training

Author: Heming Zhang and Changyuan Wang
Subjects: Pupil-tracking, hybrid neural network, feature pyramid, ViT, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Pupil tracking technology is a tracking and detection method that uses eye image information to extract the real-time position of the pupil. Detecting the pilot’s eye movement patterns and characteristics through pupil movement signals is an important part of monitoring the pilot’s physiological characteristics. Current pupil tracking algorithms are prone to insufficient tracking accuracy and discontinuous pupil signals when the pupil is occluded by frequent blinking or pupil information is lost in dark environments, both of which occur during flight training for student pilots. To increase the accuracy of pilot pupil tracking, this paper designs an integrated neural network-based pupil tracking technology for wearable gaze tracking devices in flight training. To solve the above problems, this paper builds a pupil positioning model based on a hybrid neural network that combines a feature pyramid with a ViT network. On this basis, we build a hybrid neural network pupil tracking model for occluded pupil images, based on the characteristics of pilot eye data collected during flight training, and design a new loss function suitable for pupil detection. Verification shows that, compared with existing methods, the proposed pupil tracking algorithm significantly improves visual tracking accuracy, with an error range of less than 5 pixels, and its tracking accuracy reaches up to 85%. In pilot flight training, the algorithm offers better pupil tracking stability, effectively reduces pupil signal interference caused by occlusion, and achieves more accurate real-time pupil tracking.
Published: 2024
Full Text: View/download PDF

34. Counting Method Based on Density Graph Regression and Object Detection

Author: GAO Jie, ZHAO Xinxin, YU Jian, XU Tianyi, PAN Li, YANG Jun, YU Mei, LI Xuewei
Subjects: intensive count, target detection, deep learning, density map regression, feature pyramid, Electronic computers. Computer science, QA75.5-76.95
Abstract: Of the two mainstream approaches to dense counting, detection-based methods suffer from low recall, while density-based methods lack target location information. To address both problems, a detection and counting method based on density map regression is proposed that combines the two tasks to count and locate target objects in dense scenes. Combining the complementary advantages of the two methods not only improves recall but also localizes all targets. To extract richer feature information for complex data scenarios, a feature pyramid optimization module is proposed, which vertically fuses low-level high-resolution features with top-level abstract semantic features and horizontally fuses same-size features to enrich the semantic expression of target objects. To address the small pixel proportion occupied by target objects in dense counting scenarios, an attention mechanism for small targets is proposed to improve the network’s detection sensitivity; it strengthens the network's attention to target objects by constructing a mask on the input image. Experimental results demonstrate that the proposed method significantly improves recall and accurately locates targets while maintaining accuracy, effectively providing counting and positioning information for the input image, with broad application prospects in fields such as industry and ecology.
Published: 2024
Full Text: View/download PDF
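
The "vertical" fusion described above follows the top-down pathway of a standard FPN: the coarsest, most semantic map is upsampled and added to the next finer, same-size lateral map, level by level. A minimal single-channel sketch, assuming nearest-neighbour 2x upsampling and element-wise addition as in the original FPN design; the paper's exact operators (and the 1x1 lateral convolutions a real FPN uses) are not specified in the abstract and are omitted here, and the helper names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise sum of two same-size maps (the lateral, same-size fusion step)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(pyramid):
    """pyramid: maps ordered from finest to coarsest, each level half the size of the
    previous one. Returns the fused maps in the same order."""
    fused = [pyramid[-1]]  # the coarsest level is kept as-is
    for lateral in reversed(pyramid[:-1]):
        fused.append(add_maps(lateral, upsample2x(fused[-1])))
    return list(reversed(fused))


p2 = [[1] * 4 for _ in range(4)]  # high resolution, low-level
p3 = [[1, 2], [3, 4]]
p4 = [[10]]                       # low resolution, most semantic
for level in top_down_fuse([p2, p3, p4]):
    print(level)
```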

35. ChartLine: Automatic Detection and Tracing of Curves in Scientific Line Charts Using Spatial-Sequence Feature Pyramid Network

Author: Wenjin Yang, Jie He, and Qian Li
Subjects: curve detection, self-attention, feature pyramid, BiLSTM, Chemical technology, TP1-1185
Abstract: Line charts are prevalent in scientific documents and commercial data visualization, serving as essential tools for conveying data trends. Automatic detection and tracing of line paths in these charts is crucial for downstream tasks such as data extraction, chart quality assessment, plagiarism detection, and visual question answering. However, line graphs present unique challenges due to their complex backgrounds and diverse curve styles, including solid, dashed, and dotted lines. Existing curve detection algorithms struggle to address these challenges effectively. In this paper, we propose ChartLine, a novel network designed for detecting and tracing curves in line graphs. Our approach integrates a Spatial-Sequence Attention Feature Pyramid Network (SSA-FPN) in both the encoder and decoder to capture rich hierarchical representations of curve structures and boundary features. The model incorporates a Spatial-Sequence Fusion (SSF) module and a Channel Multi-Head Attention (CMA) module to enhance intra-class consistency and inter-class distinction. We evaluate ChartLine on four line chart datasets and compare its performance against state-of-the-art curve detection, edge detection, and semantic segmentation methods. Extensive experiments demonstrate that our method significantly outperforms existing algorithms, achieving an F-measure of 94% on a synthetic dataset.
Published: 2024
Full Text: View/download PDF

36. SFPN: segmentation-based feature pyramid network for multi-focus image fusion.

Author: Wu, Pan, Jiang, Limai, Li, Ying, Fan, Hui, and Li, Jinjiang
Subjects: IMAGE fusion, PYRAMIDS, FEATURE extraction, INFORMATION resources, DEEP learning
Abstract: In multi-focus image fusion, different targets often have different sizes, and a network with poor multi-scale feature extraction ability will inevitably omit source image information. Motivated by this, we propose a network that uses a double multi-scale feature pyramid to extract multi-scale features. We design an effective channel compression excitation module and a channel spatial attention module, which together form the semantic segmentation mechanism. This mechanism efficiently extracts multi-scale feature maps, maximizes the global information of the source image, and ignores similar information. We introduce a joint loss function and use post-processing operations to generate smooth decision maps and fused images. The proposed SFPN is compared with seven existing MFIF methods on six objective quantitative metrics and on subjective visual effects, and it achieves superior performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
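
Decision-map-based multi-focus fusion typically builds the fused image pixel by pixel as F = D·A + (1 − D)·B, where D is 1 where source A is in focus and 0 where source B is. The post-processing the abstract mentions smooths D so that isolated misclassified pixels do not create artifacts. A minimal sketch, assuming a binary decision map and using a 3x3 majority filter as a stand-in for the paper's unspecified post-processing; both helper names are hypothetical.

```python
def majority_filter(decision):
    """3x3 majority vote on a binary decision map (windows are clipped at the borders)."""
    h, w = len(decision), len(decision[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            votes = [decision[y][x]
                     for y in range(max(0, i - 1), min(h, i + 2))
                     for x in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = 1 if 2 * sum(votes) > len(votes) else 0
    return out


def fuse_with_decision_map(src_a, src_b, decision):
    """F = D * A + (1 - D) * B, applied pixel by pixel."""
    return [[d * a + (1 - d) * b for a, b, d in zip(ra, rb, rd)]
            for ra, rb, rd in zip(src_a, src_b, decision)]


noisy = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]  # one isolated "B in focus" pixel
print(majority_filter(noisy))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
a = [[10, 10], [10, 10]]
b = [[0, 0], [0, 0]]
print(fuse_with_decision_map(a, b, [[1, 0], [0, 1]]))  # [[10, 0], [0, 10]]
```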

37. A lightweight vehicle mounted multi-scale traffic sign detector using attention fusion pyramid.

Author: Wang, Junfan, Chen, Yi, Gu, Yeting, Yan, Yunfeng, Li, Qi, Gao, Mingyu, and Dong, Zhekang
Subjects: *VEHICLE detectors, *TRAFFIC signs & signals, *TRAFFIC monitoring, *INTELLIGENT transportation systems, *PYRAMIDS
Abstract: Intelligent Transportation Systems (ITS) aim to strengthen the connection between vehicles, roads, and people. Because traffic signs carry important road information in ITS, intelligent traffic sign detection has become an important part of intelligent vehicles. In this paper, a lightweight vehicle-mounted multi-scale traffic sign detector is proposed. First, guided by an attention fusion algorithm, an improved feature pyramid network named AFPN is proposed. It assigns weights according to the importance of information and fuses multi-dimensional attention maps to improve feature extraction and information retention. Second, a multi-head detection structure is designed to improve the positioning and detection capability of the detector: a detection head is constructed for each target scale to improve detection accuracy. Experimental results show that, compared with other state-of-the-art methods, the proposed method not only achieves excellent detection accuracy, 50.3% for small targets and 64.8% for large targets, but also better balances detection speed and accuracy. Furthermore, the detector is deployed on a Jetson Xavier NX and integrated with a vehicle-mounted camera, inverter, and LCD to achieve real-time traffic sign detection on the vehicle terminal at 25.6 FPS. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Vehicle object counting network based on feature pyramid split attention mechanism.

Author: Liu, Mingsheng, Wang, Yu, Yi, Hu, and Huang, Xiaohui
Subjects: *OBJECT recognition (Computer vision), *PYRAMIDS, *TRAFFIC congestion, *COUNTING, *PEDESTRIANS, *AUTOMOBILE license plates
Abstract: In recent years, real-time vehicle congestion detection has become a hot research topic in transportation because highway traffic jams occur frequently. Vehicle congestion detection generally adopts a vehicle counting algorithm based on object detection, but such algorithms are not effective in scenarios with large changes in vehicle scale, dense vehicles, background clutter, and severe occlusion. A vehicle object counting network based on a feature pyramid split attention mechanism is proposed for accurate vehicle counting and the generation of high-quality vehicle density maps in highly congested scenarios. The network extracts rich contextual features using blocks at different scales, obtains a multi-scale feature mapping in the channel direction using convolution kernels of different sizes, and applies a channel attention module separately at each scale so that the network can focus on features at different scales; the resulting channel-wise attention vector reduces mis-estimation of background information. Experiments on the vehicle datasets TRANCOS, CARPK, and HS-Vehicle show that the proposed method outperforms most existing counting methods based on detection or density estimation. The relative improvement in MAE is 90.5% on the CARPK dataset compared with Fast R-CNN and 73.0% on the HS-Vehicle dataset compared with CSRNet. In addition, the method is extended to count other objects, such as pedestrians in the ShanghaiTech dataset, where it effectively reduces the misrecognition rate and achieves higher counting performance than state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
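
Density-map counting methods estimate the count in an image as the sum of the predicted density map, and counting accuracy is usually reported as the mean absolute error (MAE) over test images. A "relative improvement in MAE" such as the 90.5% quoted above is most naturally read as the relative reduction (MAE_baseline − MAE_new) / MAE_baseline; that reading is an assumption, since the abstract does not define it. A minimal sketch with hypothetical helper names and toy numbers (not the paper's results):

```python
def count_from_density(density):
    """Estimated object count: the sum of all values in a 2-D density map."""
    return sum(sum(row) for row in density)


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def relative_improvement(baseline_mae, new_mae):
    """Relative MAE reduction with respect to a baseline method."""
    return (baseline_mae - new_mae) / baseline_mae


print(count_from_density([[0.5, 0.25], [0.25, 1.0]]))  # 2.0
print(mae([10, 12, 7], [11, 12, 9]))                     # (1 + 0 + 2) / 3 = 1.0
print(relative_improvement(10.0, 0.95))                  # about 0.905, i.e. 90.5%
```

Under this reading, a 90.5% relative improvement means the new method's MAE is 9.5% of the baseline's.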

39. An N-Shaped Lightweight Network with a Feature Pyramid and Hybrid Attention for Brain Tumor Segmentation.

Author: Chi, Mengxian, An, Hong, Jin, Xu, and Nie, Zhenguo
Subjects: *BRAIN tumors, *PYRAMIDS, *NOMOGRAPHY (Mathematics), *CLINICAL medicine
Abstract: Brain tumor segmentation using neural networks is challenging because it must accurately capture diverse tumor shapes and sizes while maintaining real-time performance. Addressing class imbalance is also crucial for accurate clinical results. To tackle these issues, this study proposes a novel N-shaped lightweight network that combines multiple feature pyramid paths with U-Net architectures. Furthermore, we integrate hybrid attention mechanisms at various locations in the depth-wise separable convolution module to improve efficiency, and find channel attention to be the most effective for skip connections in the proposed network. Moreover, we introduce a combination loss function that incorporates a newly designed weighted cross-entropy loss and a Dice loss to tackle class imbalance. Extensive experiments on four publicly available datasets, i.e., UCSF-PDGM, BraTS 2021, BraTS 2019, and MSD Task 01, evaluate the performance of different methods. The results demonstrate that the proposed network achieves higher segmentation accuracy than state-of-the-art methods. It not only improves overall segmentation performance but also offers favorable computational efficiency, making it a promising approach for clinical applications. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
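
The combination loss described above pairs a class-weighted cross-entropy, which raises the penalty for misclassifying the rare tumor class, with a Dice loss, which directly optimizes region overlap and is not affected by the number of correctly classified background voxels. A minimal binary sketch, assuming per-voxel foreground probabilities; the paper's newly designed weighting scheme is not given in the abstract, so `pos_weight` and `alpha` are illustrative placeholders and the helper names are hypothetical.

```python
import math


def weighted_bce(probs, labels, pos_weight, eps=1e-7):
    """Binary cross-entropy in which foreground (label 1) terms are multiplied by pos_weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def soft_dice_loss(probs, labels, eps=1e-7):
    """1 - soft Dice coefficient computed on probabilities rather than hard labels."""
    intersection = sum(p * y for p, y in zip(probs, labels))
    return 1 - (2 * intersection + eps) / (sum(probs) + sum(labels) + eps)


def combined_loss(probs, labels, pos_weight=5.0, alpha=0.5):
    """alpha * weighted cross-entropy + (1 - alpha) * Dice loss (illustrative weights)."""
    return alpha * weighted_bce(probs, labels, pos_weight) + (1 - alpha) * soft_dice_loss(probs, labels)


labels = [1, 0, 0, 0, 0, 1]
good = [0.9, 0.1, 0.1, 0.2, 0.1, 0.8]
bad = [0.2, 0.1, 0.1, 0.2, 0.1, 0.3]  # misses most of the foreground
print(combined_loss(good, labels) < combined_loss(bad, labels))  # True
```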

40. Feature pyramid-based convolutional neural network image inpainting.

Author: Wang, Shengbo and Wang, Xiuyou
Abstract: Deep learning-based methods are widely used in image processing and have achieved remarkable results. However, these methods often mis-fill regions when dealing with irregularly damaged images. The main reason is that the low-level information of the feature map is not fully utilized, and the semantic information of feature maps at different scales cannot complement each other effectively. Therefore, we propose a network structure based on a feature pyramid. In the first stage, we set the dilation factor to avoid the gridding effect and increase the receptive field while making maximal use of the low-level feature map information. The second stage uses a feature fusion branch, which first samples the feature maps to construct the feature pyramid, then fuses feature maps with different resolutions and semantic strengths, and finally generates an image from the feature maps through deconvolution in a decoder. Experimental results show that this method generates coherent, clear, and visually plausible recovered regions, with better image quality than other methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
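
Reading the "expansion factor" as the dilation rate of dilated convolutions, the trade-off above can be checked numerically. For a stack of stride-1 convolutions with kernel size k and dilation rates d_1 to d_n, the receptive field is 1 + Σ (k − 1)·d_i. If all rates share a common factor greater than 1, every sampled offset is a multiple of that factor, so some input pixels inside the receptive field are never used (the gridding effect); a greatest common divisor of 1 is necessary but not sufficient to avoid it. A minimal sketch with hypothetical helper names; the dilation rates below are illustrative, not the paper's.

```python
from functools import reduce
from math import gcd


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions: 1 + sum((k - 1) * d)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def has_common_factor(dilations):
    """True if all dilation rates share a factor > 1, which guarantees gridding."""
    return reduce(gcd, dilations) > 1


for rates in ([1, 2, 5], [2, 4, 8]):
    print(rates, receptive_field(3, rates), has_common_factor(rates))
# [1, 2, 5] 17 False
# [2, 4, 8] 29 True
```

With 3x3 kernels, rates such as [1, 2, 5] enlarge the receptive field to 17 pixels while still covering every pixel in it, whereas [2, 4, 8] reaches 29 pixels but samples only every second one.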

41. A Lightweight Method for Small Object Recognition Models on UAVs Based on L-FPN.

Author: 魏昊坤, 刘敬一, 陈金勇, 楚博策, 孙裕鑫, and 朱进
Abstract: Copyright of Aero Weaponry is the property of Aero Weaponry Editorial Office.
Published: 2024
Full Text: View/download PDF

42. DFP-Net: A Crack Segmentation Method Based on a Feature Pyramid Network.

Author: Li, Linjing, Liu, Ran, Ali, Rashid, Chen, Bo, Lin, Haitao, Li, Yonglong, and Zhang, Hua
Subjects: PYRAMIDS, FEATURE extraction, IMAGE segmentation
Abstract: Timely detection of defects is essential for ensuring the safe and stable operation of concrete buildings. Automatic crack segmentation on concrete building surfaces is challenging because of the high diversity of crack appearance, the fine detail involved, and the imbalance between crack and background pixels. In this work, the Double Feature Pyramid Network is designed for high-precision crack segmentation. Our work reaches the state-of-the-art level in crack segmentation, with the following key contributions. First, considering the diversity of crack shapes, the network constructs a feature pyramid containing three feature extraction backbones that extract global feature maps from input images at three scales. In particular, because the biggest challenge is the large number of single-pixel-wide crack regions, a targeted high-resolution feature pyramid is added to extract adequate shallow semantic information. Lastly, a cascade feature fusion unit is designed to aggregate the extracted multi-dimensional feature maps and obtain the final prediction. Extensive experiments verify that this method outperforms existing crack detection methods, with a Pixel Accuracy of 65.99%, Intersection over Union of 44.71%, and Recall of 62.95%, providing a reliable and efficient solution for the health monitoring and maintenance of concrete structures. This work contributes to the advancement of research and practical applications in related fields. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
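
Because crack pixels are a small minority, the three reported metrics behave very differently. A minimal sketch of common definitions for the crack (positive) class on flattened binary masks; the paper may define "Pixel Accuracy" differently (for example, averaged per class), so the hypothetical helper below is an illustration rather than the authors' evaluation code.

```python
def crack_metrics(pred, target):
    """Pixel accuracy, IoU and recall for the positive (crack) class on flat 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = len(target) - tp - fp - fn
    return {
        "pixel_accuracy": (tp + tn) / len(target),
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 1.0,
        "recall": tp / (tp + fn) if tp + fn else 1.0,
    }


# 10 pixels, 2 of them crack: one crack pixel found, one missed, one false alarm.
pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
target = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(crack_metrics(pred, target))
# {'pixel_accuracy': 0.8, 'iou': 0.3333333333333333, 'recall': 0.5}
```

With overall pixel accuracy defined this way, background pixels dominate the score, which is why IoU and recall on the crack class are the more informative numbers for imbalanced masks.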

43. Pedestrian Detection Based on Feature Enhancement in Complex Scenes.

Author: Su, Jiao, An, Yi, Wu, Jialin, and Zhang, Kai
Subjects: *PEDESTRIANS, *TRANSPORTATION security measures, *COMPUTER vision, *KNOWLEDGE transfer, *PROBLEM solving
Abstract: Pedestrian detection has long been a difficult and popular topic in computer vision research. It also plays an important role in many applications, such as intelligent transportation and security monitoring. In complex scenes, pedestrian detection faces challenges such as low detection accuracy and misdetection caused by small target sizes and scale variations. To solve these problems, this paper proposes PT-YOLO, a pedestrian detection network based on YOLOv5. PT-YOLO consists of the YOLOv5 network, the squeeze-and-excitation (SE) module, the weighted bi-directional feature pyramid (BiFPN) module, the coordinate convolution (coordconv) module, and the wise intersection over union (WIoU) loss function. The SE module in the backbone lets it focus on important pedestrian features and improves accuracy. The weighted BiFPN module enhances multi-scale pedestrian feature fusion and information transfer, improving fusion efficiency. The prediction head uses the WIoU loss function to reduce regression error. The coordconv module lets the network better perceive location information in the feature map. Experimental results show that PT-YOLO is more accurate than other target detection methods for pedestrian detection and can effectively detect pedestrians in complex scenes. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
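
The weighted BiFPN fusion mentioned above comes from EfficientDet, where each fusion node combines its inputs with learnable non-negative weights using "fast normalized fusion": O = Σ_i w_i·I_i / (ε + Σ_j w_j), with each w_i passed through a ReLU. A minimal sketch on flattened, already resized feature maps; the weights below are fixed illustrative values rather than trained ones, the helper name is hypothetical, and PT-YOLO's exact BiFPN configuration is not described in the abstract.

```python
def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """Fuse same-shape feature maps (flat lists) with ReLU-clipped, normalized weights."""
    if len(inputs) != len(weights):
        raise ValueError("need one weight per input feature map")
    w = [max(0.0, x) for x in weights]  # ReLU keeps every weight non-negative
    norm = eps + sum(w)                 # eps avoids division by zero if all weights are 0
    return [sum(wi * feat[k] for wi, feat in zip(w, inputs)) / norm
            for k in range(len(inputs[0]))]


same_level = [1.0, 2.0]  # lateral feature at this pyramid level
from_above = [3.0, 4.0]  # resized feature from the neighbouring level
print(fast_normalized_fusion([same_level, from_above], [1.0, 1.0]))   # close to [2.0, 3.0]
print(fast_normalized_fusion([same_level, from_above], [1.0, -0.5]))  # negative weight clipped to 0
```

Compared with a softmax over the weights, this normalization avoids exponentials while still producing a weighted average that stays on the scale of the inputs.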

44. Traffic Sign Detection Algorithm Based on Improved YOLOv8s.

Author: Xiaoming Zhang and Ying Tian
Subjects: *TRAFFIC monitoring, *TRAFFIC signs & signals, *ALGORITHMS, *NETWORK performance, *LEARNING modules, *HUMAN fingerprints, *COORDINATES
Abstract: To address the low accuracy, false detections, missed detections, and poor real-time performance of current traffic sign detection, this paper proposes an improved traffic sign detection algorithm based on YOLOv8s. First, it proposes a double-layer semicomposite backbone network structure (DSCB), which uses an auxiliary backbone network to extract features and then passes them to the main backbone network to strengthen its ability to extract target features. Deformable convolution is also integrated into the DC2f structure of the auxiliary backbone network to improve the network's generalization. Second, a coordinate attention mechanism is applied after the SPPF layer; it better retains the coordinate position information of small targets, reducing the model's miss rate and increasing detection accuracy. Finally, a new CAB module is introduced to learn and aggregate the output of each feature pyramid layer for global spatial context, further enhancing feature representation. Experimental results show that the improved algorithm achieves 90.51% detection accuracy, an 82.00% recall rate, and 89.51% mAP@0.5 on the TT100K dataset, at 106 FPS. Compared with the original model, detection accuracy increases by 2.27%, recall by 2.48%, mAP@0.5 by 2.01%, and FPS by 1. The improved algorithm meets the requirements for both detection accuracy and real-time detection. [ABSTRACT FROM AUTHOR]
Published: 2024

45. Research on Dense Counting Combining Density Map Regression and Detection.

Author: 高洁, 赵心馨, 于健, 徐天一, 潘丽, 杨珺, 喻梅, and 李雪威
Abstract: Copyright of Journal of Frontiers of Computer Science & Technology is the property of Beijing Journal of Computer Engineering & Applications Journal Co Ltd.
Published: 2024
Full Text: View/download PDF

46. Fast Remote Sensing Image Object Detection Algorithm with Attention Feature Fusion (注意力特征融合的快速遥感图像目标检测算法).

Author: 吴建成, 郭荣佐, 成嘉伟, and 张浩
Abstract: Not available in this record. Copyright of Journal of Computer Engineering & Applications is the property of Beijing Journal of Computer Engineering & Applications Journal Co Ltd.; refer to the original published version for the full abstract.
Published: 2024
Full Text: View/download PDF

47. Swin-Fusion: Swin-Transformer with Feature Fusion for Human Action Recognition.

Author: Chen, Tiansheng and Mo, Lingfei
Subjects: CONVOLUTIONAL neural networks, FEATURE extraction, RECOGNITION (Psychology), TRANSFORMER models, IMAGE recognition (Computer vision), HUMAN activity recognition, COMPUTER vision
Abstract: Human action recognition based on still images is one of the most challenging computer vision tasks. Over the past decade, convolutional neural networks (CNNs) have developed rapidly and performed well on still-image human action recognition. However, because CNNs lack long-range perception, it is difficult for them to gain a global structural understanding of human behavior and of the overall relationship between the behavior and the environment. Recently, transformer-based models have made a splash in computer vision, reaching state-of-the-art results on several vision tasks. We explore the transformer's capability for still-image human action recognition by adding a simple but effective feature fusion module to the Swin-Transformer model. Specifically, we propose a new transformer-based model for behavioral feature extraction that uses a pre-trained Swin-Transformer as the backbone network. Swin-Transformer's distinctive hierarchical structure, combined with the feature fusion module, is used to extract and fuse multi-scale behavioral information. Extensive experiments were conducted on five still-image human action recognition datasets: Li's action dataset, the Stanford-40 dataset, the PPMI-24 dataset, the AUC-V1 dataset, and the AUC-V2 dataset. Results indicate that, by sharing and reusing feature maps of different scales at multiple stages, the proposed Swin-Fusion model recognizes behavior better than previously improved CNN-based models, without modifying the original backbone training method and while increasing training resources by only 1.6%. The code and models will be available at https://github.com/cts4444/Swin-Fusion. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

48. Recognition of Aggressive Behavior in Group-Housed Pigs Based on YOLOX-NGS (基于 YOLOX-NGS 的群养猪只攻击行为识别).

Author: 李艳文, 李菊霞, 纳腾潇, 智晴宇, 段磊, and 张朋鹏
Subjects: *PYRAMIDS, *SWINE, *ATTENTION
Abstract: Image data were collected at the Pig Breeding Base in Fenxi County, Linfen City, Shanxi Province, in July 2020. Nine 5-month-old fattening pigs were raised in a closed pig house. A Hikvision DS-2CD3345D-I camera, tilted downward at 60 degrees, recorded data under incandescent light. Compared with head-up and overhead views, this angle captured rich behavioral features of the pigs while avoiding large-scale occlusion. During data collection, the daily behavior videos were first observed repeatedly, and 185 video clips containing aggressive behavior were extracted; the inter-frame difference method was then used to extract key frames from these clips, and frames with slow pig movement or long rest periods were removed. An improved YOLOX model was proposed to identify typical attack behaviors in group-housed pigs, such as impact, ear biting, and tail biting, with high accuracy and effectiveness despite pig stacking and adhesion in complex pen environments. First, a Normalization-based Attention Module (NAM) was added to the YOLOX neck to obtain global information. Second, the IoU loss in YOLOX was replaced with GIoU loss to improve recognition accuracy. Finally, to improve real-time performance, feature extraction, and detection efficiency, the SPP feature pyramid structure was replaced with the lighter SPPF. Experiments showed that integrating the NAM module, replacing the loss with GIoU, and adopting the SPPF feature pyramid structure in the original backbone network improved the model's average accuracy by 2.50, 2.12, and 0.98 percentage points, respectively. The SPPF variant reduced the parameter size by 0.1 MB while improving accuracy by 0.98 percentage points, and integrating the NAM module had minimal impact on the parameter size. The average accuracy of the improved model rose from 90.77% to 97.57%, an increase of 6.80 percentage points, while the parameter size fell from a peak of 34.7 MB to 34.5 MB, a decrease of 0.2 MB. In addition, because pig attack behavior is continuous, recognition based on single-frame images has low credibility. Two optimization indicators, the proportion of attack activities (PAA) and the proportion of attack behavior (PAB), were therefore introduced to further confirm whether attack behavior occurred. With PAA and PAB thresholds of 0.2 and 0.4, respectively, recognition accuracy reached 98.55%. Video segments with frequent attacks were selected to verify the effectiveness of this optimization. The PAA and PAB thresholds had a significant impact on recognizing aggressive behavior: a threshold set too small easily misjudged frames without attack behavior as attacks, whereas a threshold set too large tended to treat frames in which an attack occurred as non-attack frames. The experimental results show that the improved YOLOX model, combined with PAA and PAB, achieves high-precision recognition of pig attack behavior. The findings can provide an effective reference and technical support for the intelligent health monitoring of group-housed pigs. [ABSTRACT FROM AUTHOR]
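The abstract replaces YOLOX's IoU loss with GIoU. As a hedged sketch of the standard GIoU definition, not the authors' code, the functions below compute GIoU and GIoU loss for axis-aligned boxes; the (x1, y1, x2, y2) box format and the function names are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def giou(a: Box, b: Box) -> float:
    """Generalized IoU: IoU minus the share of the enclosing box C not covered by the union."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # Intersection rectangle (zero area if the boxes do not overlap).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # Smallest axis-aligned box C enclosing both inputs.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c_area = cw * ch
    return inter / union - (c_area - union) / c_area


def giou_loss(pred: Box, target: Box) -> float:
    """GIoU loss in [0, 2): 0 for a perfect match."""
    return 1.0 - giou(pred, target)


if __name__ == "__main__":
    target = (0.0, 0.0, 1.0, 1.0)
    print(giou_loss(target, target))                # 0.0
    print(giou_loss((2.0, 0.0, 3.0, 1.0), target))  # ~1.333
    print(giou_loss((4.0, 0.0, 5.0, 1.0), target))  # 1.6
```

Unlike plain IoU loss, which stays at 1 for every non-overlapping pair, GIoU loss keeps growing as a disjoint prediction moves away from the target (4/3 versus 1.6 above), so even non-overlapping predictions receive a useful training signal.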
Published: 2023
Full Text: View/download PDF

49. End-to-End Entity Detection with Proposer and Regressor.

Author: Wen, Xueru, Zhou, Changjiang, Tang, Haotian, Liang, Luguang, Qi, Hong, and Jiang, Yu
Subjects: NATURAL language processing, SEMANTICS
Abstract: Named entity recognition is a traditional task in natural language processing. Nested entity recognition in particular receives extensive attention because nesting is widespread. The latest research transfers the well-established set-prediction paradigm from object detection to cope with entity nesting. However, these approaches are limited by manually created query vectors, which fail to adapt to the rich semantic information in the context. This paper presents an end-to-end entity detection approach with a proposer and a regressor to tackle this issue. First, the proposer uses a feature pyramid network to generate high-quality entity proposals. Then, the regressor refines the proposals to produce the final predictions. The model adopts an encoder-only architecture and thus benefits from rich query semantics, precise entity localization, and ease of training. Moreover, we introduce novel spatially modulated attention and progressive refinement for further improvement. Extensive experiments demonstrate that our model achieves advanced performance on flat and nested NER, achieving a new state-of-the-art F1 score of 80.74 on the GENIA dataset and 72.38 on the WeiboNER dataset. [ABSTRACT FROM AUTHOR]
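The proposer above is described only as using a feature pyramid network. The sketch below shows the generic FPN idea applied to a token sequence: a bottom-up pass halves the sequence length at each level, and a top-down pass upsamples coarser levels and adds them to finer ones through lateral connections. It is an illustration under stated assumptions, not the paper's proposer: scalar per-token features, max pooling for downsampling, nearest-neighbour upsampling, and identity lateral connections in place of the learned 1x1 convolutions and smoothing layers of a real FPN. All function names are hypothetical.

```python
from typing import List

Seq = List[float]  # one scalar feature per position (illustrative simplification)


def downsample(x: Seq) -> Seq:
    """Halve the length with stride-2 max pooling over adjacent pairs."""
    assert len(x) % 2 == 0, "length must be even at every level"
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


def upsample(x: Seq, n: int) -> Seq:
    """Nearest-neighbour upsampling of x to length n."""
    return [x[i * len(x) // n] for i in range(n)]


def bottom_up(x: Seq, levels: int) -> List[Seq]:
    """Build levels C1 (finest) .. C_levels (coarsest)."""
    feats = [x]
    for _ in range(levels - 1):
        feats.append(downsample(feats[-1]))
    return feats


def top_down(feats: List[Seq]) -> List[Seq]:
    """Merge coarse-to-fine: P_l = C_l + upsample(P_{l+1}), identity laterals."""
    outs = [feats[-1]]  # the coarsest level passes through unchanged
    for f in reversed(feats[:-1]):
        up = upsample(outs[0], len(f))
        outs.insert(0, [a + b for a, b in zip(f, up)])
    return outs


if __name__ == "__main__":
    tokens = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0]
    pyramid = top_down(bottom_up(tokens, levels=3))
    print(pyramid)
    # [[3.0, 2.0, 1.0, 1.0, 2.0, 2.0, 4.0, 6.0], [2.0, 1.0, 2.0, 4.0], [1.0, 2.0]]
```

Each output level mixes fine positional detail with context from coarser levels, which is the property that makes pyramid features useful for proposing spans of different lengths.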
Published: 2023
Full Text: View/download PDF

50. Vehicle and Pedestrian Detection Algorithm with Cross-Scale Adaptive Fusion (跨尺度自适应融合的车辆与行人检测算法).

Author: 李建东, 李佳琦, and 曲海成
Subjects: DEEP learning, PEDESTRIANS, PYRAMIDS
Abstract: Not available in this record. Copyright of Chinese Journal of Liquid Crystal & Displays is the property of Chinese Journal of Liquid Crystal & Displays; refer to the original published version for the full abstract.
Published: 2023
Full Text: View/download PDF
