352 results for "Multi-scale feature extraction"
Search Results
2. A dimension-enhanced residual multi-scale attention framework for identifying anomalous waveforms of fault recorders
- Author
- Jia, Lixin, Feng, Lihang, Wang, Dong, Jiang, Jiapeng, Wang, Guannan, and Shi, Jiantao
- Published
- 2025
- Full Text
- View/download PDF
3. Driver abnormal behavior detection enabled self-powered magnetic suspension hybrid wristband and AI for smart transportation
- Author
- Wu, Jiaoyi, Zhang, Hexiang, Xiao, Enzan, Liang, Tianshuang, Zou, Xiaolong, Sun, Jiantong, Fan, Chengliang, and Zhang, Zutao
- Published
- 2025
- Full Text
- View/download PDF
4. Dynamic Q&A multi-label classification based on adaptive multi-scale feature extraction
- Author
- Li, Ying, Li, Ming, Zhang, Xiaoyi, and Ding, Jin
- Published
- 2025
- Full Text
- View/download PDF
5. TSPCS-net: Two-stage pavement crack segmentation network based on encoder-decoder architecture
- Author
- Yue, Biao, Dang, Jianwu, Sun, Qi, Wang, Yangping, Min, Yongzhi, and Wang, Feng
- Published
- 2025
- Full Text
- View/download PDF
6. A High-Resolution Network for Runway Image Detection
- Author
- Zu, Zhaozi, Lei, Hongjie, Yang, Guoliang, Qu, Zhongjun, and Suo, Wenbo
- Published
- 2025
- Full Text
- View/download PDF
7. Image Super-Resolution with Multi-scale Hybrid Attention
- Author
- Wang, Ningzhi, Shi, Hanyi, Ruan, Wenna, and Zeng, Lingbin
- Published
- 2025
- Full Text
- View/download PDF
8. End-to-End Multi-Scale Adaptive Remote Sensing Image Dehazing Network.
- Author
- Wang, Xinhua, Yuan, Botao, Dong, Haoran, Hao, Qiankun, and Li, Zhuang
- Abstract
Satellites frequently encounter atmospheric haze during imaging, leading to the loss of detailed information in remote sensing images and significantly compromising image quality. This detailed information is crucial for applications such as Earth observation and environmental monitoring. In response to the above issues, this paper proposes an end-to-end multi-scale adaptive feature extraction method for remote sensing image dehazing (MSD-Net). In our network model, we introduce a dilated convolution adaptive module to extract global and local detail features of remote sensing images. The design of this module can extract important image features at different scales. The dilated convolutions enlarge the receptive field to capture broader contextual information, thereby obtaining a more global feature representation. At the same time, a self-adaptive attention mechanism is used, allowing the module to automatically adjust the size of its receptive field based on image content. In this way, important features suitable for different scales can be flexibly extracted to better adapt to the changes in details in remote sensing images. To fully utilize the features at different scales, we also adopted feature fusion technology. By fusing features from different scales, more accurate and rich feature representations can be obtained. This process aids in retrieving lost detailed information from remote sensing images, thereby enhancing the overall image quality. Extensive experiments were conducted on the HRRSD and RICE datasets, and the results showed that our proposed method can better restore the original details and texture information of remote sensing images in the field of dehazing and is superior to current state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
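The dilated-convolution module described in the MSD-Net abstract above follows a common pattern: parallel 3x3 convolutions with different dilation rates, concatenated and fused by a 1x1 convolution. The sketch below is a minimal generic illustration of that pattern; the branch count, channel sizes, and dilation rates are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of parallel dilated convolutions for multi-scale extraction.
# Dilation rates and channel sizes are assumptions for illustration.
import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        # One 3x3 branch per dilation rate; padding=dilation keeps the
        # spatial size constant so branch outputs can be concatenated.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        # 1x1 convolution fuses the concatenated multi-scale responses.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [self.act(b(x)) for b in self.branches]
        return self.act(self.fuse(torch.cat(feats, dim=1)))

# Example: a 64-channel feature map keeps its spatial resolution.
x = torch.randn(1, 64, 32, 32)
print(MultiScaleDilatedBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```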
9. A 3D semantic segmentation network for accurate neuronal soma segmentation.
- Author
- Ma, Li, Zhong, Qi, Wang, Yezi, Yang, Xiaoquan, and Du, Qian
- Abstract
Neuronal soma segmentation plays a crucial role in neuroscience applications. However, fine structures, such as boundaries, small-volume neuronal somata, and fibers, are commonly present in cell images, which poses a challenge for accurate segmentation. In this paper, we propose a 3D semantic segmentation network for neuronal soma segmentation to address this issue. Using an encoding-decoding structure, we introduce a Multi-Scale feature extraction and Adaptive Weighting fusion module (MSAW) after each encoding block. The MSAW module can not only emphasize the fine structures via an upsampling strategy, but also provide pixel-wise weights to measure the importance of the multi-scale features. Additionally, a dynamic convolution instead of normal convolution is employed to better adapt the network to input data with different distributions. The proposed MSAW-based semantic segmentation network (MSAW-Net) was evaluated on three neuronal soma images from mouse brain and one neuronal soma image from macaque brain, demonstrating the efficiency of the proposed method. It achieved an F1 score of 91.8% on the Fezf2-2A-CreER dataset, 97.1% on the LSL-H2B-GFP dataset, 82.8% on the Thy1-EGFP-Mline dataset, and 86.9% on the macaque dataset, achieving improvements over the 3D U-Net model by 3.1%, 3.3%, 3.9%, and 2.3%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
10. Multimodal sleep staging network based on obstructive sleep apnea.
- Author
- Fan, Jingxin, Zhao, Mingfu, Huang, Li, Tang, Bin, Wang, Lurui, He, Zhong, and Peng, Xiaoling
- Subjects
- SLEEP stages, SLEEP quality, SLEEP apnea syndromes, FEATURE extraction, TRANSFORMER models, DROWSINESS
- Abstract
Background: Automatic sleep staging is essential for assessing sleep quality and diagnosing sleep disorders. While previous research has achieved high classification performance, most current sleep staging networks have only been validated in healthy populations, ignoring the impact of Obstructive Sleep Apnea (OSA) on sleep stage classification. In addition, it remains challenging to effectively improve the fine-grained detection of polysomnography (PSG) and capture multi-scale transitions between sleep stages. Therefore, a more widely applicable network is needed for sleep staging. Methods: This paper introduces MSDC-SSNet, a novel deep learning network for automatic sleep stage classification. MSDC-SSNet transforms two channels of electroencephalogram (EEG) and one channel of electrooculogram (EOG) signals into time-frequency representations to obtain feature sequences at different temporal and frequency scales. An improved Transformer encoder architecture ensures temporal consistency and effectively captures long-term dependencies in EEG and EOG signals. The Multi-Scale Feature Extraction Module (MFEM) employs convolutional layers with varying dilation rates to capture spatial patterns from fine to coarse granularity. It adaptively fuses the weights of features to enhance the robustness of the model. Finally, multiple channel data are integrated to address the heterogeneity between different modalities effectively and alleviate the impact of OSA on sleep stages. Results: We evaluated MSDC-SSNet on three public datasets and our collection of PSG records of 17 OSA patients. It achieved an accuracy of 80.4% on the OSA dataset. It also outperformed the state-of-the-art methods in terms of accuracy, F1 score, and Cohen's Kappa coefficient on the remaining three datasets. Conclusion: The MSDC-SSNet multi-channel sleep staging architecture proposed in this study enhances widespread system applicability by supplementing inter-channel features. It employs multi-scale attention to extract transition rules between sleep stages and effectively integrates multimodal information. Our method addresses the limitations of single-channel approaches, enhancing interpretability for clinical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
11. A Full-Scale Shadow Detection Network Based on Multiple Attention Mechanisms for Remote-Sensing Images.
- Author
- Zhang, Lei, Zhang, Qing, Wu, Yu, Zhang, Yanfeng, Xiang, Shan, Xie, Donghai, and Wang, Zeyu
- Subjects
- REMOTE-sensing images, REMOTE sensing, FEATURE extraction, IMAGE analysis, TASK analysis
- Abstract
Shadows degrade image quality and complicate interpretation, underscoring the importance of accurate shadow detection for many image analysis tasks. However, due to the complex backgrounds and variable shadow characteristics of remote sensing images (RSIs), existing methods often struggle with accurately detecting shadows of various scales and misclassifying dark, non-shaded areas as shadows. To address these issues, we proposed a comprehensive shadow detection network called MAMNet. Firstly, we proposed a multi-scale spatial channel attention fusion module, which extracted multi-scale features incorporating both spatial and channel information, allowing the model to flexibly adapt to shadows of different scales. Secondly, to address the issue of false detection in non-shadow areas, we introduced a criss-cross attention module, enabling non-shadow pixels to be compared with other shadow and non-shadow pixels in the same row and column, learning similar features of pixels in the same category, which improved the classification accuracy of non-shadow pixels. Finally, to address the issue of important information from the other two modules being lost due to continuous upsampling during the decoding phase, we proposed an auxiliary branch module to assist the main branch in decision-making, ensuring that the final output retained the key information from all stages. The experimental results demonstrated that the model outperformed the current state-of-the-art RSI shadow detection method on the aerial imagery dataset for shadow detection (AISD). The model achieved an overall accuracy (OA) of 97.50%, an F1 score of 94.07%, an intersection over union (IOU) of 88.87%, a precision of 95.06%, and a BER of 4.05%. Additionally, visualization results indicated that our model could effectively detect shadows of various scales while avoiding false detection in non-shadow areas. Therefore, this model offers an efficient solution for shadow detection in aerial imagery. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Multi‐scale feature extraction for energy‐efficient object detection in remote sensing images.
- Author
- Wu, Di, Liu, Hongning, Xu, Jiawei, and Xie, Fei
- Subjects
- TRANSFORMER models, OBJECT recognition (Computer vision), REMOTE sensing, DEEP learning, TRAFFIC monitoring, FEATURE extraction
- Abstract
Object detection in remote sensing images aims to interpret images to obtain information on the category and location of potential targets, which is of great importance in traffic detection, marine supervision, and space reconnaissance. However, the complex backgrounds and large scale variations in remote sensing images present significant challenges. Traditional methods relied mainly on image filtering or feature descriptors to extract features, resulting in underperformance. Deep learning methods, especially one-stage detectors such as the Real-Time Object Detector (RTMDet), offer advanced solutions with efficient network architectures. Nevertheless, difficulty in extracting features from complex backgrounds and localising targets under large scale variations limits detection accuracy. In this paper, an improved detector based on RTMDet, called the Multi-Scale Feature Extraction-assist RTMDet (MRTMDet), is proposed, which addresses these limitations through enhanced feature extraction and fusion networks. At the core of MRTMDet are a new backbone network, MobileViT++, and a feature fusion network, SFC-FPN, which enhance the model's ability to capture global and multi-scale features through carefully designed hybrid CNN-transformer feature processing units based on the vision transformer (ViT) and poly-scale convolution (PSConv), respectively. Experiments on DIOR-R demonstrated that MRTMDet achieves a competitive performance of 62.2% mAP, balancing precision with a lightweight design. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Forecasting of Local Lightning Using Spatial–Channel-Enhanced Recurrent Convolutional Neural Network.
- Author
- Zhou, Wei, Li, Jinliang, Wang, Hongjie, Zhang, Donglai, and Wang, Xupeng
- Subjects
- CONVOLUTIONAL neural networks, RECURRENT neural networks, EMERGENCY management, NUMERICAL weather forecasting, DEEP learning
- Abstract
Lightning is a hazardous weather phenomenon, characterized by sudden occurrences and complex local distributions. It poses significant challenges for accurate forecasting, which is crucial for public safety and economic stability. Deep learning methods are often better than traditional numerical weather prediction (NWP) models at capturing the spatiotemporal predictors of lightning events. However, these methods struggle to integrate predictors from diverse data sources, which leads to lower accuracy and interpretability. To address these challenges, the Multi-Scale Spatial–Channel-Enhanced Recurrent Convolutional Neural Network (SCE-RCNN) is proposed to improve forecasting accuracy and timeliness by utilizing multi-source data and enhanced attention mechanisms. The proposed model incorporates a multi-scale spatial–channel attention module and a cross-scale fusion module, which facilitates the integration of data from diverse sources. The multi-scale spatial–channel attention module utilizes a multi-scale convolutional network to extract spatial features at different spatial scales and employs a spatial–channel attention mechanism to focus on the most relevant regions for lightning prediction. Experimental results show that the SCE-RCNN model achieved a critical success index (CSI) of 0.83, a probability of detection (POD) of 0.991, and a false alarm rate (FAR) reduced to 0.351, outperforming conventional deep learning models across multiple prediction metrics. This research provides reliable lightning forecasts to support real-time decision-making, making significant contributions to aviation safety, outdoor event planning, and disaster risk management. The model's high accuracy and low false alarm rate highlight its value in both academic research and practical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
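The CSI, POD, and FAR figures quoted in the abstract above are standard contingency-table forecast scores. Below is a minimal sketch of the usual definitions (not the paper's evaluation code), taking FAR in its common false-alarm-ratio form; the example counts are made up.

```python
# Standard contingency-table forecast scores: hits = correctly predicted
# events, misses = observed but not predicted, false_alarms = predicted
# but not observed. Generic formulas, not the paper's code.

def forecast_scores(hits: int, misses: int, false_alarms: int):
    pod = hits / (hits + misses)                 # probability of detection
    far = false_alarms / (hits + false_alarms)   # false alarm ratio
    csi = hits / (hits + misses + false_alarms)  # critical success index
    return csi, pod, far

# Example with made-up counts: 90 hits, 10 misses, 15 false alarms.
csi, pod, far = forecast_scores(90, 10, 15)
print(f"CSI={csi:.3f} POD={pod:.3f} FAR={far:.3f}")
```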
14. MCADNet: A Multi-Scale Cross-Attention Network for Remote Sensing Image Dehazing.
- Author
- Tao, Tao, Xu, Haoran, Guan, Xin, and Zhou, Hao
- Subjects
- RESEMBLANCE (Philosophy), REMOTE sensing, DATA mining, STRUCTURAL colors, FEATURE extraction
- Abstract
Remote sensing image dehazing (RSID) aims to remove haze from remote sensing images to enhance their quality. Although existing deep learning-based dehazing methods have made significant progress, it is still difficult to completely remove the uneven haze, which often leads to color or structural differences between the dehazed image and the original image. In order to overcome this difficulty, we propose the multi-scale cross-attention dehazing network (MCADNet), which offers a powerful solution for RSID. MCADNet integrates multi-kernel convolution and a multi-head attention mechanism into the U-Net architecture, enabling effective multi-scale information extraction. Additionally, we replace traditional skip connections with a cross-attention-based gating module, enhancing feature extraction and fusion across different scales. This synergy enables the network to maximize the overall similarity between the restored image and the real image while also restoring the details of the complex texture areas in the image. We evaluate MCADNet on two benchmark datasets, Haze1K and RICE, demonstrating its superior performance. Ablation experiments further verify the importance of our key design choices in enhancing dehazing effectiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. A multi-scale feature extraction and feature selection network for radiation source identification.
- Author
- 张顺生, 丁宦城, and 王文钦
- Published
- 2024
- Full Text
- View/download PDF
16. MFPIDet: improved YOLOV7 architecture based on multi-scale feature fusion for prohibited item detection in complex environment.
- Author
- Zhang, Lang, Huang, Zhan Ao, Shi, Canghong, Ma, Hongjiang, Li, Xiaojie, and Wu, Xi
- Subjects
- DEEP learning, FEATURE extraction, PUBLIC spaces, RECOMMENDER systems, PUBLIC safety
- Abstract
Prohibited item detection is crucial for the safety of public places. Deep learning, one of the mainstream methods in prohibited item detection tasks, has shown superior performance far beyond traditional prohibited item detection methods. However, most neural network architectures in deep learning still lack sufficient local feature representation ability for overlapping and small targets, and ignore the problem of semantic conflicts caused by direct feature fusion. In this paper, we propose MFPIDet, a novel prohibited item detection neural network architecture based on improved YOLOV7 to achieve reliable prohibited item detection in complex environments. Specifically, a multi-scale attention module (MAM) backbone is proposed to filter the redundant information of target regions and further applied to enhance the local feature representation ability of overlapping objects. Here, to reduce the redundant information of target regions, a squeeze-excitation (SE) block is used to filter the background. Then, aiming at enhancing the feature expression ability of overlapping objects, a multi-scale feature extraction module (MFEM) is designed for local feature representation. In addition, to obtain richer context information, we design an adaptive fusion feature pyramid network (AF-FPN) that combines the adaptive context information fusion module (ACIFM) with the feature fusion module (FFM) to improve the neck structure of YOLOV7. The proposed method is validated on the PIDray dataset, and the test results showed that our method obtained the highest mAP (68.7%), an improvement of 3.5% over YOLOV7. Our approach provides a new design pattern for prohibited item detection in complex environments and shows the development potential of deep learning in related fields. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
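The squeeze-excitation (SE) block mentioned in the MFPIDet abstract above is a standard channel-attention unit. A minimal sketch follows, assuming the common reduction ratio of 16 from the original SE paper rather than MFPIDet's actual setting.

```python
# Minimal squeeze-and-excitation (SE) block: global pooling summarizes each
# channel, a small bottleneck MLP produces per-channel gates in (0, 1).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global context
        self.fc = nn.Sequential(                 # excitation: channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # reweight channels

x = torch.randn(2, 64, 40, 40)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 40, 40])
```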
17. MSG-YOLO: A Lightweight Detection Algorithm for Clubbing Finger Detection.
- Author
- Wang, Zhijie, Meng, Qiao, Tang, Feng, Qi, Yuelin, Li, Bingyu, Liu, Xin, Kong, Siyuan, and Li, Xin
- Subjects
- FEATURE extraction, THERAPEUTICS, ALGORITHMS, DIAGNOSIS
- Abstract
Clubbing finger is a significant clinical indicator, and its early detection is essential for the diagnosis and treatment of associated diseases. However, traditional diagnostic methods rely heavily on the clinician's subjective assessment, which can be prone to biases and may lack standardized tools. Unlike other diagnostic challenges, the characteristic changes of clubbing finger are subtle and localized, necessitating high-precision feature extraction. Existing models often fail to capture these delicate changes accurately, potentially missing crucial diagnostic features or generating false positives. Furthermore, these models are often not suited for accurate clinical diagnosis in resource-constrained settings. To address these challenges, we propose MSG-YOLO, a lightweight clubbing finger detection model based on YOLOv8n, designed to enhance both detection accuracy and efficiency. The model first employs a multi-scale dilated residual module, which expands the receptive field using dilated convolutions and residual connections, thereby improving the model's ability to capture features across various scales. Additionally, we introduce a Selective Feature Fusion Pyramid Network (SFFPN) that dynamically selects and enhances critical features, optimizing the flow of information while minimizing redundancy. To further refine the architecture, we reconstruct the YOLOv8 detection head with group normalization and shared-parameter convolutions, significantly reducing the model's parameter count and increasing computational efficiency. Experimental results indicate that the model maintains high detection accuracy with reduced parameter and computational requirements. Compared to YOLOv8n, MSG-YOLO achieves a 48.74% reduction in parameter count and a 24.17% reduction in computational load, while improving the mAP0.5 score by 2.86%, reaching 93.64%. This algorithm strikes a balance between accuracy and lightweight design, offering efficient and reliable clubbing finger detection even in resource-constrained environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Robotic Grasping Detection Algorithm Based on 3D Vision Dual-Stream Encoding Strategy.
- Author
- Lei, Minglin, Wang, Pandong, Lei, Hua, Ma, Jieyun, Wu, Wei, and Hao, Yongtao
- Subjects
- TRANSFORMER models, COMPUTER vision, LONG-distance relationships, FEATURE extraction, GEOMETRIC shapes, PREHENSION (Physiology), POSE estimation (Computer vision)
- Abstract
The automatic generation of stable robotic grasping postures is crucial for the application of computer vision algorithms in real-world settings. This task becomes especially challenging in complex environments, where accurately identifying the geometric shapes and spatial relationships between objects is essential. To enhance the capture of object pose information in 3D visual scenes, we propose a planar robotic grasping detection algorithm named SU-Grasp, which simultaneously focuses on local regions and long-distance relationships. Built upon a U-shaped network, SU-Grasp introduces a novel dual-stream encoding strategy using the Swin Transformer combined with spatial semantic enhancement. Compared to existing baseline methods, our algorithm achieves superior performance across public datasets, simulation tests, and real-world scenarios, highlighting its robust understanding of complex spatial environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. An Improved Multi-Scale Feature Extraction Network for Rice Disease and Pest Recognition.
- Author
- Lv, Pengtao, Xu, Heliang, Zhang, Yana, Zhang, Qinghui, Pan, Quan, Qin, Yao, Chen, Youyang, Cao, Dengke, Wang, Jingping, Zhang, Mengya, and Chen, Cong
- Subjects
- RICE diseases & pests, IMAGE recognition (Computer vision), FOOD security, FEATURE extraction, AGRICULTURE
- Abstract
Simple Summary: Rice is one of the most important sources of food for humans. However, rice production is frequently threatened by pests and diseases, resulting in significant losses. In this study, we developed a model that can assist agricultural practitioners in accurately identifying different types of rice pests and diseases. Our model can accurately identify seven different categories of rice pests and diseases, which enables agriculturalists to promptly identify the causes of crop damage and take appropriate measures to protect their crops. We hope that the application of this technology will reduce global rice losses and alleviate the problem of the global food crisis. In the process of rice production, rice pests are one of the main factors that cause rice yield reduction. To implement prevention and control measures, it is necessary to accurately identify the types of rice pests and diseases. However, the application of image recognition technologies focused on the agricultural field, especially in the field of rice disease and pest identification, is relatively limited. Existing research on rice diseases and pests has problems such as single data types, low data volume, and low recognition accuracy. Therefore, we constructed the rice pest and disease dataset (RPDD), which was expanded through data enhancement methods. Then, based on the ResNet structure and the convolutional attention mechanism module, we proposed a Lightweight Multi-scale Feature Extraction Network (LMN) to extract multi-scale features at a finer granularity. The proposed LMN model achieved an average classification accuracy of 95.38% and an F1-Score of 94.5% on the RPDD. The parameter size of the model is 1.4 M, and the FLOPs is 1.65 G. The results suggest that the LMN model performs rice disease and pest classification tasks more effectively than the baseline ResNet model by significantly reducing the model size and improving accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Cross Attention-Based Multi-Scale Convolutional Fusion Network for Hyperspectral and LiDAR Joint Classification.
- Author
- Ge, Haimiao, Wang, Liguo, Pan, Haizhu, Liu, Yanzhong, Li, Cheng, Lv, Dan, and Ma, Huiyu
- Subjects
- CONVOLUTIONAL neural networks, OPTICAL radar, LIDAR, LAND cover, FEATURE extraction, DEEP learning
- Abstract
In recent years, deep learning-based multi-source data fusion, e.g., hyperspectral image (HSI) and light detection and ranging (LiDAR) data fusion, has gained significant attention in the field of remote sensing. However, the traditional convolutional neural network fusion techniques always provide poor extraction of discriminative spatial–spectral features from diversified land covers and overlook the correlation and complementarity between different data sources. Furthermore, the mere act of stacking multi-source feature embeddings fails to represent the deep semantic relationships among them. In this paper, we propose a cross attention-based multi-scale convolutional fusion network for HSI-LiDAR joint classification. It contains three major modules: spatial–elevation–spectral convolutional feature extraction module (SESM), cross attention fusion module (CAFM), and classification module. In the SESM, improved multi-scale convolutional blocks are utilized to extract features from HSI and LiDAR to ensure discriminability and comprehensiveness in diversified land cover conditions. Spatial and spectral pseudo-3D convolutions, pointwise convolutions, residual aggregation, one-shot aggregation, and parameter-sharing techniques are implemented in the module. In the CAFM, a self-designed local-global cross attention block is utilized to collect and integrate relationships of the feature embeddings and generate joint semantic representations. In the classification module, average pooling, dropout, and linear layers are used to map the fused semantic representations to the final classification results. The experimental evaluations on three public HSI-LiDAR datasets demonstrate the competitiveness of the proposed network in comparison with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
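Cross-attention fusion of two modalities, as in the abstract above, is often built on standard multi-head attention with one modality supplying queries and the other keys and values. The sketch below is a generic single-block illustration under assumed dimensions; the paper's local-global cross attention block is self-designed and more elaborate.

```python
# Generic cross-attention fusion: HSI tokens attend to LiDAR tokens.
# Dimensions and the single residual block are illustrative assumptions.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, hsi_tokens, lidar_tokens):
        # hsi_tokens: (B, N, dim) queries; lidar_tokens: (B, M, dim) keys/values.
        fused, _ = self.attn(hsi_tokens, lidar_tokens, lidar_tokens)
        return self.norm(hsi_tokens + fused)     # residual + normalization

hsi = torch.randn(2, 49, 64)
lidar = torch.randn(2, 49, 64)
print(CrossAttentionFusion(64)(hsi, lidar).shape)  # torch.Size([2, 49, 64])
```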
21. Adversarial subdomain adaptation method based on multi-scale features for bearing fault diagnosis.
- Author
- Zhou, Yuguo, Jin, Zhao, Zhang, Zhikai, Geng, Zengrong, and Zhou, Lijian
- Subjects
- FEATURE extraction, WAVELET transforms, DISTRIBUTION (Probability theory), DIAGNOSIS methods, PROBLEM solving, FAULT diagnosis
- Abstract
Due to the variable working environment of bearings, the collected data often follow different probability distributions, so it is hard to directly use trained models to identify bearing faults under different operating conditions. In addition, labelling samples for every working condition is costly. To solve these problems, a multi-scale adversarial subdomain adaptation bearing fault diagnosis method is proposed, which is based on Continuous Wavelet Transform (CWT) and our constructed Multi-scale Adversarial SubDomain Adaptation Network (MASDAN). Firstly, to extract the features of non-stationary signals with different frequency bands, CWT is used to convert continuous vibration signals into two-dimensional time-frequency images. Secondly, to enhance the correlation of the features across frequency bands, a multi-scale ConvNeXt is proposed, which adds a multi-scale module based on ConvNeXt to obtain features with different scales. Finally, to reduce the distribution discrepancy between the source domain and the target domain and to avoid feature confusion in different domains, a domain adaptive alignment network introducing domain information is constructed. Two modules are included: the Domain Alignment Classification Network Module (DACNM) based on Multi-kernel Local Maximum Mean Discrepancy (MK-LMMD) and the Domain Adversarial Network Module (DANM) based on domain discrimination. Thus, the MASDAN consists of the multi-scale ConvNeXt module, DACNM, and DANM, which can realize multi-scale feature extraction and adaptively align the fault features across domains. The experimental results on the Qingdao University of Technology (QUT) bearing dataset and the Case Western Reserve University (CWRU) bearing dataset demonstrate that the proposed method can effectively diagnose bearing faults under different operating conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
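The MK-LMMD term mentioned above builds on maximum mean discrepancy (MMD) computed with a bank of RBF kernels. The sketch below shows plain multi-kernel MMD between source and target feature batches; the bandwidths are illustrative assumptions, and the paper's local (class-conditional) weighting is omitted.

```python
# Multi-kernel MMD: squared discrepancy between the kernel mean embeddings
# of two feature batches, summed over a bank of RBF bandwidths. Biased
# estimator for brevity; bandwidths are illustrative.
import torch

def multi_kernel_mmd(x: torch.Tensor, y: torch.Tensor,
                     bandwidths=(1.0, 2.0, 4.0, 8.0)) -> torch.Tensor:
    # x: (n, d) source features, y: (m, d) target features.
    z = torch.cat([x, y], dim=0)
    d2 = torch.cdist(z, z).pow(2)            # pairwise squared distances
    k = sum(torch.exp(-d2 / (2 * b ** 2)) for b in bandwidths)
    n = x.size(0)
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

src = torch.randn(64, 128)          # e.g., source-domain bearing features
tgt = torch.randn(64, 128) + 0.5    # shifted target-domain features
print(multi_kernel_mmd(src, tgt))   # grows with the domain gap
```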
22. CCE-UNet: Forest and Water Body Coverage Detection Method Based on Deep Learning: A Case Study in Australia's Nattai National Forest.
- Author
- Huang, Bangjun, Yi, Xiaomei, Mo, Lufeng, Wang, Guoying, and Wu, Peng
- Subjects
- BODIES of water, ECOLOGICAL restoration monitoring, FOREST protection, NATURAL disasters, FOREST fires, FOREST monitoring
- Abstract
Severe forest fires caused by extremely high temperatures have resulted in devastating disasters in the natural forest reserves of New South Wales, Australia. Traditional forest research methods primarily rely on manual field surveys, which have limited generalization capabilities. In order to monitor forest ecosystems more comprehensively and maintain the stability of the regional forest ecosystem, as well as to monitor post-disaster ecological restoration efforts, this study employed high-resolution remote sensing imagery and proposed a semantic segmentation architecture named CCE-UNet. This architecture focuses on the precise identification of forest coverage while simultaneously monitoring the distribution of water resources in the area. This architecture utilizes the Contextual Information Fusion Module (CIFM) and introduces the dual attention mechanism strategy to effectively filter background information and enhance image edge features. Meanwhile, it employs a multi-scale feature fusion algorithm to maximize the retention of image details and depth information, achieving precise segmentation of forests and water bodies. We have also trained seven semantic segmentation models as candidates. Experimental results show that the CCE-UNet architecture achieves the best performance, demonstrating optimal performance in forest and water body segmentation tasks, with the MIoU reaching 91.07% and the MPA reaching 95.15%. This study provides strong technical support for the detection of forest and water body coverage in the region and is conducive to the monitoring and protection of the forest ecosystem. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Remote sensing image Super-resolution reconstruction by fusing multi-scale receptive fields and hybrid transformer
- Author
- Denghui Liu, Lin Zhong, Haiyang Wu, Songyang Li, and Yida Li
- Subjects
- Remote sensing image, Image Super-resolution, GAN, Attention mechanism, Hybrid transformer, Multi-scale feature extraction, Medicine, Science
- Abstract
To enhance high-frequency perceptual information and texture details in remote sensing images and address the challenges of super-resolution reconstruction algorithms during training, particularly the issue of missing details, this paper proposes an improved remote sensing image super-resolution reconstruction model. The generator network of the model employs multi-scale convolutional kernels to extract image features and utilizes a multi-head self-attention mechanism to dynamically fuse these features, significantly improving the ability to capture both fine details and global information in remote sensing images. Additionally, the model introduces a multi-stage Hybrid Transformer structure, which processes features at different resolutions progressively, from low resolution to high resolution, substantially enhancing reconstruction quality and detail recovery. The discriminator combines multi-scale convolution, global Transformer, and hierarchical feature discriminators, providing a comprehensive and refined evaluation of image quality. Finally, the model incorporates a Charbonnier loss function and total variation (TV) loss function, which significantly improve training stability and accelerate convergence. Experimental results demonstrate that the proposed method, compared to the SRGAN algorithm, achieves average improvements of approximately 3.61 dB in Peak Signal-to-Noise Ratio (PSNR), 0.070 (8.2%) in Structural Similarity Index (SSIM), and 0.030 (3.1%) in Feature Similarity Index (FSIM) across multiple datasets, showing significant performance gains.
- Published
- 2025
- Full Text
- View/download PDF
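The Charbonnier and total-variation losses named in the abstract above have standard textbook forms. A minimal sketch follows, with the epsilon constant, the anisotropic TV variant, and the loss weighting chosen as common conventions rather than taken from the paper.

```python
# Charbonnier loss: a smooth, outlier-robust relative of L1.
# TV loss: penalizes differences between neighboring pixels to
# encourage spatially smooth reconstructions.
import torch

def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-3) -> torch.Tensor:
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def tv_loss(img: torch.Tensor) -> torch.Tensor:
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()  # vertical diffs
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()  # horizontal diffs
    return dh + dw

sr = torch.rand(1, 3, 64, 64, requires_grad=True)  # super-resolved output
hr = torch.rand(1, 3, 64, 64)                      # ground-truth image
loss = charbonnier_loss(sr, hr) + 1e-4 * tv_loss(sr)  # assumed TV weight
loss.backward()
print(loss.item())
```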
24. TMFN: a text-based multimodal fusion network with multi-scale feature extraction and unsupervised contrastive learning for multimodal sentiment analysis
- Author
- Junsong Fu, Youjia Fu, Huixia Xue, and Zihao Xu
- Subjects
- Multimodal sentiment analysis, Multi-scale feature extraction, Multimodal data fusion, Transformer, Unsupervised contrastive learning, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
Multimodal sentiment analysis (MSA) is crucial in human-computer interaction. Current methods use simple sub-models for feature extraction, neglecting multi-scale features and the complexity of emotions. Text, visual, and audio each have unique characteristics in MSA, with text often providing more emotional cues due to its rich semantics. However, current approaches treat modalities equally, not maximizing text's advantages. To solve these problems, we propose a novel method named a text-based multimodal fusion network with multi-scale feature extraction and unsupervised contrastive learning (TMFN). Firstly, we propose an innovative pyramid-structured multi-scale feature extraction method, which captures the multi-scale features of modal data through convolution kernels of different sizes and strengthens key features through a channel attention mechanism. Second, we design a text-based multimodal feature fusion module, which consists of a text gating unit (TGU) and a text-based channel-wise attention transformer (TCAT). TGU is responsible for guiding and regulating the fusion process of other modal information, while TCAT improves the model's ability to capture the relationship between features of different modalities and achieves effective feature interaction. Finally, to further optimize the representation of fused features, we introduce unsupervised contrastive learning to deeply explore the intrinsic connection between multi-scale features and fused features. Experimental results show that our proposed model outperforms the state-of-the-art models in MSA on two benchmark datasets.
- Published
- 2025
- Full Text
- View/download PDF
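The pyramid-style extraction described above, convolution kernels of several sizes plus a channel-attention gate over the fused result, can be sketched generically for a token sequence. The kernel sizes and gating design below are assumptions for illustration, not TMFN's published configuration.

```python
# Multi-kernel 1D convolutions over a token sequence, concatenated and
# gated by a simple channel-attention bottleneck. Sizes are illustrative.
import torch
import torch.nn as nn

class MultiScaleTextBlock(nn.Module):
    def __init__(self, dim: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes
        )
        out = dim * len(kernel_sizes)
        self.gate = nn.Sequential(   # channel attention over fused features
            nn.Linear(out, out // 4), nn.ReLU(inplace=True),
            nn.Linear(out // 4, out), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, dim) token embeddings; Conv1d expects (B, dim, T).
        h = torch.cat([c(x.transpose(1, 2)) for c in self.convs], dim=1)
        w = self.gate(h.mean(dim=-1))     # pool over time, gate channels
        return h * w.unsqueeze(-1)        # (B, 3*dim, T)

tokens = torch.randn(2, 20, 128)
print(MultiScaleTextBlock(128)(tokens).shape)  # torch.Size([2, 384, 20])
```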
25. A hierarchical multi-label text classification method based on multi-scale feature extraction.
- Author
- 武子轩, 王烨, and 于洪
- Published
- 2025
- Full Text
- View/download PDF
26. 3T dilated inception network for enhanced autism spectrum disorder diagnosis using resting-state fMRI data.
- Author
- Kavitha, V. and Siva, R.
- Abstract
Autism spectrum disorder (ASD) is one of the complicated neurodevelopmental disorders that impacts the daily functioning and social interactions of individuals. It includes diverse symptoms and severity levels, making it challenging to diagnose and treat efficiently. Various deep learning (DL) based methods have been developed for diagnosing ASD, which rely heavily on behavioral assessment. However, existing techniques have suffered from poor diagnostic outcomes, higher computational complexity, and overfitting issues. To address these challenges, this research work introduces an innovative framework called 3T Dilated Inception Network (3T-DINet) for effective ASD diagnosis using resting-state functional Magnetic Resonance Imaging (rs-fMRI) images. The proposed 3T-DINet technique designs a 3T dilated inception module that incorporates dilated convolutions along with the inception module, allowing it to extract multi-scale features from brain connectivity patterns. The 3T dilated inception module uses three distinct dilation rates (low, medium, and high) in parallel to determine local, mid-level, and global features from the brain. In addition, the proposed approach implements Residual networks (ResNet) to avoid the vanishing gradient problem and enhance the feature extraction ability. The model is further optimized using a Crossover-based Black Widow Optimization (CBWO) algorithm that fine-tunes the hyperparameters, thereby enhancing the overall performance of the model. Further, the performance of the 3T-DINet model is evaluated on five ASD datasets with distinct evaluation parameters. The proposed 3T-DINet technique achieved superior diagnosis results compared to recent previous works. This validation makes clear that 3T-DINet can contribute substantially to early ASD diagnosis and enhance patient treatment outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
27. TMFN: a text-based multimodal fusion network with multi-scale feature extraction and unsupervised contrastive learning for multimodal sentiment analysis.
- Author
- Fu, Junsong, Fu, Youjia, Xue, Huixia, and Xu, Zihao
- Subjects
- ARTIFICIAL intelligence, COGNITIVE psychology, SENTIMENT analysis, IMAGE processing, COGNITIVE analysis
- Abstract
Multimodal sentiment analysis (MSA) is crucial in human-computer interaction. Current methods use simple sub-models for feature extraction, neglecting multi-scale features and the complexity of emotions. Text, visual, and audio each have unique characteristics in MSA, with text often providing more emotional cues due to its rich semantics. However, current approaches treat modalities equally, not maximizing text's advantages. To solve these problems, we propose a novel method named a text-based multimodal fusion network with multi-scale feature extraction and unsupervised contrastive learning (TMFN). Firstly, we propose an innovative pyramid-structured multi-scale feature extraction method, which captures the multi-scale features of modal data through convolution kernels of different sizes and strengthens key features through channel attention mechanism. Second, we design a text-based multimodal feature fusion module, which consists of a text gating unit (TGU) and a text-based channel-wise attention transformer (TCAT). TGU is responsible for guiding and regulating the fusion process of other modal information, while TCAT improves the model's ability to capture the relationship between features of different modalities and achieves effective feature interaction. Finally, to further optimize the representation of fused features, we introduce unsupervised contrastive learning to deeply explore the intrinsic connection between multi-scale features and fused features. Experimental results show that our proposed model outperforms the state-of-the-art models in MSA on two benchmark datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
28. MSDCNet: Multi-stage and deep residual complementary multi-focus image fusion network based on multi-scale feature learning
- Author
- Hu, Gang, Jiang, Jinlin, Sheng, Guanglei, and Wei, Guo
- Abstract
Addressing the boundary blurring problem in focus and out-of-focus regions is a key area of research in multifocus image fusion. Effective utilization of multi-scale modules is essential for enhancing performance. Therefore, this paper proposes a multi-stage feature extraction and deep residual complementary multifocus image fusion network. In the feature extraction stage, the V-shaped connection module captures the main objects and contours of the image. The feature thinning extraction module uses extended convolution to learn image details and refine textures at multiple scales. The advanced feature texture enhancement module targets boundary blurring regions, enhancing texture details and improving fusion quality. Asymmetric convolution reduces the network’s computational burden, improving feature learning efficiency. The fusion strategy uses a compound loss function to ensure image quality and prevent color distortion. The image reconstruction module uses residual connections with different-sized convolution kernels to maintain feature consistency and improve image quality. The network utilizes a dual-path Pseudo-Siamese structure, which handles image focus and defocus regions separately. Experimental results demonstrate the algorithm’s effectiveness. On the Lytro dataset, it achieves AG and EI metric values of 6.9 and 72.5, respectively, outperforming other methods. Fusion metrics SD = 61.80, SF = 19.63, and VIF = 0.94 surpass existing algorithms, effectively resolving the boundary blurring problem and providing better visual perception and broader applicability. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
29. A differential network with multiple gated reverse attention for medical image segmentation
- Author
- Shun Yan, Benquan Yang, and Aihua Chen
- Subjects
- Medical image segmentation, Multi-scale feature extraction, Differential feature, Medicine, Science
- Abstract
UNet architecture has achieved great success in medical image segmentation applications. However, these models still encounter several challenges. One is the loss of pixel-level information caused by multiple down-sampling steps. Additionally, the addition or concatenation method used in the decoder can generate redundant information. These limitations affect the localization ability, weaken the complementarity of features at different levels and can lead to blurred boundaries. However, differential features can effectively compensate for these shortcomings and significantly enhance the performance of image segmentation. Therefore, we propose MGRAD-UNet (multi-gated reverse attention multi-scale differential UNet) based on UNet. We utilize the multi-scale differential decoder to generate abundant differential features at both the pixel level and structure level. These features, which serve as gate signals, are transmitted to the gate controller and forwarded to the other differential decoder. In order to enhance the focus on important regions, another differential decoder is equipped with reverse attention. The features obtained by two differential decoders are differentiated for the second time. The resulting differential feature obtained is sent back to the controller as a control signal, then transmitted to the encoder for learning the differential feature by two differential decoders. The core design of MGRAD-UNet lies in extracting comprehensive and accurate features through caching overall differential features and multi-scale differential processing, enabling iterative learning from diverse information. We evaluate MGRAD-UNet against state-of-the-art (SOTA) methods on two public datasets. Our method surpasses competitors and provides a new approach for the design of UNet.
- Published
- 2024
- Full Text
- View/download PDF
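Reverse attention, used above to sharpen focus on important regions, is commonly implemented by weighting features with the complement of a coarse prediction. The sketch below shows that generic formulation only; MGRAD-UNet's multi-gated variant with differential decoders is considerably more involved.

```python
# Generic reverse attention: reweight features by the complement of a
# coarse side-output prediction so uncertain regions receive more focus.
import torch

def reverse_attention(features: torch.Tensor,
                      coarse_logits: torch.Tensor) -> torch.Tensor:
    # features: (B, C, H, W); coarse_logits: (B, 1, H, W) side output.
    attn = 1.0 - torch.sigmoid(coarse_logits)  # emphasize uncertain areas
    return features * attn                     # broadcast over channels

feats = torch.randn(1, 32, 56, 56)
logits = torch.randn(1, 1, 56, 56)
print(reverse_attention(feats, logits).shape)  # torch.Size([1, 32, 56, 56])
```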
30. MFPIDet: improved YOLOV7 architecture based on multi-scale feature fusion for prohibited item detection in complex environment
- Author
- Lang Zhang, Zhan Ao Huang, Canghong Shi, Hongjiang Ma, Xiaojie Li, and Xi Wu
- Subjects
- Prohibited item detection, MFPIDet, Multi-scale feature extraction, Adaptive context information fusion, Attention mechanism, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
Prohibited item detection is crucial for the safety of public places. Deep learning, one of the mainstream methods in prohibited item detection tasks, has shown superior performance far beyond traditional prohibited item detection methods. However, most neural network architectures in deep learning still lack sufficient local feature representation ability for overlapping and small targets, and ignore the problem of semantic conflicts caused by direct feature fusion. In this paper, we propose MFPIDet, a novel prohibited item detection neural network architecture based on improved YOLOV7 to achieve reliable prohibited item detection in complex environments. Specifically, a multi-scale attention module (MAM) backbone is proposed to filter the redundant information of target regions and further applied to enhance the local feature representation ability of overlapping objects. Here, to reduce the redundant information of target regions, a squeeze-excitation (SE) block is used to filter the background. Then, aiming at enhancing the feature expression ability of overlapping objects, a multi-scale feature extraction module (MFEM) is designed for local feature representation. In addition, to obtain richer context information, we design an adaptive fusion feature pyramid network (AF-FPN) that combines the adaptive context information fusion module (ACIFM) with the feature fusion module (FFM) to improve the neck structure of YOLOV7. The proposed method is validated on the PIDray dataset, and the test results showed that our method obtained the highest mAP (68.7%), an improvement of 3.5% over YOLOV7. Our approach provides a new design pattern for prohibited item detection in complex environments and shows the development potential of deep learning in related fields.
- Published
- 2024
- Full Text
- View/download PDF
31. Remote sensing image cloud removal based on multi-scale spatial information perception.
- Author
- Dou, Aozhe, Hao, Yang, Liu, Weifeng, Li, Liangliang, Wang, Zhenzhong, and Liu, Baodi
- Abstract
Remote sensing imagery is indispensable in diverse domains, including geographic information systems, climate monitoring, agricultural planning, and disaster management. Nonetheless, cloud cover can drastically degrade the utility and quality of these images. Current deep learning-based cloud removal methods rely on convolutional neural networks to extract features at the same scale, which can overlook detailed and global information, resulting in suboptimal cloud removal performance. To overcome these challenges, we develop a method for cloud removal that leverages multi-scale spatial information perception. Our technique employs convolution kernels of various sizes, enabling the integration of both global semantic information and local detail information. An attention mechanism enhances this process by targeting key areas within the images, and dynamically adjusting channel weights to improve feature reconstruction. We compared our method with current popular cloud removal methods across three datasets, and the results show that our proposed method improves metrics such as PSNR, SSIM, and cosine similarity, verifying the effectiveness of our method in cloud removal. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. A dust and haze image clarification algorithm for underground mines based on an enhanced grid network.
- Author
- 谷亚楠, 李晴, 刘晨晨, and 张富凯
- Published
- 2024
- Full Text
- View/download PDF
33. MDFA-Net: Multi-Scale Differential Feature Self-Attention Network for Building Change Detection in Remote Sensing Images.
- Author
- Li, Yuanling, Zou, Shengyuan, Zhao, Tianzhong, and Su, Xiaohui
- Subjects
- CONVOLUTIONAL neural networks, TRANSFORMER models, FEATURE extraction, REMOTE sensing, URBAN studies
- Abstract
Building change detection (BCD) from remote sensing images is an essential field for urban studies. In this well-developed field, Convolutional Neural Networks (CNNs) and Transformer have been leveraged to empower BCD models in handling multi-scale information. However, it is still challenging to accurately detect subtle changes using current models, which has been the main bottleneck to improving detection accuracy. In this paper, a multi-scale differential feature self-attention network (MDFA-Net) is proposed to effectively integrate CNN and Transformer by balancing the global receptive field from the self-attention mechanism and the local receptive field from convolutions. In MDFA-Net, two innovative modules were designed. Particularly, a hierarchical multi-scale dilated convolution (HMDConv) module was proposed to extract local features with hybrid dilation convolutions, which can ameliorate the effect of CNN's local bias. In addition, a differential feature self-attention (DFA) module was developed to implement the self-attention mechanism at multi-scale difference feature maps to overcome the problem that local details may be lost in the global receptive field in Transformer. The proposed MDFA-Net achieves state-of-the-art accuracy performance in comparison with related works, e.g., USSFC-Net, on three open datasets: WHU-CD, CDD-CD, and LEVIR-CD. Based on the experimental results, MDFA-Net significantly exceeds other models in F1 score, IoU, and overall accuracy; the F1 score is 93.81%, 95.52%, and 91.21% in the WHU-CD, CDD-CD, and LEVIR-CD datasets, respectively. Furthermore, MDFA-Net achieved first or second place in precision and recall on all three datasets, which indicates a better balance between precision and recall than other models. We also found that subtle changes, i.e., small-sized building changes and irregular boundary changes, are better detected thanks to the introduction of HMDConv and DFA. To this end, with its better ability to leverage multi-scale differential information than traditional methods, MDFA-Net provides a novel and effective avenue to integrate CNN and Transformer in BCD. Further studies could focus on improving the model's insensitivity to hyper-parameters and the model's generalizability in practical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
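Bitemporal change detection of the kind described above typically encodes both epochs with shared weights and reasons over the feature difference. The sketch below illustrates that differential-feature idea with a toy shared encoder and a simple spatial gate; it is a generic illustration, not the published MDFA-Net design.

```python
# Shared (Siamese) encoding of two epochs, absolute feature difference,
# and a simple learned spatial gate over the difference map.
import torch
import torch.nn as nn

class DiffFeatureHead(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.encoder = nn.Sequential(            # shared toy encoder
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.attn = nn.Conv2d(channels, 1, 1)    # spatial attention map

    def forward(self, t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
        diff = torch.abs(self.encoder(t1) - self.encoder(t2))
        return diff * torch.sigmoid(self.attn(diff))  # gated difference

a, b = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(DiffFeatureHead(16)(a, b).shape)  # torch.Size([1, 16, 64, 64])
```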
34. A differential network with multiple gated reverse attention for medical image segmentation.
- Author
- Yan, Shun, Yang, Benquan, and Chen, Aihua
- Subjects
- IMAGE segmentation, DIAGNOSTIC imaging, ITERATIVE learning control, ATTENTION, FEATURE extraction
- Abstract
UNet architecture has achieved great success in medical image segmentation applications. However, these models still encounter several challenges. One is the loss of pixel-level information caused by multiple down-sampling steps. Additionally, the addition or concatenation method used in the decoder can generate redundant information. These limitations affect the localization ability, weaken the complementarity of features at different levels and can lead to blurred boundaries. However, differential features can effectively compensate for these shortcomings and significantly enhance the performance of image segmentation. Therefore, we propose MGRAD-UNet (multi-gated reverse attention multi-scale differential UNet) based on UNet. We utilize the multi-scale differential decoder to generate abundant differential features at both the pixel level and structure level. These features, which serve as gate signals, are transmitted to the gate controller and forwarded to the other differential decoder. In order to enhance the focus on important regions, another differential decoder is equipped with reverse attention. The features obtained by two differential decoders are differentiated for the second time. The resulting differential feature obtained is sent back to the controller as a control signal, then transmitted to the encoder for learning the differential feature by two differential decoders. The core design of MGRAD-UNet lies in extracting comprehensive and accurate features through caching overall differential features and multi-scale differential processing, enabling iterative learning from diverse information. We evaluate MGRAD-UNet against state-of-the-art (SOTA) methods on two public datasets. Our method surpasses competitors and provides a new approach for the design of UNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. MSCR-FuResNet: A Three-Residual Network Fusion Model Based on Multi-Scale Feature Extraction and Enhanced Channel Spatial Features for Close-Range Apple Leaf Diseases Classification under Optimal Conditions.
- Author
- Chen, Xili, Xing, Xuanzhu, Zhang, Yongzhong, Liu, Ruifeng, Li, Lin, Zhang, Ruopeng, Tang, Lei, Shi, Ziyang, Zhou, Hao, Guo, Ruitian, and Dong, Jingrong
- Subjects
- FEATURE extraction, DATA augmentation, DEEP learning, MULTISCALE modeling, AGRICULTURAL development
- Abstract
The precise and automated diagnosis of apple leaf diseases is essential for maximizing apple yield and advancing agricultural development. Despite the widespread utilization of deep learning techniques, several challenges persist: (1) the presence of small disease spots on apple leaves poses difficulties for models to capture intricate features; (2) the high similarity among different types of apple leaf diseases complicates their differentiation; and (3) images with complex backgrounds often exhibit low contrast, thereby reducing classification accuracy. To tackle these challenges, we propose a three-residual fusion network known as MSCR-FuResNet (Fusion of Multi-scale Feature Extraction and Enhancements of Channels and Residual Blocks Net), which consists of three sub-networks: (1) enhancing detailed feature extraction through multi-scale feature extraction; (2) improving the discrimination of similar features by suppressing insignificant channels and pixels; and (3) increasing low-contrast feature extraction by modifying the activation function and residual blocks. The model was validated with a comprehensive dataset from public repositories, including Plant Village and Baidu Flying Paddle. Various data augmentation techniques were employed to address class imbalance. Experimental results demonstrate that the proposed model outperforms ResNet-50 with an accuracy of 97.27% on the constructed dataset, indicating significant advancements in apple leaf disease recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. MSFE-UIENet: A Multi-Scale Feature Extraction Network for Marine Underwater Image Enhancement.
- Author
- Zhao, Shengya, Mei, Xinkui, Ye, Xiufen, and Guo, Shuxiang
- Subjects
- IMAGE intensifiers, LIGHT absorption, OPTICAL images, LIGHT scattering, PYRAMIDS, IMAGE enhancement (Imaging systems)
- Abstract
Underwater optical images have outstanding advantages for short-range underwater target detection tasks. However, owing to the limitations of special underwater imaging environments, underwater images often suffer from noise interference, blurred texture, low contrast, and color distortion. Marine underwater image enhancement addresses the degradation caused by light absorption and scattering. This study introduces MSFE-UIENet, a high-performance encoder–decoder network for deep-learning-based underwater image enhancement, designed to strengthen feature extraction and address the limitations of single-convolution and plain upsampling/downsampling designs. In response to the underwhelming enhancement performance of conventional networks' single downsampling path, the study introduces a pyramid downsampling module that captures more intricate image features through multi-scale downsampling. Additionally, an advanced feature extraction module was proposed to capture detailed information from underwater images, and forward and backward branches were introduced to optimize the network's gradient flow, accelerating convergence and improving stability. Experimental validation on underwater image datasets indicated that the proposed network effectively enhances underwater image quality, preserving image details and suppressing noise across various underwater environments. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
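The MSFE-UIENet record above names a pyramid downsampling module that replaces a single downsampling path with several rates. The sketch below shows one generic form of that idea; the pooling rates, nearest-neighbor re-alignment, and fusion convolution are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidDownsample(nn.Module):
    def __init__(self, channels: int, rates=(2, 4, 8)):
        super().__init__()
        self.rates = rates
        self.fuse = nn.Conv2d(channels * (len(rates) + 1), channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        pyr = [x]
        for r in self.rates:
            y = F.avg_pool2d(x, kernel_size=r)      # multi-scale downsampling
            pyr.append(F.interpolate(y, size=(h, w), mode="nearest"))
        return self.fuse(torch.cat(pyr, dim=1))     # coarse context + fine detail

print(PyramidDownsample(16)(torch.randn(1, 16, 64, 64)).shape)
```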
37. Fault Diagnosis Method of Special Vehicle Bearing Based on Multi-Scale Feature Fusion and Transfer Adversarial Learning.
- Author
-
Xiao, Zhiguo, Li, Dongni, Yang, Chunguang, and Chen, Wei
- Subjects
- *
ROLLER bearings , *TRANSFER of training , *LEARNING strategies , *DATA extraction , *DATA distribution - Abstract
To address inadequate feature extraction for rolling bearings, inaccurate fault diagnosis, and overfitting under complex operating conditions, this paper proposes a rolling bearing diagnosis method based on multi-scale feature fusion and transfer adversarial learning. Firstly, a multi-scale convolutional fusion layer is designed to effectively extract fault features from the original vibration signals at multiple time scales. A feature encoding fusion module based on the multi-head attention mechanism then performs feature fusion, modeling long-distance contextual information and significantly improving diagnostic accuracy and noise robustness. Secondly, a domain adaptation (DA) cross-domain adversarial learning strategy from transfer learning extracts optimal domain-invariant features by reducing the gap between the target-domain and source-domain data distributions, supporting fault diagnosis across operating conditions, across equipment, and between virtual and real domains. Finally, experiments were conducted to verify and optimize the effectiveness of the feature extraction and fusion network. A public bearing dataset served as the source domain, and special vehicle bearing data were selected as the target domain for comparative transfer learning experiments. The experimental results demonstrate that the proposed method performs exceptionally well in cross-domain and variable-load environments, achieving an average transfer fault diagnosis accuracy of up to 98.65% across multiple bearing cross-domain transfer learning tasks. Compared with existing methods, the proposed method significantly enhances data feature extraction and thereby achieves a more robust diagnostic performance. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
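The bearing-diagnosis record above combines a multi-scale convolutional fusion layer over raw vibration signals with multi-head-attention feature fusion. A minimal sketch of those two ingredients follows; the kernel sizes, channel counts, and head count are assumptions rather than the authors' settings.

```python
import torch
import torch.nn as nn

class MultiScaleConv1d(nn.Module):
    def __init__(self, in_ch=1, out_ch=32, kernels=(16, 64, 256)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernels]
        )

    def forward(self, x):                        # x: (batch, 1, signal_len)
        # Trim to the input length so branches with even kernels align.
        feats = [b(x)[..., : x.shape[-1]] for b in self.branches]
        return torch.cat(feats, dim=1)           # (batch, 3*out_ch, signal_len)

signal = torch.randn(8, 1, 2048)
feats = MultiScaleConv1d()(signal).permute(0, 2, 1)   # to (batch, seq, dim)
attn = nn.MultiheadAttention(embed_dim=96, num_heads=4, batch_first=True)
fused, _ = attn(feats, feats, feats)             # long-range context modeling
print(fused.shape)  # torch.Size([8, 2048, 96])
```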
38. Scene Text Detection Based on Multi-Scale Feature Extraction and Bidirectional Feature Fusion.
- Author
-
连哲, 殷雁君, 智敏, and 徐巧枝
- Subjects
DETECTION algorithms ,FEATURE extraction ,IMAGE processing ,ALGORITHMS - Abstract
Copyright of Journal of Harbin University of Science & Technology is the property of Journal of Harbin University of Science & Technology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
39. High Efficiency Deep-learning Based Video Compression.
- Author
-
Tang, Lv and Zhang, Xinfeng
- Subjects
IMAGE compression ,RECURRENT neural networks ,VIDEO coding ,DEEP learning ,FEATURE extraction ,VIDEO compression - Abstract
Although deep learning techniques have achieved significant improvements in image compression, their advantages have not been fully explored in video compression, leaving the performance of deep-learning-based video compression (DLVC) clearly inferior to that of hybrid video coding frameworks. In this article, we propose a novel network that improves DLVC in its most important modules: Motion Process (MP), Residual Compression (RC), and Frame Reconstruction (FR). In MP, we design a split second-order attention and multi-scale feature extraction module to fully remove warping artifacts from both multi-scale feature space and pixel space, which helps reduce distortion in the subsequent processing. In RC, we propose a channel selection mechanism that gradually drops redundant information while preserving informative channels for better rate-distortion performance. Finally, in FR, we introduce a residual multi-scale recurrent network that improves the quality of the current reconstructed frame by progressively exploiting temporal context between it and several previously reconstructed frames. Extensive experiments on three widely used video compression datasets (HEVC, UVG, and MCL-JCV) demonstrate the superiority of our proposed approach over state-of-the-art methods. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
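The residual-compression module in the video-compression record above is said to gradually drop redundant channels while preserving informative ones. One plausible realization, sketched below, is a learned squeeze-and-gate channel score; whether the paper's mechanism looks like this is an assumption.

```python
import torch
import torch.nn as nn

class ChannelSelect(nn.Module):
    """Hypothetical channel-selection gate: global statistics score each
    channel, and low-scoring channels are softly suppressed."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                    # squeeze: global stats
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                               # per-channel keep score
        )

    def forward(self, residual_feat):
        return residual_feat * self.score(residual_feat)  # soft channel drop

print(ChannelSelect(64)(torch.randn(1, 64, 32, 32)).shape)
```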
40. MSLUnet: A Medical Image Segmentation Network Incorporating Multi-Scale Semantics and Large Kernel Convolution.
- Author
-
Zhu, Shijuan and Cheng, Lingfei
- Subjects
COMPUTER-assisted image analysis (Medicine) ,CONVOLUTIONAL neural networks ,DEEP learning ,FEATURE extraction ,DIAGNOSTIC imaging - Abstract
In recent years, various deep-learning methodologies have been developed for processing medical images, with Unet and its derivatives proving particularly effective in medical image segmentation. Our primary objective is to enhance the accuracy of these networks while reducing parameter count and computational demands to facilitate deployment on mobile medical devices. To this end, we introduce MSLUnet, a novel medical image segmentation network that minimizes parameters and computation without compromising segmentation quality. The network features a U-shaped architecture. In the encoder module, we use successive convolutions with multiple small kernels rather than large ones, capturing multi-scale feature information at granular levels through varied receptive field scales. The decoder module incorporates an inverse bottleneck structure with large-kernel depthwise-separable convolution, which effectively extracts spatial information and ensures comprehensive integration of shallow and deep features. Additionally, a lightweight three-branch attention mechanism within the skip connections enhances information transfer by capturing global contextual data across spatial and channel dimensions. Experimental evaluations on several publicly available medical image datasets indicate that MSLUnet is more competitive than existing models in both efficiency and effectiveness. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
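The MSLUnet decoder above is described as an inverse bottleneck with large-kernel depthwise-separable convolution. The sketch below shows a generic block of that kind; the expansion ratio, 7x7 kernel, and activation choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class InvertedBottleneckLK(nn.Module):
    def __init__(self, ch: int, expansion: int = 4, kernel: int = 7):
        super().__init__()
        hidden = ch * expansion
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1),                 # expand channels
            nn.GELU(),
            nn.Conv2d(hidden, hidden, kernel, padding=kernel // 2,
                      groups=hidden),                 # large-kernel depthwise
            nn.GELU(),
            nn.Conv2d(hidden, ch, 1),                 # pointwise projection back
        )

    def forward(self, x):
        return x + self.block(x)                      # residual connection

print(InvertedBottleneckLK(32)(torch.randn(1, 32, 56, 56)).shape)
```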
41. Attention-driven residual-dense network for no-reference image quality assessment.
- Author
-
Zhang, Yang, Wang, Changzhong, Lv, Xiang, and Song, Yingnan
- Abstract
With the rapid development of deep learning, convolutional neural networks have been applied to no-reference image quality assessment (NR-IQA), but most methods focus on designing complex networks, which not only increases parameter counts and makes training more difficult but also fails to make full use of the rich global and local information in images. To address this problem, this paper proposes an effective NR-IQA method, an attention-driven residual dense network, which evaluates image quality quickly and accurately. Specifically, convolution kernels of three different sizes first extract features from images in parallel, so that feature information is expressed at different scales. Next, several cascaded residual dense channel attention blocks extract high-level feature information, capturing the most effective features. In addition, we embed a novel channel attention mechanism into both the multi-scale feature extraction block and the residual dense block to recalibrate channel-wise responses by learning correlations between channels. A series of experiments on public synthetic databases shows that the proposed method outperforms state-of-the-art NR-IQA methods. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
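The NR-IQA record above embeds channel attention inside residual dense blocks. A minimal sketch of a residual dense block gated by an SE-style channel attention follows; the growth rate, depth, and gate design are assumptions, not the paper's exact layout.

```python
import torch
import torch.nn as nn

class RDBWithCA(nn.Module):
    def __init__(self, ch: int, growth: int = 16, layers: int = 3):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch + i * growth, growth, 3, padding=1),
                          nn.ReLU(True))
            for i in range(layers)
        ])
        self.local_fuse = nn.Conv2d(ch + layers * growth, ch, 1)
        self.ca = nn.Sequential(                     # channel attention gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // 4, 1), nn.ReLU(True),
            nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())

    def forward(self, x):
        feats = x
        for conv in self.convs:
            feats = torch.cat([feats, conv(feats)], dim=1)  # dense connectivity
        fused = self.local_fuse(feats)
        return x + fused * self.ca(fused)            # attention-gated residual

print(RDBWithCA(32)(torch.randn(1, 32, 48, 48)).shape)
```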
42. Advancing the Diagnosis of Aero-Engine Bearing Faults with Rotational Spectrum and Scale-Aware Robust Network.
- Author
-
Li, Jin, Yang, Zhengbing, Zhou, Xiang, Song, Chenchen, and Wu, Yafeng
- Subjects
CONVOLUTIONAL neural networks ,FEATURE extraction ,ROTATIONAL motion ,FAULT diagnosis ,TRANSFORMER models - Abstract
The precise monitoring of bearings is crucial for the timely detection of issues in rotating mechanical systems. However, the high structural complexity of aero-engines makes vibration signal transmission paths exceedingly intricate, posing significant challenges for diagnosing aero-engine bearing faults. Therefore, a Rotational-Spectrum-informed Scale-aware Robustness (RSSR) neural network is proposed in this study to address intricate fault characteristics and significant noise interference. The RSSR algorithm combines a scale-aware feature extraction block, a non-activation convolutional network, and an innovative channel attention block, striking a balance between simplicity and efficacy. We provide a comprehensive analysis comparing traditional CNNs, transformers, and their respective variants. Our strategy not only elevates diagnostic precision but also judiciously moderates the network's parameter count and computational intensity, mitigating the propensity for overfitting. To assess the efficacy of the proposed network, we performed rigorous testing on two complex, publicly available datasets, introducing additional artificial noise to simulate challenging operational environments. On the noise-free aero-engine dataset, our technique improves accuracy by 5.11% over current mainstream methods; even under maximal noise, it improves average accuracy by 4.49% over other contemporary approaches. The results demonstrate that our approach outperforms other techniques in diagnostic performance and generalization ability. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
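The RSSR record above mentions a non-activation convolutional network. One published way to build activation-free blocks is a gate that splits the channels and multiplies the halves (NAFNet's SimpleGate); whether RSSR uses this construction is an assumption, so the sketch below is illustrative only.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    def forward(self, x):
        a, b = x.chunk(2, dim=1)       # split channels into two halves
        return a * b                   # elementwise product replaces ReLU/GELU

class NonActivationBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 2 * ch, 3, padding=1),
            SimpleGate(),              # halves channels; no activation function
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

print(NonActivationBlock(16)(torch.randn(1, 16, 64, 64)).shape)
```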
43. Multimodal sleep staging network based on obstructive sleep apnea
- Author
-
Jingxin Fan, Mingfu Zhao, Li Huang, Bin Tang, Lurui Wang, Zhong He, and Xiaoling Peng
- Subjects
automatic sleep staging ,obstructive sleep apnea ,time-frequency representation ,multi-scale feature extraction ,transition rules ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
Background: Automatic sleep staging is essential for assessing sleep quality and diagnosing sleep disorders. While previous research has achieved high classification performance, most current sleep staging networks have been validated only in healthy populations, ignoring the impact of Obstructive Sleep Apnea (OSA) on sleep stage classification. In addition, it remains challenging to effectively improve the fine-grained detection of polysomnography (PSG) and capture multi-scale transitions between sleep stages. A more widely applicable network is therefore needed for sleep staging. Methods: This paper introduces MSDC-SSNet, a novel deep learning network for automatic sleep stage classification. MSDC-SSNet transforms two channels of electroencephalogram (EEG) and one channel of electrooculogram (EOG) signals into time-frequency representations to obtain feature sequences at different temporal and frequency scales. An improved Transformer encoder architecture ensures temporal consistency and effectively captures long-term dependencies in the EEG and EOG signals. The Multi-Scale Feature Extraction Module (MFEM) employs convolutional layers with varying dilation rates to capture spatial patterns from fine to coarse granularity, and adaptively fuses feature weights to enhance model robustness. Finally, data from multiple channels are integrated to address the heterogeneity between modalities and alleviate the impact of OSA on sleep stages. Results: We evaluated MSDC-SSNet on three public datasets and our own collection of PSG records from 17 OSA patients. It achieved an accuracy of 80.4% on the OSA dataset and outperformed state-of-the-art methods in accuracy, F1 score, and Cohen's Kappa coefficient on the remaining three datasets. Conclusion: The MSDC-SSNet multi-channel sleep staging architecture proposed in this study enhances applicability by supplementing inter-channel features, employs multi-scale attention to extract transition rules between sleep stages, and effectively integrates multimodal information. Our method addresses the limitations of single-channel approaches and enhances interpretability for clinical applications. (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
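The MFEM in the sleep-staging record above uses convolutions with varying dilation rates whose outputs are adaptively weighted. Under that description, a minimal sketch over a time-frequency map might look like the following; the dilation rates and softmax weighting are assumptions.

```python
import torch
import torch.nn as nn

class MFEMSketch(nn.Module):
    def __init__(self, ch: int, dilations=(1, 2, 4)):
        super().__init__()
        # 3x3 convs with padding == dilation keep spatial size unchanged.
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in dilations]
        )
        self.weights = nn.Parameter(torch.zeros(len(dilations)))

    def forward(self, x):                          # x: (batch, ch, freq, time)
        w = torch.softmax(self.weights, dim=0)     # adaptive per-branch weights
        return sum(wi * b(x) for wi, b in zip(w, self.branches))

tf_map = torch.randn(2, 8, 64, 128)                # toy time-frequency input
print(MFEMSketch(8)(tf_map).shape)                 # torch.Size([2, 8, 64, 128])
```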
44. MRA-Net: an instance segmentation method based on multi-scale feature fusion for ethnic costumes images
- Author
-
Fan, Yingjie, Wen, Bin, and Deng, Hongfei
- Published
- 2025
- Full Text
- View/download PDF
45. Boundary-guided multi-scale refinement network for camouflaged object detection
- Author
-
Ye, Qian, Li, Qingwu, Huo, Guanying, Liu, Yan, and Zhou, Yan
- Published
- 2025
- Full Text
- View/download PDF
46. An image fusion algorithm based on image clustering theory
- Author
-
Zhao, Liangjun, Wang, Yinqing, Hu, Yueming, Dai, Hui, Xi, Yubin, Ning, Feng, He, Zhongliang, Liang, Gang, and Zhang, Yuanyang
- Published
- 2024
- Full Text
- View/download PDF
47. A deep learning framework for HbA1c levels assessment using short-term continuous glucose monitoring data
- Author
-
Han, Bowen, Wang, Yaxin, Li, Hongru, Sun, Xiaoyu, Zhou, Jian, and Yu, Xia
- Published
- 2024
- Full Text
- View/download PDF
48. Gearbox Fault Diagnosis Based on MSCNN-LSTM-CBAM-SE.
- Author
-
He, Chao, Yasenjiang, Jarula, Lv, Luhui, Xu, Lihua, and Lan, Zhigang
- Subjects
- *
FAULT diagnosis , *GEARBOXES , *DIAGNOSIS methods - Abstract
Gearbox fault diagnosis is crucial for ensuring the safety of mechanical equipment and the stable operation of the whole system. However, existing diagnostic methods still have limitations, such as analyzing features at a single scale and insufficiently recognizing global temporal dependencies. To address these issues, this article proposes a new gearbox fault diagnosis method based on MSCNN-LSTM-CBAM-SE. The output of the CBAM-SE module is deeply integrated with the multi-scale features from the MSCNN and the temporal features from the LSTM, constructing a comprehensive feature representation that provides richer and more precise information for fault diagnosis. The effectiveness of this method has been validated on two gearbox datasets and through ablation studies on the model. Experimental results show that the proposed model achieves excellent performance in terms of accuracy and F1 score, among other metrics. Finally, a comparison with other relevant fault diagnosis methods further verifies the advantages of the proposed model. This research offers a new solution for accurate gearbox fault diagnosis. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
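The gearbox record above pairs a multi-scale CNN with an LSTM for temporal dependencies. A bare-bones sketch of that pairing follows, with the CBAM-SE attention stages omitted for brevity; all shapes, kernel sizes, and the pooling step are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MSCNNLSTMSketch(nn.Module):
    def __init__(self, in_ch=1, conv_ch=16, hidden=64, classes=5):
        super().__init__()
        self.scales = nn.ModuleList(
            [nn.Conv1d(in_ch, conv_ch, k, padding=k // 2) for k in (8, 32, 128)]
        )
        self.pool = nn.MaxPool1d(4)
        self.lstm = nn.LSTM(3 * conv_ch, hidden, batch_first=True)
        self.head = nn.Linear(hidden, classes)

    def forward(self, x):                              # x: (batch, 1, length)
        L = x.shape[-1]
        # Multi-scale conv branches, trimmed to a common length and stacked.
        feats = torch.cat([c(x)[..., :L] for c in self.scales], dim=1)
        seq = self.pool(feats).permute(0, 2, 1)        # (batch, steps, features)
        out, _ = self.lstm(seq)                        # temporal dependencies
        return self.head(out[:, -1])                   # classify last time step

print(MSCNNLSTMSketch()(torch.randn(8, 1, 2048)).shape)  # torch.Size([8, 5])
```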
49. An Infrared Temperature Detection Method for Substation Equipment Based on an Improved CenterNet.
- Author
-
张佳钰, 蔡泽烽, and 冯杰
- Abstract
Copyright of Computer Measurement & Control is the property of Magazine Agency of Computer Measurement & Control and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
50. Enhancing Tea Leaf Disease Identification with Lightweight MobileNetV2.
- Author
-
Li, Zhilin, Li, Yuxin, Yan, Chunyu, Yan, Peng, Li, Xiutong, Yu, Mei, Wen, Tingchi, and Xie, Benliang
- Subjects
TEA plantations ,FEATURE extraction ,TREE diseases & pests ,COMPUTATIONAL complexity ,MULTISCALE modeling - Abstract
Diseases in tea trees can cause significant losses in both the quality and quantity of tea production, and regular monitoring helps prevent large-scale outbreaks in tea plantations. However, existing methods face challenges such as high parameter counts and low recognition accuracy, which hinder their deployment on tea plantation monitoring equipment. This paper presents a lightweight I-MobileNetV2 model for identifying tea leaf diseases. The proposed method first embeds a Coordinate Attention (CA) module into the original MobileNetV2 network, enabling the model to locate disease regions accurately. Secondly, a Multi-branch Parallel Convolution (MPC) module extracts disease features across multiple scales, improving the model's adaptability to different disease scales. Finally, AutoML for Model Compression (AMC) is used to compress the model and reduce computational complexity. Experimental results indicate that the proposed algorithm attains an average accuracy of 96.12% on our self-built tea leaf disease dataset, surpassing the original MobileNetV2 by 1.91%, while the number of model parameters has been reduced by 40%, making it more suitable for practical application in tea plantation environments. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
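The I-MobileNetV2 record above embeds a Coordinate Attention (CA) module. The sketch below follows the published CA design (Hou et al., 2021), which factorizes pooling along height and width so the attention map retains positional information; treating this as the paper's exact variant is an assumption.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, ch: int, reduction: int = 8):
        super().__init__()
        mid = max(8, ch // reduction)
        self.conv1 = nn.Conv2d(ch, mid, 1)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, ch, 1)
        self.conv_w = nn.Conv2d(mid, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = x.mean(dim=3, keepdim=True)               # pool over W: (b, c, h, 1)
        xw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (b, c, w, 1)
        y = self.act(self.conv1(torch.cat([xh, xw], dim=2)))  # joint encoding
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                   # (b, c, h, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * ah * aw                             # position-aware gating

print(CoordinateAttention(32)(torch.randn(1, 32, 56, 56)).shape)
```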