9,509 results for "feature fusion"
Search Results
2. SDF-Net: A Hybrid Detection Network for Mediastinal Lymph Node Detection on Contrast CT Images
- Author
-
Xiong, Jiuli, Mei, Lanzhuju, Liu, Jiameng, Shen, Dinggang, Xue, Zhong, Cao, Xiaohuan, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Xu, Xuanang, editor, Cui, Zhiming, editor, Rekik, Islem, editor, Ouyang, Xi, editor, and Sun, Kaicong, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Online Signature Verification Based on Recurrent Attentional Time-Delay Neural Networks
- Author
-
Ablat, Xirali, Li, Qixiang, Yadikar, Nurbiya, Ubul, Kurban, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
- Full Text
- View/download PDF
4. Multimodal Finger Recognition Based on Feature Fusion Attention for Fingerprints, Finger-Veins, and Finger-Knuckle-Prints
- Author
-
Lai, Xinbo, Xue, Yimin, Tursun, Tayir, Yadikar, Nurbiya, Ubul, Kurban, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
- Full Text
- View/download PDF
5. Dynamic Feature Fusion Based on Consistency and Complementarity of Brain Atlases
- Author
-
Lin, Qiye, Zhao, Jiaqi, Fan, Ruiwen, Zhou, Xuezhong, Xia, Jianan, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
- Full Text
- View/download PDF
6. Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification
- Author
-
Hussain, Sadam, Ali, Mansoor, Naseem, Usman, Bosques Palomo, Beatriz Alejandra, Monsivais Molina, Mario Alexis, Garza Abdala, Jorge Alberto, Avendano Avalos, Daly Betzabeth, Cardona-Huerta, Servando, Aaron Gulliver, T., Tamez Pena, Jose Gerardo, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ali, Sharib, editor, van der Sommen, Fons, editor, Papież, Bartłomiej Władysław, editor, Ghatwary, Noha, editor, Jin, Yueming, editor, and Kolenbrander, Iris, editor
- Published
- 2025
- Full Text
- View/download PDF
7. Ensemble Learning with Feature Fusion for Well-Overflow Detection
- Author
-
Cui, Ziliang, Liu, Li, Xiong, Yinzhou, Liu, Yinguo, Su, Yu, Man, Zhimin, Wang, Ye, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Zhang, Haijun, editor, Li, Xianxian, editor, Hao, Tianyong, editor, Meng, Weizhi, editor, Wu, Zhou, editor, and He, Qian, editor
- Published
- 2025
- Full Text
- View/download PDF
8. Rehabilitation Training Program Recommendation System Based on ALBERT-LDA Model
- Author
-
Zhu, Xiaozhuang, Xu, Qianqian, Gao, Nuo, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Zhang, Haijun, editor, Li, Xianxian, editor, Hao, Tianyong, editor, Meng, Weizhi, editor, Wu, Zhou, editor, and He, Qian, editor
- Published
- 2025
- Full Text
- View/download PDF
9. Automatic seismic first‐break picking based on multi‐view feature fusion network.
- Author
-
Wu, Yinghe, Pan, Shulin, Lan, Haiqiang, Badal, José, Wei, Ze, and Chen, Yaojie
- Subjects
- ARTIFICIAL intelligence, WORK design, ELECTRONIC data processing, GENERALIZATION, ALGORITHMS
- Abstract
Automatic first-break picking is a basic step in seismic data processing, so much so that the quality of the picking largely determines the effect of subsequent processing. To a certain extent, artificial intelligence technology has solved the shortcomings of traditional first-break picking algorithms, such as poor applicability and low efficiency. However, some problems still remain for seismic data with a low signal-to-noise ratio and large first-break changes, leading to inaccurate picking and poor generalization of the network. In order to improve the accuracy of automatic first-break picking on such seismic data, we propose a multi-view automatic first-break picking method driven by multiple networks. First, we analysed the single-trace boundary characteristics and the two-dimensional boundary characteristics of the first break. Based on these two characteristics of the first break, we used the Long Short-Term Memory network and the residual attention gate UNet (a ResNet-based attention gate UNet) to extract the characteristics of the first arrival and its location from the seismic data, respectively. Then, we introduced the idea of multi-network learning into the first-break picking task and designed a feature fusion network. Finally, the multi-view first-break features extracted by the Long Short-Term Memory and residual attention gate UNet networks are fused, which effectively improves the picking accuracy. The results obtained after applying the method to field seismic data show that the accuracy of the first break detected by the feature fusion network is higher than that given by either of the two networks alone, with good applicability and resistance to noise. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Transformer Connections: Improving Segmentation in Blurred Near‐Infrared Blood Vessel Image in Different Depth.
- Author
-
Wang, Jiazhe, Shimizu, Koichi, and Yoshie, Osamu
- Subjects
- CONVOLUTIONAL neural networks, TRANSFORMER models, ARTIFICIAL intelligence, DEEP learning, IMAGE segmentation, RETINAL blood vessels
- Abstract
High‐fidelity segmentation of blood vessels plays a pivotal role in numerous biomedical applications, such as injection assistance, cancer detection, various surgeries, and vein authentication. Near‐infrared (NIR) transillumination imaging is an effective and safe method to visualize the subcutaneous blood vessel network. However, such images are severely blurred because of the light scattering in body tissues. Inspired by the Vision Transformer model, this paper proposes a novel deep learning network known as transformer connection (TRC)‐Unet to capture global blurred and local clear correlations while using multi‐layer attention. Our method mainly consists of two blocks, thereby aiming to remap skip connection information flow and fuse different domain features. Specifically, the TRC extracts global blurred information from multiple layers and suppresses scattering to increase the clarity of vessel features. Transformer feature fusion eliminates the domain gap between the highly semantic feature maps of the convolutional neural network backbone and the adaptive self‐attention maps of TRCs. Benefiting from the long‐range dependencies of transformers, we achieved competitive results in relation to various competing methods on different data sets, including retinal vessel segmentation, simulated blur image segmentation, and real NIR blood vessel image segmentation. Moreover, our method remarkably improved the segmentation results of simulated blur image data sets and a real NIR vessel image data set. The quantitative results of ablation studies and visualizations are also reported to demonstrate the superiority of the TRC‐Unet design. © 2024 The Author(s). IEEJ Transactions on Electrical and Electronic Engineering published by Institute of Electrical Engineers of Japan and Wiley Periodicals LLC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Transgaze: exploring plain vision transformers for gaze estimation.
- Author
-
Ye, Lang, Wang, Xinggang, Yao, Jingfeng, and Liu, Wenyu
- Abstract
Recently, plain vision transformers (ViTs) have shown impressive performance in various computer vision tasks due to their powerful modeling capabilities and large-scale pre-training. However, they have yet to show excellent results in gaze estimation tasks. In this paper, we take advanced Vision Transformers further into the task of Gaze Estimation (TransGaze). Our framework adeptly integrates the distinctive local features of the eyes while maintaining a simple and flexible structure, and it can seamlessly adapt to various large-scale pre-trained models, enhancing its versatility and applicability in different contexts. It is the first to demonstrate that pre-trained ViTs can also show strong capabilities on gaze estimation tasks. Our approach employs the following strategies: (i) enhancing the self-attention module among facial feature maps through straightforward token manipulation, effectively achieving complex feature fusion, a feat previously requiring more intricate methods; (ii) leveraging the simplicity of TransGaze and the inherent adaptability of plain ViTs, we introduce a pre-trained model for gaze estimation. This model reduces training time by over 50% and exhibits strong generalization performance. We evaluate TransGaze on the GazeCapture and MPIIFaceGaze datasets and achieve state-of-the-art performance with lower training costs. Our models and code will be available. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Underwater image object detection based on multi-scale feature fusion.
- Author
-
Yang, Chao, Zhang, Ce, Jiang, Longyu, and Zhang, Xinwen
- Subjects
- OBJECT recognition (Computer vision), DATA augmentation, IMAGE fusion, DEEP learning, PROBLEM solving
- Abstract
Underwater object detection and classification technology is one of the most important ways for humans to explore the oceans. However, existing methods are still insufficient in terms of accuracy and speed, and have poor detection performance for small objects such as fish. In this paper, we propose a multi-scale aggregation enhanced feature pyramid network (MAE-FPN) object detection method, including the multi-scale convolutional calibration module (MCCM) and the feature calibration distribution module (FCDM). First, we design the MCCM module, which can adaptively extract feature information from objects at different scales. Then, we build the FCDM structure to make the multi-scale information fusion more appropriate and to alleviate the problem of missing features from small objects. Finally, we construct the Fish Segmentation and Detection (FSD) dataset by fusing multiple data augmentation methods, which enriches the data resources for underwater object detection and solves the problem of limited training resources for deep learning. We conduct experiments on FSD and public datasets, and the results show that the proposed MAE-FPN network significantly improves the detection performance of underwater objects, especially small objects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Exploring the potential of Wav2vec 2.0 for speech emotion recognition using classifier combination and attention-based feature fusion.
- Author
-
Nasersharif, Babak and Namvarpour, Mohammad
- Subjects
- EMOTION recognition, SPEECH, FEED additives, ATTENTION
- Abstract
Self-supervised learning models, such as Wav2vec 2.0, extract efficient features for speech processing applications including speech emotion recognition. In this study, we propose a Dimension Reduction Module (DRM) to apply to the output of each transformer block in the Wav2vec 2.0 model. Our DRM consists of an attentive average pooling, a linear layer with a maxout activation function, and a linear layer that reduces the number of dimensions to the number of classes. Subsequently, we propose two methods, classifier combination and feature fusion, to generate the final decision using DRM outputs. In the Classifier Combination method, the outputs of each DRM are fed to a distinct Additive Angular Margin (AAM) softmax loss function. This constructs an individual classifier for each DRM. Then, the outputs of these classifiers are combined using five different statistical methods. In the Feature Fusion method, the outputs of the DRMs are concatenated, and an attention mechanism is applied to them. Then, the attended outputs are fed to an AAM-Softmax loss function, which is used for training all DRMs in addition to the attention mechanism. The proposed models have been evaluated on the EMODB, IEMOCAP, and ShEMO datasets. Our best method, the Attention-based Feature Fusion, has obtained unweighted accuracies of 94.80% on EMODB, 74.00% on IEMOCAP, and 80.60% on ShEMO, which are competitive with the best baseline methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
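The dimension-reduction and fusion recipe summarised in result 13 above can be sketched compactly. The following is a minimal, hedged PyTorch illustration of a per-block module (attentive average pooling, a maxout linear layer, and a projection to class logits) plus a softmax-weighted fusion across transformer blocks; the layer widths, the number of maxout pieces, and the fusion weighting are illustrative assumptions, and the AAM-softmax loss used in the paper is omitted.

```python
import torch
import torch.nn as nn

class DimensionReductionModule(nn.Module):
    """Attentive average pooling -> linear + maxout -> projection to class logits."""
    def __init__(self, dim: int, n_classes: int, maxout_pieces: int = 2):
        super().__init__()
        self.attn_score = nn.Linear(dim, 1)                  # frame-level attention scores
        self.pre_maxout = nn.Linear(dim, dim * maxout_pieces)
        self.maxout_pieces = maxout_pieces
        self.to_classes = nn.Linear(dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (batch, frames, dim)
        w = torch.softmax(self.attn_score(x), dim=1)         # (batch, frames, 1)
        pooled = (w * x).sum(dim=1)                          # attentive average pooling
        h = self.pre_maxout(pooled)
        h = h.view(h.size(0), -1, self.maxout_pieces).max(dim=-1).values  # maxout
        return self.to_classes(h)                            # (batch, n_classes)

class BlockFusion(nn.Module):
    """Softmax-weighted combination of the per-block DRM outputs."""
    def __init__(self, n_blocks: int):
        super().__init__()
        self.block_scores = nn.Parameter(torch.zeros(n_blocks))

    def forward(self, drm_outputs: list) -> torch.Tensor:
        stacked = torch.stack(drm_outputs, dim=1)            # (batch, blocks, n_classes)
        w = torch.softmax(self.block_scores, dim=0).view(1, -1, 1)
        return (w * stacked).sum(dim=1)                      # fused class logits

# Toy usage: 12 transformer blocks, 768-dim features, 4 emotion classes.
blocks = [torch.randn(8, 200, 768) for _ in range(12)]
drms = nn.ModuleList([DimensionReductionModule(768, 4) for _ in range(12)])
fusion = BlockFusion(n_blocks=12)
logits = fusion([drm(b) for drm, b in zip(drms, blocks)])
print(logits.shape)   # torch.Size([8, 4])
```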
14. Semantic Segmentation Algorithm for Remote Sensing Images Based on an Improved UperNet (基于改进Upernet的遥感影像语义分割算法).
- Author
-
蔡博锋, 周城, 熊承义, and 刘仁峰
- Subjects
- REMOTE sensing, FEATURE extraction, IMAGE segmentation, IMAGE processing, IMAGE fusion
- Abstract
Copyright of Journal of South-Central Minzu University (Natural Science Edition) is the property of Journal of South-Central Minzu University (Natural Science Edition) Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
15. Compact bilinear pooling and multi-loss network for social media multimodal classification.
- Author
-
Li, Yushi, Zheng, Xin, Zhu, Ming, Mei, Jie, Chen, Ziwen, and Tao, Yunfei
- Abstract
Social media platforms have seen an influx of multimodal data, leading to heightened attention on image-text multimodal classification. Existing methods for multimodal classification primarily focus on multimodal fusion from different modalities. However, owing to the heterogeneity and high dimensionality of multimodal data, the fusion process frequently introduces redundant information and noise, limiting accuracy and generalization. To resolve this limitation, we propose a Compact Bilinear pooling and Multi-Loss network (CBMLNet). Compact bilinear pooling is used for feature fusion to learn low-dimensional and expressive multimodal representations efficiently. Furthermore, a multi-loss function is proposed to incorporate the specific information carried by each single modality. Therefore, CBMLNet simultaneously considers the correlation between modalities and the specificity of each single modality for image-text classification. We evaluate the proposed CBMLNet on two publicly available datasets, Twitter-15 and Twitter-17, and on a private dataset, AIFUN. CBMLNet is compared with advanced methods such as multimodal BERT with Max Pooling, the Multi-Interactive Memory Network, the Multi-level Multi-modal Cross-attention Network, the Image-Text Correlation model (ITC), Target-oriented multimodal BERT, and the multimodal hierarchical attention model (MHA). Experimental results demonstrate that CBMLNet improves F1_score by an average of 0.28% and 0.44% compared with the best fine-grained baseline, MHA, and the best coarse-grained baseline, ITC. This illustrates that CBMLNet is practical for real-world fuzzy applications as a coarse-grained model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
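Compact bilinear pooling, the fusion operator named in result 15 above, is commonly implemented with the Count Sketch / FFT approximation of the bilinear outer product. Below is a minimal, hedged sketch of that operator in PyTorch; the output dimension, fixed hash seeds, and the signed-sqrt normalisation are illustrative choices, and the multi-loss part of CBMLNet is not reproduced.

```python
import torch
import torch.nn.functional as F

def count_sketch(x: torch.Tensor, h: torch.Tensor, s: torch.Tensor, d: int) -> torch.Tensor:
    """Project x (batch, dim) into d buckets using fixed hash indices h and signs s."""
    sketch = x.new_zeros(x.size(0), d)
    sketch.index_add_(1, h, x * s)        # scatter signed features into their buckets
    return sketch

def compact_bilinear_pool(x: torch.Tensor, y: torch.Tensor, d: int = 4096) -> torch.Tensor:
    """Approximate the outer product x ⊗ y by circular convolution of count sketches."""
    gen = torch.Generator().manual_seed(0)                 # fixed hashes across calls
    hx = torch.randint(0, d, (x.size(1),), generator=gen)
    sx = torch.randint(0, 2, (x.size(1),), generator=gen).float() * 2 - 1
    hy = torch.randint(0, d, (y.size(1),), generator=gen)
    sy = torch.randint(0, 2, (y.size(1),), generator=gen).float() * 2 - 1
    fx = torch.fft.rfft(count_sketch(x, hx, sx, d), dim=1)
    fy = torch.fft.rfft(count_sketch(y, hy, sy, d), dim=1)
    fused = torch.fft.irfft(fx * fy, n=d, dim=1)           # circular convolution in time domain
    fused = torch.sign(fused) * torch.sqrt(fused.abs() + 1e-8)  # signed sqrt, then L2 normalise
    return F.normalize(fused, dim=1)

image_feat = torch.randn(8, 2048)   # stand-in for a CNN image embedding
text_feat = torch.randn(8, 768)     # stand-in for a BERT text embedding
print(compact_bilinear_pool(image_feat, text_feat).shape)  # torch.Size([8, 4096])
```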
16. Fast-SegNet: fast semantic segmentation network for small objects.
- Author
-
Zhang, Xuan, Xu, Guoping, Wu, Xinglong, Liao, Wentao, Xiao, Lifang, Jiang, Yan, and Xing, Hanshuo
- Subjects
- DATA augmentation, IMAGE analysis, IMAGE segmentation, DEEP learning, DIAGNOSTIC imaging
- Abstract
Semantic segmentation is a fundamental step in image understanding, playing a crucial role in the fields of automatic driving, medical image analysis, defect detection, etc. Despite significant progress in deep learning-based image segmentation, challenges in terms of accuracy and efficiency still exist, especially for small-scale objects. In this paper, we present a novel data augmentation method for small-scale objects in images, aiming to address the issue of class imbalance. Specifically, we extract small-scale objects from one image and then copy-scale-and-paste them into other images. Additionally, a novel multi-scale feature fusion module is proposed to effectively combine features from both deep and shallow neural network layers. Subsequently, the data augmentation method and multi-scale feature fusion module are utilized in the proposed Fast-SegNet architecture for semantic segmentation. Extensive experiments demonstrate that Fast-SegNet improves segmentation performance, especially for small-scale objects, with an acceptable computational cost. State-of-the-art performance has been achieved on the CamVid, CityScapes, and MOST (Micro-optical sectioning tomography) datasets with respect to the tradeoff between accuracy and speed. Specifically, the CamVid dataset yields mean IoU (Intersection over Union) values of 45.7% and 38.6% for small-scale objects such as Pedestrian and Bicyclist, respectively. The CityScapes dataset demonstrates mean IoU values of 43.43% and 43.56% for small-scale objects such as Traffic Light and Rider, respectively. The MOST dataset results in a segmentation mean IoU of 88.2% for vessels in the mouse brain. In conclusion, our approach achieves better results in terms of accuracy and efficiency on three datasets. Codes are available at https://github.com/apple1986/Fast-SegNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
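The "copy-scale-and-paste" augmentation for small objects described in result 16 above can be illustrated in a few lines of OpenCV/NumPy. This is a hedged sketch assuming integer-labelled segmentation masks; the scale factor and random placement policy are placeholders, not the paper's settings.

```python
import numpy as np
import cv2

def copy_scale_paste(src_img, src_mask, dst_img, dst_mask, obj_id, scale=1.5, seed=0):
    """Crop the object labelled obj_id from (src_img, src_mask), rescale it, and paste
    it at a random location in (dst_img, dst_mask). Returns the augmented pair."""
    rng = np.random.default_rng(seed)
    ys, xs = np.where(src_mask == obj_id)
    if len(ys) == 0:
        return dst_img, dst_mask                       # object not present in the source
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    patch = src_img[y0:y1, x0:x1]
    patch_mask = (src_mask[y0:y1, x0:x1] == obj_id).astype(np.uint8)
    new_h = max(1, int((y1 - y0) * scale))
    new_w = max(1, int((x1 - x0) * scale))
    patch = cv2.resize(patch, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    patch_mask = cv2.resize(patch_mask, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
    H, W = dst_img.shape[:2]
    if new_h >= H or new_w >= W:
        return dst_img, dst_mask                       # scaled object no longer "small"
    ty = rng.integers(0, H - new_h)                    # random paste position
    tx = rng.integers(0, W - new_w)
    out_img, out_mask = dst_img.copy(), dst_mask.copy()
    region = out_img[ty:ty + new_h, tx:tx + new_w]
    region[patch_mask > 0] = patch[patch_mask > 0]     # paste only the object pixels
    out_mask[ty:ty + new_h, tx:tx + new_w][patch_mask > 0] = obj_id
    return out_img, out_mask
```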
17. Frequency‐Aware Facial Image Shadow Removal through Skin Color and Texture Learning.
- Author
-
Zhang, Ling, Xie, Wenyang, and Xiao, Chunxia
- Subjects
- IMAGE fusion, DATA mining, TEXTURE mapping, FEATURE extraction, HUMAN skin color, LIGHTING
- Abstract
Existing facial image shadow removal methods predominantly rely on pre-extracted facial features. However, these methods often fail to capitalize on the full potential of these features, resorting to simplified utilization. Furthermore, they tend to overlook the importance of low-frequency information during the extraction of prior features, which can be easily compromised by noise. In our work, we propose a frequency-aware shadow removal network (FSRNet) for facial image shadow removal, which utilizes the skin color and texture information in the face to help recover illumination in shadow regions. Our FSRNet uses a frequency-domain image decomposition network to extract the low-frequency skin color map and high-frequency texture map from the face images, and applies a color-texture guided shadow removal network to produce the final shadow removal result. Concretely, the designed Fourier sparse attention block (FSABlock) can transform images from the spatial domain to the frequency domain and help the network focus on the key information. We also introduce a skin color fusion module (CFModule) and a texture fusion module (TFModule) to enhance the understanding and utilization of color and texture features, promoting high-quality results without color distortion and detail blurring. Extensive experiments demonstrate the superiority of the proposed method. The code is available at https://github.com/laoxie521/FSRNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
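Result 17 above decomposes a face image into a low-frequency skin-colour map and a high-frequency texture map before shadow removal. A minimal NumPy sketch of such a frequency-domain split is given below, using a Gaussian low-pass mask in the FFT domain; the cutoff and masking scheme are assumptions for illustration, whereas FSRNet learns its decomposition with a dedicated network.

```python
import numpy as np

def frequency_decompose(img: np.ndarray, sigma: float = 10.0):
    """img: (H, W) or (H, W, C) float array in [0, 1]. Returns (low_freq, high_freq)."""
    img = img.astype(np.float64)
    if img.ndim == 2:
        img = img[..., None]
    H, W, C = img.shape
    u = np.fft.fftfreq(H)[:, None] * H          # vertical frequency indices (unshifted)
    v = np.fft.fftfreq(W)[None, :] * W          # horizontal frequency indices (unshifted)
    lowpass = np.exp(-(u ** 2 + v ** 2) / (2.0 * sigma ** 2))   # Gaussian low-pass mask
    low = np.empty_like(img)
    for c in range(C):
        spectrum = np.fft.fft2(img[..., c])
        low[..., c] = np.real(np.fft.ifft2(spectrum * lowpass))  # keep low frequencies
    high = img - low                             # texture / detail residual
    return low.squeeze(), high.squeeze()

face = np.random.rand(128, 128, 3)               # stand-in for a face image
skin_color_map, texture_map = frequency_decompose(face, sigma=8.0)
print(skin_color_map.shape, texture_map.shape)   # (128, 128, 3) (128, 128, 3)
```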
18. Deep and Shallow Feature Fusion in Feature Score Level for Palmprint Recognition.
- Author
-
Wu, Yihang, Hu, Junlin, and Conti, Vincenzo
- Subjects
- FISHER discriminant analysis, FEATURE extraction, PRINCIPAL components analysis, STATISTICAL correlation, CUSTOMER experience, PALMPRINT recognition
- Abstract
Contactless palmprint recognition offers a friendly customer experience due to its ability to operate without touching the recognition device under rigidly constrained conditions. Recent palmprint recognition methods have shown promising accuracy; however, some issues still need to be further studied, such as the limited discrimination of a single feature and how to effectively fuse deep features and shallow features. In this paper, deep features and shallow features are integrated into a unified framework using feature-level and score-level fusion methods. Specifically, the deep feature is extracted by a residual neural network (ResNet), and shallow features are extracted by principal component analysis (PCA), linear discriminant analysis (LDA), and competitive coding (CompCode). In the feature-level fusion stage, the ResNet feature and the PCA feature are dimensionally reduced and fused by the canonical correlation analysis technique to obtain the fused feature for the next stage. In the score-level fusion stage, score information is embedded in the fused feature, the LDA feature, and the CompCode feature to obtain a more reliable and robust recognition performance. The proposed method achieves competitive performance on the Tongji dataset and demonstrates satisfactory generalization capabilities on the IITD and CASIA datasets. Comprehensive validation across three palmprint datasets confirms the effectiveness of the proposed deep and shallow feature fusion approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
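The feature-level fusion stage in result 18 above reduces a deep ResNet feature and a shallow PCA feature and fuses them with canonical correlation analysis. A minimal scikit-learn sketch of that step follows; the feature dimensions, the element-wise-sum fusion rule, and the use of random stand-in data are assumptions, and the score-level fusion stage is not shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_samples = 200
resnet_feat = rng.normal(size=(n_samples, 2048))   # stand-in for ResNet embeddings
raw_pixels = rng.normal(size=(n_samples, 4096))    # stand-in for flattened palm ROIs

# Shallow feature: PCA of the raw palmprint pixels.
pca_feat = PCA(n_components=128).fit_transform(raw_pixels)

# Reduce the deep feature as well so both views have comparable dimensionality.
resnet_red = PCA(n_components=128).fit_transform(resnet_feat)

# Canonical correlation analysis projects both views into a shared, correlated subspace.
cca = CCA(n_components=64)
deep_proj, shallow_proj = cca.fit_transform(resnet_red, pca_feat)

# One common fusion rule: element-wise sum (concatenation is another option).
fused = deep_proj + shallow_proj
print(fused.shape)   # (200, 64)
```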
19. Prediction of benign and malignant ground glass pulmonary nodules based on multi-feature fusion of attention mechanism.
- Author
-
Heng Deng, Wenjun Huang, Xiuxiu Zhou, Taohu Zhou, Li Fan, and Shiyuan Liu
- Subjects
- CONVOLUTIONAL neural networks, COMPUTED tomography, RADIOMICS, DEEP learning, SURGICAL pathology, PULMONARY nodules
- Abstract
Objectives: The purpose of this study was to develop and validate a new feature fusion algorithm to improve the classification performance of benign and malignant ground-glass nodules (GGNs) based on deep learning. Methods: We retrospectively collected 385 cases of GGNs confirmed by surgical pathology from three hospitals. We utilized 239 GGNs from Hospital 1 as the training and internal validation set, and 115 and 31 GGNs from Hospital 2 and Hospital 3, respectively, as external test sets 1 and 2. Among these GGNs, 172 were benign and 203 were malignant. First, we evaluated the clinical and morphological features of GGNs at baseline chest CT and simultaneously extracted whole-lung radiomics features. Then, deep convolutional neural networks (CNNs) and backpropagation neural networks (BPNNs) were applied to extract deep features from whole-lung CT images, clinical features, morphological features, and whole-lung radiomics features separately. Finally, we integrated these four types of deep features using an attention mechanism. Multiple metrics were employed to evaluate the predictive performance of the model. Results: The deep learning model integrating clinical, morphological, radiomics and whole-lung CT image features with an attention mechanism (CMRI-AM) achieved the best performance, with area under the curve (AUC) values of 0.941 (95% CI: 0.898-0.972), 0.861 (95% CI: 0.823-0.882), and 0.906 (95% CI: 0.878-0.932) on the internal validation set, external test set 1, and external test set 2, respectively. The AUC differences between the CMRI-AM model and the other feature combination models were statistically significant in all three groups (all p<0.05). Conclusion: Our experimental results demonstrated that (1) applying an attention mechanism to fuse whole-lung CT images, radiomics features, and clinical and morphological features is feasible; (2) clinical, morphological, and radiomics features provide supplementary information for the classification of benign and malignant GGNs based on CT images; and (3) utilizing baseline whole-lung CT features to predict whether GGNs are benign or malignant is an effective approach. Therefore, optimizing the fusion of baseline whole-lung CT features can effectively improve the classification performance of GGNs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
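Result 19 above fuses four kinds of deep features (whole-lung CT image, clinical, morphological, and radiomics) with an attention mechanism. The PyTorch sketch below shows one simple way to do such attention-weighted fusion; the projection width, attention form, and classifier head are illustrative assumptions, not the CMRI-AM architecture.

```python
import torch
import torch.nn as nn

class AttentionFeatureFusion(nn.Module):
    def __init__(self, in_dims: list, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        # Project each feature type into a shared embedding space.
        self.projections = nn.ModuleList([nn.Linear(d, hidden) for d in in_dims])
        self.attn = nn.Linear(hidden, 1)           # scores one weight per feature type
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, feats: list) -> torch.Tensor:
        embedded = torch.stack(
            [proj(f) for proj, f in zip(self.projections, feats)], dim=1
        )                                           # (batch, n_types, hidden)
        weights = torch.softmax(self.attn(torch.tanh(embedded)), dim=1)
        fused = (weights * embedded).sum(dim=1)     # attention-weighted sum over feature types
        return self.classifier(fused)

# Toy usage: image (512-d), clinical (10-d), morphological (8-d), radiomics (107-d) features.
model = AttentionFeatureFusion([512, 10, 8, 107])
batch = [torch.randn(16, d) for d in (512, 10, 8, 107)]
print(model(batch).shape)   # torch.Size([16, 2]) -> benign vs. malignant logits
```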
20. Cross-scale information enhancement for object detection.
- Author
-
Li, Tie-jun and Zhao, Hui-feng
- Subjects
- MULTISCALE modeling, PROBLEM solving, DETECTORS, INFORMATION design
- Abstract
Object detection usually adopts multi-scale fusion to enrich the information of the object, and the Feature Pyramid Network (FPN) is a common method for multi-scale fusion. However, traditional fusion methods such as FPN cause information loss when fusing high-level feature maps with low-level feature maps. To solve these problems, we propose a simple but effective cross-scale fusion method that fully uses the information of multi-scale feature maps. In addition, to better utilize the multi-scale contextual information, we designed the Selective Information Enhancement (SIE) module. The SIE dynamically selects information at more important scales for objects of different sizes and fuses the selected information with feature maps for information enhancement. We apply our method to the Single Shot MultiBox Detector (SSD) and propose a Cross-Scale Information Enhancement Single Shot MultiBox Detector (CESSD). The CESSD improves the object detection capability of SSD models by fusing multi-scale features and selectively enhancing feature map information. To evaluate the effectiveness of the model, we validated it on the Pascal VOC2007 test set with 300 × 300 inputs, and the mean Average Precision (mAP) of CESSD reached 79.8%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
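The Selective Information Enhancement idea in result 20 above (dynamically weighting the scales that matter most before fusing them back into a feature map) can be sketched as follows in PyTorch. The resizing, pooling-based scale scoring, and residual addition are assumptions for illustration rather than the CESSD implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveScaleEnhancement(nn.Module):
    """Score each scale from pooled context and add the weighted sum to a target map."""
    def __init__(self, channels: int):
        super().__init__()
        self.scale_scorer = nn.Linear(channels, 1)   # one score per scale

    def forward(self, feats: list, target_idx: int = 0) -> torch.Tensor:
        target = feats[target_idx]
        size = target.shape[-2:]
        resized = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
                   for f in feats]                                          # unify spatial size
        pooled = torch.stack([f.mean(dim=(2, 3)) for f in resized], dim=1)  # (B, S, C)
        weights = torch.softmax(self.scale_scorer(pooled), dim=1)           # (B, S, 1)
        weights = weights[..., None, None]                                  # (B, S, 1, 1, 1)
        stacked = torch.stack(resized, dim=1)                               # (B, S, C, H, W)
        enhanced = (weights * stacked).sum(dim=1)                           # weighted sum over scales
        return target + enhanced                                            # enhance the target scale

# Toy usage: three pyramid levels with 256 channels each.
feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16)]
sie = SelectiveScaleEnhancement(channels=256)
print(sie(feats, target_idx=0).shape)   # torch.Size([2, 256, 64, 64])
```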
21. AFpoint: adaptively fusing local and global features for point cloud.
- Author
-
Li, Guangping, Liu, Chenghui, Gao, Xiang, Xiao, Huanling, and Ling, Bingo Wing-Kuen
- Subjects
- POINT cloud, FEATURE extraction, NETWORK performance, CLASSIFICATION
- Abstract
Due to the sparseness and irregularity of point clouds, accurately extracting internal structural details from a point cloud as well as quickly identifying its overall contour remains a challenging task. Currently, most studies focus on introducing sophisticated designs to unilaterally capture local or global features of the point cloud, and rarely combine local features with global features. More importantly, it is easy to increase the computational burden while pursuing efficiency. We propose a lightweight feature extractor that efficiently extracts and fuses the local and global features of a point cloud, named AFpoint. Specifically, AFpoint is composed of two modules: the Local-Global Parallelized Feature Extraction module (LGP) and the Adaptive Feature Fusion module (AFF). The LGP module encodes local and global features in parallel by using point-by-point convolution and a relative attention mechanism, respectively. It simultaneously extracts fine-grained structure and captures global relationships. The AFF module adaptively selects and integrates the local and global features by estimating the attention maps of the encoded features and helps the model to autonomously focus on important regions. Note that AFpoint is a plug-and-play, universal module. We use AFpoint to construct classification and segmentation networks for point clouds, which greatly improves the accuracy and robustness of the baseline model and reduces the parameters by nearly half. Experiments on the widely adopted ModelNet40 and ScanObjectNN classification datasets demonstrate the state-of-the-art performance of our network, which also shows good results on the ShapeNetPart part segmentation dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
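Result 21 above fuses per-point local and global features through an Adaptive Feature Fusion module driven by estimated attention maps. A minimal PyTorch sketch of such a gated blend is shown below; the 1×1-convolution gate and sigmoid convex combination are illustrative assumptions, not the AFpoint design.

```python
import torch
import torch.nn as nn

class AdaptiveFeatureFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv1d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        # local_feat, global_feat: (batch, channels, n_points)
        a = self.gate(torch.cat([local_feat, global_feat], dim=1))  # per-point attention map
        return a * local_feat + (1.0 - a) * global_feat             # adaptive blend

# Toy usage: 1024 points with 128-channel local and global descriptors.
local_feat = torch.randn(4, 128, 1024)
global_feat = torch.randn(4, 128, 1024)
aff = AdaptiveFeatureFusion(128)
print(aff(local_feat, global_feat).shape)   # torch.Size([4, 128, 1024])
```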
22. Feasibility study of opportunistic osteoporosis screening on chest CT using a multi-feature fusion DCNN model.
- Author
-
Pan, Jing, Lin, Peng-cheng, Gong, Shen-chu, Wang, Ze, Cao, Rui, Lv, Yuan, Zhang, Kun, and Wang, Lin
- Abstract
Summary: A multi-feature fusion DCNN model for automated evaluation of lumbar vertebra L1 on chest CT, combined with clinical information and radiomics, permits estimation of volumetric bone mineral density for the evaluation of osteoporosis. Purpose: To develop a multi-feature deep learning model based on chest CT, combined with clinical information and radiomics, to explore the feasibility of screening for osteoporosis based on estimation of volumetric bone mineral density. Methods: The chest CT images of 1048 health check subjects were retrospectively collected as the master dataset, and the images of 637 subjects obtained from a different CT scanner were used for the external validation cohort. The subjects were divided into three categories according to the quantitative CT (QCT) examination, namely, the normal group, the osteopenia group, and the osteoporosis group. Firstly, a deep learning-based segmentation model was constructed. Then, classification models were established and compared, and the optimal model was chosen to build the bone density value prediction regression model. Results: The DSC value was 0.951 ± 0.030 in the testing dataset and 0.947 ± 0.060 in the external validation cohort. The multi-feature fusion model based on the lumbar 1 vertebra had the best diagnostic performance. The area under the curve (AUC) for diagnosing normal, osteopenia, and osteoporosis was 0.992, 0.973, and 0.989, respectively. The mean absolute errors (MAEs) of the bone density prediction regression model in the test set and external testing dataset are 8.20 mg/cm³ and 9.23 mg/cm³, respectively, and the root mean square errors (RMSEs) are 10.25 mg/cm³ and 11.91 mg/cm³, respectively. The R-squared values are 0.942 and 0.923, respectively. The Pearson correlation coefficients are 0.972 and 0.965. Conclusion: The multi-feature fusion DCNN model based on only the lumbar 1 vertebra and clinical variables can perform three-class bone density diagnosis and estimate volumetric bone mineral density. If confirmed in independent populations, this automated opportunistic chest CT evaluation can help clinical screening of large-sample populations to identify subjects at high risk of osteoporotic fracture. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Identification of drug use degree by integrating multi-modal features with dual-input deep learning method.
- Author
-
Zhou, Yuxing, Gu, Xuelin, Wang, Zhen, and Li, Xiaoou
- Abstract
Most studies on drug use degree are based on subjective judgments without objective quantitative assessment. In this paper, a dual-input bimodal fusion algorithm is proposed to study drug use degree by using electroencephalogram (EEG) and near-infrared spectroscopy (NIRS) signals. Firstly, this paper uses the optimized dual-input multi-modal TiCBnet for extracting the deep encoding features of the bimodal signal, then fuses and screens the features using different methods, and finally the fused deep encoding features are classified. The classification accuracy of the bimodal approach is found to be higher than that of either single modality, and the classification accuracy reaches up to 89.9%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Vehicle Localization Method in Complex SAR Images Based on Feature Reconstruction and Aggregation.
- Author
-
Han, Jinwei, Kang, Lihong, Tian, Jing, Jiang, Mingyong, and Guo, Ningbo
- Abstract
Due to the small size of vehicle targets, complex background environments, and the discrete scattering characteristics of high-resolution synthetic aperture radar (SAR) images, existing deep learning networks face challenges in extracting high-quality vehicle features from SAR images, which impacts vehicle localization accuracy. To address this issue, this paper proposes a vehicle localization method for SAR images based on feature reconstruction and aggregation with rotating boxes. Specifically, our method first employs a backbone network that integrates the space-channel reconfiguration module (SCRM), which contains spatial and channel attention mechanisms specifically designed for SAR images to extract features. The network then connects a progressive cross-fusion mechanism (PCFM) that effectively combines multi-view features from different feature layers, enhancing the information content of feature maps and improving feature representation quality. Finally, these features containing a large receptive field region and enhanced rich contextual information are input into a rotating box vehicle detection head, which effectively reduces false alarms and missed detections. Experiments on a complex scene SAR image vehicle dataset demonstrate that the proposed method significantly improves vehicle localization accuracy. Our method achieves state-of-the-art performance, which demonstrates the superiority and effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. A Depth Awareness and Learnable Feature Fusion Network for Enhanced Geometric Perception in Semantic Correspondence.
- Author
-
Li, Fazeng, Zou, Chunlong, Yun, Juntong, Huang, Li, Liu, Ying, Tao, Bo, and Xie, Yuanmin
- Abstract
Deep learning is becoming the most widely used technology for multi-sensor data fusion. Semantic correspondence has recently emerged as a foundational task, enabling a range of downstream applications, such as style or appearance transfer, robot manipulation, and pose estimation, through its ability to provide robust correspondence in RGB images with semantic information. However, current representations generated by self-supervised learning and generative models are often limited in their ability to capture and understand the geometric structure of objects, which is significant for matching the correct details in applications of semantic correspondence. Furthermore, efficiently fusing these two types of features presents an interesting challenge. Achieving harmonious integration of these features is crucial for improving the expressive power of models in various tasks. To tackle these issues, our key idea is to integrate depth information from depth estimation or depth sensors into feature maps and leverage learnable weights for feature fusion. First, depth information is used to model pixel-wise depth distributions, assigning relative depth weights to feature maps for perceiving an object's structural information. Then, based on a contrastive learning optimization objective, a series of weights are optimized to leverage feature maps from self-supervised learning and generative models. Depth features are naturally embedded into feature maps, guiding the network to learn geometric structure information about objects and alleviating depth ambiguity issues. Experiments on the SPair-71K and AP-10K datasets show that the proposed method achieves scores of 81.8 and 83.3 on the percentage of correct keypoints (PCK) at the 0.1 level, respectively. Our approach not only demonstrates significant advantages in experimental results but also introduces the depth awareness module and a learnable feature fusion module, which enhances the understanding of object structures through depth information and fully utilizes features from various pre-trained models, offering new possibilities for the application of deep learning in RGB and depth data fusion technologies. We will also continue to focus on accelerating model inference and optimizing model lightweighting, enabling our model to operate at a faster speed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. DenseFusion-DA2: End-to-End Pose-Estimation Network Based on RGB-D Sensors and Multi-Channel Attention Mechanisms.
- Author
-
Li, Hanqi, Wan, Guoyang, Li, Xuna, Wang, Chengwen, Zhang, Hong, and Liu, Bingyou
- Abstract
6D pose estimation is a critical technology that enables robots to perceive and interact with their operational environment. However, occlusion causes a loss of local features, which, in turn, restricts estimation accuracy. To address these challenges, this paper proposes an end-to-end pose-estimation network based on a multi-channel attention mechanism, DA2Net. Firstly, a multi-channel attention mechanism, designated "DA2Net", was devised using A2-Nets as its foundation. This mechanism is constructed in two steps. In the first step, the essential characteristics are extracted from the global feature space through the second-order attention pool. In the second step, a feature map is generated by the integration of position and channel attention. Subsequently, the extracted key features are assigned to each position of the feature map, enhancing both the feature representation capacity and the overall performance. Secondly, the designed attention mechanism is introduced into both the feature fusion and pose iterative refinement networks to enhance the network's capacity to acquire local features, thus improving its overall performance. The experimental results demonstrated that the estimation accuracy of DenseFusion-DA2 on the LineMOD dataset was approximately 3.4% higher than that of DenseFusion. Furthermore, the estimation accuracy surpassed that of PoseCNN, PVNet, SSD6D, and PointFusion by 8.3%, 11.1%, 20.3%, and 23.8%, respectively. The estimation accuracy also shows a significant advantage on the Occluded LineMOD and HR-Vision datasets. This research not only presents a more efficient solution for robot perception but also introduces novel ideas and methods for technological advancements and applications in related fields. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. A Novel Detection Transformer Framework for Ship Detection in Synthetic Aperture Radar Imagery Using Advanced Feature Fusion and Polarimetric Techniques.
- Author
-
Ahmed, Mahmoud, El-Sheimy, Naser, and Leung, Henry
- Abstract
Ship detection in synthetic aperture radar (SAR) imagery faces significant challenges due to the limitations of traditional methods, such as convolutional neural network (CNN) and anchor-based matching approaches, which struggle with accurately detecting smaller targets as well as adapting to varying environmental conditions. These methods, relying on either intensity values or single-target characteristics, often fail to enhance the signal-to-clutter ratio (SCR) and are prone to false detections due to environmental factors. To address these issues, a novel framework is introduced that leverages the detection transformer (DETR) model along with advanced feature fusion techniques to enhance ship detection. This feature enhancement DETR (FEDETR) module manages clutter and improves feature extraction through preprocessing techniques such as filtering, denoising, and applying maximum and median pooling with various kernel sizes. Furthermore, it combines metrics like the line spread function (LSF), peak signal-to-noise ratio (PSNR), and F1 score to predict optimal pooling configurations and thus enhance edge sharpness, image fidelity, and detection accuracy. Complementing this, the weighted feature fusion (WFF) module integrates polarimetric SAR (PolSAR) methods such as Pauli decomposition, coherence matrix analysis, and feature volume and helix scattering (Fvh) components decomposition, along with FEDETR attention maps, to provide detailed radar scattering insights that enhance ship response characterization. Finally, by integrating wave polarization properties, the ability to distinguish and characterize targets is augmented, thereby improving SCR and facilitating the detection of weakly scattered targets in SAR imagery. Overall, this new framework significantly boosts DETR's performance, offering a robust solution for maritime surveillance and security. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Dual-Feature Fusion Learning: An Acoustic Signal Recognition Method for Marine Mammals.
- Author
-
Lü, Zhichao, Shi, Yaqian, Lü, Liangang, Han, Dongyue, Wang, Zhengkai, and Yu, Fei
- Abstract
Marine mammal acoustic signal recognition is a key technology for species conservation and ecological environment monitoring. Because the marine environment is complex and changeable, and traditional recognition methods based on a single feature input suffer from poor environmental adaptability and low recognition accuracy, this paper proposes a dual-feature fusion learning method. First, dual-domain feature extraction is performed on marine mammal acoustic signals to overcome the limitations of single-feature-input methods by exchanging feature information between the time-frequency domain and the Delay-Doppler domain. Second, this paper constructs a dual-feature fusion learning target recognition model, which improves the generalization ability and robustness of mammal acoustic signal recognition in complex marine environments. Finally, the feasibility and effectiveness of the dual-feature fusion learning target recognition model are verified in this study using the acoustic datasets of three marine mammals, namely the Fraser's Dolphin, the Spinner Dolphin, and the Long-Finned Pilot Whale. The dual-feature fusion learning target recognition model improved the accuracy of the training set by 3% to 6% and 20% to 23%, and the accuracy of the test set by 1% to 3% and 25% to 38%, respectively, compared to models that used the time-frequency domain features or the Delay-Doppler domain features alone for recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Unsupervised Multi-Scale Hybrid Feature Extraction Network for Semantic Segmentation of High-Resolution Remote Sensing Images.
- Author
-
Song, Wanying, Nie, Fangxin, Wang, Chi, Jiang, Yinyin, and Wu, Yan
- Abstract
Generating pixel-level annotations for semantic segmentation tasks of high-resolution remote sensing images is both time-consuming and labor-intensive, which has led to increased interest in unsupervised methods. Therefore, in this paper, we propose an unsupervised multi-scale hybrid feature extraction network based on the CNN-Transformer architecture, referred to as MSHFE-Net. The MSHFE-Net consists of three main modules: a Multi-Scale Pixel-Guided CNN Encoder, a Multi-Scale Aggregation Transformer Encoder, and a Parallel Attention Fusion Module. The Multi-Scale Pixel-Guided CNN Encoder is designed for multi-scale, fine-grained feature extraction in unsupervised tasks, efficiently recovering local spatial information in images. Meanwhile, the Multi-Scale Aggregation Transformer Encoder introduces a multi-scale aggregation module, which further enhances the unsupervised acquisition of multi-scale contextual information, obtaining global features with stronger feature representation. The Parallel Attention Fusion Module employs an attention mechanism to fuse global and local features in both channel and spatial dimensions in parallel, enriching the semantic relations extracted during unsupervised training and improving the performance of unsupervised semantic segmentation. K-means clustering is then performed on the fused features to achieve high-precision unsupervised semantic segmentation. Experiments with MSHFE-Net on the Potsdam and Vaihingen datasets demonstrate its effectiveness in significantly improving the accuracy of unsupervised semantic segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Deep and shallow feature fusion framework for remote sensing open pit coal mine scene recognition.
- Author
-
Liu, Yang and Zhang, Jin
- Abstract
Understanding land use and damage in open-pit coal mining areas is crucial for effective scientific oversight and management. Current recognition methods exhibit limitations: traditional approaches depend on manually designed features, which offer limited expressiveness, whereas deep learning techniques are heavily reliant on sample data. In order to overcome the aforementioned limitations, a three-branch feature extraction framework was proposed in the present study. The proposed framework effectively fuses deep features (DF) and shallow features (SF), and can accomplish scene recognition tasks with high accuracy and fewer samples. Deep features are enhanced through a neighbouring feature attention module and a Graph Convolutional Network (GCN) module, which capture both neighbouring features and the correlation between local scene information. Shallow features are extracted using the Gray-Level Co-occurrence Matrix (GLCM) and Gabor filters, which respectively capture local and overall texture variations. Evaluation results on the AID and RSSCN7 datasets demonstrate that the proposed deep feature extraction model achieved classification accuracies of 97.53% and 96.73%, respectively, indicating superior performance in deep feature extraction tasks. Finally, the two kinds of features were fused and input into the particle swarm algorithm optimised support vector machine (PSO-SVM) to classify the scenes of remote sensing images, and the classification accuracy reached 92.78%, outperforming four other classification methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
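The shallow-feature branch in result 30 above relies on GLCM and Gabor texture descriptors. The following scikit-image sketch extracts one plausible version of such a shallow feature vector; the distances, angles, frequencies, and chosen statistics are illustrative, and the deep branch and PSO-SVM classifier are not shown. It assumes scikit-image 0.19 or newer (older releases spell the functions greycomatrix/greycoprops).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.filters import gabor

def shallow_texture_features(gray: np.ndarray) -> np.ndarray:
    """gray: 2-D uint8 image. Returns a 1-D vector of GLCM + Gabor texture statistics."""
    glcm = graycomatrix(gray, distances=[1, 2],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    glcm_feats = np.concatenate([
        graycoprops(glcm, prop).ravel()                       # 2 distances x 4 angles each
        for prop in ("contrast", "homogeneity", "energy", "correlation")
    ])
    gabor_feats = []
    for frequency in (0.1, 0.2, 0.4):
        for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
            real, _ = gabor(gray, frequency=frequency, theta=theta)
            gabor_feats.extend([real.mean(), real.std()])     # overall texture statistics
    return np.concatenate([glcm_feats, np.asarray(gabor_feats)])

patch = (np.random.rand(64, 64) * 255).astype(np.uint8)      # stand-in for a scene patch
features = shallow_texture_features(patch)
print(features.shape)   # (56,) = 32 GLCM statistics + 24 Gabor statistics
```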
31. DPF-YOLOv8: Dual Path Feature Fusion Network for Traffic Sign Detection in Hazy Weather.
- Author
-
Zhang, Yuechong, Liu, Haiying, Dong, Dehao, Duan, Xuehu, Lin, Fei, and Liu, Zengxiao
- Abstract
Traffic sign detection plays an integral role in intelligent driving systems. It was found that in real driving scenarios, traffic signs are easily obscured by haze, leading to inaccurate traffic sign detection in assisted driving systems. Therefore, we designed a traffic sign detection model for hazy weather that can effectively help drivers recognize road signs and reduce the incidence of traffic accidents. A high-precision traffic sign detection network has been designed to address the problem of decreased recognition performance caused by external factors such as the small size of traffic signs and haze occlusion in real-world scenarios. First, the default YOLOv8 was found, through experimental studies, to have low detection accuracy under hazy-weather occlusion conditions. Therefore, a deeper, lightweight, and efficient multi-branch CSP (Cross Stage Partial) module was introduced. Second, a dual path feature fusion network was designed to address the problem of insufficient feature fusion due to the small size of traffic signs. Finally, in order to better simulate real hazy-weather scenes, we added fog to the raw data to enrich the data samples. This was verified through experiments on a public Chinese traffic sign detection dataset after fogging treatment, compared to the default YOLOv8 model. The improved DPF-YOLOv8 algorithm achieved improvements of 2.1% and 2.2% in the mAP@0.5 and mAP@0.5:0.95 metrics, reaching 65.0% and 47.4%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
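Result 31 above enriches its training data by adding fog to clear images. A common way to synthesise fog is the atmospheric scattering model I = J·t + A·(1 − t); the NumPy sketch below applies it with a crude depth proxy. The fog density, airlight value, and depth model are illustrative assumptions, not the paper's recipe.

```python
import numpy as np

def add_fog(img: np.ndarray, fog_density: float = 1.5, airlight: float = 0.9) -> np.ndarray:
    """img: (H, W, 3) float array in [0, 1]. Returns the synthetically fogged image."""
    H, W = img.shape[:2]
    # Crude depth proxy: pixels near the image centre row are treated as farther away,
    # which roughly matches a road scene with the horizon around mid-height.
    rows = np.abs(np.linspace(-1.0, 1.0, H))[:, None]
    depth = 1.0 - 0.7 * rows                      # larger value = farther away
    depth = np.repeat(depth, W, axis=1)
    transmission = np.exp(-fog_density * depth)   # Beer-Lambert attenuation
    t = transmission[..., None]
    return np.clip(img * t + airlight * (1.0 - t), 0.0, 1.0)

clear = np.random.rand(480, 640, 3)               # stand-in for a clear traffic scene
foggy = add_fog(clear, fog_density=2.0)
print(foggy.shape, foggy.max() <= 1.0)            # (480, 640, 3) True
```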
32. Microexpression Recognition Method Based on ADP-DSTN Feature Fusion and Convolutional Block Attention Module.
- Author
-
Song, Junfang, Lei, Shanzhong, and Wu, Wenzhe
- Abstract
Microexpressions are subtle facial movements that occur within an extremely brief time frame, often revealing suppressed emotions. These expressions hold significant importance across various fields, including security monitoring and human–computer interaction. However, the accuracy of microexpression recognition is severely constrained by the inherent characteristics of these expressions. To address the issue of low detection accuracy regarding the subtle features present in microexpressions' facial action units, this paper proposes a microexpression action unit detection algorithm, Attention-embedded Dual Path and Shallow Three-stream Networks (ADP-DSTN), that incorporates an attention-embedded dual path and a shallow three-stream network. First, an attention mechanism was embedded after each Bottleneck layer in the foundational Dual Path Networks to extract static features representing subtle texture variations that have significant weights in the action units. Subsequently, a shallow three-stream 3D convolutional neural network was employed to extract optical flow features that were particularly sensitive to temporal and discriminative characteristics specific to microexpression action units. Finally, the acquired static facial feature vectors and optical flow feature vectors were concatenated to form a fused feature vector that encompassed more effective information for recognition. Each facial action unit was then trained individually to address the issue of weak correlations among the facial action units, thereby facilitating the classification of microexpression emotions. The experimental results demonstrated that the proposed method achieved great performance across several microexpression datasets. The unweighted average recall (UAR) values were 80.71%, 89.55%, 44.64%, 80.59%, and 88.32% for the SAMM, CASME II, CAS(ME)³, SMIC, and MEGC2019 datasets, respectively. The unweighted F1 scores (UF1) were 79.32%, 88.30%, 43.03%, 81.12%, and 88.95%, respectively. Furthermore, when compared to the benchmark model, our proposed model achieved better performance with lower computational complexity, characterized by a Floating Point Operations (FLOPs) value of 1087.350 M and a total of 6.356 × 10⁶ model parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. An Efficient UAV Image Object Detection Algorithm Based on Global Attention and Multi-Scale Feature Fusion.
- Author
-
Qian, Rui and Ding, Yong
- Abstract
Object detection technology holds significant promise in unmanned aerial vehicle (UAV) applications. However, traditional methods face challenges in detecting denser, smaller, and more complex targets within UAV aerial images. To address issues such as target occlusion and dense small objects, this paper proposes a multi-scale object detection algorithm based on YOLOv5s. A novel feature extraction module, DCNCSPELAN4, which combines CSPNet and ELAN, is introduced to enhance the receptive field of feature extraction while maintaining network efficiency. Additionally, a lightweight Vision Transformer module, the CloFormer Block, is integrated to provide the network with a global receptive field. Moreover, the algorithm incorporates a three-scale feature fusion (TFE) module and a scale sequence feature fusion (SSFF) module in the neck network to effectively leverage multi-scale spatial information across different feature maps. To address dense small objects, an additional small object detection head was added to the detection layer. The original large object detection head was removed to reduce computational load. The proposed algorithm has been evaluated through ablation experiments and compared with other state-of-the-art methods on the VisDrone2019 and AU-AIR datasets. The results demonstrate that our algorithm outperforms other baseline methods in terms of both accuracy and speed. Compared to the YOLOv5s baseline model, the enhanced algorithm achieves improvements of 12.4% and 8.4% in the AP50 and AP metrics, respectively, with only a marginal parameter increase of 0.3 M. These experiments validate the effectiveness of our algorithm for object detection in drone imagery. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Dual-Branch Dynamic Object Segmentation Network Based on Spatio-Temporal Information Fusion.
- Author
-
Huang, Fei, Wang, Zhiwen, Zheng, Yu, Wang, Qi, Hao, Bingsen, and Xiang, Yangkai
- Abstract
To address the issue of low accuracy in the segmentation of dynamic objects using semantic segmentation networks, a dual-branch dynamic object segmentation network has been proposed, which is based on the fusion of spatio-temporal information. First, an appearance-motion feature fusion module is designed, which characterizes the motion information of objects by introducing a residual graph. This module combines a co-attention mechanism and a motion correction method to enhance the extraction of appearance features for dynamic objects. Furthermore, to mitigate boundary blurring and misclassification issues when 2D semantic information is projected back into 3D point clouds, a majority voting strategy based on time-series point cloud information has been proposed. This approach aims to overcome the limitations of post-processing in single-frame point clouds and can significantly enhance the accuracy of segmenting moving objects in practical scenarios. Test results from the SemanticKITTI public dataset demonstrate that our improved method outperforms mainstream dynamic object segmentation networks such as LMNet and MotionSeg3D. Specifically, it achieves an Intersection over Union (IoU) of 72.19%, representing an improvement of 9.68% and 4.86% compared to LMNet and MotionSeg3D, respectively. The proposed method, with its precise algorithm, has practical applications in autonomous driving perception. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. YOLO-ESL: An Enhanced Pedestrian Recognition Network Based on YOLO.
- Author
-
Wang, Feilong, Yang, Xiaobing, and Wei, Juan
- Abstract
Pedestrian detection is a critical task in computer vision; however, mainstream algorithms often struggle to achieve high detection accuracy in complex scenarios, particularly due to target occlusion and the presence of small objects. This paper introduces a novel pedestrian detection algorithm, YOLO-ESL, based on the YOLOv7 framework. YOLO-ESL integrates the ELAN-SA module, designed to enhance feature extraction, with the LGA module, which improves feature fusion. The ELAN-SA module optimizes the flexibility and efficiency of small object feature extraction, while the LGA module effectively integrates multi-scale features through local and global attention mechanisms. Additionally, the CIOUNMS algorithm addresses the issue of target loss in cases of high overlap, improving bounding box filtering. Evaluated on the VOC2012 pedestrian dataset, YOLO-ESL achieved an accuracy of 93.7%, surpassing the baseline model by 3.0%. Compared to existing methods, this model not only demonstrates strong performance in handling occluded and small object detection but also exhibits remarkable robustness and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Lightweight Multi-Domain Fusion Model for Through-Wall Human Activity Recognition Using IR-UWB Radar.
- Author
-
Huang, Ling, Lei, Dong, Zheng, Bowen, Chen, Guiping, An, Huifeng, and Li, Mingxuan
- Abstract
Impulse radio ultra-wideband (IR-UWB) radar, operating in the low-frequency band, can penetrate walls and utilize its high range resolution to recognize different human activities. Complex deep neural networks have demonstrated significant performance advantages in classifying radar spectrograms of various actions, but at the cost of a substantial computational overhead. In response, this paper proposes a lightweight model named TG2-CAFNet. First, clutter suppression and time–frequency analysis are used to obtain range–time and micro-Doppler feature maps of human activities. Then, leveraging GhostV2 convolution, a lightweight feature extraction module, TG2, suitable for radar spectrograms is constructed. Using a parallel structure, the features of the two spectrograms are extracted separately. Finally, to further explore the correlation between the two spectrograms and enhance the feature representation capabilities, an improved nonlinear fusion method called coordinate attention fusion (CAF) is proposed based on attention feature fusion (AFF). This method extends the adaptive weighting fusion of AFF to a spatial distribution, effectively capturing the subtle spatial relationships between the two radar spectrograms. Experiments showed that the proposed method achieved a high degree of model lightweightness, while also achieving a recognition accuracy of 99.1%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
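The coordinate attention fusion in result 36 above extends adaptive weighting to a spatial distribution over the range-time and micro-Doppler feature maps. Below is a hedged PyTorch sketch of a spatially adaptive gate in that spirit; the gate network, kernel sizes, and channel counts are illustrative assumptions rather than the CAF module itself.

```python
import torch
import torch.nn as nn

class SpatialFusionGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, range_time: torch.Tensor, micro_doppler: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, channels, H, W) feature maps from the two spectrogram branches.
        a = self.gate(torch.cat([range_time, micro_doppler], dim=1))  # (batch, 1, H, W) gate
        return a * range_time + (1.0 - a) * micro_doppler             # per-location blend

# Toy usage: 64-channel feature maps extracted from the two radar spectrograms.
rt = torch.randn(2, 64, 32, 32)
md = torch.randn(2, 64, 32, 32)
print(SpatialFusionGate(64)(rt, md).shape)   # torch.Size([2, 64, 32, 32])
```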
37. YOLO-ACT: an adaptive cross-layer integration method for apple leaf disease detection.
- Author
-
Silu Zhang, Jingzhe Wang, Kai Yang, and Minglei Guan
- Subjects
LEAF spots, POWDERY mildew diseases, ALTERNARIA, CROPS, FROGS - Abstract
Apple is a significant economic crop in China, and leaf diseases represent a major challenge to its growth and yield. To enhance the efficiency of disease detection, this paper proposes an Adaptive Cross-layer Integration Method for apple leaf disease detection. This approach, built upon the YOLOv8s architecture, incorporates three novel modules specifically designed to improve detection accuracy and mitigate the impact of environmental factors. Furthermore, the proposed method addresses challenges arising from large feature discrepancies and similar disease characteristics, ultimately improving the model's overall detection performance. Experimental results show that the proposed method achieves a mean Average Precision (mAP) of 85.1% for apple leaf disease detection, outperforming the latest state-of-the-art YOLOv10s model by 2.2%. Compared to the baseline, the method yields a 2.8% increase in mAP, with improvements of 5.1%, 3.3%, and 2% in Average Precision, Recall, and mAP50-95, respectively. This method demonstrates superiority over other classic detection algorithms. Notably, the model exhibits optimal performance in detecting Alternaria leaf spot, frog eye leaf spot, gray spot, powdery mildew, and rust, achieving mAPs of 84.3%, 90.4%, 80.8%, 75.7%, and 92.0%, respectively. These results highlight the model's ability to significantly reduce false negatives and false positives, thereby enhancing both detection and localization of diseases. This research offers a new theoretical foundation and direction for future advancements in apple leaf disease detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Revolutionizing tomato disease detection in complex environments.
- Author
-
Diye Xin and Tianqi Li
- Subjects
AGRICULTURE, COMPUTATIONAL complexity, SPINE, PYRAMIDS, ALGORITHMS - Abstract
In the current agricultural landscape, a significant portion of tomato plants suffer from leaf diseases, posing a major challenge to manual detection due to the task's extensive scope. Existing detection algorithms struggle to balance speed with accuracy, especially when identifying small-scale leaf diseases across diverse settings. Addressing this need, this study presents FCHF-DETR (Faster-Cascaded-attention-High-feature-fusion-Focaler Detection-Transformer), an innovative, high-precision, and lightweight detection algorithm based on RTDETR-R18 (Real-Time-Detection-Transformer-ResNet18). The algorithm was developed using a carefully curated dataset of 3147 RGB images showcasing tomato leaf diseases across a range of scenes and resolutions. FasterNet replaces ResNet18 in the algorithm's backbone network to reduce the model's size and improve memory efficiency. Additionally, replacing the conventional AIFI (Attention-based Intra-scale Feature Interaction) module with Cascaded Group Attention and the original CCFM (CNN-based Cross-scale Feature-fusion Module) with HSFPN (High-Level Screening-feature Fusion Pyramid Networks) in the Efficient Hybrid Encoder significantly enhanced detection accuracy without greatly affecting efficiency. To tackle the difficulty of identifying hard samples, the Focaler-CIoU loss function was incorporated, refining the model's performance throughout the dataset. Empirical results show that FCHF-DETR achieved 96.4% Precision, 96.7% Recall, 89.1% mAP50-95 (mean Average Precision), and 97.2% mAP50 on the test set, with a reduction of 9.2G in FLOPs (floating-point operations) and 3.6M in parameters. These findings demonstrate that the proposed method improves detection accuracy and reduces computational complexity, addressing the dual challenges of precision and efficiency in tomato leaf disease detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. A novel embedded kernel CNN-PCFF algorithm for breast cancer pathological image classification.
- Author
-
Liu, Wenbo, Liang, Shengnan, and Qin, Xiwen
- Subjects
- *
IMAGE recognition (Computer vision) , *PRINCIPAL components analysis , *BREAST cancer , *KERNEL functions , *TUMOR classification , *DEEP learning - Abstract
Early screening of breast cancer through image recognition technology can significantly increase the survival rate of patients; breast cancer pathological images are therefore of great significance for medical diagnosis and clinical research. In recent years, numerous deep learning models have been applied to breast cancer image classification, with deep CNNs being a typical representative. Because mainstream CNN architectures such as VGG and Inception use small convolutional kernels at multiple depths, the resulting image features are often high-dimensional. Although high dimensionality can provide more fine-grained features, it also increases the computational complexity of subsequent classifiers and may even lead to the curse of dimensionality and overfitting. To address these issues, a novel embedded kernel CNN principal component feature fusion (CNN-PCFF) algorithm is proposed. The constructed kernel function is embedded into principal component analysis to form multi-kernel principal components. Multi-kernel principal component analysis fuses the high-dimensional features obtained from the convolutional base into a small number of representative comprehensive variables, called kernel principal components, thereby achieving dimensionality reduction. Any type of classifier can then be applied to the multi-kernel principal components. Experimental analysis on two public breast cancer image datasets shows that the proposed algorithm improves the performance of current mainstream CNN architectures and subsequent classifiers. The proposed algorithm is therefore an effective tool for the classification of breast cancer pathological images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
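The dimensionality-reduction stage of CNN-PCFF in entry 39, fusing high-dimensional convolutional features into a handful of kernel principal components before classification, can be prototyped directly with scikit-learn. The snippet below is a generic sketch on synthetic features rather than the authors' embedded-kernel construction; the RBF kernel, the number of components, and the SVM classifier are illustrative choices.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in for high-dimensional features pooled from a CNN convolutional base.
X = rng.normal(size=(400, 2048))
y = rng.integers(0, 2, size=400)            # benign / malignant labels (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=64, kernel="rbf", gamma=1e-3),  # fuse into 64 kernel PCs
    SVC(kernel="rbf", C=1.0),               # any downstream classifier can follow
)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```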
40. A full-scale lung image segmentation algorithm based on hybrid skip connection and attention mechanism.
- Author
-
Zhang, Qiong, Min, Byungwon, Hang, Yiliu, Chen, Hao, and Qiu, Jianlin
- Subjects
- *
IMAGE segmentation , *ALGORITHMS , *PIXELS , *X-rays , *LUNGS - Abstract
The segmentation accuracy of lung images is affected by occlusion from foreground and background objects. To address this problem, we propose a full-scale lung image segmentation algorithm based on hybrid skip connections and an attention mechanism (HAFS). The algorithm uses YOLOv8 as the underlying network, enhances multi-layer feature fusion by incorporating dense and sparse skip connections into the network structure, and increases the weighting of important features through attention gates. Finally, the proposed algorithm was applied to the Montgomery County and Shenzhen chest X-ray lung datasets. The experimental results show that the proposed algorithm improves the precision, recall, pixel accuracy, Dice, mIoU, mAP, and GFLOPs metrics compared with the comparison algorithms, which demonstrates its advancement and effectiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
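The attention gates mentioned in entry 40, which re-weight skip-connection features using a gating signal from a deeper layer, are most commonly implemented in the Attention U-Net form shown below. This is a generic illustrative sketch, not the HAFS implementation, and the channel counts are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Weight skip features x by a gate computed from x and a coarser signal g."""
    def __init__(self, in_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.theta_x = nn.Conv2d(in_ch, inter_ch, kernel_size=1)
        self.phi_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # Upsample the gating signal to the skip-feature resolution.
        g_up = F.interpolate(self.phi_g(g), size=x.shape[2:], mode="bilinear",
                             align_corners=False)
        attn = torch.sigmoid(self.psi(F.relu(self.theta_x(x) + g_up)))  # (B, 1, H, W)
        return x * attn                       # re-weighted skip features

# Toy usage: 64-channel skip map gated by a 128-channel deeper feature map.
gate = AttentionGate(in_ch=64, gate_ch=128, inter_ch=32)
out = gate(torch.randn(1, 64, 56, 56), torch.randn(1, 128, 28, 28))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```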
41. A CenterNet Moving Vehicle Detection Method Based on Multi-Task Feature Fusion.
- Author
-
李晓晗, 刘石坚, 邹峥, and 戴宇晨
- Abstract
Moving vehicle detection based on deep learning technology is currently a research hotspot at the intersection of transportation and computer science. To address challenges in dynamic vehicle detection tasks, such as multi-scale issues, overlapping targets, and the difficulty of distinguishing between dynamic and static vehicles, this paper proposes a multi-task feature fusion approach for CenterNet-based moving vehicle detection. Firstly, a task branch for vehicle segmentation is added to the network, forming a dual-stream mechanism together with the original object detection stream. Subsequently, an appropriate method is employed to fuse features between the two streams, helping to enhance critical feature information in the object detection stream. Additionally, the introduction of attention mechanisms further improves model accuracy. On a test set built from the UA-DETRAC public dataset, the proposed method achieves an average precision of 70%, a 5.8% improvement over the original CenterNet model. At a frame rate of 30 frames per second, the method offers the best balance between speed and accuracy among the compared methods. Extensive experiments indicate that the approach performs well in moving vehicle detection tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
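One plausible reading of the dual-stream fusion in entry 41 is that the vehicle-segmentation branch produces a soft foreground mask that re-weights the detection branch's feature map. The PyTorch block below sketches exactly that assumption; it is an illustrative reconstruction, not the paper's module.

```python
import torch
import torch.nn as nn

class SegGuidedFusion(nn.Module):
    """Fuse a segmentation stream into a detection stream via a soft foreground mask."""
    def __init__(self, det_ch: int, seg_ch: int):
        super().__init__()
        self.to_mask = nn.Conv2d(seg_ch, 1, kernel_size=1)      # segmentation -> 1-channel mask
        self.refine = nn.Conv2d(det_ch, det_ch, kernel_size=3, padding=1)

    def forward(self, det_feat: torch.Tensor, seg_feat: torch.Tensor) -> torch.Tensor:
        mask = torch.sigmoid(self.to_mask(seg_feat))             # (B, 1, H, W) vehicle-ness
        return self.refine(det_feat * (1.0 + mask))              # emphasise vehicle regions

fusion = SegGuidedFusion(det_ch=64, seg_ch=32)
out = fusion(torch.randn(2, 64, 128, 128), torch.randn(2, 32, 128, 128))
print(out.shape)  # torch.Size([2, 64, 128, 128])
```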
42. Estimation of Cotton SPAD Based on Multi-Source Feature Fusion and Voting Regression Ensemble Learning in Intercropping Pattern of Cotton and Soybean.
- Author
-
Wang, Xiaoli, Li, Jingqian, Zhang, Junqiang, Yang, Lei, Cui, Wenhao, Han, Xiaowei, Qin, Dulin, Han, Guotao, Zhou, Qi, Wang, Zesheng, Zhao, Jing, and Lan, Yubin
- Abstract
The accurate estimation of soil plant analytical development (SPAD) values in cotton under various intercropping patterns with soybean is crucial for monitoring cotton growth and determining a suitable intercropping pattern. In this study, we utilized an unmanned aerial vehicle (UAV) to capture visible (RGB) and multispectral (MS) data of cotton at the bud stage, early flowering stage, and full flowering stage in a cotton–soybean intercropping pattern in the Yellow River Delta region of China, and we used a SPAD502 Plus meter and a tapeline to collect SPAD and cotton plant height (CH) data of the cotton canopy, respectively. We analyzed the differences in cotton SPAD and CH under different intercropping ratio patterns. Pearson correlation analysis was conducted between the RGB features, MS features, and cotton SPAD, and the recursive feature elimination (RFE) method was then employed to select image features. Seven feature sets were established, including MS features (five vegetation indices + five texture features), RGB features (five vegetation indices + cotton cover), and CH, as well as combinations of these three types of features. Voting regression (VR) ensemble learning was proposed for estimating cotton SPAD and compared with three models: random forest regression (RFR), gradient boosting regression (GBR), and support vector regression (SVR). The optimal model was then used to estimate and visualize cotton SPAD under different intercropping patterns. The results were as follows: (1) There was little difference in the mean SPAD or CH values under different intercropping patterns, and a significant positive correlation existed between CH and SPAD throughout the entire growth period. (2) The VR model was optimal for each of the seven feature sets used as input; when the feature set was MS + RGB, the determination coefficient (R2) of the validation set was 0.902, the root mean square error (RMSE) was 1.599, and the relative prediction deviation (RPD) was 3.24. (3) When the feature set was CH + MS + RGB, the accuracy of the VR model improved further: compared with the MS + RGB feature set, R2 and RPD increased by 1.55% and 8.95%, respectively, and RMSE decreased by 7.38%. (4) In the intercropping of cotton and soybean, cotton grown under the 4:6 planting pattern performed best. These results can provide a reference for the selection of intercropping patterns and the estimation of cotton SPAD. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
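The voting-regression ensemble in entry 42 maps directly onto scikit-learn's VotingRegressor, which averages the predictions of its base learners. The snippet mirrors the RFR/GBR/SVR combination named in the abstract; the synthetic feature matrix stands in for the fused CH + MS + RGB features and all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              VotingRegressor)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 12))                        # stand-in for fused CH + MS + RGB features
y = X[:, :3].sum(axis=1) + rng.normal(0, 0.3, 300)    # stand-in for canopy SPAD values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Voting regression: average the predictions of the three base regressors.
vr = VotingRegressor([
    ("rfr", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
])
vr.fit(X_tr, y_tr)
print("R2 on held-out data:", round(vr.score(X_te, y_te), 3))
```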
43. A Training-Free Latent Diffusion Style Transfer Method.
- Author
-
Xiang, Zhengtao, Wan, Xing, Xu, Libo, Yu, Xin, and Mao, Yuhan
- Abstract
Diffusion models have attracted considerable scholarly interest for their outstanding performance in generative tasks. However, current style transfer techniques based on diffusion models still rely on fine-tuning during the inference phase to optimize the generated results. This approach is not only laborious and resource-demanding but also fails to fully harness the creative potential of large diffusion models. To overcome this limitation, this paper introduces a solution that utilizes a pretrained diffusion model, thereby obviating the need for additional training steps. The scheme proposes a Feature Normalization Mapping Module with Cross-Attention Mechanism (INN-FMM) based on a dual-path diffusion model. This module employs soft attention to extract style features and integrate them with content features. Additionally, a parameter-free Similarity Attention Mechanism (SimAM) is employed within the image feature space to facilitate the transfer of the style image's textures and colors while minimizing the loss of structural content information. The fusion of these dual attention mechanisms achieves style transfer in texture and color without sacrificing content integrity. Experimental results indicate that the approach outperforms existing methods on several evaluation metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
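The parameter-free SimAM attention used in entry 43 scores each activation with an energy term computed from the per-channel mean and variance and gates the feature map with a sigmoid of that score. The module below follows the commonly published formulation; the regularisation constant is the usual default and is an assumption here.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: weight each activation by its channel-wise 'energy'."""
    def __init__(self, eps: float = 1e-4):
        super().__init__()
        self.eps = eps  # regularisation constant (commonly 1e-4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation per position
        v = d.sum(dim=(2, 3), keepdim=True) / n             # channel-wise variance
        e_inv = d / (4 * (v + self.eps)) + 0.5               # inverse energy per position
        return x * torch.sigmoid(e_inv)

attn = SimAM()
out = attn(torch.randn(1, 256, 64, 64))
print(out.shape)  # torch.Size([1, 256, 64, 64])
```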
44. Method based on a multi-image feature fusion model for detecting weld defects in time-of-flight diffraction images.
- Author
-
Kun Yue, Hongquan Jiang, Zelin Zhi, Deyan Yang, and Zhixiang Cheng
- Subjects
- *
OBJECT recognition (Computer vision) , *WELDING defects , *ARTIFICIAL intelligence , *WELDING inspection , *FEATURE extraction - Abstract
Time-of-flight diffraction (TOFD) technology is widely used in weld defect inspection, and the use of TOFD images for automatic defect recognition is attracting interest from enterprises. However, existing artificial intelligence (AI)-based object detection methods have limitations: they use only a single image, their ability to extract defect features is weak, and they are easily affected by interference fringes. These limitations result in high miss and false detection rates. In this study, a weld defect object detection method based on a multi-image feature fusion model (MIFFM) is proposed. The original TOFD input image was preprocessed using adaptive Gaussian filtering (AGF) and anisotropic filtering (specifically, Perona-Malik (P-M) filtering) to remove noise and enhance the defect features in the image. Subsequently, the AGF-preprocessed, P-M-filtered, and original TOFD images were stacked and used as the input. To enhance detection performance, a you-only-look-once X (YOLOX) bidirectional feature fusion network combined with a non-monotonic dynamic SCYLLA intersection over union (SIoU) loss function was constructed. Finally, the feasibility of the proposed method was verified using TOFD images of a large pressure spherical tank weld. The proposed method achieved a mean average precision (mAP) of 78.05%, which is 4.59% higher than YOLOX. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
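The multi-image input in entry 44, the original TOFD image stacked with a Gaussian-filtered and a Perona-Malik-filtered copy, can be reproduced with a short preprocessing routine. The sketch below uses a plain Gaussian blur and a basic P-M diffusion loop; the adaptive variant of the Gaussian filter and the exact diffusion parameters used in the paper are not reproduced.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def perona_malik(img: np.ndarray, n_iter: int = 20, kappa: float = 30.0,
                 gamma: float = 0.2) -> np.ndarray:
    """Basic Perona-Malik anisotropic diffusion (exponential conductance).

    Boundaries wrap around via np.roll, which is adequate for a sketch.
    """
    u = img.astype(np.float64).copy()
    for _ in range(n_iter):
        # Finite differences toward the four neighbours.
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Edge-stopping conductance: diffuse less across strong gradients.
        cn, cs = np.exp(-(dn / kappa) ** 2), np.exp(-(ds / kappa) ** 2)
        ce, cw = np.exp(-(de / kappa) ** 2), np.exp(-(dw / kappa) ** 2)
        u += gamma * (cn * dn + cs * ds + ce * de + cw * dw)
    return u

def build_stacked_input(tofd: np.ndarray) -> np.ndarray:
    """Stack original, Gaussian-filtered, and P-M-filtered images as a 3-channel input."""
    smooth = gaussian_filter(tofd.astype(np.float64), sigma=1.5)
    diffused = perona_malik(tofd)
    return np.stack([tofd.astype(np.float64), smooth, diffused], axis=0)  # (3, H, W)

stacked = build_stacked_input(np.random.rand(256, 512))
print(stacked.shape)  # (3, 256, 512)
```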
45. A renewed adversarial network for bearing fault diagnosis based on vibro-acoustic signals under speed fluctuating conditions.
- Author
-
Xing, Shuo, Wang, Jinrui, Han, Baokun, Zhang, Zongzhen, Ma, Hao, Jiang, Xingwang, Ma, Junqing, Yao, Shunxiang, Yang, Zujie, and Bao, Huaiqian
- Subjects
- *
FAULT diagnosis , *DIAGNOSIS methods , *SPEED , *SIGNALS & signaling - Abstract
The large discrepancy in sample distributions caused by speed fluctuation poses a great challenge to mechanical equipment health monitoring. Existing fault diagnosis methods are also often limited by the acquisition mechanism of single-modal measurement. Considering these problems, a multidimensional-feature dynamically adjusted adaptive network (MFDAAN) that fuses vibro-acoustic modal signals is proposed in this paper. The MFDAAN uses the funnel activation (FReLU) function to activate the vibro-acoustic signal features while taking the contextual information of the activation features into account. To obtain fusion features, the multidimensional features of the vibro-acoustic signals are dynamically adjusted at different stages by channel attention mechanisms, which take global information into account. The Wasserstein distance is employed in the domain-adversarial training strategy to improve the extraction of domain-invariant features. The effectiveness of the MFDAAN is verified by cross-domain fault diagnosis experiments in two different scenarios. The results show that the MFDAAN achieves good diagnostic performance on the cross-domain fault diagnosis tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
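The dynamic channel re-weighting of the fused vibro-acoustic features in entry 45 can be illustrated with a squeeze-and-excitation style block over the concatenated channels. This is a generic sketch of that mechanism, not the MFDAAN architecture; the channel counts are arbitrary and the Wasserstein adversarial training is omitted.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Concatenate vibration and acoustic feature maps, then re-weight channels (SE-style)."""
    def __init__(self, vib_ch: int, ac_ch: int, reduction: int = 8):
        super().__init__()
        ch = vib_ch + ac_ch
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),                  # squeeze over the time axis
            nn.Conv1d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv1d(ch // reduction, ch, 1), nn.Sigmoid(),
        )

    def forward(self, vib: torch.Tensor, ac: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([vib, ac], dim=1)           # (B, vib_ch + ac_ch, T)
        return fused * self.attn(fused)               # channel-wise re-weighting

fuse = ChannelAttentionFusion(vib_ch=32, ac_ch=32)
out = fuse(torch.randn(4, 32, 1024), torch.randn(4, 32, 1024))
print(out.shape)  # torch.Size([4, 64, 1024])
```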
46. Deep learning and feature fusion-based lung sound recognition model to diagnoses the respiratory diseases.
- Author
-
Shehab, Sara A., Mohammed, Kamel K., Darwish, Ashraf, and Hassanien, Aboul Ella
- Subjects
- *
LUNG diseases , *MEDICAL personnel , *RESPIRATORY diseases , *IMAGE representation , *SPECTROGRAMS , *LUNGS , *DEEP learning - Abstract
This paper proposes a novel approach for detecting lung sound disorders using deep learning feature fusion. The lung sound dataset is oversampled and converted into spectrogram images. Deep features are then extracted from CNN architectures pre-trained on large-scale image datasets. These deep features capture rich representations of the spectrogram images derived from the input signals, allowing for a comprehensive analysis of lung disorders. Next, a fusion technique is employed to combine the features extracted from multiple CNN architectures, totaling 8,064 features. This fusion process enhances the discriminative power of the features, facilitating more accurate and robust detection of lung disorders. To further improve detection performance, an improved CNN architecture is employed. To evaluate the effectiveness of the proposed approach, experiments were conducted on a large dataset of lung disorder signals. The results demonstrate that deep feature fusion from different CNN architectures, combined with the improved CNN layers, achieves superior performance in lung disorder detection. Compared to individual CNN architectures, the proposed approach achieves higher accuracy, sensitivity, and specificity, effectively reducing false negatives and false positives. The proposed model achieves 96.03% accuracy, 96.53% sensitivity, 99.424% specificity, 96.52% precision, and a 96.50% F1 score when predicting lung diseases from sound files. This approach has the potential to assist healthcare professionals in the early detection and diagnosis of lung disorders, ultimately leading to improved patient outcomes and enhanced healthcare practices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
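The fusion stage in entry 46, concatenating deep features extracted from several ImageNet-pretrained CNN backbones applied to the same spectrogram, can be sketched with torchvision. Only two backbones are shown for brevity (the paper fuses enough to reach 8,064 features), and the pretrained weights are downloaded on first use.

```python
import torch
import torch.nn as nn
from torchvision import models

class SpectrogramFeatureFusion(nn.Module):
    """Extract and concatenate pooled features from two pre-trained CNN backbones."""
    def __init__(self):
        super().__init__()
        self.resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        self.resnet.fc = nn.Identity()                # 512-d pooled features
        self.densenet = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
        self.densenet.classifier = nn.Identity()      # 1024-d pooled features

    @torch.no_grad()
    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        f1 = self.resnet(spectrogram)
        f2 = self.densenet(spectrogram)
        return torch.cat([f1, f2], dim=1)             # (B, 1536) fused feature vector

extractor = SpectrogramFeatureFusion().eval()
features = extractor(torch.randn(2, 3, 224, 224))     # spectrograms resized to 224x224 RGB
print(features.shape)  # torch.Size([2, 1536])
```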
47. OCTNet: A Modified Multi-Scale Attention Feature Fusion Network with InceptionV3 for Retinal OCT Image Classification.
- Author
-
Khalil, Irshad, Mehmood, Asif, Kim, Hyunchul, and Kim, Jungsuk
- Subjects
- *
IMAGE recognition (Computer vision) , *OPTICAL coherence tomography , *NOSOLOGY , *MACULAR edema , *FEATURE extraction , *DEEP learning - Abstract
Classification and identification of eye diseases using Optical Coherence Tomography (OCT) have been challenging tasks and a trending research area in recent years. Accurate classification and detection of different diseases are crucial for effective care management and improving vision outcomes. Current detection methods fall into two main categories: traditional methods and deep learning-based approaches. Traditional approaches rely on machine learning for feature extraction, while deep learning methods utilize data-driven classification model training. In recent years, Deep Learning (DL) and Machine Learning (ML) algorithms have become essential tools, particularly in medical image classification, and are widely used to classify and identify various diseases. However, due to the high spatial similarities in OCT images, accurate classification remains a challenging task. In this paper, we introduce a novel model called "OCTNet" that combines InceptionV3 with a modified multi-scale spatial attention block to enhance model performance. OCTNet employs an InceptionV3 backbone with a fusion of dual attention modules to construct the proposed architecture. The InceptionV3 model generates rich features from images, capturing both local and global aspects, which are then enhanced by the modified multi-scale spatial attention block, resulting in a significantly improved feature map. To evaluate the model's performance, we utilized two state-of-the-art (SOTA) datasets that include images of normal cases, Choroidal Neovascularization (CNV), Drusen, and Diabetic Macular Edema (DME). Through experimentation and simulation, the proposed OCTNet improves the classification accuracy of the InceptionV3 model by 1.3%, yielding higher accuracy than other SOTA models. We also performed an ablation study to demonstrate the effectiveness of the proposed method. The model achieved overall average accuracies of 99.50% and 99.65% on two different OCT datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
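The modified multi-scale spatial attention in entry 47 can be approximated by computing spatial attention maps at several receptive-field sizes from channel-pooled statistics and averaging them. The module below is a generic multi-scale variant of CBAM-style spatial attention, offered as an illustrative sketch rather than OCTNet's exact block.

```python
import torch
import torch.nn as nn

class MultiScaleSpatialAttention(nn.Module):
    """Spatial attention computed at several kernel sizes and averaged."""
    def __init__(self, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(2, 1, k, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel-pooled descriptors: average and max over the channel axis.
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)    # (B, 2, H, W)
        maps = torch.stack([branch(pooled) for branch in self.branches], dim=0)
        attn = torch.sigmoid(maps.mean(dim=0))                       # (B, 1, H, W)
        return x * attn

attn = MultiScaleSpatialAttention()
out = attn(torch.randn(1, 768, 17, 17))   # e.g. an InceptionV3 intermediate feature map
print(out.shape)  # torch.Size([1, 768, 17, 17])
```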
48. An Improved YOLOv8-Based Foreign Detection Algorithm for Transmission Lines.
- Author
-
Duan, Pingting and Liang, Xiao
- Subjects
- *
OBJECT recognition (Computer vision) , *ELECTRIC lines , *FOREIGN bodies , *SPACE perception , *DATA augmentation - Abstract
This research aims to overcome three major challenges in foreign object detection on power transmission lines: data scarcity, background noise, and high computational costs. In the improved YOLOv8 algorithm, the newly introduced lightweight GSCDown (Ghost Shuffle Channel Downsampling) module effectively captures subtle image features by combining 1 × 1 convolution and GSConv technology, thereby enhancing detection accuracy. CSPBlock (Cross-Stage Partial Block) fusion enhances the model's accuracy and stability by strengthening feature expression and spatial perception while maintaining the algorithm's lightweight nature and effectively mitigating the issue of vanishing gradients, making it suitable for efficient foreign object detection in complex power line environments. Additionally, PAM (pooling attention mechanism) effectively distinguishes between background and target without adding extra parameters, maintaining high accuracy even in the presence of background noise. Furthermore, AIGC (AI-generated content) technology is leveraged to produce high-quality images for training data augmentation, and lossless feature distillation ensures higher detection accuracy and reduces false positives. In conclusion, the improved architecture reduces the parameter count by 18% while improving the mAP@0.5 metric by a margin of 5.5 points when compared to YOLOv8n. Compared to state-of-the-art real-time object detection frameworks, our research demonstrates significant advantages in both model accuracy and parameter size. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. VQGNet: An Unsupervised Defect Detection Approach for Complex Textured Steel Surfaces.
- Author
-
Yu, Ronghao, Liu, Yun, Yang, Rui, and Wu, Yingna
- Subjects
- *
SURFACE texture , *GRAYSCALE model , *STEEL , *GENERALIZATION , *CLASSIFICATION - Abstract
Defect detection on steel surfaces with complex textures is a critical and challenging task in the industry. The limited number of defect samples and the complexity of the annotation process pose significant challenges. Moreover, performing defect segmentation based on accurate identification further increases the task's difficulty. To address this issue, we propose VQGNet, an unsupervised algorithm that can precisely recognize and segment defects simultaneously. A feature fusion method based on aggregated attention and a classification-aided module is proposed to segment defects by integrating different features in the original images and the anomaly maps, which direct the attention to the anomalous information instead of the irregular complex texture. The anomaly maps are generated more confidently using strategies for multi-scale feature fusion and neighbor feature aggregation. Moreover, an anomaly generation method suitable for grayscale images is introduced to facilitate the model's learning on the anomalous samples. The refined anomaly maps and fused features are both input into the classification-aided module for the final classification and segmentation. VQGNet achieves state-of-the-art (SOTA) performance on the industrial steel dataset, with an I-AUROC of 99.6%, I-F1 of 98.8%, P-AUROC of 97.0%, and P-F1 of 80.3%. Additionally, ViT-Query demonstrates robust generalization capabilities in generating anomaly maps based on the Kolektor Surface-Defect Dataset 2. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Detection of Thymoma Disease Using mRMR Feature Selection and Transformer Models.
- Author
-
Agar, Mehmet, Aydin, Siyami, Cakmak, Muharrem, Koc, Mustafa, and Togacar, Mesut
- Subjects
- *
TRANSFORMER models , *FEATURE selection , *THYMUS , *DEEP learning , *EPSTEIN-Barr virus , *THYMOMA - Abstract
Background: Thymoma is a tumor that originates in the thymus gland, an organ located behind the breastbone. It is a malignant disease that is rare in children but more common in adults and usually does not spread outside the thymus. The exact cause of thymoma is not known, but it is thought to be more common in people infected with the Epstein-Barr virus (EBV) at an early age. Various surgical methods are used in clinical settings to treat thymoma, and expert opinion is very important in the diagnosis of the disease. Recently, next-generation technologies have become increasingly important in disease detection, and today's early detection systems already use transformer models that are open to technological advances. Methods: What makes this study different is the use of transformer models instead of traditional deep learning models. The data used in this study were obtained from patients undergoing treatment at Fırat University, Department of Thoracic Surgery. The dataset consisted of two classes: thymoma disease images and non-thymoma disease images. The proposed approach consists of preprocessing, model training, feature extraction, feature set fusion between models, efficient feature selection, and classification. In the preprocessing step, unnecessary regions of the images were cropped and the region of interest (ROI) technique was applied. Four types of transformer models (Deit3, Maxvit, Swin, and ViT) were used for model training. After training, the feature sets obtained from the best three models were merged between the models (Deit3 and Swin, Deit3 and ViT, Swin and ViT, and Deit3 and Swin and ViT). The combined feature set of the model pair (Deit3 and ViT) that gave the best performance with fewer features was analyzed using the mRMR feature selection method. The SVM method was used in the classification process. Results: With the mRMR feature selection method, 100% overall accuracy was achieved with feature sets containing fewer features. The cross-validation technique was used to verify the overall accuracy of the proposed approach, and 99.22% overall accuracy was achieved in this analysis. Conclusions: These findings emphasize the added value of the proposed approach in the detection of thymoma. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
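The tail of the pipeline in entry 50, fusing the feature sets of two transformer backbones, selecting a compact subset, and classifying with an SVM, can be prototyped with scikit-learn. mRMR itself is not part of scikit-learn, so the sketch substitutes a plain mutual-information ranking (SelectKBest) as a stand-in for the relevance half of mRMR; the feature matrices are synthetic placeholders for the Deit3 and ViT embeddings.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(7)
n = 200
deit3_feats = rng.normal(size=(n, 384))       # placeholder for Deit3 embeddings
vit_feats = rng.normal(size=(n, 768))         # placeholder for ViT embeddings
y = rng.integers(0, 2, size=n)                # thymoma vs. non-thymoma labels (synthetic)

# Feature-set fusion between the two models: simple concatenation.
X = np.hstack([deit3_feats, vit_feats])

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=100),  # stand-in for mRMR feature selection
    SVC(kernel="rbf", C=1.0),
)
scores = cross_val_score(clf, X, y, cv=5)
print("5-fold accuracy:", scores.mean().round(3))
```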