541 results for "Feature aggregation"
Search Results
2. VLAD-BuFF: Burst-Aware Fast Feature Aggregation for Visual Place Recognition
- Author
-
Khaliq, Ahmad, Xu, Ming, Hausler, Stephen, Milford, Michael, and Garg, Sourav
- Published
- 2025
- Full Text
- View/download PDF
3. LDAGM: prediction lncRNA-disease associations by graph convolutional auto-encoder and multilayer perceptron based on multi-view heterogeneous networks.
- Author
-
Zhang, Bing, Wang, Haoyu, Ma, Chao, Huang, Hai, Fang, Zhou, and Qu, Jiaxing
- Subjects
- FEATURE extraction, LINCRNA, ASSOCIATION rule mining, MICRORNA, INFORMATION resources management
- Abstract
Background: Long non-coding RNAs (lncRNAs) can prevent, diagnose, and treat a variety of complex human diseases, and it is crucial to establish a method to efficiently predict lncRNA-disease associations. Results: In this paper, we propose a prediction method for the lncRNA-disease association relationship, named LDAGM, which is based on the Graph Convolutional Autoencoder and Multilayer Perceptron model. The method first extracts the functional similarity and Gaussian interaction profile kernel similarity of lncRNAs and miRNAs, as well as the semantic similarity and Gaussian interaction profile kernel similarity of diseases. It then constructs six homogeneous networks and deeply fuses them using a deep topology feature extraction method. The fused networks facilitate feature complementation and deep mining of the original association relationships, capturing the deep connections between nodes. Next, by combining the obtained deep topological features with the similarity network of lncRNA, disease, and miRNA interactions, we construct a multi-view heterogeneous network model. The Graph Convolutional Autoencoder is employed for nonlinear feature extraction. Finally, the extracted nonlinear features are combined with the deep topological features of the multi-view heterogeneous network to obtain the final feature representation of the lncRNA-disease pair. Prediction of the lncRNA-disease association relationship is performed using the Multilayer Perceptron model. To enhance the performance and stability of the Multilayer Perceptron model, we introduce a hidden layer called the aggregation layer in the Multilayer Perceptron model. Through a gate mechanism, it controls the flow of information between each hidden layer in the Multilayer Perceptron model, aiming to achieve optimal feature extraction from each hidden layer. 
Conclusions: Parameter analysis, ablation studies, and comparison experiments verified the effectiveness of this method, and case studies verified the accuracy of this method in predicting lncRNA-disease association relationships. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
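The gated aggregation layer described in the LDAGM abstract, where a gate controls how much information flows between hidden layers of the multilayer perceptron, can be illustrated with a toy sketch. This is not the authors' implementation; the function name `gated_aggregate` and the scalar per-dimension gate parameters are assumptions chosen for clarity:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_aggregate(h_prev, h_curr, w_gate, b_gate):
    """Blend the previous hidden layer's output with the current one.

    A per-dimension gate g decides how much of each representation
    flows forward: out = g * h_curr + (1 - g) * h_prev.
    """
    out = []
    for hp, hc, w, b in zip(h_prev, h_curr, w_gate, b_gate):
        g = sigmoid(w * hc + b)  # gate computed from the current layer's value
        out.append(g * hc + (1.0 - g) * hp)
    return out

# With a strongly positive gate bias, the gate saturates at 1 and the
# output follows the current layer's representation.
blended = gated_aggregate([0.0, 0.0], [1.0, 2.0], [0.0, 0.0], [100.0, 100.0])
```

In a trained model the gate weights would be learned, letting each dimension interpolate between re-using earlier features and adopting new ones.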
4. Few-shot fine-grained recognition in remote sensing ship images with global and local feature aggregation.
- Author
-
Zhou, Guoqing, Huang, Liang, and Zhang, Xianfeng
- Subjects
- REMOTE sensing, FISHERY management, GENERALIZATION, SHIPS
- Abstract
Remote sensing ship image detection methods have broad application prospects in areas such as maritime traffic and fisheries management. However, previous detection methods rely heavily on large amounts of accurately annotated training data, and their performance is unsatisfactory when annotated ship targets are scarce. To address this issue, this paper proposes a few-shot detection method based on global and local feature aggregation. Specifically, we aggregate the query image's global and local features with support features, encouraging the model to learn invariant features under varying global feature conditions and enhancing its performance in training and inference. Building upon this, we propose combined feature aggregation, where query features are aggregated with all support features in the same batch, further reducing the confusion of target features caused by the imbalance between base-class and novel-class samples and improving the model's learning effectiveness for novel classes. Additionally, we employ an adversarial autoencoder to reconstruct support features, enhancing the model's generalization performance. Finally, the model underwent extensive experiments on the publicly available remote sensing ship dataset HRSC-2016. The results indicate that, compared to the baseline model, our model achieves new state-of-the-art performance under various dataset settings. The model presented in this paper provides new insights for few-shot detection work based on meta-learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Multi-scale and multi-path cascaded convolutional network for semantic segmentation of colorectal polyps.
- Author
-
Manan, Malik Abdul, Feng, Jinchao, Yaqub, Muhammad, Ahmed, Shahzad, Imran, Syed Muhammad Ali, Chuhan, Imran Shabir, and Khan, Haroon Ahmed
- Subjects
- COLON polyps, CASCADE connections, COLORECTAL cancer, GASTROINTESTINAL system, CONFIDENCE intervals, ADENOMATOUS polyps
- Abstract
Colorectal polyps are structural abnormalities of the gastrointestinal tract that can potentially become cancerous in some cases. The study introduces a novel framework for colorectal polyp segmentation named the Multi-Scale and Multi-Path Cascaded Convolution Network (MMCC-Net), aimed at addressing the limitations of existing models, such as inadequate spatial dependence representation and the absence of multi-level feature integration during the decoding stage. The framework integrates multi-scale and multi-path cascaded convolutional techniques and enhances feature aggregation through dual attention modules, skip connections, and a feature enhancer, achieving superior performance in identifying polyp areas at the pixel level. The proposed MMCC-Net was tested across six public datasets and compared against eight SOTA models to demonstrate its efficiency in polyp segmentation. MMCC-Net achieves Dice scores (with confidence intervals) ranging from 77.43 ± 0.12 (77.08, 77.56) to 94.45 ± 0.12 (94.19, 94.71) and Mean Intersection over Union (MIoU) scores ranging from 72.71 ± 0.19 (72.20, 73.00) to 90.16 ± 0.16 (89.69, 90.53) on the six databases. These results highlight the model's potential as a powerful tool for accurate and efficient polyp segmentation, contributing to early detection and prevention strategies in colorectal cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
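The Dice and MIoU metrics reported in the abstract above have simple closed forms for binary masks. A minimal sketch, assuming masks are given as flat 0/1 lists (not the authors' evaluation code):

```python
def dice_score(pred, target):
    """Dice coefficient for binary masks as flat 0/1 lists:
    2 * |A ∩ B| / (|A| + |B|)."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total

def iou_score(pred, target):
    """Intersection over union for the same mask encoding."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return 1.0 if union == 0 else inter / union

pred   = [1, 1, 0, 0]
target = [1, 0, 0, 0]
d = dice_score(pred, target)  # 2*1 / (2+1)
j = iou_score(pred, target)   # 1 / 2
```

Dice weights the overlap twice relative to the mask sizes, so it is always at least as large as IoU on the same prediction, which is why papers often report both.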
6. Robustness study of speaker recognition based on ECAPA-TDNN-CIFG.
- Author
-
Wang, Chunli, Xu, Linming, Zhu, Hongxin, and Cheng, Xiaoyang
- Subjects
- DELAY lines, FEATURE extraction, GENERALIZATION, ALGORITHMS
- Abstract
This paper describes a study on speaker recognition using the ECAPA-TDNN architecture (Emphasized Channel Attention, Propagation and Aggregation in Time-Delay Neural Network). It utilizes x-vectors, a method for extracting speaker features by converting speech into fixed-length vectors, and introduces a squeeze-and-excitation block to model dependencies between channels. To better explore temporal relationships in speaker recognition and improve the algorithm's generalization in complex acoustic scenarios, this study adds input gates and forget gates to the ECAPA-TDNN architecture, combining them with CIFG (Convolutional LSTM with Input and Forget Gates) modules. These are embedded into a residual structure of multi-layer aggregated features. A sub-center ArcFace, an improved loss function based on ArcFace, is used to select sub-centers for subclass discrimination, retaining advantageous sub-centers to enhance intra-class compactness and strengthen the robustness of the network. Experimental results demonstrate that the improved ECAPA-TDNN-CIFG outperforms the baseline model, yielding more accurate and efficient recognition results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
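The input and forget gates that this abstract adds to ECAPA-TDNN can be illustrated with a minimal scalar cell update. This is a toy sketch with independent scalar gates, not the paper's convolutional LSTM formulation; all parameter names are assumptions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_cell_step(c_prev, x, w_f, w_i, w_c, b_f, b_i, b_c):
    """One scalar cell-state update with explicit input and forget gates:
    c = f * c_prev + i * tanh(w_c * x + b_c)."""
    f = sigmoid(w_f * x + b_f)      # forget gate: how much old state survives
    i = sigmoid(w_i * x + b_i)      # input gate: how much new candidate enters
    cand = math.tanh(w_c * x + b_c) # candidate state from the current input
    return f * c_prev + i * cand

# With the forget gate saturated open and the input gate shut,
# the cell state is carried through essentially unchanged.
c = gated_cell_step(c_prev=0.5, x=1.0, w_f=0.0, w_i=0.0, w_c=1.0,
                    b_f=100.0, b_i=-100.0, b_c=0.0)
```

The gates are what let the recurrent path choose between preserving long-range temporal context and overwriting it, which is the behavior the abstract credits for improved robustness.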
7. Boundary enhancement and refinement network for camouflaged object detection.
- Author
-
Xia, Chenxing, Cao, Huizhen, Gao, Xiuju, Ge, Bin, Li, Kuan-Ching, Fang, Xianjin, Zhang, Yan, and Liang, Xingzhu
- Abstract
Camouflaged object detection aims to accurately locate and segment objects that conceal themselves well in the environment. Despite advancements in deep learning methods, prevalent issues persist, including coarse boundary identification in complex scenes and ineffective integration of multi-source features. To this end, we propose a novel boundary enhancement and refinement network named BERNet, which mainly consists of three modules for enhancing and refining boundary information: an asymmetric edge module (AEM) with a multi-group dilated convolution block (GDCB), a residual mixed pooling enhanced module (RPEM), and a multivariate information interaction refiner module (M2IRM). AEM with GDCB is designed to obtain rich boundary clues, using different dilation rates to expand the receptive field. RPEM enhances boundary features under the guidance of boundary cues to improve the detection accuracy of small and multiple camouflaged objects. M2IRM refines the side-out prediction maps progressively under the supervision of the ground truth through the fusion of multi-source information. Comprehensive experiments on three benchmark datasets demonstrate the effectiveness of our BERNet against competitive state-of-the-art methods under most evaluation metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
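The dilation-rate idea behind GDCB, spacing out kernel taps to enlarge the receptive field without adding parameters, can be shown with a 1-D toy convolution. The helper `dilated_conv1d` is a hypothetical sketch, not the paper's 2-D module:

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D convolution with gaps of `dilation` between kernel taps
    ('same' length, zero padding at the borders). A kernel of size k
    covers a receptive field of (k - 1) * dilation + 1 samples."""
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(k):
            idx = i + (j - k // 2) * dilation  # tap placed `dilation` apart
            if 0 <= idx < len(signal):
                acc += signal[idx] * kernel[j]
        out.append(acc)
    return out

sig = [0, 0, 0, 1, 0, 0, 0]
# A 3-tap kernel with dilation 2 responds to the impulse from
# positions two samples away on either side, not just its neighbors.
resp = dilated_conv1d(sig, [1.0, 1.0, 1.0], dilation=2)
```

Running several such branches with different dilation rates in parallel, as the abstract describes, gathers boundary evidence at multiple spatial extents before fusion.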
8. LDAGM: prediction lncRNA-disease associations by graph convolutional auto-encoder and multilayer perceptron based on multi-view heterogeneous networks
- Author
-
Bing Zhang, Haoyu Wang, Chao Ma, Hai Huang, Zhou Fang, and Jiaxing Qu
- Subjects
- LncRNA-disease associations, Graph convolutional auto-encoder, Multilayer perceptron, Deep topological feature extraction, Feature aggregation, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Long non-coding RNAs (lncRNAs) can prevent, diagnose, and treat a variety of complex human diseases, and it is crucial to establish a method to efficiently predict lncRNA-disease associations. Results: In this paper, we propose a prediction method for the lncRNA-disease association relationship, named LDAGM, which is based on the Graph Convolutional Autoencoder and Multilayer Perceptron model. The method first extracts the functional similarity and Gaussian interaction profile kernel similarity of lncRNAs and miRNAs, as well as the semantic similarity and Gaussian interaction profile kernel similarity of diseases. It then constructs six homogeneous networks and deeply fuses them using a deep topology feature extraction method. The fused networks facilitate feature complementation and deep mining of the original association relationships, capturing the deep connections between nodes. Next, by combining the obtained deep topological features with the similarity network of lncRNA, disease, and miRNA interactions, we construct a multi-view heterogeneous network model. The Graph Convolutional Autoencoder is employed for nonlinear feature extraction. Finally, the extracted nonlinear features are combined with the deep topological features of the multi-view heterogeneous network to obtain the final feature representation of the lncRNA-disease pair. Prediction of the lncRNA-disease association relationship is performed using the Multilayer Perceptron model. To enhance the performance and stability of the Multilayer Perceptron model, we introduce a hidden layer called the aggregation layer in the Multilayer Perceptron model. Through a gate mechanism, it controls the flow of information between each hidden layer in the Multilayer Perceptron model, aiming to achieve optimal feature extraction from each hidden layer.
Conclusions: Parameter analysis, ablation studies, and comparison experiments verified the effectiveness of this method, and case studies verified the accuracy of this method in predicting lncRNA-disease association relationships.
- Published
- 2024
- Full Text
- View/download PDF
9. Multi-scale and multi-path cascaded convolutional network for semantic segmentation of colorectal polyps
- Author
-
Malik Abdul Manan, Jinchao Feng, Muhammad Yaqub, Shahzad Ahmed, Syed Muhammad Ali Imran, Imran Shabir Chuhan, and Haroon Ahmed Khan
- Subjects
- Colorectal polyp, Semantic segmentation, Cascaded convolution network, Feature aggregation, Attention modules, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
Colorectal polyps are structural abnormalities of the gastrointestinal tract that can potentially become cancerous in some cases. The study introduces a novel framework for colorectal polyp segmentation named the Multi-Scale and Multi-Path Cascaded Convolution Network (MMCC-Net), aimed at addressing the limitations of existing models, such as inadequate spatial dependence representation and the absence of multi-level feature integration during the decoding stage. The framework integrates multi-scale and multi-path cascaded convolutional techniques and enhances feature aggregation through dual attention modules, skip connections, and a feature enhancer, achieving superior performance in identifying polyp areas at the pixel level. The proposed MMCC-Net was tested across six public datasets and compared against eight SOTA models to demonstrate its efficiency in polyp segmentation. MMCC-Net achieves Dice scores (with confidence intervals) ranging from 77.43 ± 0.12 (77.08, 77.56) to 94.45 ± 0.12 (94.19, 94.71) and Mean Intersection over Union (MIoU) scores ranging from 72.71 ± 0.19 (72.20, 73.00) to 90.16 ± 0.16 (89.69, 90.53) on the six databases. These results highlight the model's potential as a powerful tool for accurate and efficient polyp segmentation, contributing to early detection and prevention strategies in colorectal cancer.
- Published
- 2024
- Full Text
- View/download PDF
10. GaitSTAR: Spatial–Temporal Attention-Based Feature-Reweighting Architecture for Human Gait Recognition.
- Author
-
Bilal, Muhammad, Jianbiao, He, Mushtaq, Husnain, Asim, Muhammad, Ali, Gauhar, and ElAffendi, Mohammed
- Subjects
- COMPUTER vision, DEEP learning, FEATURE extraction, BIOMETRIC identification, DISCRIMINANT analysis, GAIT in humans
- Abstract
Human gait recognition (HGR) leverages unique gait patterns to identify individuals, but the effectiveness of this technique can be hindered due to various factors such as carrying conditions, foot shadows, clothing variations, and changes in viewing angles. Traditional silhouette-based systems often neglect the critical role of instantaneous gait motion, which is essential for distinguishing individuals with similar features. We introduce the "Enhanced Gait Feature Extraction Framework (GaitSTAR)", a novel method that incorporates dynamic feature weighting through the discriminant analysis of temporal and spatial features within a channel-wise architecture. Key innovations in GaitSTAR include dynamic stride flow representation (DSFR) to address silhouette distortion, a transformer-based feature set transformation (FST) for integrating image-level features into set-level features, and dynamic feature reweighting (DFR) for capturing long-range interactions. DFR enhances contextual understanding and improves detection accuracy by computing attention distributions across channel dimensions. Empirical evaluations show that GaitSTAR achieves impressive accuracies of 98.5%, 98.0%, and 92.7% under NM, BG, and CL conditions, respectively, with the CASIA-B dataset; 67.3% with the CASIA-C dataset; and 54.21% with the Gait3D dataset. Despite its complexity, GaitSTAR demonstrates a favorable balance between accuracy and computational efficiency, making it a powerful tool for biometric identification based on gait patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
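The channel-wise reweighting that GaitSTAR's DFR performs, computing attention distributions across channel dimensions, can be sketched as a softmax over per-channel scores. `reweight_channels` is an illustrative toy, not the GaitSTAR code; the scores would normally be learned:

```python
import math

def softmax(xs):
    m = max(xs)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def reweight_channels(features, scores):
    """Scale each channel's feature map by a softmax attention weight,
    so informative channels dominate the aggregated representation."""
    weights = softmax(scores)
    return [[w * v for v in channel] for w, channel in zip(weights, features)]

feats  = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # 3 channels, 2 values each
scores = [0.0, 0.0, 0.0]                        # equal scores -> equal weights
out = reweight_channels(feats, scores)
```

With unequal scores the softmax would concentrate weight on a few channels, which is how attention suppresses uninformative features during aggregation.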
11. Drfnet: dual stream recurrent feature sharing network for video dehazing.
- Author
-
Galshetwar, Vijay M., Saini, Poonam, and Chaudhary, Sachin
- Abstract
The primary effects of haze on captured images and frames are visibility degradation and color disturbance. Although extensive research has been done on video dehazing, existing methods fail to perform well on varicolored hazy videos; varicolored haze remains a challenging problem in video dehazing. To tackle it, contextual information alone is not sufficient: in addition to adequate contextual information, color balancing is required to restore varicolored hazy images and videos. Therefore, this paper proposes a novel lightweight dual stream recurrent feature sharing network (with only 1.77 M parameters) for video dehazing. The proposed framework involves: (1) a color balancing module to balance the color of the input hazy frame in YCbCr space; (2) a multi-receptive multi-resolution module (MMM), which interlinks RGB- and YCbCr-based features to learn global and rich contextual data; (3) a feature aggregation residual module (FARM) to strengthen representative capability during reconstruction; and (4) a channel attention module to suppress redundant features by recalibrating the weights of input features. Experimental results and an ablation study show that the proposed model is superior to existing state-of-the-art approaches for video dehazing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Gaze Target Detection Network Based on Attention Mechanism and Depth Prior.
- Author
-
ZHU Yun, ZHU Dongchen, ZHANG Guanghui, SUN Yanzan, and ZHANG Xiaolin
- Subjects
- GAZE, NONVERBAL cues, COMPUTER vision, DATA mining, ATTENTION, INTENTION
- Abstract
Human gaze behavior, as a non-verbal cue, plays a crucial role in revealing human intentions. Gaze target detection has attracted extensive attention from the machine vision community. However, existing gaze target detection methods usually focus on extracting texture information from images, ignoring the importance of stereo depth information for gaze target detection, which makes it difficult to deal with scenes with complex textures. In this work, a novel gaze target detection network based on an attention mechanism and a depth prior is proposed, which adopts a two-stage architecture (a gaze direction prediction stage and a saliency detection stage). In the gaze direction prediction stage, a channel-spatial attention module recalibrates texture features, and a head position encoding branch is designed to achieve texture- and head-position-aware, highly representative features for accurate gaze prediction. Furthermore, a strategy is proposed to introduce depth, representing stereoscopic or distance information in the 3D scene, as a prior into the saliency detection stage. At the same time, the channel-spatial attention mechanism is used to enhance multi-scale texture features, and the advantages of depth geometric information and image texture information are fully exploited to improve the accuracy of gaze target detection. Experimental results show that the proposed model performs favorably against state-of-the-art methods on the GazeFollow and DLGaze datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Cascaded Aggregation Convolution Network for Salient Grain Pests Detection.
- Author
-
Yu, Junwei, Chen, Shihao, Liu, Nan, Zhai, Fupin, and Pan, Quan
- Subjects
- OBJECT recognition (Computer vision), PEST control, GRAIN storage, STORAGE facilities, FOOD security
- Abstract
Simple Summary: Infestations of pests in grain storage can have a significant impact on both the quantity and quality of stored grains. Drawing inspiration from the detection abilities of humans and birds in identifying pests, we present an innovative deep learning solution designed for the detection and management of pests in stored grains. Specifically focusing on the detection of small grain pests within cluttered backgrounds, we propose a cascaded feature aggregation convolution network. Our approach outperforms existing models in terms of both trainable parameters and detection accuracy, as evidenced by experiments conducted on our newly introduced GrainPest dataset as well as publicly available datasets. By sharing our dataset and refining our model's architecture, we aim to advance the field of research in grain pest detection and the classification of stored grains based on pest density. This study is expected to contribute to the reduction of economic losses caused by storage pests and to enhance food security measures. Pest infestation poses significant threats to grain storage due to pests' behaviors of feeding, respiration, excretion, and reproduction. Efficient pest detection and control are essential to mitigate these risks. However, accurate detection of small grain pests remains challenging due to their small size, high variability, low contrast, and cluttered background. Salient pest detection focuses on the visual features that stand out, improving the accuracy of pest identification in complex environments. Drawing inspiration from the rapid pest recognition abilities of humans and birds, we propose a novel Cascaded Aggregation Convolution Network (CACNet) for pest detection and control in stored grain. Our approach aims to improve detection accuracy by employing a reverse cascade feature aggregation network that imitates the visual attention mechanism in humans when observing and focusing on objects of interest. 
The CACNet uses VGG16 as the backbone network and incorporates two key operations, namely feature enhancement and feature aggregation. These operations merge the high-level semantic information and low-level positional information of salient objects, enabling accurate segmentation of small-scale grain pests. We have curated the GrainPest dataset, comprising 500 images showcasing zero to five or more pests in grains. Leveraging this dataset and the MSRA-B dataset, we validated our method's efficacy, achieving structure S-measures of 91.9% and 90.9% and weighted F-measures of 76.4% and 91.0%, respectively. Our approach significantly surpasses traditional saliency detection methods and other state-of-the-art salient object detection models based on deep learning. This technology shows great potential for pest detection and for assessing the severity of pest infestation based on pest density in grain storage facilities. It also holds promise for the prevention and control of pests in agriculture and forestry. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Interpretable linear dimensionality reduction based on bias-variance analysis.
- Author
-
Bonetti, Paolo, Metelli, Alberto Maria, and Restelli, Marcello
- Subjects
- CONTINUOUS groups, MACHINE learning, LINEAR statistical models, DESIGN techniques, ALGORITHMS
- Abstract
One of the central issues of several machine learning applications on real data is the choice of the input features. Ideally, the designer should select a small number of the relevant, nonredundant features to preserve the complete information contained in the original dataset, with little collinearity among features. This procedure helps mitigate problems like overfitting and the curse of dimensionality, which arise when dealing with high-dimensional problems. On the other hand, it is not desirable to simply discard some features, since they may still contain information that can be exploited to improve results. Instead, dimensionality reduction techniques are designed to limit the number of features in a dataset by projecting them into a lower dimensional space, possibly considering all the original features. However, the projected features resulting from the application of dimensionality reduction techniques are usually difficult to interpret. In this paper, we seek to design a principled dimensionality reduction approach that maintains the interpretability of the resulting features. Specifically, we propose a bias-variance analysis for linear models and we leverage these theoretical results to design an algorithm, Linear Correlated Features Aggregation (LinCFA), which aggregates groups of continuous features with their average if their correlation is "sufficiently large". In this way, all features are considered, the dimensionality is reduced and the interpretability is preserved. Finally, we provide numerical validations of the proposed algorithm both on synthetic datasets to confirm the theoretical results and on real datasets to show some promising applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
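The LinCFA rule, replacing a group of continuous features by their average when their correlation is "sufficiently large", can be sketched directly for a pair of columns. This is a minimal illustration, not the authors' algorithm; the threshold value below is arbitrary:

```python
def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def aggregate_if_correlated(col_a, col_b, threshold):
    """Replace two feature columns by their element-wise average when
    their correlation exceeds the threshold; otherwise keep both."""
    if pearson(col_a, col_b) >= threshold:
        return [[(a + b) / 2 for a, b in zip(col_a, col_b)]]
    return [col_a, col_b]

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated with a
merged = aggregate_if_correlated(a, b, threshold=0.9)
```

Because the merged column is just an average of the originals, it stays directly interpretable, which is the property the paper's bias-variance analysis is designed to preserve.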
15. Estimation of Fractal Dimension and Segmentation of Brain Tumor with Parallel Features Aggregation Network.
- Author
-
Sultan, Haseeb, Ullah, Nadeem, Hong, Jin Seong, Kim, Seung Gu, Lee, Dong Chan, Jung, Seung Yong, and Park, Kang Ryoung
- Subjects
- FRACTAL dimensions, BRAIN tumors, PETRI nets, DEEP learning, DATABASES, CANCER invasiveness
- Abstract
The accurate recognition of a brain tumor (BT) is crucial for accurate diagnosis, intervention planning, and the evaluation of post-intervention outcomes. Conventional methods of manually identifying and delineating BTs are inefficient, prone to error, and time-consuming. Subjective methods for BT recognition are biased because of the diffuse and irregular nature of BTs, along with varying enhancement patterns and the coexistence of different tumor components. Hence, the development of an automated diagnostic system for BTs is vital for mitigating subjective bias and achieving speedy and effective BT segmentation. Recently developed deep learning (DL)-based methods have replaced subjective methods; however, these DL-based methods still have a low performance, showing room for improvement, and are limited to heterogeneous dataset analysis. Herein, we propose a DL-based parallel features aggregation network (PFA-Net) for the robust segmentation of three different regions in a BT scan, and we perform a heterogeneous dataset analysis to validate its generality. The parallel features aggregation (PFA) module exploits the local radiomic contextual spatial features of BTs at low, intermediate, and high levels for different types of tumors and aggregates them in a parallel fashion. To enhance the diagnostic capabilities of the proposed segmentation framework, we introduced the fractal dimension estimation into our system, seamlessly combined as an end-to-end task to gain insights into the complexity and irregularity of structures, thereby characterizing the intricate morphology of BTs. The proposed PFA-Net achieves the Dice scores (DSs) of 87.54%, 93.42%, and 91.02%, for the enhancing tumor region, whole tumor region, and tumor core region, respectively, with the multimodal brain tumor segmentation (BraTS)-2020 open database, surpassing the performance of existing state-of-the-art methods. 
Additionally, PFA-Net is validated with another open database of brain tumor progression and achieves a DS of 64.58% for heterogeneous dataset analysis, surpassing the performance of existing state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
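The fractal dimension estimation that PFA-Net couples with segmentation is commonly done by box counting: count how many boxes of size s are occupied by the structure, then fit the slope of log N(s) against log(1/s). The sketch below is a hypothetical 2-D point-set toy, not the paper's end-to-end formulation:

```python
import math

def box_count_dimension(points, sizes):
    """Estimate fractal dimension of a 2-D point set by box counting:
    count occupied boxes N(s) at each box size s, then fit the slope
    of log N(s) vs. log(1/s) by least squares."""
    logs = []
    for s in sizes:
        boxes = {(int(x // s), int(y // s)) for x, y in points}
        logs.append((math.log(1.0 / s), math.log(len(boxes))))
    n = len(logs)
    mx = sum(x for x, _ in logs) / n
    my = sum(y for _, y in logs) / n
    num = sum((x - mx) * (y - my) for x, y in logs)
    den = sum((x - mx) ** 2 for x, _ in logs)
    return num / den  # slope = estimated dimension

# Points along a straight line should give a dimension close to 1;
# a rough tumor boundary would score strictly higher.
line = [(i / 100.0, i / 100.0) for i in range(100)]
dim = box_count_dimension(line, sizes=[0.5, 0.25, 0.125])
```

A higher estimated dimension indicates a more irregular boundary, which is the morphological signal the paper uses to characterize tumors.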
16. Multi-modality information refinement fusion network for RGB-D salient object detection.
- Author
-
Bao, Hua and Fan, Bo
- Subjects
- DATA fusion (Statistics), PROBLEM solving
- Abstract
RGB-D salient object detection (SOD) has gained increasing research interest in recent years. Due to the different imaging mechanisms of the RGB and depth modalities, RGB-D images contain complementary information. How to effectively fuse multi-modality features and aggregate multi-scale features to generate accurate saliency predictions thus remains an open problem. In this article, we present a Multi-Modality Information Refinement Fusion Network (MIRFNet) for RGB-D SOD to address it. Specifically, a Feature-Enhancement and Cross-Refinement Module (FCM) is proposed to reduce redundant features and the gap between cross-modality data, achieving effective multi-modality feature fusion. In FCM, the Feature-Enhancement step utilizes attention mechanisms to obtain enhanced features that contain less redundant information and more common salient information, and the Cross-Refinement step employs the enhanced features to reduce the gap between cross-modality features and achieve effective feature fusion. Then, we propose an Edge Guidance Module (EGM) to extract edge information from RGB features. Finally, to effectively aggregate multi-level features and achieve accurate saliency prediction, a Feature-Aggregation and Edge-Refinement Module (FEM) is designed, which introduces specific-modality information and edge information to conduct sufficient information interaction. In FEM, the Feature-Aggregation step aggregates multi-scale features with specific-modality information, and the Edge-Refinement step uses edge information to refine the aggregated features. Extensive experiments demonstrate that MIRFNet achieves comparable performance against 12 other SOTA methods on five datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Discriminative multi-scale adjacent feature for person re-identification.
- Author
-
Qi, Mengzan, Chan, Sixian, Hong, Feng, Yao, Yuan, and Zhou, Xiaolong
- Subjects
- FEATURE extraction, TRANSFORMER models, IDENTIFICATION
- Abstract
Recently, discriminative and robust identification information has played an increasingly critical role in person re-identification (Re-ID). Existing part-based methods demonstrate strong performance in extracting fine-grained features. However, their intensive partitions lead to semantic ambiguity and background interference. Meanwhile, we observe that body parts have different structural proportions. Hence, we assume that aggregating multi-scale adjacent features can effectively alleviate the above issues. In this paper, we propose a novel Discriminative Multi-scale Adjacent Feature (MSAF) learning framework to enrich semantic information and disregard the background. In summary, we establish multi-scale interaction in two stages: the feature extraction stage and the feature aggregation stage. First, a Multi-scale Feature Extraction (MFE) module is designed by combining CNN and Transformer structures to obtain discriminative scale-specific features as the basis for the feature aggregation stage. Second, a Jointly Part-based Feature Aggregation (JPFA) mechanism is introduced to implement adjacent feature aggregation at diverse scales. The JPFA contains Same-scale Feature Correlation (SFC) and Cross-scale Feature Correlation (CFC) sub-modules. Finally, to verify the effectiveness of the proposed method, extensive experiments are performed on the common datasets Market-1501, CUHK03-NP, DukeMTMC, and MSMT17. The experimental results achieve better performance than many state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Combining dual attention mechanism and efficient feature aggregation for road and vehicle segmentation from UAV imagery.
- Author
-
Trung Dung Nguyen, Trung Kien Pham, Chi Kien Ha, Long Ho Le, Thanh Quyen Ngo, and Hoanh Nguyen
- Subjects
TRAFFIC monitoring ,DRONE aircraft ,EMERGENCY management ,URBAN planning - Abstract
Unmanned aerial vehicles (UAVs) have gained significant popularity in recent years due to their ability to capture high-resolution aerial imagery for various applications, including traffic monitoring, urban planning, and disaster management. Accurate road and vehicle segmentation from UAV imagery plays a crucial role in these applications. In this paper, we propose a novel approach combining dual attention mechanisms and efficient multilayer feature aggregation to enhance the performance of road and vehicle segmentation from UAV imagery. Our approach integrates a spatial attention mechanism and a channel-wise attention mechanism to enable the model to selectively focus on relevant features for segmentation tasks. In conjunction with these attention mechanisms, we introduce an efficient multi-layer feature aggregation method that synthesizes and integrates multi-scale features at different levels of the network, resulting in a more robust and informative feature representation. Our proposed method is evaluated on the UAVid semantic segmentation dataset, showcasing its exceptional performance in comparison to renowned approaches such as U-Net, DeepLabv3+, and SegNet. The experimental results affirm that our approach surpasses these state-of-the-art methods in terms of segmentation accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
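As a rough illustration of the dual attention idea described above (not the authors' network), the following sketch gates a feature map first per channel, then per spatial location, using sigmoid weights derived from mean pooling; all function names here are hypothetical:

```python
import numpy as np

def channel_attention(x):
    # x: (C, H, W); squeeze spatial dims, then gate each channel in (0, 1)
    w = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))
    return x * w[:, None, None]

def spatial_attention(x):
    # pool across channels, then gate each spatial location in (0, 1)
    w = 1.0 / (1.0 + np.exp(-x.mean(axis=0)))
    return x * w[None, :, :]

x = np.random.randn(8, 16, 16)
y = spatial_attention(channel_attention(x))  # both gates applied in sequence
```

In a real network the pooled statistics would pass through learned layers before the sigmoid; the sketch keeps only the gating structure.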
19. Spatial relaxation transformer for image super-resolution
- Author
-
Yinghua Li, Ying Zhang, Hao Zeng, Jinglu He, and Jie Guo
- Subjects
Super-resolution ,Vision transformer ,Feature aggregation ,Image enhancement ,Swin transformer ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Transformer-based approaches have demonstrated remarkable performance in image processing tasks due to their ability to model long-range dependencies. Current mainstream Transformer-based methods typically confine self-attention computation within windows to reduce computational burden. However, this constraint may lead to grid artifacts in the reconstructed images due to insufficient cross-window information exchange, particularly in image super-resolution tasks. To address this issue, we propose the Multi-Scale Texture Complementation Block based on Spatial Relaxation Transformer (MSRT), which leverages features at multiple scales and augments information exchange through cross-window attention computation. In addition, we introduce a loss function based on the prior of texture smoothness transformation, which utilizes the continuity of textures between patches to constrain the generation of more coherent texture information in the reconstructed images. Specifically, we employ learnable compressive sensing technology to extract shallow features from images, preserving image features while reducing feature dimensions and improving computational efficiency. Extensive experiments conducted on multiple benchmark datasets demonstrate that our method outperforms previous state-of-the-art approaches in both qualitative and quantitative evaluations.
- Published
- 2024
- Full Text
- View/download PDF
20. Deep Learning for Action Recognition
- Author
-
Wu, Zuxuan, Jiang, Yu-Gang, Shen, Xuemin Sherman, Series Editor, Wu, Zuxuan, and Jiang, Yu-Gang
- Published
- 2024
- Full Text
- View/download PDF
21. An Improved U-Net Model for Simultaneous Nuclei Segmentation and Classification
- Author
-
Liu, Taotao, Zhang, Dongdong, Wang, Hongcheng, Qi, Xumai, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Si, Zhanjun, editor, and Guo, Jiayang, editor
- Published
- 2024
- Full Text
- View/download PDF
22. MFANet: Multi-feature Aggregation Network for Domain Generalized Stereo Matching
- Author
-
Yang, Jinlong, Wang, Gang, Wu, Cheng, Chen, Dong, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Pan, Yijie, editor, and Zhang, Qinhu, editor
- Published
- 2024
- Full Text
- View/download PDF
23. Graphite Ore Grade Classification Algorithm Based on Multi-scale Fused Image Features
- Author
-
Wang, Jionghui, Liu, Yaokun, Huang, Xueyu, Chang, Shaopeng, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Wu, Celimuge, editor, Chen, Xianfu, editor, Feng, Jie, editor, and Wu, Zhen, editor
- Published
- 2024
- Full Text
- View/download PDF
24. TextBFA: Arbitrary Shape Text Detection with Bidirectional Feature Aggregation
- Author
-
Xu, Hui, Wang, Qiu-Feng, Li, Zhenghao, Shi, Yu, Zhou, Xiang-Dong, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
- Published
- 2024
- Full Text
- View/download PDF
25. Dual-Memory Feature Aggregation for Video Object Detection
- Author
-
Fan, Diwei, Zheng, Huicheng, Dang, Jisheng, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
26. Relation-Guided Multi-stage Feature Aggregation Network for Video Object Detection
- Author
-
Yao, Tingting, Cao, Fuxiao, Mi, Fuheng, Li, Danmeng, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
27. Adversarial Keyword Extraction and Semantic-Spatial Feature Aggregation for Clinical Report Guided Thyroid Nodule Segmentation
- Author
-
Zhang, Yudi, Chen, Wenting, Li, Xuechen, Shen, Linlin, Lai, Zhihui, Kong, Heng, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
28. Self-guided Transformer for Video Super-Resolution
- Author
-
Xue, Tong, Wang, Qianrui, Huang, Xinyi, Li, Dengshi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
29. Deep Stereo Matching with Superpixel Based Feature and Cost
- Author
-
Zeng, Kai, Zhang, Hui, Wang, Wei, Wang, Yaonan, Mao, Jianxu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
30. MSAPVT: a multi-scale attention pyramid vision transformer network for large-scale fruit recognition
- Author
-
Rao, Yao, Li, Chaofeng, Xu, Feiran, and Guo, Ya
- Published
- 2024
- Full Text
- View/download PDF
31. Evaluating learned feature aggregators for writer retrieval
- Author
-
Mattick, Alexander, Mayr, Martin, Seuret, Mathias, Kordon, Florian, Wu, Fei, and Christlein, Vincent
- Published
- 2024
- Full Text
- View/download PDF
32. Discriminative multi-scale adjacent feature for person re-identification
- Author
-
Mengzan Qi, Sixian Chan, Feng Hong, Yuan Yao, and Xiaolong Zhou
- Subjects
Person re-identification ,Feature extraction ,Feature aggregation ,Discriminative feature ,Electronic computers. Computer science ,QA75.5-76.95 ,Information technology ,T58.5-58.64 - Abstract
Recently, discriminative and robust identification information has played an increasingly critical role in Person Re-identification (Re-ID). Existing part-based methods demonstrate strong performance in the extraction of fine-grained features. However, their intensive partitions lead to semantic information ambiguity and background interference. Meanwhile, we observe that the human body consists of parts with different structural proportions. Hence, we assume that aggregating multi-scale adjacent features can effectively alleviate the above issues. In this paper, we propose a novel Discriminative Multi-scale Adjacent Feature (MSAF) learning framework to enrich semantic information and disregard background. In summary, we establish multi-scale interaction in two stages: the feature extraction stage and the feature aggregation stage. Firstly, a Multi-scale Feature Extraction (MFE) module is designed by combining CNN and Transformer structures to obtain the discriminative specific features, as the basis for the feature aggregation stage. Secondly, a Jointly Part-based Feature Aggregation (JPFA) mechanism is proposed to implement adjacent feature aggregation at diverse scales. The JPFA contains Same-scale Feature Correlation (SFC) and Cross-scale Feature Correlation (CFC) sub-modules. Finally, to verify the effectiveness of the proposed method, extensive experiments are performed on the common datasets of Market-1501, CUHK03-NP, DukeMTMC, and MSMT17. The experimental results show better performance than many state-of-the-art methods.
- Published
- 2024
- Full Text
- View/download PDF
33. Efficient breast cancer diagnosis using multi‐level progressive feature aggregation based deep transfer learning system.
- Author
-
Patel, Vivek and Chaurasia, Vijayshri
- Subjects
- *
CANCER diagnosis , *INSTRUCTIONAL systems , *COMPUTER-aided diagnosis , *TUMOR classification , *MAMMOGRAMS , *BREAST cancer , *DEEP learning - Abstract
Breast cancer is a worldwide fatal disease that exists mostly among women. The deep learning technique has proven its effectiveness, but the performance of the existing deep learning systems is quite compromising. In this work, a deep transfer learning system is suggested for efficient breast cancer classification from histopathology images. This system is based on a novel multi‐level progressive feature aggregation (MPFA) and a spatial domain learning approach. The combination of a pretrained Resnet101 backbone network with MPFA is implemented to extract more significant features. In addition, a mixed‐dilated spatial domain learning network (MSLN) is further incorporated to enhance the receptive field and increase discrimination between features. The proposed method achieved superior performance as compared to the existing state‐of‐the‐art methods, offering 99.24% accuracy, a 98.79% F‐1 score, 98.59% precision, and 98.99% recall values over BreaKHis dataset. An ablation study is carried out over the ICIAR2018 dataset to verify the generalizability and effectiveness of the system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. MFACNet: A Multi-Frame Feature Aggregating and Inter-Feature Correlation Framework for Multi-Object Tracking in Satellite Videos.
- Author
-
Zhao, Hu, Shen, Yanyun, Wang, Zhipan, and Zhang, Qingling
- Subjects
- *
OBJECT tracking (Computer vision) , *ARTIFICIAL satellite tracking , *VIDEOS , *ENVIRONMENTAL monitoring - Abstract
Efficient multi-object tracking (MOT) in satellite videos is crucial for numerous applications, ranging from surveillance to environmental monitoring. Existing methods often struggle with effectively exploring the correlation and contextual cues inherent in the consecutive features of video sequences, resulting in redundant feature inference and unreliable motion estimation for tracking. To address these challenges, we propose the MFACNet, a novel multi-frame feature aggregating and inter-feature correlation framework for enhancing MOT in satellite videos with the idea of utilizing the features of consecutive frames. The MFACNet integrates multi-frame feature aggregation techniques with inter-feature correlation mechanisms to improve tracking accuracy and robustness. Specifically, our framework leverages temporal information across the features of consecutive frames to capture contextual cues and refine object representations over time. Moreover, we introduce a mechanism to explicitly model the correlations between adjacent features in video sequences, facilitating more accurate motion estimation and trajectory association. We evaluated the MFACNet using benchmark datasets for satellite-based video MOT tasks and demonstrated that it surpasses state-of-the-art performance in tracking accuracy and robustness, by 2.0% in MOTA and 1.6% in IDF1. Our experimental results highlight the potential of precisely utilizing deep features from video sequences. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Speech Emotion Recognition Using a Multi-Time-Scale Approach to Feature Aggregation and an Ensemble of SVM Classifiers.
- Author
-
STEFANOWSKA, Antonina and ZIELIŃSKI, Sławomir K.
- Subjects
- *
EMOTION recognition , *DATA augmentation , *FEATURE extraction , *AUTOMATIC speech recognition , *GENETIC algorithms , *SPEECH , *SPEECH synthesis , *SUPPORT vector machines - Abstract
Due to its relevant real-life applications, the recognition of emotions from speech signals constitutes a popular research topic. In the traditional methods applied for speech emotion recognition, audio features are typically aggregated using a fixed-duration time window, potentially discarding information conveyed by speech at various signal durations. By contrast, in the proposed method, audio features are aggregated simultaneously using time windows of different lengths (a multi-time-scale approach), hence, potentially better utilizing information carried at phonemic, syllabic, and prosodic levels compared to the traditional approach. A genetic algorithm is employed to optimize the feature extraction procedure. The features aggregated at different time windows are subsequently classified by an ensemble of support vector machine (SVM) classifiers. To enhance the generalization property of the method, a data augmentation technique based on pitch shifting and time stretching is applied. According to the obtained results, the developed method outperforms the traditional one for the selected datasets, demonstrating the benefits of using a multi-time-scale approach to feature aggregation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
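A minimal sketch of the multi-time-scale aggregation idea from the abstract above, assuming frame-level features (e.g. MFCCs) and simple mean/std statistics per window length; this is an illustration of the concept, not the paper's exact pipeline, and all names are hypothetical:

```python
import numpy as np

def aggregate_multi_scale(frames, window_sizes):
    """Aggregate frame-level features with several window lengths and
    concatenate the resulting statistics (mean and std per window size)."""
    stats = []
    for w in window_sizes:
        n = len(frames) // w
        chunks = frames[: n * w].reshape(n, w, -1)   # (n windows, w, dim)
        stats.append(chunks.mean(axis=1).mean(axis=0))  # mean of window means
        stats.append(chunks.std(axis=1).mean(axis=0))   # mean of window stds
    return np.concatenate(stats)

frames = np.random.randn(120, 13)          # e.g. 120 frames of 13 MFCCs
vec = aggregate_multi_scale(frames, [10, 30, 60])  # 3 time scales at once
```

In the paper each time scale would feed its own SVM in the ensemble; here the scales are simply concatenated to show the shared aggregation step.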
36. Dual-stream Co-enhanced Network for Unsupervised Video Object Segmentation.
- Author
-
Hongliang Zhu, Hui Yin, Yanting Liu, and Ning Chen
- Subjects
OPTICAL flow ,OPTICAL images ,VIDEOS - Abstract
Unsupervised Video Object Segmentation (UVOS) is a highly challenging problem in computer vision, as the annotation of the target object in the testing video is entirely unknown. The main difficulty is to effectively handle the complicated and changeable motion state of the target object and the confusion of similar background objects in the video sequence. In this paper, we propose a novel deep Dual-stream Co-enhanced Network (DC-Net) for UVOS via bidirectional motion cues refinement and multi-level feature aggregation, which can fully take advantage of motion cues and effectively integrate different level features to produce high-quality segmentation masks. DC-Net is a dual-stream architecture where the two streams are co-enhanced by each other. One is a motion stream with a Motion-cues Refine Module (MRM), which learns from bidirectional optical flow images and produces a fine-grained and complete distinctive motion saliency map, and the other is an appearance stream with a Multi-level Feature Aggregation Module (MFAM) and a Context Attention Module (CAM) which are designed to integrate the different level features effectively. Specifically, the motion saliency map obtained by the motion stream is fused with each stage of the decoder in the appearance stream to improve the segmentation, and in turn the segmentation loss in the appearance stream feeds back into the motion stream to enhance the motion refinement. Experimental results on three datasets (Davis2016, VideoSD, SegTrack-v2) demonstrate that DC-Net achieves results comparable to some state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. RVPNet: A real time unstructured road vanishing point detection algorithm using attention mechanism and global context information.
- Author
-
Liu, Yu, Fan, Xue, Han, Shiyuan, Zhou, Jin, Yang, Xiaohui, and Li, Zhongtao
- Abstract
The detection of the vanishing point (VP) in unstructured roads is crucial for the advancement of autonomous vehicle technology. However, due to the inadequate fusion of intra-level features and the high computational requirements of existing CNN-based road VP detection methods, a model named RVPNet is proposed in this paper. To begin, the proposed algorithm adopts an encoder-decoder architecture combined with a lightweight backbone to extract unstructured road features efficiently. Second, the Simple Residual Pyramid Pooling Module (SRPPM) is designed in this model to obtain cross-path global contextual information at low computational cost. And a Dual Attention-based Feature Aggregation Module (DAFAM) is proposed to obtain better inter-level feature representations. Finally, the offset loss is introduced to compensate for the inherent offset errors caused by the output stride of the heatmap. The experimental results show that the average detection error rate of our approach is only 0.03128 on the Kong dataset, and the average processing time reaches 238 FPS. The average detection error rate of our approach is only 0.03600 on the Moghhadam dataset. Compared with the state-of-the-art methods, the proposed approach achieves the highest detection accuracy and speed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. An intelligent payment card fraud detection system.
- Author
-
Seera, Manjeevan, Lim, Chee Peng, Kumar, Ajay, Dhamotharan, Lalitha, and Tan, Kim Hua
- Subjects
- *
FRAUD investigation , *MACHINE learning , *STATISTICAL hypothesis testing , *TRANSACTION records , *STATISTICAL learning - Abstract
Payment cards offer a simple and convenient method for making purchases. Owing to the increase in the usage of payment cards, especially in online purchases, fraud cases are on the rise. The rise creates financial risk and uncertainty, as in the commercial sector it incurs billions of losses each year. However, real transaction records that can facilitate the development of effective predictive models for fraud detection are difficult to obtain, mainly because of issues related to confidentiality of customer information. In this paper, we apply a total of 13 statistical and machine learning models for payment card fraud detection using both publicly available and real transaction records. The results from both original features and aggregated features are analyzed and compared. A statistical hypothesis test is conducted to evaluate whether the aggregated features identified by a genetic algorithm can offer a better discriminative power, as compared with the original features, in fraud detection. The outcomes positively ascertain the effectiveness of using aggregated features for undertaking real-world payment card fraud detection problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
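The aggregated transaction features mentioned above can be illustrated as follows. This is a hypothetical example of window-based aggregation (count, total, and mean amount over a preceding time horizon), not the actual feature set used in the paper; `aggregate_card_features` and `horizon` are invented names:

```python
import numpy as np

def aggregate_card_features(amounts, timestamps, horizon):
    """For each transaction, derive aggregated features over the preceding
    `horizon` seconds: count, total amount, mean amount (illustrative)."""
    feats = []
    for t in timestamps:
        mask = (timestamps <= t) & (timestamps > t - horizon)
        window = amounts[mask]                 # never empty: includes t itself
        feats.append([window.size, window.sum(), window.mean()])
    return np.array(feats)

amounts = np.array([20.0, 500.0, 15.0])
timestamps = np.array([0.0, 3600.0, 7000.0])   # seconds since first purchase
f = aggregate_card_features(amounts, timestamps, horizon=86400.0)  # 1-day window
```

Such rolling statistics expose spending-pattern anomalies (e.g. a sudden burst of purchases) that single-transaction features cannot capture.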
39. Pose Calibrated Feature Aggregation for Video Face Set Recognition in Unconstrained Environments
- Author
-
Ibrahim Ali Hasani and Omar Arif
- Subjects
Video face recognition ,feature aggregation ,frame selection ,open sets ,multi-stream networks ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This paper presents the Pose Calibrated Feature Aggregation Network (PCFAN), an architecture for set/video face recognition. Using stacked attention blocks and a multi-modal architecture, it automatically assigns adaptive weights to every instance in the set, based on both the recognition embeddings and the associated face metadata. It uses these weights to produce a single, compact feature vector for the set. The model automatically learns to advocate for features from images with more favourable qualities and poses, which inherently hold more information. Our block can be inserted on top of any standard recognition model to enable set prediction and improve performance, particularly in unconstrained scenarios where subject pose and image quality vary considerably between frames. We test our approach on three challenging video face-recognition datasets, IJB-A, IJB-B, and YTF, and report state-of-the-art results. Moreover, a comparison with top aggregation methods as our baselines demonstrates that PCFAN is the superior approach.
- Published
- 2024
- Full Text
- View/download PDF
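A minimal sketch of attention-based set pooling in the spirit of the abstract above, assuming per-frame quality scores stand in for the learned attention weights; the function names and the scalar-score simplification are assumptions, not PCFAN's actual architecture:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def aggregate_face_set(embeddings, quality_scores):
    """Collapse a set of per-frame embeddings into one compact vector
    using attention weights derived from (hypothetical) quality scores."""
    w = softmax(quality_scores)               # weights sum to 1
    return (w[:, None] * embeddings).sum(axis=0)

emb = np.random.randn(5, 128)                 # 5 frames, 128-D embeddings
q = np.array([0.1, 2.0, 0.3, 1.5, 0.2])       # higher = better pose/quality
pooled = aggregate_face_set(emb, q)           # single 128-D set descriptor
```

Because the weights form a convex combination, a set of identical embeddings pools to exactly that embedding, while low-quality frames are smoothly down-weighted rather than hard-discarded.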
40. Three-Dimensional Millimeter-Wave Object Detector Based on the Enhancement of Local-Global Contextual Information
- Author
-
Yanyi Chang, Ying Liu, Zhaohui Bu, Haipo Cui, and Li Ding
- Subjects
Three-dimensional millimeter-wave images ,object detection ,IA-SSD ,local-global context ,feature aggregation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Millimeter-wave (MMW) point clouds, characterized by their low resolution and high noise, limit the detection accuracy of the point-based IA-SSD method due to the inadequate consideration of contextual information in MMW scenarios. Therefore, this paper proposes a three-dimensional (3D) MMW object detector, greatly augmenting the detection performance of the baseline model IA-SSD by the integration of local-global context information. Central to our approach is the implementation of a multi-scale feature aggregation (MFA) module in the encoder stage of IA-SSD, which utilizes a self-attention mechanism to apprehend local contextual distinctions. This module is further applied to the centroid aggregation stage to enhance the capture of local context from foreground points. Complementarily, a global feature fusion module is devised to combine global contextual insights, drawing upon the localized information delineated by the MFA modules. This integrated framework significantly diminishes the false detection rate while concurrently elevating the detection precision for occluded objects. Relative to the IA-SSD baseline, the empirical evaluations validate the efficiency of our proposed model, demonstrating marked decreases in false positives and false negatives. Specifically, there is a 2.78% and 7.39% improvement in AP_R40_0.25 and AP_R40_0.5, respectively. When the intersection-over-union threshold is set to 0.25 and 0.5, the corresponding recall rate increases by 2.13% and 6.2%, respectively. Moreover, the inference speed reaches 32.3 frames per second (FPS), only a slight decrease of 2.9 FPS compared to the baseline model. These results demonstrate that the proposed detector significantly enhances detection performance without compromising on speed, marking a considerable advancement in the domain of 3D MMW object detection.
- Published
- 2024
- Full Text
- View/download PDF
41. Super-Resolution GAN and Global Aware Object Detection System for Vehicle Detection in Complex Traffic Environments
- Author
-
Hongqing Wang, Jun Kit Chaw, Sim Kuan Goh, Liantao Shi, Ting Tin Tin, Nannan Huang, and Hong-Seng Gan
- Subjects
Intelligent vehicle detection ,self-attention ,multi-scale semantic feature ,generative adversarial network ,feature aggregation ,transportation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Intelligent vehicle detection systems have the potential to improve road safety and optimize traffic management. Despite the continuous advancements in AI technology, the detection of different types of vehicles in complex traffic environments remains a persistent challenge. In this paper, an end-to-end solution is proposed. For image enhancement, a super-resolution synthetic image GAN (SSIGAN) is proposed to improve detection of small, distant objects in low-resolution (LR) images. An edge enhancer (EE) and a hierarchical self-attention module (HS) are applied to address the loss of high-frequency edge information and texture details in the super-resolved images. The output super-resolution (SR) image is fed into the detection part. In the detection part, we introduce a global context-aware network (GCAFormer) for accurate vehicle detection. GCAFormer utilizes a cascade transformer backbone (CT) that enables internal information interaction and generates multi-scale feature maps. This approach effectively addresses the challenge of varying vehicle scales, ensuring robust detection performance. We also build a cross-scale aggregation feature (CSAF) module into GCAFormer, which fuses low- and high-dimensional semantic information and provides multi-resolution feature maps as input to the detection head, making the network more adaptable to complex traffic environments and enabling accurate detection. In addition, we validate the effectiveness of our proposed method on a large number of datasets, reaching 89.12% mAP on the KITTI dataset, 90.62% on IITM-hetra, 86.83% on Pascal VOC, and 93.33% on BDD-100k. The results were compared to SOTA methods and demonstrate the competitive advantages of our proposed method for vehicle detection in complex traffic environments.
- Published
- 2024
- Full Text
- View/download PDF
42. Feature Aggregation in Joint Sound Classification and Localization Neural Networks
- Author
-
Brendan Healy, Patrick Mcnamee, and Zahra Nili Ahmadabadi
- Subjects
Joint sound signal classification and localization ,multi-task deep learning ,feature aggregation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Current state-of-the-art sound source localization (SSL) deep learning networks lack feature aggregation within their architecture. Feature aggregation within neural network architectures enhances model performance by enabling the consolidation of information from different feature scales, thereby improving feature robustness and invariance. We adapt feature aggregation sub-architectures from computer vision neural networks onto a baseline neural network architecture for SSL, the Sound Event Localization and Detection network (SELDnet). The incorporated sub-architectures are: Path Aggregation Network (PANet); Weighted Bi-directional Feature Pyramid Network (BiFPN); and a novel Scale Encoding Network (SEN). These sub-architectures were evaluated using two metrics for signal classification and two metrics for direction-of-arrival regression. The results show that models incorporating feature aggregation outperformed the baseline SELDnet in both sound signal classification and localization. Among the feature aggregators, PANet exhibited superior performance compared to the other methods, which were otherwise comparable. The results provide evidence that feature aggregation sub-architectures enhance the performance of sound detection neural networks, particularly in direction-of-arrival regression.
- Published
- 2024
- Full Text
- View/download PDF
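Of the aggregators named above, BiFPN's "fast normalized fusion" has a particularly compact core: non-negative learnable weights normalized by their sum. The sketch below illustrates that published formulation only; it is not the SELDnet integration itself, and `weighted_fusion` is an invented name:

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """BiFPN-style fast normalized fusion of same-shaped feature maps:
    clamp weights to be non-negative, normalize by their sum, then blend."""
    w = np.maximum(weights, 0.0)          # ReLU keeps weights non-negative
    w = w / (w.sum() + eps)               # cheap alternative to softmax
    return sum(wi * f for wi, f in zip(w, features))

f1 = np.ones((4, 4))                      # two feature maps to fuse
f2 = 3.0 * np.ones((4, 4))
fused = weighted_fusion([f1, f2], np.array([1.0, 1.0]))  # equal weights
```

Compared with softmax normalization, this division-based scheme avoids the exponential and was reported (in the original BiFPN work) to be faster with comparable accuracy.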
43. HGM: A General High-Order Spatial and Spectral Global Feature Fusion Module for Visual Multitasking
- Author
-
Chengcheng Chen, Xiliang Zhang, Yuhao Zhou, Yugang Chang, and Weiming Zeng
- Subjects
Convolutional neural networks (CNNs) ,feature aggregation ,global frequency domain features ,high-order feature interaction ,remote sensing ship detection ,transformer architecture ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Recent computer vision research has mainly focused on designing efficient network architectures, with limited exploration of high- and low-frequency information in the high-order frequency domain. This study introduces a novel approach utilizing spatial and frequency domain information to design a high-order global feature fusion module (HGM) and develop a specialized remote sensing detection network, HGNet. HGM leverages cyclic convolution to achieve arbitrary high-order features, overcoming the second-order limitation of transformers. Furthermore, HGM integrates cyclic convolution and fast Fourier transform, utilizing the former to capture interaction information between high-order spatial and channel domains and the latter to transform high-order features from the spatial to the frequency domain for global information extraction. This combination fundamentally addresses the issue of long-distance dependency in convolutions and avoids quadratic growth in computational complexity. Moreover, we have constructed an information truncation gate to minimize high-order redundant features, achieving a "win–win" scenario for network accuracy and parameter efficiency. In addition, HGM acts as a plug-and-play module, boosting performance when integrated into various networks. Experimental findings reveal that HGNet achieves a 93.0% $\text{mAP}_{\text{0.5}}$ with just 12.1M parameters on the HRSID remote sensing ship detection dataset. In addition, applying HGM enhances performance on CIFAR100 classification and WHDLD remote sensing segmentation tasks.
- Published
- 2024
- Full Text
- View/download PDF
44. Attention-guided cross-modal multiple feature aggregation network for RGB-D salient object detection
- Author
-
Bojian Chen, Wenbin Wu, Zhezhou Li, Tengfei Han, Zhuolei Chen, and Weihao Zhang
- Subjects
salient object detection (sod) ,rgb-d ,feature aggregation ,attention ,cross-modal ,Mathematics ,QA1-939 ,Applied mathematics. Quantitative methods ,T57-57.97 - Abstract
The goal of RGB-D salient object detection is to aggregate the information of the two modalities of RGB and depth to accurately detect and segment salient objects. Existing RGB-D SOD models can extract the multilevel features of a single modality well and can also integrate cross-modal features, but can rarely handle both at the same time. To tap into and make the most of the correlations of intra- and inter-modality information, in this paper, we propose an attention-guided cross-modal multi-feature aggregation network for RGB-D SOD. Our motivation is that both cross-modal feature fusion and multilevel feature fusion are crucial for the RGB-D SOD task. The main innovation of this work lies in two points: one is the cross-modal pyramid feature interaction (CPFI) module that integrates multilevel features from both RGB and depth modalities in a bottom-up manner, and the other is the cross-modal feature decoder (CMFD) that aggregates the fused features to generate the final saliency map. Extensive experiments on six benchmark datasets showed that the proposed attention-guided cross-modal multiple feature aggregation network (ACFPA-Net) achieved competitive performance over 15 state-of-the-art (SOTA) RGB-D SOD methods, both qualitatively and quantitatively.
- Published
- 2024
- Full Text
- View/download PDF
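The core idea in the abstract above, weighting RGB against depth responses position by position before summing, can be illustrated with a toy per-position softmax fusion (illustrative only; the paper's CPFI module is more elaborate and operates on multilevel features):

```python
import numpy as np

def cross_modal_fuse(f_rgb, f_depth):
    """Toy attention-guided fusion of two modality response maps: each
    spatial position gets per-modality weights from a softmax over the
    two responses, so the more confident modality dominates locally."""
    # f_rgb, f_depth: (H, W) single-channel response maps, for simplicity.
    stacked = np.stack([f_rgb, f_depth], axis=-1)            # (H, W, 2)
    z = stacked - stacked.max(axis=-1, keepdims=True)        # stable softmax
    weights = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return (stacked * weights).sum(axis=-1)                  # (H, W)
```

When both modalities agree, the weights are 0.5 each and the fused map equals the common input; where one modality responds more strongly, it receives the larger coefficient.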
45. Hierarchical Attentive Feature Aggregation for Person Re-Identification
- Author
-
Husheng Dong and Ping Lu
- Subjects
Attention ,diverse features ,feature aggregation ,person re-identification ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Recent efforts on person re-identification have shown promising results by learning discriminative features via the multi-branch network. To further boost feature discrimination, attention mechanisms have also been extensively employed. However, the branches on the main level rarely communicate with others in existing branching models, which may compromise the ability to mine diverse features. To mitigate this issue, a novel framework called Hierarchical Attentive Feature Aggregation (Hi-AFA) is proposed. In Hi-AFA, a hierarchical aggregation mechanism is applied to learn attentive features. The current feature map is not only fed into the next stage, but also aggregated into another branch, leading to hierarchical feature flows along depth and parallel branches. We also present a simple Feature Suppression Operation (FSO) and a Lightweight Dual Attention Module (LDAM) to guide feature learning. The FSO can partially erase the salient features already discovered, such that more potential clues can be mined by other branches with the help of LDAM. In this manner, the branches cooperate to mine richer and more diverse feature representations. The hierarchical aggregation and multi-granularity feature learning are integrated into a unified architecture that builds upon OSNet, resulting in a resource-economical and effective person re-identification model. Extensive experiments on four mainstream datasets, including Market-1501, DukeMTMC-reID, MSMT17, and CUHK03, are conducted to validate the effectiveness of the proposed method, and results show that state-of-the-art performance is achieved.
- Published
- 2024
- Full Text
- View/download PDF
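The Feature Suppression Operation described above, erasing already-discovered salient features so other branches must mine complementary cues, can be sketched as a top-k spatial mask (the threshold scheme here is an assumption, not the paper's exact formulation):

```python
import numpy as np

def feature_suppression(x, drop_ratio=0.3):
    """Sketch of an FSO-style operation: zero the most activated spatial
    positions of one branch's activation map, forcing downstream branches
    to attend to the remaining, less salient regions."""
    # x: (H, W) activation map from one branch.
    flat = x.ravel()
    k = max(1, int(flat.size * drop_ratio))
    thresh = np.sort(flat)[-k]            # k-th largest activation
    out = x.copy()
    out[x >= thresh] = 0.0                # erase the salient part
    return out
```

In a multi-branch model the suppressed map would feed the next branch, whose attention module (LDAM, in the paper's terms) then highlights whatever structure remains.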
46. FSOD4RSI: Few-Shot Object Detection for Remote Sensing Images via Features Aggregation and Scale Attention
- Author
-
Honghao Gao, Shuping Wu, Ye Wang, Jung Yoon Kim, and Yueshen Xu
- Subjects
Attention mechanism ,feature aggregation ,few-shot learning ,object detection ,remote sensing images ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Due to the continuous development of few-shot learning, there have been notable advancements in methods for few-shot object detection in recent years. However, most existing methods in this domain primarily focus on natural images, neglecting the challenges posed by variations in object scales, which are usually encountered in remote sensing images. This article proposes a new few-shot object detection model designed to handle the issue of object scale variation in remote sensing images. Our developed model has two essential parts: a feature aggregation module (FAM) and a scale-aware attention module (SAM). Considering the few-shot features of remote sensing images, we designed the FAM to improve the support and query features through channel multiplication operations utilizing a feature pyramid network and a transformer encoder. The created FAM better extracts the global features of remote sensing images and enhances the significant feature representation of few-shot remote sensing objects. In addition, we design the SAM to address the scale variation problems that frequently occur in remote sensing images. By employing multiscale convolutions, the SAM enables the acquisition of contextual features while adapting to objects of varying scales. Extensive experiments were conducted on benchmark datasets, including NWPU VHR-10 and DIOR datasets, and the results show that our model indeed addresses the challenges posed by object scale variation and improves the applicability of few-shot object detection in the remote sensing domain.
- Published
- 2024
- Full Text
- View/download PDF
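The scale-aware attention module (SAM) above uses multiscale convolutions to adapt to objects of varying scales. A minimal sketch of that idea, assuming box filters of different sizes as the multiscale branches and a per-position softmax over them (the paper's SAM design is not reproduced):

```python
import numpy as np

def box_filter(x, k):
    """k x k box (average) filter, stride 1, edge padding -- a simple
    stand-in for one multiscale convolution branch."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def scale_aware_attention(x, scales=(1, 3, 5)):
    """Hypothetical scale-aware attention: run the map through branches
    with different receptive fields and softmax-weight them per position,
    so each location favours the scale that responds most strongly."""
    branches = np.stack([box_filter(x, k) for k in scales], axis=-1)
    z = branches - branches.max(axis=-1, keepdims=True)
    w = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return (branches * w).sum(axis=-1)
```

On a constant map every scale agrees, so the weights are uniform and the output equals the input; on real feature maps small objects would favour the small-kernel branch.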
47. PSFNet: Efficient Detection of SAR Image Based on Petty-Specialized Feature Aggregation
- Author
-
Peng Zhou, Peng Wang, Jie Cao, Daiyin Zhu, Qiyuan Yin, Jiming Lv, Ping Chen, Yongshi Jie, and Cheng Jiang
- Subjects
Feature aggregation ,mix-attention ,object detection ,synthetic aperture radar (SAR) ,swin transformer ,YOLOv7 ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
With the rapid development of deep learning, convolutional neural networks have achieved milestones in synthetic aperture radar (SAR) image object detection. However, object detection in SAR images is still a great challenge due to the difficulty of distinguishing targets from complex backgrounds. At the same time, most of the targets in SAR images are small and unevenly distributed, which makes it challenging to extract sufficient feature information. To solve the issues mentioned above, an efficient object detection network for SAR images based on Swin transformer and YOLOv7 is proposed in this article. First, we design a novel feature aggregation module, Petty-specialized feature aggregation (PS-FPN), to enrich small targets’ semantic and spatial features while keeping the model lightweight. The PS-FPN module fuses deep and shallow features by using cross-layer feature aggregation and single-branch feature aggregation to enhance the detection of small targets. Second, a novel attention mechanism strategy, mix-attention, is proposed to find more attention regions. Finally, we add one more prediction head to extract shallow features that effectively preserve small targets’ feature information. To verify the effectiveness of the proposed algorithm, extensive experiments are carried out on several challenging SAR image datasets. The results show that, compared with other state-of-the-art detectors, the proposed method achieves significant performance while remaining lightweight.
- Published
- 2024
- Full Text
- View/download PDF
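The cross-layer aggregation above fuses deep, semantically rich features with shallow, high-resolution ones so small targets keep both detail and semantics. A generic FPN-style sketch of that fusion (the exact PS-FPN cross-layer and single-branch wiring is not reproduced here):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def cross_layer_aggregate(shallow, deep):
    """Generic cross-layer aggregation: deep semantics are upsampled to
    the shallow layer's resolution and added element-wise, so the fused
    map carries the shallow layer's spatial detail plus deep context."""
    # shallow: (2H, 2W) high-resolution map; deep: (H, W) low-resolution map.
    return shallow + upsample2x(deep)
```

In a full detector this fused map would feed the small-target prediction head the abstract mentions.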
48. Learning Adequate Alignment and Interaction for Cross-Modal Retrieval
- Author
-
MingKang Wang, Min Meng, Jigang Liu, and Jigang Wu
- Subjects
Cross-modal Retrieval ,Visual Semantic Embedding ,Feature Aggregation ,Transformer ,Computer engineering. Computer hardware ,TK7885-7895 - Abstract
Cross-modal retrieval has attracted widespread attention in many cross-media similarity search applications, especially image-text retrieval in the fields of computer vision and natural language processing. Recently, visual and semantic embedding (VSE) learning has shown promising improvements on image-text retrieval tasks. Most existing VSE models employ two unrelated encoders to extract features, then use complex methods to contextualize and aggregate those features into holistic embeddings. Despite recent advances, existing approaches still suffer from two limitations: 1) without considering intermediate interaction and adequate alignment between different modalities, these models cannot guarantee the discriminative ability of representations; 2) existing feature aggregators are susceptible to certain noisy regions, which may lead to unreasonable pooling coefficients and affect the quality of the final aggregated features. To address these challenges, we propose a novel cross-modal retrieval model containing a well-designed alignment module and a novel multimodal fusion encoder, which aims to learn adequate alignment and interaction on aggregated features for effectively bridging the modality gap. Experiments on Microsoft COCO and Flickr30k datasets demonstrate the superiority of our model over the state-of-the-art methods.
- Published
- 2023
- Full Text
- View/download PDF
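The "pooling coefficients" the abstract criticizes can be made concrete with a common softmax-weighted aggregator for VSE embeddings (a standard baseline, not the paper's fusion encoder): a region with spuriously large activation grabs a large coefficient, which is exactly the noise sensitivity described.

```python
import numpy as np

def attention_pool(region_feats, temperature=1.0):
    """Aggregate N region features into one holistic embedding using
    softmax pooling coefficients derived from feature magnitudes (the
    scoring rule is an assumption for illustration).  A single noisy,
    high-magnitude region can dominate the coefficients."""
    # region_feats: (N, D) features of N image regions.
    scores = np.linalg.norm(region_feats, axis=1) / temperature
    coeff = np.exp(scores - scores.max())
    coeff = coeff / coeff.sum()                # pooling coefficients, sum to 1
    return coeff @ region_feats                # (D,) holistic embedding
```

With identical regions the coefficients are uniform and the pooled embedding equals any single region; raising one region's magnitude skews the coefficients toward it.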
49. IMC-Det: Intra–Inter Modality Contrastive Learning for Video Object Detection
- Author
-
Qi, Qiang, Qiu, Zhenyu, Yan, Yan, Lu, Yang, and Wang, Hanzi
- Published
- 2024
- Full Text
- View/download PDF
50. Camouflaged objects detection network via contradiction detection and feature aggregation.
- Author
-
Bi, Hongbo, Tong, Jinghui, Zhang, Cong, Mo, Disen, and Wang, Xiufang
- Abstract
Camouflaged Object Detection (COD) aims to segment objects with a similar appearance to the background. There are some problems in existing algorithms, such as blurred edges and incomplete detection. To address the above issues, we propose a novel COD framework termed CFNet. Our network consists of the Contradiction Area Detection Module (CADM) and Feature Aggregation Module (FAM). In the CADM, we propose an improved receptive field mechanism, which utilizes a max pooling operation and convolution block to highlight the contradictory areas and refine the edge of the hidden object. Besides, the FAM is designed to connect two adjacent layers via an attention and anti-attention strategy, leading to cross-layer feature enhancement and information fusion. More specifically, the self-attention mechanism helps supplement semantic information, and the anti-attention mechanism contributes to removing redundant information. Extensive experiments conducted on the four public COD datasets show the comparable performance of the proposed CFNet with SOTAs, and the ablation experiments demonstrate the effectiveness of the proposed modules.
- Published
- 2024
- Full Text
- View/download PDF
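The attention / anti-attention pairing above, where an attention map supplements semantics while its complement marks redundant responses, can be sketched for two adjacent layers as follows (the gating scheme here is an assumption, not the paper's exact FAM formulation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attn_anti_attn_fuse(deep, shallow):
    """Toy FAM-style fusion of two adjacent layers: an attention map from
    the deep layer gates in shallow detail where the deep layer is
    confident, while the complementary (anti-attention) map damps shallow
    responses treated as redundant."""
    # deep, shallow: (H, W) maps from two adjacent decoder layers.
    attn = sigmoid(deep)                 # where the deep layer is confident
    anti = 1.0 - attn                    # complementary anti-attention map
    enhanced = shallow * attn + deep     # supplement semantic information
    cleaned = enhanced - shallow * anti  # remove redundant shallow response
    return cleaned
```

When the deep layer saturates (attention near 1), the anti-attention term vanishes and the output is simply the shallow map reinforced by the deep one.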