556 results for "feature aggregation"
Search Results
2. Efficient Language-Driven Action Localization by Feature Aggregation and Prediction Adjustment
- Author
-
Shang, Zirui, Yang, Shuo, and Wu, Xinxiao
- Published
- 2025
- Full Text
- View/download PDF
3. VLAD-BuFF: Burst-Aware Fast Feature Aggregation for Visual Place Recognition
- Author
-
Khaliq, Ahmad, Xu, Ming, Hausler, Stephen, Milford, Michael, and Garg, Sourav
- Published
- 2025
- Full Text
- View/download PDF
4. Object 6D Pose Regression Algorithm Based on Dense Vector Graphs (基于稠密向量图的物体 6D 位姿回归算法).
- Author
-
左国玉, 喻 杉, 顾宗函, and 郑榜贵
- Abstract
Copyright of the Journal of Beijing University of Technology; the abstract is available only in the original publication.
- Published
- 2024
- Full Text
- View/download PDF
5. A Greenhouse Visual Place Recognition Method Based on Optimal Transport Feature Aggregation (基于最优传输特征聚合的温室视觉位置识别方法).
- Author
-
侯玉涵, 周云成, 刘泽钰, 张润池, and 周金桥
- Abstract
As the foundation for implementing closed-loop detection within the realm of visual SLAM (simultaneous localization and mapping), visual place recognition (VPR) has great potential in greenhouse robot navigation and other fields. However, existing VPR methods cannot fully meet the actual requirements of greenhouse scenes due to the complexity and constant variation of the greenhouse environment. In particular, the local feature aggregation paradigm depends strongly on the induction bias of training samples in VPR models, which leads to information redundancy during feature aggregation. In this study, a greenhouse VPR method based on optimal transport for local feature aggregation was presented. The process of aggregating local features into a global descriptor was framed as an optimal transport problem, where the cost matrix was predicted by an MLP (multi-layer perceptron). Thus, a cost matrix was dynamically generated from the local features that were extracted from the greenhouse scene images. Additionally, a 'dustbin' cluster was introduced into the cost matrix to absorb the redundant features. Taking the cost matrix as input, the Sinkhorn algorithm was employed to determine an optimal solution for the assignment matrix. Furthermore, the soft assignment of local features to the various clusters was achieved through the assignment matrix. Ultimately, the assignments were concatenated to form a global descriptor for the scene image, which was used for place recognition. A deep neural network (DNN) combining the advantages of a CNN (convolutional neural network) and a Transformer was designed to serve as the backbone for local feature extraction from greenhouse scene images. Furthermore, cosine similarity was used as the metric function to measure the similarity between scene-image global descriptors for descriptor matching. A series of experiments were conducted in a tomato greenhouse.
The experimental results showed that the improved model achieved better performance. The top-1 recall rate (R@1) for place recognition reached 88.96%, which was 29.67, 2.97, and 2.89 percentage points higher than those of the NetVLAD, MixVPR, and EigenPlaces models, respectively. Compared to the aggregators employed in MixVPR and NetVLAD, our aggregator improved R@1 by 1.09 and 21.65 percentage points, respectively, showcasing its effectiveness. Compared with the CNN, the improved network achieved an increase of 5.45 percentage points in R@1; the improvement was even more pronounced (10.48 percentage points) compared with a Transformer network. Simultaneously, our network achieved a 1.6-fold increase in computation speed compared to the previous Transformer. In addition, the experiments further demonstrated that the improved model exhibited excellent place recognition performance and strong robustness when dealing with factors such as small sampling-distance shifts, small viewpoint shifts, and different sunlight intensities. The greenhouse VPR achieved a place recognition rate of no less than 81.94% in actual greenhouses, indicating its practical application potential. The method based on optimal transport for local feature aggregation and global descriptor generation was effective for place recognition, and the image local feature extraction network can boost place recognition performance. These findings can provide technical support for the visual systems of intelligent agricultural machinery in greenhouses. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
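The Sinkhorn-based aggregation described in entry 5 can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the cost matrix here is random rather than MLP-predicted, and the function names (`sinkhorn`, `aggregate`), feature sizes, and cluster count are invented for the example.

```python
import numpy as np

def sinkhorn(cost, n_iters=50, eps=0.1):
    """Entropy-regularised Sinkhorn iterations: returns a soft assignment
    matrix whose row/column sums approach uniform marginals."""
    K = np.exp(-cost / eps)                          # Gibbs kernel
    u, v = np.ones(cost.shape[0]), np.ones(cost.shape[1])
    r = np.full(cost.shape[0], 1.0 / cost.shape[0])  # row marginal (features)
    c = np.full(cost.shape[1], 1.0 / cost.shape[1])  # column marginal (clusters)
    for _ in range(n_iters):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)

def aggregate(features, cost):
    """Soft-assign local features to clusters; the last cost column is a
    'dustbin' that absorbs redundant features and is dropped afterwards."""
    P = sinkhorn(cost)
    P = P[:, :-1]                          # discard dustbin mass
    P = P / P.sum(axis=0, keepdims=True)   # normalise per cluster
    clusters = P.T @ features              # weighted sums -> cluster descriptors
    return clusters.reshape(-1)            # concatenate into a global descriptor

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4))        # 8 local features of dimension 4
cost = rng.random((8, 3 + 1))              # 3 clusters + 1 dustbin
desc = aggregate(feats, cost)
print(desc.shape)                          # (12,) = 3 clusters x dim 4
```

In a real pipeline the descriptor would then be matched against a database by cosine similarity, as the abstract describes.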
6. A Directional Enhanced Adaptive Detection Framework for Small Targets.
- Author
-
Li, Chao, Chang, Yifan, Yang, Shimeng, Li, Kaiju, and Yin, Guangqiang
- Subjects
RESEARCH personnel, SIMPLICITY, NOISE, SPEED
- Abstract
Due to the challenges posed by limited size and features, positional and noise issues, and dataset imbalance and simplicity, small object detection is one of the most challenging tasks in the field of object detection, and an increasing number of researchers are focusing on this area. In this paper, we propose a Directional Enhanced Adaptive (DEA) detection framework for small targets. This framework effectively combines the detection accuracy advantages of two-stage methods with the detection speed advantages of one-stage methods. Additionally, we introduce a Multi-Scale Object Adaptive Slicing (MASA) module and an improved IoU-based aggregation module that integrate with this framework to enhance detection performance. For better comparison, we use the F1 score as one of the evaluation metrics. The experimental results demonstrate that our DEA framework improves the performance of various backbone detection networks and achieves better comprehensive detection performance than other proposed methods, even though, unlike the others, our network was not trained on the test dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
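Entry 6 does not publish its MASA module, but the general idea behind slicing a large image into overlapping tiles so that small objects occupy more of each detector input can be sketched as follows. The function name `slice_image`, the tile size, and the overlap ratio are all assumptions for illustration; the paper's module additionally adapts the tile size to object scale.

```python
def slice_image(h, w, tile=512, overlap=0.2):
    """Generate overlapping tile boxes (x0, y0, x1, y1) covering an h x w
    image. Tiles are clamped to the image border so every pixel is covered."""
    step = int(tile * (1 - overlap))       # stride between tile origins
    boxes = []
    for y in range(0, max(h - tile, 0) + step, step):
        for x in range(0, max(w - tile, 0) + step, step):
            # clamp the origin so the tile never extends past the border
            x0 = min(x, max(w - tile, 0))
            y0 = min(y, max(h - tile, 0))
            boxes.append((x0, y0, min(x0 + tile, w), min(y0 + tile, h)))
    return boxes

boxes = slice_image(1080, 1920, tile=512, overlap=0.2)
```

Each tile would be run through the detector separately, with per-tile detections merged back (e.g. by IoU-based aggregation, as the abstract mentions) in full-image coordinates.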
7. PDeT: A Progressive Deformable Transformer for Photovoltaic Panel Defect Segmentation.
- Author
-
Zhou, Peng, Fang, Hong, and Wu, Gaochang
- Subjects
SOLAR cells, CURRENT distribution, FEATURE extraction, SEMANTICS
- Abstract
Defects in photovoltaic (PV) panels can significantly reduce the power generation efficiency of the system and may cause localized overheating due to uneven current distribution. Therefore, adopting precise pixel-level defect detection, i.e., defect segmentation, technology is essential to ensuring stable operation. However, for effective defect segmentation, the feature extractor must adaptively determine the appropriate scale or receptive field for accurate defect localization, while the decoder must seamlessly fuse coarse-level semantics with fine-grained features to enhance high-level representations. In this paper, we propose a Progressive Deformable Transformer (PDeT) for defect segmentation in PV cells. This approach effectively learns spatial sampling offsets and refines features progressively through coarse-level semantic attention. Specifically, the network adaptively captures spatial offset positions and computes self-attention, expanding the model's receptive field and enabling feature extraction across objects of various shapes. Furthermore, we introduce a semantic aggregation module to refine semantic information, converting the fused feature map into a scale space and balancing contextual information. Extensive experiments demonstrate the effectiveness of our method, achieving an mIoU of 88.41% on our solar cell dataset, outperforming other methods. Additionally, to validate the PDeT's applicability across different domains, we trained and tested it on the MVTec-AD dataset. The experimental results demonstrate that the PDeT exhibits excellent recognition performance in various other scenarios as well. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. MSAPVT: a multi-scale attention pyramid vision transformer network for large-scale fruit recognition.
- Author
-
Rao, Yao, Li, Chaofeng, Xu, Feiran, and Guo, Ya
- Subjects
TRANSFORMER models, MACHINE learning, CONVOLUTIONAL neural networks, DEEP learning, COMPUTATIONAL complexity
- Abstract
Efficient and accurate fruit recognition is critical for applications such as automated fruit-picking systems, quality evaluation, and self-checkout services in supermarkets. Existing vision-based methods, primarily leveraging Convolutional Neural Networks (CNNs), often achieve high performance but are hindered by high computational complexity, making real-time deployment on edge devices challenging. Moreover, the diversity and similarity among fruit varieties, along with imbalanced fruit datasets, pose significant obstacles to general-purpose deep learning algorithms. To address these challenges, we propose the Multi-Scale Attention Pyramid Vision Transformer (MSAPVT) alongside an enhanced version of the Fru92 dataset. Our MSAPVT introduces four innovations: attention enhancement, dimension adjustment, multi-scale feature aggregation, and loss function improvement. Firstly, the Hybrid Attention Module (HAM) is designed to better refine the multi-level features of the Pyramid Vision Transformer v2 (PVTv2). Secondly, the Dimension Adjustment Layer (DAL) is designed to increase the weight of the high-level features. Thirdly, a multi-scale feature aggregation strategy is introduced to fuse multi-scale complementary features. Finally, a KL-divergence loss is added to enhance the difference between multi-scale features. These innovations enable MSAPVT to capture fine-grained details in fruit images, generating highly discriminative representations with relatively low model complexity. Our model achieves the best results on the Fru92 and Fru92s datasets, with Top-1 Acc. of 91.40% and 94.29%, and Top-5 Acc. of 98.95% and 99.55%, respectively. Lastly, an accessible and efficient fruit classification system based on MSAPVT is devised for potential applications. The improved dataset is available at https://github.com/iamraoyao/MSAPVT-Inference-Demo. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
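The KL-divergence loss that entry 8 adds to push multi-scale features apart can be illustrated with a plain-Python sketch: softmax each scale's feature vector into a distribution, then compute KL between them. The vectors and function names here are invented for illustration, not taken from MSAPVT.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

p = softmax([1.0, 2.0, 3.0])   # stand-in for one scale's pooled features
q = softmax([3.0, 2.0, 1.0])   # stand-in for another scale's features
loss = kl_div(p, q)            # larger when the two distributions differ
```

Maximising such a term during training (or minimising its negative) would encourage the scales to carry complementary rather than redundant information.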
9. LDAGM: predicting lncRNA-disease associations by graph convolutional auto-encoder and multilayer perceptron based on multi-view heterogeneous networks.
- Author
-
Zhang, Bing, Wang, Haoyu, Ma, Chao, Huang, Hai, Fang, Zhou, and Qu, Jiaxing
- Subjects
FEATURE extraction, LINCRNA, ASSOCIATION rule mining, MICRORNA, INFORMATION resources management
- Abstract
Background: Long non-coding RNAs (lncRNAs) can prevent, diagnose, and treat a variety of complex human diseases, and it is crucial to establish a method to efficiently predict lncRNA-disease associations. Results: In this paper, we propose a prediction method for the lncRNA-disease association relationship, named LDAGM, which is based on the Graph Convolutional Autoencoder and Multilayer Perceptron model. The method first extracts the functional similarity and Gaussian interaction profile kernel similarity of lncRNAs and miRNAs, as well as the semantic similarity and Gaussian interaction profile kernel similarity of diseases. It then constructs six homogeneous networks and deeply fuses them using a deep topology feature extraction method. The fused networks facilitate feature complementation and deep mining of the original association relationships, capturing the deep connections between nodes. Next, by combining the obtained deep topological features with the similarity network of lncRNA, disease, and miRNA interactions, we construct a multi-view heterogeneous network model. The Graph Convolutional Autoencoder is employed for nonlinear feature extraction. Finally, the extracted nonlinear features are combined with the deep topological features of the multi-view heterogeneous network to obtain the final feature representation of the lncRNA-disease pair. Prediction of the lncRNA-disease association relationship is performed using the Multilayer Perceptron model. To enhance the performance and stability of the Multilayer Perceptron model, we introduce a hidden layer called the aggregation layer in the Multilayer Perceptron model. Through a gate mechanism, it controls the flow of information between each hidden layer in the Multilayer Perceptron model, aiming to achieve optimal feature extraction from each hidden layer. 
Conclusions: Parameter analysis, ablation studies, and comparison experiments verified the effectiveness of this method, and case studies verified the accuracy of this method in predicting lncRNA-disease association relationships. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
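The gated "aggregation layer" idea from entry 9 (a gate controlling information flow between MLP hidden layers) might look roughly like the sketch below. The class name, dimensions, and random initialisation are assumptions for illustration, not the LDAGM code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GatedMLP:
    """MLP whose hidden layers are blended through a learned gate: at each
    layer, the gate decides how much of the new hidden representation to
    mix with the running one, loosely mirroring the aggregation-layer idea."""
    def __init__(self, d_in, d_h, n_layers):
        self.W_in = rng.standard_normal((d_in, d_h)) * 0.1
        self.Ws = [rng.standard_normal((d_h, d_h)) * 0.1 for _ in range(n_layers)]
        self.Gs = [rng.standard_normal((d_h, d_h)) * 0.1 for _ in range(n_layers)]

    def forward(self, x):
        h = np.tanh(x @ self.W_in)
        for W, G in zip(self.Ws, self.Gs):
            new = np.tanh(h @ W)
            gate = sigmoid(h @ G)            # element-wise gate in (0, 1)
            h = gate * new + (1 - gate) * h  # gated blend of old and new layer
        return h

m = GatedMLP(16, 8, 3)
out = m.forward(rng.standard_normal((4, 16)))
print(out.shape)   # (4, 8)
```

The gate makes each hidden layer a convex combination of its input and output, so later layers can pass earlier features through unchanged when that helps.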
10. Few-shot fine-grained recognition in remote sensing ship images with global and local feature aggregation.
- Author
-
Zhou, Guoqing, Huang, Liang, and Zhang, Xianfeng
- Subjects
REMOTE sensing, FISHERY management, GENERALIZATION, SHIPS
- Abstract
Remote sensing ship image detection methods have broad application prospects in areas such as maritime traffic and fisheries management. However, previous detection methods relied heavily on a large amount of accurately annotated training data, and when remote sensing ship targets are scarce, their detection performance is unsatisfactory. To address this issue, this paper proposes a few-shot detection method based on global and local feature aggregation. Specifically, we aggregate query-image global and local features with support features, which encourages the model to learn invariant features under varying global feature conditions and enhances performance in both training and inference. Building upon this, we propose combined feature aggregation, where query features are aggregated with all support features in the same batch, further reducing the confusion of target features caused by the imbalance between base-class and novel-class samples and improving the model's learning of novel classes. Additionally, we employ an adversarial autoencoder to reconstruct support features, enhancing the model's generalization performance. Finally, extensive experiments were conducted on the publicly available remote sensing ship dataset HRSC-2016. The results indicate that, compared to the baseline model, our model achieves new state-of-the-art performance under various dataset settings. The model presented in this paper provides new insights for few-shot detection work based on meta-learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
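A common way to aggregate query features with class support features in few-shot detection is to concatenate the element-wise product, difference, and the query itself for each class prototype. The sketch below shows that generic scheme; it is not the paper's exact global/local aggregation, and all names and sizes are illustrative.

```python
import numpy as np

def aggregate(query, supports):
    """Aggregate a query feature vector with each class's support prototype:
    concatenate [query * s, query - s, query] per prototype."""
    out = []
    for s in supports:                 # one prototype per class
        out.append(np.concatenate([query * s, query - s, query]))
    return np.stack(out)               # (num_classes, 3 * dim)

q = np.ones(4)                         # toy query feature, dim 4
protos = [np.full(4, 2.0), np.zeros(4)]
agg = aggregate(q, protos)
print(agg.shape)                       # (2, 12)
```

A class-specific detection head would then score each aggregated row; the product channel highlights agreement, the difference channel highlights mismatch.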
11. Multi-scale and multi-path cascaded convolutional network for semantic segmentation of colorectal polyps.
- Author
-
Manan, Malik Abdul, Feng, Jinchao, Yaqub, Muhammad, Ahmed, Shahzad, Imran, Syed Muhammad Ali, Chuhan, Imran Shabir, and Khan, Haroon Ahmed
- Subjects
COLON polyps, CASCADE connections, COLORECTAL cancer, GASTROINTESTINAL system, CONFIDENCE intervals, ADENOMATOUS polyps
- Abstract
Colorectal polyps are structural abnormalities of the gastrointestinal tract that can potentially become cancerous in some cases. The study introduces a novel framework for colorectal polyp segmentation named the Multi-Scale and Multi-Path Cascaded Convolution Network (MMCC-Net), aimed at addressing the limitations of existing models, such as inadequate spatial dependence representation and the absence of multi-level feature integration during the decoding stage. The framework integrates multi-scale and multi-path cascaded convolutional techniques and enhances feature aggregation through dual attention modules, skip connections, and a feature enhancer. MMCC-Net achieves superior performance in identifying polyp areas at the pixel level. The proposed MMCC-Net was tested across six public datasets and compared against eight SOTA models to demonstrate its efficiency in polyp segmentation. MMCC-Net achieved Dice scores (with confidence intervals) ranging from 77.43 ± 0.12 (77.08, 77.56) to 94.45 ± 0.12 (94.19, 94.71) and Mean Intersection over Union (MIoU) scores ranging from 72.71 ± 0.19 (72.20, 73.00) to 90.16 ± 0.16 (89.69, 90.53) on the six databases. These results highlight the model's potential as a powerful tool for accurate and efficient polyp segmentation, contributing to early detection and prevention strategies in colorectal cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
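The Dice and MIoU metrics that entry 11 reports have simple closed forms for binary masks, linked by the identity Dice = 2·IoU / (1 + IoU). A minimal implementation over flattened 0/1 masks:

```python
def dice_and_iou(pred, gt):
    """Dice and IoU for binary masks given as flat lists of 0/1 pixels."""
    inter = sum(p and g for p, g in zip(pred, gt))   # |pred AND gt|
    p_sum, g_sum = sum(pred), sum(gt)
    union = p_sum + g_sum - inter                    # |pred OR gt|
    dice = 2 * inter / (p_sum + g_sum) if p_sum + g_sum else 1.0
    iou = inter / union if union else 1.0            # empty masks count as perfect
    return dice, iou

pred = [1, 1, 0, 0]
gt   = [1, 0, 1, 0]
d, i = dice_and_iou(pred, gt)   # d = 0.5, i = 1/3
```

MIoU as reported in segmentation papers is this IoU averaged over classes (and usually over images).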
12. Hybrid Feature Engineering Based on Customer Spending Behavior for Credit Card Anomaly and Fraud Detection.
- Author
-
Alamri, Maram and Ykhlef, Mourad
- Subjects
ARTIFICIAL neural networks, CREDIT card fraud, FRAUD investigation, CREDIT cards, DECISION trees
- Abstract
For financial institutions, credit card fraud detection is a critical activity where the accuracy and efficiency of detection models are important. Traditional methods often use standard feature selection techniques that may ignore refined patterns in transaction data. This paper presents a new approach that combines feature aggregation with Exhaustive Feature Selection (EFS) to enhance the performance of credit card fraud detection models. Through feature aggregation, higher-order characteristics are created to capture complex relationships within the data; EFS then finds the most relevant features by systematically evaluating all possible subsets of features. Our method was tested using a public credit card fraud dataset, PaySim. Four popular learning classifiers—random forest (RF), decision tree (DT), logistic regression (LR), and deep neural network (DNN)—are used with balanced datasets to evaluate the techniques. The findings show a large improvement in detection accuracy, F1 score, and AUPRC compared to other approaches. In particular, our method improved F1 score, precision, and recall, underlining its ability to handle the nuances of fraudulent transactions more effectively. This article provides an overall analysis of the method's impact on model performance, offering insights for future studies on fraud detection and related fields. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
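Exhaustive Feature Selection as used in entry 12 simply scores every non-empty feature subset and keeps the best one. The sketch below substitutes a toy scoring function (hypothetical per-feature utilities with a redundancy penalty) for cross-validated model performance; the feature names are invented.

```python
from itertools import combinations

def exhaustive_feature_selection(features, score_fn):
    """Evaluate every non-empty subset of features and return the best.
    score_fn stands in for cross-validated model performance; cost is
    O(2^n) evaluations, so this only suits small feature sets."""
    best_subset, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            s = score_fn(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score

# toy score: hypothetical per-feature utilities minus a size penalty
utility = {"amount": 3.0, "balance_delta": 2.5, "hour": 0.5, "noise": -1.0}
score = lambda sub: sum(utility[f] for f in sub) - 0.4 * (len(sub) - 1)
best, s = exhaustive_feature_selection(list(utility), score)
```

In practice the score function would train a classifier on each subset and return a validation metric such as F1 or AUPRC.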
13. A Contrastive Federated Learning Feature Aggregation Algorithm Combining Self-Attention and Soft-Threshold Denoising (结合自注意力机制和软阈值降噪的对比联邦学习特征聚合算法).
- Author
-
王毅, 瞿治国, and 孙乐
- Subjects
FEDERATED learning, MACHINE learning, DATA privacy, ARTIFICIAL intelligence, NOISE control
- Abstract
Copyright of the Journal of Chongqing University of Posts & Telecommunications (Natural Science Edition); the abstract is available only in the original publication.
- Published
- 2024
- Full Text
- View/download PDF
14. LDAGM: predicting lncRNA-disease associations by graph convolutional auto-encoder and multilayer perceptron based on multi-view heterogeneous networks
- Author
-
Bing Zhang, Haoyu Wang, Chao Ma, Hai Huang, Zhou Fang, and Jiaxing Qu
- Subjects
LncRNA-disease associations, Graph convolutional auto-encoder, Multilayer perceptron, Deep topological feature extraction, Feature aggregation, Computer applications to medicine. Medical informatics (R858-859.7), Biology (General) (QH301-705.5)
- Abstract
Background: Long non-coding RNAs (lncRNAs) can prevent, diagnose, and treat a variety of complex human diseases, and it is crucial to establish a method to efficiently predict lncRNA-disease associations. Results: In this paper, we propose a prediction method for the lncRNA-disease association relationship, named LDAGM, which is based on the Graph Convolutional Autoencoder and Multilayer Perceptron model. The method first extracts the functional similarity and Gaussian interaction profile kernel similarity of lncRNAs and miRNAs, as well as the semantic similarity and Gaussian interaction profile kernel similarity of diseases. It then constructs six homogeneous networks and deeply fuses them using a deep topology feature extraction method. The fused networks facilitate feature complementation and deep mining of the original association relationships, capturing the deep connections between nodes. Next, by combining the obtained deep topological features with the similarity network of lncRNA, disease, and miRNA interactions, we construct a multi-view heterogeneous network model. The Graph Convolutional Autoencoder is employed for nonlinear feature extraction. Finally, the extracted nonlinear features are combined with the deep topological features of the multi-view heterogeneous network to obtain the final feature representation of the lncRNA-disease pair. Prediction of the lncRNA-disease association relationship is performed using the Multilayer Perceptron model. To enhance the performance and stability of the Multilayer Perceptron model, we introduce a hidden layer called the aggregation layer in the Multilayer Perceptron model. Through a gate mechanism, it controls the flow of information between each hidden layer in the Multilayer Perceptron model, aiming to achieve optimal feature extraction from each hidden layer.
Conclusions: Parameter analysis, ablation studies, and comparison experiments verified the effectiveness of this method, and case studies verified the accuracy of this method in predicting lncRNA-disease association relationships.
- Published
- 2024
- Full Text
- View/download PDF
15. Multi-scale and multi-path cascaded convolutional network for semantic segmentation of colorectal polyps
- Author
-
Malik Abdul Manan, Jinchao Feng, Muhammad Yaqub, Shahzad Ahmed, Syed Muhammad Ali Imran, Imran Shabir Chuhan, and Haroon Ahmed Khan
- Subjects
Colorectal polyp, Semantic segmentation, Cascaded convolution network, Feature aggregation, Attention modules, Engineering (General). Civil engineering (General) (TA1-2040)
- Abstract
Colorectal polyps are structural abnormalities of the gastrointestinal tract that can potentially become cancerous in some cases. The study introduces a novel framework for colorectal polyp segmentation named the Multi-Scale and Multi-Path Cascaded Convolution Network (MMCC-Net), aimed at addressing the limitations of existing models, such as inadequate spatial dependence representation and the absence of multi-level feature integration during the decoding stage. The framework integrates multi-scale and multi-path cascaded convolutional techniques and enhances feature aggregation through dual attention modules, skip connections, and a feature enhancer. MMCC-Net achieves superior performance in identifying polyp areas at the pixel level. The proposed MMCC-Net was tested across six public datasets and compared against eight SOTA models to demonstrate its efficiency in polyp segmentation. MMCC-Net achieved Dice scores (with confidence intervals) ranging from 77.43 ± 0.12 (77.08, 77.56) to 94.45 ± 0.12 (94.19, 94.71) and Mean Intersection over Union (MIoU) scores ranging from 72.71 ± 0.19 (72.20, 73.00) to 90.16 ± 0.16 (89.69, 90.53) on the six databases. These results highlight the model's potential as a powerful tool for accurate and efficient polyp segmentation, contributing to early detection and prevention strategies in colorectal cancer.
- Published
- 2024
- Full Text
- View/download PDF
16. Robustness study of speaker recognition based on ECAPA-TDNN-CIFG.
- Author
-
Wang, Chunli, Xu, Linming, Zhu, Hongxin, and Cheng, Xiaoyang
- Subjects
DELAY lines, FEATURE extraction, GENERALIZATION, ALGORITHMS
- Abstract
This paper describes a study on speaker recognition using the ECAPA-TDNN architecture (Emphasized Channel Attention, Propagation and Aggregation in Time-Delay Neural Network). It utilizes x-vectors, a method for extracting speaker features by converting speech into fixed-length vectors, and introduces a squeeze-and-excitation block to model dependencies between channels. To better explore temporal relationships in the context of speaker recognition and improve the algorithm's generalization in complex acoustic scenarios, this study adds input gates and forget gates to the ECAPA-TDNN architecture, combining them with CIFG (Convolutional LSTM with Input and Forget Gates) modules. These are embedded into a residual structure of multi-layer aggregated features. Sub-center ArcFace, an improved loss function based on ArcFace, is used to select sub-centers for subclass discrimination, retaining advantageous sub-centers to enhance intra-class compactness and strengthen the robustness of the network. Experimental results demonstrate that the improved ECAPA-TDNN-CIFG outperforms the baseline model, yielding more accurate and efficient recognition results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
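The sub-center ArcFace idea mentioned in entry 16 (cosine similarity to the nearest of several per-class sub-centers, penalised by an additive angular margin) can be sketched as follows. The feature, sub-centers, margin, and scale values here are illustrative, not taken from the paper.

```python
import math

def subcenter_logit(feature, subcenters, margin=0.5, scale=32.0):
    """Sub-center ArcFace-style logit for the target class: take the cosine
    to the nearest of K sub-centers, then apply an additive angular margin."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    best = max(cos(feature, c) for c in subcenters)    # nearest sub-center
    theta = math.acos(max(-1.0, min(1.0, best)))       # angle to it
    return scale * math.cos(theta + margin)            # margin-penalised logit

f = [1.0, 0.0]
centers = [[0.0, 1.0], [1.0, 0.1]]   # K = 2 sub-centers for one class
logit = subcenter_logit(f, centers)
```

Non-target classes get plain `scale * cos(theta)` logits, so the margin makes the target class harder to satisfy, which tightens intra-class compactness during training.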
17. Boundary enhancement and refinement network for camouflaged object detection.
- Author
-
Xia, Chenxing, Cao, Huizhen, Gao, Xiuju, Ge, Bin, Li, Kuan-Ching, Fang, Xianjin, Zhang, Yan, and Liang, Xingzhu
- Abstract
Camouflaged object detection aims to accurately locate and segment objects that conceal themselves well in the environment. Despite the advancements in deep learning methods, prevalent issues persist, including coarse boundary identification in complex scenes and the ineffective integration of multi-source features. To this end, we propose a novel boundary enhancement and refinement network named BERNet, which mainly consists of three modules for enhancing and refining boundary information: an asymmetric edge module (AEM) with a multi-group dilated convolution block (GDCB), a residual mixed pooling enhanced module (RPEM), and a multivariate information interaction refiner module (M2IRM). AEM with GDCB is designed to obtain rich boundary clues, where different dilation rates are used to expand the receptive field. RPEM is capable of enhancing boundary features under the guidance of boundary cues to improve the detection accuracy of small and multiple camouflaged objects. M2IRM is introduced to refine the side-out prediction maps progressively under the supervision of the ground truth by fusing multi-source information. Comprehensive experiments on three benchmark datasets demonstrate that BERNet is competitive with state-of-the-art methods under most evaluation metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. GaitSTAR: Spatial–Temporal Attention-Based Feature-Reweighting Architecture for Human Gait Recognition.
- Author
-
Bilal, Muhammad, Jianbiao, He, Mushtaq, Husnain, Asim, Muhammad, Ali, Gauhar, and ElAffendi, Mohammed
- Subjects
COMPUTER vision, DEEP learning, FEATURE extraction, BIOMETRIC identification, DISCRIMINANT analysis, GAIT in humans
- Abstract
Human gait recognition (HGR) leverages unique gait patterns to identify individuals, but the effectiveness of this technique can be hindered due to various factors such as carrying conditions, foot shadows, clothing variations, and changes in viewing angles. Traditional silhouette-based systems often neglect the critical role of instantaneous gait motion, which is essential for distinguishing individuals with similar features. We introduce the "Enhanced Gait Feature Extraction Framework (GaitSTAR)", a novel method that incorporates dynamic feature weighting through the discriminant analysis of temporal and spatial features within a channel-wise architecture. Key innovations in GaitSTAR include dynamic stride flow representation (DSFR) to address silhouette distortion, a transformer-based feature set transformation (FST) for integrating image-level features into set-level features, and dynamic feature reweighting (DFR) for capturing long-range interactions. DFR enhances contextual understanding and improves detection accuracy by computing attention distributions across channel dimensions. Empirical evaluations show that GaitSTAR achieves impressive accuracies of 98.5%, 98.0%, and 92.7% under NM, BG, and CL conditions, respectively, with the CASIA-B dataset; 67.3% with the CASIA-C dataset; and 54.21% with the Gait3D dataset. Despite its complexity, GaitSTAR demonstrates a favorable balance between accuracy and computational efficiency, making it a powerful tool for biometric identification based on gait patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
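The channel-wise attention reweighting described for GaitSTAR's DFR can be reduced to a minimal squeeze-style sketch: summarise each channel, softmax the summaries into an attention distribution, and rescale the channels by it. The real module is considerably more elaborate; all names and values here are illustrative.

```python
import math

def channel_attention(channels):
    """Softmax attention over per-channel mean summaries, then reweight.
    channels: list of per-channel feature lists."""
    summaries = [sum(c) / len(c) for c in channels]   # squeeze: mean per channel
    m = max(summaries)
    es = [math.exp(s - m) for s in summaries]
    weights = [e / sum(es) for e in es]               # attention distribution
    reweighted = [[w * x for x in c] for w, c in zip(weights, channels)]
    return reweighted, weights

chans = [[1.0, 1.0], [3.0, 3.0]]   # two toy channels
rew, w = channel_attention(chans)   # stronger channel gets more weight
```

Because the weights form a distribution (they sum to 1), the module redistributes emphasis across channels rather than uniformly amplifying them.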
19. Drfnet: dual stream recurrent feature sharing network for video dehazing.
- Author
-
Galshetwar, Vijay M., Saini, Poonam, and Chaudhary, Sachin
- Abstract
The primary effects of haze on captured images/frames are visibility degradation and color disturbance. Even though extensive research has been done on video dehazing, existing methods fail to perform well on varicolored hazy videos. Varicolored haze is still a challenging problem in video dehazing, and contextual information alone is not sufficient to tackle it: in addition to adequate contextual information, color balancing is required to restore varicolored hazy images/videos. Therefore, this paper proposes a novel lightweight dual stream recurrent feature sharing network (with only 1.77 M parameters) for video dehazing. The proposed framework involves: (1) a color balancing module to balance the color of the input hazy frame in YCbCr space; (2) a multi-receptive multi-resolution module (MMM), which interlinks the RGB- and YCbCr-based features to learn global and rich contextual data; (3) a feature aggregation residual module (FARM) to strengthen representative capability during reconstruction; and (4) a channel attention module to suppress redundant features by recalibrating the weights of input features. Experimental results and an ablation study show that the proposed model is superior to existing state-of-the-art approaches for video dehazing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
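The YCbCr color balancing step in entry 19 can be approximated with a gray-world-style sketch: convert to YCbCr (ITU-R BT.601 full-range coefficients) and recentre the chroma means at the neutral value 128, leaving luma untouched. The `balance` function is an assumption standing in for the paper's learned module.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr using ITU-R BT.601 / JPEG coefficients."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def balance(pixels):
    """Shift chroma channels so their means sit at the neutral value 128:
    a simple gray-world-style balance in YCbCr space."""
    ycc = [rgb_to_ycbcr(*p) for p in pixels]
    mean_cb = sum(c[1] for c in ycc) / len(ycc)
    mean_cr = sum(c[2] for c in ycc) / len(ycc)
    return [(y, cb - (mean_cb - 128), cr - (mean_cr - 128)) for y, cb, cr in ycc]

out = balance([(200, 120, 90), (180, 110, 80)])   # warm color cast removed
```

Working in YCbCr lets the cast correction act only on the chroma planes, so scene brightness (Y) is preserved while the color disturbance is neutralised.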
20. Gaze Target Detection Network Based on Attention Mechanism and Depth Prior.
- Author
-
ZHU Yun, ZHU Dongchen, ZHANG Guanghui, SUN Yanzan, and ZHANG Xiaolin
- Subjects
GAZE, NONVERBAL cues, COMPUTER vision, DATA mining, ATTENTION, INTENTION
- Abstract
Human gaze behavior, as a non-verbal cue, plays a crucial role in revealing human intentions, and gaze target detection has attracted extensive attention from the machine vision community. However, existing gaze target detection methods usually focus on extracting texture information from images, ignoring the importance of stereo depth information, which makes it difficult to deal with scenes with complex texture. In this work, a novel gaze target detection network based on an attention mechanism and a depth prior is proposed, which adopts a two-stage architecture (a gaze direction prediction stage and a saliency detection stage). In the gaze direction prediction stage, a channel-spatial attention mechanism module is established to recalibrate texture features, and a head position encoding branch is designed to obtain texture- and head-position-aware enhanced high-representation features for accurate gaze prediction. Furthermore, a strategy is proposed to introduce depth, which represents stereoscopic or distance information in the 3D scene, as a prior into the saliency detection stage. At the same time, the channel-spatial attention mechanism is used to enhance the multi-scale texture features, and the advantages of depth geometric information and image texture information are fully exploited to improve the accuracy of gaze target detection. Experimental results show that the proposed model performs favorably against state-of-the-art methods on the GazeFollow and DLGaze datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Cascaded Aggregation Convolution Network for Salient Grain Pests Detection.
- Author
-
Yu, Junwei, Chen, Shihao, Liu, Nan, Zhai, Fupin, and Pan, Quan
- Subjects
- *
OBJECT recognition (Computer vision) , *PEST control , *GRAIN storage , *STORAGE facilities , *FOOD security - Abstract
Simple Summary: Infestations of pests in grain storage can have a significant impact on both the quantity and quality of stored grains. Drawing inspiration from the detection abilities of humans and birds in identifying pests, we present an innovative deep learning solution designed for the detection and management of pests in stored grains. Specifically focusing on the detection of small grain pests within cluttered backgrounds, we propose a cascaded feature aggregation convolution network. Our approach outperforms existing models in terms of both trainable parameters and detection accuracy, as evidenced by experiments conducted on our newly introduced GrainPest dataset as well as publicly available datasets. By sharing our dataset and refining our model's architecture, we aim to advance the field of research in grain pest detection and the classification of stored grains based on pest density. This study is expected to contribute to the reduction of economic losses caused by storage pests and to enhance food security measures. Pest infestation poses significant threats to grain storage due to pests' behaviors of feeding, respiration, excretion, and reproduction. Efficient pest detection and control are essential to mitigate these risks. However, accurate detection of small grain pests remains challenging due to their small size, high variability, low contrast, and cluttered background. Salient pest detection focuses on the visual features that stand out, improving the accuracy of pest identification in complex environments. Drawing inspiration from the rapid pest recognition abilities of humans and birds, we propose a novel Cascaded Aggregation Convolution Network (CACNet) for pest detection and control in stored grain. Our approach aims to improve detection accuracy by employing a reverse cascade feature aggregation network that imitates the visual attention mechanism in humans when observing and focusing on objects of interest. 
The CACNet uses VGG16 as the backbone network and incorporates two key operations, namely feature enhancement and feature aggregation. These operations merge the high-level semantic information and low-level positional information of salient objects, enabling accurate segmentation of small-scale grain pests. We have curated the GrainPest dataset, comprising 500 images showcasing zero to five or more pests in grains. Leveraging this dataset and the MSRA-B dataset, we validated our method's efficacy, achieving S-measures of 91.9% and 90.9% and weighted F-measures of 76.4% and 91.0%, respectively. Our approach significantly surpasses traditional saliency detection methods and other state-of-the-art salient object detection models based on deep learning. This technology shows great potential for pest detection and for assessing the severity of pest infestation based on pest density in grain storage facilities. It also holds promise for the prevention and control of pests in agriculture and forestry. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
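The reverse cascade aggregation described above, merging high-level semantic maps back into low-level positional maps, can be illustrated with a minimal sketch. Element-wise addition with nearest-neighbour upsampling is assumed here purely for illustration; CACNet's actual enhancement and aggregation blocks are learned:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling for a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def cascade_aggregate(pyramid):
    """Reverse-cascade aggregation sketch: fuse deep semantic maps into
    shallow positional maps from coarse to fine.
    pyramid: list of (C, H, W) maps ordered fine -> coarse."""
    fused = pyramid[-1]                   # start from the deepest level
    for feat in reversed(pyramid[:-1]):   # walk back toward fine levels
        fused = feat + upsample2x(fused)  # element-wise aggregation
    return fused
```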
22. Interpretable linear dimensionality reduction based on bias-variance analysis.
- Author
-
Bonetti, Paolo, Metelli, Alberto Maria, and Restelli, Marcello
- Subjects
CONTINUOUS groups ,MACHINE learning ,LINEAR statistical models ,DESIGN techniques ,ALGORITHMS - Abstract
One of the central issues of several machine learning applications on real data is the choice of the input features. Ideally, the designer should select a small number of the relevant, nonredundant features to preserve the complete information contained in the original dataset, with little collinearity among features. This procedure helps mitigate problems like overfitting and the curse of dimensionality, which arise when dealing with high-dimensional problems. On the other hand, it is not desirable to simply discard some features, since they may still contain information that can be exploited to improve results. Instead, dimensionality reduction techniques are designed to limit the number of features in a dataset by projecting them into a lower dimensional space, possibly considering all the original features. However, the projected features resulting from the application of dimensionality reduction techniques are usually difficult to interpret. In this paper, we seek to design a principled dimensionality reduction approach that maintains the interpretability of the resulting features. Specifically, we propose a bias-variance analysis for linear models and we leverage these theoretical results to design an algorithm, Linear Correlated Features Aggregation (LinCFA), which aggregates groups of continuous features with their average if their correlation is "sufficiently large". In this way, all features are considered, the dimensionality is reduced and the interpretability is preserved. Finally, we provide numerical validations of the proposed algorithm both on synthetic datasets to confirm the theoretical results and on real datasets to show some promising applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
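The core LinCFA step, aggregating groups of sufficiently correlated continuous features with their average, can be sketched greedily as follows. Note that the paper derives the correlation threshold from its bias-variance analysis; here it is a fixed hypothetical parameter:

```python
import numpy as np

def lincfa_aggregate(X, threshold=0.9):
    """Greedy sketch of the LinCFA idea: merge groups of continuous
    features whose pairwise correlation exceeds `threshold` into their
    average. X: (n_samples, d). Returns the reduced matrix and groups."""
    d = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    used, groups = set(), []
    for i in range(d):
        if i in used:
            continue
        group = [i] + [j for j in range(i + 1, d)
                       if j not in used and corr[i, j] >= threshold]
        used.update(group)
        groups.append(group)
    # Each aggregated feature is the mean of its group's columns,
    # so every original feature is kept and remains interpretable.
    Z = np.column_stack([X[:, g].mean(axis=1) for g in groups])
    return Z, groups
```

Because each output column is just the average of named input columns, the reduced features stay interpretable, which is the property the abstract emphasizes.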
23. Deep Learning for Action Recognition
- Author
-
Wu, Zuxuan, Jiang, Yu-Gang, Shen, Xuemin Sherman, Series Editor, Wu, Zuxuan, and Jiang, Yu-Gang
- Published
- 2024
- Full Text
- View/download PDF
24. An Improved U-Net Model for Simultaneous Nuclei Segmentation and Classification
- Author
-
Liu, Taotao, Zhang, Dongdong, Wang, Hongcheng, Qi, Xumai, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Si, Zhanjun, editor, and Guo, Jiayang, editor
- Published
- 2024
- Full Text
- View/download PDF
25. MFANet: Multi-feature Aggregation Network for Domain Generalized Stereo Matching
- Author
-
Yang, Jinlong, Wang, Gang, Wu, Cheng, Chen, Dong, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Huang, De-Shuang, editor, Pan, Yijie, editor, and Zhang, Qinhu, editor
- Published
- 2024
- Full Text
- View/download PDF
26. Graphite Ore Grade Classification Algorithm Based on Multi-scale Fused Image Features
- Author
-
Wang, Jionghui, Liu, Yaokun, Huang, Xueyu, Chang, Shaopeng, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Wu, Celimuge, editor, Chen, Xianfu, editor, Feng, Jie, editor, and Wu, Zhen, editor
- Published
- 2024
- Full Text
- View/download PDF
27. TextBFA: Arbitrary Shape Text Detection with Bidirectional Feature Aggregation
- Author
-
Xu, Hui, Wang, Qiu-Feng, Li, Zhenghao, Shi, Yu, Zhou, Xiang-Dong, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Luo, Biao, editor, Cheng, Long, editor, Wu, Zheng-Guang, editor, Li, Hongyi, editor, and Li, Chaojie, editor
- Published
- 2024
- Full Text
- View/download PDF
28. Dual-Memory Feature Aggregation for Video Object Detection
- Author
-
Fan, Diwei, Zheng, Huicheng, Dang, Jisheng, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
29. Relation-Guided Multi-stage Feature Aggregation Network for Video Object Detection
- Author
-
Yao, Tingting, Cao, Fuxiao, Mi, Fuheng, Li, Danmeng, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
30. Adversarial Keyword Extraction and Semantic-Spatial Feature Aggregation for Clinical Report Guided Thyroid Nodule Segmentation
- Author
-
Zhang, Yudi, Chen, Wenting, Li, Xuechen, Shen, Linlin, Lai, Zhihui, Kong, Heng, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
31. Self-guided Transformer for Video Super-Resolution
- Author
-
Xue, Tong, Wang, Qianrui, Huang, Xinyi, Li, Dengshi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
32. Deep Stereo Matching with Superpixel Based Feature and Cost
- Author
-
Zeng, Kai, Zhang, Hui, Wang, Wei, Wang, Yaonan, Mao, Jianxu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Liu, Qingshan, editor, Wang, Hanzi, editor, Ma, Zhanyu, editor, Zheng, Weishi, editor, Zha, Hongbin, editor, Chen, Xilin, editor, Wang, Liang, editor, and Ji, Rongrong, editor
- Published
- 2024
- Full Text
- View/download PDF
33. Learnable Context in Multiple Instance Learning for Whole Slide Image Classification and Segmentation
- Author
-
Huang, Yu-Yuan and Chu, Wei-Ta
- Published
- 2024
- Full Text
- View/download PDF
34. Estimation of Fractal Dimension and Segmentation of Brain Tumor with Parallel Features Aggregation Network.
- Author
-
Sultan, Haseeb, Ullah, Nadeem, Hong, Jin Seong, Kim, Seung Gu, Lee, Dong Chan, Jung, Seung Yong, and Park, Kang Ryoung
- Subjects
- *
FRACTAL dimensions , *BRAIN tumors , *PETRI nets , *DEEP learning , *DATABASES , *CANCER invasiveness - Abstract
The accurate recognition of a brain tumor (BT) is crucial for accurate diagnosis, intervention planning, and the evaluation of post-intervention outcomes. Conventional methods of manually identifying and delineating BTs are inefficient, prone to error, and time-consuming. Subjective methods for BT recognition are biased because of the diffuse and irregular nature of BTs, along with varying enhancement patterns and the coexistence of different tumor components. Hence, the development of an automated diagnostic system for BTs is vital for mitigating subjective bias and achieving speedy and effective BT segmentation. Recently developed deep learning (DL)-based methods have replaced subjective methods; however, these DL-based methods still show low performance, leaving room for improvement, and remain limited in heterogeneous dataset analysis. Herein, we propose a DL-based parallel features aggregation network (PFA-Net) for the robust segmentation of three different regions in a BT scan, and we perform a heterogeneous dataset analysis to validate its generality. The parallel features aggregation (PFA) module exploits the local radiomic contextual spatial features of BTs at low, intermediate, and high levels for different types of tumors and aggregates them in a parallel fashion. To enhance the diagnostic capabilities of the proposed segmentation framework, we introduced fractal dimension estimation into our system, seamlessly combined as an end-to-end task, to gain insights into the complexity and irregularity of structures, thereby characterizing the intricate morphology of BTs. The proposed PFA-Net achieves Dice scores (DSs) of 87.54%, 93.42%, and 91.02% for the enhancing tumor region, whole tumor region, and tumor core region, respectively, with the multimodal brain tumor segmentation (BraTS)-2020 open database, surpassing the performance of existing state-of-the-art methods.
Additionally, PFA-Net is validated with another open database of brain tumor progression and achieves a DS of 64.58% for heterogeneous dataset analysis, surpassing the performance of existing state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
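The fractal dimension mentioned above is conventionally estimated by box counting; a standard sketch of that estimator is shown below for illustration (PFA-Net integrates the estimation as an end-to-end task, so this is the classical construction, not the paper's network component):

```python
import numpy as np

def box_counting_dimension(mask):
    """Estimate the fractal (box-counting) dimension of a binary mask.
    mask: square (N, N) boolean array with N a power of 2."""
    n = mask.shape[0]
    sizes, counts = [], []
    s = n
    while s >= 1:
        # Count boxes of side s containing at least one foreground pixel.
        blocks = mask.reshape(n // s, s, n // s, s).any(axis=(1, 3))
        sizes.append(s)
        counts.append(blocks.sum())
        s //= 2
    # Slope of log(count) versus log(1/size) estimates the dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope
```

A filled region yields a dimension near 2 and a thin curve near 1, so higher values over a tumour boundary indicate a more irregular, space-filling morphology.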
35. Multi-modality information refinement fusion network for RGB-D salient object detection.
- Author
-
Bao, Hua and Fan, Bo
- Subjects
- *
DATA fusion (Statistics) , *PROBLEM solving - Abstract
RGB-D salient object detection (SOD) has gained more and more research interest in recent years. Due to the different imaging mechanisms of the RGB and depth modalities, RGB-D images contain different information. Thus, how to effectively fuse multi-modality features and aggregate multi-scale features to generate accurate saliency predictions remains an open problem. In this article, we present a Multi-Modality Information Refinement Fusion Network (MIRFNet) for RGB-D SOD to solve this problem. Specifically, a Feature-Enhancement and Cross-Refinement Module (FCM) is proposed to reduce redundant features and the gap between cross-modality data to achieve multi-modality feature fusion effectively. In FCM, the Feature-Enhancement step utilizes attention mechanisms to obtain enhanced features that contain less redundant information and more common salient information, and the Cross-Refinement step employs the enhanced features to reduce the gap between cross-modality features and achieve effective feature fusion. Then, we propose an Edge Guidance Module (EGM) to extract edge information from RGB features. Finally, to effectively aggregate multi-level features and achieve accurate saliency prediction, a Feature-Aggregation and Edge-Refinement Module (FEM) is designed, which introduces specific-modality information and edge information to conduct sufficient information interaction. In FEM, the Feature-Aggregation step aggregates multi-scale features with specific-modality information, and the Edge-Refinement step uses edge information to refine the aggregated features. Extensive experiments demonstrate that MIRFNet achieves comparable performance against 12 other SOTA methods on five datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Combining dual attention mechanism and efficient feature aggregation for road and vehicle segmentation from UAV imagery.
- Author
-
Trung Dung Nguyen, Trung Kien Pham, Chi Kien Ha, Long Ho Le, Thanh Quyen Ngo, and Hoanh Nguyen
- Subjects
TRAFFIC monitoring ,DRONE aircraft ,EMERGENCY management ,URBAN planning - Abstract
Unmanned aerial vehicles (UAVs) have gained significant popularity in recent years due to their ability to capture high-resolution aerial imagery for various applications, including traffic monitoring, urban planning, and disaster management. Accurate road and vehicle segmentation from UAV imagery plays a crucial role in these applications. In this paper, we propose a novel approach combining dual attention mechanisms and efficient multilayer feature aggregation to enhance the performance of road and vehicle segmentation from UAV imagery. Our approach integrates a spatial attention mechanism and a channel-wise attention mechanism to enable the model to selectively focus on relevant features for segmentation tasks. In conjunction with these attention mechanisms, we introduce an efficient multi-layer feature aggregation method that synthesizes and integrates multi-scale features at different levels of the network, resulting in a more robust and informative feature representation. Our proposed method is evaluated on the UAVid semantic segmentation dataset, showcasing its exceptional performance in comparison to renowned approaches such as U-Net, DeepLabv3+, and SegNet. The experimental results affirm that our approach surpasses these state-of-the-art methods in terms of segmentation accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
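The combination of spatial and channel-wise attention described above can be sketched in its simplest form. Parameter-free sigmoid gates are assumed here purely for illustration; the paper's attention modules are learned:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dual_attention(feats):
    """Minimal sketch of combined channel-wise and spatial attention
    (the general idea only; the paper's exact modules differ).
    feats: (C, H, W)."""
    # Channel attention: gate each channel by its global average response.
    chan = sigmoid(feats.mean(axis=(1, 2)))  # (C,)
    x = feats * chan[:, None, None]
    # Spatial attention: gate each location by its cross-channel mean.
    spat = sigmoid(x.mean(axis=0))           # (H, W)
    return x * spat[None, :, :]
```

The two gates let the model emphasize informative channels (e.g. road-like responses) and informative locations (e.g. vehicle regions) independently.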
37. Discriminative multi-scale adjacent feature for person re-identification.
- Author
-
Qi, Mengzan, Chan, Sixian, Hong, Feng, Yao, Yuan, and Zhou, Xiaolong
- Subjects
FEATURE extraction ,TRANSFORMER models ,IDENTIFICATION - Abstract
Recently, discriminative and robust identification information has played an increasingly critical role in Person Re-identification (Re-ID). Existing part-based methods demonstrate strong performance in the extraction of fine-grained features. However, their intensive partitions lead to semantic information ambiguity and background interference. Meanwhile, we observe that human bodies exhibit different structural proportions. Hence, we assume that aggregation with multi-scale adjacent features can effectively alleviate the above issues. In this paper, we propose a novel Discriminative Multi-scale Adjacent Feature (MSAF) learning framework to enrich semantic information and disregard background. In summary, we establish multi-scale interaction in two stages: the feature extraction stage and the feature aggregation stage. Firstly, a Multi-scale Feature Extraction (MFE) module is designed by combining CNN and Transformer structures to obtain the discriminative specific feature, as the basis for the feature aggregation stage. Secondly, a Jointly Part-based Feature Aggregation (JPFA) mechanism is introduced to implement adjacent feature aggregation with diverse scales. The JPFA contains Same-scale Feature Correlation (SFC) and Cross-scale Feature Correlation (CFC) sub-modules. Finally, to verify the effectiveness of the proposed method, extensive experiments are performed on the common datasets of Market-1501, CUHK03-NP, DukeMTMC, and MSMT17. The experimental results achieve better performance than many state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
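The idea of aggregating adjacent part features at multiple scales can be sketched as follows. This is an illustrative simplification over horizontal body stripes; the JPFA mechanism with its SFC and CFC sub-modules is more involved:

```python
import numpy as np

def adjacent_aggregate(part_feats, scales=(1, 2, 3)):
    """Sketch of multi-scale adjacent-part aggregation for Re-ID:
    average each run of `s` adjacent part features for every scale s.
    part_feats: (P, D) -- P horizontal body stripes, D-dim features."""
    P = part_feats.shape[0]
    out = []
    for s in scales:
        for start in range(P - s + 1):
            # Only *adjacent* stripes are merged, so each aggregate
            # still corresponds to a contiguous body region.
            out.append(part_feats[start:start + s].mean(axis=0))
    return np.stack(out)  # one row per (scale, position) aggregate
```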
38. Efficient breast cancer diagnosis using multi‐level progressive feature aggregation based deep transfer learning system.
- Author
-
Patel, Vivek and Chaurasia, Vijayshri
- Subjects
- *
CANCER diagnosis , *INSTRUCTIONAL systems , *COMPUTER-aided diagnosis , *TUMOR classification , *MAMMOGRAMS , *BREAST cancer , *DEEP learning - Abstract
Breast cancer is a worldwide fatal disease that exists mostly among women. The deep learning technique has proven its effectiveness, but the performance of the existing deep learning systems is quite compromising. In this work, a deep transfer learning system is suggested for efficient breast cancer classification from histopathology images. This system is based on a novel multi‐level progressive feature aggregation (MPFA) and a spatial domain learning approach. The combination of a pretrained Resnet101 backbone network with MPFA is implemented to extract more significant features. In addition, a mixed‐dilated spatial domain learning network (MSLN) is further incorporated to enhance the receptive field and increase discrimination between features. The proposed method achieved superior performance as compared to the existing state‐of‐the‐art methods, offering 99.24% accuracy, a 98.79% F‐1 score, 98.59% precision, and 98.99% recall values over BreaKHis dataset. An ablation study is carried out over the ICIAR2018 dataset to verify the generalizability and effectiveness of the system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. MFACNet: A Multi-Frame Feature Aggregating and Inter-Feature Correlation Framework for Multi-Object Tracking in Satellite Videos.
- Author
-
Zhao, Hu, Shen, Yanyun, Wang, Zhipan, and Zhang, Qingling
- Subjects
- *
OBJECT tracking (Computer vision) , *ARTIFICIAL satellite tracking , *VIDEOS , *ENVIRONMENTAL monitoring - Abstract
Efficient multi-object tracking (MOT) in satellite videos is crucial for numerous applications, ranging from surveillance to environmental monitoring. Existing methods often struggle to effectively exploit the correlation and contextual cues inherent in the consecutive features of video sequences, resulting in redundant feature inference and unreliable motion estimation for tracking. To address these challenges, we propose MFACNet, a novel multi-frame feature aggregating and inter-feature correlation framework for enhancing MOT in satellite videos by utilizing the features of consecutive frames. MFACNet integrates multi-frame feature aggregation techniques with inter-feature correlation mechanisms to improve tracking accuracy and robustness. Specifically, our framework leverages temporal information across the features of consecutive frames to capture contextual cues and refine object representations over time. Moreover, we introduce a mechanism to explicitly model the correlations between adjacent features in video sequences, facilitating more accurate motion estimation and trajectory association. We evaluated MFACNet on benchmark datasets for satellite-based video MOT tasks, where it demonstrated its superiority in tracking accuracy and robustness, surpassing state-of-the-art performance by 2.0% in MOTA and 1.6% in IDF1. Our experimental results highlight the potential of precisely utilizing deep features from video sequences. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Speech Emotion Recognition Using a Multi-Time-Scale Approach to Feature Aggregation and an Ensemble of SVM Classifiers.
- Author
-
STEFANOWSKA, Antonina and ZIELIŃSKI, Sławomir K.
- Subjects
- *
EMOTION recognition , *DATA augmentation , *FEATURE extraction , *AUTOMATIC speech recognition , *GENETIC algorithms , *SPEECH , *SPEECH synthesis , *SUPPORT vector machines - Abstract
Due to its relevant real-life applications, the recognition of emotions from speech signals constitutes a popular research topic. In the traditional methods applied for speech emotion recognition, audio features are typically aggregated using a fixed-duration time window, potentially discarding information conveyed by speech at various signal durations. By contrast, in the proposed method, audio features are aggregated simultaneously using time windows of different lengths (a multi-time-scale approach), hence, potentially better utilizing information carried at phonemic, syllabic, and prosodic levels compared to the traditional approach. A genetic algorithm is employed to optimize the feature extraction procedure. The features aggregated at different time windows are subsequently classified by an ensemble of support vector machine (SVM) classifiers. To enhance the generalization property of the method, a data augmentation technique based on pitch shifting and time stretching is applied. According to the obtained results, the developed method outperforms the traditional one for the selected datasets, demonstrating the benefits of using a multi-time-scale approach to feature aggregation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
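Aggregating features over several window lengths simultaneously, as described above, can be sketched like this. The sketch only covers the multi-time-scale aggregation step; the paper additionally optimizes the extraction with a genetic algorithm and classifies with an SVM ensemble, both omitted here:

```python
import numpy as np

def multi_scale_aggregate(frame_feats, window_sizes=(4, 8, 16)):
    """Sketch of multi-time-scale feature aggregation: for each window
    length, average frame-level features over sliding windows, then
    summarise each scale by its mean and standard deviation.
    frame_feats: (T, D) frame-level audio features."""
    stats = []
    for w in window_sizes:
        windows = np.stack([frame_feats[t:t + w].mean(axis=0)
                            for t in range(frame_feats.shape[0] - w + 1)])
        stats.append(windows.mean(axis=0))  # per-scale mean
        stats.append(windows.std(axis=0))   # per-scale variability
    return np.concatenate(stats)            # fixed-length utterance vector
```

Short windows roughly capture phonemic variation and long windows prosodic variation, which is the motivation the abstract gives for aggregating at multiple time scales.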
41. Dual-stream Co-enhanced Network for Unsupervised Video Object Segmentation.
- Author
-
Hongliang Zhu, Hui Yin, Yanting Liu, and Ning Chen
- Subjects
OPTICAL flow ,OPTICAL images ,VIDEOS - Abstract
Unsupervised Video Object Segmentation (UVOS) is a highly challenging problem in computer vision, since no annotation of the target object in the test video is available. The main difficulty is to effectively handle the complicated and changeable motion states of the target object and the confusion caused by similar background objects in the video sequence. In this paper, we propose a novel deep Dual-stream Co-enhanced Network (DC-Net) for UVOS via bidirectional motion cue refinement and multi-level feature aggregation, which can take full advantage of motion cues and effectively integrate features at different levels to produce high-quality segmentation masks. DC-Net is a dual-stream architecture in which the two streams are co-enhanced by each other. One is a motion stream with a Motion-cues Refine Module (MRM), which learns from bidirectional optical flow images and produces a fine-grained and complete distinctive motion saliency map; the other is an appearance stream with a Multi-level Feature Aggregation Module (MFAM) and a Context Attention Module (CAM), which are designed to integrate features at different levels effectively. Specifically, the motion saliency map obtained by the motion stream is fused with each stage of the decoder in the appearance stream to improve the segmentation, and in turn the segmentation loss in the appearance stream feeds back into the motion stream to enhance the motion refinement. Experimental results on three datasets (Davis2016, VideoSD, SegTrack-v2) demonstrate that DC-Net achieves comparable results with some state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Spatial relaxation transformer for image super-resolution
- Author
-
Yinghua Li, Ying Zhang, Hao Zeng, Jinglu He, and Jie Guo
- Subjects
Super-resolution ,Vision transformer ,Feature aggregation ,Image enhancement ,Swin transformer ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Transformer-based approaches have demonstrated remarkable performance in image processing tasks due to their ability to model long-range dependencies. Current mainstream Transformer-based methods typically confine self-attention computation within windows to reduce the computational burden. However, this constraint may lead to grid artifacts in the reconstructed images due to insufficient cross-window information exchange, particularly in image super-resolution tasks. To address this issue, we propose the Multi-Scale Texture Complementation Block based on Spatial Relaxation Transformer (MSRT), which leverages features at multiple scales and augments information exchange through cross-window attention computation. In addition, we introduce a loss function based on a texture-smoothness prior, which utilizes the continuity of textures between patches to constrain the generation of more coherent texture information in the reconstructed images. Specifically, we employ learnable compressive sensing technology to extract shallow features from images, preserving image features while reducing feature dimensions and improving computational efficiency. Extensive experiments conducted on multiple benchmark datasets demonstrate that our method outperforms previous state-of-the-art approaches in both qualitative and quantitative evaluations.
- Published
- 2024
- Full Text
- View/download PDF
43. Discriminative multi-scale adjacent feature for person re-identification
- Author
-
Mengzan Qi, Sixian Chan, Feng Hong, Yuan Yao, and Xiaolong Zhou
- Subjects
Person re-identification ,Feature extraction ,Feature aggregation ,Discriminative feature ,Electronic computers. Computer science ,QA75.5-76.95 ,Information technology ,T58.5-58.64 - Abstract
Recently, discriminative and robust identification information has played an increasingly critical role in Person Re-identification (Re-ID). Existing part-based methods demonstrate strong performance in the extraction of fine-grained features. However, their intensive partitions lead to semantic information ambiguity and background interference. Meanwhile, we observe that human bodies exhibit different structural proportions. Hence, we assume that aggregation with multi-scale adjacent features can effectively alleviate the above issues. In this paper, we propose a novel Discriminative Multi-scale Adjacent Feature (MSAF) learning framework to enrich semantic information and disregard background. In summary, we establish multi-scale interaction in two stages: the feature extraction stage and the feature aggregation stage. Firstly, a Multi-scale Feature Extraction (MFE) module is designed by combining CNN and Transformer structures to obtain the discriminative specific feature, as the basis for the feature aggregation stage. Secondly, a Jointly Part-based Feature Aggregation (JPFA) mechanism is introduced to implement adjacent feature aggregation with diverse scales. The JPFA contains Same-scale Feature Correlation (SFC) and Cross-scale Feature Correlation (CFC) sub-modules. Finally, to verify the effectiveness of the proposed method, extensive experiments are performed on the common datasets of Market-1501, CUHK03-NP, DukeMTMC, and MSMT17. The experimental results achieve better performance than many state-of-the-art methods.
- Published
- 2024
- Full Text
- View/download PDF
44. Pose Calibrated Feature Aggregation for Video Face Set Recognition in Unconstrained Environments
- Author
-
Ibrahim Ali Hasani and Omar Arif
- Subjects
Video face recognition ,feature aggregation ,frame selection ,open sets ,multi-stream networks ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This paper presents the Pose Calibrated Feature Aggregation Network (PCFAN), an architecture for set/video face recognition. Using stacked attention blocks and a multi-modal architecture, it automatically assigns adaptive weights to every instance in the set, based on both the recognition embeddings and the associated face metadata, and uses these weights to produce a single, compact feature vector for the set. The model automatically learns to favour features from frames with more favourable quality and pose, which inherently carry more information. Our block can be inserted on top of any standard recognition model to enable set prediction and improve performance, particularly in unconstrained scenarios where subject pose and image quality vary considerably between frames. We test our approach on three challenging video face recognition datasets, IJB-A, IJB-B, and YTF, and report state-of-the-art results. Moreover, a comparison with top aggregation methods as baselines demonstrates that PCFAN is the superior approach.
- Published
- 2024
- Full Text
- View/download PDF
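The adaptive weighting at the heart of such set aggregators can be sketched as a softmax-weighted mean of per-frame embeddings. In PCFAN the weights are produced by stacked attention blocks from the embeddings and pose metadata; here hypothetical per-frame quality scores stand in for that learned scoring:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def aggregate_face_set(embeddings, quality_scores):
    """Quality-weighted set aggregation sketch: softmax the per-frame
    scores and take the weighted mean embedding.
    embeddings: (N, D); quality_scores: (N,) hypothetical scores."""
    w = softmax(quality_scores)  # adaptive per-frame weights, sum to 1
    return w @ embeddings        # (D,) compact set descriptor
```

Frames with low scores (e.g. extreme pose or blur) receive near-zero weight and barely influence the set descriptor, which is the behaviour the abstract describes.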
45. Three-Dimensional Millimeter-Wave Object Detector Based on the Enhancement of Local-Global Contextual Information
- Author
-
Yanyi Chang, Ying Liu, Zhaohui Bu, Haipo Cui, and Li Ding
- Subjects
Three-dimensional millimeter-wave images ,object detection ,IA-SSD ,local-global context ,feature aggregation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Millimeter-wave (MMW) point clouds, characterized by their low resolution and high noise, limit the detection accuracy of the point-based IA-SSD method due to its inadequate consideration of contextual information in MMW scenarios. Therefore, this paper proposes a three-dimensional (3D) MMW object detector that greatly augments the detection performance of the baseline IA-SSD model by integrating local-global context information. Central to our approach is the implementation of a multi-scale feature aggregation (MFA) module in the encoder stage of IA-SSD, which utilizes a self-attention mechanism to apprehend local contextual distinctions. This module is further applied in the centroid aggregation stage to enhance the capture of local context from foreground points. Complementarily, a global feature fusion module is devised to combine global contextual insights, drawing upon the localized information delineated by the MFA modules. This integrated framework significantly diminishes the false detection rate while concurrently elevating the detection precision for occluded objects. Empirical evaluations validate the efficiency of the proposed model relative to the IA-SSD baseline, demonstrating marked decreases in false positives and false negatives. Specifically, AP_R40_0.25 and AP_R40_0.5 improve by 2.78% and 7.39%, respectively. When the intersection-over-union threshold is set to 0.25 and 0.5, the corresponding recall rates increase by 2.13% and 6.2%, respectively. Moreover, the inference speed reaches 32.3 frames per second (FPS), only a slight decrease of 2.9 FPS compared to the baseline model. These results demonstrate that the proposed detector significantly enhances detection performance without compromising speed, marking a considerable advancement in the domain of 3D MMW object detection.
- Published
- 2024
- Full Text
- View/download PDF
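The abstract above describes self-attention over multi-scale point neighborhoods. The paper's MFA module is not reproduced here; the following is only a minimal NumPy sketch of the general idea, assuming toy identity projections for Q/K/V and ball-query grouping at several radii (both assumptions, not details from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_aggregate(feats):
    """Refine a (k, d) neighborhood of point features with single-head
    self-attention, then max-pool to one descriptor (identity Q/K/V
    projections keep the sketch dependency-free)."""
    k, d = feats.shape
    attn = softmax(feats @ feats.T / np.sqrt(d), axis=-1)  # (k, k)
    refined = attn @ feats                                 # (k, d)
    return refined.max(axis=0)                             # (d,)

def multi_scale_aggregate(points, center, radii):
    """Group points at several radii around a centroid and concatenate
    the per-scale attention-pooled descriptors."""
    descs = []
    for r in radii:
        group = points[np.linalg.norm(points - center, axis=1) <= r]
        if group.size == 0:                # empty ball: fall back to center
            group = center[None, :]
        descs.append(self_attention_aggregate(group))
    return np.concatenate(descs)

pts = np.random.default_rng(0).normal(size=(64, 3))
desc = multi_scale_aggregate(pts, pts[0], radii=(0.5, 1.0, 2.0))
print(desc.shape)  # (9,)
```

Each radius contributes one pooled descriptor, so the output length is `len(radii) * d`; a real module would learn the projections and feed the concatenated descriptor to the next encoder stage.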
46. Super-Resolution GAN and Global Aware Object Detection System for Vehicle Detection in Complex Traffic Environments
- Author
-
Hongqing Wang, Jun Kit Chaw, Sim Kuan Goh, Liantao Shi, Ting Tin Tin, Nannan Huang, and Hong-Seng Gan
- Subjects
Intelligent vehicle detection ,self-attention ,multi-scale semantic feature ,generative adversarial network ,feature aggregation ,transportation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Intelligent vehicle detection systems have the potential to improve road safety and optimize traffic management. Despite the continuous advancements in AI technology, the detection of different types of vehicles in complex traffic environments remains a persistent challenge. In this paper, an end-to-end solution is proposed. The image enhancement part proposes a super-resolution synthetic image GAN (SSIGAN) to improve detection of small, distant objects in low-resolution (LR) images. An edge enhancer (EE) and a hierarchical self-attention module (HS) are applied to address the loss of high-frequency edge information and texture details in the super-resolved images. The output super-resolution (SR) image is fed into the detection part, where we introduce a global context-aware network (GCAFormer) for accurate vehicle detection. GCAFormer utilizes a cascade transformer backbone (CT) that enables internal information interaction and generates multi-scale feature maps. This approach effectively addresses the challenge of varying vehicle scales, ensuring robust detection performance. We also build a cross-scale aggregation feature (CSAF) module into GCAFormer, which fuses low- and high-dimensional semantic information and provides multi-resolution feature maps as input to the detection head, making the network more adaptable to complex traffic environments and enabling accurate detection. In addition, we validate the effectiveness of our proposed method on a large number of datasets, reaching 89.12% mAP on the KITTI dataset, 90.62% on the IITM-hetra, 86.83% on the Pascal VOC and 93.33% on the BDD-100k. The results were compared to SOTA and demonstrated the competitive advantages of our proposed method for vehicle detection in complex traffic environments.
- Published
- 2024
- Full Text
- View/download PDF
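The CSAF module in the abstract above fuses low- and high-dimensional semantic information across resolutions. A common way to realize such cross-scale fusion (a generic sketch in NumPy, not the paper's actual module; the nearest-neighbour upsampling and 1x1 mixing weights are assumptions) is to upsample the deep map, concatenate channels, and mix with a pointwise convolution:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (c, h, w) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, w):
    """Pointwise convolution: w has shape (c_out, c_in)."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

def cross_scale_fuse(low, high, w):
    """Fuse a high-resolution, low-level map `low` (c1, 2h, 2w) with a
    low-resolution, high-level map `high` (c2, h, w): upsample `high`,
    concatenate along channels, and mix with a 1x1 convolution."""
    merged = np.concatenate([low, upsample2x(high)], axis=0)
    return conv1x1(merged, w)

rng = np.random.default_rng(1)
low = rng.normal(size=(16, 8, 8))    # shallow, spatially fine
high = rng.normal(size=(32, 4, 4))   # deep, semantically rich
w = rng.normal(size=(24, 48))        # 48 = 16 + 32 input channels
fused = cross_scale_fuse(low, high, w)
print(fused.shape)  # (24, 8, 8)
```

The fused map keeps the fine spatial grid of the shallow branch while carrying the semantics of the deep one, which is the property the detection head relies on.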
47. Feature Aggregation in Joint Sound Classification and Localization Neural Networks
- Author
-
Brendan Healy, Patrick Mcnamee, and Zahra Nili Ahmadabadi
- Subjects
Joint sound signal classification and localization ,multi-task deep learning ,feature aggregation ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Current state-of-the-art sound source localization (SSL) deep learning networks lack feature aggregation within their architecture. Feature aggregation within neural network architectures enhances model performance by enabling the consolidation of information from different feature scales, thereby improving feature robustness and invariance. We adapt feature aggregation sub-architectures from computer vision neural networks onto a baseline neural network architecture for SSL, the Sound Event Localization and Detection network (SELDnet). The incorporated sub-architectures are: Path Aggregation Network (PANet); Weighted Bi-directional Feature Pyramid Network (BiFPN); and a novel Scale Encoding Network (SEN). These sub-architectures were evaluated using two metrics for signal classification and two metrics for direction-of-arrival regression. The results show that models incorporating feature aggregation outperformed the baseline SELDnet in both sound signal classification and localization. Among the feature aggregators, PANet exhibited superior performance compared to the other methods, which were otherwise comparable. The results provide evidence that feature aggregation sub-architectures enhance the performance of sound detection neural networks, particularly in direction-of-arrival regression.
- Published
- 2024
- Full Text
- View/download PDF
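One of the sub-architectures named in the abstract above, BiFPN, is built around fast normalized fusion of same-shape feature maps. As a rough NumPy sketch of that fusion step only (the scalar weight values below are illustrative, not learned parameters from the paper):

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fusion of same-shape feature maps: ReLU keeps the
    learnable scalar weights non-negative, then they are normalized so
    the fusion is a cheap, numerically stable soft selection."""
    w = np.maximum(weights, 0.0)    # ReLU on the raw learnable scalars
    w = w / (w.sum() + eps)         # normalize to (almost) sum to one
    return sum(wi * f for wi, f in zip(w, features))

rng = np.random.default_rng(2)
p4_in = rng.normal(size=(64, 16, 16))   # lateral feature at level P4
p4_td = rng.normal(size=(64, 16, 16))   # top-down feature at level P4
fused = fast_normalized_fusion([p4_in, p4_td], np.array([0.7, 0.3]))
print(fused.shape)  # (64, 16, 16)
```

Because the weights are learned per fusion node, the network can decide how much each scale contributes, which is the property that distinguishes BiFPN from plain summation in FPN/PANet.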
48. HGM: A General High-Order Spatial and Spectral Global Feature Fusion Module for Visual Multitasking
- Author
-
Chengcheng Chen, Xiliang Zhang, Yuhao Zhou, Yugang Chang, and Weiming Zeng
- Subjects
Convolutional neural networks (CNNs) ,feature aggregation ,global frequency domain features ,high-order feature interaction ,remote sensing ship detection ,transformer architecture ,Ocean engineering ,TC1501-1800 ,Geophysics. Cosmic physics ,QC801-809 - Abstract
Recent computer vision research has mainly focused on designing efficient network architectures, with limited exploration of high- and low-frequency information in the high-order frequency domain. This study introduces a novel approach utilizing spatial and frequency domain information to design a high-order global feature fusion module (HGM) and develop a specialized remote sensing detection network, HGNet. HGM leverages cyclic convolution to achieve arbitrary high-order features, overcoming the second-order limitation of transformers. Furthermore, HGM integrates cyclic convolution and the fast Fourier transform, utilizing the former to capture interaction information between high-order spatial and channel domains and the latter to transform high-order features from the spatial to the frequency domain for global information extraction. This combination fundamentally addresses the issue of long-distance dependency in convolutions and avoids quadratic growth in computational complexity. Moreover, we have constructed an information truncation gate to minimize high-order redundant features, achieving a “win–win” scenario for network accuracy and parameter efficiency. In addition, HGM acts as a plug-and-play module, boosting performance when integrated into various networks. Experimental findings reveal that HGNet achieves 93.0% $\text{mAP}_{\text{0.5}}$ with just 12.1M parameters on the HRSID remote sensing ship detection dataset. In addition, applying HGM enhances performance on CIFAR100 classification and WHDLD remote sensing segmentation tasks.
- Published
- 2024
- Full Text
- View/download PDF
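The abstract above uses the fast Fourier transform to extract global information at sub-quadratic cost. The HGM module itself is not reproducible from the abstract; the snippet below is only a minimal NumPy illustration of the underlying trick, with an all-ones filter standing in for the learnable spectral filter (an assumption made so the identity property can be checked):

```python
import numpy as np

def spectral_global_mixing(x, filt):
    """Mix a (c, h, w) feature map globally in the frequency domain.
    A 2D FFT gives every output position access to every input position
    at O(hw log hw) cost, sidestepping the long-range-dependency limit
    of local convolutions."""
    spec = np.fft.rfft2(x, axes=(1, 2))          # (c, h, w//2 + 1), complex
    spec = spec * filt                           # element-wise learnable filter
    return np.fft.irfft2(spec, s=x.shape[1:], axes=(1, 2))

rng = np.random.default_rng(3)
x = rng.normal(size=(8, 16, 16))
filt = np.ones((8, 16, 9))                       # identity filter, sanity check
out = spectral_global_mixing(x, filt)
print(np.allclose(out, x))  # True: an all-ones filter is the identity
```

A learned `filt` with values below one attenuates selected frequency bands, which is one way such a module can suppress redundant high-frequency content.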
49. Attention-guided cross-modal multiple feature aggregation network for RGB-D salient object detection
- Author
-
Bojian Chen, Wenbin Wu, Zhezhou Li, Tengfei Han, Zhuolei Chen, and Weihao Zhang
- Subjects
salient object detection (sod) ,rgb-d ,feature aggregation ,attention ,cross-modal ,Mathematics ,QA1-939 ,Applied mathematics. Quantitative methods ,T57-57.97 - Abstract
The goal of RGB-D salient object detection is to aggregate the information of the two modalities of RGB and depth to accurately detect and segment salient objects. Existing RGB-D SOD models can extract the multilevel features of a single modality well and can also integrate cross-modal features, but they rarely handle both at the same time. To exploit the correlations between intra- and inter-modality information, in this paper we proposed an attention-guided cross-modal multi-feature aggregation network for RGB-D SOD. Our motivation was that both cross-modal feature fusion and multilevel feature fusion are crucial for the RGB-D SOD task. The main innovation of this work lies in two points: one is the cross-modal pyramid feature interaction (CPFI) module that integrates multilevel features from both RGB and depth modalities in a bottom-up manner, and the other is the cross-modal feature decoder (CMFD) that aggregates the fused features to generate the final saliency map. Extensive experiments on six benchmark datasets showed that the proposed attention-guided cross-modal multiple feature aggregation network (ACFPA-Net) achieved competitive performance over 15 state-of-the-art (SOTA) RGB-D SOD methods, both qualitatively and quantitatively.
- Published
- 2024
- Full Text
- View/download PDF
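The abstract above centers on attention-guided fusion of RGB and depth features. Below is only a generic NumPy sketch of one common pattern for such fusion (squeeze-and-excitation-style channel gating with the projection layers omitted); it is not the paper's CPFI module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Channel weights for a (c, h, w) map: global average pooling
    followed by a sigmoid gate (learned projections omitted to keep
    the sketch dependency-free)."""
    pooled = feat.mean(axis=(1, 2))          # (c,)
    return sigmoid(pooled)[:, None, None]    # (c, 1, 1), broadcastable

def cross_modal_fuse(rgb, depth):
    """Attention-guided fusion: each modality is re-weighted by the
    channel attention of the other, then the gated maps are summed, so
    each stream highlights channels the other stream finds informative."""
    return rgb * channel_attention(depth) + depth * channel_attention(rgb)

rng = np.random.default_rng(4)
rgb = rng.normal(size=(32, 8, 8))
depth = rng.normal(size=(32, 8, 8))
fused = cross_modal_fuse(rgb, depth)
print(fused.shape)  # (32, 8, 8)
```

Gating one modality by the other's attention is what makes the fusion "cross-modal" rather than a plain sum; a pyramid version would apply this at every feature level.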
50. Hierarchical Attentive Feature Aggregation for Person Re-Identification
- Author
-
Husheng Dong and Ping Lu
- Subjects
Attention ,diverse features ,feature aggregation ,person re-identification ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Recent efforts on person re-identification have shown promising results by learning discriminative features via the multi-branch network. To further boost feature discrimination, attention mechanisms have also been extensively employed. However, the branches on the main level rarely communicate with others in existing branching models, which may compromise the ability to mine diverse features. To mitigate this issue, a novel framework called Hierarchical Attentive Feature Aggregation (Hi-AFA) is proposed. In Hi-AFA, a hierarchical aggregation mechanism is applied to learn attentive features. The current feature map is not only fed into the next stage, but also aggregated into another branch, leading to hierarchical feature flows along depth and parallel branches. We also present a simple Feature Suppression Operation (FSO) and a Lightweight Dual Attention Module (LDAM) to guide feature learning. The FSO can partially erase the salient features already discovered, such that more potential clues can be mined by other branches with the help of LDAM. In this manner, the branches cooperate to mine richer and more diverse feature representations. The hierarchical aggregation and multi-granularity feature learning are integrated into a unified architecture that builds upon OSNet, resulting in a resource-economical and effective person re-identification model. Extensive experiments on four mainstream datasets, including Market-1501, DukeMTMC-reID, MSMT17, and CUHK03, are conducted to validate the effectiveness of the proposed method, and results show that state-of-the-art performance is achieved.
- Published
- 2024
- Full Text
- View/download PDF
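The Feature Suppression Operation described in the abstract above erases already-salient features so parallel branches must mine new clues. A plausible minimal NumPy sketch of such an operation (the channel-mean saliency score and the quantile threshold are assumptions, not details from the paper):

```python
import numpy as np

def feature_suppression(feat, ratio=0.25):
    """Partially erase the most salient activations of a (c, h, w) map:
    spatial positions whose channel-wise mean activation falls in the top
    `ratio` quantile are zeroed, pushing a parallel branch to look for
    complementary, less obvious cues."""
    saliency = feat.mean(axis=0)                      # (h, w) saliency map
    thresh = np.quantile(saliency, 1.0 - ratio)
    mask = (saliency < thresh).astype(feat.dtype)     # keep non-salient only
    return feat * mask[None, :, :]

rng = np.random.default_rng(5)
feat = rng.normal(size=(16, 8, 8))
suppressed = feature_suppression(feat, ratio=0.25)
# fraction of spatial positions fully erased:
print((suppressed.sum(axis=0) == 0).mean())  # 0.25
```

During training such a mask would typically be applied stochastically; here it is deterministic so the erased fraction is exactly the suppression ratio.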