556 results for "feature aggregation"
Search Results
352. Sequential inter-hop graph convolution neural network (SIhGCN) for skeleton-based human action recognition.
- Author
-
Setiawan, Feri, Yahya, Bernardo Nugroho, Chun, Seok-Ju, and Lee, Seok-Lyong
- Subjects
- *
CONVOLUTIONAL neural networks , *HUMAN behavior , *LAPLACIAN matrices , *HUMAN skeleton , *SKELETON - Abstract
• A graph convolution model for skeleton-based action recognition is proposed. • The normalized Laplacian matrix is utilized to encode the graph information. • An attention-based feature aggregation is proposed to extract the salient features. • The proposed method achieves better results than the baseline models. Skeleton-based human action recognition has attracted a lot of attention due to its capability and potential to provide more information than a sequence of RGB images alone. The use of Graph Convolutional Neural Networks (GCNs) has become popular since they can model the human skeleton very well. However, existing GCN architectures ignore the different levels of importance of each hop during feature aggregation and use only the final hop's information for further calculation, resulting in considerable information loss. Besides, they use the standard Laplacian or adjacency matrix to encode the property of a graph into a set of vectors, which has limitations in terms of graph invariants. In this work, we propose a Sequential Inter-hop Graph Convolution Neural Network (SIhGCN) which can capture salient graph information from every single hop rather than the final hop only, and our work utilizes the normalized Laplacian matrix, which provides a better representation since it relates well to graph invariants. The proposed method is validated on two large datasets, NTU-RGB+D and Kinetics, to demonstrate its superiority. [ABSTRACT FROM AUTHOR] (A minimal sketch of the normalized Laplacian computation follows this entry.)
- Published
- 2022
- Full Text
- View/download PDF
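A minimal NumPy sketch of the symmetric normalized Laplacian that the SIhGCN abstract above relies on; the three-joint chain graph is a toy assumption, not the paper's skeleton topology:

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian: L = I - D^(-1/2) A D^(-1/2)."""
    d = A.sum(axis=1)                  # node degrees
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = d[nz] ** -0.5     # avoid division by zero on isolated nodes
    D = np.diag(d_inv_sqrt)
    return np.eye(A.shape[0]) - D @ A @ D

# Toy 3-joint chain skeleton (hypothetical, for illustration only).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
print(normalized_laplacian(A))  # eigenvalues of L lie in [0, 2], a graph invariant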
353. Multi-scale feature aggregation and boundary awareness network for salient object detection.
- Author
-
Wu, Qin, Wang, Jianzhe, Chai, Zhilei, and Guo, Guodong
- Subjects
- *
AWARENESS , *PROBLEM solving , *OBJECT recognition (Computer vision) , *MAPS - Abstract
Salient object detection aims to detect the most visually distinctive objects in an image. Although existing FCN-based methods have shown strong advantages in this field, scale variation and complex boundaries are still great challenges. In this paper, we propose a multi-scale feature aggregation and boundary awareness network to overcome these problems. A multi-scale feature aggregation module is proposed to integrate adjacent hierarchical features, and the multiple aggregation strategy solves the problem of scale variation. To obtain more effective multi-scale features from the integrated features, a cross feature refinement module is proposed to compose the decoder. For the issue of complex boundaries, we design a boundary pixel awareness loss function that enables the network to acquire boundary information and generate high-quality saliency maps with better boundaries. Experiments on five benchmark datasets show that our network outperforms recent state-of-the-art detectors quantitatively and qualitatively. • A multi-scale feature aggregation and boundary awareness network for salient object detection is proposed. • We adopt a multiple aggregation strategy to reduce the loss of local details in multi-level feature fusion. • A boundary pixel awareness loss is designed to solve the complex boundary problem. [ABSTRACT FROM AUTHOR] (A minimal sketch of adjacent-level feature aggregation follows this entry.)
- Published
- 2022
- Full Text
- View/download PDF
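A generic PyTorch sketch of aggregating two adjacent hierarchical feature levels (upsample the deeper map, concatenate, fuse); the module layout and channel sizes are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentAggregation(nn.Module):
    """Fuse two adjacent encoder levels: upsample the deeper map and merge it
    with the shallower one. A generic multi-scale aggregation sketch."""
    def __init__(self, c_low, c_high, c_out):
        super().__init__()
        self.fuse = nn.Conv2d(c_low + c_high, c_out, kernel_size=3, padding=1)

    def forward(self, f_low, f_high):
        f_high = F.interpolate(f_high, size=f_low.shape[-2:],
                               mode="bilinear", align_corners=False)
        return F.relu(self.fuse(torch.cat([f_low, f_high], dim=1)))

agg = AdjacentAggregation(64, 128, 64)
out = agg(torch.randn(1, 64, 56, 56), torch.randn(1, 128, 28, 28))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```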
354. Two-stage segmentation network with feature aggregation and multi-level attention mechanism for multi-modality heart images.
- Author
-
Song, Yuhui, Du, Xiuquan, Zhang, Yanping, and Li, Shuo
- Subjects
- *
CARDIAC imaging , *HEART , *CARDIOVASCULAR disease diagnosis , *IMAGE segmentation - Abstract
Accurate segmentation of cardiac substructures in multi-modality heart images is an important prerequisite for the diagnosis and treatment of cardiovascular diseases. However, the segmentation of cardiac images remains a challenging task due to (1) the interference of multiple targets and (2) the imbalance of sample sizes. Therefore, in this paper, we propose a novel two-stage segmentation network with feature aggregation and a multi-level attention mechanism (TSFM-Net) to comprehensively address these challenges. Firstly, in order to improve the effectiveness of multi-target features, we adopt the encoder-decoder structure as the backbone segmentation framework and design a feature aggregation module (FAM) to realize multi-level feature representation (Stage 1). Secondly, because the segmentation results obtained from Stage 1 are limited to the decoding of single-scale feature maps, we design a multi-level attention mechanism (MLAM) to assign more attention to the multiple targets, so as to obtain multi-level attention maps. We fuse these attention maps and concatenate the output of Stage 1 to carry out a second segmentation, which yields the final result (Stage 2). The proposed method achieves better segmentation performance and balance on the 2017 MM-WHS multi-modality whole heart images than state-of-the-art methods, which demonstrates the feasibility of TSFM-Net for accurate segmentation of heart images. • We propose TSFM-Net to solve the imbalance and disturbance problems of multi-target segmentation with a single end-to-end network. • We design FAM to extract advanced features containing multi-level information to guide segmentation. • We propose MLAM to focus on target areas in the intermediate feature maps, and BAM could improve the salience of the target. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
355. Infinite-dimensional feature aggregation via a factorized bilinear model.
- Author
-
Dai, Jindou, Wu, Yuwei, Gao, Zhi, and Jia, Yunde
- Subjects
- *
INNER product spaces , *ORDER statistics , *APPROXIMATION error - Abstract
• Infinite-dimensional features are directly aggregated without approximation error. • Our descriptors contain infinite-order statistics of the input features. • The sigmoid kernel is introduced to construct infinite-dimensional features. • Our method outperforms state-of-the-art finite-dimensional and infinite-dimensional feature aggregation methods. Aggregating infinite-dimensional features has demonstrated superiority over their finite-dimensional counterparts. However, most existing methods approximate infinite-dimensional features with finite-dimensional representations, which inevitably results in approximation error and inferior performance. In this paper, we propose a non-approximate aggregation method that directly aggregates infinite-dimensional features rather than relying on approximation strategies. Specifically, since infinite-dimensional features are infeasible to store, represent and compute explicitly, we introduce a factorized bilinear model to capture pairwise second-order statistics of infinite-dimensional features as a global descriptor. It enables the resulting aggregation formulation to involve only the inner product in an infinite-dimensional space. The factorized bilinear model is computed with a sigmoid kernel to generate informative features containing infinite-order statistics. Experiments on four visual tasks, including fine-grained, indoor scene, texture, and material classification, demonstrate that our method consistently achieves state-of-the-art performance. [ABSTRACT FROM AUTHOR] (A minimal sketch of the sigmoid-kernel inner product follows this entry.)
- Published
- 2022
- Full Text
- View/download PDF
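A short sketch of the sigmoid-kernel inner product mentioned above: k(x, y) = tanh(gamma * x.T y + c) is exactly the inner product of the implicit infinite-dimensional feature maps, so second-order statistics can be computed without ever materializing those features. The mean pooling at the end is a deliberate simplification, not the paper's factorized bilinear model:

```python
import numpy as np

def sigmoid_kernel(X, Y, gamma=0.05, c=0.0):
    """k(x, y) = tanh(gamma * <x, y> + c): inner product of the implicit
    infinite-dimensional feature maps, evaluated without forming them."""
    return np.tanh(gamma * X @ Y.T + c)

# 100 local 64-D descriptors of one image (random toy data).
X = np.random.randn(100, 64)
K = sigmoid_kernel(X, X)         # pairwise second-order statistics, (100, 100)
descriptor = K.mean(axis=0)      # naive pooled descriptor (illustration only)
print(K.shape, descriptor.shape)
```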
356. Cross-scale global attention feature pyramid network for person search
- Author
-
Minjie Bian, Huahu Xu, Junsheng Xiao, and Yang Li
- Subjects
Person detection, Feature aggregation, Computer science, Cosine similarity, Pattern recognition, Feature (computer vision), Signal Processing, Pyramid, Computer Vision and Pattern Recognition, Artificial intelligence, Layer (object-oriented design), Cross scale
Person search aims to locate a target person in real unconstrained scene images. It faces many challenges, such as multi-scale variation and fine-grained matching. To address these challenges, a novel cross-scale global attention feature pyramid network (CSGAFPN) is proposed. Firstly, we design a novel multi-head global attention module (MHGAM), which adopts cosine similarity and sparse query location methods to effectively capture cross-scale long-distance dependence. Then, we design the CSGAFPN, which extends the top-down feature pyramid network with bottom-up connections and embeds MHGAMs in these connections. CSGAFPN can capture cross-scale long-distance global correlation from multi-scale feature maps, selectively strengthening important features and restraining less important ones. CSGAFPN is applied to both the person detection and person re-identification (reID) subtasks of person search; it handles the multi-scale and fine-grained challenges well and significantly improves person search performance. Furthermore, the output multi-scale feature maps of CSGAFPN are processed by an adaptive feature aggregation with attention (AFAA) layer to further improve performance. Numerous experiments with two public person search datasets, CUHK-SYSU and PRW, show that our CSGAFPN-based approach achieves better performance than other state-of-the-art (SOTA) person search approaches. (A minimal sketch of cosine-similarity attention follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
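A minimal sketch of attention driven by cosine similarity, the core ingredient of the MHGAM described above (single head, no sparse query location; shapes and the temperature are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def cosine_attention(q, k, v, temperature=0.1):
    """Attention using cosine similarity instead of the scaled dot product.
    q: (n, d) queries; k, v: (m, d) keys and values."""
    q_n = F.normalize(q, dim=-1)
    k_n = F.normalize(k, dim=-1)
    attn = F.softmax(q_n @ k_n.t() / temperature, dim=-1)  # (n, m) weights
    return attn @ v

q, k, v = torch.randn(16, 64), torch.randn(49, 64), torch.randn(49, 64)
print(cosine_attention(q, k, v).shape)  # torch.Size([16, 64])
```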
357. EAR: Efficient action recognition with local-global temporal aggregation
- Author
-
Yuexian Zou, Lei Gan, Guang Chen, and Can Zhang
- Subjects
Feature aggregation, Computer science, Feature vector, Optical flow, Representation (systemics), Pattern recognition, FLOPS, Motion (physics), Signal Processing, Action recognition, Computer Vision and Pattern Recognition, Artificial intelligence, Temporal modeling
Temporal modeling in videos is crucial for action recognition. Traditionally, it involves feature aggregation for both local motion and global semantics. In this paper, we propose an Efficient Action Recognition network (EAR), which includes a Persistence of Appearance (PA) module and a Various-timescale Aggregation (VA) module for local and global temporal aggregation, respectively. For local motion aggregation, instead of using time-consuming optical flow, our PA calculates pixel-wise differences in feature space as the motion representation, which is much more efficient (8196 fps vs. 8 fps for optical flow). Besides, to capture global semantic hints, we propose the VA module, which adaptively emphasizes expressive features and suppresses less informative ones across various timescales. Empowered by this local-global temporal aggregation, our EAR achieves competitive results on six challenging action recognition benchmarks at low FLOPs. (A minimal sketch of the pixel-wise feature-difference idea follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
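A simplified sketch of the PA idea above: pixel-wise differences of consecutive frame features as a cheap stand-in for optical flow. The squared-magnitude reduction is an assumption for illustration, not the module's exact form:

```python
import torch

def persistence_of_appearance(feats):
    """Pixel-wise feature differences between consecutive frames as a cheap
    motion representation. feats: (T, C, H, W) per-frame feature maps."""
    diffs = feats[1:] - feats[:-1]                          # (T-1, C, H, W)
    motion = diffs.pow(2).sum(dim=1, keepdim=True).sqrt()   # magnitude map
    return motion                                           # (T-1, 1, H, W)

feats = torch.randn(8, 64, 28, 28)                          # 8 frames
print(persistence_of_appearance(feats).shape)  # torch.Size([7, 1, 28, 28])
```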
358. BCHisto-Net: Breast histopathological image classification by global and local feature aggregation
- Author
-
Keerthana Prasad, R. Rashmi, and Chethana Babu K. Udupa
- Subjects
Feature aggregation, Contextual image classification, Computer science, Deep learning, Medicine (miscellaneous), Magnification, Breast Neoplasms, Pattern recognition, Convolutional neural network, Breast cancer, Artificial Intelligence, Computer-aided diagnosis, Image Processing, Computer-Assisted, Humans, Female, Breast, Neural Networks, Computer, Focus (optics)
Breast cancer among women is the second most common cancer worldwide. Non-invasive techniques such as mammograms and ultrasound imaging are used to detect the tumor. However, breast histopathological image analysis is inevitable for detecting the malignancy of the tumor. Manual analysis of breast histopathological images is subjective, tedious, laborious and prone to human error. Recent developments in computational power and memory have made automation a popular choice for the analysis of these images. One of the key challenges of breast histopathological image classification at 100× magnification is to extract the features of the potential regions of interest to decide on the malignancy of the tumor. Current state-of-the-art CNN-based methods for breast histopathological image classification extract features from the entire image (global features) and thus may overlook the features of the potential regions of interest. This can lead to inaccurate diagnosis of breast histopathological images. This research gap has motivated us to propose BCHisto-Net to classify breast histopathological images at 100× magnification. The proposed BCHisto-Net extracts both the global and local features required for accurate classification of breast histopathological images. The global features capture abstract image characteristics, while the local features focus on potential regions of interest. Furthermore, a feature aggregation branch is proposed to combine these features for the classification of 100× images. The proposed method is quantitatively evaluated on a private dataset (KMC) and the publicly available BreakHis dataset. An extensive evaluation of the proposed model showed the effectiveness of the local and global features for the classification of these images. The proposed method achieved accuracies of 95% and 89% on the KMC and BreakHis datasets respectively, outperforming state-of-the-art classifiers.
- Published
- 2021
- Full Text
- View/download PDF
359. Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention.
- Author
-
Jin Y, Wang M, Luo L, Zhao D, and Liu Z
- Subjects
- Algorithms, Hearing, Sound, Acoustics, Neural Networks, Computer
- Abstract
The complexity of polyphonic sounds imposes numerous challenges on their classification. Especially in real life, polyphonic sound events exhibit discontinuity and unstable time-frequency variations. Traditional single acoustic features cannot characterize the key feature information of polyphonic sound events, and this deficiency results in poor model classification performance. In this paper, we propose a convolutional recurrent neural network model based on a temporal-frequency (TF) attention mechanism and a feature space (FS) attention mechanism (TFFS-CRNN). The TFFS-CRNN model aggregates log-Mel spectrogram and MFCC features as inputs and contains the TF-attention module, the convolutional recurrent neural network (CRNN) module, the FS-attention module and the bidirectional gated recurrent unit (BGRU) module. In polyphonic sound event detection (SED), the TF-attention module can capture the critical temporal-frequency features more capably. The FS-attention module assigns different dynamically learnable weights to different dimensions of the features. The TFFS-CRNN model thus improves the characterization of key feature information in polyphonic SED. By using the two attention modules, the model can focus on semantically relevant time frames, key frequency bands, and important feature spaces. Finally, the BGRU module learns contextual information. The experiments were conducted on the DCASE 2016 Task 3 and DCASE 2017 Task 3 datasets. Experimental results show that the F1-score of the TFFS-CRNN model improved by 12.4% and 25.2% compared with the winning system models of the DCASE challenge, while the ER was reduced by 0.41 and 0.37, respectively. The proposed TFFS-CRNN model has better classification performance and lower ER in polyphonic SED.
- Published
- 2022
- Full Text
- View/download PDF
360. Adaptive key frame extraction for video summarization using an aggregation mechanism
- Author
-
Ejaz, Naveed, Tariq, Tayyab Bin, and Baik, Sung Wook
- Subjects
- *
ADAPTIVE control systems , *DATA extraction , *AGGREGATION (Statistics) , *VIDEO recording , *STATISTICAL correlation , *COLOR image processing , *STATISTICAL smoothing , *EXPERIMENTAL design - Abstract
Abstract: Video summarization is a method to reduce redundancy and generate a succinct representation of video data. One mechanism for generating video summaries is to extract key frames which represent the most important content of the video. In this paper, a new technique for key frame extraction is presented. The scheme uses an aggregation mechanism to combine the visual features extracted from the correlation of RGB color channels, the color histogram, and moments of inertia to extract key frames from the video. An adaptive formula is then used to combine the results of the current iteration with those from the previous one. The use of the adaptive formula generates a smooth output function and also reduces redundancy. The results are compared to those of other techniques based on objective criteria. The experimental results show that the proposed technique generates summaries that are closer to the summaries created by humans. [Copyright Elsevier] (A minimal sketch of the adaptive combination step follows this entry.)
- Published
- 2012
- Full Text
- View/download PDF
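A toy sketch of the adaptive combination step described above, written here as an exponential-style update s_t = alpha * x_t + (1 - alpha) * s_{t-1}; the weight alpha and the key-frame selection rule are illustrative assumptions, not the paper's exact formula:

```python
import numpy as np

def aggregate_scores(frame_scores, alpha=0.7):
    """Combine the current frame's aggregated feature score with the previous
    iteration's value, which smooths the output and suppresses redundancy."""
    s = np.zeros_like(frame_scores)
    s[0] = frame_scores[0]
    for t in range(1, len(frame_scores)):
        s[t] = alpha * frame_scores[t] + (1 - alpha) * s[t - 1]
    return s

scores = np.random.rand(100)           # per-frame aggregated feature scores
smoothed = aggregate_scores(scores)
keyframes = np.argsort(smoothed)[-5:]  # pick the 5 highest-scoring frames
print(sorted(keyframes.tolist()))
```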
361. A learning approach to hierarchical feature selection and aggregation for audio classification
- Author
-
Ruvolo, Paul, Fasel, Ian, and Movellan, Javier R.
- Subjects
- *
SIGNAL processing , *PERFORMANCE evaluation , *MACHINE learning , *FEATURE extraction , *PATTERN perception , *CLASSIFICATION - Abstract
Abstract: Audio classification typically involves feeding a fixed set of low-level features to a machine learning method, then performing feature aggregation before or after learning. Instead, we jointly learn a selection and hierarchical temporal aggregation of features, achieving significant performance gains. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
362. Point cloud up-sampling network with multi-level spatial local feature aggregation
- Author
-
Nan Li, Guang Zeng, Haisheng Li, and Xiaochuan Wang
- Subjects
General Computer Science, Feature aggregation, Computer science, Aggregate (data warehouse), Point cloud, Sampling (statistics), Convolution, Computer graphics, Hausdorff distance, Control and Systems Engineering, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, Baseline (configuration management)
Point clouds are one of the most popular 3D representations in computer vision and computer graphics. However, due to their sparseness and non-uniformity, raw point clouds from scanning devices cannot be applied directly to downstream geometry analysis tasks. In this paper, we propose an end-to-end point cloud up-sampling network to reconstruct dense yet uniformly distributed point clouds. Firstly, we utilize the spatial relationships of local regions and capture point-wise features progressively. We then propose a novel network to aggregate those features from different levels. Finally, we design an up-sampling module consisting of multi-branch convolution units to generate the dense point clouds. We conduct extensive experiments on currently available public benchmarks. Experimental results show that the proposed method achieves a Hausdorff distance of 0.103 and a Chamfer distance of 0.010 on the VisionAir dataset, improving on the baseline in terms of uniformity, proximity-to-surface and mesh reconstruction. (A minimal sketch of these two point-set distances follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
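For reference, minimal NumPy implementations of the two point-set metrics quoted above (standard definitions; exact conventions vary slightly across papers):

```python
import numpy as np

def pairwise_dists(P, Q):
    """Euclidean distance matrix between point sets P (n, 3) and Q (m, 3)."""
    return np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)

def chamfer_distance(P, Q):
    d = pairwise_dists(P, Q)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def hausdorff_distance(P, Q):
    d = pairwise_dists(P, Q)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

P = np.random.rand(256, 3)    # sparse input cloud (toy data)
Q = np.random.rand(1024, 3)   # up-sampled cloud
print(chamfer_distance(P, Q), hausdorff_distance(P, Q))
```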
363. Content Based Image Retrieval Using Unclean Positive Examples.
- Author
-
Jun Zhang and Lei Ye
- Subjects
- *
IMAGE retrieval , *IMAGE processing , *NOISE , *HEURISTIC , *SUPPORT vector machines - Abstract
Abstract: Conventional content-based image retrieval (CBIR) schemes employing relevance feedback may suffer from several problems in practical applications. First, most ordinary users would like to complete their search in a single interaction, especially on the web. Second, it is time-consuming and difficult to label many negative examples with sufficient variety. Third, ordinary users may introduce noisy examples into the query. This correspondence explores solutions to a new issue: image retrieval using unclean positive examples. In the proposed scheme, multiple feature distances are combined to obtain image similarity using classification technology. To handle the noisy positive examples, a new two-step strategy is proposed that incorporates data cleaning and a noise-tolerant classifier. Extensive experiments carried out on two different real image collections validate the effectiveness of the proposed scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
364. FANet: Feature aggregation network for RGBD saliency detection.
- Author
-
Zhou, Xiaofei, Wen, Hongfa, Shi, Ran, Yin, Haibing, Zhang, Jiyong, and Yan, Chenggang
- Subjects
- *
K-nearest neighbor classification , *FEATURE extraction - Abstract
The crucial issue in RGBD saliency detection is how to adequately mine and fuse the geometric information and the appearance information contained in depth maps and RGB images, respectively. In this paper, we propose a novel feature aggregation network, FANet, including a feature extraction module and an aggregation module for RGBD saliency detection. The premier characteristic of FANet is the feature aggregation module, consisting of a designed region enhanced module (REM) and a series of deployed hierarchical fusion modules (HFM). Specifically, on one hand, the REM provides a powerful capability for differentiating salient objects and background. On the other hand, the HFM is used to gradually integrate high-level semantic information and low-level spatial details, where K-nearest neighbor graph neural networks (KGNNs) and a non-local module (NLM) are embedded in the HFM to mine the geometric information and enhance high-level appearance features, respectively. Extensive experiments on five RGBD datasets show that our model achieves compelling performance against the current 11 state-of-the-art RGBD saliency models. • We propose a novel feature aggregation network (FANet) for RGBD saliency detection. • The region enhanced module (REM) is used to differentiate salient regions and backgrounds. • The hierarchical fusion module (HFM) is used to aggregate multi-modal cues. • The HFM is supported by K-nearest neighbor graph neural networks (KGNNs) and a non-local module (NLM). • Extensive experimental results verify the effectiveness of the proposed FANet. [ABSTRACT FROM AUTHOR] (A minimal sketch of K-nearest-neighbor feature aggregation follows this entry.)
- Published
- 2022
- Full Text
- View/download PDF
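A minimal sketch of K-nearest-neighbor feature aggregation in feature space, a simplified stand-in for the KGNN component mentioned above; the 50/50 mixing of self and neighbor features is an assumption for illustration:

```python
import numpy as np

def knn_aggregate(feats, k=4):
    """Aggregate each feature with the mean of its K nearest neighbors in
    feature space. feats: (n, d) array of node/pixel features."""
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-matches
    idx = np.argsort(d, axis=1)[:, :k]       # indices of k nearest neighbors
    return (feats + feats[idx].mean(axis=1)) / 2.0

feats = np.random.randn(64, 32)
print(knn_aggregate(feats).shape)  # (64, 32)
```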
365. Multi-Scale Feature Aggregation Network for Water Area Segmentation.
- Author
-
Hu, Kai, Li, Meng, Xia, Min, and Lin, Haifeng
- Subjects
- *
IMAGE segmentation , *FEATURE extraction , *REMOTE sensing , *PROBLEM solving , *DEEP learning - Abstract
Water area segmentation is an important branch of remote sensing image segmentation, but in reality, most water area images have complex and diverse backgrounds. Traditional detection methods cannot accurately identify small tributaries due to incomplete mining and insufficient utilization of semantic information, and the edge information of the segmentation is rough. To solve these problems, we propose a multi-scale feature aggregation network. To improve the network's ability to process boundary information, we design a deep feature extraction module that uses a multi-scale pyramid to extract features and, combined with a designed attention mechanism and strip convolution, extracts multi-scale deep semantic information and enhances spatial and location information. Then, a multi-branch aggregation module is used to let features of different scales interact, enhancing the positioning information of the pixels. Finally, the two high-performance branches designed in the Feature Fusion Upsample module are used to extract deep semantic information from the image, and the deep information is fused with the shallow information generated by the multi-branch module to improve the network's capability. Global and local features are used to determine the location distribution of each image category. The experimental results show that the accuracy of our segmentation method is better than that of previous detection methods, which has important practical significance for real-world water area segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
366. Fuzzy Clustering and Aggregation of Relational Data With Instance-Level Constraints.
- Author
-
Frigui, Hichem and Hwang, Cheul
- Subjects
FUZZY sets, CLUSTER set theory, CLUSTER analysis (Statistics), IMAGE databases, IMAGE storage & retrieval systems
In this paper, we introduce a semisupervised approach for clustering and aggregating relational data (SS-CARD). We assume that data is available in relational form, where only the degrees to which pairs of objects in the dataset are related are known. Moreover, we assume that the relational information is represented by multiple dissimilarity matrices. These matrices could have been generated using different features, different mappings, or even different sensors. SS-CARD is designed to aggregate pairwise distances from multiple relational matrices, partition the data into clusters, and simultaneously learn a relevance weight for each matrix in each cluster. These weights have two main advantages. First, they help partition the data into more meaningful clusters. Second, they can be used as part of a more complex learning system to enhance its learning behavior. SS-CARD uses partial supervision consisting of a small set of constraints specifying which pairs of instances should link or should not link, i.e., should or should not reside in the same cluster. This additional information can guide the algorithm in learning the optimal relevance weights and in generating a better partition. The performance of the proposed algorithm is illustrated in two different applications. The first consists of categorizing the discrete nominal-valued mushroom data. The second consists of categorizing a collection of images, where each image is represented by several continuous features. For both applications, we represent the pairwise image dissimilarities by multiple relational matrices extracted from different feature sets. The results are compared with those obtained by three traditional relational clustering methods. We show that the partial supervision information and the learned aggregation weights can improve the results significantly. [ABSTRACT FROM AUTHOR] (A minimal sketch of weighted aggregation of dissimilarity matrices follows this entry.)
- Published
- 2008
- Full Text
- View/download PDF
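A minimal sketch of the aggregation at the heart of SS-CARD: combining multiple dissimilarity matrices with relevance weights. In SS-CARD the weights are learned per cluster during clustering; fixed global weights are used here purely for illustration:

```python
import numpy as np

def aggregate_dissimilarities(R, w):
    """Combine K relational (dissimilarity) matrices into one:
    D = sum_k w_k * R_k. R: (K, n, n); w: (K,) relevance weights."""
    return np.tensordot(w, R, axes=1)          # (n, n) aggregated matrix

R = np.random.rand(3, 10, 10)                  # 3 dissimilarity matrices
R = (R + R.transpose(0, 2, 1)) / 2             # make them symmetric
w = np.array([0.5, 0.3, 0.2])                  # relevance weights, sum to 1
print(aggregate_dissimilarities(R, w).shape)   # (10, 10)
```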
367. Clustering and aggregation of relational data with applications to image database categorization
- Author
-
Frigui, Hichem, Hwang, Cheul, and Chung-Hoon Rhee, Frank
- Subjects
- *
ALGORITHMS , *UNIVERSAL algebra , *INSTRUCTIONAL systems , *SELF-organizing systems - Abstract
Abstract: In this paper, we introduce a new algorithm for clustering and aggregating relational data (CARD). We assume that data is available in a relational form, where we only have information about the degrees to which pairs of objects in the data set are related. Moreover, we assume that the relational information is represented by multiple dissimilarity matrices. These matrices could have been generated using different sensors, features, or mappings. CARD is designed to aggregate pairwise distances from multiple relational matrices, partition the data into clusters, and learn a relevance weight for each matrix in each cluster simultaneously. The cluster-dependent relevance weights offer two advantages. First, they guide the clustering process to partition the data set into more meaningful clusters. Second, they can be used in subsequent steps of a learning system to improve its learning behavior. The performance of the proposed algorithm is illustrated by using it to categorize a collection of 500 color images. We represent the pairwise image dissimilarities by six different relational matrices that encode color, texture, and structure information. [Copyright Elsevier]
- Published
- 2007
- Full Text
- View/download PDF
368. Part-Based Feature Aggregation Method for Dynamic Scene Recognition
- Author
-
Abdesselam Bouzerdoum and Xiaoming Peng
- Subjects
Feature aggregation, Computer science, Frame (networking), Pattern recognition, Set cover problem, Discriminative model, Feature (computer vision), Deep neural networks, Artificial intelligence, Layer (object-oriented design), Representation (mathematics)
Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a part-based method is proposed for aggregating local features from multiple video frames. A pre-trained Fast R-CNN model is used to extract local convolutional layer features from the regions of interest (ROIs) of training images. These features are then clustered to locate representative parts. A set cover problem is formulated to select the discriminative parts, which are further refined by fine-tuning the Fast R-CNN. Local convolutional layer features and fully-connected layer features are extracted using the fine-tuned Fast R-CNN model, and then aggregated separately from a video segment to form two feature representations. They are concatenated into a global feature representation. Experimental results show that the proposed method outperforms several state-of-the-art features on two dynamic scene datasets.
- Published
- 2019
- Full Text
- View/download PDF
369. Sparse Graph Attention Networks
- Author
-
Yang Ye and Shihao Ji
- Subjects
Theoretical computer science, Dense graph, Feature aggregation, Computer science, Graph neural networks, Machine learning, Overfitting, Regularization (mathematics), Graph, Computer Science Applications, Computational Theory and Mathematics, Feature learning, Information Systems
Graph Neural Networks (GNNs) have proved to be an effective representation learning framework for graph-structured data, and have achieved state-of-the-art performance on many practical predictive tasks, such as node classification, link prediction and graph classification. Among the variants of GNNs, Graph Attention Networks (GATs) learn to assign dense attention coefficients over all neighbors of a node for feature aggregation, and improve the performance of many graph learning tasks. However, real-world graphs are often very large and noisy, and GATs are prone to overfitting if not regularized properly. Even worse, the local aggregation mechanism of GATs may fail on disassortative graphs, where nodes within a local neighborhood provide more noise than useful information for feature aggregation. In this paper, we propose Sparse Graph Attention Networks (SGATs) that learn sparse attention coefficients under an L0-norm regularization, and the learned sparse attentions are then used for all GNN layers, resulting in an edge-sparsified graph. By doing so, we can identify noisy/task-irrelevant edges and thus perform feature aggregation on the most informative neighbors. Extensive experiments on synthetic and real-world graph learning benchmarks demonstrate the superior performance of SGATs. In particular, SGATs can remove about 50%-80% of the edges from large assortative graphs while retaining similar classification accuracies. On disassortative graphs, SGATs prune the majority of noisy edges and outperform GATs in classification accuracy by significant margins. Furthermore, the removed edges can be interpreted intuitively and quantitatively. To the best of our knowledge, this is the first graph learning algorithm that shows significant redundancies in graphs, and edge-sparsified graphs can achieve similar or sometimes higher predictive performance than the original graphs. Published as a journal paper at IEEE TKDE 2021. (A simplified sketch of L0-gated edge aggregation follows this entry.)
- Published
- 2019
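A toy sketch of the SGAT idea above: each edge carries a learnable gate whose expected L0 norm is penalized, so noisy edges can be pruned during feature aggregation. The paper uses feature-based attention scores and a hard-concrete relaxation; plain sigmoid gates replace both here for brevity:

```python
import torch
import torch.nn as nn

class L0GatedAggregation(nn.Module):
    """Gated neighbor aggregation with an L0 surrogate penalty (simplified)."""
    def __init__(self, in_dim, out_dim, num_edges):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)
        self.gate_logits = nn.Parameter(torch.zeros(num_edges))

    def forward(self, x, edge_index):
        src, dst = edge_index                    # COO edge list, shape (2, E)
        gates = torch.sigmoid(self.gate_logits)  # soft keep-probability per edge
        h = self.lin(x)
        out = torch.zeros_like(h)
        out.index_add_(0, dst, gates.unsqueeze(-1) * h[src])   # gated sum
        deg = torch.zeros(x.size(0)).index_add_(0, dst, gates).clamp(min=1e-6)
        return out / deg.unsqueeze(-1), gates.sum()  # features, L0 surrogate

x = torch.randn(5, 8)                             # 5 nodes, 8-D features
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
layer = L0GatedAggregation(8, 16, num_edges=4)
feats, l0_penalty = layer(x, edges)               # add l0_penalty to the loss
print(feats.shape, float(l0_penalty))
```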
370. A Graph Based Unsupervised Feature Aggregation for Face Recognition
- Author
-
Venkata Sai Vijay Kumar Pedapudi, Yu Cheng, Liu Qiankun, Xiaotian Fan, Sheng Mei Shen, Chi Su, Yuan Yao, and Yanfeng Li
- Subjects
Feature aggregation, Iterative method, Computer science, Gaussian, Feature extraction, Pattern recognition, Directed graph, Mutual information, Facial recognition system, Graph, Matrix (mathematics), Graph (abstract data type), Artificial intelligence
In most test datasets, the images are collected from video clips or different environmental conditions, which implies that the mutual information between pairs is significantly important. To address this problem and exploit this information, we propose a graph-based unsupervised feature aggregation method for face recognition. Our method uses the inter-connections between pairs in a directed-graph approach to refine the pair-wise scores. First, based on the assumption that all features follow a Gaussian distribution, we derive an iterative updating formula for the features. Second, in the discrete setting, we build a directed graph whose affinity matrix is obtained from pair-wise similarities and filtered by a pre-defined threshold along with K-nearest neighbors. Third, the affinity matrix is used to obtain a pseudo-center matrix for the iterative update process. Besides evaluation on face recognition test datasets, our proposed method can also be applied in semi-supervised learning to handle unlabelled data and improve the performance of deep models. We verified its effectiveness on five different datasets: IJB-C, CFP, YTF, TrillionPair and the iQIYI video dataset. (A minimal sketch of this graph-based refinement follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
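A simplified sketch of the refinement loop described above: build a directed affinity graph from cosine similarities, filter it by a threshold and K nearest neighbors, and iteratively pull each feature toward its pseudo-center. All constants are illustrative assumptions:

```python
import numpy as np

def refine_features(F, k=3, thresh=0.5, alpha=0.5, iters=3):
    """Iteratively refine L2-normalized features F (n, d) over a directed
    affinity graph built from pairwise cosine similarities."""
    for _ in range(iters):
        S = F @ F.T                               # cosine similarities
        np.fill_diagonal(S, -np.inf)              # no self-edges
        keep = np.argsort(-S, axis=1)[:, :k]      # top-k neighbors per node
        A = np.zeros_like(S)
        rows = np.arange(len(F))[:, None]
        A[rows, keep] = np.where(S[rows, keep] > thresh, S[rows, keep], 0.0)
        centers = A @ F / (A.sum(axis=1, keepdims=True) + 1e-6)  # pseudo-centers
        F = alpha * F + (1 - alpha) * centers     # iterative update
        F /= np.linalg.norm(F, axis=1, keepdims=True) + 1e-6
    return F

F = np.random.randn(10, 128)
F /= np.linalg.norm(F, axis=1, keepdims=True)
print(refine_features(F).shape)  # (10, 128)
```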
371. Multiple Context Aggregation Network for Saliency Prediction
- Author
-
Shuai Li, Yipeng Liu, Ce Zhu, Lingtong Meng, and Feng Jiaqi
- Subjects
Multiple context, Kernel (image processing), Feature aggregation, Computer science, Deep learning, Pattern recognition, Artificial intelligence, Convolutional neural network
With the rapid development of deep learning techniques, saliency prediction has been widely studied with deep convolutional neural networks (DCNNs). In this paper, we delve into the feature characteristics that heavily affect saliency prediction: feature scale and feature level. It is shown that, in addition to high-level features, large-scale and/or low-level features are also important for saliency prediction. Therefore, a dilated asymmetric convolution with large kernel (DACLK) is designed to obtain large-scale features. Then, a global multi-level feature aggregation method (GMLFA) is proposed to incorporate features from shallow layers. GMLFA can increase the scale of the features by down-sampling operations and improve saliency prediction with more large-scale (global) multi-level features. Based on the two modules, a multiple context aggregation network (MCA-Net) is developed for saliency prediction. Experiments demonstrate that MCA-Net outperforms other models on the SALICON test dataset and achieves very competitive results on the MIT300 benchmark.
- Published
- 2019
- Full Text
- View/download PDF
372. A Single-Shot Object Detector with Feature Aggregation and Enhancement
- Author
-
Guizhong Liu and Weiqiang Li
- Subjects
Feature aggregation ,Computer science ,business.industry ,Feature extraction ,Detector ,Single shot ,02 engineering and technology ,Pascal (programming language) ,010501 environmental sciences ,01 natural sciences ,Object detection ,0202 electrical engineering, electronic engineering, information engineering ,Object detector ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,computer ,0105 earth and related environmental sciences ,computer.programming_language - Abstract
For many real applications, it is equally important to detect objects accurately and quickly. In this paper, we propose an accurate and efficient single-shot object detector with feature aggregation and enhancement (FAENet). Our motivation is to enhance and exploit the shallow and deep feature maps of the whole network simultaneously. To achieve this, we introduce a pair of novel feature aggregation modules and two feature enhancement blocks, and integrate them into the original structure of SSD. Extensive experiments on both the PASCAL VOC and MS COCO datasets demonstrate that the proposed method achieves much higher accuracy than SSD. In addition, our method performs better than the state-of-the-art one-stage detector RefineDet on small objects and can run at a faster speed.
- Published
- 2019
- Full Text
- View/download PDF
373. Dense Feature Aggregation and Pruning for RGBT Tracking
- Author
-
Yabin Zhu, Jin Tang, Chenglong Li, Xiao Wang, and Bin Luo
- Subjects
Boosting (machine learning), Feature aggregation, Computer science, Pooling, Pattern recognition, Convolutional neural network, RGB color model, Artificial intelligence
How to perform effective information fusion of different modalities is a core factor in boosting the performance of RGBT tracking. This paper presents a novel deep fusion algorithm based on the representations from an end-to-end trained convolutional neural network. To exploit the complementarity of the features of all layers, we propose a recursive strategy to densely aggregate these features, yielding robust representations of target objects in each modality. We then propose to prune the densely aggregated features of all modalities in a collaborative way. Specifically, we employ global average pooling and weighted random selection to perform channel scoring and selection, which removes redundant and noisy features to achieve a more robust feature representation. Experimental results on two RGBT tracking benchmark datasets suggest that our tracker achieves clear state-of-the-art performance against other RGB and RGBT tracking methods.
- Published
- 2019
374. Using Temporal Feature Aggregation and Gradient Boosting Tree on Missing Data Imputation
- Author
-
Guotong Xie, Jia Xiaoyu, Kang Yanni, and Xiang Li
- Subjects
Time information, Feature aggregation, Computer science, Missing data imputation, Missing value imputation, Gradient boosting, Data mining, Imputation (statistics), Missing data, Decision tree model
The goal of the 7th IEEE ICHI Challenge is to find the most appropriate method to impute missing data in the longitudinal ICU laboratory test data derived from MIMIC-III [1]. Missing measurements are the most common problem in clinical data, because clinical data series contain many streams of measurements sampled at multiple, irregular times, which generates missing data at different rates. However, these missing measurements are often critical for accurate diagnosis, prognosis and treatment, as well as for accurate modeling and statistical analyses. In this challenge, we trained 13 models independently, one per feature. For every model, we extracted 200 features, including the measurements, statistics, time intervals and other time information, as inputs to the predictive model. The missing target values are interpolated using both the temporal relationships within each stream and the relationships across streams. The replacements for missing values differ and are determined by the model performance of the distinct imputation strategies. This study utilized two imputation methods: replacing the missing value with the adjacent measurement, and using a tree model that can automatically identify the missing patterns to impute the missing data when needed. (A toy illustration of both routes follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
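A toy illustration of the two imputation routes described above: carry the adjacent measurement forward, or train a gradient-boosting model on simple temporal aggregates. The column name and engineered features are invented for this demo and are not the challenge's actual 200 features:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
df = pd.DataFrame({"lactate": rng.normal(2.0, 0.5, 200)})
df.loc[rng.choice(200, 40, replace=False), "lactate"] = np.nan

# Route 1: replace a missing value with the adjacent measurement.
filled_adjacent = df["lactate"].ffill()

# Route 2: per-feature tree model over simple temporal aggregate features.
feats = pd.DataFrame({
    "prev": df["lactate"].ffill().shift(1),
    "rolling_mean": df["lactate"].ffill().rolling(5, min_periods=1).mean(),
})
train = df["lactate"].notna() & feats.notna().all(axis=1)
model = GradientBoostingRegressor().fit(feats[train], df.loc[train, "lactate"])
pred = pd.Series(model.predict(feats.fillna(feats.mean())), index=df.index)
imputed = df["lactate"].fillna(pred)
print(int(imputed.isna().sum()))  # 0
```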
375. Content-Dependent Image Search System for Aggregation of Color, Shape and Texture Features
- Author
-
Achmad Basuki, Arvita Agus Kurniasari, and Ali Ridho Barakbah
- Subjects
Computer science, Content-dependent Image Search, Image Search System, Pattern recognition, Texture (geology), Image (mathematics), Feature Aggregation, Image Feature Extraction, Artificial intelligence
Existing image search systems often have difficulty finding retrieved images that appropriately correspond to an image query. The difficulty is commonly caused by a mismatch between the user's search intention and the dominant information of the image collected through feature extraction. In this paper, we present a new approach for a content-dependent image search system. The system utilizes the color distribution inside an image and detects a cloud of clustered colors as a supposed object. We apply image segmentation as a content-dependent process before feature extraction in order to identify whether there is an object inside an image. The system extracts three features (color, shape, and texture) and aggregates them for similarity measurement between an image query and the image database. An HSV color histogram is used to extract the color feature. The shape feature extraction uses Connected Component Labeling (CCL), which calculates the area, equivalent diameter, extent, convex hull, solidity, eccentricity, and perimeter of each object. The texture feature extraction uses the Leung-Malik (LM) approach with 15 kernels. To demonstrate the applicability of our proposed system, we applied it to the benchmark 1000-image SIMPLIcity dataset, consisting of 10 categories: Africans, beaches, historic buildings, buses, dinosaurs, elephants, roses, horses, mountains, and food. The experimental results showed a 62% accuracy rate for detecting objects by the color feature, 71% by texture, 60% by shape, 72% by combined color-texture, 67% by combined color-shape, 72% by combined texture-shape, and 73% by all features combined. (A minimal sketch of the HSV histogram feature follows this entry.)
- Published
- 2019
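A minimal OpenCV sketch of the HSV color-histogram feature used above; the bin counts are an assumption for illustration:

```python
import cv2
import numpy as np

def hsv_histogram(image_bgr, bins=(8, 8, 8)):
    """Flattened, L1-normalized HSV color histogram of a BGR image."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])   # OpenCV hue range is 0-180
    return (hist / (hist.sum() + 1e-6)).flatten()

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # toy image
print(hsv_histogram(img).shape)  # (512,)
```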
376. DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation
- Author
-
Hanchao Li, Haoqiang Fan, Pengfei Xiong, and Jian Sun
- Subjects
Feature aggregation, Computer science, Pattern recognition, Discriminative model, Segmentation, Artificial intelligence
This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascades, respectively. Based on multi-scale feature propagation, DFANet substantially reduces the number of parameters but still obtains a sufficient receptive field and enhances the model's learning ability, striking a balance between speed and segmentation performance. Experiments on the Cityscapes and CamVid datasets demonstrate the superior performance of DFANet, with 8× fewer FLOPs and 2× faster inference than existing state-of-the-art real-time semantic segmentation methods while providing comparable accuracy. Specifically, it achieves 70.3% mean IoU on the Cityscapes test dataset with only 1.7 GFLOPs and a speed of 160 FPS on one NVIDIA Titan X card, and 71.3% mean IoU with 3.4 GFLOPs while inferring on a higher-resolution image.
- Published
- 2019
- Full Text
- View/download PDF
377. Aggregation-Induced Emission of Triphenyl-Substituted Tristyrylbenzenes
- Author
-
Frank Rominger, Hao Zhang, Jan Freudenberg, and Uwe H. F. Bunz
- Subjects
Feature aggregation, Chemistry, Intermolecular force, Organic Chemistry, General Chemistry, Aggregation-induced emission, Photochemistry, Catalysis
The synthesis, properties and X-ray single-crystal structures of two regioisomeric triphenyl tristyrylbenzenes are reported. Both C3v and Cs derivatives display aggregation-induced emission (AIE) behavior. Regioisomerism impacts the solid-state intermolecular interactions, the photophysical characteristics and photostability in solution.
- Published
- 2019
378. Multimodal Deep Feature Aggregation for Facial Action Unit Recognition using Visible Images and Physiological Signals
- Author
-
Venu Govindaraju, Nagashri N. Lakshminarayana, Nishant Sankaran, and Srirangaraj Setlur
- Subjects
Feature aggregation, Computer science, Deep learning, Representation (systemics), Pattern recognition, Metadata, Improved performance, Pulse rate, Action (philosophy), Artificial intelligence
In this paper we present a feature aggregation method to combine information from the visible light domain and physiological signals for predicting the 12 facial action units in the MMSE dataset. Although multimodal affect analysis has gained a lot of attention, the utility of physiological signals in recognizing facial action units is relatively unexplored. In this paper we investigate whether physiological signals such as electrodermal activity (EDA), respiration rate and pulse rate can be used as metadata for action unit recognition. We exploit the effectiveness of deep learning methods to learn an optimal combined representation derived from the individual modalities. We obtained improved performance on the MMSE dataset, further validating our claim. To the best of our knowledge, this is the first study on facial action unit recognition using physiological signals.
- Published
- 2019
- Full Text
- View/download PDF
379. Feature Aggregation in Perceptual Loss for Ultra Low-Dose (ULD) CT Denoising
- Author
-
Arnaldo Mayer, Michael Green, Edith M. Marom, Eli Konen, and Nahum Kiryati
- Subjects
Ultra low dose, Feature aggregation, Computer science, Image quality, Noise reduction, Pattern recognition, Convolutional neural network, Feature (computer vision), Artificial intelligence
Lung cancer CT screening programs are continuously reducing patient exposure to radiation at the expense of image quality. State-of-the-art denoising algorithms are instrumental in preserving the diagnostic value of these images. In this work, a novel neural denoising scheme is proposed for ULD chest CT. The proposed method aggregates multi-scale features that provide rich information for the computation of a perceptual loss. The loss is further optimized for chest CT data by using denoising auto-encoders on real CT images to build the feature-extracting network, instead of using an existing network trained on natural images. The proposed method was validated on co-registered pairs of real ULD and normal-dose scans and compared favorably with published state-of-the-art denoising networks, both qualitatively and quantitatively. (A minimal sketch of a multi-feature perceptual loss follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
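A minimal sketch of a perceptual loss aggregated over multiple feature scales, as described above; the toy two-scale extractor stands in for the paper's autoencoder-derived network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoScaleExtractor(nn.Module):
    """Toy feature extractor returning two scales of feature maps."""
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(1, 8, 3, padding=1)
        self.c2 = nn.Conv2d(8, 16, 3, stride=2, padding=1)

    def forward(self, x):
        f1 = F.relu(self.c1(x))
        return [f1, F.relu(self.c2(f1))]

def perceptual_loss(extractor, denoised, target):
    """Aggregate an L2 distance over several feature scales."""
    return sum(F.mse_loss(a, b)
               for a, b in zip(extractor(denoised), extractor(target)))

net = TwoScaleExtractor()
x, y = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
print(float(perceptual_loss(net, x, y)))
```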
380. Learning Feature Aggregation in Temporal Domain for Re-Identification
- Author
-
Jakub Sochor, Petr Dobeš, Adam Herout, Jakub Špaňhel, Roman Juránek, and Vojtěch Bartl
- Subjects
Thesaurus (information retrieval), Feature aggregation, Computer science, Feature vector, Pattern recognition, Viewpoints, Object (computer science), Weighting, Domain (software engineering), Signal Processing, Computer Vision and Pattern Recognition, Artificial intelligence, Focus (optics), Software
Person re-identification is a standard and established problem in the computer vision community. In recent years, vehicle re-identification has also been getting more attention. In this paper, we focus on both of these tasks and propose a method for aggregating features in the temporal domain, as it is common to have multiple observations of the same object. The aggregation is based on weighting different elements of the feature vectors by different weights, and it is trained in an end-to-end manner by a Siamese network. The experimental results show that our method outperforms other existing methods for feature aggregation in the temporal domain on both vehicle and person re-identification tasks. Furthermore, to push research in vehicle re-identification further, we introduce a novel dataset, CarsReId74k. The dataset is not limited to frontal/rear viewpoints; it contains 17,681 unique vehicles, 73,976 observed tracks, and 277,236 positive pairs, and was captured by 66 cameras from various angles. (A minimal sketch of element-wise temporal weighting follows this entry.)
- Published
- 2019
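A minimal sketch of the element-wise temporal weighting described above: a small head predicts per-element weights, normalized over time, and the track is aggregated by a weighted average. Layer sizes are illustrative, and the Siamese training setup is omitted:

```python
import torch
import torch.nn as nn

class TemporalWeighting(nn.Module):
    """Aggregate T observations of one object into a single descriptor via
    element-wise weights predicted for each feature vector."""
    def __init__(self, dim):
        super().__init__()
        self.weight_net = nn.Linear(dim, dim)   # per-element weight logits

    def forward(self, feats):                   # feats: (T, dim)
        w = torch.softmax(self.weight_net(feats), dim=0)  # normalize over time
        return (w * feats).sum(dim=0)           # (dim,) aggregated descriptor

track = torch.randn(12, 256)                    # 12 observations of a vehicle
agg = TemporalWeighting(256)
print(agg(track).shape)  # torch.Size([256])
```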
381. Unified Image Aesthetic Prediction via Scanpath-Guided Feature Aggregation Network
- Author
-
Ying Yu, Xiaodan Zhang, Xinbo Gao, Lihuo He, and Wen Lu
- Subjects
Feature aggregation, Computer science, Pattern recognition, Convolutional neural network, Gaze, Artificial intelligence
The performance of automatic aesthetic prediction has improved significantly with deep convolutional neural networks (CNNs). However, existing CNN methods achieve only limited success because (1) most of them take one fixed-size patch as the training example, which loses fine-grained details and holistic layout information, and (2) most of them ignore biological cues such as the gaze-shifting sequence in image aesthetic assessment. To address these challenges, we propose a scanpath-guided feature aggregation model for aesthetic prediction. In our model, the human fixation map and the viewing scanpath are predicted by a multi-scale network. A sequence of regions is then adaptively selected according to the scanpath. These attended regions are progressively fed into CNN and LSTM networks to accumulate the information, yielding a compact image-level representation. Extensive experiments on the large-scale aesthetics assessment benchmark AVA and the Photo.net dataset thoroughly demonstrate the efficacy of our approach for unified aesthetic prediction tasks: (i) aesthetic quality classification; (ii) aesthetic score regression; and (iii) aesthetic score distribution prediction.
- Published
- 2019
- Full Text
- View/download PDF
382. PFA-ScanNet: Pyramidal Feature Aggregation with Synergistic Learning for Breast Cancer Metastasis Analysis
- Author
-
Hao Chen, Pheng-Ann Heng, Huangjing Lin, and Zixu Zhao
- Subjects
Feature aggregation, Computer science, Fast scanning, Cancer metastasis, Pattern recognition, Breast cancer metastasis, Convolutional neural network, Artificial intelligence
Automatic detection of cancer metastasis from whole slide images (WSIs) is a crucial step for patient staging and prognosis. Recent convolutional neural network based approaches struggle with the trade-off between accuracy and computational efficiency due to the difficulty of processing large-scale gigapixel WSIs. To meet this challenge, we propose a novel Pyramidal Feature Aggregation ScanNet (PFA-ScanNet) for robust and fast analysis of breast cancer metastasis. Our method mainly benefits from the aggregation of extracted local-to-global features with diverse receptive fields, as well as the proposed synergistic learning for training the main detector and an extra decoder with semantic guidance. Furthermore, a high-efficiency inference mechanism is designed with dense pooling layers, which allows dense and fast scanning for gigapixel WSI analysis. As a result, the proposed PFA-ScanNet achieved a state-of-the-art FROC of 89.1% on the Camelyon16 dataset, as well as a competitive kappa score of 0.905 on the Camelyon17 leaderboard without model ensembling. In addition, our method shows a leading speed advantage over other methods, about 7.2 min per WSI with a single GPU, making automatic analysis of breast cancer metastasis more applicable in clinical usage.
- Published
- 2019
- Full Text
- View/download PDF
383. On the Importance of Feature Aggregation for Face Reconstruction
- Author
-
Xiang Xu, Ioannis A. Kakadiaris, and Ha Le
- Subjects
Feature aggregation, Artificial neural network, Computer science, Pattern recognition, Reconstruction algorithm, Iterative reconstruction, Solid modeling, Set (abstract data type), Face (geometry), Artificial intelligence, Sensitivity (control systems)
The goal of this work is to seek principles for designing deep neural networks for 3D face reconstruction from a single image. To keep the evaluation simple, we generated a synthetic dataset and used it for evaluation. We conducted extensive experiments with an end-to-end face reconstruction algorithm based on E2FAR and its variations, and analyzed why it can be successfully applied to 3D face reconstruction. From the comparative studies, we conclude that feature aggregation from different layers is a key point in training better neural networks for 3D face reconstruction. Based on these observations, a face reconstruction feature aggregation network (FR-FAN) is proposed, which obtains significant improvements over baselines on the synthetic validation set. We evaluate our model on existing popular indoor and in-the-wild 2D-3D datasets. Extensive experiments demonstrate that FR-FAN performs 16.50% and 9.54% better than E2FAR on BU-3DFE and JNU-3D, respectively. Finally, the sensitivity analysis we performed on controlled datasets demonstrates that our designed network is robust to large variations in pose, illumination, and expression.
- Published
- 2019
- Full Text
- View/download PDF
384. Computationally Modeling the Impact of Task-Appropriate Language Complexity and Accuracy on Human Grading of German Essays
- Author
-
Anja Riemenschneider, Zarah Weiss, Detmar Meurers, and Pauline Schröter
- Subjects
Language complexity, Feature aggregation, Computer science, Rubric, German, Student writing, Educational science, Overall performance, Artificial intelligence, Grading (education), Natural language processing
Computational linguistic research on the language complexity of student writing typically involves human ratings as a gold standard. However, educational science shows that teachers find it difficult to identify and cleanly separate accuracy, different aspects of complexity, content, and structure. In this paper, we therefore explore the use of computational linguistic methods to investigate how task-appropriate complexity and accuracy relate to the grades teachers assign for overall performance, content performance, and language performance. Based on texts written by students for the official school-leaving state examination (Abitur), we show that teachers successfully assign higher language performance grades to essays with higher task-appropriate language complexity and properly separate this from content scores. Yet accuracy impacts teacher assessment across all grading rubrics, including the content score, overemphasizing the role of accuracy. Our analysis is based on broad computational linguistic modeling of German language complexity and an innovative theory- and data-driven feature aggregation method for inferring task-appropriate language complexity.
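The paper's own aggregation method is theory- and data-driven; as the simplest possible stand-in, the sketch below aggregates many raw complexity measures into one standardized index per essay. All shapes and the equal-weight choice are assumptions, not the authors' procedure.

```python
import numpy as np

def aggregate_complexity(features, weights=None):
    """features: (n_essays, n_measures) raw linguistic complexity measures."""
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    if weights is None:
        weights = np.ones(z.shape[1]) / z.shape[1]  # equal weights as a placeholder
    return z @ weights                              # one aggregate score per essay

scores = aggregate_complexity(np.random.rand(50, 12))
```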
- Published
- 2019
- Full Text
- View/download PDF
385. Multi-label Image Set Recognition in Visually-Aware Recommender Systems
- Author
-
Kirill V. Demochkin and Andrey V. Savchenko
- Subjects
Set (abstract data type) ,Training set ,Feature aggregation ,Computer science ,business.industry ,Feature vector ,Pattern recognition ,Artificial intelligence ,Recommender system ,Focus (optics) ,business ,Convolutional neural network ,Image (mathematics) - Abstract
In this paper we focus on the problem of multi-label image recognition for visually-aware recommender systems. We propose a two-stage approach in which a deep convolutional neural network is first fine-tuned on part of the training set, and an attention-based aggregation network is then trained to compute a weighted average of the visual features in an input image set. Our approach is implemented as a mobile fashion recommender application. We show experimentally on the Amazon Fashion dataset that our approach achieves an F1-measure of 0.58 for 15 recommendations, twice as good as the 0.25 F1-measure of conventional feature vector averaging.
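A minimal numpy sketch of the attention-weighted averaging described above: score each image's feature vector, softmax the scores, and take the weighted mean. The scoring vector `w` stands in for whatever the trained aggregation network would learn.

```python
import numpy as np

def attention_aggregate(feats, w):
    """feats: (n_images, d) visual features; w: (d,) learned scoring vector."""
    scores = feats @ w
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                 # attention weights over the image set
    return alpha @ feats                 # (d,) set-level descriptor

rng = np.random.default_rng(0)
set_descriptor = attention_aggregate(rng.normal(size=(7, 128)),
                                     rng.normal(size=128))
```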
- Published
- 2019
- Full Text
- View/download PDF
386. Multi-view X-Ray R-CNN
- Author
-
Stefan Roth, Jan-Martin O. Steitz, and Faraz Saeedan
- Subjects
0209 industrial biotechnology ,Exploit ,Feature aggregation ,Computer science ,business.industry ,Pipeline (computing) ,Pooling ,Pattern recognition ,02 engineering and technology ,Avionics ,Object detection ,Image (mathematics) ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Layer (object-oriented design) ,business - Abstract
Motivated by the detection of prohibited objects in carry-on luggage as part of aviation security screening, we develop a CNN-based object detection approach for multi-view X-ray image data. Our contributions are two-fold. First, we introduce a novel multi-view pooling layer that performs a 3D aggregation of the 2D CNN features extracted from each view. To that end, our pooling layer exploits the known geometry of the imaging system to ensure geometric consistency of the feature aggregation. Second, we introduce an end-to-end trainable multi-view detection pipeline based on Faster R-CNN, which derives region proposals and performs the final classification in 3D using these aggregated multi-view features. Our approach shows significant accuracy gains over single-view detection while being more efficient than performing single-view detection in each view separately.
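A toy sketch of geometry-aware multi-view pooling under stated assumptions: 3D grid points are projected into each view with a known projection matrix and the sampled 2D features are averaged across views (visibility and interpolation are ignored for brevity; the projection used in the example is a trivial stand-in, not a real scanner geometry).

```python
import numpy as np

def multiview_pool(view_feats, projections, grid_pts):
    """view_feats: list of (H, W, C); projections: list of (3, 4); grid_pts: (N, 4)."""
    H, W, C = view_feats[0].shape
    pooled = np.zeros((len(grid_pts), C))
    for feats, P in zip(view_feats, projections):
        uvw = grid_pts @ P.T                          # project homogeneous 3D points
        uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)
        valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
        pooled[valid] += feats[uv[valid, 1], uv[valid, 0]]
    return pooled / len(view_feats)                   # (N, C) 3D feature grid

P = np.hstack([np.eye(3), np.zeros((3, 1))])          # trivial pinhole stand-in
pts = np.array([[10.0, 5.0, 1.0, 1.0]])
out = multiview_pool([np.ones((32, 32, 8))], [P], pts)
```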
- Published
- 2019
- Full Text
- View/download PDF
387. High-resolution remote sensing image semantic segmentation based on a deep feature aggregation network
- Author
-
Shanwen Zhang, Jianxin Guo, Zhen Wang, and Wenzhun Huang
- Subjects
Feature aggregation ,Computer science ,business.industry ,Remote sensing (archaeology) ,Applied Mathematics ,High resolution ,Computer vision ,Segmentation ,Artificial intelligence ,business ,Instrumentation ,Engineering (miscellaneous) ,Image (mathematics) - Abstract
Semantic segmentation of high-resolution remote sensing images has a wide range of applications, such as territorial planning, geographic monitoring and smart cities. It remains challenging due to the complex and diverse transitions between different ground areas. Although several convolutional neural networks (CNNs) have been developed for remote sensing semantic segmentation, their performance is far from the expected target. This study presents a deep feature aggregation network (DFANet) for remote sensing image semantic segmentation. It is composed of a basic feature representation layer, an intermediate feature aggregation layer, a deep feature aggregation layer and a feature aggregation module (FAM). Specifically, the basic feature representation layer obtains feature maps at different resolutions; the intermediate and deep feature aggregation layers fuse features across resolutions and scales; the FAM splices the features to form richer spatial feature maps; and a conditional random field module refines the semantic segmentation results. We performed extensive experiments on the ISPRS two-dimensional Vaihingen and Potsdam remote sensing image datasets and compared the proposed method with several variants of semantic segmentation networks. The experimental results show that DFANet outperforms the other state-of-the-art approaches.
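A generic sketch of the splice-and-fuse operation such layers perform (not DFANet's exact FAM; channel counts are assumptions): upsample all maps to the finest resolution, concatenate along channels, and mix with a 1x1 convolution.

```python
import torch
import torch.nn.functional as F

def splice(maps, mixer):
    """maps: list of (B, C_i, H_i, W_i) at decreasing resolution."""
    target = maps[0].shape[2:]
    up = [maps[0]] + [F.interpolate(m, size=target, mode='bilinear',
                                    align_corners=False) for m in maps[1:]]
    return mixer(torch.cat(up, dim=1))   # (B, C_out, H_0, W_0)

mixer = torch.nn.Conv2d(16 + 32 + 64, 64, kernel_size=1)
fused = splice([torch.randn(1, 16, 64, 64), torch.randn(1, 32, 32, 32),
                torch.randn(1, 64, 16, 16)], mixer)
```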
- Published
- 2021
- Full Text
- View/download PDF
388. Toward More Robust and Real-Time Unmanned Aerial Vehicle Detection and Tracking via Cross-Scale Feature Aggregation Based on the Center Keypoint
- Author
-
Mengdao Xing, Liang Han, Guyo Chala Urgessa, Rui Chen, and Min Bao
- Subjects
Computational complexity theory ,center point estimation ,Computer science ,Science ,02 engineering and technology ,Tracking (particle physics) ,01 natural sciences ,Constant false alarm rate ,Region of interest ,Vehicle detection ,0103 physical sciences ,unmanned aerial vehicle ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,Point estimation ,010302 applied physics ,Feature aggregation ,business.industry ,Kalman filter ,cross-scale feature aggregation ,General Earth and Planetary Sciences ,020201 artificial intelligence & image processing ,region of interest ,Artificial intelligence ,business - Abstract
Unmanned aerial vehicles (UAVs) play an essential role in various applications, such as transportation and intelligent environmental sensing. However, due to camera motion and complex environments, it can be difficult to distinguish a UAV from its surroundings; traditional methods therefore often miss UAVs or generate false alarms. To address these issues, we propose a novel method for detecting and tracking UAVs. First, a cross-scale feature aggregation CenterNet (CFACN) is constructed to recognize UAVs. CFACN is an anchor-free center point estimation method that effectively decreases the false alarm rate, the misdetection of small targets, and computational complexity. Second, the region-of-interest scale-crop-resize (RSCR) method is used to merge CFACN and region-of-interest CFACN (ROI-CFACN), further improving accuracy at a lower computational cost. Finally, a Kalman filter is adopted to track the UAV. The effectiveness of our method is validated on a collected UAV dataset. The experimental results demonstrate that our method achieves higher accuracy at lower computational cost, outperforming BiFPN, CenterNet, YOLO, and their variants on the same dataset.
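For the tracking stage, a standard constant-velocity Kalman filter over the detected center point looks roughly like the sketch below (the paper's state definition and noise settings are not specified here; these values are placeholders).

```python
import numpy as np

dt = 1.0
F_mat = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                  [0, 0, 1, 0], [0, 0, 0, 1]], float)   # constant-velocity model
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)       # observe (x, y) center
Q, R = np.eye(4) * 1e-2, np.eye(2) * 1.0                 # placeholder noise covariances

def kalman_step(x, P, z):
    x, P = F_mat @ x, F_mat @ P @ F_mat.T + Q            # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                       # Kalman gain
    x = x + K @ (z - H @ x)                              # update with detected center z
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
x, P = kalman_step(x, P, np.array([120.0, 80.0]))
```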
- Published
- 2021
- Full Text
- View/download PDF
389. Blind Image Quality Assessment Based on Classification Guidance and Feature Aggregation
- Author
-
Cai Weipeng, Minyuan Wu, Cien Fan, Lian Zou, Yifeng Liu, and Ma Yang
- Subjects
Computer Networks and Communications ,Computer science ,Image quality ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,lcsh:TK7800-8360 ,02 engineering and technology ,Convolutional neural network ,Image (mathematics) ,Distortion ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Layer (object-oriented design) ,deep neural networks ,business.industry ,lcsh:Electronics ,020206 networking & telecommunications ,Pattern recognition ,Mixture model ,feature aggregation ,Hardware and Architecture ,Control and Systems Engineering ,Feature (computer vision) ,Signal Processing ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,blind image quality assessment
In this work, we present a convolutional neural network (CNN) named CGFA-CNN for blind image quality assessment (BIQA). A two-stage strategy is employed: Sub-Network I first identifies the distortion type in an image, and Sub-Network II then quantifies this distortion. Unlike most deep neural networks, we extract hierarchical features as descriptors to enhance the image representation, and we design a feature aggregation layer, trained end-to-end, that applies Fisher encoding to visual vocabularies modeled by Gaussian mixture models (GMMs). To cover both authentic and synthetic distortions, the hierarchical feature combines the characteristics of a CNN trained on our self-built dataset with those of a CNN trained on ImageNet. We evaluated our algorithm on four publicly available databases, and the results demonstrate that CGFA-CNN has superior performance over other methods on both synthetic and authentic databases.
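A loose sketch of Fisher-style aggregation over a GMM vocabulary, simplified to first-order (mean-residual) statistics only; the paper's actual layer includes further normalization terms not reproduced here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_encode(descriptors, gmm):
    """descriptors: (n, d) local features from one image."""
    gamma = gmm.predict_proba(descriptors)            # (n, K) soft assignments
    parts = []
    for k in range(gmm.n_components):
        resid = descriptors - gmm.means_[k]           # residuals to component k
        parts.append((gamma[:, k:k+1] * resid).mean(axis=0))
    return np.concatenate(parts)                      # (K * d,) image descriptor

rng = np.random.default_rng(1)
vocab = GaussianMixture(n_components=4, random_state=0).fit(rng.normal(size=(500, 16)))
code = gmm_encode(rng.normal(size=(200, 16)), vocab)
```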
- Published
- 2020
- Full Text
- View/download PDF
390. Adaptive deep feature aggregation using Fourier transform and low-pass filtering for robust object retrieval
- Author
-
Zhongyu Li, Chen Li, Ziyao Zhou, Xinsheng Wang, and Ming Zeng
- Subjects
Feature aggregation ,business.industry ,Computer science ,Deep learning ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Convolutional neural network ,Weighting ,symbols.namesake ,Fourier transform ,Robustness (computer science) ,Frequency domain ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,symbols ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Image retrieval - Abstract
With the rapid development of deep learning techniques, convolutional neural networks (CNNs) have been widely investigated for feature representations in image retrieval. However, the key step in CNN-based retrieval, feature aggregation, has not been solved in a robust and general manner for different kinds of images. In this paper, we present a deep feature aggregation method for image retrieval using the Fourier transform and low-pass filtering, which adaptively computes a discriminative weight for each feature map. Specifically, low-pass filtering preserves the semantic information in each feature map by transforming images to the frequency domain. In addition, we develop three adaptive methods to further improve the robustness of feature aggregation: region-of-interest (ROI) selection, spatial weighting and channel weighting. Experimental results on five benchmark datasets demonstrate the superiority of the proposed method over other state-of-the-art approaches in achieving robust and accurate object retrieval.
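A sketch under stated assumptions of the frequency-domain weighting idea: weight each feature map by its low-frequency energy, computed with a centered FFT and a circular mask. The mask radius and the final pooling step are placeholders, not the paper's exact scheme.

```python
import numpy as np

def lowpass_weights(feature_maps, radius=4):
    """feature_maps: (C, H, W) CNN activations from one image."""
    C, H, W = feature_maps.shape
    yy, xx = np.mgrid[:H, :W]
    mask = (yy - H // 2) ** 2 + (xx - W // 2) ** 2 <= radius ** 2
    weights = np.empty(C)
    for c in range(C):
        spectrum = np.fft.fftshift(np.fft.fft2(feature_maps[c]))
        weights[c] = np.abs(spectrum[mask]).sum()     # low-frequency energy
    return weights / (weights.sum() + 1e-8)

fm = np.random.rand(64, 14, 14)
w = lowpass_weights(fm)
descriptor = fm.sum(axis=(1, 2)) * w                  # channel-weighted pooling
```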
- Published
- 2020
- Full Text
- View/download PDF
391. Action Recognition Using Deep 3D CNNs with Sequential Feature Aggregation and Attention
- Author
-
Fazliddin Anvarov, Dae Ha Kim, and Byung Cheol Song
- Subjects
Computer Networks and Communications ,Computer science ,lcsh:TK7800-8360 ,02 engineering and technology ,Convolutional neural network ,Field (computer science) ,Image (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,action recognition ,Feature aggregation ,business.industry ,lcsh:Electronics ,020206 networking & telecommunications ,Pattern recognition ,3D CNN ,Action (philosophy) ,Hardware and Architecture ,Control and Systems Engineering ,Filter (video) ,Signal Processing ,Action recognition ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,deep feature attention - Abstract
Action recognition is an active research field that aims to recognize human actions and intentions from a series of observations of human behavior and the environment. Unlike image-based action recognition, which mainly uses two-dimensional (2D) convolutional neural networks (CNNs), video-based action recognition must characterize both short-term small movements and long-term temporal appearance information. Previous methods analyze video action behavior using only a basic 3D CNN framework, which limits the analysis of fast movements or abruptly appearing objects because of the limited coverage of the convolutional filters. In this paper, we propose aggregating squeeze-and-excitation (SE) and self-attention (SA) modules with a 3D CNN to analyze both short- and long-term temporal action behavior efficiently. We successfully implemented SE and SA modules in a novel approach to video action recognition that builds upon current state-of-the-art methods and demonstrates better performance on the UCF-101 and HMDB51 datasets. For example, with the ResNeXt-101 architecture in a 3D CNN, we obtain accuracies of 92.5% (16f-clip) and 95.6% (64f-clip) on UCF-101, and 68.1% (16f-clip) and 74.1% (64f-clip) on HMDB51.
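For reference, a standard squeeze-and-excitation block looks like the sketch below; the paper applies this idea inside a 3D CNN, shown here in 2D for brevity.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        s = x.mean(dim=(2, 3))                  # squeeze: global average pool
        w = self.fc(s)                          # excitation: per-channel weights
        return x * w[:, :, None, None]          # recalibrate feature maps

out = SEBlock(64)(torch.randn(2, 64, 8, 8))
```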
- Published
- 2020
- Full Text
- View/download PDF
392. A NEW APPROACH FOR FEATURE POINT CLASSIFICATION, AGGREGATION, AND DESCRIPTION.
- Author
-
CHEN, LING-HWEI
- Abstract
In this paper, an edge point, line point, or curve point is called a feature point. A new approach to extract junction points and to describe feature points is proposed. It takes as input a binary image produced by a feature detector, without thinning. Each black point in the binary image is first classified based on the number of lines passing through it and on the local property that the classes of its neighboring points are almost the same. Next, an aggregation method groups the classified points into segments, with the orientation of each segment kept either clockwise or counterclockwise. Conic curves are then used to describe these segments. Finally, junction points, including corner points, cross points, branch points, and inflection points, are located. Notably, the proposed method uses neither a thinning process nor curvature information. The effectiveness of the approach is verified by one illustrative example and two experimental results. [ABSTRACT FROM AUTHOR]
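As a hedged illustration of classifying a pixel by the number of lines through it, the classic crossing-number test counts 0-to-1 transitions around the 8-neighborhood; the paper's own classification rule may differ since it operates without thinning.

```python
import numpy as np

def crossing_number(img, r, c):
    """img: binary 2-D array; counts 0->1 transitions around pixel (r, c)."""
    ring = [img[r-1, c-1], img[r-1, c], img[r-1, c+1], img[r, c+1],
            img[r+1, c+1], img[r+1, c], img[r+1, c-1], img[r, c-1]]
    return sum(ring[i] == 0 and ring[(i + 1) % 8] == 1 for i in range(8))

img = np.zeros((5, 5), int)
img[2, :] = 1                      # a horizontal line
print(crossing_number(img, 2, 2))  # 2 -> an interior line point
```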
- Published
- 1992
- Full Text
- View/download PDF
393. Main Aortic Segmentation from CTA with Deep Feature Aggregation Network
- Author
-
Wenji Wang and Haogang Zhu
- Subjects
Feature aggregation ,Computer science ,Generalization ,business.industry ,0206 medical engineering ,Pattern recognition ,02 engineering and technology ,Image segmentation ,020601 biomedical engineering ,Convolution ,Visualization ,Level set ,0202 electrical engineering, electronic engineering, information engineering ,Medical imaging ,020201 artificial intelligence & image processing ,Segmentation ,Artificial intelligence ,business - Abstract
In this study, we propose a Deep Feature Aggregation network (DFA-Net) for main aortic segmentation from CTA (computed tomography angiography), which aggregates features from forward layers to leverage more visual information. To verify the effectiveness of our method in practice, we collected 90 CTA volumes from Beijing AnZhen Hospital, comprising over 60 thousand 2-D slices. First, we use a level-set-based algorithm to efficiently generate the dataset for training and validating the deep model. The dataset is then divided into three parts: 70 instances for training, 5 instances for validating the best parameters, and the remaining 15 instances for testing the generalization of the model. The testing result shows that the mIoU (mean Intersection-over-Union) of the segmentation is 0.943, indicating that by properly aggregating more visual features in a deep network, the segmentation model can achieve state-of-the-art performance.
- Published
- 2018
- Full Text
- View/download PDF
394. Long Length Document Classification by Local Convolutional Feature Aggregation
- Author
-
Cong Zhenghai, Jun He, Liu Kaile, Jiali Zhao, Liu Liu, and Ji Yefei
- Subjects
lcsh:T55.4-60.8 ,Computer science ,convolutional feature aggregation ,recurrent attention model ,02 engineering and technology ,computer.software_genre ,Convolutional neural network ,lcsh:QA75.5-76.95 ,Theoretical Computer Science ,03 medical and health sciences ,0302 clinical medicine ,0202 electrical engineering, electronic engineering, information engineering ,Reinforcement learning ,lcsh:Industrial engineering. Management engineering ,Numerical Analysis ,Feature aggregation ,business.industry ,document classification ,Deep learning ,Document classification ,Sentiment analysis ,deep learning ,Computational Mathematics ,Recurrent neural network ,Computational Theory and Mathematics ,020201 artificial intelligence & image processing ,recurrent neural network ,Data mining ,Artificial intelligence ,lcsh:Electronic computers. Computer science ,business ,computer ,Classifier (UML) ,030217 neurology & neurosurgery - Abstract
The exponential increase in online reviews and recommendations makes document classification and sentiment analysis a hot topic in academic and industrial research. Traditional deep-learning-based document classification methods require the full textual information to extract features. In this paper, to tackle long documents, we propose three methods that use local convolutional feature aggregation for document classification. The first method randomly draws blocks of contiguous words from the full document; each block is fed into a convolutional neural network to extract features, which are then concatenated and passed through a classifier to output the classification probability. The second model improves on the first by capturing the contextual order of the sampled blocks with a recurrent neural network. The third model is inspired by the recurrent attention model (RAM), introducing a reinforcement learning module that acts as a controller, selecting the next block position based on the recurrent state. Experiments on our collected four-class arXiv paper dataset show that all three proposed models perform well, and the RAM model achieves the best test accuracy while using the least information.
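A sketch of the first variant under assumed details: sample random contiguous token blocks, extract one CNN feature per block, and concatenate for classification. The block length, block count, vocabulary size, and layer widths are all illustrative.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(5000, 32)                  # toy vocabulary and embedding
conv = nn.Sequential(nn.Conv1d(32, 64, kernel_size=5), nn.ReLU(),
                     nn.AdaptiveMaxPool1d(1))
classifier = nn.Linear(64 * 4, 4)             # 4 blocks -> 4 classes

def classify(doc_tokens, block_len=50, n_blocks=4):
    starts = torch.randint(0, len(doc_tokens) - block_len, (n_blocks,))
    feats = []
    for s in starts.tolist():
        block = emb(doc_tokens[s:s + block_len]).T.unsqueeze(0)   # (1, 32, L)
        feats.append(conv(block).flatten())                       # (64,) per block
    return classifier(torch.cat(feats))                           # class logits

logits = classify(torch.randint(0, 5000, (3000,)))
```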
- Published
- 2018
395. Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation
- Author
-
Yang Li, Kan Li, and Xinxin Wang
- Subjects
Feature aggregation ,Computer science ,business.industry ,0502 economics and business ,05 social sciences ,Action recognition ,Pattern recognition ,Artificial intelligence ,050207 economics ,010501 environmental sciences ,business ,01 natural sciences ,0105 earth and related environmental sciences - Abstract
In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits powerful hierarchical features of CNNs. In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers. Moreover, we train this model in a deep supervision manner, which brings improvement in both performance and efficiency. Meanwhile, in order to capture the temporal structure as well as preserve more details about actions, we propose a trainable aggregation module. It models the temporal evolution of each spatial location and projects them into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. This deeply-supervised CNN model integrating the powerful aggregation module provides a promising solution to recognize actions in videos. We conduct experiments on two action recognition datasets: HMDB51 and UCF101. Results show that our model outperforms the state-of-the-art methods.
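For context, classic fixed-codebook VLAD aggregation looks like the sketch below; the paper's module makes this step trainable inside the network, which is not reproduced here, and the shapes are assumptions.

```python
import numpy as np

def vlad(descriptors, centers):
    """descriptors: (n, d) local features; centers: (K, d) codebook."""
    K, d = centers.shape
    assign = np.argmin(((descriptors[:, None] - centers) ** 2).sum(-1), axis=1)
    v = np.zeros((K, d))
    for k in range(K):
        if np.any(assign == k):
            v[k] = (descriptors[assign == k] - centers[k]).sum(axis=0)
    v = v.flatten()
    v = np.sign(v) * np.sqrt(np.abs(v))        # power normalization
    return v / (np.linalg.norm(v) + 1e-12)     # L2 normalization

code = vlad(np.random.rand(100, 16), np.random.rand(8, 16))
```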
- Published
- 2018
- Full Text
- View/download PDF
396. Rare Feature Selection in High Dimensions
- Author
-
Jacob Bien and Xiaohan Yan
- Subjects
FOS: Computer and information sciences ,Statistics and Probability ,Computer science ,Machine Learning (stat.ML) ,Feature selection ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,Predictor variables ,01 natural sciences ,Statistics - Computation ,Methodology (stat.ME) ,010104 statistics & probability ,Statistics - Machine Learning ,0502 economics and business ,FOS: Mathematics ,0101 mathematics ,Computation (stat.CO) ,Statistics - Methodology ,050205 econometrics ,Feature aggregation ,business.industry ,05 social sciences ,Pattern recognition ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business - Abstract
It is common in modern prediction problems for many predictor variables to be counts of rarely occurring events. This leads to design matrices in which many columns are highly sparse. The challenge posed by such "rare features" has received little attention despite its prevalence in diverse areas, ranging from natural language processing (e.g., rare words) to biology (e.g., rare species). We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. Our strategy leverages side information in the form of a tree that encodes feature similarity. We apply our method to data from TripAdvisor, in which we predict the numerical rating of a hotel based on the text of the associated review. Our method achieves high accuracy by making effective use of rare words; by contrast, the lasso is unable to identify highly predictive words if they are too rare. A companion R package, called rare, implements our new estimator, using the alternating direction method of multipliers.
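A toy sketch of the aggregation idea (not the paper's estimator, which learns the merging via a penalized fit): sum counts of rare sibling features under their parent node in a similarity tree. The group names and counts below are hypothetical.

```python
import numpy as np

X = np.array([[1, 0, 0, 2],        # documents x rare-word counts
              [0, 1, 0, 0],
              [0, 0, 3, 1]])
tree = {"positive_tone": [0, 1],   # hypothetical sibling groups from the tree
        "negative_tone": [2, 3]}

aggregated = np.column_stack([X[:, cols].sum(axis=1) for cols in tree.values()])
# aggregated is dense enough to support a stable linear fit, unlike raw X
```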
- Published
- 2018
397. Deep neural networks for automatic detection of osteoporotic vertebral fractures on CT scans
- Author
-
Saeed Hassanpour, Naofumi Tomita, and Yvonne Y. Cheung
- Subjects
medicine.medical_specialty ,Imaging informatics ,Health Informatics ,Computed tomography ,030218 nuclear medicine & medical imaging ,03 medical and health sciences ,0302 clinical medicine ,Image Interpretation, Computer-Assisted ,medicine ,Humans ,Stage (cooking) ,Pelvis ,Feature aggregation ,medicine.diagnostic_test ,business.industry ,Sagittal plane ,Computer Science Applications ,medicine.anatomical_structure ,030220 oncology & carcinogenesis ,Radiological weapon ,Deep neural networks ,Osteoporosis ,Spinal Fractures ,Radiology ,Neural Networks, Computer ,business ,Tomography, X-Ray Computed ,Osteoporotic Fractures - Abstract
Osteoporotic vertebral fractures (OVFs) are prevalent in older adults and are associated with substantial personal suffering and socio-economic burden. Early diagnosis and treatment of OVFs are critical to prevent further fractures and morbidity. However, OVFs are often under-diagnosed and under-reported in computed tomography (CT) exams as they can be asymptomatic at an early stage. In this paper, we present and evaluate an automatic system that can detect incidental OVFs in chest, abdomen, and pelvis CT examinations at the level of practicing radiologists. Our OVF detection system leverages a deep convolutional neural network (CNN) to extract radiological features from each slice in a CT scan. These extracted features are processed through a feature aggregation module to make the final diagnosis for the full CT scan. In this work, we explored different methods for this feature aggregation, including the use of a long short-term memory (LSTM) network. We trained and evaluated our system on 1432 CT scans, comprised of 10,546 two-dimensional (2D) images in sagittal view. Our system achieved an accuracy of 89.2% and an F1 score of 90.8% based on our evaluation on a held-out test set of 129 CT scans, which were established as reference standards through standard semiquantitative and quantitative methods. The results of our system matched the performance of practicing radiologists on this test set in real-world clinical circumstances. We expect the proposed system will assist and improve OVF diagnosis in clinical settings by pre-screening routine CT examinations and flagging suspicious cases prior to review by radiologists.
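A hedged sketch of the LSTM variant of the feature aggregation module: an LSTM consumes per-slice CNN features in sagittal order and the final hidden state drives the scan-level diagnosis. The feature dimension, hidden size, and two-class head are assumptions.

```python
import torch
import torch.nn as nn

class ScanAggregator(nn.Module):
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # fracture vs. no fracture

    def forward(self, slice_feats):             # (B, n_slices, feat_dim)
        _, (h_n, _) = self.lstm(slice_feats)
        return self.head(h_n[-1])               # logits per scan

logits = ScanAggregator()(torch.randn(2, 40, 512))
```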
- Published
- 2018
398. Learning Local Feature Aggregation Functions with Backpropagation
- Author
-
Despoina Paschalidou, Anastasios Delopoulos, Christos Diou, and Angelos Katharopoulos
- Subjects
FOS: Computer and information sciences ,Noise measurement ,Feature aggregation ,Computer science ,business.industry ,Feature vector ,Feature extraction ,Machine Learning (stat.ML) ,Fisher vector ,Pattern recognition ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Backpropagation ,Machine Learning (cs.LG) ,Computer Science - Learning ,Bag-of-words model ,Statistics - Machine Learning ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Classifier (UML) ,0105 earth and related environmental sciences - Abstract
This paper introduces a family of local feature aggregation functions and a novel method to estimate their parameters, such that they generate optimal representations for classification (or any task that can be expressed as a cost function minimization problem). To achieve that, we compose the local feature aggregation function with the classifier cost function and we backpropagate the gradient of this cost function in order to update the local feature aggregation function parameters. Experiments on synthetic datasets indicate that our method discovers parameters that model the class-relevant information in addition to the local feature space. Further experiments on a variety of motion and visual descriptors, both on image and video datasets, show that our method outperforms other state-of-the-art local feature aggregation functions, such as Bag of Words, Fisher Vectors and VLAD, by a large margin. Presented at the 25th European Signal Processing Conference (EUSIPCO 2017).
- Published
- 2018
- Full Text
- View/download PDF
399. Pyramid-Net: Intra-layer Pyramid-Scale Feature Aggregation Network for Retinal Vessel Segmentation.
- Author
-
Zhang J, Zhang Y, Qiu H, Xie W, Yao Z, Yuan H, Jia Q, Wang T, Shi Y, Huang M, Zhuang J, and Xu X
- Abstract
Retinal vessel segmentation plays an important role in the diagnosis of eye-related diseases and biomarker discovery. Existing works perform multi-scale feature aggregation in an inter-layer manner, namely inter-layer feature aggregation. However, such an approach only fuses features at either a lower scale or a higher scale, which may limit segmentation performance, especially on thin vessels. This discovery motivates us to fuse multi-scale features within each layer, intra-layer feature aggregation, to mitigate the problem. Therefore, in this paper, we propose Pyramid-Net for accurate retinal vessel segmentation, which features intra-layer pyramid-scale aggregation blocks (IPABs). At each layer, an IPAB generates two associated branches, one at a higher scale and one at a lower scale, which operate with the main branch at the current scale in a pyramid-scale manner. Three further enhancements, pyramid input enhancement, deep pyramid supervision, and pyramid skip connections, are proposed to boost performance. We have evaluated Pyramid-Net on three public retinal fundus photography datasets (DRIVE, STARE, and CHASE-DB1). The experimental results show that Pyramid-Net can effectively improve segmentation performance, especially on thin vessels, and outperforms the current state-of-the-art methods on all three datasets. In addition, our method is more efficient than existing methods, with a large reduction in computational cost. The source code is available at https://github.com/JerRuy/Pyramid-Net.
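A rough reading of an intra-layer pyramid block, with channel counts and summation-based fusion as assumptions rather than the published IPAB design: the same features are processed at a lower and a higher scale alongside the main branch, then brought back to the original resolution and fused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.main = nn.Conv2d(channels, channels, 3, padding=1)
        self.low = nn.Conv2d(channels, channels, 3, padding=1)
        self.high = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        size = x.shape[2:]
        down = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
        up = F.interpolate(x, scale_factor=2.0, mode='bilinear', align_corners=False)
        low = F.interpolate(self.low(down), size=size, mode='bilinear', align_corners=False)
        high = F.interpolate(self.high(up), size=size, mode='bilinear', align_corners=False)
        return F.relu(self.main(x) + low + high)   # fuse the three scales

out = PyramidBlock(32)(torch.randn(1, 32, 48, 48))
```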
- Published
- 2021
- Full Text
- View/download PDF
400. Global context guided hierarchically residual feature refinement network for defocus blur detection.
- Author
-
Zhai, Yongping, Wang, Junhua, Deng, Jinsheng, Yue, Guanghui, Zhang, Wei, and Tang, Chang
- Subjects
- *
MODULAR coordination (Architecture) , *COMPUTER vision , *FEATURE extraction , *SIGNAL convolution , *THEATRICAL scenery , *METAL refining - Abstract
• We propose a novel deep neural network for defocus blur detection. • A global context pooling module is designed to capture global context information. • A HRFRM module is designed to refine stage-wise outputs. • A feature fusion module is designed to integrate the outputs of different stages. As an important pre-processing step, defocus blur detection plays a critical role in various computer vision tasks. However, previous methods cannot obtain satisfactory results due to complex image background clutter, scale sensitivity and the loss of region boundary details. In this paper, to address these issues, we introduce a global context guided hierarchically residual feature refinement network (HRFRNet) for defocus blur detection in natural images. In our network, low-level fine detail features, high-level semantics and global context information are aggregated in a hierarchical manner to boost the final detection performance. To reduce the effect of complex background clutter and of smooth regions without enough texture, we design a multi-scale dilated convolution based global context pooling module that captures global context information from the deepest feature layer of the backbone feature extraction network. A global context guiding module then injects this global context into the different feature refining stages to guide the refining process. In addition, considering that defocus blur is sensitive to image scale, we add a deep-features-guided fusion module to integrate the outputs of different stages into the final score map. Extensive experiments with ablation studies on two commonly used datasets validate the superiority of our proposed network over 11 state-of-the-art methods in terms of both efficiency and accuracy. [ABSTRACT FROM AUTHOR]
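A speculative sketch of multi-scale dilated global context pooling; the channel sizes and dilation rates below are assumptions, not the paper's configuration. Parallel dilated convolutions over the deepest features are concatenated and projected into a context map that can guide later refinement stages.

```python
import torch
import torch.nn as nn

class GlobalContextPool(nn.Module):
    def __init__(self, in_ch=256, branch_ch=64, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d) for d in dilations)
        self.project = nn.Conv2d(branch_ch * len(dilations), in_ch, 1)

    def forward(self, x):
        ctx = torch.cat([b(x) for b in self.branches], dim=1)
        return self.project(ctx)            # global context map for guidance

ctx = GlobalContextPool()(torch.randn(1, 256, 20, 20))
```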
- Published
- 2021
- Full Text
- View/download PDF