Bidirectional Gated Temporal Convolution with Attention for text classification
- Author
- Jiansi Ren, Ruoxiang Wang, Gang Liu, Zhe Chen, and Wei Wu
- Subjects
- Computer science, Artificial Intelligence, Deep learning, Pattern recognition, Feature extraction, Feature aggregation, Convolution, Cognitive Neuroscience, Computer Science Applications
- Abstract
In text classification models based on deep learning, feature extraction and feature aggregation are two key steps. As one of the basic feature extraction methods, the CNN is limited by its inability to effectively extract temporal features from text data. Max-pooling greatly reduces the amount of computation during feature aggregation, but because it discards some text features, it has an adverse effect on classification results. In this paper, to address these two issues, a Bidirectional Gated Temporal Convolutional Attention (BG-TCA) model is proposed. In the feature extraction stage, the BG-TCA model uses a bidirectional TCN to extract bidirectional temporal features from text data, with a gating mechanism similar to that of the LSTM added between the convolutional layers. In the feature aggregation stage, the BG-TCA model replaces max-pooling with an attention mechanism, which distinguishes the importance of different features while retaining text features as fully as possible. Finally, experimental results on five benchmark datasets show that the BG-TCA model achieves substantially higher classification accuracy than the basic models and outperforms several other state-of-the-art models.
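To make the two ideas in the abstract concrete, below is a minimal sketch in PyTorch of a gated temporal convolution applied in both reading directions for feature extraction, followed by attention-based pooling in place of max-pooling for feature aggregation. The layer sizes, the exact gating form, and the attention scoring function are illustrative assumptions, not the authors' published BG-TCA configuration.

```python
# Minimal sketch of the BG-TCA idea (assumed PyTorch implementation, not the
# authors' exact architecture): bidirectional gated temporal convolution for
# feature extraction, attention pooling instead of max-pooling for aggregation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedTemporalConv(nn.Module):
    """Causal 1-D convolution with a tanh/sigmoid gate (assumed gating form)."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-pad only, to stay causal
        self.conv_f = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.conv_g = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                        # x: (batch, channels, time)
        x_pad = F.pad(x, (self.pad, 0))
        return torch.tanh(self.conv_f(x_pad)) * torch.sigmoid(self.conv_g(x_pad))

class BGTCASketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.fwd = GatedTemporalConv(embed_dim)   # reads the sequence left-to-right
        self.bwd = GatedTemporalConv(embed_dim)   # reads the reversed sequence
        self.att = nn.Linear(2 * embed_dim, 1)    # attention scores replace max-pooling
        self.out = nn.Linear(2 * embed_dim, num_classes)

    def forward(self, tokens):                    # tokens: (batch, time)
        e = self.embed(tokens).transpose(1, 2)    # (batch, embed_dim, time)
        h_fwd = self.fwd(e)
        h_bwd = self.bwd(e.flip(-1)).flip(-1)     # bidirectional temporal features
        h = torch.cat([h_fwd, h_bwd], dim=1).transpose(1, 2)       # (batch, time, 2*dim)
        weights = torch.softmax(self.att(h).squeeze(-1), dim=-1)   # per-position importance
        pooled = (weights.unsqueeze(-1) * h).sum(dim=1)            # weighted aggregation
        return self.out(pooled)

# Example usage with toy shapes:
# logits = BGTCASketch(vocab_size=30000)(torch.randint(0, 30000, (8, 50)))
```

In this sketch the attention weights give every position a learned contribution to the pooled representation, which is the property the abstract credits for avoiding the feature loss of max-pooling.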
- Published
- 2021