Descriptor: "pyramid" / Publisher: ieee - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"pyramid"' showing total 760 results

1. EAPT: Efficient Attention Pyramid Transformer for Image Processing.

Author: Lin, Xiao, Sun, Shuzhou, Huang, Wei, Sheng, Bin, Li, Ping, and Feng, David Dagan
Abstract: Recent transformer-based models, especially patch-based methods, have shown huge potentiality in vision tasks. However, the split fixed-size patches divide the input features into the same size patches, which ignores the fact that vision elements are often various and thus may destroy the semantic information. Also, the vanilla patch-based transformer cannot guarantee the information communication between patches, which will prevent the extraction of attention information with a global view. To circumvent those problems, we propose an Efficient Attention Pyramid Transformer (EAPT). Specifically, we first propose the Deformable Attention, which learns an offset for each position in patches. Thus, even with split fixed-size patches, our method can still obtain non-fixed attention information that can cover various vision elements. Then, we design the Encode-Decode Communication module (En-DeC module), which can obtain communication information among all patches to get more complete global attention information. Finally, we propose a position encoding specifically for vision transformers, which can be used for patches of any dimension and any length. Extensive experiments on the vision tasks of image classification, object detection, and semantic segmentation demonstrate the effectiveness of our proposed model. Furthermore, we also conduct rigorous ablation studies to evaluate the key components of the proposed structure. [ABSTRACT FROM AUTHOR]
Published: 2023

2. RAt-CapsNet: A Deep Learning Network Utilizing Attention and Regional Information for Abnormality Detection in Wireless Capsule Endoscopy

Author: Md. Jahin Alam, Rifat Bin Rashid, Shaikh Anowarul Fattah, and Mohammad Saquib
Subjects: Wireless capsule endoscopy, deep CNN, GI tract, attention mechanism, pyramid, Computer applications to medicine. Medical informatics, R858-859.7, Medical technology, R855-855.5
Abstract: Background: The emergence of wireless capsule endoscopy (WCE) has presented a viable non-invasive means of identifying gastrointestinal diseases in the field of clinical gastroenterology. However, to overcome its extended time of manual inspection, computer-aided automatic detection systems are gaining wide popularity. In this case, major challenges are low resolution and lack of regional context in images extracted from WCE videos. Methods: For tackling these challenges, in this paper a convolutional neural network (CNN) based architecture, namely RAt-CapsNet, is proposed that reliably employs regional information and an attention mechanism to classify abnormalities from WCE video data. The proposed RAt-CapsNet consists of two major pipelines: Compression Pipeline and Regional Correlative Pipeline. In the compression pipeline, an encoder module is designed using a Volumetric Attention Mechanism which provides 3D enhancement to feature maps using spatial domain condensation as well as channel-wise filtering for preserving relevant structural information of images. On the other hand, the regional correlative pipeline consists of a Pyramid Feature Extractor which operates on image driven feature vectors to generalize and propagate local relationships of pixels from WCE abnormalities with respect to the normal healthy surrounding. The feature vectors generated by the pipelines are then accumulated to formulate a classification standpoint. Results: Promising computational accuracy of mean 98.51% in binary class and over 95.65% in multi-class are obtained through extensive experimentation on a highly unbalanced public dataset with over 47 thousand labelled. Conclusion: This outcome in turn supports the efficacy of the proposed methodology as a noteworthy WCE abnormality detection as well as diagnostic system.
Published: 2022

3. SDTP: Semantic-Aware Decoupled Transformer Pyramid for Dense Image Prediction.

Author: Li, Zekun, Liu, Yufan, Li, Bing, Feng, Bailan, Wu, Kebin, Peng, Chengwei, and Hu, Weiming
Subjects: *PYRAMIDS, *COMPUTER vision, *FORECASTING, *SEMANTICS, *IMAGE segmentation, *PROBLEM solving
Abstract: Although transformer has achieved great progress on computer vision tasks, the scale variation in dense image prediction is still the key challenge. Few effective multi-scale techniques are applied in transformer and there are two main limitations in the current methods. On the one hand, self-attention module in vanilla transformer fails to sufficiently exploit the diversity of semantic information because of its rigid mechanism. On the other hand, it is difficult to build attention and interaction among different levels due to the heavy computational burden. To alleviate this problem, we first revisit multi-scale problem in dense prediction, verifying the significance of diverse semantic representation and multi-scale interaction, and exploring the adaptation of transformer to pyramidal structure. Inspired by these findings, we propose a novel Semantic-aware Decoupled Transformer Pyramid (SDTP) for dense image prediction, consisting of Intra-level Semantic Promotion (ISP), Cross-level Decoupled Interaction (CDI) and Attention Refinement Function (ARF). ISP explores the semantic diversity in different receptive space through more flexible self-attention strategy. CDI builds the global attention and interaction among different levels in decoupled space which also solves the problem of heavy computation. Besides, ARF is further added to refine the attention in transformer. Experimental results demonstrate the validity and generality of the proposed method, which outperforms the state-of-the-art by a significant margin in dense image prediction tasks. Furthermore, the proposed components are all plug-and-play, which can be embedded in other methods. [ABSTRACT FROM AUTHOR]
Published: 2022

4. Pyramid Convolutional RNN for MRI Image Reconstruction.

Author: Chen, Eric Z., Wang, Puyang, Chen, Xiao, Chen, Terrence, and Sun, Shanhui
Subjects: *MAGNETIC resonance imaging, *PYRAMIDS, *KNEE, *DEEP learning
Abstract: Fast and accurate MRI image reconstruction from undersampled data is crucial in clinical practice. Deep learning based reconstruction methods have shown promising advances in recent years. However, recovering fine details from undersampled data is still challenging. In this paper, we introduce a novel deep learning based method, Pyramid Convolutional RNN (PC-RNN), to reconstruct images from multiple scales. Based on the formulation of MRI reconstruction as an inverse problem, we design the PC-RNN model with three convolutional RNN (ConvRNN) modules to iteratively learn the features in multiple scales. Each ConvRNN module reconstructs images at different scales and the reconstructed images are combined by a final CNN module in a pyramid fashion. The multi-scale ConvRNN modules learn a coarse-to-fine image reconstruction. Unlike other common reconstruction methods for parallel imaging, PC-RNN does not employ coil sensitive maps for multi-coil data and directly model the multiple coils as multi-channel inputs. The coil compression technique is applied to standardize data with various coil numbers, leading to more efficient training. We evaluate our model on the fastMRI knee and brain datasets and the results show that the proposed model outperforms other methods and can recover more details. The proposed method is one of the winner solutions in the 2019 fastMRI competition. [ABSTRACT FROM AUTHOR]
Published: 2022

5. Mixed-Scale Unet Based on Dense Atrous Pyramid for Monocular Depth Estimation

Author: Yifan Yang, Yuqing Wang, Chenhao Zhu, Ming Zhu, Haijiang Sun, and Tianze Yan
Subjects: Atrous convolution, dense connection, local and global, multi-scale, pyramid, Unet, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Monocular depth estimation is an undirected problem, so constructing a network to predict better image depth information is an important research topic. This paper proposes a mixed-scale Unet network (MAPUnet) with a dense atrous pyramid based on the coder-decoder structure widely used in computer vision. We innovatively introduce the Unet++ structure of the image segmentation network for depth estimation. We reset the number of convolutional layers of the network under the framework of the Unet++ network and innovatively connect the decoders densely. Moreover, by choosing the appropriate size of the atrous radius, we form a dense atrous pyramid based on different feature layers to better connect the features in the deep and shallow layers of the network. To verify the effectiveness of the proposed algorithm, we test the network on the KITTI dataset and the NYU Depth V2 dataset. We compare the network with the current state-of-the-art methods. The proposed method has higher accuracy and has steadily improved relative to the threshold of accuracy and root-mean-square error. We also conduct ablation studies, studies targeting the effectiveness of the network framework, and discussions on the convergence time and parameter complexity of the network. We will open-source the code at https://github.com/yang-yi-fan/MAPUnet.
Published: 2021

6. Pyramid-based Scatterplots Sampling for Progressive and Streaming Data Visualization.

Author: Chen, Xin, Zhang, Jian, Fu, Chi-Wing, Fekete, Jean-Daniel, and Wang, Yunhai
Subjects: DATA visualization, SCATTER diagrams, SPECIFIC gravity, VISUAL analytics, ELECTRONIC data processing, PROGRESSIVE collapse
Abstract: We present a pyramid-based scatterplot sampling technique to avoid overplotting and enable progressive and streaming visualization of large data. Our technique is based on a multiresolution pyramid-based decomposition of the underlying density map and makes use of the density values in the pyramid to guide the sampling at each scale for preserving the relative data densities and outliers. We show that our technique is competitive in quality with state-of-the-art methods and runs faster by about an order of magnitude. Also, we have adapted it to deliver progressive and streaming data visualization by processing the data in chunks and updating the scatterplot areas with visible changes in the density map. A quantitative evaluation shows that our approach generates stable and faithful progressive samples that are comparable to the state-of-the-art method in preserving relative densities and superior to it in keeping outliers and stability when switching frames. We present two case studies that demonstrate the effectiveness of our approach for exploring large data. [ABSTRACT FROM AUTHOR]
Published: 2022

7. Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation.

Author: Sun, Deqing, Yang, Xiaodong, Liu, Ming-Yu, and Kautz, Jan
Subjects: *OPTICAL flow, *CONVOLUTIONAL neural networks, *ADAPTIVE optics, *MATTER
Abstract: We investigate two crucial and closely-related aspects of CNNs for optical flow estimation: models and training. First, we design a compact but effective CNN model, called PWC-Net, according to simple and well-established principles: pyramidal processing, warping, and cost volume processing. PWC-Net is 17 times smaller in size, 2 times faster in inference, and 11 percent more accurate on Sintel final than the recent FlowNet2 model. It is the winning entry in the optical flow competition of the robust vision challenge. Next, we experimentally analyze the sources of our performance gains. In particular, we use the same training procedure for PWC-Net to retrain FlowNetC, a sub-network of FlowNet2. The retrained FlowNetC is 56 percent more accurate on Sintel final than the previously trained one and even 5 percent more accurate than the FlowNet2 model. We further improve the training procedure and increase the accuracy of PWC-Net on Sintel by 10 percent and on KITTI 2012 and 2015 by 20 percent. Our newly trained model parameters and training protocols are available on https://github.com/NVlabs/PWC-Net. [ABSTRACT FROM AUTHOR]
Published: 2020

8. Multiple Pyramids Based Image Inpainting Using Local Patch Statistics and Steering Kernel Feature.

Author: Ghorai, Mrinmoy, Samanta, Soumitra, Mandal, Sekhar, and Chanda, Bhabatosh
Subjects: *INPAINTING, *PYRAMIDS, *STATISTICS, *IMAGE, *IMAGE reconstruction
Abstract: In this paper, we propose a novel multiple pyramids based image inpainting method using local patch statistics and geometric feature-based sparse representation to maintain texture consistency and structure coherence. First, we approximate each patch in the target region (region to be inpainted) by statistically dominant local candidate patches to preserve local consistency. Then each approximated patch is refined by a sparse representation of candidate patches based on local steering kernel (LSK) feature to retain texture quality. We also propose a multiple pyramids based approach to generate several inpainted versions of the input image, one for each of the pyramids. Finally, we combine the inpainted images by gradient-based weighted average to produce the final inpainted image. This approach helps to maintain structure coherence and to remove artifacts which may appear in the inpainted images due to different initial scales of the individual pyramids. The proposed method is tested on a wide range of natural images for scratch and blob/object removal. We have presented both quantitative and qualitative comparison with the existing methods to demonstrate the superiority of the proposed method. [ABSTRACT FROM AUTHOR]
Published: 2019

9. Enhancing Selectivity in Big Data.

Author: Lecuyer, Mathias, Spahn, Riley, Geambasu, Roxana, Huang, Tzu-Kuo, and Sen, Siddhartha
Abstract: Today’s companies collect immense amounts of personal data and enable wide access to it within the company. This exposes the data to external hackers and privacy-transgressing employees. This study shows that, for a wide and important class of workloads, only a fraction of the data is needed to approach state-of-the-art accuracy. We propose selective data systems that are designed to pinpoint the data that is valuable for a company’s current and evolving workloads. These systems limit data exposure by setting aside the data that is not truly valuable. [ABSTRACT FROM PUBLISHER]
Published: 2018

10. Efficient Attention Pyramid Network for Semantic Segmentation

Author: Qirui Yang, Tao Ku, and Kunyuan Hu
Subjects: General Computer Science, Computer science, Feature extraction, Context (language use), 02 engineering and technology, 010501 environmental sciences, Semantics, Machine learning, computer.software_genre, 01 natural sciences, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Segmentation, Pyramid (image processing), 0105 earth and related environmental sciences, computer.programming_language, business.industry, General Engineering, Pascal (programming language), Image segmentation, Semantic segmentation, spatial pyramid, Feature (computer vision), PASCAL VOC 2012, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, attention mechanism, computer, lcsh:TK1-9971, Cityscapes
Abstract: Semantic segmentation is a task that covers most of the perception needs of intelligent vehicles in a unified way. Recent studies have shown that attention mechanisms achieve impressive performance in computer vision tasks. Current attention-based segmentation methods differ from each other in the position and form of the attention mechanism, and perform differently in practice. This paper first examines the effectiveness of multi-scale context features and attention mechanisms in segmentation tasks. We find that multi-scale and channel attention can play a vital role in constructing effective context features. Based on this analysis, this paper proposes an efficient attention pyramid network (EAPNet) for semantic segmentation. Specifically, to efficiently handle the problem of segmenting objects at multiple scales, we design an efficient channel attention pyramid (ECAP), which employs atrous convolution with channel attention in cascade or in parallel to capture multi-scale context by using multiple atrous rates. Furthermore, we propose a residual attention fusion block (RAFB), whose purpose is to simultaneously focus on meaningful low-level feature maps and spatial location information. At the same time, we explore different channel attention modules and spatial attention modules, and describe their impact on network performance. We empirically evaluate our EAPNet on two semantic segmentation datasets, PASCAL VOC 2012 and Cityscapes. Experimental results show that without MS COCO pre-training and any post-processing, EAPNet achieved 81.7% mIoU on the PASCAL VOC 2012 validation set. With deeplabv3+ as the benchmark, EAPNet improves model performance by more than 1.50% mIoU.
Published: 2021

11. SPEDCCNN: Spatial Pyramid-Oriented Encoder-Decoder Cascade Convolution Neural Network for Crop Disease Leaf Segmentation

Author: Gang Lu, Yuxia Yuan, and Zengyong Xu
Subjects: 0106 biological sciences, Similarity (geometry), General Computer Science, Computer science, Pooling, 02 engineering and technology, 01 natural sciences, Convolutional neural network, Image (mathematics), pooling strategy, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Segmentation, Pyramid (image processing), Electrical and Electronic Engineering, Artificial neural network, business.industry, fungi, General Engineering, food and beverages, Pattern recognition, spatial pyramid, Crop disease leaf segmentation, Kernel (image processing), Cascade, encoder-decoder cascade convolution neural network, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971, 010606 plant biology & botany
Abstract: Disease is one of the main factors affecting crop growth. How to reflect the external morphological features of the disease and completely retain the color and texture information of the disease area is one of the key research issues for crop disease segmentation. Meanwhile, aiming at the problem of low segmentation accuracy with traditional convolution neural network-based methods in the crop disease leaf image, this paper proposes a spatial pyramid-oriented encoder-decoder cascade convolution neural network for crop disease leaf segmentation. The network consists of a region disease detection network and a region disease segmentation network. Region disease detection network is a kind of network combining cascade convolution neural network with spatial pyramid. This method connects the three-level convolution neural network model, where the structure of the three-level neural network model varies from simple to complex. Different crop disease leaf features are extracted from the different neural network levels. And images are screened to complete the detection of crop disease leaf. What’s more, a space pyramid pooling layer is added to each network level. This pooling strategy does not require fixed size input, which increases the size selection of input model. The region segmentation network is established based on the Encoder-Decoder structure. The multi-scale convolution kernel is used to improve the local receptive field of the original convolution kernel and accurately segment the crop disease leaf area. Finally, we conduct experiments on the crop disease leaf images under different conditions, the results show that the proposed method has higher segmentation accuracy. In terms of Precision, Correct segmentation, over-segmentation and under-segmentation indexes, etc., the average values of proposed method are more than 90%. The average dice similarity coefficient is over 95% under different background. Moreover, it can meticulously reflect the external morphological features of the crop disease leaf and relatively better retain the color and texture information.
Published: 2021
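
Note on record 11: the abstract states that a pyramid pooling layer "does not require fixed size input". The sketch below illustrates that property with generic spatial pyramid pooling in NumPy. It is not the authors' SPEDCCNN code; the function name `spatial_pyramid_pool`, the grid levels (1, 2, 4), and the use of max pooling are assumptions chosen for illustration.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (channels, height, width) array into a fixed-length vector.

    `levels` is an illustrative choice: each level n splits the map into an
    n x n grid, so the output length is channels * sum(n * n for n in levels).
    """
    channels, height, width = feature_map.shape
    if height < max(levels) or width < max(levels):
        raise ValueError("feature map is smaller than the finest grid")
    pooled = []
    for n in levels:
        # Integer bin edges that cover the whole map for any height/width >= n.
        row_edges = [(k * height) // n for k in range(n + 1)]
        col_edges = [(k * width) // n for k in range(n + 1)]
        for i in range(n):
            for j in range(n):
                cell = feature_map[:, row_edges[i]:row_edges[i + 1],
                                   col_edges[j]:col_edges[j + 1]]
                # One maximum per channel for this grid cell.
                pooled.append(cell.max(axis=(1, 2)))
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
small = spatial_pyramid_pool(rng.normal(size=(8, 13, 17)))
large = spatial_pyramid_pool(rng.normal(size=(8, 32, 20)))
print(small.shape, large.shape)  # (168,) (168,)
```

Both inputs have different spatial sizes but produce vectors of the same length (8 channels x 21 grid cells), which is what lets a following fully connected layer accept variable-size images.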

12. Nonfrontal and Asymmetrical Facial Expression Recognition Through Half-Face Frontalization and Pyramid Fourier Frequency Conversion

Author: Tianyang Cao, Jiamin Chen, Li Gao, and Chang Liu
Subjects: General Computer Science, Computer science, Frequency band, Feature extraction, 02 engineering and technology, 01 natural sciences, Facial recognition system, Discrete Fourier transform, Convolution, symbols.namesake, Fourier frequency conversion, Band-pass filter, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, Expression recognition, General Materials Science, Pyramid (image processing), frequency band, nonfrontal face, discrete Fourier transform, business.industry, asymmetrical expression, 010401 analytical chemistry, General Engineering, food and beverages, Pattern recognition, 0104 chemical sciences, Fourier transform, Face (geometry), symbols, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971
Abstract: Expression recognition in the wild is easily distorted by nonfrontal and asymmetrical faces. In nonfrontal faces, some areas are compressed and distorted. Even after frontalization, these compressed areas may still be blurred and distort expression recognition. Additionally, asymmetrical expressions are common on half or local face areas and produce incorrect expression features. Therefore, this paper presents a half-face frontalization and pyramid Fourier frequency conversion method. Despite the location, range and intensity of incorrect expressions in nonfrontal faces being unknown, according to the discrete Fourier transform, it can be proven that the frequency band of the correct expression is much larger than that of the incorrect expression on the same face. This can be taken advantage of by pyramid frequency conversion, which is designed based on Fourier frequency conversion. It can adjust the incorrect expression frequencies at multiple scales to take them out of the band-pass of the convolution operations of deep learning so that they are eliminated completely, whereas correct expression information is preserved. Thus, expressions can be recognized effectively.
Published: 2021

13. Multi-Region Two-Stream Deep Architecture for Visual Power Monitoring Systems

Author: Wu Peng, Zhang Guoliang, Ziwen Zhang, Wei Jiang, Gan Jinrui, and Zhao Ting
Subjects: Scheme (programming language), General Computer Science, Computer science, media_common.quotation_subject, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Abnormal judgement, power systems, Discriminative model, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Quality (business), Pyramid (image processing), computer.programming_language, media_common, two-stream scheme, business.industry, Deep learning, 020208 electrical & electronic engineering, General Engineering, deep learning, Pattern recognition, Identification (information), 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, computer, Cropping, region fusion, lcsh:TK1-9971
Abstract: Judging imaging quality is an important part of the maintenance of visual intelligent monitoring systems for electrical power scenes. However, accurate and efficient identification of possible abnormalities in imaging quality remains challenging. This paper proposes a novel multi-region two-stream deep architecture to improve judging abnormalities. The proposed architecture incorporates two-stream scheme and multi-region strategy to identify relevant information and explore hidden details. More specifically, in addition to color and intensity in the original images, the two-stream scheme uses high-frequency structure information from gradient images to enhance its performance. The multi-region strategy employs spatial pyramid random cropping and region fusion to handle locally non-uniform changes among categories: spatial pyramid random cropping characterizes images at different spatial pyramid levels, while region fusion focuses attention on cropped regions relevant to quality perception by using adaptive learning weights in a fully connected layer. In this way, the proposed strategy guides the framework to adequately and adaptively explore the discriminative regions hidden in the input images, and provides an end-to-end learning procedure. Experimental results demonstrate its strong performance for judging abnormalities, and the proposed method can be easily extended to the entire surveillance system.
Published: 2021

14. Semi-Supervised Dim and Small Infrared Ship Detection Network Based on Haar Wavelet

Author: Shiqiang Wang, Dongfang Zhang, Zizhuang Song, Zheng Li, and Jiawei Yang
Subjects: General Computer Science, Computer science, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, dim and small infrared ship, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Convolution, self-training, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Pyramid (image processing), 0105 earth and related environmental sciences, business.industry, Deep learning, General Engineering, Haar wavelet, Pattern recognition, Object detection, Feature (computer vision), 020201 artificial intelligence & image processing, Artificial intelligence, feature map enhancement, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971
Abstract: Traditional deep learning detection network has poor effect on the detection of infrared dim and small targets on the sea in the case of interference or bad weather. In this paper, an improved dim and small infrared ship detection network based on Haar wavelet is proposed. The HaarConv module is designed based on the high-frequency features obtained by Haar wavelet decomposition, which further increases the feature extraction ability of the backbone network for small targets. Meanwhile, the HaarUp-HaarDown module is designed by using Haar forward and inverse transform, replacing the up-sampling layer of the feature pyramid network and the down-sampling layer of backbone network to retain smaller target features. Furthermore, the pseudo-label-based method enables the network to conduct semi-supervised learning, which reduces the labeling cost and improves detection accuracy while expanding the amount of training data. The above method is applied to the YOLOv5-s lightweight network and 11278 infrared images (3352 labeled) of dim and small ships are collected as a dataset. The results show that the introduction of semi-supervised training method effectively expands the training dataset, and the mAP@.5:.95 increases by 23.5%. The proposed Haar wavelet improvement method can effectively improve the detection accuracy of dim and small infrared ship targets by more than 2%, and the number of parameters increases by only about 0.02M. Compared with existing methods, the proposed method reaches the state-of-the-art result and has good generalization performance.
Published: 2021
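
Note on record 14: the abstract describes modules built on the Haar forward and inverse transform that replace down-sampling and up-sampling layers. The sketch below shows only the underlying one-level 2D Haar transform and its exact inverse in NumPy. It is not the authors' HaarConv or HaarUp-HaarDown code; the names `haar_down` and `haar_up`, the orthonormal 1/2 scaling, and the requirement of even image dimensions are assumptions made for illustration.

```python
import numpy as np


def haar_down(image):
    """One-level 2D Haar transform of an array with even height and width.

    Returns four half-resolution bands: the low-frequency approximation and
    three high-frequency detail bands.
    """
    a = image[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = image[0::2, 1::2]  # top-right
    c = image[1::2, 0::2]  # bottom-left
    d = image[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    diff_cols = (a - b + c - d) / 2.0  # change between left and right columns
    diff_rows = (a + b - c - d) / 2.0  # change between top and bottom rows
    diff_diag = (a - b - c + d) / 2.0  # diagonal change
    return approx, diff_cols, diff_rows, diff_diag


def haar_up(approx, diff_cols, diff_rows, diff_diag):
    """Inverse of haar_down: rebuild the full-resolution array."""
    height, width = approx.shape
    image = np.empty((2 * height, 2 * width), dtype=float)
    image[0::2, 0::2] = (approx + diff_cols + diff_rows + diff_diag) / 2.0
    image[0::2, 1::2] = (approx - diff_cols + diff_rows - diff_diag) / 2.0
    image[1::2, 0::2] = (approx + diff_cols - diff_rows - diff_diag) / 2.0
    image[1::2, 1::2] = (approx - diff_cols - diff_rows + diff_diag) / 2.0
    return image


rng = np.random.default_rng(0)
frame = rng.normal(size=(8, 6))
bands = haar_down(frame)
restored = haar_up(*bands)
print(np.allclose(restored, frame))  # True
```

Because the inverse recovers the input exactly, halving the resolution this way moves fine detail into the three detail bands instead of discarding it, which is the property the abstract relies on for retaining small-target features.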

15. Spreading Dynamics of SHIPR Pyramid Scheme Model on Scale-Free Networks

Author: Bingchuan Xue, Siwei Zhang, Xin-Ming Cheng, Tao Li, and Gaojun Shi
Subjects: Lyapunov function, Mathematical optimization, government management, General Computer Science, Mathematical model, SHIPR model, Scale-free network, General Engineering, Order (ring theory), Topology (electrical circuits), scale-free networks, TK1-9971, symbols.namesake, Pyramid schemes, Transmission (telecommunications), Exponential stability, Pyramid, symbols, General Materials Science, Electrical engineering. Electronics. Nuclear engineering, global attractivity
Abstract: Nowadays, pyramid schemes have caused extremely negative effects on people’s lives and seriously damaged the social economy. With the rapid development of network and communication technology, people’s direct or indirect social interaction is more frequent, which makes the phenomenon of pyramid schemes more serious. Therefore, it is necessary to study transmission mechanisms and transmission rules of pyramid schemes. In order to study the influence of government management and social interaction topology on the spreading of pyramid schemes, a novel SHIPR (susceptible-hesitator-involved-punished-resister) pyramid scheme spreading model is proposed on scale-free networks. The spreading dynamics of pyramid schemes are analyzed in detail by mean-field theory. Then, the basic reproduction number $R_{0}$ and the equilibria are obtained. Theoretical analysis shows that the basic reproduction number $R_{0}$ is strongly correlated with the government crackdown intensity for involved individuals, the coverage rate of government anti-pyramid scheme publicity for susceptible individuals and hesitators, and the social interaction topology. Furthermore, the local asymptotic stability of the fraud-elimination equilibrium is analyzed based on the Routh-Hurwitz criterion, the global asymptotic stability of the fraud-elimination equilibrium is discussed using a Lyapunov function, and the global attractivity of the fraud-prevailing equilibrium is proved in detail by the comparison principle. Finally, numerical simulations verify the theoretical analysis results.
Published: 2021

16. Anchor-Free Single Stage Detector in Remote Sensing Images Based on Multiscale Dense Path Aggregation Feature Pyramid Network

Author: Licheng Jiao, Yangyang Li, Qin Huang, Xuan Pei, Naresh Marturi, and Ronghua Shang
Subjects: General Computer Science, Computer science, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, Redundancy (engineering), General Materials Science, Pyramid (image processing), 0105 earth and related environmental sciences, Remote sensing, business.industry, Deep learning, Detector, General Engineering, deep learning, object detection, Object (computer science), Object detection, Remote sensing (archaeology), Feature (computer vision), 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971, anchor-free
Abstract: Object detection has always been a challenging task in the field of computer vision due to complex background, large scale variation and many small objects, which are especially pronounced for remote sensing imagery. In recent years, object detection in remote sensing with the development of deep learning has also made great breakthroughs. At present, almost all state-of-the-art object detectors rely on pre-defined anchor boxes for remote sensing imagery. However, too many anchor boxes will introduce a large number of hyper-parameters, which not only increase the memory footprint, but also increase the computational redundancy of the detection model. In contrast, we propose an anchor-free single-stage detector for remote sensing imagery object detection, avoiding a large number of hyper-parameters related to the anchor box, which usually affect the performance of the detection model. Specially, considering the large-scale differences in the objects and the characteristics of small objects in remote sensing imagery, we design a dense path aggregation feature pyramid network (DPAFPN), which can make full use of the high-level semantic information and low-level location information in remote sensing imagery, and to a certain extent, avoid information loss during shallow feature transfer. In our experiments, extensive experiments on two public remote sensing datasets DOTA, NWPU VHR-10 were conducted. The experimental results demonstrate that our detector has good performance and is meaningful for object detection in remote sensing imagery.
Published: 2020

17. DEFace: Deep Efficient Face Network for Small Scale Variations

Author: Junghyun Cho, Ig-Jae Kim, Gi Pyo Nam, and Toan Minh Hoang
Subjects: General Computer Science, Pixel, context module, Computer science, business.industry, single-shot convolutional module, Feature extraction, General Engineering, Pattern recognition, 02 engineering and technology, Small face detection, 010501 environmental sciences, 01 natural sciences, Object detection, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, General Materials Science, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Face detection, business, lcsh:TK1-9971, 0105 earth and related environmental sciences
Abstract: This study proposes a novel face detector called DEFace that focuses on the challenging tasks of face detection: coping with small faces under 12 pixels and with occlusions due to a mask or human body parts. This study proposes an extended feature pyramid network (FPN) module that detects small faces by expanding the range of P layers, and adds a receptive context module (RCM) after each predicted feature head from the top-down pathway in the FPN architecture to enhance feature discriminability and robustness. Based on the FPN principle, the combination of low- and high-resolution features is beneficial for detecting objects of different sizes. Furthermore, with assistance from the RCM, the proposed method can use a broad range of context information, especially for small faces. To evaluate the performance of the proposed method, various public face datasets are used, such as the WIDER Face dataset, the face detection dataset and benchmark (FDDB), and the masked faces (MAFA) dataset, which consist of challenging samples such as small face regions and occlusions by hair or other people. The results indicate that DEFace can detect the face region more accurately in comparison to the other state-of-the-art methods while maintaining the processing time.
Published: 2020

18. Video Object Detection With Two-Path Convolutional LSTM Pyramid

Author: Chen Zhang and Joohee Kim
Subjects: video object detection, General Computer Science, Scale (ratio), Computer science, business.industry, Detector, General Engineering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Deep learning, Object (computer science), Object detection, Motion (physics), Feature (computer vision), Pyramid, Path (graph theory), ConvLSTM, convolutional neural networks, General Materials Science, Computer vision, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971
Abstract: One of the major challenges in video object detection is drastic scale changes of objects due to camera motion. In this paper, we propose a two-path Convolutional Long Short-Term Memory (convLSTM) pyramid network designed to extract and convey multi-scale temporal contextual information in order to handle object scale changes efficiently. The proposed two-path convLSTM pyramid consists of a stack of multi-input convLSTM modules. It is updated in top-down and bottom-up pathways so that the temporal contextual information for small-to-large and large-to-small scale changes is exploited. The proposed multi-input convLSTM module uses two input feature maps of different resolutions to store and exchange temporal contextual information of different scales between neighboring convLSTM modules. The outputs of the proposed convLSTM pyramid network constitute a feature pyramid where each feature map contains multi-scale temporal contextual information from earlier frames. The proposed convLSTM pyramid can be combined with various still-image object detectors to improve the performance of video object detection. Experimental results on ImageNet VID dataset show that the proposed method achieves state-of-the-art performance and can handle scale changes efficiently in video object detection.
Published: 2020

19. Multi-Instance Learning Algorithm Based on LSTM for Chinese Painting Image Classification

Author: Yue Zhang and Daxiang Li
Subjects: General Computer Science, Computer science, discriminative instance set, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Attention mechanism, 02 engineering and technology, Discriminative model, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Hidden Markov model, Chinese painting image classification, Painting, Artificial neural network, Contextual image classification, long and short term memory, Supervised learning, General Engineering, 020206 networking & telecommunications, Statistical classification, ComputingMethodologies_PATTERNRECOGNITION, Softmax function, 020201 artificial intelligence & image processing, multi-instance learning, lcsh:Electrical engineering. Electronics. Nuclear engineering, Algorithm, lcsh:TK1-9971
Abstract: Aiming at the problem of weakly supervised learning in traditional Chinese painting image classification, a novel multi-instance learning algorithm based on a Long Short-Term Memory neural network with attention mechanism (ALSTM-MIL) is proposed. Firstly, by using the Pyramid Overlapping Grid Division (POGP), a multi-instance modeling scheme is designed to convert Chinese painting images into multi-instance bags, thereby transforming the problem of Chinese painting image classification into a MIL problem. Secondly, an efficient sequence generator is designed. It selects discriminative instances from the positive bags, constructs a discriminative instance set (DIS), and converts multi-instance bags into equal-length ordered sequences. Thirdly, an LSTM network model with an attention mechanism is designed to perform semantic analysis on multi-instance bags to obtain their memory coding features, which are then combined with the Softmax classifier to achieve semantic classification of traditional Chinese painting images. Experimental results on the Chinese painting (CP) image set show that the LSTM network built on the visual feature set is feasible, and the performance of the proposed MIL algorithm is also superior to other classification algorithms.
Published: 2020

20. MAOD: An Efficient Anchor-Free Object Detector Based on MobileDet

Author: Hao Shen and Dong Chen
Subjects: 0209 industrial biotechnology, General Computer Science, Computer science, Detector, Feature extraction, General Engineering, lightweight feature pyramid, 02 engineering and technology, Object (computer science), MobileDet backbone, Lightweight real-time detector, 020901 industrial engineering & automation, anchor-free object detection, Feature (computer vision), Pyramid, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, General Materials Science, Pyramid (image processing), Free object, lcsh:Electrical engineering. Electronics. Nuclear engineering, Algorithm, lcsh:TK1-9971
Abstract: For real-time object detectors, accuracy and efficiency are two important considerations. In this paper, we propose a lightweight anchor-free detector, MAOD, to better balance efficiency and accuracy. Our object detector contains three components: an efficient backbone network (MobileDet), a lightweight feature pyramid structure (L-FPN) and an anchor-free per-pixel prediction method. MobileDet and L-FPN provide more accurate and faster multi-scale feature extraction. Our anchor-free per-pixel prediction method achieves efficient classification and location regression tasks. On the benchmark MS-COCO dataset, MAOD achieves 46.1% AP at the speed of 68 FPS with the input size $512\times512$. When the input size is $800\times800$, MAOD achieves 47.1% AP at the speed of 43 FPS. The fast version of MAOD ($320\times320$ input size) can run at 91 FPS with 43.3% AP. Compared with other state-of-the-art object detectors, our detector has similar accuracy while maintaining extremely fast inference speed. MAOD achieves an optimal efficiency-accuracy tradeoff.
Published: 2020

21. Guided Depth Map Super-Resolution Using Recumbent Y Network

Author: Tao Li, Hongwei Lin, and Xiucheng Dong
Subjects: UNet network, General Computer Science, Computer science, business.industry, General Engineering, convolutional neural network, Depth map super-resolution, Superresolution, Depth map, Feature (computer vision), Pyramid, atrous spatial pyramid pooling, General Materials Science, Computer vision, Pyramid (image processing), Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Scale (map), business, attention mechanism, Image resolution, lcsh:TK1-9971
Abstract: Low spatial resolution is a well-known problem for depth maps captured by low-cost consumer depth cameras. Depth map super-resolution (SR) can be used to enhance the resolution and improve the quality of depth maps. In this paper, we propose a recumbent Y network (RYNet) to integrate the depth information and intensity information for depth map SR. Specifically, we introduce two weight-shared encoders to respectively learn multi-scale depth and intensity features, and a single decoder to gradually fuse depth information and intensity information for reconstruction. We also design a residual channel attention based atrous spatial pyramid pooling structure to further enrich the feature's scale diversity and exploit the correlations between multi-scale feature channels. Furthermore, the violations of co-occurrence assumption between depth discontinuities and intensity edges will generate texture-transfer and depth-bleeding artifacts. Thus, we propose a spatial attention mechanism to mitigate the artifacts by adaptively learning the spatial relevance between intensity features and depth features and reweighting the intensity features before fusion. Experimental results demonstrate the superiority of the proposed RYNet over several state-of-the-art depth map SR methods.
Published: 2020

22. Pyramid With Super Resolution for In-the-Wild Facial Expression Recognition

Author: Hyung-Jeong Yang, Thanh-Hung Vo, Soo-Hyung Kim, and Guee-Sang Lee
Subjects: General Computer Science, Computer science, business.industry, General Engineering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Function (mathematics), Expression (mathematics), Task (project management), Image (mathematics), Pyramid, 0202 electrical engineering, electronic engineering, information engineering, human computer interaction, 020201 artificial intelligence & image processing, General Materials Science, Pyramid (image processing), Artificial intelligence, Emotion recognition, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971, Smoothing, image resolution
Abstract: Facial Expression Recognition (FER) is a challenging task that improves natural human-computer interaction. This paper focuses on automatic FER on a single in-the-wild (ITW) image. ITW images suffer real problems of pose, direction, and input resolution. In this study, we propose a pyramid with super-resolution (PSR) network architecture to solve the ITW FER task. We also introduce a prior distribution label smoothing (PDLS) loss function that applies the additional prior knowledge of the confusion about each expression in the FER task. Experiments on the three most popular ITW FER datasets showed that our approach outperforms all the state-of-the-art methods.
Published: 2020

23. PSPNet-SLAM: A Semantic SLAM Detect Dynamic Object by Pyramid Scene Parsing Network

Author: Weiwei Zhang, Bo Zhao, and Xudong Long
Subjects: 0209 industrial biotechnology, General Computer Science, Computer science, Feature extraction, 02 engineering and technology, Simultaneous localization and mapping, computer.software_genre, OCMulti-view geometry, 020901 industrial engineering & automation, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Computer vision, dynamic, Parsing, business.industry, General Engineering, Ant colony, semantic, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, computer, PSPNet-SLAM, lcsh:TK1-9971
Abstract: Simultaneous Localization and Mapping (SLAM) plays an important role in the computer vision and robotics fields. The traditional SLAM framework adopts a strong static world assumption for convenience of analysis. How to deal with dynamic environments is an essential question that has drawn widespread attention across the industry. Faced with these challenges, researchers consider introducing semantic information to collaboratively handle dynamic objects in the scene. So, in this paper, we propose PSPNet-SLAM: Pyramid Scene Parsing Network SLAM, which integrates a semantic thread with a pyramid structure and a geometric thread with a reverse ant colony search strategy into ORB-SLAM2. In the proposed system, a pyramid-structured PSPNet is used in the semantic thread to segment dynamic objects in combination with context information. In the geometric thread, we propose an OCMulti-View Geometry thread. On the one hand, the optimal error compensation homography matrix is designed to improve the accuracy of dynamic point detection. On the other hand, we came up with a reverse ant colony collection strategy to enhance the real-time performance of the system and reduce its time consumption during the detection of dynamic objects. We have evaluated our SLAM on public datasets and in the real world and compared it with ORB-SLAM2 and DynaSLAM. Many improvements have been achieved in this system, including location accuracy in high-dynamic scenarios, where it also outperformed the other four state-of-the-art SLAM systems coping with dynamic environments. Real-time performance has been delivered, compared with the geometric thread of the excellent DynaSLAM system.
Published: 2020

24. Small-Object Detection in UAV-Captured Images via Multi-Branch Parallel Feature Pyramid Networks

Author: Peng Hu, Fengbao Yang, and Yingjie Liu
Subjects: 010504 meteorology & atmospheric sciences, General Computer Science, Computer science, Feature extraction, 02 engineering and technology, 01 natural sciences, Background noise, multi-branch parallel feature pyramid networks (MPFPN), Pyramid, 0202 electrical engineering, electronic engineering, information engineering, feature fusion, General Materials Science, Computer vision, Pyramid (image processing), 0105 earth and related environmental sciences, Pixel, business.industry, General Engineering, object detection, Unmanned aerial vehicle, cascade architecture, Object (computer science), Object detection, Feature (computer vision), 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, Focus (optics), lcsh:TK1-9971
Abstract: Small objects are one of the primary challenges in the field of object detection, and the problem is notably pronounced in images from Unmanned Aerial Vehicles (UAV). Existing detectors based on deep-learning methods usually apply feature extraction networks with a large down-sampling factor to obtain higher-level features. However, such a big stride tends to make the feature information of small objects shrink to a small point or even vanish in the low-resolution feature maps due to the limitation of pixels. Therefore, a novel structure called Multi-branch Parallel Feature Pyramid Networks (MPFPN) is proposed in this article, which aims to extract more abundant feature information of objects with a small size. Specifically, the parallel branch is designed to recover the features that are missed in the deeper layers. Meanwhile, a supervised spatial attention module (SSAM) is applied to weaken the impact of background noise inference and focus on object information. Furthermore, we adopt a cascade architecture in the Fast R-CNN stage for a more powerful localization capability. Experiments on the public drone-based dataset VisDrone-DET demonstrate that our method achieves competitive performance compared with other state-of-the-art detection algorithms.
Published: 2020

25. Image Denoising and Ring Artifacts Removal for Spectral CT via Deep Neural Network

Author: Chengyu Fan, Xiaodong Guo, Mi Zhou, Biao Wei, Peng He, Zourong Long, Xuezhi Ren, Xiaojie Lv, and Peng Feng
Subjects: General Computer Science, Computer science, image denoising, Feature extraction, 02 engineering and technology, Iterative reconstruction, 030218 nuclear medicine & medical imaging, Convolution, 03 medical and health sciences, Spectral CT, 0302 clinical medicine, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Computer vision, Pyramid (image processing), ring artifacts removal, business.industry, Detector, General Engineering, deep learning, Photon counting, Data set, Noise, Computer Science::Computer Vision and Pattern Recognition, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971, Energy (signal processing)
Abstract: Spectral computed tomography (CT) based on photon counting detectors can collect incident photons in different energy ranges. However, due to the low photon counts in narrow energy bins and the inhomogeneous response of detector cells, there are severe noise and ring artifacts in reconstructed spectral CT images. We propose an image denoising and ring artifacts removal method via an improved Fully Convolutional Pyramid Residual Network (FCPRN). In our study, we scanned a mouse specimen with spectral CT based on a photon counting detector, and reconstructed mouse CT images as the data set. We then used the data set to train our network for image denoising and ring artifacts removal. Experimental results demonstrated that the proposed method could reduce noise and suppress ring artifacts of spectral CT images concurrently in different energy ranges, and that the performance of the FCPRN is better than that of some networks for CT image denoising.
Published: 2020

26. Filling the Gaps in Atrous Convolution: Semantic Segmentation With a Better Context

Author: Salman Khan, Liyuan Liu, Ling Shao, Yanwei Pang, Syed Waqas Zamir, and Fahad Shahbaz Khan
Subjects: General Computer Science, Computer science, 02 engineering and technology, computer.software_genre, supervised learning, Convolution, Image processing, 020204 information systems, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Segmentation, Context model, Parsing, business.industry, General Engineering, Pattern recognition, Image segmentation, neural networks, semantic segmentation, Kernel (image processing), 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, computer, lcsh:TK1-9971
Abstract: The main challenge for scene parsing arises when complex scenes with highly diverse objects are encountered. The objects not only differ in scale and appearance but also in semantics. Previous works focus on encoding the multi-scale contextual information (via pooling or atrous convolutions) generally on top of compact high-level features (i.e., at a single stage). In this work, we argue that a rich set of cues exist at multiple stages of the network, encapsulating low, mid and high-level scene details. Therefore, an optimal scene parsing model must aggregate multi-scale context at all three levels of the feature hierarchy, a capability lacking in state-of-the-art scene parsing models. To address this limitation, we introduce a novel architecture with three new blocks that systematically aggregate low, mid and high tier features. The heart of our approach is a high-level feature aggregation module that augments sparsely connected atrous convolution with dense local and layer-wise connections to avoid gridding artifacts. Besides, we employ a novel feature pyramid augmentation and semantic refinement unit to generate low- and mid-level features that are mixed with high-level features at the decoder. We extensively evaluate our proposed approach on the large-scale Cityscapes and ADE20K benchmarks. Our approach surpasses many of the latest models on both datasets, achieving mean intersection-over-union (mIoU) scores of 80.5% and 44.0% on Cityscapes and ADE20K, respectively.
Published: 2020

27. Wavelet Based Deep Recursive Pyramid Convolution Residual Network for Single Image Rain Removal

Author: Tian Tian Gong and Jun Sheng Wang
Subjects: Discrete wavelet transform, General Computer Science, Computer science, Low-pass filter, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Residual, Convolutional neural network, residual coefficients, Convolution, Wavelet, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, low model parameters, General Materials Science, Pyramid (image processing), wavelet transform, Artificial neural network, business.industry, 020208 electrical & electronic engineering, General Engineering, Pattern recognition, Image rain removal, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971
Abstract: Image rain removal aims to separate the background image from the rainy image. During the past three years, image rain removal with deep convolutional neural networks has achieved impressive performance. However, reaching a tradeoff between high de-raining performance and a low number of model parameters is still a challenge. To address the issue, the paper explores a novel method based on a wavelet deep recursive pyramid convolution residual network (WDRPRN), in which the discrete wavelet transform is embedded to decompose the rainy image into different frequency domains, and the deep recursive pyramid convolution residual network (DRPRN) predicts the residual coefficients between the rainy image and the clean image. In addition, compared with other neural networks, the DRPRN adopts a recursive model that requires fewer parameters. Extensive experiments on synthetic and real-world datasets show that the proposed method is significantly superior to recent state-of-the-art algorithms.
Published: 2020

28. Dynamic Downscaling Segmentation for Noisy, Low-Contrast in Situ Underwater Plankton Images

Author: Cheng Xuemin, Hongsheng Bi, and Cheng Kaichang
Subjects: In situ, gradient clustering, General Computer Science, business.industry, Computer science, segmentation, General Engineering, Pattern recognition, Background noise, in situ underwater plankton image, Pyramid, Entropy (information theory), General Materials Science, Segmentation, Artificial intelligence, Dynamic downscaling, region of interest, lcsh:Electrical engineering. Electronics. Nuclear engineering, Underwater, scale pyramid space, Cluster analysis, business, Image resolution, lcsh:TK1-9971, Downscaling
Abstract: Finding and segmenting objects in noisy low-contrast in situ underwater plankton images is challenging because of the difficulty of separating potential plankton objects from the complex background and numerous and diverse other particles. In the present study, a dynamic downscaling model was developed to rapidly extract complete and clean regions of interest (ROIs) from images with highly variable content and quality. The original image was downscaled, and dynamic segmentation was performed in a scale pyramid space to ensure the integrity of weak targets based on local two-dimensional (2D) entropy parameters. Subsequently, a series of local thresholds and clustering gradients was examined iteratively for ROI selection. The performance of the local 2D entropy parameters relative to water turbidity (in a scattering medium and with high background noise) and image size was examined. To suppress the background and increase the sharpness of potential targets, a sharpness descriptor and gradient clustering were employed. The method was compared with the currently commonly used local threshold-based Sauvola segmentation using the same set of images. The results showed that the proposed method improves the ROI extraction accuracy and reduces oversegmentation for in situ underwater plankton images. It was concluded that the proposed method is a fast and robust segmentation technique and could facilitate the deployment of in situ plankton imaging systems for process-based research and routine plankton monitoring.
Published: 2020

29. Insulator Defect Recognition Based on Global Detection and Local Segmentation

Author: Li Xuefeng, Su Hansong, and Gaohua Liu
Subjects: 0209 industrial biotechnology, General Computer Science, Channel (digital image), Computer science, Feature extraction, 02 engineering and technology, Insulator defect detection, 020901 industrial engineering & automation, Robustness (computer science), Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Pyramid (image processing), Pixel, business.industry, General Engineering, deep learning, Pattern recognition, object detection, semantic segmentation, Cross entropy, Feature (computer vision), 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Precision and recall, business, Encoder, lcsh:TK1-9971
Abstract: Locating tiny insulator defects against complex backgrounds in high-resolution aerial images is a challenging task. In this paper, we propose a novel method which cascades detection and segmentation networks to identify the defect at two levels, global and local: (1) The improved Faster R-CNN is carried out to capture both defects and insulators in the entire image. ResNeXt-101 is adopted as the feature extraction network so as to fully extract features, and a Feature Pyramid Network (FPN) is built to enhance the ability to detect small targets. In addition, the Online Hard Example Mining (OHEM) training strategy is applied to solve the imbalance problem of positive and negative samples. (2) All the detected insulators are extracted and fed into the improved U-Net network for further inspection at the pixel level. We utilize the pre-trained ResNeXt-50 as the encoder of U-Net and incorporate an attention module, the Spatial and Channel Squeeze & Excitation Block (SCSE), into the decoding path to highlight the meaningful information. A hybrid loss which merges binary cross entropy (BCE) loss and dice coefficient loss is designed to train our network and address the class imbalance issue. Missed detections can be greatly reduced with the combination of the two modified networks, which makes comprehensive use of the original map information and local information. On the test set of actual images, the insulator defect recognition precision and recall of the cascade network are 91.9% and 95.7%, respectively, exhibiting strong robustness and accuracy.
Published: 2020
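
Note on record 29: the abstract describes a hybrid loss that merges binary cross entropy (BCE) with a dice coefficient loss but does not state how the two terms are weighted. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the equal weighting (`bce_weight=0.5`), the smoothing constant, and all function names are hypothetical.

```python
import math

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """1 - soft Dice coefficient; `smooth` (assumed) avoids division by zero."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)

def hybrid_loss(probs, targets, bce_weight=0.5):
    """Weighted sum of BCE and Dice; the 0.5 weighting is an assumption."""
    return (bce_weight * bce_loss(probs, targets)
            + (1.0 - bce_weight) * dice_loss(probs, targets))
```

The Dice term depends on overlap rather than on per-pixel counts, which is why such combinations are commonly used when foreground pixels are rare.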

30. Detection and Classification of Multi-Magnetic Targets Using Mask-RCNN

Author: Zhijian Zhou, Xuqing Wu, Meng Zhang, and Jiefu Chen
Subjects: shapes, General Computer Science, Computer science, Feature extraction, 010502 geochemistry & geophysics, 01 natural sciences, Robustness (computer science), 0103 physical sciences, Pyramid, Mask-RCNN, General Materials Science, Tensor, 0105 earth and related environmental sciences, 010302 applied physics, Training set, business.industry, Deep learning, General Engineering, LabelMe, Pattern recognition, Magnetic targets, Magnetostatics, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, recognition, business, lcsh:TK1-9971
Abstract: To detect the shape of a small magnetic target in the shallow underground layer, this article proposes a recognition method based on Mask-RCNN. Firstly, COMSOL and MATLAB software are used to establish a database of magnetic target models with different shapes and orientations, which greatly enriches the diversity of the training data set. Then, the $G_{zz}$ component of the magnetic gradient tensor matrix is selected to highlight the shape features of the magnetic target, and the contour image is generated. The experimental data set is created using the deep learning annotation tool Labelme. Finally, ResNet101 is used as the backbone network and a feature pyramid network (FPN) structure is used to extract features. The region proposal network (RPN) is trained end-to-end to create region proposals for each feature map. The detection results on 200 test images show that the average detection accuracy of the method is 97%, and the recall rate is 94%. The simulation results show that the recognition accuracy and robustness of the method are improved.
Published: 2020

31. Data-Driven Based Tiny-YOLOv3 Method for Front Vehicle Detection Inducing SPP-Net

Author: Cao Jiaqi, Wang Xiaolan, Wang Yansong, and Wang Shuo
Subjects: 010504 meteorology & atmospheric sciences, General Computer Science, Computer science, k-means, Feature extraction, convolutional neural network, Context (language use), 010501 environmental sciences, 01 natural sciences, Data-driven, Pyramid, General Materials Science, Computer vision, Pyramid (image processing), Cluster analysis, 0105 earth and related environmental sciences, business.industry, spatial pyramid pooling, General Engineering, Feature (computer vision), vehicle detection, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Scale (map), business, lcsh:TK1-9971
Abstract: In order to solve the problem of low recognition rate and low real-time performance of vehicle detection in complex road environment, a data-driven forward vehicle detection algorithm based on improved tiny-YOLOv3 is proposed. Based on tiny-YOLOv3, the context feature information is combined to increase the two scale detections of tiny-YOLOv3 to three. The spatial pyramid pooling (SPP) module is added to increase the number of feature channels to improve the network feature extraction ability. According to the dense arrangement of vehicles on the horizontal axis in the road image ahead, we change the grid size of tiny-YOLOv3 and increase the number of candidate boxes on the horizontal axis. In addition, combined with the characteristics of the vehicle size in the road image ahead, K-means clustering method is used to select the appropriate number and size of target candidate boxes. We obtain the optimal detection model by multi-scale training of the improved network. The experimental results show that the average accuracy of the improved algorithm on the KITTI datasets is 91.03%, which is 7.12% higher than that of tiny-YOLOv3. And the detection speed of improved network is 144 frames/s, which meets the real-time requirements.
Published: 2020
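
Note on record 31: the abstract says a spatial pyramid pooling (SPP) module is added to increase the number of feature channels. One common way to do this in YOLO-style detectors is to max-pool the same feature map with several kernel sizes at stride 1 and concatenate the results along the channel axis. The sketch below illustrates that idea in plain Python; the kernel sizes (5, 9, 13) and the function names are assumptions, not details taken from the paper.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with 'same' output size on a 2D list (k odd).

    Cells outside the map are ignored, which is equivalent to -inf padding.
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [fmap[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out

def spp_block(channels, kernel_sizes=(5, 9, 13)):
    """Concatenate the input channels with one pooled copy per kernel size.

    `channels` is a list of 2D feature maps; kernel sizes are assumed values.
    """
    out = list(channels)
    for k in kernel_sizes:
        out.extend(max_pool_same(c, k) for c in channels)
    return out
```

With C input channels and three kernel sizes, the block returns 4C channels at the same spatial resolution, which matches the abstract's description of increasing the channel count.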

32. Research on Recognition of Fly Species Based on Improved RetinaNet and CBAM

Author: Yantong Chen, Junsheng Wang, Weinan Chen, Yuyang Li, and Xianzhong Zhang
Subjects: General Computer Science, Computer science, business.industry, Feature extraction, General Engineering, convolutional neural network, Pattern recognition, RetinaNet algorithm, Function (mathematics), attention convolution module, Minimum bounding box, Pyramid, Feature (machine learning), General Materials Science, Pyramid (image processing), Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, Fly species recognition, lcsh:TK1-9971, Block (data storage)
Abstract: Flies carry pathogens that endanger the health of humans and animals. Fly species are very similar in color and shape, which makes them difficult to recognize. This paper proposes a fly species recognition method based on an improved RetinaNet and the convolutional block attention module (CBAM). First, the proposed method uses ResNeXt101 as the feature extraction network and adds an improved CBAM called Stochastic-CBAM. Then, a multi-scale feature pyramid is built through an improved feature pyramid network (FPN) to integrate multi-level feature information. Finally, a small fully convolutional network (FCN) is used as the classification subnet and the bounding box regression subnet. The Kullback-Leibler (KL) loss replaces the smooth L1 loss as the bounding box regression loss function, so that bounding box regression and localization uncertainty are learned at the same time. We experimentally compared the proposed method with other state-of-the-art methods on the established dataset. Experimental results showed that the mean Average Precision (mAP) of this method reached 90.38%, which was better than the state-of-the-art methods. The average time to recognize a single image was 0.131 s. The method performs well on fly species recognition.
Published: 2020

33. Vehicle-Damage-Detection Segmentation Algorithm Based on Improved Mask RCNN

Author: Qinghui Zhang, Xianing Chang, and Shanfeng Bian Bian
Subjects: General Computer Science, Computer science, Feature extraction, vehicle-damage-detection, Bilinear interpolation, 02 engineering and technology, 010501 environmental sciences, Residual, 01 natural sciences, Convolutional neural network, Residual neural network, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Segmentation, Mask RCNN, Spatial analysis, 0105 earth and related environmental sciences, detection accuracy, General Engineering, LabelMe, loss function, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Algorithm, lcsh:TK1-9971
Abstract: Traffic congestion due to vehicular accidents seriously affects normal travel, and accurate and effective mitigating measures and methods must be studied. To resolve traffic accident compensation problems quickly, a vehicle-damage-detection segmentation algorithm based on transfer learning and an improved mask regional convolutional neural network (Mask RCNN) is proposed in this paper. Car damage pictures are first collected and preprocessed, and Labelme is used to label the dataset, which is divided into a training set and a test set. The residual network (ResNet) is optimized, and feature extraction is performed in combination with a Feature Pyramid Network (FPN). Then, the proportion and threshold of the anchors in the region proposal network (RPN) are adjusted. The spatial information of the feature map is preserved by bilinear interpolation in ROIAlign, and different weights are introduced in the loss function for targets of different scales. Finally, training and testing on a self-made dedicated dataset show that the improved Mask RCNN achieves better Average Precision (AP), detection accuracy, and masking accuracy, and improves the efficiency of solving traffic accident compensation problems.
Published: 2020

34. Ultrasound Image Segmentation Method for Thyroid Nodules Using ASPP Fusion Features

Author: Yating Wu, Xueliang Shen, Feng Bu, and Jin Tian
Subjects: Thyroid nodules, General Computer Science, Computer science, 02 engineering and technology, 030218 nuclear medicine & medical imaging, Convolution, 03 medical and health sciences, 0302 clinical medicine, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, medicine, Medical imaging, General Materials Science, Segmentation, Spatial analysis, Thyroid nodule, medical image segmentation, business.industry, Thyroid, General Engineering, Pattern recognition, Image segmentation, medicine.disease, dilated convolution, medicine.anatomical_structure, Kernel (image processing), Ultrasound imaging, atrous spatial pyramid pooling, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, ultrasound image, lcsh:TK1-9971
Abstract: Ultrasound imaging technology plays an important role in assisting doctors in diagnosing thyroid nodules. The tissue structure around the thyroid is very complex, which makes it difficult to segment and extract thyroid nodules from ultrasound images accurately. To address this problem, this paper proposes a model algorithm for thyroid nodule ultrasound image segmentation using ASPP fusion features. First, spatial pyramid pooling and depthwise separable convolution are combined to solve the problem that the size of the mapped features changes while context information is being captured. In addition, Atrous Spatial Pyramid Pooling (ASPP) is proposed to process the channel information and the spatial information of the input image separately. To appropriately reduce the dimension and size of the feature images, a $1\times 1$ convolution operation is performed before each convolution calculation, and the model size is optimized. In the decoding stage, the decoder module adjusts the relatively low-resolution feature map from the preceding decoder module and sets the output channel number of the two convolutions to the same value. After this adjustment, all features have the same dimension and can be fused by element-wise summation. Finally, the Dice Similarity Coefficient (DSC), Percent Match (PM), and Correspondence Ratio (CR) are used as evaluation criteria for comparison with other model algorithms. The experimental results show that the proposed model can significantly improve the segmentation of thyroid nodules in ultrasound images compared with traditional models.
Published: 2020
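
Note on the Dice Similarity Coefficient named in the abstract above: it is commonly defined as DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The following is a minimal sketch under that standard definition, not the authors' evaluation code; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient of two equal-sized binary masks.

    Masks are lists of rows containing 0/1 values.
    Assumption: two empty masks count as a perfect match (1.0).
    """
    # Count pixels that are foreground in both masks.
    intersection = sum(
        p & t
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    # Total foreground pixels across both masks.
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_coefficient(predicted, reference))
```

In the usage example one pixel overlaps and the masks contain three foreground pixels in total, so the score is 2/3.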

35. Efficient Feature Recombining Network Based on Refining Multi-Level Feature Maps for Semantic Segmentation

Author: Luan Zhao and Xiaofeng Zhang
Subjects: General Computer Science, Computer science, Feature extraction, Context (language use), 02 engineering and technology, Convolutional neural network, modified pyramid pooling, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Segmentation, Pyramid (image processing), Layer (object-oriented design), context information, Feature recombining, business.industry, General Engineering, Pattern recognition, Image segmentation, semantic segmentation, Feature (computer vision), 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971
Abstract: Modern approaches to semantic segmentation usually concatenate the feature map of the last convolutional layer with multi-scale features to form the final feature representation, which yields more accurate classification of target pixels in the input image. However, the feature information of the last layer is neither complete nor refined, which creates a performance bottleneck in the concatenation between the final feature map and the multi-scale feature representations. To solve this problem, we propose the Feature Recombining Network to obtain more refined and precise features for semantic segmentation. Our network is composed of a Feature Recombining Module and a Modified Pyramid Pooling Module. The former extracts more detailed and representative features through feature recombination, and the latter acquires richer context information than the previous module. Experiments show that our modules are effective in improving segmentation precision and that the Modified Pyramid Pooling Module is superior to the previous module. Based on our proposed network, we achieve 51.9% mIoU on the Pascal Context dataset and 44.75% mIoU on the ADE20K dataset.
Published: 2020

36. Instance Segmentation and Classification Method for Plant Leaf Images Based on ISC-MRCNN and APS-DCCNN

Author: Rundong Jiang, Yuan Gao, Jianwu Wang, Wenjie Chen, Xiaobo Yang, Guoxiong Zhou, and Aibin Chen
Subjects: General Computer Science, business.industry, Computer science, Object detection, General Engineering, Pattern recognition, adaptive chaotic particle swarm algorithm, Filter (signal processing), Mask R-CNN, Convolutional neural network, Support vector machine, Feature (computer vision), Pyramid, Softmax function, instance segmentation, General Materials Science, Segmentation, dual channels convolutional neural network, Pyramid (image processing), Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971, plant leaf
Abstract: To solve the complex background problems (e.g., noise interference, object overlap, and varying illumination) that affect classification performance on plant leaf images, this paper proposes an instance segmentation and classification method for plant leaf images based on IFPN SNMS CFFI-Mask R-CNN (ISC-MRCNN) and ACPSOSVM-Dual Channels Convolutional Neural Network (APS-DCCNN). To obtain the foreground of plant leaf images, the lateral connection structure of the feature map pyramid in ISC-MRCNN fuses feature maps of different depths, so that the network learns more detailed features. Then, the Soft Non-Maximum Suppression Algorithm is employed to improve the detection performance on overlapping objects. Next, a pooling method that integrates a continuous function reduces the precision loss during the alignment of the mapping between the feature map and the original image. Finally, a mask filter layer is constructed to mask complex backgrounds. To distinguish between similar plant leaf images, APS-DCCNN is used to classify the foreground images. In this process, a Support Vector Machine replaces softmax, and an Adaptive Chaotic Particle Swarm Algorithm is employed to optimize it. The experimental results show that, compared with Mask R-CNN, the average precision of ISC-MRCNN increases by 1.89% under different thresholds, which indicates that the proposed method is suitable for object detection and instance segmentation problems with complex backgrounds. In addition, compared with a traditional CNN, the average precision of the classification results obtained by APS-DCCNN improves by 1.59%, which shows that the proposed method is suitable for the classification of plant leaves.
Published: 2020
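
Note on the Soft Non-Maximum Suppression step named in the abstract above: instead of discarding every box that overlaps a higher-scoring box, Soft-NMS lowers its score according to the overlap. The following is a minimal sketch of the commonly described Gaussian variant, in which a score is multiplied by exp(-IoU² / sigma). It is not the authors' implementation; the names `iou` and `soft_nms` and the default values of `sigma` and `score_threshold` are assumptions.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS; returns (box, decayed_score) pairs in selection order."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        box, score = remaining.pop(best)
        kept.append((box, score))
        # Decay the scores of the other boxes by their overlap with it.
        decayed = []
        for other, other_score in remaining:
            other_score *= math.exp(-(iou(box, other) ** 2) / sigma)
            if other_score >= score_threshold:
                decayed.append((other, other_score))
        remaining = decayed
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
    for box, score in soft_nms(boxes, [0.9, 0.8, 0.7]):
        print(box, round(score, 3))
```

In the usage example the duplicate of the top box is kept with a strongly reduced score rather than removed, while the non-overlapping box keeps its score unchanged.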

37. LDPNet: A Lightweight Densely Connected Pyramid Network for Real-Time Semantic Segmentation

Author: Xuegang Hu and Liyuan Jing
Subjects: General Computer Science, Computer science, Feature extraction, 02 engineering and technology, Convolutional neural network, real-time semantic segmentation, lightweight network, 0502 economics and business, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Segmentation, atrous pyramid module, Spatial analysis, 050210 logistics & transportation, business.industry, encoder-decoder network, 05 social sciences, General Engineering, Pattern recognition, Image segmentation, Frame rate, Kernel (image processing), 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Deep convolutional neural network, business, lcsh:TK1-9971
Abstract: Deep convolutional neural networks have been widely used in image semantic segmentation in recent years; their deployment on mobile terminals, however, is limited by high computational costs. Given the slow inference speed and large memory usage of deep convolutional neural networks, we propose a lightweight and densely connected pyramid network (LDPNet) for real-time semantic segmentation. Firstly, a densely connected atrous pyramid (DCAP) module is constructed in the encoding process to extract multi-scale context information for forward propagation, strengthen the reuse of features, and offset the spatial information lost in the down-sampling process of the feature map. Secondly, a cross-fusion (CF) module embedded in each other during the decoding process is proposed, which uses high-level semantic features to effectively guide the fusion of low-level spatial details while strengthening context information. Our network is tested on two complex urban road scene data sets. Among them, experiments on the Cityscapes data set show that our structure runs at 87 frames per second (FPS) on a single NVIDIA GTX1080Ti GPU. The Mean Intersection over Union (mIoU) reaches 71.1%, and the number of parameters is only 0.8M. Compared with existing similar networks, the new system achieves a state-of-the-art trade-off between efficiency and accuracy.
Published: 2020

38. Combining Residual Neural Networks and Feature Pyramid Networks to Estimate Poverty Using Multisource Remote Sensing Data

Author: Guanhua Zhou, Yumin Tan, Peng Wu, Bingxin Bai, and Yunxin Li
Subjects: Atmospheric Science, 010504 meteorology & atmospheric sciences, Computer science, neural network, poverty, Geophysics. Cosmic physics, 0211 other engineering and technologies, Multi-task learning, 02 engineering and technology, Economic indicators, Residual, multitask learning model, 01 natural sciences, symbols.namesake, Economic inequality, Economic indicator, Pyramid, night-time light, Feature (machine learning), Econometrics, Computers in Earth Sciences, TC1501-1800, 021101 geological & geomatics engineering, 0105 earth and related environmental sciences, Artificial neural network, QC801-809, Pearson product-moment correlation coefficient, Ocean engineering, symbols
Abstract: Reliable poverty data are critical for regional economic analysis and policy making, especially considering that economic inequality and sustainable development are widespread social concerns. This article proposes a multitask learning model combining deep residual neural networks and feature pyramid networks to estimate poverty level from multiple sources including the night-time light data, Landsat 8 imagery, and spectral index data. We first train the multitask learning model using the multisource data in Chongqing, China and then estimate the representative economic indicators in the study area. The model is evaluated with the Pearson correlation coefficient of the actual and estimated economic indicators. The result shows that the proposed model outperforms other models with the Pearson correlation coefficient up to 0.87 in the annual estimates of economic indicators between 2013 and 2017. As all the data used in this article are publicly available, the proposed model can be used to estimate the economic indicators in other regions as well.
Published: 2020
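
Note on the evaluation metric named in the abstract above: the Pearson correlation coefficient between actual and estimated indicators is the covariance of the two series divided by the product of their standard deviations. The following is a minimal sketch of that standard formula, not the authors' evaluation code; the function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(actual, estimated):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_a * var_e)


if __name__ == "__main__":
    # Hypothetical indicator values, for illustration only.
    actual = [1.0, 2.0, 3.0, 4.0]
    estimated = [1.1, 1.9, 3.2, 3.8]
    print(pearson_r(actual, estimated))
```

The sketch assumes neither series is constant; a constant series has zero variance and would cause a division by zero.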

39. Cross Complementary Fusion Network for Video Salient Object Detection

Author: Junxia Li, Ziyang Wang, and Zefeng Pan
Subjects: General Computer Science, Channel (digital image), Computer science, Optical flow, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, self-attention mechanism, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Computer vision, Pyramid (image processing), Representation (mathematics), business.industry, General Engineering, 020207 software engineering, TK1-9971, Video saliency detection, pyramid pooling, Feature (computer vision), multi-channel concatenation, structural information, 020201 artificial intelligence & image processing, Artificial intelligence, Electrical engineering. Electronics. Nuclear engineering, business, Feature learning
Abstract: Recently, optical flow guided video saliency detection methods have achieved high performance. However, the computation cost of optical flow is usually expensive, which limits the applications of these methods in time-critical scenarios. In this article, we propose an end-to-end cross complementary network (CCNet) based on fully convolutional network for video saliency detection. The CCNet consists of two effective components: single-image representation enhancement (SRE) module and spatiotemporal information learning (STIL) module. The SRE module provides robust saliency feature learning for a single image through a pyramid pooling module followed by a lightweight channel attention module. As an effective alternative operation of optical flow to extract spatiotemporal information, the STIL introduces a spatiotemporal information fusion module and a video correlation filter to learn the spatiotemporal information, the inner collaborative and interactive information between consecutive input groups. In addition to enhancing the feature representation of a single image, the combination of SRE and STIL can learn the spatiotemporal information and the correlation between consecutive images well. Extensive experimental results demonstrate the effectiveness of our method in comparison with 14 state-of-the-art approaches.
Published: 2020

40. Robust Visual Tracking via a Collaborative Model Based on Locality-Constrained Sparse Coding

Author: Xiaoping Fan and Jia Hu
Subjects: General Computer Science, Computer science, 02 engineering and technology, Discriminative model, Robustness (computer science), Histogram, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, business.industry, Locality, General Engineering, collaborative model, 020206 networking & telecommunications, Pattern recognition, visual tracking, Active appearance model, Locality-constrained, Bayesian framework, Eye tracking, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Neural coding, business, lcsh:TK1-9971
Abstract: Target tracking is an important task in computer vision, and many tracking algorithms have achieved great results. However, several challenges, such as abrupt motion and occlusion, still hinder the development of tracking algorithms. To use the feature information of the target more effectively and improve the accuracy and robustness of target tracking, this paper designs discriminative and generative components that differ from previous ones and presents a novel discriminative-generative collaborative appearance model that combines the two. First, for the discriminative component, the Locality-Constrained Sparse Coding Algorithm is proposed. In this algorithm, the objective function of the local features of the target's spatial information is determined by fusing pyramid max pooling with the local feature histogram method. The objective function has three important parameters, which are solved by different optimization strategies. Second, for the generative component, the Histogram of Locality-Constrained Feature Algorithm is proposed. In this algorithm, the locality constraint serves to describe the spatial information of the target as a generative appearance model. Each image patch can be approximated by a linear combination of a local coordinate system formed by a dictionary whose elements are cluster centers that contain the most representative model of the target. Third, this paper designs a collaborative target tracking framework based on a semi-supervised learning algorithm with locality constraint coding. The framework can quickly and robustly determine the feature information of the tracking region. The proposed algorithm is evaluated on a comprehensive test platform. The experimental results show that our method is more robust and efficient, and that the precision and success rate of our algorithm are improved by 5.4% and 4.7%, respectively.
Published: 2020

41. StegoPNet: Image Steganography With Generalization Ability Based on Pyramid Pooling Module

Author: Wenxin Wang, Zimei Xie, Chuan Qin, Xintao Duan, Dongli Yue, and Nao Liu
Subjects: General Computer Science, Computer science, Pooling, Feature extraction, 0211 other engineering and technologies, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Convolutional neural network, Distortion, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Pyramid (image processing), Image steganography, pyramid pooling module, 021110 strategic, defence & security studies, Steganography, business.industry, Deep learning, Payload (computing), General Engineering, deep neural network, Pattern recognition, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971
Abstract: Existing image steganography technology based on deep neural networks still needs improvement in terms of payload capacity and visual effects. To solve this problem, this article proposes a new deep convolutional steganography network based on the pyramid pooling module to achieve better image steganography. The deep convolutional neural network itself can extract features efficiently. Building on an up-sampling structure, we add a pyramid pooling module that fully integrates the important global features and effectively reduces the loss of contextual information between different sub-regions in the feature extraction process, achieving better hiding and extraction effects while ensuring security. Experiments show that the method achieves good results on the average peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and other indicators between the images it produces. We have also verified through ablation experiments that the pyramid pooling module enhances the steganography effect of the network model and further reduces the loss function of the model.
Published: 2020

42. A Novel Zero-Watermarking Scheme Based on Variable Parameter Chaotic Mapping in NSPD-DCT Domain

Author: Meng Yue, Han Shaocheng, Wang Rui, Cheng Zheng, Zhang Peng, and Yujin Zhang
Subjects: General Computer Science, Computer science, Feature extraction, Data_MISCELLANEOUS, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Image processing, 02 engineering and technology, rotation attacks, Robustness (computer science), Pyramid, Computer Science::Multimedia, Discrete cosine transform, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Pyramid (image processing), nonsubsampled pyramid decomposition, Digital watermarking, Computer Science::Cryptography and Security, General Engineering, 020207 software engineering, Watermark, Filter (signal processing), variable parameter chaotic mapping, Feature (computer vision), Computer Science::Computer Vision and Pattern Recognition, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, zero-watermarking, Algorithm, lcsh:TK1-9971
Abstract: In this paper, a novel image zero-watermarking scheme against rotation attacks is proposed based on nonsubsampled pyramid decomposition (NSPD) and discrete cosine transform (DCT). It utilizes the intrinsic characteristics of NSPD and DCT to extract the robust feature of an image as the original zero-watermark. To increase the security of the proposed scheme, a variable parameter chaotic mapping (VPCM) is designed for the processes of watermark encryption and robust feature extraction. Firstly, the host gray-scale image is decomposed by NSPD, and the low-frequency sub-band image is divided into non-overlapping blocks. After the blocks are transformed by DCT, the signs of the first AC coefficients from all the blocks are used to construct a binary feature image. Then an exclusive-or operation is performed between the binary feature image and the encrypted watermark image to obtain the verification zero-watermark image. Furthermore, a method against arbitrary rotation attacks is employed to improve the robustness of the scheme against geometric attacks. The experimental results demonstrate that the proposed scheme is highly robust against various image processing attacks such as filtering, JPEG compression, scaling, translation, rotation and Checkmark attacks.
Published: 2020
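
Note on the construction described in the abstract above: the feature-extraction and XOR steps can be sketched in a few lines. This is a simplified illustration, not the authors' scheme: it skips the NSPD decomposition and the chaotic encryption, takes a small matrix as a stand-in for the low-frequency sub-band, and assumes that "first AC coefficient" means the DCT coefficient at index (0, 1). The function names are hypothetical.

```python
import math


def dct2_coeff(block, u, v):
    """Single orthonormal 2-D DCT-II coefficient (u, v) of a square block."""
    n = len(block)
    cu = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    cv = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
    total = 0.0
    for x in range(n):
        for y in range(n):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
    return cu * cv * total


def feature_bits(image, block_size):
    """One bit per non-overlapping block: 1 if the (0, 1) coefficient is >= 0."""
    bits = []
    for r in range(0, len(image) - block_size + 1, block_size):
        row_bits = []
        for c in range(0, len(image[0]) - block_size + 1, block_size):
            block = [row[c:c + block_size] for row in image[r:r + block_size]]
            row_bits.append(1 if dct2_coeff(block, 0, 1) >= 0 else 0)
        bits.append(row_bits)
    return bits


def xor_bits(a, b):
    """Element-wise XOR of two equal-sized binary images."""
    return [[p ^ q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


if __name__ == "__main__":
    low_band = [[1, 2, 2, 1], [1, 2, 2, 1], [2, 1, 1, 2], [2, 1, 1, 2]]
    watermark = [[1, 1], [0, 1]]  # stand-in for the encrypted watermark
    features = feature_bits(low_band, 2)
    zero_watermark = xor_bits(features, watermark)
    # Verification: XOR with the features again recovers the watermark.
    print(xor_bits(zero_watermark, features) == watermark)
```

Because XOR is its own inverse, the watermark can be recovered at verification time from the stored zero-watermark and the features re-extracted from the image, without modifying the host image.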

43. MPANET: Multi-Scale Pyramid Aggregation Network For Stereo Matching

Author: Wei Chen, Ziyu Zhu, Qiuping Li, Wei Guo, and Yong Zhao
Subjects: Scale (ratio), business.industry, Computer science, Pyramid, Stereo matching, Computer vision, Artificial intelligence, business
Published: 2021
Full Text: View/download PDF

44. Ddan: A Deep Dual Attention Network For Video Super-Resolution

Author: Yao Zhao, Huihui Bai, Xiyue Sun, and Feng Li
Subjects: Motion compensation, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Optical flow, DUAL (cognitive architecture), Residual, Superresolution, Motion (physics), Attention network, Pyramid, Computer vision, Artificial intelligence, business
Abstract: We present a deep dual attention network (DDAN) for video super-resolution, which cascades a motion compensation network (MCNet) and an SR reconstruction network (ReconNet). The MCNet uses a pyramid framework and learns the optical flow representations progressively to synthesize the motion information across adjacent frames. It also extracts detail components of neighboring LR frames for more accurate motion compensation. In ReconNet, we combine dual attention mechanisms and a residual learning strategy to recover high-frequency details. The DDAN performs effectively and generalizes well on video super-resolution tasks. The project has been released on GitHub.
Published: 2021
Full Text: View/download PDF

45. Multi-Scale Dilated Convolution Network Based Depth Estimation in Intelligent Transportation Systems

Author: Jinglu Hu, Yanling Tian, Pengyi Hao, Qieshi Zhang, Ziliang Ren, and Fuxiang Wu
Subjects: multi-scale dilated module, General Computer Science, Scale (ratio), Computer science, Feature extraction, General Engineering, 020206 networking & telecommunications, 02 engineering and technology, computer.software_genre, dilated network, Convolution, ResNet, Feature (computer vision), Pyramid, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, General Materials Science, Pyramid (image processing), Data mining, lcsh:Electrical engineering. Electronics. Nuclear engineering, Depth estimation, Intelligent transportation system, computer, intelligent transportation systems (ITS), lcsh:TK1-9971
Abstract: Vision-based depth estimation plays a significant role in Intelligent Transportation Systems (ITS) because of its low cost and high efficiency, and it can be used to analyze the driving environment, improve driving safety, etc. Although recently proposed approaches abandon time-consuming pre-processing or post-processing steps and achieve end-to-end prediction, fine details may be lost through max-pooling based encoding modules. To tackle this problem, we propose the Multi-Scale Dilated Convolution Network (MSDC-Net), a dilated convolution based deep network. In the feature encoding and decoding part, dilated layers maintain the scale of the original image and reduce the loss of details. After that, a pyramid dilated feature extraction module is added to integrate the knowledge learned through the forward steps with different receptive fields. The proposed approach is evaluated on the KITTI dataset and achieves a state-of-the-art result on that dataset.
Published: 2019

46. High-Resolution SAR Change Detection Based on ROI and SPP Net

Author: Fang Liu, Licheng Jiao, Lingling Li, Yang Zhengyan, and Xu Liu
Subjects: Synthetic aperture radar, General Computer Science, Computer science, Feature extraction, 0211 other engineering and technologies, 02 engineering and technology, Speckle pattern, Non-subsampled contourlet transform, Region of interest, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Cluster analysis, change detection, Image resolution, 021101 geological & geomatics engineering, business.industry, spatial pyramid pooling, General Engineering, Speckle noise, Pattern recognition, Contourlet, high-resolution SAR, 020201 artificial intelligence & image processing, Artificial intelligence, region of interest, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971, Change detection
Abstract: As resolution increases, the structural information in Synthetic Aperture Radar (SAR) images becomes more and more abundant. The speckle noise generated by the coherent imaging mechanism strongly affects detection accuracy and increases detection difficulty in high-resolution SAR change detection. In this paper, a multivariate change detection framework based on the non-subsampled contourlet transform (NSCT), deep belief networks (DBN), fuzzy c-means (FCM) clustering, and a global-local spatial pyramid pooling (SPP) net is proposed. NSCT decomposes the image into multiple scales, and DBN is used to extract features from the decomposed coefficient matrix. FCM clusters the similarity matrix of the initial DBN features into two classes, which serve as pseudo-labels for the global-local SPP net training data. The global-local SPP net consists of a large-scale region of interest (ROI) SPP net and a small-scale change detection SPP net. The combination of ROI and the SPP net, as well as the fusion between different scales, weakens the interference of unchanged information and effectively eliminates a large amount of redundant information. The experimental results show that our proposed method can effectively remove speckle noise and improve the robustness of high-resolution SAR change detection.
Published: 2019

47. Multi-Component Fusion Network for Small Object Detection in Remote Sensing Images

Author: Shuojin Yang, Liang Tian, Jing Liu, Haibin Ling, Bingyin Zhou, Jianqing Jia, and Wei Guo
Subjects: complex scene, 010504 meteorology & atmospheric sciences, General Computer Science, Computer science, multi-component, 02 engineering and technology, occlusion, 01 natural sciences, Convolutional neural network, remote sensing, Robustness (computer science), Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Spatial analysis, 0105 earth and related environmental sciences, Remote sensing, Fusion, business.industry, Deep learning, Small object, General Engineering, 020206 networking & telecommunications, Object detection, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971, dual pyramid fusion
Abstract: Small object detection is a major challenge in the field of object detection. With the development of deep learning, many methods based on deep convolutional neural networks (DCNNs) have greatly improved the speed of detection while ensuring accuracy. However, due to the tension between spatial detail and semantic information in DCNNs, previous deep learning methods often run into problems when detecting small objects. The challenge can be more serious in complex scenes involving similar background objects and/or occlusion, such as in remote sensing imagery. In this paper, we propose an end-to-end DCNN called the multi-component fusion network (MCFN) to improve the accuracy of small object detection in such cases. First, we propose a dual pyramid fusion network, which densely concatenates spatial information and semantic information to extract small object features via encoding and decoding operations. Then we use a relative region proposal network to adequately extract the features of small object samples and parts of objects. Finally, to achieve robustness against background disturbance, we add contextual information to the proposal regions before final detection. Experimental evaluations demonstrate that the proposed method significantly improves the accuracy of object detection in remote sensing images compared with other state-of-the-art methods, especially in complex scenes with occlusion.
Published: 2019

48. Learning to Fuse Multiscale Features for Visual Place Recognition

Author: Mao Jun, Xiaofeng He, Liao Wu, Michael Milford, Lilian Zhang, and Xiaoping Hu
Subjects: 0209 industrial biotechnology, General Computer Science, Computer science, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Convolutional neural network, localization, 020901 industrial engineering & automation, Visual place recognition, mobile robots, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Pyramid (image processing), business.industry, Deep learning, General Engineering, deep learning, Mobile robot, Pattern recognition, Visualization, Feature (computer vision), Benchmark (computing), 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971
Abstract: Efficient and robust visual place recognition is of great importance to autonomous mobile robots. Recent work has shown that features learned from convolutional neural networks achieve impressive performance with efficient feature size, where most of them are pooled or aggregated from a convolutional feature map. However, convolutional filters only capture the appearance of their receptive fields and do not consider how to combine the multiscale appearance for place recognition. In this paper, we propose a novel method to build a multiscale feature pyramid and present two approaches to use the pyramid to augment the place recognition capability. The first approach fuses the pyramid to obtain a new feature map, which has an awareness of both the local and semi-global appearance, and the second approach learns an attention model from the feature pyramid to weight the spatial grids on the original feature map. Both approaches combine the multiscale features in the pyramid to suppress the confusing local features while tackling the problem in two different ways. Extensive experiments have been conducted on benchmark datasets with varying degrees of appearance and viewpoint variations. The results show that the proposed approaches achieve superior performance over the networks without the multiscale feature fusion and the multiscale attention components. Analyses on the performance of using different feature pyramids are also provided.
Published: 2019

49. Multiscale Context-Cascaded Ensemble Framework (MsC2EF): Application to Breast Histopathological Image

Author: Chao Tu, Qianjin Feng, Xinsen Zhang, Zhenyuan Ning, and Yu Zhang
Subjects: 0301 basic medicine, General Computer Science, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Context (language use), MsC²EF, Image (mathematics), 03 medical and health sciences, 0302 clinical medicine, Breast cancer, breast cancer, Pyramid, medicine, General Materials Science, Pyramid (image processing), Layer (object-oriented design), business.industry, Deep learning, General Engineering, Pattern recognition, histopathological image, medicine.disease, 030104 developmental biology, ComputingMethodologies_PATTERNRECOGNITION, Feature (computer vision), ensemble learning, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971, 030217 neurology & neurosurgery
Abstract: Microscopic analysis of breast cells and tissue is a critical step in the definitive diagnosis of breast cancer. However, it is time-consuming and fatiguing for histopathologists to find the diagnostic characteristics of cells and tissue in breast histopathological images through scanning at multiple magnifications. Many computer-aided studies, including traditional machine learning and deep learning approaches, have been conducted to efficiently assist histopathologists in making diagnostic decisions. However, the precision and complexity of such approaches remain challenging. In this work, we propose and evaluate a new framework, called multiscale context-cascaded ensemble framework (MsC2EF), to classify breast histopathological images. The model based on MsC2EF exhibits a higher precision than traditional machine learning. Meanwhile, it is more efficient and hardware-independent compared with deep learning approaches. The MsC2EF consists of the input, cascade, and decision layers. The input layer comprises a feature extractor and a spatial pyramid of the image to execute feature input from coarse to fine scales. Four ensemble channels are stacked in a parallel manner as the cascade layer to select and transfer contextual features iteratively and adaptively. For the decision layer, a kernel fusion-based method is integrated to perform classification of breast histopathological images by fusing four different feature spaces. Our proposed method has been evaluated on an open dataset. The experimental result shows that MsC2EF obtains a good classification performance (accuracy at patch level: 0.948±0.016; accuracy at patient level: 0.981±0.016), indicating its potential application to the classification of breast histopathological images.
Published: 2019

50. Multi-Attention Object Detection Model in Remote Sensing Images Based on Multi-Scale

Author: Xuewei Li, Zhiqiang Liu, Ruiguo Yu, Han Jiang, Xiang Ying, Jie Gao, Mei Yu, and Qiang Wang
Subjects: General Computer Science, Channel (digital image), Computer science, Object detection, 0211 other engineering and technologies, 02 engineering and technology, satellite imagery, Pyramid, spatial attention, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Pyramid (image processing), pixel-level attention, 021101 geological & geomatics engineering, Remote sensing, Pixel, General Engineering, Object (computer science), Expression (mathematics), Kernel (image processing), Feature (computer vision), 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Scale (map), lcsh:TK1-9971
Abstract: Ground object detection, based on remote sensing satellite imagery, provides the groundwork for numerous applications, so the detection accuracy is of vital importance. The background of remote sensing images is complex, object sizes vary, and there are many small objects. In view of these problems, a multi-attention object detection method (MA-FPN) based on multi-scale is proposed in this paper, which can effectively make the network pay attention to the location of the object and reduce the loss of small object information. Building on the feature pyramid network (FPN), we first put forward a global spatial attention module, which extracts spatial location-related information from shallow features and fuses it with deep features to enhance the position expression ability of deep features. In addition, the paper provides a pixel feature attention module: a multi-scale convolution kernel is employed to generate a feature map of the same size as the input, and channel attention is used to assign weights to each layer of feature maps to obtain pixel-level attention maps with good details. Experiments on the NWPU, RSOD, and DOTA datasets show that the proposed algorithm outperforms state-of-the-art methods.
Published: 2019
