19 results for "Shen, Chunhua"
Search Results
2. SOLO: A Simple Framework for Instance Segmentation.
- Author
Wang, Xinlong, Zhang, Rufeng, Shen, Chunhua, Kong, Tao, and Li, Lei
- Subjects
PIXELS, IMAGE segmentation, TASK analysis, MASK laws
- Abstract
Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that has made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the “detect-then-segment” strategy (e.g., Mask R-CNN), or predict embedding vectors first then cluster pixels into individual instances. In this paper, we view the task of instance segmentation from a completely new perspective by introducing the notion of “instance categories”, which assigns categories to each pixel within an instance according to the instance's location. With this notion, we propose segmenting objects by locations (SOLO), a simple, direct, and fast framework for instance segmentation with strong performance. We derive a few SOLO variants (e.g., Vanilla SOLO, Decoupled SOLO, Dynamic SOLO) following the basic principle. Our method directly maps a raw input image to the desired object categories and instance masks, eliminating the need for the grouping post-processing or the bounding box detection. Our approach achieves state-of-the-art results for instance segmentation in terms of both speed and accuracy, while being considerably simpler than the existing methods. Besides instance segmentation, our method yields state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation. We further demonstrate the flexibility and high-quality segmentation of SOLO by extending it to perform one-stage instance-level image matting. Code is available at: https://git.io/AdelaiDet. [ABSTRACT FROM AUTHOR]
- Published
- 2022
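The "instance categories" notion in this abstract amounts to tying each instance to the grid cell containing its center, so that cell's classifier and mask branch become responsible for the instance. A minimal sketch of that cell assignment, assuming a simple S × S grid (the function name and toy numbers are ours, not from the SOLO code):

```python
def assign_cells(centers, image_size, grid_size):
    """Map each instance center (x, y) to the flat index k = i * S + j of its grid cell."""
    h, w = image_size
    cells = []
    for cx, cy in centers:
        j = min(int(cx / w * grid_size), grid_size - 1)  # grid column
        i = min(int(cy / h * grid_size), grid_size - 1)  # grid row
        cells.append(i * grid_size + j)
    return cells

# Two instances on a 100x100 image with a 5x5 grid:
print(assign_cells([(10, 10), (90, 50)], (100, 100), 5))  # -> [0, 14]
```

In the full method each responsible cell also predicts the instance's mask, which is what removes the need for box detection or pixel grouping.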
3. Index Networks.
- Author
Lu, Hao, Dai, Yutong, Shen, Chunhua, and Xu, Songcen
- Subjects
IMAGE denoising, IMAGE segmentation
- Abstract
We show that existing upsampling operators in convolutional networks can be unified using the notion of the index function. This notion is inspired by an observation in the decoding process of deep image matting where indices-guided unpooling can often recover boundary details considerably better than other upsampling operators such as bilinear interpolation. By viewing the indices as a function of the feature map, we introduce the concept of ‘learning to index’, and present a novel index-guided encoder-decoder framework where indices are learned adaptively from data and are used to guide downsampling and upsampling stages, without extra training supervision. At the core of this framework is a new learnable module, termed Index Network (IndexNet), which dynamically generates indices conditioned on the feature map. IndexNet can be used as a plug-in, applicable to almost all convolutional networks that have coupled downsampling and upsampling stages, enabling the networks to dynamically capture variations of local patterns. In particular, we instantiate and investigate five families of IndexNet. We highlight their superiority in delivering spatial information over other upsampling operators with experiments on synthetic data, and demonstrate their effectiveness on four dense prediction tasks, including image matting, image denoising, semantic segmentation, and monocular depth estimation. Code and models are available at https://git.io/IndexNet. [ABSTRACT FROM AUTHOR]
- Published
- 2022
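The motivating observation — that unpooling guided by stored indices recovers detail that other upsampling operators lose — can be sketched in 1-D. This shows only the fixed argmax variant; IndexNet itself learns the index function from the feature map:

```python
def max_pool_with_indices(x, stride=2):
    """Max-pool a 1-D signal, recording where each maximum came from."""
    pooled, idx = [], []
    for i in range(0, len(x), stride):
        window = x[i:i + stride]
        j = max(range(len(window)), key=lambda k: window[k])
        pooled.append(window[j])
        idx.append(i + j)
    return pooled, idx

def index_unpool(pooled, idx, length):
    """Scatter pooled values back to their recorded positions; elsewhere zero."""
    out = [0.0] * length
    for v, i in zip(pooled, idx):
        out[i] = v
    return out

x = [0.1, 0.9, 0.8, 0.2]
p, idx = max_pool_with_indices(x)
print(index_unpool(p, idx, len(x)))  # -> [0.0, 0.9, 0.8, 0.0]
```

Bilinear upsampling would smear the two peaks; the indices put them back exactly, which is the boundary-preserving behaviour the paper generalizes with learned indices.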
4. Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes.
- Author
Dong, Genshun, Yan, Yan, Shen, Chunhua, and Wang, Hanzi
- Abstract
Deep Convolutional Neural Networks (DCNNs) have recently shown outstanding performance in semantic image segmentation. However, state-of-the-art DCNN-based semantic segmentation methods usually suffer from high computational complexity due to the use of complex network architectures. This greatly limits their applications in the real-world scenarios that require real-time processing. In this paper, we propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes, which achieves a good trade-off between accuracy and speed. Specifically, a Lightweight Baseline Network with Atrous convolution and Attention (LBN-AA) is firstly used as our baseline network to efficiently obtain dense feature maps. Then, the Distinctive Atrous Spatial Pyramid Pooling (DASPP), which exploits the different sizes of pooling operations to encode the rich and distinctive semantic information, is developed to detect objects at multiple scales. Meanwhile, a Spatial detail-Preserving Network (SPN) with shallow convolutional layers is designed to generate high-resolution feature maps preserving the detailed spatial information. Finally, a simple but practical Feature Fusion Network (FFN) is used to effectively combine both deep and shallow features from the semantic branch (DASPP) and the spatial branch (SPN), respectively. Extensive experimental results show that the proposed method respectively achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) at the inference speeds of 51.0 fps and 39.3 fps on the challenging Cityscapes and CamVid test datasets (by only using a single NVIDIA TITAN X card). This demonstrates that the proposed method offers excellent performance at the real-time speed for semantic segmentation of urban street scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2021
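The atrous convolutions underlying the DASPP module sample the input with gaps, enlarging the receptive field at no extra parameter cost. A 1-D sketch under our own toy setup, not the paper's implementation:

```python
def atrous_conv1d(x, w, rate):
    """Valid 1-D convolution with kernel w sampled at the given dilation rate."""
    k = len(w)
    span = (k - 1) * rate           # input extent covered by the dilated kernel
    return [sum(w[t] * x[i + t * rate] for t in range(k))
            for i in range(len(x) - span)]

x = [1, 2, 3, 4, 5, 6]
print(atrous_conv1d(x, [1, 1], rate=1))  # -> [3, 5, 7, 9, 11]
print(atrous_conv1d(x, [1, 1], rate=2))  # -> [4, 6, 8, 10]
```

The same two-tap kernel covers twice the input span at rate 2, which is how a pyramid of rates captures objects at multiple scales.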
5. SESV: Accurate Medical Image Segmentation by Predicting and Correcting Errors.
- Author
Xie, Yutong, Zhang, Jianpeng, Lu, Hao, Shen, Chunhua, and Xia, Yong
- Subjects
COMPUTER-assisted image analysis (Medicine), DIAGNOSTIC imaging, CONVOLUTIONAL neural networks, IMAGE segmentation
- Abstract
Medical image segmentation is an essential task in computer-aided diagnosis. Despite their prevalence and success, deep convolutional neural networks (DCNNs) still need to be improved to produce accurate and robust enough segmentation results for clinical use. In this paper, we propose a novel and generic framework called Segmentation-Emendation-reSegmentation-Verification (SESV) to improve the accuracy of existing DCNNs in medical image segmentation, instead of designing a more accurate segmentation model. Our idea is to predict the segmentation errors produced by an existing model and then correct them. Since predicting segmentation errors is challenging, we design two ways to tolerate the mistakes in the error prediction. First, rather than using a predicted segmentation error map to correct the segmentation mask directly, we only treat the error map as the prior that indicates the locations where segmentation errors are prone to occur, and then concatenate the error map with the image and segmentation mask as the input of a re-segmentation network. Second, we introduce a verification network to determine whether to accept or reject the refined mask produced by the re-segmentation network on a region-by-region basis. The experimental results on the CRAG, ISIC, and IDRiD datasets suggest that using our SESV framework can improve the accuracy of DeepLabv3+ substantially and achieve advanced performance in the segmentation of gland cells, skin lesions, and retinal microaneurysms. Consistent conclusions can also be drawn when using PSPNet, U-Net, and FPN as the segmentation network, respectively. Therefore, our SESV framework is capable of improving the accuracy of different DCNNs on different medical image segmentation tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2021
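SESV's final verification step accepts or rejects the refined mask region by region. A toy sketch of that merge logic, with the learned verification network replaced by a given list of per-region decisions (all names and labels here are illustrative):

```python
def verify_and_merge(original, refined, accept):
    """Per region: keep the refined label where the verifier accepts it, else fall back."""
    return [ref if accept[r] else orig
            for r, (orig, ref) in enumerate(zip(original, refined))]

original = [0, 1, 1, 0]              # per-region labels from the first segmentation
refined  = [1, 1, 0, 0]              # labels after re-segmentation
accept   = [True, True, False, True] # verifier decisions per region
print(verify_and_merge(original, refined, accept))  # -> [1, 1, 1, 0]
```

This is the second error-tolerance mechanism the abstract describes: a bad refinement in one region cannot corrupt the whole mask.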
6. Human Detection Aided by Deeply Learned Semantic Masks.
- Author
Wang, Xinyu, Shen, Chunhua, Li, Hanxi, and Xu, Shugong
- Subjects
DEEP learning, CONVOLUTIONAL neural networks, COMPUTER vision, VIDEO surveillance, IMAGE converters, MASKS
- Abstract
Human detection is one of the long-standing computer vision tasks, and it has been a cornerstone for many real-world applications, such as photo album organization, video surveillance, and autonomous driving. Benefiting from deep learning technologies such as convolutional neural networks, modern object detectors have achieved much-improved accuracy in generic object detection tasks. In this paper, we aim to improve deep learning-based human detection. Our main idea is to exploit semantic context information for human detection by using deep-learnt semantic features provided by semantic segmentation masks. Segmentation masks act as an attention mechanism and force the detectors to focus on the image regions where potential object candidates are likely to appear. Meanwhile, the extra segmentation mask channel can also guide the convolutional kernels to automatically learn more discriminative features, making it easier to distinguish the background and foreground. We implement our methods with two popular detection frameworks, i.e., Faster R-CNN and SSD, and experimentally analyze the effectiveness of the proposed methods. Evaluation results on the widely used MS-COCO dataset and the very recent CrowdHuman dataset are provided. Our proposed methods outperform the baseline detectors and achieve better performance on highly occluded human detection. [ABSTRACT FROM AUTHOR]
- Published
- 2020
7. A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification.
- Author
Xie, Yutong, Zhang, Jianpeng, Xia, Yong, and Shen, Chunhua
- Abstract
Automated skin lesion segmentation and classification are two of the most essential and related tasks in the computer-aided diagnosis of skin cancer. Despite their prevalence, deep learning models are usually designed for only one task, ignoring the potential benefits of jointly performing both tasks. In this paper, we propose the mutual bootstrapping deep convolutional neural networks (MB-DCNN) model for simultaneous skin lesion segmentation and classification. This model consists of a coarse segmentation network (coarse-SN), a mask-guided classification network (mask-CN), and an enhanced segmentation network (enhanced-SN). On one hand, the coarse-SN generates coarse lesion masks that provide a prior bootstrapping for mask-CN to help it locate and classify skin lesions accurately. On the other hand, the lesion localization maps produced by mask-CN are then fed into enhanced-SN, aiming to transfer the localization information learned by mask-CN to enhanced-SN for accurate lesion segmentation. In this way, both segmentation and classification networks mutually transfer knowledge between each other and facilitate each other in a bootstrapping way. Meanwhile, we also design a novel rank loss and jointly use it with the Dice loss in segmentation networks to address the issues caused by class imbalance and hard-easy pixel imbalance. We evaluate the proposed MB-DCNN model on the ISIC-2017 and PH2 datasets, and achieve a Jaccard index of 80.4% and 89.4% in skin lesion segmentation and an average AUC of 93.8% and 97.7% in skin lesion classification, which are superior to the performance of representative state-of-the-art skin lesion segmentation and classification methods. Our results suggest that it is possible to boost the performance of skin lesion segmentation and classification simultaneously via training a unified model to perform both tasks in a mutual bootstrapping way. [ABSTRACT FROM AUTHOR]
- Published
- 2020
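The Dice loss used in the segmentation branches is a standard overlap loss; a plain sketch on soft binary masks (the paper's companion rank loss is omitted here):

```python
def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient for soft binary masks given as flat lists."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)

print(round(dice_loss([1, 0, 1], [1, 0, 1]), 6))  # -> 0.0 (perfect overlap)
print(round(dice_loss([1, 0, 0], [0, 0, 1]), 6))  # -> 1.0 (disjoint masks)
```

Because the loss is normalized by total mask area rather than image size, large backgrounds do not dominate — the class-imbalance property the rank loss then sharpens for hard pixels.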
8. RefineNet: Multi-Path Refinement Networks for Dense Prediction.
- Author
Lin, Guosheng, Liu, Fayao, Milan, Anton, Shen, Chunhua, and Reid, Ian
- Subjects
CONVOLUTIONAL neural networks, FORECASTING, OBJECT recognition (Computer vision), MATHEMATICAL convolutions
- Abstract
Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense prediction problems such as semantic segmentation and depth estimation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments on semantic segmentation which is a dense classification problem and achieve good performance on seven public datasets. We further apply our method for depth estimation and demonstrate the effectiveness of our method on dense regression problems. [ABSTRACT FROM AUTHOR]
- Published
- 2020
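Chained residual pooling can be caricatured in 1-D: each block pools the previous block's output, and every pooled result is added back to a running sum, so deeper pooling stages contribute residually. The stride-1 pooling and toy input below are our simplifications, not the paper's exact operator:

```python
def pool_same(x):
    """Stride-1, width-2 max pool that keeps length (pad with the last value)."""
    padded = x + [x[-1]]
    return [max(padded[i], padded[i + 1]) for i in range(len(x))]

def chained_residual_pooling(x, n_blocks=2):
    """Chain pooling blocks; add each block's output back onto the running sum."""
    out, h = x[:], x[:]
    for _ in range(n_blocks):
        h = pool_same(h)
        out = [a + b for a, b in zip(out, h)]
    return out

print(chained_residual_pooling([1, 3, 2, 0], n_blocks=2))  # -> [7, 9, 6, 0]
```

The residual additions mean gradients reach every pooling stage directly, which is why the chain stays trainable end to end.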
9. Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation.
- Author
Zhang, Tianyi, Lin, Guosheng, Cai, Jianfei, Shen, Tong, Shen, Chunhua, and Kot, Alex C.
- Abstract
Weakly supervised semantic segmentation receives much research attention since it alleviates the need to obtain a large amount of dense pixel-wise ground-truth annotations for the training images. Compared with other forms of weak supervision, image labels are quite efficient to obtain. In this paper, we focus on the weakly supervised semantic segmentation with image label annotations. Recent progress for this task has been largely dependent on the quality of generated pseudo-annotations. In this paper, inspired by spatial neural-attention for image captioning, we propose a decoupled spatial neural attention network for generating pseudo-annotations. Our decoupled attention structure could simultaneously identify the object regions and localize the discriminative parts, which generates high-quality pseudo-annotations in one forward path. The generated pseudo-annotations lead to the segmentation results that achieve the state of the art in weakly supervised semantic segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2019
10. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition.
- Author
Wu, Zifeng, Shen, Chunhua, and van den Hengel, Anton
- Subjects
ARTIFICIAL neural networks, IMAGE segmentation, DEEP learning, DIGITAL image processing, MACHINE learning
- Abstract
Highlights
• We further develop the unravelled view of ResNets, which helps us better understand their behaviours. We demonstrate this in the context of a training process, which is the key difference from the original version.
• We propose a group of relatively shallow convolutional networks based on our new understanding. Some of them perform comparably with the state-of-the-art approaches on the ImageNet classification dataset.
• We evaluate the impact of using different networks on the performance of semantic image segmentation, and show that these networks, as pre-trained features, can boost existing algorithms a lot.
Abstract
The community has been going deeper and deeper in designing one cutting-edge network after another, yet some works suggest that we may have gone too far in this dimension. Some researchers unravelled a residual network into an exponentially wider one, and ascribed the success of residual networks to fusing a large number of relatively shallow models. Since some of their early claims are still not settled, in this paper we dig further into this topic, i.e., the unravelled view of residual networks. Based on that, we try to find a good compromise between depth and width. Afterwards, we walk through a typical pipeline for developing a deep-learning-based algorithm. We start from a group of relatively shallow networks, which perform as well as or even better than the current (much deeper) state-of-the-art models on the ImageNet classification dataset. Then, we initialize fully convolutional networks (FCNs) using our pre-trained models, and tune them for semantic image segmentation. Results show that the proposed networks, as pre-trained features, can boost existing methods a lot. Even without exhausting the sophisticated techniques to improve the classic FCN model, we achieve comparable results with the best performers on four widely-used datasets, i.e., Cityscapes, PASCAL VOC, ADE20k and PASCAL-Context.
The code and pre-trained models are released for public access at https://github.com/itijyou/ademxapp. [ABSTRACT FROM AUTHOR]
- Published
- 2019
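The unravelled view treats an n-block residual network as a sum of 2^n paths, most of them short. A tiny count of paths by length (binomial coefficients) makes concrete the combinatorial fact behind the wide-versus-deep discussion:

```python
from math import comb

def path_length_counts(n_blocks):
    """Of the 2^n paths in the unravelled view, how many traverse exactly k residual branches."""
    return [comb(n_blocks, k) for k in range(n_blocks + 1)]

# A 4-block residual network unravels into 16 paths, peaked at medium lengths:
print(path_length_counts(4))       # -> [1, 4, 6, 4, 1]
print(sum(path_length_counts(4)))  # -> 16
```

Most paths have length near n/2 rather than n, which is the basis for ascribing ResNet behaviour to an ensemble of relatively shallow models.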
11. Crowd Counting via Weighted VLAD on a Dense Attribute Feature Map.
- Author
Sheng, Biyun, Shen, Chunhua, Lin, Guosheng, Li, Jun, Yang, Wankou, and Sun, Changyin
- Subjects
FEATURE extraction, DATA mapping, DESCRIPTOR systems, COMPUTER vision, ARTIFICIAL neural networks
- Abstract
Crowd counting is an important task in computer vision, which has many applications in video surveillance. Although the regression-based framework has achieved great improvements for crowd counting, how to improve the discriminative power of image representation is still an open problem. Conventional holistic features used in crowd counting often fail to capture semantic attributes and spatial cues of the image. In this paper, we propose integrating semantic information into learning locality-aware feature (LAF) sets for accurate crowd counting. First, with the help of a convolutional neural network, the original pixel space is mapped onto a dense attribute feature map, where each dimension of the pixelwise feature indicates the probabilistic strength of a certain semantic class. Then, LAF built on the idea of spatial pyramids on neighboring patches is proposed to explore more spatial context and local information. Finally, the traditional vector of locally aggregated descriptor (VLAD) encoding method is extended to a more generalized form weighted-VLAD (W-VLAD) in which diverse coefficient weights are taken into consideration. Experimental results validate the effectiveness of our presented method. [ABSTRACT FROM AUTHOR]
- Published
- 2018
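The W-VLAD idea is to weight each descriptor's residual to its nearest codeword instead of accumulating residuals uniformly. A compact sketch with made-up weights and a two-word codebook (the usual VLAD normalization steps are omitted):

```python
def weighted_vlad(descriptors, weights, codebook):
    """Accumulate each descriptor's weighted residual to its nearest codeword."""
    k, d = len(codebook), len(codebook[0])
    enc = [[0.0] * d for _ in range(k)]
    for x, w in zip(descriptors, weights):
        # nearest codeword by squared Euclidean distance
        j = min(range(k), key=lambda c: sum((xi - ci) ** 2
                                            for xi, ci in zip(x, codebook[c])))
        for t in range(d):
            enc[j][t] += w * (x[t] - codebook[j][t])  # weighted residual
    return enc

codebook = [[0.0, 0.0], [1.0, 1.0]]
desc = [[0.2, 0.0], [0.9, 1.1]]
enc = weighted_vlad(desc, [1.0, 0.5], codebook)
print([[round(v, 2) for v in row] for row in enc])  # -> [[0.2, 0.0], [-0.05, 0.05]]
```

Setting all weights to 1 recovers plain VLAD; the paper derives the weights from the dense attribute feature map rather than fixing them as we do here.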
12. Exploring Context with Deep Structured Models for Semantic Segmentation.
- Author
Lin, Guosheng, Shen, Chunhua, van den Hengel, Anton, and Reid, Ian
- Subjects
SEMANTICS, CONTEXTUAL analysis, BACK propagation, DEEP structure (Linguistics), IMAGE segmentation
- Abstract
We propose an approach for exploiting contextual information in semantic image segmentation, and particularly investigate the use of patch-patch context and patch-background context in deep CNNs. We formulate deep structured models by combining CNNs and Conditional Random Fields (CRFs) for learning the patch-patch context between image regions. Specifically, we formulate CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed deep structured model is then applied in order to avoid repeated expensive CRF inference during the course of back propagation. For capturing the patch-background context, we show that a network design with traditional multi-scale image inputs and sliding pyramid pooling is very effective for improving performance. We perform comprehensive evaluation of the proposed method. We achieve new state-of-the-art performance on a number of challenging semantic segmentation datasets. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
13. Compositional Model Based Fisher Vector Coding for Image Classification.
- Author
Liu, Lingqiao, Wang, Peng, Shen, Chunhua, Wang, Lei, Hengel, Anton van den, Wang, Chao, and Shen, Heng Tao
- Subjects
CLASSIFICATION, INFORMATION organization, COMPUTER graphics, DIGITAL image processing, IMAGE segmentation
- Abstract
Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) as the generative model for local features. However, the representative power of a GMM can be limited because it essentially assumes that local features can be characterized by a fixed number of feature prototypes, and the number of prototypes is usually small in FVC. To alleviate this limitation, in this work, we break the convention which assumes that a local feature is drawn from one of a few Gaussian distributions. Instead, we adopt a compositional mechanism which assumes that a local feature is drawn from a Gaussian distribution whose mean vector is composed as a linear combination of multiple key components, and the combination weight is a latent random variable. In doing so we greatly enhance the representative power of the generative model underlying FVC. To implement our idea, we design two particular generative models following this compositional approach. In our first model, the mean vector is sampled from the subspace spanned by a set of bases and the combination weight is drawn from a Laplace distribution. In our second model, we further assume that a local feature is composed of a discriminative part and a residual part. As a result, a local feature is generated by the linear combination of discriminative part bases and residual part bases. The decomposition of the discriminative and residual parts is achieved via the guidance of a pre-trained supervised coding method. By calculating the gradient vector of the proposed models, we derive two new Fisher vector coding strategies. The first is termed Sparse Coding-based Fisher Vector Coding (SCFVC) and can be used as the substitute of traditional GMM based FVC. The second is termed Hybrid Sparse Coding-based Fisher vector coding (HSCFVC) since it combines the merits of both pre-trained supervised coding methods and FVC. Using pre-trained Convolutional Neural Network (CNN) activations as local features, we experimentally demonstrate that the proposed methods are superior to traditional GMM based FVC and achieve state-of-the-art performance in various image classification tasks. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
14. Large-Scale Binary Quadratic Optimization Using Semidefinite Relaxation and Applications.
- Author
Wang, Peng, Shen, Chunhua, Hengel, Anton Van Den, and Torr, Philip H. S.
- Subjects
MARKOV random fields, MATHEMATICAL optimization, QUASI-Newton methods, BINARY operations, QUADRATIC programming
- Abstract
In computer vision, many problems can be formulated as binary quadratic programs (BQPs), which are in general NP hard. Finding a solution when the problem is large enough to be of practical interest typically requires relaxation. Semidefinite relaxation usually yields tight bounds, but its computational complexity is high. In this work, we present a semidefinite programming (SDP) formulation for BQPs, with two desirable properties. First, it produces similar bounds to the standard SDP formulation. Second, compared with the conventional SDP formulation, the proposed SDP formulation leads to a considerably more efficient and scalable dual optimization approach. We then propose two solvers, namely, quasi-Newton and smoothing Newton methods, for the simplified dual problem. Both of them are significantly more efficient than standard interior-point methods. Empirically the smoothing Newton solver is faster than the quasi-Newton solver for dense or medium-sized problems, while the quasi-Newton solver is preferable for large sparse/structured problems. [ABSTRACT FROM AUTHOR]
- Published
- 2017
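A BQP of the kind discussed here maximizes x^T A x over x in {-1, +1}^n. The brute-force reference solver below makes the exponential cost concrete, which is exactly why practically sized instances need semidefinite relaxation (the tiny matrix is our own example):

```python
from itertools import product

def solve_bqp_bruteforce(A):
    """Enumerate all 2^n sign vectors; feasible only for toy n."""
    n = len(A)
    best, best_x = float("-inf"), None
    for x in product((-1, 1), repeat=n):
        val = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        if val > best:
            best, best_x = val, x
    return best, best_x

A = [[0, 1], [1, 0]]            # rewards x0 and x1 taking the same sign
print(solve_bqp_bruteforce(A))  # -> (2, (-1, -1))
```

At n = 2 there are 4 candidates; at the sizes arising in MRF inference there are 2^n, so the SDP bound plus an efficient dual solver replaces enumeration.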
15. Structured Learning of Tree Potentials in CRF for Image Segmentation.
- Author
Liu, Fayao, Lin, Guosheng, Qiao, Ruizhi, and Shen, Chunhua
- Subjects
IMAGE segmentation, ARTIFICIAL neural networks, SUPPORT vector machines
- Abstract
We propose a new approach to image segmentation, which exploits the advantages of both conditional random fields (CRFs) and decision trees. In the literature, the potential functions of CRFs are mostly defined as a linear combination of some predefined parametric models, and then, methods, such as structured support vector machines, are applied to learn those linear coefficients. We instead formulate the unary and pairwise potentials as nonparametric forests—ensembles of decision trees, and learn the ensemble parameters and the trees in a unified optimization problem within the large-margin framework. In this fashion, we easily achieve nonlinear learning of potential functions on both unary and pairwise terms in CRFs. Moreover, we learn classwise decision trees for each object that appears in the image. Experimental results on several public segmentation data sets demonstrate the power of the learned nonlinear nonparametric potentials. [ABSTRACT FROM AUTHOR]
- Published
- 2018
16. StructBoost: Boosting Methods for Predicting Structured Output Variables.
- Author
Shen, Chunhua, Lin, Guosheng, and Hengel, Anton van den
- Subjects
BOOSTING algorithms, COMPUTER vision, SUPPORT vector machines, IMAGE segmentation, ROBUST control
- Abstract
Boosting is a method for learning a single accurate predictor by linearly combining a set of less accurate weak learners. Recently, structured learning has found many applications in computer vision. Inspired by structured support vector machines (SSVM), here we propose a new boosting algorithm for structured output prediction, which we refer to as StructBoost. StructBoost supports nonlinear structured learning by combining a set of weak structured learners. As SSVM generalizes SVM, our StructBoost generalizes standard boosting approaches such as AdaBoost or LPBoost to structured learning. The resulting optimization problem of StructBoost is more challenging than SSVM in the sense that it may involve exponentially many variables and constraints. In contrast, for SSVM one usually has an exponential number of constraints and a cutting-plane method is used. In order to efficiently solve StructBoost, we formulate an equivalent 1-slack formulation and solve it using a combination of cutting planes and column generation. We show the versatility and usefulness of StructBoost on a range of problems such as optimizing the tree loss for hierarchical multi-class classification, optimizing the Pascal overlap criterion for robust visual tracking and learning conditional random field parameters for image segmentation. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
17. CRF learning with CNN features for image segmentation.
- Author
Liu, Fayao, Lin, Guosheng, and Shen, Chunhua
- Subjects
MACHINE learning, FEATURE extraction, IMAGE segmentation, IMAGE analysis, ARTIFICIAL neural networks, PIXELS
- Abstract
Conditional Random Fields (CRFs) have been widely applied in image segmentation. While most studies rely on hand-crafted features, we here propose to exploit a pre-trained large convolutional neural network (CNN) to generate deep features for CRF learning. The deep CNN is trained on the ImageNet dataset and transferred here to image segmentation for constructing potentials of superpixels. Then the CRF parameters are learnt using a structured support vector machine (SSVM). To fully exploit context information in inference, we construct spatially related co-occurrence pairwise potentials and incorporate them into the energy function. This prefers labelling of object pairs that frequently co-occur in a certain spatial layout and at the same time avoids implausible labellings during the inference. Extensive experiments on binary and multi-class segmentation benchmarks demonstrate the promise of the proposed method. We thus provide new baselines for the segmentation performance on the Weizmann horse, Graz-02, MSRC-21, Stanford Background and PASCAL VOC 2011 datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2015
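A toy version of a CRF energy with co-occurrence pairwise potentials: label pairs that frequently co-occur are penalized less on connected regions, so plausible layouts score a lower energy. All numbers, label names, and the discounting form below are illustrative, not from the paper:

```python
def crf_energy(labels, unary, edges, cooccur):
    """Energy = per-node unary costs + pairwise costs discounted by co-occurrence."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    for i, j in edges:
        e += 1.0 - cooccur.get((labels[i], labels[j]), 0.0)
    return e

unary = [{"horse": 0.25, "grass": 1.0},   # superpixel 0 prefers "horse"
         {"horse": 1.0, "grass": 0.25}]   # superpixel 1 prefers "grass"
cooccur = {("horse", "grass"): 0.5, ("grass", "horse"): 0.5}
edges = [(0, 1)]
print(crf_energy(["horse", "grass"], unary, edges, cooccur))  # -> 1.0
print(crf_energy(["horse", "horse"], unary, edges, cooccur))  # -> 2.25
```

The frequently co-occurring ("horse", "grass") pair beats the implausible all-"horse" labelling, which is the behaviour the co-occurrence potentials encode at inference time.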
18. Arbitrarily shaped scene text detection with dynamic convolution.
- Author
Cai, Ying, Liu, Yuliang, Shen, Chunhua, Jin, Lianwen, Li, Yidong, and Ergu, Daji
- Subjects
IMAGE segmentation
- Abstract
Highlights
• According to the detailed characteristics of the text instance, we dynamically generate the convolutional kernels from multiple features for different instances. Specific attributes such as position, scale, and center are embedded into the convolutional kernel, so that the mask prediction task using the text-instance-aware kernel focuses on the pixels that belong to that instance. This design helps improve the detection accuracy of adjacent text instances.
• We generate a separate mask prediction head for each instance in parallel. These heads predict masks on the original feature map and retain the resolution details of the text instance. It is no longer necessary to crop the RoIs and force them to the same size. Our architecture overcomes the problem that a set of fixed convolution kernels cannot adapt to all resolutions, while preventing the loss of information caused by the multiple scales of the instances.
• Because the text-instance-aware convolutional kernel increases the capacity of the model, we can achieve competitive results with a very compact prediction head. Therefore, multiple mask prediction heads can be run concurrently without significant computational overhead.
• To improve performance and accelerate the convergence of training, we design a text-shape sensitive position embedding that explicitly provides location information to the mask prediction head.
Abstract
Arbitrarily shaped scene text detection has witnessed great development in recent years, and text detection using segmentation has been proven to be an effective approach. However, problems caused by the diverse attributes of text instances, such as shapes, scales, and presentation styles (dense or sparse), persist. In this paper, we propose a novel text detector, termed DText, which can effectively formulate an arbitrarily shaped scene text detection task based on dynamic convolution. Our method can dynamically generate independent text-instance-aware convolutional parameters for each text instance from multiple features, thus overcoming some intractable limitations of arbitrary text detection, such as the splitting of similar adjacent text, which poses challenges to methods based on fixed, instance-shared convolutional parameters. Unlike standard segmentation methods relying on regions-of-interest bounding boxes, DText focuses on enhancing the flexibility of the network to retain details of instances from diverse resolutions while effectively improving prediction accuracy. Moreover, we propose encoding the shape and position information according to the characteristics of the text instance, termed text-shape sensitive position embedding. Thus, it can provide explicit shape and position information to the generator of the dynamic convolution parameters. Experiments on five benchmarks (Total-Text, SCUT-CTW1500, MSRA-TD500, ICDAR2015, and MLT) showed that our method achieves superior detection performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
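Per-instance dynamic convolution, as described above, generates a separate kernel for each text instance and applies it over a shared feature map. A toy sketch with 1x1 kernels on flattened features (function name, kernels, and data are ours, not DText's):

```python
def dynamic_mask_head(features, kernels, bias=0.0):
    """features: per-pixel C-dim vectors; kernels: one C-dim weight vector per instance.
    Each instance's kernel is applied to every pixel, then thresholded into a mask."""
    masks = []
    for w in kernels:
        logits = [sum(wc * fc for wc, fc in zip(w, f)) + bias for f in features]
        masks.append([1 if z > 0 else 0 for z in logits])
    return masks

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 pixels, 2 channels
kernels  = [[1.0, -1.0], [-1.0, 1.0]]            # 2 instances, instance-specific kernels
print(dynamic_mask_head(features, kernels))  # -> [[1, 0, 0], [0, 1, 0]]
```

Because each instance carries its own kernel, two adjacent instances can claim different pixels of the same shared feature map — the property a single fixed, instance-shared kernel cannot provide.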
19. Reading car license plates using deep neural networks.
- Author
Li, Hui, Wang, Peng, You, Mingyu, and Shen, Chunhua
- Subjects
AUTOMOBILE license plates, ARTIFICIAL neural networks, FEATURE extraction, IMAGE segmentation, BINARY number system
- Abstract
In this work, we tackle the problem of car license plate detection and recognition in natural scene images based on the powerful deep neural networks (DNNs). Firstly, a 37-class convolutional neural network (CNN) is trained to detect characters in an image, which leads to a high recall compared with a binary text/non-text classifier. False positives are then eliminated effectively by a plate/non-plate CNN classifier. As to the license plate recognition, we regard the character string reading as a sequence labeling problem. Recurrent neural networks (RNNs) with long short-term memory (LSTM) are trained to recognize the sequential features extracted from the whole license plate via CNNs. The main advantage of this approach is that it is segmentation free. By exploring contextual information and avoiding errors caused by segmentation, this method performs better than conventional methods and achieves state-of-the-art recognition accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2018