126 results for "image parsing"
Search Results
2. Context-Based Deep Learning Architecture with Optimal Integration Layer for Image Parsing
- Author
-
Mandal, Ranju, Azam, Basim, and Verma, Brijesh (in: Mantoro, Teddy, Lee, Minho, Ayu, Media Anugerah, Wong, Kok Wai, and Hidayanto, Achmad Nizar, eds.)
- Published
- 2021
3. Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer.
- Author
-
Lin, Liang, Gao, Yiming, Gong, Ke, Wang, Meng, and Liang, Xiaodan
- Subjects
- REPRESENTATIONS of graphs; KNOWLEDGE transfer; CHARTS, diagrams, etc.; IMAGE segmentation; GLOBAL method of teaching; FEATURE extraction
- Abstract
Prior highly-tuned image parsing models are usually studied in a certain domain with a specific set of semantic labels and can hardly be adapted into other scenarios (e.g., sharing discrepant label granularity) without extensive re-training. Learning a single universal parsing model by unifying label annotations from different domains or at various levels of granularity is a crucial but rarely addressed topic. This poses many fundamental learning challenges, e.g., discovering underlying semantic structures among different label granularity or mining label correlation across relevant tasks. To address these challenges, we propose a graph reasoning and transfer learning framework, named "Graphonomy," which incorporates human knowledge and label taxonomy into the intermediate graph representation learning beyond local convolutions. In particular, Graphonomy learns the global and structured semantic coherency in multiple domains via semantic-aware graph reasoning and transfer, enforcing the mutual benefits of the parsing across domains (e.g., different datasets or co-related tasks). Graphonomy includes two iterated modules: Intra-Graph Reasoning and Inter-Graph Transfer. The former extracts the semantic graph in each domain to improve the feature representation learning by propagating information with the graph; the latter exploits the dependencies among the graphs from different domains for bidirectional knowledge transfer. We apply Graphonomy to two relevant but different image understanding research topics: human parsing and panoptic segmentation, and show Graphonomy can handle both of them well via a standard pipeline against current state-of-the-art approaches. Moreover, some extra benefits of our framework are demonstrated, e.g., generating the human parsing at various levels of granularity by unifying annotations across different datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2022
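The intra-graph reasoning module described in entry 3 projects pixel features onto semantic-label nodes, propagates information along the label graph, and re-projects the result back onto the features. Below is a minimal NumPy sketch of that project-propagate-reproject pattern only; all names, shapes, and the single propagation step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize an adjacency matrix with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def intra_graph_reasoning(features, adj, w_node, w_graph):
    """One round of semantic graph reasoning over pixel features.

    features: (num_pixels, channels) feature map, flattened spatially.
    adj:      (num_labels, num_labels) label-graph adjacency (e.g., taxonomy links).
    w_node:   (channels, num_labels) projection from features to label nodes.
    w_graph:  (channels, channels) transform applied during propagation.
    """
    # Soft-assign each pixel to semantic-label nodes.
    assign = np.exp(features @ w_node)
    assign /= assign.sum(axis=1, keepdims=True)          # (P, L)
    nodes = assign.T @ features                          # (L, C) node features
    # Propagate node features along the label graph (one GCN-style step).
    nodes = np.tanh(normalize_adjacency(adj) @ nodes @ w_graph)
    # Re-project node features back onto pixels and fuse residually.
    return features + assign @ nodes

rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 16))                    # 64 pixels, 16 channels
adj = (rng.random((5, 5)) > 0.6).astype(float)           # toy 5-label graph
adj = np.maximum(adj, adj.T)
out = intra_graph_reasoning(feats, adj,
                            rng.standard_normal((16, 5)) * 0.1,
                            rng.standard_normal((16, 16)) * 0.1)
print(out.shape)  # (64, 16)
```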
4. Relationship aware context adaptive deep learning for image parsing.
- Author
-
Azam, Basim, Mandal, Ranju, and Verma, Brijesh
- Subjects
- DEEP learning; FEATURE selection; AUTHORSHIP
- Abstract
The design of deep learning architectures is challenging in several respects; among the major and fundamental steps in developing an effective image parsing network is feature selection. The exploration of context information in such frameworks is also of prime importance. In this research, a novel architecture that utilizes a distinctive feature selection algorithm and context-adaptive information is proposed. The feature selection algorithm explores relationship-aware information to minimize the similarity among features and select a rich and optimal set of feature representations. The efficacy of the proposed framework is analyzed using several benchmark datasets, including Stanford Background, CamVid and MSRC v2, on which the framework achieves 93.8%, 91.8% and 96.1% global pixel segmentation accuracy, respectively. Furthermore, we present a comprehensive comparative analysis with state-of-the-art techniques in the literature. The analysis reveals meaningful refinements in terms of segmentation accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2022
5. An Extensive Review on Verbal-Guided Image Parsing
- Author
-
Agrawal, Pankhuri, Choudhury, Tanupriya, Kumar, Praveen, and Raj, Gaurav (in: Satapathy, Suresh Chandra, Bhateja, Vikrant, and Das, Swagatam, eds.)
- Published
- 2018
6. Task Driven Generative Modeling for Unsupervised Domain Adaptation: Application to X-ray Image Segmentation
- Author
-
Zhang, Yue, Miao, Shun, Mansi, Tommaso, and Liao, Rui (in: Frangi, Alejandro F., Schnabel, Julia A., Davatzikos, Christos, Alberola-López, Carlos, and Fichtinger, Gabor, eds.)
- Published
- 2018
7. Improving Semantic Segmentation with Generalized Models of Local Context
- Author
-
Ates, Hasan F. and Sunetci, Sercan (in: Felsberg, Michael, Heyden, Anders, and Krüger, Norbert, eds.)
- Published
- 2017
8. Kernel Likelihood Estimation for Superpixel Image Parsing
- Author
-
Ates, Hasan F., Sunetci, Sercan, and Ak, Kenan E. (in: Campilho, Aurélio and Karray, Fakhri, eds.)
- Published
- 2016
9. Deep Learning Architecture with Context Adaptive Features for Image Parsing
- Author
-
Azam, Basim
- Abstract
Image parsing is fundamental to the field of computer vision: it aids the process of recognising objects and pixels in images. Image parsing refers to assigning the pixels in an image to pre-defined object category labels. Traditional machine learning techniques have proven inadequate for producing accurate pixel labels and are prone to misclassifications. The field of image parsing has therefore gained popularity among researchers and experts in machine learning, and numerous techniques have been proposed. These image parsing techniques, however, have complex architectures and largely lack consideration of contextual information. An enormous body of literature recognises the importance of image segmentation, and deep learning-based architectures are at the forefront of this developing body of models that perceive images as object labels. These deep architectures are mostly based on fully convolutional neural networks and hold the potential to transform pixel-wise segmentation; however, they should be developed with additional context and relevant features in mind. The main aim of this thesis is therefore to present novel deep learning-based image parsing architectures that consider contextual information, and to present component-wise investigations to select the best classifiers, context generations, visual features, and sets of hyperparameters used to produce pixel labels. [...]
- Note
- Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Info & Comm Tech, Science, Environment, Engineering and Technology
- Published
- 2023
10. Image Parsing: Unifying Segmentation, Detection, and Recognition
- Author
-
Tu, Zhuowen, Chen, Xiangrong, Yuille, Alan L., and Zhu, Song-Chun
- Subjects
- Image parsing; image segmentation; object detection; object recognition; data-driven Markov chain Monte Carlo; AdaBoost
- Abstract
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a "parsing graph", in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities, which are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition (if we use only generic visual patterns, image parsing corresponds to image segmentation [47]). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.
- Published
- 2005
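The core computational idea of entry 10, discriminative bottom-up probabilities serving as proposal distributions for a Markov chain that samples a generative posterior, can be illustrated on a toy 1-D labeling problem. The sketch below is a plain Metropolis-Hastings version with data-driven proposals; the paper's actual moves operate on a parse graph and include reversible jumps between model structures, none of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy observation: a 1-D "image" of noisy intensities from two classes (means 0 and 1).
true_labels = (np.arange(40) >= 20).astype(int)
obs = true_labels + 0.4 * rng.standard_normal(40)

def log_posterior(labels):
    """Generative score: Gaussian likelihood per pixel plus a Potts smoothness prior."""
    log_lik = -0.5 * np.sum((obs - labels.astype(float)) ** 2) / 0.4**2
    log_prior = -1.5 * np.sum(labels[1:] != labels[:-1])
    return log_lik + log_prior

# Bottom-up discriminative probabilities (here: per-pixel class posteriors from
# the likelihood alone) used as the data-driven proposal q(label | data).
q = np.stack([np.exp(-0.5 * (obs - m) ** 2 / 0.4**2) for m in (0.0, 1.0)], axis=1)
q /= q.sum(axis=1, keepdims=True)

labels = rng.integers(0, 2, size=40)
for _ in range(5000):
    i = rng.integers(40)
    prop = labels.copy()
    prop[i] = rng.choice(2, p=q[i])                  # data-driven proposal
    if prop[i] == labels[i]:
        continue
    # The Metropolis-Hastings ratio includes the proposal probabilities,
    # which makes the data-driven move a valid reversible kernel.
    log_a = (log_posterior(prop) - log_posterior(labels)
             + np.log(q[i, labels[i]]) - np.log(q[i, prop[i]]))
    if np.log(rng.random()) < log_a:
        labels = prop

print((labels == true_labels).mean())  # typically close to 1.0
```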
11. IC-Cut: A Compositional Search Strategy for Dynamic Test Generation
- Author
-
Christakis, Maria and Godefroid, Patrice (in: Fischer, Bernd and Geldenhuys, Jaco, eds.)
- Published
- 2015
12. CollageParsing: Nonparametric Scene Parsing by Adaptive Overlapping Windows
- Author
-
Tung, Frederick and Little, James J. (in: Fleet, David, Pajdla, Tomas, Schiele, Bernt, and Tuytelaars, Tinne, eds.)
- Published
- 2014
13. Organ Localization Using Joint AP/LAT View Landmark Consensus Detection and Hierarchical Active Appearance Models
- Author
-
Song, Qi, Montillo, Albert, Bhagalia, Roshni, and Srikrishnan, V. (in: Menze, Bjoern, Langs, Georg, Montillo, Albert, Kelm, Michael, Müller, Henning, and Tu, Zhuowen, eds.)
- Published
- 2014
14. Multi-hypothesis contextual modeling for semantic segmentation.
- Author
-
Ates, Hasan F. and Sunetci, Sercan
- Subjects
- SEMANTICS; IMAGE segmentation; MARKOV random fields; ACCURACY; HYPOTHESIS
- Abstract
Highlights: We explore contextual models for fusion of alternative segmentations of the image; contextual constraints are defined on intersecting superpixels from multiple segmentations; this multi-hypothesis MRF model improves the labeling accuracy of the tested methods; when used after FCN and PSP segmentations, the model achieves state-of-the-art results.
Semantic segmentation (i.e., image parsing) aims to annotate each image pixel with its corresponding semantic class label. Spatially consistent labeling of the image requires an accurate description and modeling of the local contextual information. The segmentation result is typically improved by Markov Random Field (MRF) optimization on the initial labels. However, this improvement is limited by the accuracy of the initial result and by how the contextual neighborhood is defined. In this paper, we develop generalized and flexible contextual models for segmentation neighborhoods in order to improve parsing accuracy. Instead of using a fixed segmentation and neighborhood definition, we explore various contextual models for fusion of the complementary information available in alternative segmentations of the same image. In other words, we propose a novel MRF framework that describes and optimizes the contextual dependencies between multiple segmentations. Simulation results on two common datasets demonstrate significant improvement in parsing accuracy over the baseline approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2019
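A rough sketch of the central idea in entry 14: an MRF whose pairwise terms couple intersecting superpixels from two alternative segmentations of the same image, optimized here with simple iterated conditional modes. The energy form, the overlap weighting, and all names are assumptions for illustration; the paper's actual model and optimizer differ in detail.

```python
import numpy as np

def fuse_segmentations(unary_a, unary_b, overlap, smooth=0.5, iters=10):
    """Jointly relabel two alternative superpixel segmentations of one image.

    unary_a: (Na, L) per-superpixel class scores from segmentation A.
    unary_b: (Nb, L) class scores from segmentation B.
    overlap: (Na, Nb) pixel-overlap area between superpixels of A and B.
    ICM on an MRF whose pairwise terms reward label agreement on
    intersecting superpixels, weighted by their overlap area.
    """
    lab_a = unary_a.argmax(axis=1)
    lab_b = unary_b.argmax(axis=1)
    num_labels = unary_a.shape[1]
    for _ in range(iters):
        for i in range(len(lab_a)):            # update A given B
            agree = overlap[i] @ (lab_b[:, None] == np.arange(num_labels)).astype(float)
            lab_a[i] = (unary_a[i] + smooth * agree).argmax()
        for j in range(len(lab_b)):            # update B given A
            agree = overlap[:, j] @ (lab_a[:, None] == np.arange(num_labels)).astype(float)
            lab_b[j] = (unary_b[j] + smooth * agree).argmax()
    return lab_a, lab_b

rng = np.random.default_rng(2)
la, lb = fuse_segmentations(rng.random((30, 4)), rng.random((50, 4)),
                            rng.random((30, 50)))
print(la.shape, lb.shape)  # (30,) (50,)
```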
15. Improving Accuracy for Image Parsing Using Spatial Context and Mutual Information
- Author
-
Vu, Thi Ly, Choi, Sun-Wook, and Lee, Chong Ho (in: Lee, Minho, Hirose, Akira, Hou, Zeng-Guang, and Kil, Rhee Man, eds.)
- Published
- 2013
16. Occluded and Low Resolution Face Detection with Hierarchical Deformable Model
- Author
-
Yang, Xiong, Peng, Gang, Cai, Zhaoquan, Zeng, Kehan, Yeo, Sang-Soo, editor, Pan, Yi, editor, Lee, Yang Sun, editor, and Chang, Hang Bae, editor
- Published
- 2012
17. Static Parsing Steganography
- Author
-
Farhat, Hikmat, Challita, Khalil, Zalaket, Joseph, Cherifi, Hocine, editor, Zain, Jasni Mohamad, editor, and El-Qawasmeh, Eyas, editor
- Published
- 2011
18. Learning deep representations for semantic image parsing: a comprehensive overview.
- Author
-
Huang, Lili, Peng, Jiefeng, Zhang, Ruimao, Li, Guanbin, and Lin, Liang
- Abstract
Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision. The recent application of deep representation learning has driven this field into a new stage of development. In this paper, we summarize three aspects of the progress of research on semantic image parsing, i.e., category-level semantic segmentation, instance-level semantic segmentation, and beyond segmentation. Specifically, we first review the general frameworks for each task and introduce the relevant variants. The advantages and limitations of each method are also discussed. Moreover, we present a comprehensive comparison of different benchmark datasets and evaluation metrics. Finally, we explore the future trends and challenges of semantic image parsing. [ABSTRACT FROM AUTHOR]
- Published
- 2018
19. Learning Affinity to Parse Images
- Author
-
Liu, Sifei
- Subjects
- Computer science; Engineering; affinity learning; computer vision; image parsing
- Abstract
Recent years have witnessed the success of deep learning models such as convolutional neural networks (ConvNets) for numerous vision tasks. However, ConvNets have a significant limitation: they do not have effective internal structures to explicitly learn image pairwise relations. This yields two fundamental bottlenecks for many vision problems of label and map regression, as well as image reconstruction: (a) the pixels of an image have a large amount of redundancy but cannot be efficiently utilized by ConvNets, which predict each of them independently, and (b) the convolutional operation cannot effectively solve problems that rely on similarities of pixel pairs, e.g., image pixel propagation and shape/mask refinement. This thesis focuses on how to learn pairwise relations of image pixels within jointly, end-to-end learnable neural networks. This is achieved by two different approaches: (a) formulating the conditional random field (CRF) objective as a non-structured objective that can be implemented via ConvNets as an additional loss, and (b) developing spatial-propagation-based, deep-learning-friendly structures that learn the pairwise relations in an explicit manner.

In the first approach, we develop a novel multi-objective learning method that optimizes a single unified deep convolutional network with two distinct non-structured loss functions: one encoding the unary label likelihoods and the other encoding the pairwise label dependencies. We apply this framework to face parsing; experiments on both the LFW and Helen datasets demonstrate that the additional pairwise loss significantly improves the labeling performance compared to a single-loss ConvNet with the same architecture.

In the second approach, we explore how to learn pairwise relations using spatial propagation networks instead of additional loss functions. Unlike ConvNets, the propagation module is a spatially recurrent network with a linear transformation between adjacent rows and columns. We propose two typical structures: a one-way connection using one-dimensional propagation, and a three-way connection using two-dimensional propagation. For both models, the linear weights are spatially variant output maps that can be learned from any ConvNet. Since such modules are fully differentiable, they are flexible enough to be inserted into any type of neural network. We prove that while both structures can formulate global affinities, the one-way connection constructs a sparse matrix while the three-way connection forms a much denser one. While both structures demonstrate their effectiveness over a wide range of vision problems, the three-way connection is more powerful on challenging tasks (e.g., general object segmentation). We show that a well-learned affinity can benefit numerous computer vision applications, including but not limited to image filtering and denoising, pixel/color interpolation, face parsing, and general semantic segmentation. Compared to graphical-model-based pairwise learning, the spatial propagation network can be a good alternative in deep-learning-based frameworks.
- Published
- 2017
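The one-way connection described in entry 19 is a linear, spatially variant recurrence across rows or columns. A minimal NumPy sketch of a single left-to-right pass follows; the full model runs four directions, learns the gates with a ConvNet, and also offers the denser three-way connection, none of which is shown here.

```python
import numpy as np

def one_way_propagation(x, gates):
    """Left-to-right linear propagation with spatially variant gates.

    x:     (H, W) input map, e.g., a coarse label score map to be refined.
    gates: (H, W) per-pixel weights in [0, 1], in practice predicted by a
           ConvNet; large gates let information flow in from the left.
    Row-wise recurrence: h[i, j] = (1 - g) * x[i, j] + g * h[i, j - 1].
    """
    h = x.copy()
    for j in range(1, x.shape[1]):
        g = gates[:, j]
        h[:, j] = (1.0 - g) * x[:, j] + g * h[:, j - 1]
    return h

rng = np.random.default_rng(3)
x = rng.standard_normal((8, 12))
gates = 1.0 / (1.0 + np.exp(-rng.standard_normal((8, 12))))   # sigmoid into [0, 1]
refined = one_way_propagation(x, gates)
print(refined.shape)  # (8, 12)
```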
20. SCUT-AutoALP: A Diverse Benchmark Dataset for Automatic Architectural Layout Parsing
- Author
-
Jianyong Chen, Qiaoming Deng, Lingyu Liang, Yubo Liu, and Yangting Lai
- Subjects
- Parsing; Computer science; Floor plan; Artificial Intelligence; Hardware and Architecture; Image parsing; Benchmark (computing); Computer Vision and Pattern Recognition; Electrical and Electronic Engineering; Software; Natural language processing
- Published
- 2020
21. A discriminative graph inferring framework towards weakly supervised image parsing.
- Author
-
Yu, Lei, Bao, Bing-Kun, and Xu, Changsheng
- Subjects
- GRAPH algorithms; PARSING (Computer grammar); DIGITAL images; BIG data; STATISTICAL correlation; MATHEMATICAL models
- Abstract
In this paper, we focus on the task of assigning labels to over-segmented image patches in a weakly supervised manner, in which the training images carry image-level labels but not the labels' locations within the images. We propose a unified discriminative graph inferring framework that simultaneously infers patch labels and learns patch appearance models. On one hand, graph inferring reasons about patch labels through a graph propagation procedure; the graph is constructed by connecting nearest neighbors that share the same image label, and multiple correlations among patches and image labels are imposed as constraints on the inference. On the other hand, for each label, the patches that do not contain the target label are adopted as negative samples to learn the appearance model, so that the predicted labels become more accurate during propagation. Graph inferring and the learned patch appearance models are finally embedded to complement each other in one unified formulation. Experiments on three public datasets demonstrate the effectiveness of our method in comparison with other baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2017
22. Progressive decomposition: a method of coarse-to-fine image parsing using stacked networks
- Author
-
Zhengxing Sun, Jinlong Shi, Yunhan Sun, and Jiagao Hu
- Subjects
- Ground truth; Parsing; Artificial neural network; Computer science; Pattern recognition; Coarse to fine; Image parsing; Segmentation; Software
- Abstract
Parsing images into fine-grained semantic parts is troublesome for off-the-shelf semantic segmentation networks, because it is difficult for them to utilize the contextual information of fine-grained parts. In this paper we propose a progressive decomposition method to parse images in a coarse-to-fine manner with refined semantic classes. It consists of two aspects: stacked networks and progressive supervisions. The stacked network is achieved by stacking several segmentation layers in a segmentation network. Each segmentation module parses images at a coarser-grained level, and its result is fed to the following one to provide effective contextual clues for the finer-grained parsing. Skip connections from shallow layers of the network to the fine-grained parsing modules are also added to recover the details of small structures. To train the stacked networks, which have coarse-to-fine outputs, a strategy of progressive supervision is proposed: classes in the ground truth are merged to obtain coarse-to-fine label maps, and the stacked network is then trained end-to-end with these hierarchical supervisions. The proposed framework can be injected into many advanced neural networks to improve the parsing results. Extensive evaluations on several public datasets, including face parsing and human parsing, demonstrate the superiority of our method.
- Published
- 2020
23. A crowding multi-objective genetic algorithm for image parsing.
- Author
-
John Joseph, Ferdin and Auwatanamongkol, Surapong
- Subjects
- GENETIC algorithms; IMAGE analysis; PARSING (Computer grammar); PIXELS; SUPPORT vector machines
- Abstract
Image parsing is a process of understanding the contents of an image. The process normally involves labeling pixels or superpixels of a given image with the classes of objects that may exist in the image. The accuracy of such labeling for the existing methodologies still needs to be improved, and the parsing method needs to be able to identify multiple instances of objects of different classes and sizes. In our previous work, a novel feature representation for an instance of objects in an image was proposed for object recognition and image parsing. The feature representation consists of the histogram vector of 2-grams of visual word ids of the two successive clockwise neighbors of any superpixel in the object instance, together with the shape vector of the instance. Using this feature representation, an instance can be classified with very high accuracy by per-class support vector machines (SVMs). A multi-objective genetic algorithm is also proposed to find the subset of image segments that would best constitute an instance of a class of objects, i.e., maximizing both the SVM classification score and the size of the instance. However, that genetic algorithm can only identify a single instance for each class of objects, despite the fact that many instances of the same class may exist. In this paper, a crowding genetic algorithm is used instead to search for multiple optimal solutions and help alleviate this deficiency. The experimental results show that the crowding genetic algorithm performs better than the previously proposed method as well as the existing methodologies, in terms of class-wise and pixel-wise accuracy. The qualitative results also clearly show that this method can effectively identify multiple object instances existing in a given image. [ABSTRACT FROM AUTHOR]
- Published
- 2016
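The role of crowding in entry 23, preserving several optima so that multiple object instances survive in the population, can be seen on a toy multimodal objective. Below is a deterministic-crowding sketch in which children compete only with their nearest parent; the paper's actual algorithm is multi-objective and operates on subsets of image segments scored by per-class SVMs, so everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def fitness(x):
    """Multimodal toy objective with peaks near x = 1, 3, 5 (a stand-in for
    the SVM-score objective over candidate object instances in the paper)."""
    return (np.exp(-8 * (x - 1) ** 2) + np.exp(-8 * (x - 3) ** 2)
            + np.exp(-8 * (x - 5) ** 2))

pop = rng.uniform(0, 6, size=40)
for _ in range(300):
    i, j = rng.choice(len(pop), size=2, replace=False)
    # Crossover plus mutation produce two children.
    a = rng.random()
    c1 = a * pop[i] + (1 - a) * pop[j] + 0.05 * rng.standard_normal()
    c2 = (1 - a) * pop[i] + a * pop[j] + 0.05 * rng.standard_normal()
    # Deterministic crowding: each child competes only with the nearer parent,
    # so distinct optima (distinct object instances) survive side by side.
    if abs(c1 - pop[i]) + abs(c2 - pop[j]) > abs(c1 - pop[j]) + abs(c2 - pop[i]):
        c1, c2 = c2, c1
    if fitness(c1) > fitness(pop[i]):
        pop[i] = c1
    if fitness(c2) > fitness(pop[j]):
        pop[j] = c2

# The surviving population clusters around all three peaks, not just one.
print(np.round(np.sort(pop), 1))
```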
24. Exploiting Large Image Sets for Road Scene Parsing.
- Author
-
Alvarez, Jose M., Salzmann, Mathieu, and Barnes, Nick
- Abstract
There is an increasing interest in exploiting multiple images for scene understanding, with great progress in areas such as cosegmentation and video segmentation. Jointly analyzing the images in a large set offers the opportunity to exploit a greater source of information than when considering a single image on its own. However, this also yields challenges since, to effectively exploit all the available information, the resulting methods need to consider not just local connections, but efficiently analyze similarity between all pairs of pixels within and across all the images. In this paper, we propose to model an image set as a fully connected pairwise Conditional Random Field (CRF) defined over the image pixels, or superpixels, with Gaussian edge potentials. We show that this lets us co-label the images of a large set efficiently, thus yielding increased accuracy at no additional computational cost compared to sequential labeling of the images. Furthermore, we extend our framework to incorporate temporal dependence, thus effectively encompassing video segmentation as a special case of our approach, as well as to modeling label dependence over larger image regions. Our experimental evaluation demonstrates that our framework lets us handle over 10 000 images in a matter of seconds. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
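Entry 24 models an image set as a fully connected pairwise CRF with Gaussian edge potentials. The standard mean-field update for such a model looks as follows; this naive O(N^2) NumPy version is purely illustrative, since at the paper's scale the same update must be computed with efficient high-dimensional filtering, and the paper additionally co-labels many images at once.

```python
import numpy as np

def dense_crf_mean_field(unary, positions, colors, w=1.0, iters=5,
                         theta_p=5.0, theta_c=0.5):
    """Naive mean-field inference for a fully connected pairwise CRF
    with Gaussian edge potentials and a Potts label-compatibility model.

    unary:     (N, L) negative log class scores per pixel (or superpixel).
    positions: (N, 2) coordinates; colors: (N, 3) appearance features.
    """
    dp = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    dc = ((colors[:, None, :] - colors[None, :, :]) ** 2).sum(-1)
    kernel = np.exp(-dp / (2 * theta_p ** 2) - dc / (2 * theta_c ** 2))
    np.fill_diagonal(kernel, 0.0)

    q = np.exp(-unary)
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(iters):
        msg = kernel @ q                  # support each label gets from similar pixels
        q = np.exp(-unary + w * msg)      # Potts compatibility, up to a constant
        q /= q.sum(axis=1, keepdims=True)
    return q

rng = np.random.default_rng(5)
n = 100
labels = dense_crf_mean_field(rng.random((n, 3)),
                              rng.random((n, 2)) * 20,
                              rng.random((n, 3))).argmax(axis=1)
print(labels.shape)  # (100,)
```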
25. Marginal Space Deep Learning: Efficient Architecture for Volumetric Image Parsing.
- Author
-
Ghesu, Florin C., Krubasik, Edward, Georgescu, Bogdan, Singh, Vivek, Zheng, Yefeng, Hornegger, Joachim, and Comaniciu, Dorin
- Subjects
- DEEP learning; VOLUME (Cubic content); IMAGE segmentation; PARSING (Computer grammar); FOLLOW-UP studies (Medicine); IMAGE representation
- Abstract
Robust and fast solutions for anatomical object detection and segmentation support the entire clinical workflow from diagnosis, patient stratification, therapy planning, and intervention to follow-up. Current state-of-the-art techniques for parsing volumetric medical image data are typically based on machine learning methods that exploit large annotated image databases. Two main challenges need to be addressed: the efficiency of scanning high-dimensional parametric spaces, and the need for representative image features, which otherwise require significant manual engineering. We propose a pipeline for object detection and segmentation in the context of volumetric image parsing, solving a two-step learning problem: anatomical pose estimation and boundary delineation. For this task we introduce Marginal Space Deep Learning (MSDL), a novel framework exploiting both the strengths of efficient object parametrization in hierarchical marginal spaces and the automated feature design of Deep Learning (DL) network architectures. In the 3D context, the application of deep learning systems is limited by the very high complexity of the parametrization: nine parameters are necessary to describe a restricted affine transformation in 3D, resulting in a prohibitive number of scanning hypotheses (billions). The mechanism of marginal space learning provides excellent run-time performance by learning classifiers in clustered, high-probability regions in spaces of gradually increasing dimensionality. To further increase computational efficiency and robustness, our system learns sparse adaptive data sampling patterns that automatically capture the structure of the input. Given the object localization, we propose a DL-based active shape model to estimate the non-rigid object boundary. Experimental results are presented on the aortic valve in ultrasound, using an extensive dataset of 2891 volumes from 869 patients, showing significant improvements of up to 45.2% over the state of the art. To our knowledge, this is the first successful demonstration of the DL potential for detection and segmentation in full 3D data with parametrized representations. [ABSTRACT FROM AUTHOR]
- Published
- 2016
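Marginal space learning, which entry 25 builds on, avoids scanning the full pose space by searching marginal spaces of gradually increasing dimensionality and keeping only high-probability candidates. A toy sketch with a stand-in scoring function (the hidden pose, grid ranges, and candidate counts are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
target = np.array([12.0, 7.0, 0.6, 1.4])        # hidden (x, y, angle, scale)

def score(hyp):
    """Stand-in for a learned classifier over (partial) pose hypotheses;
    in MSDL this is a deep network trained per marginal space."""
    return -np.sum((np.asarray(hyp) - target[: len(hyp)]) ** 2)

def top_k(hypotheses, k=50):
    return sorted(hypotheses, key=score, reverse=True)[:k]

# Stage 1: scan the position marginal space only.
stage1 = top_k([(x, y) for x in range(32) for y in range(32)])
# Stage 2: extend the surviving candidates with orientation.
stage2 = top_k([h + (a,) for h in stage1 for a in np.linspace(0, np.pi, 16)])
# Stage 3: extend with scale; the full pose is searched only near good partial poses.
stage3 = top_k([h + (s,) for h in stage2 for s in np.linspace(0.5, 2.0, 16)], k=1)

# Close to the hidden pose after ~2.6k evaluations instead of ~262k for the full grid.
print(np.round(stage3[0], 2))
```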
26. Understanding image concepts using ISTOP model.
- Author
-
Zarchi, M.S., Tan, R.T., van Gemeren, C., Monadjemi, A., and Veltkamp, R.C.
- Subjects
- IMAGE analysis; REPRESENTATIONS of graphs; GRAPH theory; MATHEMATICAL decomposition; PATTERN recognition systems; SUPERVISED learning
- Abstract
This paper focuses on recognizing image concepts by introducing the ISTOP model. The model parses images from the scene level down to object parts using a context-sensitive grammar. Since there is a gap between the scene and object levels, this grammar introduces a "Visual Term" level to bridge it: a visual term is a concept level above the object level, representing a few co-occurring objects. The grammar used in the model can be embodied in an And-Or graph representation. The hierarchical structure of the graph decomposes an image from the scene level into the visual term, object and part levels through terminal and non-terminal nodes, while the horizontal links in the graph impose the context and constraints between the nodes. In order to learn the grammar constraints and their weights, we propose an algorithm that can operate on weakly annotated datasets. This algorithm searches the dataset to find visual terms without supervision and then learns the weights of the constraints using a latent SVM. The experimental results on the Pascal VOC dataset show that our model outperforms the state-of-the-art approaches in recognizing image concepts. [ABSTRACT FROM AUTHOR]
- Published
- 2016
27. Enhanced Reweighted MRFs for Efficient Fashion Image Parsing.
- Author
-
Wu, Qiong and Boulanger, Pierre
- Subjects
- MARKOV random fields; PARSING (Computer grammar); CONDITIONAL probability; PHOTOGRAPHS; CLOTHING & dress
- Abstract
Previous image parsing methods usually model the problem with a conditional random field, which describes a statistical model learned from a training dataset and then processes a query image using the conditional probability. However, fashion items in clothing images have a large variety of layering and configuration, and it is hard to learn a single statistical model of features that applies to general cases. In this article, we take fashion images as an example to show how Markov Random Fields (MRFs) can outperform Conditional Random Fields when the application does not follow a certain statistical model learned from the training dataset. We propose a new method for automatically parsing fashion images with high processing efficiency and significantly less training time by applying a modification of MRFs, named reweighted MRF (RW-MRF), which resolves the problem of over-smoothing infrequent labels. We further enhance RW-MRF with an occlusion prior and a background prior to resolve two other common problems in clothing parsing: occlusion and background spill. Our experimental results indicate that our proposed clothing parsing method significantly improves processing time and training time over state-of-the-art methods, while ensuring comparable parsing accuracy and improving label recall rate. [ABSTRACT FROM AUTHOR]
- Published
- 2016
28. Scene parsing by nonparametric label transfer of content-adaptive windows.
- Author
-
Tung, Frederick and Little, James J.
- Subjects
- PARSING (Computer grammar); NONPARAMETRIC estimation; ADAPTIVE computing systems; WINDOWS (Graphical user interfaces); IMAGE analysis; PIXELS
- Published
- 2016
29. Tracking Pedestrian with Multi-Component Online Deformable Part-Based Model.
- Author
-
Liu, Zhao, Xie, Yi, Pei, Mingtao, and Jia, Yunde
- Subjects
- TRACKING control systems; ONLINE algorithms; POTENTIAL distribution; ROBUST control; SUPPORT vector machines; REAL-TIME control
- Abstract
In this paper, we present a novel online algorithm that tracks a single pedestrian by integrating bottom-up and top-down models. Motivated by the observation that the appearance of a pedestrian varies considerably across perspectives and poses, the bottom-up model incorporates multiple components to represent distinct groups of pedestrian appearances. Each component uses an online deformable part-based model (OLDPM) with one root and several shared parts to represent the flexible structure and salient local patterns of one particular appearance. The top-down model extends the bottom-up model by introducing newly created OLDPMs for uncovered new appearances. To achieve long-term tracking, our approach incorporates the following methods: (i) through an incremental support vector machine (INCSVM) associated with each component, the OLDPM can effectively adapt to pedestrian appearance variations; (ii) the OLDPM can efficiently generate match penalty maps through a robust real-time pattern matching algorithm, and can search over all possible configurations in linear time using the distance transform algorithm; (iii) parts can be shared among components to reduce the computational complexity of matching; and (iv) to handle hard negatives, potential distracting targets are located explicitly to prevent drifting. We compare our method with four cutting-edge tracking algorithms over eight visual sequences and provide quantitative and qualitative performance comparisons. [ABSTRACT FROM AUTHOR]
- Published
- 2016
30. Analysis by Synthesis: 3D Image Parsing Using Spatial Grammar and Markov Chain Monte Carlo
- Author
-
Qi, Siyuan
- Subjects
- Computer science; Statistics; 3D reconstruction; computer vision; image parsing; indoor scene; Markov Chain Monte Carlo; spatial grammar
- Abstract
Scene understanding is a fundamental problem in computer vision research. We address this problem in an "analysis by synthesis" fashion: observed data (a 2D image) are explained according to a spatial grammar (describing the underlying functional arrangement and 3D geometric structure of a scene) that generates them. The inference process is carried out in a Bayesian framework. The posterior probability includes a prior probability reflecting the knowledge of indoor 3D scene structure encoded by the grammar, and a likelihood that evaluates the accuracy of the re-projected image and its physical plausibility. The most reasonable explanation of the image is given by a parse tree that maximizes the posterior probability, which is found by reversible-jump Markov Chain Monte Carlo sampling.
- Published
- 2015
31. Autonomous UAV for Suspicious Action Detection using Pictorial Human Pose Estimation and Classification
- Author
-
Surya Penmetsa, Fatima Minhuj, Amarjot Singh, and SN Omkar
- Subjects
- Unmanned Aerial Vehicle (UAV); Pose Estimation and Classification; Pictorial structures; Image parsing; Human detection; Hough Transform
- Abstract
Visual autonomous systems capable of monitoring crowded areas and alerting the authorities when a suspicious action occurs can play a vital role in controlling crime rates. Previous attempts have been made to monitor crime using posture recognition, but nothing exclusively devoted to investigating the actions of people in large populated areas has been cited. To resolve this shortcoming, we propose an autonomous unmanned aerial vehicle (UAV) visual surveillance system that locates humans in image frames, followed by pose estimation using weak constraints on the position and appearance of body parts together with image parsing. The estimated pose, represented as a pictorial structure, is flagged by the proposed Hough Orientation Calculator (HOC) on close resemblance with any pose in the suspicious-action dataset. The robustness of the system is demonstrated on videos recorded using a UAV with no prior knowledge of background, lighting, or the location and scale of the human in the image. The system produces an accuracy of 71% and can also be applied to various other video sources, such as CCTV cameras.
- Published
- 2014
32. Adaptive Nonparametric Image Parsing.
- Author
-
Nguyen, Tam V., Lu, Canyi, Sepulveda, Jose, and Yan, Shuicheng
- Subjects
- IMAGE retrieval; PIXELS; HISTOGRAMS; DETECTORS; SCALE invariance (Statistical physics)
- Abstract
In this paper, we present an adaptive nonparametric solution to the image parsing task, namely, annotating each image pixel with its corresponding category label. For a given test image, first, a locality-aware retrieval set is extracted from the training data based on superpixel matching similarities, which are augmented with feature extraction for better differentiation of local superpixels. Then, the category of each superpixel is initialized by the majority vote of the k-nearest-neighbor superpixels in the retrieval set. Instead of fixing k as in traditional nonparametric approaches, here we propose a novel adaptive nonparametric approach that determines the sample-specific k for each test image. In particular, k is adaptively set to be the number of the fewest nearest superpixels that the images in the retrieval set can use to get the best category prediction. Finally, the initial superpixel labels are further refined by contextual smoothing. Extensive experiments on challenging data sets demonstrate the superiority of the new solution over other state-of-the-art nonparametric solutions. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
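Entry 32's sample-specific k can be illustrated with a small sketch: candidate values of k are scored by how well they re-predict the labels of the retrieval set itself (leave-one-out), and the winning k then labels the test superpixels. This is a loose reading of the paper's rule; the selection criterion and all names are assumptions for illustration.

```python
import numpy as np

def knn_vote(query, bank_feats, bank_labels, k, num_classes):
    """Majority vote among the k nearest retrieval-set superpixels."""
    idx = np.argsort(((bank_feats - query) ** 2).sum(axis=1))[:k]
    return np.bincount(bank_labels[idx], minlength=num_classes).argmax()

def adaptive_k_parse(test_feats, bank_feats, bank_labels, num_classes,
                     candidates=(1, 3, 5, 9, 15)):
    """Pick the smallest k that best re-predicts the retrieval set's own
    labels, then label the test superpixels with it."""
    best_k, best_acc = candidates[0], -1.0
    for k in candidates:
        hits = 0
        for i in range(len(bank_feats)):
            others = np.delete(np.arange(len(bank_feats)), i)
            pred = knn_vote(bank_feats[i], bank_feats[others],
                            bank_labels[others], k, num_classes)
            hits += pred == bank_labels[i]
        acc = hits / len(bank_feats)
        if acc > best_acc:                       # strict >, so ties keep the smaller k
            best_k, best_acc = k, acc
    return np.array([knn_vote(f, bank_feats, bank_labels, best_k, num_classes)
                     for f in test_feats]), best_k

rng = np.random.default_rng(7)
bank = rng.standard_normal((80, 8))
labels, k = adaptive_k_parse(rng.standard_normal((10, 8)), bank,
                             rng.integers(0, 4, 80), 4)
print(labels, k)
```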
33. Scene Parsing with Object Instance Inference Using Regions and Per-exemplar Detectors.
- Author
-
Tighe, Joseph, Niethammer, Marc, and Lazebnik, Svetlana
- Subjects
- IMAGE segmentation; SEMANTICS; PIXELS; QUADRATIC programming; IMAGE analysis
- Abstract
This paper describes a system for interpreting a scene by assigning a semantic label at every pixel and inferring the spatial extent of individual object instances together with their occlusion relationships. First we present a method for labeling each pixel aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled. This method combines region-level features with per-exemplar sliding window detectors. Unlike traditional bounding box detectors, per-exemplar detectors perform well on classes with little training data and high intra-class variation, and they allow object masks to be transferred into the test image for pixel-level segmentation. Next, we use per-exemplar detections to generate a set of candidate object masks for a given test image. We then select a subset of objects that explain the image well and have valid overlap relationships and occlusion ordering. This is done by minimizing an integer quadratic program either using a greedy method or a standard solver. We alternate between using the object predictions to refine the pixel labels and using the pixel labels to improve the object predictions. The proposed system obtains promising results on two challenging subsets of the LabelMe dataset, the largest of which contains 45,676 images and 232 classes. [ABSTRACT FROM AUTHOR]
- Published
- 2015
34. Labeling Complete Surfaces in Scene Understanding.
- Author
-
Guo, Ruiqi and Hoiem, Derek
- Subjects
- IMAGE analysis; SEMANTICS; PIXELS; COMPUTER vision; IMAGE segmentation; DIGITAL image processing
- Abstract
Scene understanding requires reasoning about both what we can see and what is occluded. We offer a simple and general approach to infer labels of occluded background regions. Our approach incorporates estimates of visible surrounding background, detected objects, and shape priors from transferred training regions. We demonstrate the ability to infer the labels of occluded background regions in three datasets: the outdoor StreetScenes dataset, IndoorScene dataset and SUN09 dataset, all using the same approach. Furthermore, the proposed approach is extended to 3D space to find layered support surfaces in RGB-Depth scenes. Our experiments and analysis show that our method outperforms competent baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2015
35. Image parsing by loopy dynamic programming.
- Author
-
Zhang, Shizhou, Wang, Jinjun, Gong, Yihong, Zhang, Shun, Zhang, Xinzi, and Lan, Xuguang
- Subjects
- IMAGE processing; LOOPS (Group theory); DYNAMIC programming; SEMANTICS; COMPUTER vision; ALGORITHMS
- Abstract
The image parsing process gives labels to image regions, along with information including shape, semantics and context. Although it is one of the most important functions of the human visual system, automatic image parsing using computer vision techniques remains difficult due to computational issues. In this paper we introduce a novel method to address this limitation. Our system models an image as a set of regions and uses a novel hypothesis generation algorithm to obtain possible image parsing solutions for final re-scoring. The proposed hypothesis generation algorithm, called Loopy Dynamic Programming (LDP), handles the large search space efficiently and gives good parsing hypotheses for testing. With this capacity, we are able to apply more precise and complex image models to achieve better performance. In addition, our system can perform image segmentation, detection and classification simultaneously. Experimental results on the Pascal VOC 2007 dataset show that the proposed technique achieves very promising performance on all three tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2014
36. Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer
- Author
-
Yiming Gao, Ke Gong, Liang Lin, Meng Wang, and Xiaodan Liang
- Subjects
- Parsing; Computer science; Computer Vision and Pattern Recognition; Graph (abstract data type); Semantics; Image parsing; Segmentation; Transfer of learning; Algorithms; Software
- Abstract
Abstract identical to entry 3 above. Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2021. arXiv admin note: substantial text overlap with arXiv:1904.04536.
- Published
- 2021
37. ImageSpirit.
- Author
-
Cheng, Ming-Ming, Zheng, Shuai, Lin, Wen-Yan, Vineet, Vibhav, Sturgess, Paul, Crook, Nigel, Mitra, Niloy J., and Torr, Philip
- Subjects
- ALGORITHMS; ALGORITHMIC randomness; MACHINE translating; MATHEMATICAL programming; MACHINE theory
- Abstract
Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images and their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this article we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive-time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that can possibly be used to interact with new-generation devices (e.g., smartphones, Google Glass, living-room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the trade-offs compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study. [ABSTRACT FROM AUTHOR]
- Published
- 2014
38. VRT-Net: Real-Time Scene Parsing via Variable Resolution Transform
- Author
-
Gaurav Singh Rajput, R. Venkatesh Babu, and Jogendra Nath Kundu
- Subjects
- Parsing; Computer science; Magnification; Variable resolution; Foveal vision; Fixation (visual); Image parsing; Segmentation; Computer vision; Human eye
- Abstract
Urban scene parsing is a basic requirement for various autonomous navigation systems, especially self-driving. Most of the available approaches employ generic image parsing architectures designed for segmentation of object-focused scenes captured in indoor setups. However, images captured by car-mounted cameras exhibit an extreme effect of perspective geometry, causing a significant scale disparity between near and far objects. Recognizing this, we formalize a unique Variable Resolution Transform (VRT) technique motivated by the foveal magnification in the human eye. Following this, we design a Fovea Estimation Network (FEN) which is trained to estimate the single most convenient fixation location along with the associated magnification factor best suited to a given input image. The proposed framework is designed to be used as a wrapper over available real-time scene parsing models, thereby demonstrating a superior trade-off between speed and quality compared to the prior state of the art.
- Published
- 2020
39. Depth embedded recurrent predictive parsing network for video scenes
- Author
-
Lingli Zhou, Ling Shao, Haofeng Zhang, Yang Long, and Jingyu Yang
- Subjects
- Parsing; Artificial neural network; Computer science; Visual texture recognition; Image parsing; Computer vision; Segmentation; Automotive Engineering
- Abstract
Semantic segmentation-based scene parsing plays an important role in automatic driving and autonomous navigation. However, most previous models only consider static images and fail to parse sequential images, because they do not take the spatial-temporal continuity between consecutive frames of a video into account. In this paper, we propose a depth embedded recurrent predictive parsing network (RPPNet), which analyzes preceding consecutive stereo pairs to produce parsing results. In this way, RPPNet effectively learns the dynamic information from historical stereo pairs, so as to correctly predict the representations of the next frame. The other contribution of this paper is to systematically study the video scene parsing (VSP) task, in which we use the RPPNet to facilitate conventional image parsing features by adding spatial-temporal information. The experimental results show that our proposed method RPPNet achieves fine predictive parsing results on Cityscapes and that the predictive features of RPPNet significantly improve conventional image parsing networks in the VSP task.
- Published
- 2019
40. Learning adaptive receptive fields for deep image parsing networks
- Author
-
Si Liu, Yao Sun, Zhen Wei, and Junyu Lin
- Subjects
- Computer science; Receptive field; Computer graphics; Artificial Intelligence; Image parsing; Parsing; Pattern recognition; Semantic segmentation; Feature (computer vision); Data-driven; Affine transformation; Face parsing
- Abstract
In this paper, we introduce a novel approach to automatically regulate receptive fields in deep image parsing networks. Unlike previous work, which placed much importance on obtaining better receptive fields using manually selected dilated convolutional kernels, our approach uses two affine transformation layers in the network's backbone and operates on feature maps. Feature maps are inflated or shrunk by the new layer, thereby changing the receptive fields in the following layers. Through end-to-end training, the whole framework is data-driven, without laborious manual intervention. The proposed method is generic across datasets and tasks. We have conducted extensive experiments on both general image parsing tasks and face parsing tasks as concrete examples to demonstrate the method's superior ability to regulate receptive fields compared with manual designs.
- Published
- 2018
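Entry 40 regulates receptive fields by inflating or shrinking feature maps with learnable transformation layers rather than hand-picked dilation rates. Below is a PyTorch sketch of that idea, using a differentiable affine-grid zoom so the scale is trained end-to-end; the paper's exact affine layer parametrization is not reproduced, and all names here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZoomedConvBlock(nn.Module):
    """Conv block whose effective receptive field is regulated by a learnable
    zoom: resample the feature map before the conv, resample back after it.
    Gradients flow into `log_scale` through `grid_sample`, so the zoom is
    trained end-to-end with the rest of the network."""

    def __init__(self, channels):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(1))   # zoom factor = 1 at init
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def _resample(self, x, s):
        # Affine grid with diagonal s: s < 1 magnifies content, s > 1 shrinks it.
        theta = torch.zeros(x.size(0), 2, 3, device=x.device)
        theta[:, 0, 0] = s
        theta[:, 1, 1] = s
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

    def forward(self, x):
        s = torch.exp(self.log_scale)
        x = self._resample(x, 1.0 / s)   # inflate or shrink the feature map ...
        x = F.relu(self.conv(x))
        return self._resample(x, s)      # ... then undo the zoom afterwards

block = ZoomedConvBlock(8)
out = block(torch.randn(2, 8, 32, 32))
print(out.shape)   # torch.Size([2, 8, 32, 32])
```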
41. Superparsing.
- Author
-
Tighe, Joseph and Lazebnik, Svetlana
- Subjects
- NONPARAMETRIC statistics; MARKOV random fields; PIXELS; IMAGE processing; SEMANTIC computing
- Abstract
This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach is based on lazy learning, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. Given a test image, it first performs global scene-level matching against the training set, followed by superpixel-level matching and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art nonparametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 45,676 images and 232 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem. Finally, we present an extension of our method to video sequences and report results on a video dataset with frames densely labeled at 1 Hz. [ABSTRACT FROM AUTHOR]
- Published
- 2013
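The two-stage nonparametric pipeline of entry 41, global scene-level retrieval followed by superpixel-level matching, reduces label transfer to nearest-neighbor lookups. A compact sketch under assumed data structures follows; the MRF smoothing stage is omitted and the likelihood-ratio score is heavily simplified.

```python
import numpy as np

def superparse(test_sp, global_feat, db_global, db_sp_feats, db_sp_labels,
               num_classes, retrieval_size=5, k=10):
    """Two-stage nonparametric label transfer in the spirit of SuperParsing.

    global_feat: global descriptor of the test image; db_global: (M, G) for
    the training images. db_sp_feats/db_sp_labels hold each training image's
    superpixel descriptors and labels. Shapes and names are illustrative.
    """
    # Stage 1: scene-level matching selects a small retrieval set.
    order = np.argsort(((db_global - global_feat) ** 2).sum(axis=1))
    keep = order[:retrieval_size]
    feats = np.concatenate([db_sp_feats[i] for i in keep])
    labels = np.concatenate([db_sp_labels[i] for i in keep])

    # Stage 2: each test superpixel scores classes by nearest-neighbor counts
    # inside the retrieval set, normalized by the class prior (a crude
    # stand-in for the paper's likelihood-ratio score).
    prior = np.bincount(labels, minlength=num_classes) + 1.0
    out = []
    for sp in test_sp:
        idx = np.argsort(((feats - sp) ** 2).sum(axis=1))[:k]
        counts = np.bincount(labels[idx], minlength=num_classes) + 1e-3
        out.append(np.argmax(np.log(counts) - np.log(prior)))
    return np.array(out)

rng = np.random.default_rng(8)
db_sp_feats = [rng.standard_normal((30, 6)) for _ in range(20)]
db_sp_labels = [rng.integers(0, 5, 30) for _ in range(20)]
print(superparse(rng.standard_normal((12, 6)), rng.standard_normal(4),
                 rng.standard_normal((20, 4)), db_sp_feats, db_sp_labels, 5))
```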
42. Incremental grouping of image elements in vision.
- Author
-
Roelfsema, Pieter and Houtkamp, Roos
- Subjects
- PSYCHOLOGY; NEUROPHYSIOLOGY; PHYSIOLOGY; NEUROLOGY; VISUAL cortex
- Abstract
One important task for the visual system is to group image elements that belong to an object and to segregate them from other objects and the background. We here present an incremental grouping theory (IGT) that addresses the role of object-based attention in perceptual grouping at a psychological level and, at the same time, outlines the mechanisms for grouping at the neurophysiological level. The IGT proposes that there are two processes for perceptual grouping. The first process is base grouping and relies on neurons that are tuned to feature conjunctions. Base grouping is fast and occurs in parallel across the visual scene, but not all possible feature conjunctions can be coded as base groupings. If there are no neurons tuned to the relevant feature conjunctions, a second process called incremental grouping comes into play. Incremental grouping is a time-consuming and capacity-limited process that requires the gradual spread of enhanced neuronal activity across the representation of an object in the visual cortex. The spread of enhanced neuronal activity corresponds to the labeling of image elements with object-based attention. [ABSTRACT FROM AUTHOR]
- Published
- 2011
43. Recursive Compositional Models for Vision: Description and Review of Recent Work.
- Author
-
Zhu, Long, Chen, Yuanhao, and Yuille, Alan
- Abstract
This paper describes and reviews a class of hierarchical probabilistic models of images and objects. Visual structures are represented in a hierarchical form where complex structures are composed of more elementary structures, following a design principle of recursive composition. Probabilities are defined over these structures in a way that exploits properties of the hierarchy; e.g., long-range spatial relationships can be represented by local potentials at the upper levels of the hierarchy. The compositional nature of this representation enables efficient learning and inference algorithms. In particular, parts can be shared between different object models. Overall, the architecture of Recursive Compositional Models (RCMs) provides a balance between statistical and computational complexity. The goal of this paper is to describe the basic ideas and common themes of RCMs, to illustrate their success on a range of vision tasks, and to give pointers to the literature. In particular, we show that RCMs generally give state-of-the-art results when applied to a range of different vision tasks and evaluated on the leading benchmark datasets. [ABSTRACT FROM AUTHOR] (A toy recursive part-scoring function follows this record.)
- Published
- 2011
- Full Text
- View/download PDF
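To make the recursive-composition idea of record 43 concrete, here is a toy scoring function over a hand-built part tree: a configuration's score is the sum of its parts' scores plus local potentials relating each child to its parent. The quadratic "spring" potential and all names are invented placeholders, not the paper's learned models:

```python
import numpy as np

def score(node, positions):
    """Recursively score a (sub)tree of parts at hypothesised 2-D positions."""
    s = node.get("appearance", lambda pos: 0.0)(positions[node["name"]])
    for child in node.get("children", []):
        s += score(child, positions)
        # local pairwise potential: a child should sit near its parent, so
        # long-range structure emerges from purely local terms in the tree
        offset = np.subtract(positions[child["name"]], positions[node["name"]])
        s += -0.5 * float(offset @ offset)  # quadratic "spring" term
    return s

face = {"name": "face",
        "appearance": lambda pos: 1.0,     # toy appearance evidence at the root
        "children": [{"name": "eye"}, {"name": "mouth"}]}
pos = {"face": (10.0, 10.0), "eye": (9.0, 8.0), "mouth": (10.0, 13.0)}
print(score(face, pos))                    # higher is a better configuration
```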
44. I2T: Image Parsing to Text Description.
- Author
-
YAO, BENJAMIN Z., YANG, XIONG, LIN, LIANG, LEE, MUN WAI, and ZHU, SONG-CHUN
- Subjects
PARSING (Computer grammar) ,DIGITAL image processing ,DIGITAL images ,INTERNET ,SEMANTIC computing - Abstract
In this paper, we present an image parsing to text description (I2T) framework that generates text descriptions of image and video content based on image understanding. The proposed I2T framework follows three steps: 1) input images (or video frames) are decomposed into their constituent visual patterns by an image parsing engine, in a spirit similar to parsing sentences in natural language; 2) the image parsing results are converted into a semantic representation in the form of the Web Ontology Language (OWL), which enables seamless integration with general knowledge bases; and 3) a text generation engine converts the results from the previous steps into semantically meaningful, human-readable, and queryable text reports. The centerpiece of the I2T framework is an and-or graph (AoG) visual knowledge representation, which serves as prior knowledge for representing diverse visual patterns and supplies top-down hypotheses during image parsing. The AoG embodies vocabularies of visual elements, including primitives, parts, objects, and scenes, as well as a stochastic image grammar that specifies syntactic (i.e., compositional) relations and semantic relations (e.g., categorical, spatial, temporal, and functional) between these visual elements. The AoG is therefore a unified model of both categorical and symbolic representations of visual knowledge. The proposed I2T framework has two objectives. First, we use a semiautomatic method to parse images from the Internet in order to build an AoG for visual knowledge representation, with the goal of making the parsing process increasingly automatic using the learned AoG model. Second, we use automatic methods to parse images and video in specific domains and generate text reports that are useful for real-world applications. In the case studies at the end of this paper, we demonstrate two automatic I2T systems: a maritime and urban scene video surveillance system and a real-time automatic driving scene understanding system. [ABSTRACT FROM AUTHOR] (A toy version of the three-step pipeline follows this record.)
- Published
- 2010
- Full Text
- View/download PDF
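A toy end-to-end version of the I2T pipeline in record 44: a hand-built parse graph stands in for the image parsing engine, is flattened to subject-predicate-object triples (a crude stand-in for the OWL representation), and is then verbalised by simple templates. All content below is invented:

```python
# the "image parsing engine" output: a hand-built parse graph
nodes = [("scene", "street"), ("object", "car"), ("object", "person")]
relations = [("car", "left-of", "person"), ("person", "on", "street")]

def to_triples(nodes, relations):
    """Flatten the parse graph into subject-predicate-object triples,
    a stand-in for the OWL-based semantic representation."""
    return [(name, "isA", kind) for kind, name in nodes] + list(relations)

def verbalise(triples):
    """Template-based text generation from the triples."""
    lines = []
    for s, pred, o in triples:
        if pred == "isA":
            lines.append(f"There is a {o} named '{s}'.")
        else:
            lines.append(f"The {s} is {pred.replace('-', ' ')} the {o}.")
    return " ".join(lines)

print(verbalise(to_triples(nodes, relations)))
```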
45. From Image Parsing to Painterly Rendering.
- Author
-
KUN ZENG, MINGTIAN ZHAO, CAIMING XIONG, and SONG-CHUN ZHU
- Subjects
PARSING (Computer grammar) ,RENDERING (Computer graphics) ,COMPUTER graphics ,DIGITAL image processing ,COMPUTER drawing ,ARTIFICIAL intelligence - Abstract
We present a semantics-driven approach for stroke-based painterly rendering, based on recent image parsing techniques [Tu et al. 2005; Tu and Zhu 2006] in computer vision. Image parsing integrates segmentation for regions, sketching for curves, and recognition for object categories. In an interactive manner, we decompose an input image into a hierarchy of its constituent components in a parse tree representation, with occlusion relations among the nodes in the tree. To paint the image, we build a brush dictionary containing a large set (760) of brush examples of four shape/appearance categories, collected from professional artists; we then select appropriate brushes from the dictionary and place them on the canvas guided by the image semantics included in the parse tree, with each image component and layer painted in various styles. During this process, the scene and object categories also determine the color blending and shading strategies for inhomogeneous synthesis of image details. Compared with previous methods, this approach benefits from richer, more meaningful image semantics, which leads to better simulation of artists' painting techniques using the high-quality brush dictionary. We have tested our approach on a large number (hundreds) of images, and it produced satisfactory painterly effects. [ABSTRACT FROM AUTHOR] (A rough sketch of label-driven stroke placement follows this record.)
- Published
- 2010
- Full Text
- View/download PDF
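A rough sketch of the semantics-driven stroke placement in record 45: each parsed region selects a brush category according to its semantic label and scatters a few strokes over its mask. The brush categories and the label-to-brush mapping are invented placeholders, not the paper's 760-example dictionary:

```python
import numpy as np

BRUSH_FOR_LABEL = {"sky": "wash", "tree": "dab", "face": "fine", "other": "flat"}

def paint_region(mask, label, rng):
    """Return (x, y, brush) stroke tuples covering a boolean region mask."""
    brush = BRUSH_FOR_LABEL.get(label, BRUSH_FOR_LABEL["other"])
    ys, xs = np.nonzero(mask)
    picks = rng.choice(len(xs), size=min(5, len(xs)), replace=False)
    return [(int(xs[i]), int(ys[i]), brush) for i in picks]

rng = np.random.default_rng(1)
mask = np.zeros((8, 8), dtype=bool)
mask[:4] = True                      # a toy "sky" region in the parse tree
print(paint_region(mask, "sky", rng))
```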
46. Bottom-Up/Top-Down Image Parsing with Attribute Grammar.
- Author
-
Feng Han and Song-Chun Zhu
- Subjects
- *
PATTERN perception , *PATTERN recognition systems , *GRAMMAR , *PARSING (Grammar) , *GENERATIVE grammar , *COMPUTER programming - Abstract
This paper presents a simple attribute graph grammar as a generative representation for man-made scenes, such as buildings, hallways, kitchens, and living rooms, and studies an effective top-down/bottom-up inference algorithm for parsing images by maximizing a Bayesian posterior probability or, equivalently, minimizing a description length (MDL). This simple grammar has one class of primitives as its terminal nodes, namely the projections of planar rectangles in 3-space onto the image plane, and six production rules for the spatial layout of the rectangular surfaces. All terminal and nonterminal nodes in the grammar are described by attributes for their geometric properties and image appearance. Each production rule is associated with equations that constrain the attributes of a parent node and those of its children. Given an input image, the inference algorithm computes (or constructs) a parse graph, which includes a parse tree for the hierarchical decomposition and a number of spatial constraints. In the inference algorithm, the bottom-up step detects an excessive number of rectangles as weighted candidates, which are sorted in a certain order and activate top-down predictions of occluded or missing components through the grammar rules. The whole procedure is, in spirit, similar to the data-driven Markov chain Monte Carlo paradigm [39], [33], except that a greedy algorithm is adopted for simplicity. In the experiments, we show that the grammar and top-down inference can largely improve the performance of bottom-up detection. [ABSTRACT FROM AUTHOR] (A toy greedy parse loop follows this record.)
- Published
- 2009
- Full Text
- View/download PDF
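The bottom-up/top-down interplay in record 46 can be caricatured as a greedy loop over weighted rectangle proposals, where accepting a rectangle triggers a top-down prediction of an aligned sibling (a stand-in for the grammar's production rules). All weights, geometry, and the MDL bookkeeping below are invented toy values:

```python
import heapq

# bottom-up step: weighted rectangle candidates (weights negated for a max-heap)
proposals = [(-0.9, (10, 10, 40, 30)),
             (-0.7, (60, 12, 40, 30)),
             (-0.2, (0, 0, 5, 5))]
heapq.heapify(proposals)

accepted, mdl = [], 100.0          # description length of the unexplained image
while proposals and len(accepted) < 6:
    neg_w, rect = heapq.heappop(proposals)
    gain = -neg_w * 10.0           # coding gain from explaining pixels with rect
    if gain <= 1.0:                # weak candidates do not shorten the code
        continue
    accepted.append(rect)
    mdl -= gain - 1.0              # pay 1 unit to encode the rectangle itself
    # top-down step: the grammar predicts an aligned (possibly occluded) sibling
    x, y, w, h = rect
    heapq.heappush(proposals, (-0.3, (x + w + 5, y, w, h)))
print(accepted, round(mdl, 1))
```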
47. Learning location constrained pixel classifiers for image parsing
- Author
-
Junsong Yuan and Kang Dang
- Subjects
Computer science, Computer vision, Pattern recognition, Image parsing, Parsing, Segmentation, Pixel, Discriminative model, Estimator, Classifier, Signal processing, Media technology
When parsing images with regular spatial layout, the location of a pixel (x, y) can provide an important prior for its semantic label. This paper proposes a technique to leverage both location and appearance information for pixel labeling. The proposed method utilizes the spatial layout of the image by building local pixel classifiers that are location constrained, i.e., trained with pixels from a local neighborhood region only. Our proposed local learning works well in different challenging image parsing problems, such as pedestrian parsing, street-view scene parsing, and object segmentation, and outperforms existing results that rely on a single unified pixel classifier. To better understand the behavior of our local classifiers, we perform a bias-variance analysis and demonstrate that the proposed local classifier essentially performs spatial smoothing over a target estimator that uses both appearance and location information, which explains why the local classifier is more discriminative yet can still handle misalignment. Meanwhile, our theoretical and experimental studies suggest the importance of selecting an appropriate neighborhood size for location-constrained learning, which can significantly influence the parsing results. (A compact sketch of per-cell local classifiers follows this record.)
- Published
- 2017
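A compact sketch of the location-constrained idea in record 47, assuming a regular grid of cells with one scikit-learn logistic regression per cell, each trained only on pixels falling in its own cell (the paper's overlapping neighborhoods are simplified away); the features and labels below are random stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

H = W = 64
GRID = 4                                      # 4x4 grid of local classifiers
rng = np.random.default_rng(0)
feats = rng.normal(size=(H, W, 8))            # per-pixel appearance features
labels = (rng.random((H, W)) < 0.5).astype(int)

def cell(y, x):
    """Map a pixel location to its grid cell."""
    return (y * GRID // H, x * GRID // W)

# train one local classifier per cell, on that cell's pixels only
ys, xs = np.mgrid[0:H, 0:W]
clfs = {}
for gy in range(GRID):
    for gx in range(GRID):
        m = (ys * GRID // H == gy) & (xs * GRID // W == gx)
        clfs[(gy, gx)] = LogisticRegression().fit(feats[m], labels[m])

# predict a pixel with the classifier that owns its location
y, x = 10, 50
print(clfs[cell(y, x)].predict(feats[y, x][None])[0])
```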
48. Image Parsing: Unifying Segmentation, Detection, and Recognition.
- Author
-
Tu, Zhuowen, Chen, Xiangrong, Yuille, Alan, and Zhu, Song-Chun
- Subjects
- *
IMAGE processing , *BAYESIAN analysis , *ALGORITHMS , *COMPUTER vision , *ARTIFICIAL intelligence - Abstract
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns, including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition; if we use generic visual patterns only, then image parsing corresponds to image segmentation (Tu and Zhu, 2002, IEEE Trans. PAMI, 24(5):657-673). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
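A 1-D toy in the spirit of record 48: a discriminative, bottom-up threshold classifier initialises the labeling, and reversible single-site flips are accepted or rejected so that the generative posterior is the Markov chain's invariant distribution. The paper's richer jumps (split/merge, model switching) and its discriminatively-biased proposals are elided:

```python
import numpy as np

rng = np.random.default_rng(0)
true = np.repeat([0, 1], 25)                  # two ground-truth segments
obs = true + rng.normal(scale=0.6, size=50)   # noisy per-pixel observations

def log_post(lbl):
    """Generative posterior: Gaussian likelihood plus a smoothness prior."""
    lik = -0.5 * np.sum((obs - lbl) ** 2)
    prior = -2.0 * np.sum(lbl[1:] != lbl[:-1])
    return lik + prior

lbl = (obs > 0.5).astype(int)                 # bottom-up (discriminative) init
for _ in range(2000):
    i = rng.integers(50)                      # reversible move: flip one site
    cand = lbl.copy()
    cand[i] ^= 1
    # Metropolis acceptance keeps the generative posterior invariant
    if np.log(rng.random()) < log_post(cand) - log_post(lbl):
        lbl = cand
print("pixel accuracy:", (lbl == true).mean())
```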
49. Multi-hypothesis contextual modeling for semantic segmentation
- Author
-
Hasan Fehmi Ateş and Sercan Sünetci
- Subjects
Computer science, Computer vision, Pattern recognition, Image parsing, Parsing, Segmentation, Superpixel, Pixel, Markov random field (MRF), Scene, Artificial intelligence, Signal processing, Software
Semantic segmentation (i.e., image parsing) aims to annotate each image pixel with its corresponding semantic class label. Spatially consistent labeling of the image requires an accurate description and modeling of the local contextual information. The segmentation result is typically improved by Markov Random Field (MRF) optimization on the initial labels. However, this improvement is limited by the accuracy of the initial result and by how the contextual neighborhood is defined. In this paper, we develop generalized and flexible contextual models for segmentation neighborhoods in order to improve parsing accuracy. Instead of using a fixed segmentation and neighborhood definition, we explore various contextual models for fusion of the complementary information available in alternative segmentations of the same image. In other words, we propose a novel MRF framework that describes and optimizes the contextual dependencies between multiple segmentations. Simulation results on two common datasets demonstrate significant improvement in parsing accuracy over the baseline approaches. (A toy sketch of fusing two segmentation hypotheses in an MRF follows this record.)
- Published
- 2018
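A small sketch of the fusion idea in record 49: unary costs from two alternative segmentations of the same image are combined and then smoothed with a Potts MRF, here solved by a few sweeps of iterated conditional modes (ICM) rather than the paper's optimizer. The label maps and all weights are random toys:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, L = 16, 16, 3
seg_a = rng.integers(0, L, (H, W))    # hypothesis 1 (e.g. fine segmentation)
seg_b = rng.integers(0, L, (H, W))    # hypothesis 2 (e.g. coarse segmentation)

# fused unary cost: a label is cheaper where either hypothesis supports it
unary = np.ones((H, W, L))
for seg in (seg_a, seg_b):
    for l in range(L):
        unary[..., l] -= 0.5 * (seg == l)

lbl = unary.argmin(-1)
beta = 0.3                            # Potts smoothness weight
for _ in range(5):                    # ICM: greedy per-pixel updates
    for y in range(H):
        for x in range(W):
            cost = unary[y, x].copy()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W:
                    cost += beta * (np.arange(L) != lbl[ny, nx])
            lbl[y, x] = cost.argmin()
print(np.bincount(lbl.ravel(), minlength=L))
```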
50. Image Parsing with Superpixels and Gray Image Analysis
- Author
-
Feng Yang, Mei Xie, and Zheng Ma
- Subjects
Color image, Gray image, Computer science, Feature extraction, Pattern recognition, Image segmentation, Image parsing, Superpixels, Artificial intelligence, SIFT Flow
This paper presents an effective system aimed at solving the problem of image parsing. In order to make full use of the information and characteristics in the color input image, a gray image is computed from the color image, and superpixels are obtained for both the gray image and the color one. After feature extraction, the features extracted from the color image and the gray image are fused to construct the final features. Then, likelihood ratio scores between the input image and similar images in the retrieval set are calculated from these final features, and the final labeling result is obtained through MRF analysis. The effectiveness of the proposed system is tested on the public SIFT Flow dataset, and the experimental results show that the proposed method is effective. (A minimal sketch of the color/gray feature fusion follows this record.)
- Published
- 2018
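A minimal sketch of the feature-fusion step in record 50: per-superpixel color features and gray-image features are concatenated into the final descriptor used for retrieval and scoring. The block-grid "superpixels" and plain mean features below are simplifications, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))                    # color input image
gray = img @ np.array([0.299, 0.587, 0.114])     # derived gray image

# toy "superpixels": a 4x4 grid of 8x8 blocks, ids 0..15
sp = (np.arange(32)[:, None] // 8) * 4 + np.arange(32)[None, :] // 8

def sp_stats(values, sp_ids, n_sp):
    """Per-superpixel mean of per-pixel feature vectors (n_pixels, d)."""
    out = np.zeros((n_sp, values.shape[1]))
    for s in range(n_sp):
        out[s] = values[sp_ids == s].mean(0)
    return out

ids = sp.ravel()
color_feat = sp_stats(img.reshape(-1, 3), ids, 16)   # from the color image
gray_feat = sp_stats(gray.reshape(-1, 1), ids, 16)   # from the gray image
final = np.concatenate([color_feat, gray_feat], axis=1)  # fused descriptor
print(final.shape)   # (16, 4): one fused feature vector per superpixel
```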