Author: "Jimei Yang" / Topic: 0202 electrical engineering, electronic engineering, information engineering - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Jimei Yang"' showing total 32 results

1. 3D Ken Burns effect from a single image

Author: Feng Liu, Simon Niklaus, Jimei Yang, and Long Mai
Subjects: FOS: Computer and information sciences, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Point cloud, 020207 software engineering, 02 engineering and technology, Viewpoints, Computer Graphics and Computer-Aided Design, Graphics (cs.GR), Rendering (computer graphics), View synthesis, Computer Science - Graphics, 0202 electrical engineering, electronic engineering, information engineering, Segmentation, Computer vision, Artificial intelligence, Zoom, business, Parallax
Abstract: The Ken Burns effect allows animating still images with a virtual camera scan and zoom. Adding parallax, which results in the 3D Ken Burns effect, enables significantly more compelling results. Creating such effects manually is time-consuming and demands sophisticated editing skills. Existing automatic methods, however, require multiple input images from varying viewpoints. In this paper, we introduce a framework that synthesizes the 3D Ken Burns effect from a single image, supporting both a fully automatic mode and an interactive mode with the user controlling the camera. Our framework first leverages a depth prediction pipeline, which estimates scene depth that is suitable for view synthesis tasks. To address the limitations of existing depth estimation methods such as geometric distortions, semantic distortions, and inaccurate depth boundaries, we develop a semantic-aware neural network for depth prediction, couple its estimate with a segmentation-based depth adjustment process, and employ a refinement neural network that facilitates accurate depth predictions at object boundaries. According to this depth estimate, our framework then maps the input image to a point cloud and synthesizes the resulting video frames by rendering the point cloud from the corresponding camera positions. To address disocclusions while maintaining geometrically and temporally coherent synthesis results, we utilize context-aware color- and depth-inpainting to fill in the missing information in the extreme views of the camera path, thus extending the scene geometry of the point cloud. Experiments with a wide variety of image content show that our method enables realistic synthesis results. Our study demonstrates that our system allows users to achieve better results while requiring little effort compared to existing solutions for the 3D Ken Burns effect creation. Comment: TOG 2019, http://sniklaus.com/kenburns
Published: 2019
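
Illustrative note: the abstract's step of mapping the input image to a point cloud according to the depth estimate can be pictured as a pinhole-camera unprojection. The sketch below is only an assumption about how such a mapping might look; the intrinsics (fx, fy, cx, cy) and the function name unproject are hypothetical and are not taken from the paper.

```python
def unproject(depth, fx, fy, cx, cy):
    """Map a depth image (list of rows) to 3D points under an assumed pinhole model.

    depth[v][u] is the depth at pixel column u, row v.
    fx, fy, cx, cy are hypothetical camera intrinsics (focal lengths, principal point).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Scale the pixel offset from the principal point by depth / focal length.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# A flat 2x2 depth map two units away from the camera.
cloud = unproject([[2.0, 2.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Rendering the resulting points from a shifted camera position is what produces parallax, and the gaps that open up are the disocclusions the abstract mentions.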

2. Attribute-Conditioned Layout GAN for Automatic Graphic Design

Author: Tingfa Xu, Jianan Li, Chang Liu, Jianming Zhang, Christina Wang, and Jimei Yang
Subjects: FOS: Computer and information sciences, Engineering drawing, Forcing (recursion theory), business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, Graphic design, Computer Graphics and Computer-Aided Design, Aspect ratio (image), Graphics (cs.GR), Computer Science - Graphics, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, Task analysis, Computer Vision and Pattern Recognition, Element (category theory), business, Software, Dropout (neural networks), Generator (mathematics)
Abstract: Modeling layout is an important first step for graphic design. Recently, methods for generating graphic layouts have progressed, particularly with Generative Adversarial Networks (GANs). However, the problem of specifying the locations and sizes of design elements usually involves constraints with respect to element attributes, such as area, aspect ratio and reading-order. Automating attribute conditional graphic layouts remains a complex and unsolved problem. In this article, we introduce Attribute-conditioned Layout GAN to incorporate the attributes of design elements for graphic layout generation by forcing both the generator and the discriminator to meet attribute conditions. Due to the complexity of graphic designs, we further propose an element dropout method to make the discriminator look at partial lists of elements and learn their local patterns. In addition, we introduce various loss designs following different design principles for layout optimization. We demonstrate that the proposed method can synthesize graphic layouts conditioned on different element attributes. It can also adjust well-designed layouts to new sizes while retaining elements’ original reading-orders. The effectiveness of our method is validated through a user study.
Published: 2020

3. Reducing Footskate in Human Motion Reconstruction with Ground Contact Constraints

Author: Federico Perazzi, Jia-Bin Huang, Duygu Ceylan, Jianming Zhang, Yuliang Zou, and Jimei Yang
Subjects: 0209 industrial biotechnology, Monocular, Artificial neural network, business.industry, Computer science, Constraint (computer-aided design), Work (physics), Detector, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Motion (physics), 020901 industrial engineering & automation, Human dynamics, 0202 electrical engineering, electronic engineering, information engineering, RGB color model, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: In this paper, we aim to reduce the footskate artifacts when reconstructing human dynamics from monocular RGB videos. Recent work has made substantial progress in improving the temporal smoothness of the reconstructed motion trajectories. Their results, however, still suffer from severe foot skating and slippage artifacts. To tackle this issue, we present a neural network based detector for localizing ground contact events of human feet and use it to impose a physical constraint for optimization of the whole human dynamics in a video. We present a detailed study on the proposed ground contact detector and demonstrate high-quality human motion reconstruction results in various videos.
Published: 2020

4. LayoutGAN: Synthesizing Graphic Layouts With Vector-Wireframe Adversarial Networks

Author: Jianan Li, Jimei Yang, Aaron Hertzmann, Jianming Zhang, and Tingfa Xu
Subjects: Computer science, business.industry, Applied Mathematics, 02 engineering and technology, Graphic design, Document layout, Rendering (computer graphics), Visualization, Computational Theory and Mathematics, Artificial Intelligence, Computer graphics (images), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Layout is important for graphic design and scene generation. We propose a novel Generative Adversarial Network, called LayoutGAN, that synthesizes layouts by modeling geometric relations of different types of 2D elements. The generator of LayoutGAN takes as input a set of randomly-placed 2D graphic elements, represented by vectors and uses self-attention modules to refine their labels and geometric parameters jointly to produce a realistic layout. Accurate alignment is critical for good layouts. We, thus, propose a novel differentiable wireframe rendering layer that maps the generated layout to a wireframe image, upon which a CNN-based discriminator is used to optimize the layouts in image space. We validate the effectiveness of LayoutGAN in various experiments including MNIST digit generation, document layout generation, clipart abstract scene generation, tangram graphic design, mobile app layout design, and webpage layout optimization from hand-drawn sketches.
Published: 2020

5. AIM 2020 Challenge on Image Extreme Inpainting

Author: Soikat Hasan Ahmed, Chao Li, Xinbo Gao, Haoning Wu, A. N. Rajagopalan, Mengmeng Bai, Murari Mandal, Chu Tak Li, Cai Yiyang, Andrés Romero, Jimei Yang, Taeoh Kim, Shilei Wen, Pranjal Singh Chauhan, Maitreya Suin, Eli Shechtman, Fu Li, Jianming Zhang, Hae Woong Jang, Evangelos Ntavelis, Pratik Narang, Daniel P. K. Lun, Zhi-Song Liu, Li-Wen Wang, Zheng Hui, Haopeng Ni, Chenghua Li, Dongliang He, Yu Zeng, Hanbin Son, Yong Ju Jung, Yu Han, Sangyoun Lee, Jungmin Yoon, Uddin S.M. Nadim, Zhe Lin, Siavash Arjomand Bigdeli, Kuldeep Purohit, Xiumei Wang, Wan-Chi Siu, Errui Ding, Shuchen Li, Dejia Xu, Chajin Shin, Huchuan Lu, Radu Timofte, and Weijian Zeng
Subjects: Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Inpainting, 020207 software engineering, 02 engineering and technology, GeneralLiterature_MISCELLANEOUS, Generative modeling, Image synthesis, Image (mathematics), Set (abstract data type), 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Computer vision, Segmentation, Artificial intelligence, business
Abstract: This paper reviews the AIM 2020 challenge on extreme image inpainting. This report focuses on proposed solutions and results for two different tracks on extreme image inpainting: classical image inpainting and semantically guided image inpainting. The goal of track 1 is to inpaint a large part of the image with no supervision. Similarly, the goal of track 2 is to inpaint the image by having access to the entire semantic segmentation map of the input. The challenge had 88 and 74 participants, respectively. 11 and 6 teams competed in the final phase of the challenge, respectively. This report gauges current solutions and sets a benchmark for future extreme image inpainting methods.
Published: 2020

6. High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling

Author: Jimei Yang, Eli Shechtman, Yu Zeng, Zhe Lin, Huchuan Lu, and Jianming Zhang
Subjects: Pixel, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Process (computing), Inpainting, 02 engineering and technology, 010501 environmental sciences, Object (computer science), 01 natural sciences, Image (mathematics), Upsampling, Generative model, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, 0105 earth and related environmental sciences
Abstract: Existing image inpainting methods often produce artifacts when dealing with large holes in real applications. To address this challenge, we propose an iterative inpainting method with a feedback mechanism. Specifically, we introduce a deep generative model which not only outputs an inpainting result but also a corresponding confidence map. Using this map as feedback, it progressively fills the hole by trusting only high-confidence pixels inside the hole at each iteration and focuses on the remaining pixels in the next iteration. As it reuses partial predictions from the previous iterations as known pixels, this process gradually improves the result. In addition, we propose a guided upsampling network to enable generation of high-resolution inpainting results. We achieve this by extending the Contextual Attention module [39] to borrow high-resolution feature patches in the input image. Furthermore, to mimic real object removal scenarios, we collect a large object mask dataset and synthesize more realistic training data that better simulates user inputs. Experiments show that our method significantly outperforms existing methods in both quantitative and qualitative evaluations. More results and Web APP are available at https://zengxianyu.github.io/iic.
Published: 2020
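
Illustrative note: the confidence-feedback loop described in the abstract (trust only high-confidence pixels in each iteration, then treat them as known in the next) can be sketched on a 1D toy signal. Everything here is an assumption for illustration: iterative_fill, neighbor_mean, and the threshold are hypothetical, and the toy predictor stands in for the paper's deep generative model.

```python
def iterative_fill(pixels, hole, predict, threshold=0.5, max_iters=10):
    """Fill hole indices progressively, accepting only confident predictions."""
    pixels = list(pixels)
    hole = set(hole)
    for _ in range(max_iters):
        if not hole:
            break
        values, confidence = predict(pixels, hole)
        # Keep only predictions whose confidence clears the threshold.
        trusted = {i for i in hole if confidence[i] >= threshold}
        if not trusted:
            break  # nothing confident enough; leave the rest unfilled
        for i in trusted:
            pixels[i] = values[i]
        hole -= trusted  # accepted pixels count as known from now on
    return pixels


def neighbor_mean(pixels, hole):
    """Toy predictor: average the known neighbors; confident only if one exists."""
    values, confidence = {}, {}
    for i in hole:
        known = [pixels[j] for j in (i - 1, i + 1)
                 if 0 <= j < len(pixels) and j not in hole]
        values[i] = sum(known) / len(known) if known else 0.0
        confidence[i] = 1.0 if known else 0.0
    return values, confidence


filled = iterative_fill([1.0, None, None, None, 5.0], {1, 2, 3}, neighbor_mean)
```

In this toy run the two border pixels are filled first, and the center pixel is filled in the second pass from those newly accepted values.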

7. Contact and Human Dynamics from Monocular Video

Author: Aaron Hertzmann, Leonidas J. Guibas, Davis Rempe, Jimei Yang, Ruben Villegas, and Bryan Russell
Subjects: business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Process (computing), 020207 software engineering, 02 engineering and technology, Kinematics, Trajectory optimization, Motion capture, Human dynamics, 0202 electrical engineering, electronic engineering, information engineering, Character animation, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Pose, Computer animation, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors that violate physical constraints, such as feet penetrating the ground and bodies leaning at extreme angles. In this paper, we present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input. We first estimate ground contact timings with a novel prediction network which is trained without hand-labeled data. A physics-based trajectory optimization then solves for a physically-plausible motion, based on the inputs. We show this process produces motions that are significantly more realistic than those from purely kinematic methods, substantially improving quantitative measures of both kinematic and dynamic plausibility. We demonstrate our method on character animation and pose estimation tasks on dynamic motions of dancing and sports with complex contact patterns.
Published: 2020

8. Data Fusion in Forecasting Medical Demands based on Spectrum of Post-Earthquake Diseases

Author: Jiaqi Fang, Zhuming Bi, Shilan Dai, Jimei Yang, Lu Han, Hanping Hou, and Dongzhen Jin
Subjects: 0209 industrial biotechnology, Information Systems and Management, Operations research, Computer science, Reliability (computer networking), 02 engineering and technology, Filter (signal processing), Sensor fusion, Emergency rescue, Industrial and Manufacturing Engineering, Weighting, 020901 industrial engineering & automation, Resource (project management), 0202 electrical engineering, electronic engineering, information engineering, Damages, 020201 artificial intelligence & image processing, Natural disaster
Abstract: Industry 4.0 makes it possible to develop smart emergency rescue systems in natural disasters. One of the most critical challenges is forecasting the demands of resources for appropriate resource allocations based on data from multiple sources with different levels of reliability. This paper deals with the challenge of data fusion and processing in forecasting resource demands for emergency responses to patients with various disease types. After an earthquake, the data on injuries, damages, and medical demands are characterized as diversified, unorganized, distributed, dynamic, and chaotic. Therefore, how to collect, filter, fuse, and mine data is most critical to forecast and allocate resources, especially for some emergent sources such as drugs for injuries and illnesses in post-earthquakes. To determine general patterns of outbreak diseases and corresponding medical needs, multi-source data is fused and processed to determine a reliable and accurate spectrum of post-earthquake diseases. The entropy-based weighting technology is adopted to determine the reliability and accuracy of data; the fused data is further processed to estimate the numbers of injuries, classify disease types, and finally predict the demands of medical supplies over time. In emergency rescues, medical resources are allocated and dispatched based on estimated numbers, types, and locations of patients. The effectiveness of the proposed method is verified and validated in simulation.
Published: 2021
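
Illustrative note: the abstract names entropy-based weighting but does not give a formula. The sketch below shows the standard entropy weight method as one plausible reading; the function name entropy_weights and the sample matrix are hypothetical, and the paper's actual formulation may differ.

```python
import math


def entropy_weights(matrix):
    """Standard entropy weight method (an assumption, not the paper's exact scheme).

    matrix[i][j] is the positive value of indicator j in observation i.
    Requires at least two rows. Columns with lower entropy (more variation
    across observations) receive larger weights.
    """
    n = len(matrix)
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [x / total for x in column]
        # Shannon entropy normalized to [0, 1] by dividing by ln(n).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    if s == 0:
        # Every column is uniform, so fall back to equal weights.
        return [1.0 / m] * m
    return [d / s for d in divergences]


# Column 0 is constant across observations; column 1 varies.
weights = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
```

The weights sum to one, and the constant column contributes almost nothing because it does not help distinguish between observations.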

9. Top-Down Visual Saliency via Joint CRF and Dictionary Learning

Author: Jimei Yang and Ming-Hsuan Yang
Subjects: Conditional random field, Computer science, Speech recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Inference, Visual dictionary, Latent variable, 02 engineering and technology, Discriminative model, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, computer.programming_language, Context model, business.industry, Applied Mathematics, Cognitive neuroscience of visual object recognition, 020207 software engineering, Pattern recognition, Pascal (programming language), Top-down and bottom-up design, Object detection, Visualization, ComputingMethodologies_PATTERNRECOGNITION, Stochastic gradient descent, Computational Theory and Mathematics, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Neural coding, computer, Software
Abstract: Top-down visual saliency is an important module of visual attention. In this work, we propose a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a visual dictionary. The proposed model incorporates a layered structure from top to bottom: CRF, sparse coding and image patches. With sparse coding as an intermediate layer, CRF is learned in a feature-adaptive manner; meanwhile with CRF as the output layer, the dictionary is learned under structured supervision. For efficient and effective joint learning, we develop a max-margin approach via a stochastic gradient descent algorithm. Experimental results on the Graz-02 and PASCAL VOC datasets show that our model performs favorably against state-of-the-art top-down saliency methods for target object localization. In addition, the dictionary update significantly improves the performance of our model. We demonstrate the merits of the proposed top-down saliency model by applying it to prioritizing object proposals for detection and predicting human fixations.
Published: 2017

10. On the Continuity of Rotation Representations in Neural Networks

Author: Jingwan Lu, Hao Li, Jimei Yang, Yi Zhou, and Connelly Barnes
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Similarity (geometry), business.industry, Computer science, Deep learning, Machine Learning (stat.ML), 020207 software engineering, 02 engineering and technology, Homeomorphism, Machine Learning (cs.LG), Euler angles, Algebra, symbols.namesake, Statistics - Machine Learning, Euclidean geometry, 0202 electrical engineering, electronic engineering, information engineering, symbols, 020201 artificial intelligence & image processing, Orthogonal group, Artificial intelligence, business, Quaternion, Rotation (mathematics), Rotation group SO
Abstract: In neural networks, it is often desirable to work with various representations of the same space. For example, 3D rotations can be represented with quaternions or Euler angles. In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. We relate this to topological concepts such as homeomorphism and embedding. We then investigate what are continuous and discontinuous representations for 2D, 3D, and n-dimensional rotations. We demonstrate that for 3D rotations, all representations are discontinuous in the real Euclidean spaces of four or fewer dimensions. Thus, widely used representations such as quaternions and Euler angles are discontinuous and difficult for neural networks to learn. We show that the 3D rotations have continuous representations in 5D and 6D, which are more suitable for learning. We also present continuous representations for the general case of the n-dimensional rotation group SO(n). While our main focus is on rotations, we also show that our constructions apply to other groups such as the orthogonal group and similarity transforms. We finally present empirical results, which show that our continuous rotation representations outperform discontinuous ones for several practical problems in graphics and vision, including a simple autoencoder sanity test, a rotation estimator for 3D point clouds, and an inverse kinematics solver for 3D human poses.
Published: 2019
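
Illustrative note: the abstract states that 3D rotations admit a continuous 6D representation without spelling out the construction. A commonly cited form of it orthonormalizes two 3D vectors by Gram-Schmidt and completes the basis with a cross product. The sketch below follows that form; the function names are hypothetical and the code is not taken from the authors.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rotation_from_6d(a1, a2):
    """Map two 3D vectors (six numbers in total) to an orthonormal right-handed basis.

    Returns [b1, b2, b3], the three columns of a rotation matrix.
    Assumes a1 and a2 are nonzero and not parallel.
    """
    b1 = normalize(a1)
    # Remove the component of a2 along b1, then normalize (Gram-Schmidt step).
    d = sum(x * y for x, y in zip(b1, a2))
    b2 = normalize([a2[i] - d * b1[i] for i in range(3)])
    # The third axis is fixed by the first two.
    b3 = cross(b1, b2)
    return [b1, b2, b3]


basis = rotation_from_6d([2.0, 0.0, 0.0], [1.0, 3.0, 0.0])
```

Because any pair of non-parallel vectors yields a valid rotation, a network can output six unconstrained numbers and still be mapped onto the rotation group.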

11. Foreground-Aware Image Inpainting

Author: Zhe Lin, Wei Xiong, Jiahui Yu, Jimei Yang, Jiebo Luo, Connelly Barnes, and Xin Lu
Subjects: FOS: Computer and information sciences, Structure (mathematical logic), Pixel, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Inpainting, 020206 networking & telecommunications, 02 engineering and technology, Object (computer science), Image (mathematics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business
Abstract: Existing image inpainting methods typically fill holes by borrowing information from surrounding pixels. They often produce unsatisfactory results when the holes overlap with or touch foreground objects due to lack of information about the actual extent of foreground and background regions within the holes. These scenarios, however, are very important in practice, especially for applications such as the removal of distracting objects. To address the problem, we propose a foreground-aware image inpainting system that explicitly disentangles structure inference and content completion. Specifically, our model learns to predict the foreground contour first, and then inpaints the missing region using the predicted contour as guidance. We show that by such disentanglement, the contour completion model predicts reasonable contours of objects, and further substantially improves the performance of image inpainting. Experiments show that our method significantly outperforms existing methods and achieves superior inpainting results on challenging cases with complex compositions. Comment: Camera Ready version of CVPR 2019 with supplementary materials
Published: 2019

12. Multimodal Style Transfer via Graph Cuts

Author: Zhaowen Wang, Yun Fu, Zhe Lin, Yilin Wang, Yulun Zhang, Chen Fang, and Jimei Yang
Subjects: FOS: Computer and information sciences, Stylized fact, Pixel, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, Pattern recognition, 02 engineering and technology, Cut, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: An assumption widely used in recent neural style transfer methods is that image styles can be described by global statistics of deep features like Gram or covariance matrices. Alternative approaches have represented styles by decomposing them into local pixel or neural patches. Despite the recent progress, most existing methods treat the semantic patterns of the style image uniformly, resulting in unpleasing results on complex styles. In this paper, we introduce a more flexible and general universal style transfer technique: multimodal style transfer (MST). MST explicitly considers the matching of semantic patterns in content and style images. Specifically, the style image features are clustered into sub-style components, which are matched with local content features under a graph cut formulation. A reconstruction network is trained to transfer each sub-style and render the final stylized result. We also generalize MST to improve some existing methods. Extensive experiments demonstrate the superior effectiveness, robustness, and flexibility of MST. Comment: Accepted to ICCV 2019. Typos in Eqs. (11) and (12) have been fixed in arXiv V2 and this version (V6). Code: https://github.com/yulunzhang/MST
Published: 2019

13. FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images

Author: Christian Zimmermann, Bryan Russell, Thomas Brox, Max Argus, Jimei Yang, and Duygu Ceylan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Generalization, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, Pattern recognition, 02 engineering and technology, Sample (graphics), Machine Learning (cs.LG), Set (abstract data type), Computer Science - Robotics, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), RGB color model, 020201 artificial intelligence & image processing, Artificial intelligence, business, Robotics (cs.RO)
Abstract: Estimating 3D hand pose from single RGB images is a highly ambiguous problem that relies on an unbiased training dataset. In this paper, we analyze cross-dataset generalization when training on existing datasets. We find that approaches perform well on the datasets they are trained on, but do not generalize to other datasets or in-the-wild scenarios. As a consequence, we introduce the first large-scale, multi-view hand dataset that is accompanied by both 3D hand pose and shape annotations. For annotating this real-world dataset, we propose an iterative, semi-automated 'human-in-the-loop' approach, which includes hand fitting optimization to infer both the 3D pose and shape for each sample. We show that methods trained on our dataset consistently perform well when tested on other datasets. Moreover, the dataset allows us to train a network that predicts the full articulated hand shape from a single RGB image. The evaluation set can serve as a benchmark for articulated hand shape estimation. Comment: Accepted to ICCV 2019, Project page: https://lmb.informatik.uni-freiburg.de/projects/freihand/
Published: 2019

14. Brush stroke synthesis with a generative adversarial network driven by physically based simulation

Author: Zhaowen Wang, Rundong Wu, Zhili Chen, Jimei Yang, and Steve Marschner
Subjects: Fluid simulation, Painting, Artificial neural network, Computer science, Oil painting, media_common.quotation_subject, Real-time computing, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, 0202 electrical engineering, electronic engineering, information engineering, Trajectory, Quality (business), Generative adversarial network, Brush stroke, ComputingMethodologies_COMPUTERGRAPHICS, 0105 earth and related environmental sciences, media_common
Abstract: We introduce a novel approach that uses a generative adversarial network (GAN) to synthesize realistic oil painting brush strokes, where the network is trained with data generated by a high-fidelity simulator. Among approaches to digitally synthesizing natural media painting strokes, physically based simulation produces by far the most realistic visual results and allows the most intuitive control of stroke variations. However, accurate physics simulations are known to be computationally expensive and often cannot meet the performance requirements of painting applications. In our work, we propose to replace the expensive fluid simulation with a neural network. The network takes the existing canvas and a new stroke trajectory as input and produces the height and color of the new stroke as output. We train the network with a dataset generated with a high quality offline simulator. The network is able to produce visual quality comparable to the offline simulator with better performance than the existing real-time oil painting simulator. Finally, we implement a real-time painting system using the trained network.
Published: 2018

15. Neural Kinematic Networks for Unsupervised Motion Retargetting

Author: Honglak Lee, Duygu Ceylan, Jimei Yang, and Ruben Villegas
Subjects: FOS: Computer and information sciences, Forward kinematics, Monocular, Inverse kinematics, Artificial neural network, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, 02 engineering and technology, Kinematics, Animation, Motion (physics), Recurrent neural network, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: We propose a recurrent neural network architecture with a Forward Kinematics layer and cycle consistency based adversarial training objective for unsupervised motion retargetting. Our network captures the high-level properties of an input motion by the forward kinematics layer, and adapts them to a target character with different skeleton bone lengths (e.g., shorter, longer arms etc.). Collecting paired motion training sequences from different characters is expensive. Instead, our network utilizes cycle consistency to learn to solve the Inverse Kinematics problem in an unsupervised manner. Our method works online, i.e., it adapts the motion sequence on-the-fly as new frames are received. In our experiments, we use the Mixamo animation data to test our method for a variety of motions and characters and achieve state-of-the-art results. We also demonstrate motion retargetting from monocular human videos to 3D characters using an off-the-shelf 3D pose estimator. Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Published: 2018
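
Illustrative note: a forward kinematics step turns joint rotations and bone lengths into joint positions, which is why the same rotations can be replayed on a skeleton with different bone lengths. The sketch below is a planar simplification made for illustration only; the paper's layer operates on 3D skeletons, and the function name and chain layout here are hypothetical.

```python
import math


def forward_kinematics(bone_lengths, joint_angles):
    """Planar forward kinematics for a single chain rooted at the origin.

    joint_angles[i] is the rotation (radians) of bone i relative to its parent.
    Returns the joint positions from the root to the end of the chain.
    """
    x = y = 0.0
    heading = 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle  # accumulate rotations along the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


angles = [0.0, math.pi / 2]
source_pose = forward_kinematics([1.0, 2.0], angles)
# Same joint angles on a character whose bones are half as long.
target_pose = forward_kinematics([0.5, 1.0], angles)
```

Keeping the angles and swapping the bone lengths changes the joint positions while preserving the pose, which is the basic idea behind adapting a motion to a different skeleton.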

16. MAttNet: Modular Attention Network for Referring Expression Comprehension

Author: Xin Lu, Jimei Yang, Zhe Lin, Tamara L. Berg, Licheng Yu, Mohit Bansal, and Xiaohui Shen
Subjects: FOS: Computer and information sciences, Focus (computing), Computer Science - Computation and Language, Referring expression, Phrase, Computer Science - Artificial Intelligence, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, 010501 environmental sciences, Modular design, 01 natural sciences, Comprehension, Artificial Intelligence (cs.AI), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Computation and Language (cs.CL), Word (computer architecture), Natural language, 0105 earth and related environmental sciences
Abstract: In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided. Comment: Equation of word attention fixed; MAttNet+Grabcut results added
Published: 2018

17. PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image

Author: Yasutaka Furukawa, Ersin Yumer, Duygu Ceylan, Chen Liu, and Jimei Yang
Subjects: FOS: Computer and information sciences, Plane (geometry), business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, Image segmentation, Iterative reconstruction, Set (abstract data type), Planar, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Representation (mathematics), business
Abstract: This paper proposes a deep neural network (DNN) for piece-wise planar depthmap reconstruction from a single RGB image. While DNNs have brought remarkable progress to single-image depth prediction, piece-wise planar depthmap reconstruction requires a structured geometry representation, and has been a difficult task to master even for DNNs. The proposed end-to-end DNN learns to directly infer a set of plane parameters and corresponding plane segmentation masks from a single RGB image. We have generated more than 50,000 piece-wise planar depthmaps for training and testing from ScanNet, a large-scale RGBD video database. Our qualitative and quantitative evaluations demonstrate that the proposed approach outperforms baseline methods in terms of both plane segmentation and depth estimation accuracy. To the best of our knowledge, this paper presents the first end-to-end neural architecture for piece-wise planar reconstruction from a single RGB image. Code and data are available at https://github.com/art-programmer/PlaneNet., CVPR 2018
Published: 2018

18. Generative Image Inpainting with Contextual Attention

Author: Xin Lu, Xiaohui Shen, Jiahui Yu, Jimei Yang, Zhe Lin, and Thomas S. Huang
Subjects: FOS: Computer and information sciences, Contextual image classification, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Deep learning, Inpainting, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, Pattern recognition, 02 engineering and technology, Image segmentation, Convolutional neural network, Graphics (cs.GR), Generative model, Computer Science - Graphics, Image texture, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Image restoration
Abstract: Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to ineffectiveness of convolutional neural networks in explicitly borrowing or copying information from distant spatial locations. On the other hand, traditional texture and patch synthesis approaches are particularly suitable when it needs to borrow textures from the surrounding regions. Motivated by these observations, we propose a new deep generative model-based approach which can not only synthesize novel image structures but also explicitly utilize surrounding image features as references during network training to make better predictions. The model is a feed-forward, fully convolutional neural network which can process images with multiple holes at arbitrary locations and with variable sizes during the test time. Experiments on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and natural images (ImageNet, Places2) demonstrate that our proposed approach generates higher-quality inpainting results than existing ones. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting., Accepted in CVPR 2018; add CelebA-HQ results; open sourced; interactive demo available: http://jhyu.me/demo
Published: 2018

19. Flow-Grounded Spatial-Temporal Video Prediction from Still Images

Author: Xin Lu, Ming-Hsuan Yang, Yijun Li, Jimei Yang, Chen Fang, and Zhaowen Wang
Subjects: Pixel, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Manifold, Motion (physics), Image (mathematics), Task (computing), Flow (mathematics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Focus (optics), 0105 earth and related environmental sciences
Abstract: Existing video prediction methods mainly rely on observing multiple historical frames or focus on predicting the next one-frame. In this work, we study the problem of generating consecutive multiple future frames by observing one single still image only. We formulate the multi-frame prediction task as a multiple time step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase. The multi-flow prediction is modeled in a variational probabilistic manner with spatial-temporal relationships learned through 3D convolutions. The flow-to-frame synthesis is modeled as a generative process in order to keep the predicted results lying closer to the manifold shape of real video sequence. Such a two-phase design prevents the model from directly looking at the high-dimensional pixel space of the frame sequence and is demonstrated to be more effective in predicting better and diverse results. Extensive experimental results on videos with different types of motion show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity and human perceptual evaluation.
Published: 2018

20. Video Scene Parsing with Predictive Feature Learning

Author: Luoqi Liu, Xiaojie Jin, Jiashi Feng, Shuicheng Yan, Xiaohui Shen, Yunpeng Chen, Zhe Lin, Jimei Yang, Zequn Jie, Xin Li, Jian Dong, and Huaxin Xiao
Subjects: FOS: Computer and information sciences, Predictive learning, Context model, Training set, Parsing, Process (engineering), Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Frame (networking), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Feature learning, 0105 earth and related environmental sciences
Abstract: In this work, we address the challenging video scene parsing problem by developing effective representation learning methods given limited parsing annotations. In particular, we contribute two novel methods that constitute a unified parsing framework. (1) Predictive feature learning from nearly unlimited unlabeled video data. Different from existing methods learning features from single frame parsing, we learn spatiotemporal discriminative features by enforcing a parsing network to predict future frames and their parsing maps (if available) given only historical frames. In this way, the network can effectively learn to capture video dynamics and temporal context, which are critical clues for video scene parsing, without requiring extra manual annotations. (2) Prediction steering parsing architecture that effectively adapts the learned spatiotemporal features to scene parsing tasks and provides strong guidance for any off-the-shelf parsing model to achieve better video scene parsing performance. Extensive experiments over two challenging datasets, Cityscapes and CamVid, have demonstrated the effectiveness of our methods by showing significant improvement over well-established baselines., Comment: 15 pages, 7 figures, 5 tables, currently v2
Published: 2017

21. FoveaNet: Perspective-Aware Urban Scene Parsing

Author: Zequn Jie, Jiashi Feng, Jimei Yang, Wei Wang, Changsong Liu, Xiaohui Shen, Zhe Lin, Qiang Chen, Shuicheng Yan, and Xin Li
Subjects: FOS: Computer and information sciences, Parsing, Property (programming), business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Perspective (graphical), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 010501 environmental sciences, Space (commercial competition), Semantics, Object (computer science), computer.software_genre, 01 natural sciences, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, CRFS, business, computer, 0105 earth and related environmental sciences
Abstract: Parsing urban scene images benefits many applications, especially self-driving. Most of the current solutions employ generic image parsing models that treat all scales and locations in the images equally and do not consider the geometry property of car-captured urban scene images. Thus, they suffer from heterogeneous object scales caused by perspective projection of cameras on actual scenes and inevitably encounter parsing failures on distant objects as well as other boundary and recognition errors. In this work, we propose a new FoveaNet model to fully exploit the perspective geometry of scene images and address the common failures of generic parsing models. FoveaNet estimates the perspective geometry of a scene image through a convolutional network which integrates supportive evidence from contextual objects within the image. Based on the perspective geometry information, FoveaNet “undoes” the camera perspective projection, analyzing regions in the space of the actual scene, and thus provides much more reliable parsing results. Furthermore, to effectively address the recognition errors, FoveaNet introduces a new dense CRFs model that takes the perspective geometry as a prior potential. We evaluate FoveaNet on two urban scene parsing datasets, Cityscapes and CamVid, which demonstrates that FoveaNet can outperform all the well-established baselines and provide new state-of-the-art performance.
Published: 2017

22. 3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks

Author: Duygu Ceylan, Chuhang Zou, Ersin Yumer, Jimei Yang, and Derek Hoiem
Subjects: FOS: Computer and information sciences, Computer science, Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Gaussian, Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML), 02 engineering and technology, Machine Learning (cs.LG), Set (abstract data type), symbols.namesake, Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, Representation (mathematics), Gaussian process, business.industry, 020207 software engineering, Pattern recognition, Object (computer science), Visualization, Generative model, Computer Science - Learning, Artificial Intelligence (cs.AI), Recurrent neural network, symbols, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: The success of various applications including robotics, digital content creation, and visualization demands a structured and abstract representation of the 3D world from limited sensor data. Inspired by the nature of human perception of 3D shapes as a collection of simple parts, we explore such an abstract shape representation based on primitives. Given a single depth image of an object, we present 3D-PRNN, a generative recurrent neural network that synthesizes multiple plausible shapes composed of a set of primitives. Our generative model encodes symmetry characteristics of common man-made objects, preserves long-range structural coherence, and describes objects of varying complexity with a compact representation. We also propose a method based on Gaussian Fields to generate a large scale dataset of primitive-based shape representations to train our network. We evaluate our approach on a wide range of examples and show that it outperforms nearest-neighbor based shape retrieval methods and is on par with voxel-based generative models while using a significantly reduced parameter space., ICCV 2017
Published: 2017

23. Material Editing Using a Physically Based Rendering Network

Author: Ersin Yumer, Guilin Liu, Jimei Yang, Duygu Ceylan, and Jyh-Ming Lien
Subjects: FOS: Computer and information sciences, Image formation, Material editing, Network architecture, business.industry, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, Rendering (computer graphics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Differentiable function, Specular reflection, Single image, Physically based rendering, business
Abstract: The ability to edit materials of objects in images is desirable to many content creators. However, this is an extremely challenging task as it requires disentangling intrinsic physical properties of an image. We propose an end-to-end network architecture that replicates the forward image formation process to accomplish this task. Specifically, given a single image, the network first predicts intrinsic properties, i.e. shape, illumination, and material, which are then provided to a rendering layer. This layer performs in-network image synthesis, thereby enabling the network to understand the physics behind the image formation process. The proposed rendering layer is fully differentiable, supports both diffuse and specular materials, and thus can be applicable in a variety of problem settings. We demonstrate a rich set of visually plausible material editing examples and provide an extensive comparative study., 14 pages, ICCV 2017
Published: 2017

24. Diversified Texture Synthesis with Feed-Forward Networks

Author: Yijun Li, Xin Lu, Ming-Hsuan Yang, Zhaowen Wang, Chen Fang, and Jimei Yang
Subjects: FOS: Computer and information sciences, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, Machine learning, computer.software_genre, Texture (geology), Visualization, Discriminative model, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Interpolation, Texture synthesis
Abstract: Recent progress in deep discriminative and generative modeling has shown promising results on texture synthesis. However, existing feed-forward based methods trade off generality for efficiency and suffer from many issues, such as shortage of generality (i.e., build one network per texture), lack of diversity (i.e., always produce visually identical output) and suboptimality (i.e., generate less satisfying visual effects). In this work, we focus on solving these issues for improved texture synthesis. We propose a deep generative feed-forward network which enables efficient synthesis of multiple textures within one single network and meaningful interpolation between them. Meanwhile, a suite of important techniques is introduced to achieve better convergence and diversity. With extensive experiments, we demonstrate the effectiveness of the proposed model and techniques for synthesizing a large number of textures and show its application to stylization., Comment: accepted by CVPR2017
Published: 2017

25. Transformation-Grounded Image Generation Network for Novel 3D View Synthesis

Author: Eunbyung Park, Alexander C. Berg, Duygu Ceylan, Ersin Yumer, and Jimei Yang
Subjects: FOS: Computer and information sciences, Pixel, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, Iterative reconstruction, Solid modeling, View synthesis, Reduction (complexity), Range (mathematics), Transformation (function), Hallucinating, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Mathematics
Abstract: We present a transformation-grounded image generation network for novel 3D view synthesis from a single image. Instead of taking a 'blank slate' approach, we first explicitly infer the parts of the geometry visible both in the input and novel views and then re-cast the remaining synthesis problem as image completion. Specifically, we predict both a flow to move the pixels from the input to the novel view and a novel visibility map that helps deal with occlusion/disocclusion. Next, conditioned on those intermediate results, we hallucinate (infer) parts of the object invisible in the input image. In addition to the new network structure, training with a combination of adversarial and perceptual loss results in a reduction in common artifacts of novel view synthesis such as distortions and holes, while successfully generating high frequency details and preserving visual aspects of the input image. We evaluate our approach on a wide range of synthetic and real examples. Both qualitative and quantitative results show our method achieves significantly better results compared to existing methods., To appear in CVPR 2017
Published: 2017

26. Forecasting Human Dynamics from Static Images

Author: Brian Price, Jia Deng, Jimei Yang, Scott Cohen, and Yu-Wei Chao
Subjects: FOS: Computer and information sciences, Sequence, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, 010501 environmental sciences, 3D pose estimation, Machine learning, computer.software_genre, 01 natural sciences, Motion capture, Image (mathematics), Human dynamics, 0202 electrical engineering, electronic engineering, information engineering, Leverage (statistics), 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, computer, Pose, 0105 earth and related environmental sciences
Abstract: This paper presents the first study on forecasting human dynamics from static images. The problem is to input a single RGB image and generate a sequence of upcoming human body poses in 3D. To address the problem, we propose the 3D Pose Forecasting Network (3D-PFNet). Our 3D-PFNet integrates recent advances on single-image human pose estimation and sequence prediction, and converts the 2D predictions into 3D space. We train our 3D-PFNet using a three-step training strategy to leverage a diverse source of training data, including image and video based human pose datasets and 3D motion capture (MoCap) data. We demonstrate competitive performance of our 3D-PFNet on 2D pose forecasting and 3D pose recovery through quantitative and qualitative results., Comment: Accepted in CVPR 2017
Published: 2017

27. Deep GrabCut for Object Selection

Author: Brian Price, Scott Cohen, Jimei Yang, Thomas S. Huang, and Ning Xu
Subjects: Conditional random field, FOS: Computer and information sciences, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, Pattern recognition, 02 engineering and technology, Object (computer science), Constraint (information theory), GrabCut, Minimum bounding box, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Segmentation, Rectangle, Artificial intelligence, business
Abstract: Most previous bounding-box-based segmentation methods assume the bounding box tightly covers the object of interest. However, it is common that a rectangle input could be too large or too small. In this paper, we propose a novel segmentation approach that uses a rectangle as a soft constraint by transforming it into a Euclidean distance map. A convolutional encoder-decoder network is trained end-to-end by concatenating images with these distance maps as inputs and predicting the object masks as outputs. Our approach gets accurate segmentation results given sloppy rectangles while being general for both interactive segmentation and instance segmentation. We show our network extends to curve-based input without retraining. We further apply our network to instance-level semantic segmentation and resolve any overlap using a conditional random field. Experiments on benchmark datasets demonstrate the effectiveness of the proposed approaches., Comment: BMVC 2017
Published: 2017

28. Generative Face Completion

Author: Ming-Hsuan Yang, Sifei Liu, Yijun Li, and Jimei Yang
Subjects: FOS: Computer and information sciences, Parsing, Artificial neural network, Pixel, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, Pattern recognition, 02 engineering and technology, Iterative reconstruction, computer.software_genre, Task (project management), Generative model, Consistency (database systems), Face (geometry), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, computer, Mathematics
Abstract: In this paper, we propose an effective face completion algorithm using a deep generative model. Different from well-studied background completion, the face completion task is more challenging as it often requires generating semantically new pixels for the missing key components (e.g., eyes and mouths) that contain large appearance variations. Unlike existing nonparametric algorithms that search for patches to synthesize, our algorithm directly generates contents for missing regions based on a neural network. The model is trained with a combination of a reconstruction loss, two adversarial losses and a semantic parsing loss, which ensures pixel faithfulness and local-global contents consistency. With extensive experimental results, we demonstrate qualitatively and quantitatively that our model is able to deal with a large area of missing pixels in arbitrary shapes and generate realistic face completion results., Comment: Accepted by CVPR 2017
Published: 2017

29. Object Contour Detection with a Fully Convolutional Encoder-Decoder Network

Author: Scott Cohen, Jimei Yang, Ming-Hsuan Yang, Brian Price, and Honglak Lee
Subjects: FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Edge detection, Machine Learning (cs.LG), Object-class detection, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, computer.programming_language, Ground truth, business.industry, Deep learning, 020207 software engineering, Pattern recognition, Pascal (programming language), Image segmentation, Computer Science - Learning, 020201 artificial intelligence & image processing, Viola–Jones object detection framework, Artificial intelligence, business, computer, Decoding methods
Abstract: We develop a deep learning algorithm for contour detection with a fully convolutional encoder-decoder network. Different from previous low-level edge detection, our algorithm focuses on detecting higher-level object contours. Our network is trained end-to-end on PASCAL VOC with refined ground truth from inaccurate polygon annotations, yielding much higher precision in object contour detection than previous methods. We find that the learned model generalizes well to unseen object classes from the same super-categories on MS COCO and can match state-of-the-art edge detection on BSDS500 with fine-tuning. By combining with the multiscale combinatorial grouping algorithm, our method can generate high-quality segmented object proposals, which significantly advance the state-of-the-art on PASCAL VOC (improving average recall from 0.62 to 0.67) with a relatively small number of candidates (~1660 per image)., Accepted by CVPR2016 as spotlight
Published: 2016

30. Deep Interactive Object Selection

Author: Scott Cohen, Brian Price, Jimei Yang, Ning Xu, and Thomas S. Huang
Subjects: FOS: Computer and information sciences, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, Pattern recognition, 02 engineering and technology, Pascal (programming language), Image segmentation, Machine learning, computer.software_genre, Euclidean distance, Cut, 0202 electrical engineering, electronic engineering, information engineering, RGB color model, 020201 artificial intelligence & image processing, Segmentation, Artificial intelligence, business, computer, computer.programming_language
Abstract: Interactive object selection is a very important research problem and has many applications. Previous algorithms require substantial user interactions to estimate the foreground and background distributions. In this paper, we present a novel deep learning based algorithm which has a much better understanding of objectness and thus can reduce user interactions to just a few clicks. Our algorithm transforms user provided positive and negative clicks into two Euclidean distance maps which are then concatenated with the RGB channels of images to compose (image, user interactions) pairs. We generate many such pairs by combining several random sampling strategies to model user click patterns and use them to fine-tune deep Fully Convolutional Networks (FCNs). Finally, the output probability maps of our FCN-8s model are integrated with graph cut optimization to refine the boundary segments. Our model is trained on the PASCAL segmentation dataset and evaluated on other datasets with different object classes. Experimental results on both seen and unseen objects clearly demonstrate that our algorithm has a good generalization ability and is superior to all existing interactive object selection approaches., Computer Vision and Pattern Recognition
Published: 2016

31. Attribute2Image: Conditional Image Generation from Visual Attributes

Author: Xinchen Yan, Kihyuk Sohn, Honglak Lee, and Jimei Yang
Subjects: Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Inference, 020207 software engineering, Pattern recognition, 02 engineering and technology, Latent variable, Convolutional neural network, Image (mathematics), Generative model, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Generative grammar
Abstract: This paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder. We experiment with natural images of faces and birds and demonstrate that the proposed models are capable of generating realistic and diverse samples with disentangled latent representations. We use a general energy minimization algorithm for posterior inference of latent variables given novel images. Therefore, the learned generative models show excellent quantitative and visual results in the tasks of attribute-conditioned image reconstruction and completion.
Published: 2016

32. Object tracking via dual linear structured SVM and explicit feature map

Author: Ming-Hsuan Yang, Shaojie Jiang, Lei Zhang, Jifeng Ning, and Jimei Yang
Subjects: Structured support vector machine, Robustness (computer science), Computer science, business.industry, Video tracking, 0202 electrical engineering, electronic engineering, information engineering, 020207 software engineering, 020201 artificial intelligence & image processing, Pattern recognition, 02 engineering and technology, Artificial intelligence, business, Classifier (UML)
Abstract: Structured support vector machine (SSVM) based methods have demonstrated encouraging performance in recent object tracking benchmarks. However, the complex and expensive optimization limits their deployment in real-world applications. In this paper, we present a simple yet efficient dual linear SSVM (DLSSVM) algorithm to enable fast learning and execution during tracking. By analyzing the dual variables, we propose a primal classifier update formula where the learning step size is computed in closed form. This online learning method significantly improves the robustness of the proposed linear SSVM with lower computational cost. Second, we approximate the intersection kernel for feature representations with an explicit feature map to further improve tracking performance. Finally, we extend the proposed DLSSVM tracker with multi-scale estimation to address the "drift" problem. Experimental results on large benchmark datasets with 50 and 100 video sequences show that the proposed DLSSVM tracking algorithm achieves state-of-the-art performance.
