348 results for "Tat-Jen Cham"
Search Results
102. Video Editing Using Figure Tracking and Image-Based Rendering.
- Author
-
James M. Rehg, Sing Bing Kang, and Tat-Jen Cham
- Published
- 2000
- Full Text
- View/download PDF
103. Near Duplicate Identification With Spatially Aligned Pyramid Matching.
- Author
-
Dong Xu 0001, Tat-Jen Cham, Shuicheng Yan, Lixin Duan, and Shih-Fu Chang
- Published
- 2010
- Full Text
- View/download PDF
104. Face and Human Gait Recognition Using Image-to-Class Distance.
- Author
-
Yi Huang 0014, Dong Xu 0001, and Tat-Jen Cham
- Published
- 2010
- Full Text
- View/download PDF
105. Dynamic Feature Ordering for Efficient Registration.
- Author
-
Tat-Jen Cham and James M. Rehg
- Published
- 1999
- Full Text
- View/download PDF
106. A Dynamic Bayesian Network Approach to Figure Tracking using Learned Dynamic Models.
- Author
-
Vladimir Pavlovic 0001, James M. Rehg, Tat-Jen Cham, and Kevin P. Murphy
- Published
- 1999
- Full Text
- View/download PDF
107. A Multiple Hypothesis Approach to Figure Tracking.
- Author
-
Tat-Jen Cham and James M. Rehg
- Published
- 1999
- Full Text
- View/download PDF
108. A Statistical Framework for Long-Range Feature Matching in Uncalibrated Image Mosaicing.
- Author
-
Tat-Jen Cham and Roberto Cipolla
- Published
- 1998
- Full Text
- View/download PDF
109. Stereo Coupled Active Contours.
- Author
-
Tat-Jen Cham and Roberto Cipolla
- Published
- 1997
- Full Text
- View/download PDF
110. Shadow Elimination and Blinding Light Suppression for Interactive Projected Displays.
- Author
-
Jay Summet, Matthew Flagg, Tat-Jen Cham, James M. Rehg, and Rahul Sukthankar
- Published
- 2007
- Full Text
- View/download PDF
111. Automated B-Spline Curve Representation with MDL-based Active Contours.
- Author
-
Tat-Jen Cham and Roberto Cipolla
- Published
- 1996
- Full Text
- View/download PDF
112. Geometric Saliency of Curve Correspondences and Grouping of Symmetric Contours.
- Author
-
Tat-Jen Cham and Roberto Cipolla
- Published
- 1996
- Full Text
- View/download PDF
113. Skewed Symmetry Detection Through Local Skewed Symmetries.
- Author
-
Tat-Jen Cham and Roberto Cipolla
- Published
- 1994
- Full Text
- View/download PDF
114. A local approach to recovering global skewed symmetry.
- Author
-
Tat-Jen Cham and Roberto Cipolla
- Published
- 1994
- Full Text
- View/download PDF
115. Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes.
- Author
-
Hai Xuan Pham, Chongyu Chen, Luc N. Dao, Vladimir Pavlovic 0001, Jianfei Cai 0001, and Tat-Jen Cham
- Published
- 2015
116. Determination of nuclear position by the arrangement of actin filaments using deep generative networks
- Author
-
Javier Fernández, Chuanxia Zheng, Jyothsna Vasudevan, J. G. Wan, Lim Chwee Teck, and Tat-Jen Cham
- Subjects
Cytoskeleton organization ,business.industry ,Deep learning ,Protein filament ,Cell nucleus ,Mechanobiology ,medicine.anatomical_structure ,medicine ,Artificial intelligence ,Cytoskeleton ,Biological system ,business ,Nucleus ,Actin - Abstract
The cell nucleus is a dynamic structure that changes locales during cellular processes such as proliferation, differentiation, or migration [1], and its mispositioning is a hallmark of several disorders [2,3]. As with most mechanobiological activities of adherent cells, the repositioning and anchoring of the nucleus are presumed to be associated with the organization of the cytoskeleton, the network of protein filaments providing structural integrity to the cells [4]. However, this correlation between cytoskeleton organization and nuclear position has not, to date, been demonstrated, as it would require the parameterization of the extraordinarily intricate cytoskeletal fiber arrangements. Here, we show that this parameterization and demonstration can be achieved outside the limits of human conceptualization, using generative networks and raw microscope images, relying on machine-driven interpretation and selection of parameterizable features. The developed transformer-based architecture was able to generate high-quality, completed images of more than 8,000 cells, using only information on actin filaments, predicting the presence of a nucleus and its exact localization in more than 70 per cent of instances. Our results demonstrate one of the most basic principles of mechanobiology with a remarkable level of significance. They also highlight the role of deep learning as a powerful tool in biology beyond data augmentation and analysis, capable of interpreting—unconstrained by the principles of human reasoning—complex biological systems from qualitative data.
- Published
- 2021
- Full Text
- View/download PDF
117. Detection with multi-exit asymmetric boosting.
- Author
-
Minh-Tri Pham, V-D. D. Hoang, and Tat-Jen Cham
- Published
- 2008
- Full Text
- View/download PDF
118. Near duplicate image identification with Spatially Aligned Pyramid Matching.
- Author
-
Dong Xu 0001, Tat-Jen Cham, Shuicheng Yan, and Shih-Fu Chang
- Published
- 2008
- Full Text
- View/download PDF
119. Online Learning Asymmetric Boosted Classifiers for Object Detection.
- Author
-
Minh-Tri Pham and Tat-Jen Cham
- Published
- 2007
- Full Text
- View/download PDF
120. High Distortion and Non-Structural Image Matching via Feature Co-occurrence.
- Author
-
Xi Chen 0069 and Tat-Jen Cham
- Published
- 2007
- Full Text
- View/download PDF
121. Automated B-Spline Curve Representation Incorporating MDL and Error-Minimizing Control Point Insertion Strategies.
- Author
-
Tat-Jen Cham and Roberto Cipolla
- Published
- 1999
- Full Text
- View/download PDF
122. The Spatially-Correlative Loss for Various Image Translation Tasks
- Author
-
Tat-Jen Cham, Jianfei Cai, Chuanxia Zheng, School of Computer Science and Engineering, and IEEE Conference on Computer Vision and Pattern Recognition
- Subjects
FOS: Computer and information sciences ,Correlative ,Translation Tasks ,Matching (graph theory) ,Computer science ,business.industry ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,Pattern recognition ,Spatially-correlative Loss ,Translation (geometry) ,Domain (software engineering) ,Image (mathematics) ,Visualization ,Consistency (database systems) ,Computer science and engineering::Computing methodologies::Pattern recognition [Engineering] ,Image translation ,Artificial intelligence ,business - Abstract
We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation. Previous methods attempt this by using pixel-level cycle-consistency or feature-level matching losses, but the domain-specific nature of these losses hinders translation across large domain gaps. To address this, we exploit the spatial patterns of self-similarity as a means of defining scene structure. Our spatially-correlative loss is geared towards only capturing spatial relationships within an image rather than domain appearance. We also introduce a new self-supervised learning method to explicitly learn spatially-correlative maps for each specific translation task. We show distinct improvement over baseline models in all three modes of unpaired I2I translation: single-modal, multi-modal, and even single-image translation. This new loss can easily be integrated into existing network architectures and thus allows wide applicability. (14 pages, 12 figures)
- Published
- 2021
- Full Text
- View/download PDF
123. Symmetry detection through local skewed symmetries.
- Author
-
Tat-Jen Cham and Roberto Cipolla
- Published
- 1995
- Full Text
- View/download PDF
124. Near duplicate identification with spatially aligned pyramid matching
- Author
-
Dong Xu, Tat-Jen Cham, Shuicheng Yan, Lixin Duan, and Shih-Fu Chang
- Subjects
Image processing -- Analysis ,Scalability -- Analysis ,Video equipment -- Design and construction ,Business ,Computers ,Electronics ,Electronics and electrical industries - Published
- 2010
125. Face and human gait recognition using image-to-class distance
- Author
-
Yi Huang, Dong Xu, and Tat-Jen Cham
- Subjects
Image processing -- Analysis ,Business ,Computers ,Electronics ,Electronics and electrical industries - Published
- 2010
126. Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition
- Author
-
Jianfei Cai, Duy-Son Dao, Tat-Jen Cham, Chuanxia Zheng, and Guoxian Song
- Subjects
FOS: Computer and information sciences ,Focus (computing) ,Computer science ,business.industry ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Object (computer science) ,Bridge (nautical) ,Domain (software engineering) ,Image (mathematics) ,Artificial Intelligence ,Pattern recognition (psychology) ,Segmentation ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real-world. Concurrently, image completion has aimed to create plausible appearance for the invisible regions, but requires a manual mask as input. In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene. Particularly, we built a system to decompose a scene into individual objects, infer their underlying occlusion relationships, and even automatically learn which parts of the objects are occluded that need to be completed. In order to disentangle the occluded relationships of all objects in a complex scene, we use the fact that the front object without being occluded is easy to be identified, detected, and segmented. Our system interleaves the two tasks of instance segmentation and scene completion through multiple iterations, solving for objects layer-by-layer. We first provide a thorough experiment using a new realistically rendered dataset with ground-truths for all invisible regions. To bridge the domain gap to real imagery where ground-truths are unavailable, we then train another model with the pseudo-ground-truths generated from our trained synthesis model. We demonstrate results on a wide variety of datasets and show significant improvement over the state-of-the-art., Comment: 20 pages, 16 pages
- Published
- 2021
- Full Text
- View/download PDF
127. Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation
- Author
-
Minghui Hu, Yujie Wang, Tat-Jen Cham, Jianfei Yang, and P.N. Suganthan
- Subjects
FOS: Computer and information sciences ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition - Abstract
The integration of Vector Quantised Variational AutoEncoder (VQ-VAE) with autoregressive models as the generation part has yielded high-quality results on image generation. However, the autoregressive models will strictly follow the progressive scanning order during the sampling phase. As a result, existing VQ-series models can hardly escape the trap of lacking global information. Denoising Diffusion Probabilistic Models (DDPM) in the continuous domain have shown a capability to capture the global context, while generating high-quality images. In the discrete state space, some works have demonstrated the potential to perform text generation and low-resolution image generation. We show that with the help of a content-rich discrete visual codebook from VQ-VAE, the discrete diffusion model can also generate high-fidelity images with global context, which compensates for the deficiency of the classical autoregressive model along pixel space. Meanwhile, the integration of the discrete VAE with the diffusion model resolves the drawbacks of conventional autoregressive models being oversized, and of the diffusion model demanding excessive time in the sampling process when generating images. It is found that the quality of the generated images is heavily dependent on the discrete visual codebook. Extensive experiments demonstrate that the proposed Vector Quantised Discrete Diffusion Model (VQ-DDM) is able to achieve comparable performance to top-tier methods with low complexity. It also demonstrates outstanding advantages over other vector-quantised autoregressive models on image inpainting tasks without additional training.
- Published
- 2021
- Full Text
- View/download PDF
128. Shading-Based Surface Detail Recovery Under General Unknown Illumination
- Author
-
Jianfei Cai, Tat-Jen Cham, Juyong Zhang, Jianmin Zheng, Qi Duan, Di Xu, School of Computer Science and Engineering, and Institute for Media Innovation (IMI)
- Subjects
ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,Iterative reconstruction ,Shape from Shading ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics ,Augmented Lagrangian method ,business.industry ,Applied Mathematics ,3D reconstruction ,020207 software engineering ,Vertex (geometry) ,Visual hull ,Photometric stereo ,Computer science and engineering::Computing methodologies::Image processing and computer vision [Engineering] ,Computational Theory and Mathematics ,Piecewise ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,3D Reconstruction ,business ,Software ,Surface reconstruction - Abstract
Reconstructing the shape of a 3D object from multi-view images under unknown, general illumination is a fundamental problem in computer vision. High quality reconstruction is usually challenging especially when fine detail is needed and the albedo of the object is non-uniform. This paper introduces vertex overall illumination vectors to model the illumination effect and presents a total variation (TV) based approach for recovering surface details using shading and multi-view stereo (MVS). Behind the approach are the two important observations: (1) the illumination over the surface of an object often appears to be piecewise smooth and (2) the recovery of surface orientation is not sufficient for reconstructing the surface, which was often overlooked previously. Thus we propose to use TV to regularize the overall illumination vectors and use visual hull to constrain partial vertices. The reconstruction is formulated as a constrained TV-minimization problem that simultaneously treats the shape and illumination vectors as unknowns. An augmented Lagrangian method is proposed to quickly solve the TV-minimization problem. As a result, our approach is robust, stable and is able to efficiently recover high-quality surface details even when starting with a coarse model obtained using MVS. These advantages are demonstrated by extensive experiments on the state-of-the-art MVS database, which includes challenging objects with varying albedo. NRF (Natl Research Foundation, S’pore) MOE (Min. of Education, S’pore)
- Published
- 2018
- Full Text
- View/download PDF
129. Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks
- Author
-
Liuhao Ge, Jianfei Cai, Junsong Yuan, Nadia Magnenat Thalmann, Yujun Cai, Jun Liu, Tat-Jen Cham, School of Computer Science and Engineering, School of Electrical and Electronic Engineering, Interdisciplinary Graduate School (IGS), 2019 IEEE International Conference on Computer Vision (ICCV 19), and Institute for Media Innovation (IMI)
- Subjects
3D Pose Estimation ,business.industry ,Computer science ,Feature extraction ,020206 networking & telecommunications ,Graph theory ,02 engineering and technology ,3D pose estimation ,Machine learning ,computer.software_genre ,Object detection ,Computer science and engineering::Computing methodologies::Image processing and computer vision [Engineering] ,Graph Convolutional Neural Network (GCN) ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Pose - Abstract
Despite great progress in 3D pose estimation from single-view images or videos, it remains a challenging task due to the substantial depth ambiguity and severe selfocclusions. Motivated by the effectiveness of incorporating spatial dependencies and temporal consistencies to alleviate these issues, we propose a novel graph-based method to tackle the problem of 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections. Particularly, domain knowledge about the human hand (body) configurations is explicitly incorporated into the graph convolutional operations to meet the specific demand of the 3D pose estimation. Furthermore, we introduce a local-to-global network architecture, which is capable of learning multi-scale features for the graph-based representations. We evaluate the proposed method on challenging benchmark datasets for both 3D hand pose estimation and 3D body pose estimation. Experimental results show that our method achieves state-of-the-art performance on both tasks. Accepted version
- Published
- 2019
- Full Text
- View/download PDF
130. Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking
- Author
-
Vladimir Pavlovic, Tat-Jen Cham, Lu Sheng, King Ngi Ngan, Jianfei Cai, School of Computer Science and Engineering, and Institute for Media Innovation (IMI)
- Subjects
FOS: Computer and information sciences ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,Point cloud ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer Science - Computer Vision and Pattern Recognition ,02 engineering and technology ,Facial recognition system ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Generative Model ,Computer vision ,Pose ,ComputingMethodologies_COMPUTERGRAPHICS ,Facial expression ,business.industry ,Applied Mathematics ,Visibility (geometry) ,3D Facial Pose Tracking ,Generative model ,Computational Theory and Mathematics ,Computer science and engineering::Computing methodologies::Image processing and computer vision [Engineering] ,Video tracking ,Face (geometry) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software - Abstract
In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in the unconstrained scenarios with heavy occlusions and arbitrary facial expression variations. Specifically, we introduce a statistical 3D morphable model that flexibly describes the distribution of points on the surface of the face model, with an efficient switchable online adaptation that gradually captures the identity of the tracked subject and rapidly constructs a suitable face model when the subject changes. Moreover, unlike prior art that employed ICP-based facial pose estimation, to improve robustness to occlusions, we propose a ray visibility constraint that regularizes the pose based on the face model's visibility with respect to the input point cloud. Ablation studies and experimental results on Biwi and ICT-3DHP datasets demonstrate that the proposed framework is effective and outperforms competing state-of-the-art depth-based methods. NRF (Natl Research Foundation, S’pore) MOE (Min. of Education, S’pore)
- Published
- 2019
131. Video-based Human Action Classification with Ambiguous Correspondences.
- Author
-
Zhou Feng and Tat-Jen Cham
- Published
- 2005
- Full Text
- View/download PDF
132. A Theory for Photometric Self-Calibration of Multiple Overlapping Projectors and Cameras.
- Author
-
Peng Song 0010 and Tat-Jen Cham
- Published
- 2005
- Full Text
- View/download PDF
133. Self-Calibrating Camera Projector Systems for Interactive Displays and Presentations.
- Author
-
Rahul Sukthankar, Tat-Jen Cham, Gita Sukthankar, James M. Rehg, David Hsu, and Thomas Leung
- Published
- 2001
- Full Text
- View/download PDF
134. Shading‐based surface recovery using subdivision‐based representation
- Author
-
Jianfei Cai, Teng Deng, Jianmin Zheng, Tat-Jen Cham, School of Computer Science and Engineering, and Institute for Media Innovation (IMI)
- Subjects
Surface recovery ,Computer science ,business.industry ,Representation (systemics) ,020207 software engineering ,02 engineering and technology ,Scene Analysis ,Computer Graphics and Computer-Aided Design ,Image-based Modelling ,Computer science and engineering::Computing methodologies::Image processing and computer vision [Engineering] ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Computer science and engineering::Computing methodologies::Computer graphics [Engineering] ,Shading ,Artificial intelligence ,business ,Subdivision - Abstract
This paper presents subdivision‐based representations for both lighting and geometry in shape‐from‐shading. A very recent shading‐based method introduced a per‐vertex overall illumination model for surface reconstruction, which has the advantage of conveniently handling complicated lighting conditions and avoiding explicit estimation of visibility and varied albedo. However, due to its discrete nature, the per‐vertex overall illumination requires a large amount of memory and lacks intrinsic coherence. To overcome these problems, in this paper we propose to use classic subdivision to define the basic smooth lighting function and surface, and introduce additional independent variables into the subdivision to adaptively model sharp changes of illumination and geometry. Compared to previous works, the new model not only preserves the merits of the per‐vertex illumination model, but also greatly reduces the number of variables required in surface recovery and intrinsically regularizes the illumination vectors and the surface. These features make the new model very suitable for multi‐view stereo surface reconstruction under general, unknown illumination conditions. Particularly, a variational surface reconstruction method built upon the subdivision representations for lighting and geometry is developed. The experiments on both synthetic and real‐world data sets have demonstrated that the proposed method can achieve memory efficiency and improve surface detail recovery. NRF (Natl Research Foundation, S’pore) MOE (Min. of Education, S’pore)
- Published
- 2019
135. Towards a switchable AR/VR near-eye display with accommodation-vergence and eyeglass prescription support
- Author
-
Andrei State, Xinxing Xia, Y. Q. Guan, Kishore Rathinavel, Tat-Jen Cham, Henry Fuchs, Praneeth Chakravarthula, School of Computer Science and Engineering, and Institute for Media Innovation (IMI)
- Subjects
business.product_category ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Eyeglass prescription ,Holography ,02 engineering and technology ,Vergence ,Virtual reality ,law.invention ,Computer graphics ,law ,0202 electrical engineering, electronic engineering, information engineering ,Computer Graphics ,Image Processing, Computer-Assisted ,Humans ,Computer vision ,Focus (computing) ,Augmented Reality ,business.industry ,Virtual Reality ,020207 software engineering ,Equipment Design ,Computer Graphics and Computer-Aided Design ,Lens (optics) ,Eyeglasses ,Signal Processing ,Computer science and engineering [Engineering] ,Augmented reality ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Accommodation ,Software - Abstract
In this paper, we present our novel design for switchable AR/VR near-eye displays which can help solve the vergence-accommodation-conflict issue. The principal idea is to time-multiplex virtual imagery and real-world imagery and use a tunable lens to adjust focus for the virtual display and the see-through scene separately. With this novel design, prescription eyeglasses for near- and far-sighted users become unnecessary. This is achieved by integrating the wearer's corrective optical prescription into the tunable lens for both virtual display and see-through environment. We built a prototype based on the design, comprised of micro-display, optical systems, a tunable lens, and active shutters. The experimental results confirm that the proposed near-eye display design can switch between AR and VR and can provide correct accommodation for both. NRF (Natl Research Foundation, S’pore) Accepted version
- Published
- 2019
136. SubdSH: Subdivision-based Spherical Harmonics Field for Real-time Shading-based Refinement under Challenging Unknown Illumination
- Author
-
Tat-Jen Cham, Teng Deng, Jianmin Zheng, Jianfei Cai, School of Computer Science and Engineering, 2018 IEEE Visual Communications and Image Processing (VCIP), and Institute for Media Innovation (IMI)
- Subjects
Surface (mathematics) ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Spherical harmonics ,020207 software engineering ,02 engineering and technology ,Function (mathematics) ,Field (computer science) ,Lighting Model ,Computer Science::Graphics ,Quality (physics) ,Photometric stereo ,Shape-from-shading ,0202 electrical engineering, electronic engineering, information engineering ,Computer science and engineering [Engineering] ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Shading ,business ,ComputingMethodologies_COMPUTERGRAPHICS ,Subdivision - Abstract
This paper presents a spatially-varying illumination model for shading-based depth refinement that is based on a smooth Spherical Harmonics (SH) lighting field. The proposed lighting model is able to recover shading under challenging unknown lighting conditions, thus improving the quality of recovered surface detail. To avoid over-parameterization, local lighting coefficients are treated as a vector-valued function which is represented by subdivided surfaces using Catmull-Clark subdivision. We solve our lighting model utilizing a highly parallelized scheme that recovers lighting in a few milliseconds. A real-time shading-based depth recovery system is implemented with the integration of our proposed lighting model. We conduct quantitative and qualitative evaluations on both synthetic and real world datasets under challenging illumination. The experimental results show our method outperforms the state-of-the-art real-time shading-based depth refinement system. NRF (Natl Research Foundation, S’pore) MOE (Min. of Education, S’pore)
- Published
- 2018
- Full Text
- View/download PDF
137. Large-Margin Multi-Modal Deep Learning for RGB-D Object Recognition
- Author
-
Gang Wang, Jiwen Lu, Tat-Jen Cham, Jianfei Cai, and Anran Wang
- Subjects
Modality (human–computer interaction) ,Artificial neural network ,Computer science ,business.industry ,3D single-object recognition ,Deep learning ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Cognitive neuroscience of visual object recognition ,Pattern recognition ,Semi-supervised learning ,Computer Science Applications ,Discriminative model ,Feature (computer vision) ,Signal Processing ,Media Technology ,Feature (machine learning) ,RGB color model ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Feature learning - Abstract
Most existing feature learning-based methods for RGB-D object recognition either combine RGB and depth data in an undifferentiated manner from the outset, or learn features from color and depth separately, which do not adequately exploit different characteristics of the two modalities or utilize the shared relationship between the modalities. In this paper, we propose a general CNN-based multi-modal learning framework for RGB-D object recognition. We first construct deep CNN layers for color and depth separately, which are then connected with a carefully designed multi-modal layer. This layer is designed to not only discover the most discriminative features for each modality, but is also able to harness the complementary relationship between the two modalities. The results of the multi-modal layer are back-propagated to update parameters of the CNN layers, and the multi-modal feature learning and the back-propagation are iteratively performed until convergence. Experimental results on two widely used RGB-D object datasets show that our method for general multi-modal learning achieves comparable performance to state-of-the-art methods specifically designed for RGB-D data.
- Published
- 2015
- Full Text
- View/download PDF
138. Real-Time and Temporal-Coherent Foreground Extraction With Commodity RGBD Camera
- Author
-
Jianfei Cai, Chi-Wing Fu, Mengyao Zhao, and Tat-Jen Cham
- Subjects
Computer science ,business.industry ,Flicker ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Human motion ,Frame rate ,CUDA ,Kernel (image processing) ,Signal Processing ,Computer vision ,Extraction methods ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Spatial analysis ,Image object - Abstract
Foreground extraction from video stream is an important component in many multimedia applications. By exploiting commodity RGBD cameras, we could further extract dynamic foreground objects with 3D information in real-time, thereby enabling new forms of multimedia applications such as 3D telepresence. However, one critical problem with existing methods for real-time foreground extraction is temporal coherency. They could exhibit severe flickering results for foreground objects such as human motion, thus affecting the visual quality as well as the image object analysis in the multimedia applications. This paper presents a new GPU-based real-time foreground extraction method with several novel techniques. First, we detect shadow and fill missing depth data accordingly in RGBD video, and then adaptively combine color and depth masks to form a trimap. After that, we formulate a novel closed-form matting model to improve the temporal coherency in foreground extraction while achieving real-time performance. Particularly, we propagate RGBD data across temporal domain to improve the visual coherence in the foreground object extraction, and take advantage of various CUDA strategies and spatial data structures to improve the speed. Experiments with a number of users on different scenarios show that, compared with state-of-the-art methods, our method can extract stabler foreground objects with higher visual quality as well as better temporal coherency, while still achieving real-time performance (experimentally, 30.3 frames per second on average).
- Published
- 2015
- Full Text
- View/download PDF
139. Structure-aware multimodal feature fusion for RGB-D scene classification and beyond
- Author
-
Jiwen Lu, Jianfei Cai, Tat-Jen Cham, Anran Wang, School of Computer Science and Engineering, and Institute for Media Innovation (IMI)
- Subjects
Computer Networks and Communications ,Computer science ,Multimodal Analytics ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Regularization (mathematics) ,Convolutional neural network ,Discriminative model ,Component (UML) ,0202 electrical engineering, electronic engineering, information engineering ,Representation (mathematics) ,0105 earth and related environmental sciences ,Modalities ,business.industry ,Cognitive neuroscience of visual object recognition ,Pattern recognition ,Feature Fusion ,ComputingMethodologies_PATTERNRECOGNITION ,Computer science and engineering::Computing methodologies::Image processing and computer vision [Engineering] ,Hardware and Architecture ,RGB color model ,020201 artificial intelligence & image processing ,Artificial intelligence ,business - Abstract
While convolutional neural networks (CNNs) have been excellent for object recognition, the greater spatial variability in scene images typically means that the standard full-image CNN features are suboptimal for scene classification. In this article, we investigate a framework allowing greater spatial flexibility, in which the Fisher vector (FV)-encoded distribution of local CNN features, obtained from a multitude of region proposals per image, is considered instead. The CNN features are computed from an augmented pixel-wise representation consisting of multiple modalities of RGB, HHA, and surface normals, as extracted from RGB-D data. More significantly, we make two postulates: (1) component sparsity—that only a small variety of region proposals and their corresponding FV GMM components contribute to scene discriminability, and (2) modal nonsparsity—that features from all modalities are encouraged to coexist. In our proposed feature fusion framework, these are implemented through regularization terms that apply group lasso to GMM components and exclusive group lasso across modalities. By learning and combining regressors for both proposal-based FV features and global CNN features, we are able to achieve state-of-the-art scene classification performance on the SUNRGBD Dataset and NYU Depth Dataset V2. Moreover, we further apply our feature fusion framework on an action recognition task to demonstrate that our framework can be generalized for other multimodal well-structured features. In particular, for action recognition, we enforce interpart sparsity to choose more discriminative body parts, and intermodal nonsparsity to make informative features from both appearance and motion modalities coexist. Experimental results on the JHMDB and MPII Cooking Datasets show that our feature fusion is also very effective for action recognition, achieving very competitive performance compared with the state of the art.
- Published
- 2018
140. A Generative Model for Depth-Based Robust 3D Facial Pose Tracking
- Author
-
King Ngi Ngan, Jianfei Cai, Vladimir Pavlovic, Lu Sheng, Tat-Jen Cham, School of Computer Science and Engineering, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), and Institute for Media Innovation (IMI)
- Subjects
Facial expression ,Computer science ,business.industry ,Computer Vision ,Visibility (geometry) ,Point cloud ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Face Recognition ,Generative model ,Discriminative model ,Robustness (computer science) ,Face (geometry) ,0202 electrical engineering, electronic engineering, information engineering ,Computer science and engineering [Engineering] ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,Pose - Abstract
We consider the problem of depth-based robust 3D facial pose tracking under unconstrained scenarios with heavy occlusions and arbitrary facial expression variations. Unlike the previous depth-based discriminative or data-driven methods that require sophisticated training or manual intervention, we propose a generative framework that unifies pose tracking and face model adaptation on-the-fly. Particularly, we propose a statistical 3D face model that owns the flexibility to generate and predict the distribution and uncertainty underlying the face model. Moreover, unlike prior art employing ICP-based facial pose estimation, we propose a ray visibility constraint that regularizes the pose based on the face model's visibility against the input point cloud, which augments the robustness against occlusions. The experimental results on Biwi and ICT-3DHP datasets reveal that the proposed framework is effective and outperforms the state-of-the-art depth-based methods. NRF (Natl Research Foundation, S’pore) MOE (Min. of Education, S’pore) Accepted version
- Published
- 2017
- Full Text
- View/download PDF
141. FaceCollage: a rapidly deployable system for real-time head reconstruction for on-the-go 3D telepresence
- Author
-
Jianfei Cai, Chi-Wing Fu, Fuwen Tan, Teng Deng, Tat-Jen Cham, School of Computer Science and Engineering, 25th ACM international conference on Multimedia, and Institute for Media Innovation (IMI)
- Subjects
3D Telepresence ,Computer science ,business.industry ,Robustness (computer science) ,RGBD Sensors ,0202 electrical engineering, electronic engineering, information engineering ,Computer science and engineering [Engineering] ,020207 software engineering ,020201 artificial intelligence & image processing ,Computer vision ,02 engineering and technology ,Artificial intelligence ,business - Abstract
This paper presents FaceCollage, a robust and real-time system for head reconstruction that can be used to create easy-to-deploy telepresence systems, using a pair of consumer-grade RGBD cameras that provide a wide range of views of the reconstructed user. A key feature is that the system is very simple to rapidly deploy, with autonomous calibration and requiring minimal intervention from the user, other than casually placing the cameras. This system is realized through three technical contributions: (1) a fully automatic calibration method, which analyzes and correlates the left and right RGBD faces just by the face features; (2) an implementation that exploits the parallel computation capability of GPU throughout most of the system pipeline, in order to attain real-time performance; and (3) a complete integrated system on which we conducted various experiments to demonstrate its capability, robustness, and performance, including testing the system on twelve participants with visually-pleasing results. NRF (Natl Research Foundation, S’pore)
- Published
- 2017
142. Multiple consumer-grade depth camera registration using everyday objects
- Author
-
Jianfei Cai, Jianmin Zheng, Teng Deng, Tat-Jen Cham, School of Computer Science and Engineering, and Institute for Media Innovation (IMI)
- Subjects
Computer science ,Calibration (statistics) ,business.industry ,media_common.quotation_subject ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Depth Camera Calibration ,020207 software engineering ,02 engineering and technology ,Depth Camera ,3d shapes ,Extrinsic calibration ,Task (project management) ,Computer science and engineering::Computing methodologies::Image processing and computer vision [Engineering] ,Camera auto-calibration ,Computer graphics (images) ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Quality (business) ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,media_common - Abstract
The registration of multiple consumer-grade depth sensors is a challenging task due to noisy and systematic distortions in depth measurements. Most of the existing works heavily rely on large number of checkerboard observations for calibration and registration of multiple depth cameras, which is tedious and not flexible. In this paper, we propose a more practical method for conducting and maintaining registration of multi-depth sensors, via replacing checkerboards with everyday objects found in the scene, such as regular furniture. Particularly, high quality pre-scanned 3D shapes of standard furniture are used as calibration targets. We propose a unified framework that jointly computes the optimal extrinsic calibration and depth correction parameters. Experimental results show that our proposed method significantly outperforms state-of-the-art depth camera registration methods. NRF (Natl Research Foundation, S’pore) MOE (Min. of Education, S’pore)
- Published
- 2017
143. Robust real-time performance-driven 3D face tracking
- Author
-
Jianfei Cai, Hai Xuan Pham, Vladimir Pavlovic, and Tat-Jen Cham
- Subjects
business.industry ,Computer science ,Facial motion capture ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Point cloud ,020207 software engineering ,Tracking system ,02 engineering and technology ,Tracking (particle physics) ,Face (geometry) ,0202 electrical engineering, electronic engineering, information engineering ,RGB color model ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business - Abstract
We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user. In particular, we emphasize improving the tracking performance in instances where the tracked subject is at a large distance from the cameras, and the quality of point cloud deteriorates severely. This is accomplished by the combination of a flexible 3D shape regressor and the joint 2D+3D optimization on shape parameters. Our approach fits facial blendshapes to the point cloud of the human head, while being driven by an efficient and rapid 3D shape regressor trained on generic RGB datasets. As an on-line tracking system, the identity of the unknown user is adapted on-the-fly, resulting in improved 3D model reconstruction and consequently better tracking performance. The result is a robust RGBD face tracker capable of handling a wide range of target scene depths, whose performance is demonstrated in our extensive experiments to be better than that of the state of the art.
- Published
- 2016
- Full Text
- View/download PDF
144. 3D faces are recognized more accurately and faster than 2D faces, but with similar inversion effects
- Author
-
Hong Xu, Yu Guo, Z.H.D. Eng, Shen-Hsing Annabel Chen, M. Reiner, Tat-Jen Cham, Y.Y. Yick, School of Computer Science and Engineering, School of Social Sciences, Lee Kong Chian School of Medicine (LKCMedicine), Centre for Research and Development in Learning (CRADLE), and Institute for Media Innovation (IMI)
- Subjects
Male ,Stereoscopy ,Facial recognition system ,Face Recognition ,050105 experimental psychology ,law.invention ,03 medical and health sciences ,Young Adult ,0302 clinical medicine ,Imaging, Three-Dimensional ,law ,Three-dimensional face recognition ,Humans ,0501 psychology and cognitive sciences ,Computer vision ,Face detection ,Communication ,business.industry ,05 social sciences ,Inversion (meteorology) ,Sensory Systems ,Ophthalmology ,Pattern Recognition, Visual ,Face ,Holistic Processing ,Female ,Artificial intelligence ,business ,Psychology ,Facial Recognition ,030217 neurology & neurosurgery ,Psychology::Consciousness and cognition [Social sciences] - Abstract
Recognition of faces typically occurs via holistic processing where individual features are combined to provide an overall facial representation. However, when faces are inverted, there is greater reliance on featural processing where faces are recognized based on their individual features. These findings are based on a substantial number of studies using 2-dimensional (2D) faces and it is unknown whether these results can be extended to 3-dimensional (3D) faces, which have more depth information that is absent in the typical 2D stimuli used in face recognition literature. The current study used the face inversion paradigm as a means to investigate how holistic and featural processing are differentially influenced by 2D and 3D faces. Twenty-five participants completed a delayed face-matching task consisting of upright and inverted faces that were presented as both 2D and 3D stereoscopic images. Recognition accuracy was significantly higher for 3D upright faces compared to 2D upright faces, providing support that the enriched visual information in 3D stereoscopic images facilitates holistic processing that is essential for the recognition of upright faces. Typical face inversion effects were also obtained, regardless of whether the faces were presented in 2D or 3D. Moreover, recognition performances for 2D inverted and 3D inverted faces did not differ. Taken together, these results demonstrated that 3D stereoscopic effects influence face recognition during holistic processing but not during featural processing. Our findings therefore provide a novel perspective that furthers our understanding of face recognition mechanisms, shedding light on how the integration of stereoscopic information in 3D faces influences face recognition processes. NRF (Natl Research Foundation, S’pore)
- Published
- 2016
145. MMSS: Multi-modal Sharable and Specific Feature Learning for RGB-D Object Recognition
- Author
-
Jianfei Cai, Anran Wang, Tat-Jen Cham, and Jiwen Lu
- Subjects
Computer science ,business.industry ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Cognitive neuroscience of visual object recognition ,Pattern recognition ,Modal ,ComputerApplications_MISCELLANEOUS ,RGB color model ,Computer vision ,Artificial intelligence ,business ,Feature learning ,Sparse matrix - Abstract
Most of the feature-learning methods for RGB-D object recognition either learn features from color and depth modalities separately, or simply treat RGB-D as undifferentiated four-channel data, which cannot adequately exploit the relationship between different modalities. Motivated by the intuition that different modalities should contain not only some modal-specific patterns but also some shared common patterns, we propose a multi-modal feature learning framework for RGB-D object recognition. We first construct deep CNN layers for color and depth separately, and then connect them with our carefully designed multi-modal layers, which fuse color and depth information by enforcing a common part to be shared by features of different modalities. In this way, we obtain features reflecting shared properties as well as modal-specific properties in different modalities. The information of the multi-modal learning frameworks is back-propagated to the early CNN layers. Experimental results show that our proposed multi-modal feature learning method outperforms state-of-the-art approaches on two widely used RGB-D object benchmark datasets.
- Published
- 2015
- Full Text
- View/download PDF
146. A unified feature selection framework for graph embedding on high dimensional data
- Author
-
Marcus Y. Chen, Tat-Jen Cham, Mingkui Tan, and Ivor W. Tsang
- Subjects
Clustering high-dimensional data ,Quadratically constrained quadratic program ,Graph embedding ,business.industry ,Computer science ,Pattern recognition ,Feature selection ,Data structure ,Computer Science Applications ,Data modeling ,Graph bandwidth ,Cardinality ,Computational Theory and Mathematics ,Feature (computer vision) ,Principal component analysis ,Artificial intelligence ,business ,Algorithm ,Sparse matrix ,Information Systems - Abstract
© 2014 IEEE. Although graph embedding has been a powerful tool for modeling data intrinsic structures, simply employing all features for data structure discovery may result in noise amplification. This is particularly severe for high dimensional data with small samples. To meet this challenge, this paper proposes a novel efficient framework to perform feature selection for graph embedding, in which a category of graph embedding methods is cast as a least squares regression problem. In this framework, a binary feature selector is introduced to naturally handle the feature cardinality in the least squares formulation. The resultant integral programming problem is then relaxed into a convex Quadratically Constrained Quadratic Program (QCQP) learning problem, which can be efficiently solved via a sequence of accelerated proximal gradient (APG) methods. Since each APG optimization is w.r.t. only a subset of features, the proposed method is fast and memory efficient. The proposed framework is applied to several graph embedding learning problems, including supervised, unsupervised, and semi-supervised graph embedding. Experimental results on several high dimensional data demonstrated that the proposed method outperformed the considered state-of-the-art methods.
- Published
- 2015
147. Objects co-segmentation: Propagated from simpler images
- Author
-
Tat-Jen Cham, Marcus Y. Chen, Ivor W. Tsang, Santiago Velasco-Forero, Tohoku University, WPI AIMR, Sendai, Tohoku University [Sendai], Centre de Morphologie Mathématique (CMM), MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Alcatel-Lucent Bell - Belgium (A-LBELL), Alcatel-Lucent, School of Computer Engineering [Singapore] (NTU), and School of Computer Engineering, Nanyang Technological University
- Subjects
Computer science ,business.industry ,Segmentation-based object categorization ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Scale-space segmentation ,Image processing ,Pattern recognition ,difficult images ,Image segmentation ,image ranking ,segmentation propagation ,Minimum spanning tree-based segmentation ,Image texture ,Region growing ,Segmentation ,Computer vision ,Artificial intelligence ,Range segmentation ,business ,Co-segmentation ,Connected-component labeling ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing - Abstract
Recent works on image co-segmentation aim to segment common objects among image sets. These methods can co-segment simple images well, but their performance may degrade significantly on more cluttered images. In order to co-segment both simple and complex images well, this paper proposes a novel paradigm to rank images and to propagate the segmentation results from the simple images to more and more complex ones. In the experiments, the proposed paradigm demonstrates its effectiveness in segmenting large image sets with a wide variety in object appearance, sizes, orientations, poses, and multiple objects in one image. It outperforms the current state-of-the-art algorithms significantly, especially in difficult images.
- Published
- 2015
- Full Text
- View/download PDF
148. Semantic and Spatial Content Fusion for Scene Recognition
- Author
-
Elahe Farahzadeh, Wanqing Li, and Tat-Jen Cham
- Subjects
Computer science ,business.industry ,Feature vector ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Latent Dirichlet allocation ,Field (geography) ,symbols.namesake ,Discriminative model ,Map ,Feature (computer vision) ,symbols ,Embedding ,Artificial intelligence ,business ,Spatial analysis - Abstract
In the field of scene recognition, it is usually insufficient to use only one visual feature regardless of how discriminative the feature is. Therefore, the spatial location and semantic relationships of local features need to be captured together with the scene contextual information. In this paper we propose a novel framework to project image contextual feature space with semantic space of local features into a map function. This embedding is performed based on a subset of training images denoted as an exemplar-set. This exemplar-set is composed of images that better describe the scene category's attributes than the other images. The proposed framework learns a weighted combination of local semantic topics as well as global and spatial information, where the weights represent the features' contributions in each scene category. An empirical study was performed on two of the most challenging scene datasets, 15-Scene Categories and 67-Indoor Scenes, and promising results of 89.47 and 45.0 were achieved, respectively.
- Published
- 2015
- Full Text
- View/download PDF
149. Estimating spatial layout of rooms from RGB-D videos
- Author
-
Jianfei Cai, Gang Wang, Jiwen Lu, Tat-Jen Cham, and Anran Wang
- Subjects
Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Trajectory ,RGB color model ,Contextual information ,Clutter ,Robotics ,Computer vision ,Artificial intelligence ,Simultaneous localization and mapping ,business ,Single frame - Abstract
Spatial layout estimation of indoor rooms plays an important role in many visual analysis applications such as robotics and human-computer interaction. While many methods have been proposed for recovering spatial layout of rooms in recent years, their performance is still far from satisfactory due to high occlusion caused by the presence of objects that clutter the scene. In this paper, we propose a new approach to estimate the spatial layout of rooms from RGB-D videos. Unlike most existing methods which estimate the layout from still images, RGB-D videos provide more spatial-temporal and depth information, which are helpful to improve the estimation performance because more contextual information can be exploited in RGB-D videos. Given a RGB-D video, we first estimate the spatial layout of the scene in each single frame and compute the camera trajectory using the simultaneous localization and mapping (SLAM) algorithm. Then, the estimated spatial layouts of different frames are integrated to infer temporally consistent layouts of the room throughout the whole video. Our method is evaluated on the NYU RGB-D dataset, and the experimental results show the efficacy of the proposed approach.
- Published
- 2014
- Full Text
- View/download PDF
150. Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo
- Author
-
Tat-Jen Cham, Jianfei Cai, Qi Duan, Jianmin Zheng, Di Xu, and Juyong Zhang
- Subjects
Surface (mathematics) ,business.industry ,Augmented Lagrangian method ,Orientation (computer vision) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Object (computer science) ,Visual hull ,Pattern recognition (psychology) ,Piecewise ,Computer vision ,Shading ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
Reconstructing the shape of a 3D object from multi-view images under unknown, general illumination is a fundamental problem in computer vision and high quality reconstruction is usually challenging especially when high detail is needed. This paper presents a total variation (TV) based approach for recovering surface details using shading and multi-view stereo (MVS). Behind the approach are our two important observations: (1) the illumination over the surface of an object tends to be piecewise smooth and (2) the recovery of surface orientation is not sufficient for reconstructing geometry, which were previously overlooked. Thus we introduce TV to regularize the lighting and use visual hull to constrain partial vertices. The reconstruction is formulated as a constrained TV-minimization problem that treats the shape and lighting as unknowns simultaneously. An augmented Lagrangian method is proposed to quickly solve the TV-minimization problem. As a result, our approach is robust, stable and is able to efficiently recover high-quality surface details even starting with a coarse MVS. These advantages are demonstrated by the experiments with synthetic and real world examples.
- Published
- 2014
- Full Text
- View/download PDF