11 results on "Huang, Thomas"
Search Results
2. Locating Nose-Tips and Estimating Head Poses in Images by Tensorposes.
- Author
- Jilin Tu, Yun Fu, and Huang, Thomas S.
- Subjects
- *ESTIMATION theory, *DATABASES, *IMAGING systems, *IMAGE processing, *SIMULATION methods & models, *MODELS & modelmaking, *INVARIANT subspaces
- Abstract
This paper introduces a head pose estimation system that automatically localizes the nose-tips of faces and simultaneously estimates head poses in images. In the training stage, the nose-tips of the faces are first manually labeled. The appearance variations caused by head pose changes are then characterized by a Tensorposes model. Given an image with unknown head pose and nose-tip location, the nose-tip of the face is automatically localized in a coarse-to-fine fashion after skin color segmentation, and the head pose is estimated simultaneously. The performance of our system is evaluated on the Pointing'04 head pose image data set. We first evaluate the classification performance of the Tensorposes models with image patches of the faces cropped according to the manually labeled nose-tip locations in the Pointing'04 data set. Using a leave-one-person-out evaluation strategy, we obtain the optimal parameters of the Tensorposes model and evaluate the discriminative power of Tensorposes models built on high-order singular value decomposition (HOSVD) and multilinear independent component analysis (MICA), as well as naive principal component analysis (PCA) subspace models. It is shown that the Tensorposes models based on HOSVD and MICA decomposition perform similarly well, and both much better than the naive PCA subspace models. The Tensorposes model is then utilized to automatically localize the nose-tip location in a testing image and to simultaneously estimate the head pose. The nose-tip localization and pose estimation accuracy of the proposed system are evaluated against the ground truth. Finally, a cross-database evaluation of the performance of our system is carried out on the Pointing'04 database, a selected subset of the CMU PIE database, and pictures from the CLEAR'07 head pose evaluation database. The experiments show that our system generalizes reasonably well to real-world scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2009
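The abstract above builds head pose classification on a higher-order SVD of a face-patch tensor. As a rough illustration only (not the authors' Tensorposes implementation: the toy person × pose × pixel data, the pose templates, and the nearest-signature classifier below are invented stand-ins), HOSVD can be computed by taking the SVD of each mode unfolding, and a patch's pose picked by comparing projections against per-pose mean signatures:

```python
import numpy as np

def hosvd(T):
    """Higher-order SVD: one factor matrix from the SVD of each mode
    unfolding, plus the core tensor obtained by projecting T onto them."""
    factors = []
    for mode in range(T.ndim):
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U)
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Toy stand-in data: 4 "people" x 3 "poses" x 16-pixel patches.
rng = np.random.default_rng(0)
pose_templates = rng.normal(size=(3, 16))
data = np.stack([[t + 0.1 * rng.normal(size=16) for t in pose_templates]
                 for _ in range(4)])                     # shape (4, 3, 16)

core, (U_person, U_pose, U_pixel) = hosvd(data)

def estimate_pose(patch):
    """Project the patch onto the pixel basis and return the index of
    the nearest per-pose mean signature."""
    sig = U_pixel.T @ patch
    pose_sigs = np.einsum('ijk,kl->ijl', data, U_pixel).mean(axis=0)
    return int(np.argmin(np.linalg.norm(pose_sigs - sig, axis=1)))
```

The real system additionally couples this decomposition with skin-color segmentation and a coarse-to-fine nose-tip search; the sketch shows only the decomposition-plus-nearest-signature idea.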
3. Real-Time Multimodal Human-Avatar Interaction.
- Author
- Yun Fu, Renxiang Li, Huang, Thomas S., and Danielsen, Mike
- Subjects
- *THREE-dimensional imaging, *AVATARS (Virtual reality), *JAVA programming language, *VISUAL communication, *HUMAN-machine systems, *AUDIOVISUAL materials, *MOBILE communication systems
- Abstract
This paper presents a novel real-time multimodal human-avatar interaction (RTM-HAI) framework with vision-based remote animation control (RAC). The framework is designed for both mobile and desktop avatar-based human-machine or human-human visual communications in real-world scenarios. Using 3-D components stored in the Java mobile 3-D (M3G) file format, the avatar models can be flexibly constructed and customized on the fly on any mobile devices or systems that support the M3G standard. For the RAC head tracker, we propose a 2-D real-time face detection/tracking strategy built around an interactive loop, in which detection and tracking complement each other for efficient and reliable face localization that tolerates extreme user movement. With the face location robustly tracked, the RAC head tracker selects a main user and estimates the user's head rolling, tilting, yawing, scaling, horizontal, and vertical motion in order to generate avatar animation parameters. The animation parameters can be used either locally or remotely and can be transmitted through a socket over the network. In addition, the framework integrates audio-visual analysis and synthesis modules to realize multichannel runtime animation, visual text-to-speech, and real-time viseme detection and rendering. The framework is recognized as an effective design for future realistic industrial products such as humanoid kiosks and human-to-human mobile communication. [ABSTRACT FROM AUTHOR]
- Published
- 2008
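The detection/tracking loop described above — tracking frame to frame, falling back to full detection when tracking fails — can be sketched generically. Everything below is a hypothetical stand-in (1-D "frames" and toy detect/track functions), not the RTM-HAI tracker:

```python
def track_with_detection_fallback(frames, detect, track, conf_threshold=0.5):
    """Detection and tracking complement each other: track from the last
    known location, and fall back to (slower) detection whenever the
    tracker's confidence drops below the threshold."""
    location = None
    trajectory = []
    for frame in frames:
        if location is not None:
            location, conf = track(frame, location)
            if conf < conf_threshold:
                location = None          # tracking lost
        if location is None:
            location = detect(frame)     # recover via full detection
        trajectory.append(location)
    return trajectory

# Toy stand-ins: a "frame" is just the face's x-position.
frames = [10, 11, 12, 40, 41]            # large jump simulates a lost track
detect = lambda frame: frame
track = lambda frame, prev: (frame, 1.0 if abs(frame - prev) < 5 else 0.2)
```

The complementary structure is the point: the cheap tracker runs every frame, and the expensive detector runs only when the tracker reports low confidence.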
4. A Factor Graph Framework for Semantic Video Indexing.
- Author
- Naphade, Milind Ramesh, Kozintsev, Igor V., and Huang, Thomas S.
- Subjects
- *GRAPHIC methods, *SEMANTICS
- Abstract
Presents a factor graph framework for semantic video indexing. Discusses research challenges in video data management; the gap between low-level representation and high-level semantics; and the improvement in detection performance obtained using the multinet.
- Published
- 2002
5. Human Pose Regression Through Multiview Visual Fusion.
- Author
- Xu Zhao, Yun Fu, Huazhong Ning, Yuncai Liu, and Huang, Thomas S.
- Subjects
- *THREE-dimensional imaging, *SIGNAL processing, *MATHEMATICAL transformations, *ALGORITHMS, *GAUSSIAN processes
- Abstract
We consider the problem of estimating 3-D human body pose from visual signals within a discriminative framework. The problem is challenging because there is a wide gap between complex 3-D human motion and planar visual observation, which makes it severely ill-conditioned. In this paper, we focus on three critical factors in human body pose estimation, namely, feature extraction, the learning algorithm, and camera utilization. On the feature level, we describe images using salient interest points represented by scale-invariant feature transform (SIFT)-like descriptors, in which position, appearance, and local structural information are encoded simultaneously. On the learning algorithm level, we propose Gaussian processes and multiple linear (ML) regression to model the mapping between poses and features. On the camera level, we are interested in fusing image information from multiple cameras in different views. We make a comprehensive evaluation on the HumanEva database and draw two meaningful insights into these three crucial aspects of human pose estimation: 1) although the choice of feature is very important to the problem, once the learning algorithm becomes efficient, the choice of feature is no longer critical; and 2) the impact of combining information from multiple cameras on pose estimation is closely related not only to the quantity of image information, but also to its quality. In most cases, the more information is involved, the better the achievable results; but when the quantity of information is the same, differences in quality lead to totally different performance. Furthermore, dense evaluations demonstrate that our approach is an accurate and robust solution to the human body pose estimation problem. [ABSTRACT FROM AUTHOR]
- Published
- 2010
6. Locality Versus Globality: Query-Driven Localized Linear Models for Facial Image Computing.
- Author
- Fu, Yun, Li, Zhu, Yuan, Junsong, Wu, Ying, and Huang, Thomas S.
- Subjects
- *FACE perception, *ALGORITHMS, *DIGITAL images, *CLASSIFICATION, *STRUCTURAL frames, *LEARNING, *METHODOLOGY, *IDENTITY (Psychology), *ROBUST control
- Abstract
Conventional subspace learning and recent feature extraction methods take globality as the key criterion when designing discriminative algorithms for image classification. We demonstrate in this paper that applying locality in sample space, feature space, and learning space via linear subspace learning can substantially boost the discriminating power, as measured by the discriminating power coefficient (DPC). The proposed solution achieves good classification accuracy gains and is computationally efficient. In particular, we approximate the global nonlinearity through a multimodal localized piecewise subspace learning framework, in which three locality criteria can work individually or jointly in the design of any new subspace learning algorithm. It turns out that most existing subspace learning methods can be unified in such a common framework, embodying either the global or the local learning manner. On the other hand, we address the numerical difficulty of large-size pattern classification, where many local variations cannot be adequately handled by a single global model. By localizing the modeling, the classification error rate estimation is also localized, making it more robust and flexible for model selection among different candidates. As a new algorithm design based on the proposed framework, the query-driven locally adaptive (QDLA) mixture-of-experts model for robust face recognition and head pose estimation is presented. Experiments demonstrate the local approach to be effective, robust, and fast for large-size, multiclass, and multivariance data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2008
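The query-driven locality idea — fit a model only in the neighbourhood of the query rather than one global model — can be illustrated with a local least-squares regressor (a generic sketch on invented 1-D data, not the QDLA mixture-of-experts itself):

```python
import numpy as np

def local_linear_predict(X, y, query, k=5):
    """Fit an affine least-squares model on the query's k nearest
    neighbours only, then evaluate it at the query."""
    dists = np.linalg.norm(X - query, axis=1)
    idx = np.argsort(dists)[:k]
    A = np.c_[X[idx], np.ones(k)]            # affine design matrix
    w, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return float(np.r_[query, 1.0] @ w)

# y = |x| is fit poorly by one global line but exactly by local lines.
X = np.linspace(-1.0, 1.0, 21)[:, None]
y = np.abs(X[:, 0])
```

A single global linear model on this data would predict roughly 0 everywhere; the query-driven local fit recovers the correct branch of the function around each query.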
7. Convergent 2-D Subspace Learning With Null Space Analysis.
- Author
- Xu, Dong, Yan, Shuicheng, Lin, Stephen, and Huang, Thomas S.
- Subjects
- *STOCHASTIC convergence, *ALGORITHMS, *MATRICES (Mathematics), *HUMAN facial recognition software, *KRONECKER products, *DATABASES
- Abstract
Recent research has demonstrated the success of the supervised dimensionality reduction algorithms 2DLDA and 2DMFA, which are based on the image-as-matrix representation, in small sample size cases. To solve the convergence problem in 2DLDA and 2DMFA, we propose in this work two new schemes, called Null Space based 2DLDA (NS2DLDA) and Null Space based 2DMFA (NS2DMFA), and apply them to the challenging multi-view face recognition task. First, we convert each 2-D face image (matrix) into a vector and compute the first projection matrix P1 from the null space of the intra-class scatter matrix, such that samples from the same class are projected to the same point. The data are then projected and reconstructed with P1. Finally, we re-organize each reconstructed datum into a matrix and compute the second projection direction P2, in the form of a Kronecker product of two matrices, by maximizing the inter-class scatter. A proof of algorithmic convergence is provided. Experiments on two benchmark multi-view face databases, the CMU PIE and FERET databases, demonstrate that NS2DLDA outperforms Fisherface, Null Space LDA (NSLDA), and 2DLDA. Additionally, NS2DMFA is demonstrated to be more accurate than MFA and 2DMFA for face recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2008
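The first step described above — a projection P1 drawn from the null space of the intra-class scatter, so every class collapses to a single point — can be sketched in the vectorized setting. This is only the NSLDA-style first stage on invented toy data, not the full NS2DLDA with its Kronecker-structured P2:

```python
import numpy as np

def intra_class_null_projection(X, labels):
    """Basis P1 of the null space of the within-class scatter: samples
    that share a label all map to the same point under X @ P1."""
    Xc = X.astype(float).copy()
    for c in np.unique(labels):
        Xc[labels == c] -= Xc[labels == c].mean(axis=0)  # per-class centering
    # Right singular vectors with (near-)zero singular value span the null space.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=True)
    rank = int((s > 1e-10).sum())
    return Vt[rank:].T

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))                  # 2 classes x 3 samples, dim 8
labels = np.array([0, 0, 0, 1, 1, 1])
P1 = intra_class_null_projection(X, labels)
proj = X @ P1
```

Within each class, sample differences lie in the row space of the centered data, so they vanish under the null-space projection — exactly the "projected to the same point" property the abstract describes.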
8. Reconstruction and Recognition of Tensor-Based Objects With Concurrent Subspaces Analysis.
- Author
- Dong Xu, Shuicheng Yan, Lei Zhang, Lin, Stephen, Hong-Jiang Zhang, and Huang, Thomas S.
- Subjects
- *PRINCIPAL components analysis, *DIGITAL images, *VIDEO recording, *IMAGING systems, *STATISTICAL correlation, *FACTOR analysis, *ALGORITHMS, *COMPUTER algorithms, *DIGITAL image processing
- Abstract
Principal Components Analysis (PCA) has traditionally been applied to data expressed as 1-D vectors, but much data, such as gray-level images, video sequences, and Gabor-filtered images, is intrinsically in the form of second- or higher-order tensors. To represent image objects in their intrinsic form and order rather than concatenate all the object data into a single vector, we propose in this paper a new optimal object reconstruction criterion with which the information of a high-dimensional tensor is represented as a much lower dimensional tensor computed from projections to multiple concurrent subspaces. In each of these subspaces, correlations with respect to one of the tensor dimensions are reduced, enabling better object reconstruction performance. Concurrent subspaces analysis (CSA) is presented to efficiently learn these subspaces in an iterative manner. In contrast to techniques such as PCA that vectorize tensor data, CSA's direct use of data in tensor form brings an enhanced ability to learn a representative subspace and an increased number of available projection directions. These properties enable CSA to outperform traditional algorithms in the common case of small sample sizes, where CSA can be effective even with only a single sample per class. Extensive experiments on images of faces and digital numbers encoded as second- or third-order tensors demonstrate that the proposed CSA outperforms PCA-based algorithms in object reconstruction and object recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2008
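The iterative learning of concurrent mode subspaces can be pictured for second-order tensors (matrices): alternately fix one projection and update the other from the leading eigenvectors of the induced covariance. A rough sketch on invented low-rank toy data, not the paper's exact CSA criterion:

```python
import numpy as np

def csa_fit(Xs, r1, r2, iters=10):
    """Alternating update of row/column subspaces (U, V) that retain
    the energy sum ||U^T X V||^2 over the sample matrices."""
    n = Xs[0].shape[1]
    V = np.eye(n)[:, :r2]                        # initial column subspace
    for _ in range(iters):
        C1 = sum(X @ V @ V.T @ X.T for X in Xs)  # row-mode covariance
        U = np.linalg.eigh(C1)[1][:, -r1:]       # top-r1 eigenvectors
        C2 = sum(X.T @ U @ U.T @ X for X in Xs)  # column-mode covariance
        V = np.linalg.eigh(C2)[1][:, -r2:]
    return U, V

def csa_reconstruct(X, U, V):
    return U @ U.T @ X @ V @ V.T

# Toy data: each 6x5 sample shares rank-2 row and column structure.
rng = np.random.default_rng(2)
A, B = rng.normal(size=(6, 2)), rng.normal(size=(5, 2))
Xs = [A @ rng.normal(size=(2, 2)) @ B.T for _ in range(8)]
U, V = csa_fit(Xs, 2, 2)
err = max(np.abs(csa_reconstruct(X, U, V) - X).max() for X in Xs)
```

Because the samples truly share 2-D row and column subspaces, a (2, 2) pair of concurrent projections reconstructs them essentially exactly, with far fewer parameters than a vectorized 30-dimensional PCA model.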
9. Integrating Discriminant and Descriptive Information for Dimension Reduction and Classification.
- Author
- Jie Yu, Qi Tian, Ting Rui, and Huang, Thomas S.
- Subjects
- *DATA compression (Telecommunication), *ARTIFICIAL intelligence, *IMAGE processing, *IMAGE retrieval, *INFORMATION retrieval, *PATTERN perception, *DATA analysis, *INFORMATION storage & retrieval systems
- Abstract
In this paper, a novel hybrid dimension reduction technique for classification is proposed based on the hybrid analysis of principal component analysis (PCA) and linear discriminant analysis (LDA). LDA is known for capturing the most discriminant features of the data in the projected space, while PCA is known for preserving the most descriptive ones after projection. Our hybrid technique integrates discriminant and descriptive information and finds a richer set of alternatives beyond LDA and PCA in a 2-D parametric space, which better fits a specific classification task and data distribution. Theoretical study shows that our technique also alleviates the singularity problem of the scatter matrix caused by a small training set and increases the effective dimension of the projected subspace. In order to find the hybrid features adaptively and avoid exhaustive parameter searching, we further propose a boosted hybrid analysis method that incorporates a nonlinear boosting process to enhance a set of hybrid classifiers and combine them into a more accurate one. Compared with other techniques that aim at combining PCA and LDA, our approaches are novel because our method finds alternatives to LDA and PCA in a 2-D parameter space and the boosting process provides enhancement and robust combination of the classifiers. Extensive experiments are conducted on benchmark and real image databases to compare our proposed methods with state-of-the-art linear and nonlinear discriminant analysis techniques. The results show the superior performance of our hybrid analysis methods. [ABSTRACT FROM AUTHOR]
- Published
- 2007
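One plausible way to picture a 2-D parametric family between PCA and LDA (an illustrative parameterization chosen here for clarity, not the paper's exact hybrid criterion) is to interpolate the scatter matrices inside a generalized eigenproblem, so that (alpha, beta) = (0, 0) recovers PCA and (1, 1) recovers LDA:

```python
import numpy as np

def hybrid_directions(X, y, alpha, beta, dim=1):
    """Leading eigenvectors of
    (beta*Sw + (1-beta)*I)^-1 (alpha*Sb + (1-alpha)*St):
    (0, 0) gives PCA directions, (1, 1) gives LDA directions."""
    mu = X.mean(axis=0)
    Xc = X - mu
    St = Xc.T @ Xc                                  # total scatter
    Sb = np.zeros_like(St)
    Sw = np.zeros_like(St)
    for c in np.unique(y):
        Xk = X[y == c]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)               # within-class scatter
        Sb += len(Xk) * np.outer(mk - mu, mk - mu)  # between-class scatter
    M = np.linalg.solve(beta * Sw + (1 - beta) * np.eye(len(mu)),
                        alpha * Sb + (1 - alpha) * St)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)
    return vecs[:, order[:dim]].real

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 3)),
               rng.normal(2.0, 1.0, size=(30, 3))])
y = np.repeat([0, 1], 30)
w_pca = hybrid_directions(X, y, 0.0, 0.0)
```

Intermediate (alpha, beta) values then trade discriminant against descriptive structure, which is the kind of continuum the 2-D parametric space above explores.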
10. Computer modeling, analysis, and synthesis of dressed humans.
- Author
- Jojic, Nebojsa, Jin Gu, Shen, Helen C., and Huang, Thomas S.
- Subjects
- *COMPUTER vision, *DIGITAL image processing, *THREE-dimensional imaging, *MATHEMATICAL models, *ALGORITHMS
- Abstract
We present computer vision techniques for building dressed human models from images. We develop an algorithm for three-dimensional body reconstruction and texture mapping using contour, stereo, and texture information from several images, with deformable superquadrics as the model parts. We demonstrate a novel vision technique for the analysis of cloth draping behavior. This technique allows for the estimation of cloth model parameters, such as bending properties, and can also be used to estimate the contact points between the body and clothing in range data of dressed humans. Combined with our body reconstruction algorithm and additional constraints on the articulation model, the detection of the garment-body contact points allows construction of a dressed human model in which even the geometry that was covered by clothing in the available data is reasonably well estimated. [ABSTRACT FROM PUBLISHER]
- Published
- 1999
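Superquadric model parts like those mentioned above are commonly defined through the standard superquadric inside-outside function: a point evaluates to 1 on the surface, below 1 inside, and above 1 outside. A minimal version without the deformations (tapering, bending) or pose parameters a full body model would add:

```python
def superquadric_f(x, y, z, a1, a2, a3, e1, e2):
    """Standard superquadric inside-outside function:
    f = (|x/a1|^(2/e2) + |y/a2|^(2/e2))^(e2/e1) + |z/a3|^(2/e1),
    where a1..a3 are the axis sizes and e1, e2 the shape exponents."""
    xy = abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + abs(z / a3) ** (2.0 / e1)
```

Fitting a part to data amounts to choosing sizes and exponents that drive this function toward 1 at the observed surface points, which is why the inside-outside form is convenient for reconstruction.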
11. Compression of MPEG-4 facial animation parameters for transmission of talking heads.
- Author
- Hai Tao, Chen, Homer H., Wei Wu, and Huang, Thomas S.
- Subjects
- *MPEG (Video coding standard), *ANIMATION (Cinematography), *FACIAL expression, *DATA compression (Telecommunication), *VIDEO compression standards
- Abstract
The emerging MPEG-4 standard supports the transmission and composition of facial animation with natural video. The new standard will include a facial animation parameter (FAP) set that is defined based on the study of minimal facial actions and is closely related to muscle actions. The FAP set enables model-based representation of natural or synthetic talking-head sequences and allows intelligible visual reproduction of facial expressions, emotions, and speech pronunciations at the receiver. This paper addresses the data-compression issue of talking heads and presents three methods for bit-rate reduction of FAPs. Compression efficiency is achieved by way of transform coding, principal component analysis, and FAP interpolation. These methods are independent of each other in nature and thus can be applied in combination to lower the bit-rate demand of FAPs, making possible the transmission of multiple talking heads over band-limited channels. The basic methods described here have been adopted into the MPEG-4 Visual Committee Draft and are readily applicable to other articulation data such as body animation parameters. The efficacy of the methods is demonstrated by both subjective and objective results. [ABSTRACT FROM PUBLISHER]
- Published
- 1999
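Of the three FAP bit-rate reduction methods listed above, the principal-component one is the easiest to sketch: encode each frame of parameters as a few coefficients in a learned basis. The toy trajectory data below is invented, and a real MPEG-4 coder would add quantization, prediction, and entropy coding on top:

```python
import numpy as np

def pca_compress(frames, k):
    """Encode each d-dimensional FAP frame as k basis coefficients."""
    mu = frames.mean(axis=0)
    _, _, Vt = np.linalg.svd(frames - mu, full_matrices=False)
    basis = Vt[:k]                         # top-k principal directions
    coeffs = (frames - mu) @ basis.T       # k numbers per frame, not d
    return mu, basis, coeffs

def pca_decompress(mu, basis, coeffs):
    return mu + coeffs @ basis

# Toy FAP sequence driven by 3 latent articulation components.
rng = np.random.default_rng(4)
latent = rng.normal(size=(50, 3))
mixing = rng.normal(size=(3, 20))
frames = latent @ mixing                   # 50 frames x 20 parameters
mu, basis, coeffs = pca_compress(frames, 3)
recon = pca_decompress(mu, basis, coeffs)
```

Because facial parameters are strongly correlated, a few components per frame can stand in for the full parameter vector, which is where the bit-rate saving comes from.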
Discovery Service for Jio Institute Digital Library