40 results for "Changhu Wang"
Search Results
2. Research on PMSM Control System with LC Filter
- Author
-
Zhiqiang Zhang, Hejin Xiong, Changhu Wang, and Yu Cao
- Published
- 2021
3. MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis
- Author
-
Jiaxin Li, Zijian Feng, Qi She, Henghui Ding, Changhu Wang, and Gim Hee Lee
- Subjects
Computer Vision and Pattern Recognition (cs.CV), Graphics (cs.GR), Machine Learning (cs.LG)
- Abstract
In this paper, we propose MINE to perform novel view synthesis and depth estimation via dense 3D reconstruction from a single image. Our approach is a continuous-depth generalization of Multiplane Images (MPI), obtained by introducing NEural radiance fields (NeRF). Given a single image as input, MINE predicts a 4-channel image (RGB and volume density) at arbitrary depth values to jointly reconstruct the camera frustum and fill in occluded content. The reconstructed and inpainted frustum can then be easily rendered into novel RGB or depth views using differentiable rendering. Extensive experiments on RealEstate10K, KITTI, and Flowers Light Fields show that MINE outperforms the state-of-the-art by a large margin in novel view synthesis. We also achieve competitive results in depth estimation on iBims-1 and NYU-v2 without annotated depth supervision. Our source code is available at https://github.com/vincentfung13/MINE. (ICCV 2021)
- Published
- 2021
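The MPI-with-NeRF representation described above is rendered by standard back-to-front alpha compositing of the per-plane (RGB, density) predictions. A minimal NumPy sketch of that compositing step; array shapes, plane spacing, and function names here are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def composite_planes(rgb, sigma, depths):
    """Alpha-composite a stack of fronto-parallel depth planes.

    rgb:    (D, H, W, 3) per-plane colour
    sigma:  (D, H, W)    per-plane volume density
    depths: (D,)         plane depth values, ascending (front to back)
    Returns the composited RGB image and an expected-depth map.
    """
    # Convert density to per-plane alpha via the spacing between planes.
    deltas = np.diff(depths, append=depths[-1] + (depths[-1] - depths[-2]))
    alpha = 1.0 - np.exp(-sigma * deltas[:, None, None])        # (D, H, W)
    # Transmittance: how much light survives the planes in front.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=0)
    trans = np.concatenate([np.ones_like(trans[:1]), trans[:-1]], axis=0)
    weights = alpha * trans                                     # (D, H, W)
    image = (weights[..., None] * rgb).sum(axis=0)              # (H, W, 3)
    depth_map = (weights * depths[:, None, None]).sum(axis=0)   # (H, W)
    return image, depth_map
```

With an opaque front plane the composite reduces to that plane's colour, which is a quick sanity check on the weights.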
4. Domain-Invariant Disentangled Network for Generalizable Object Detection
- Author
-
Chuang Lin, Zehuan Yuan, Sicheng Zhao, Peize Sun, Changhu Wang, and Jianfei Cai
- Published
- 2021
5. Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective
- Author
-
Wei Wang, Haochen Zhang, Zehuan Yuan, and Changhu Wang
- Published
- 2021
6. Learning the Best Pooling Strategy for Visual Semantic Embedding
- Author
-
Hexiang Hu, Hao Wu, Jiacheng Chen, Yuning Jiang, and Changhu Wang
- Subjects
Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Pooling, Semantics, Machine learning, Data modeling, Visualization, Feature (machine learning), Embedding, Artificial intelligence
- Abstract
Visual Semantic Embedding (VSE) is a dominant approach for vision-language retrieval, which aims at learning a deep embedding space such that visual data are embedded close to their semantic text labels or descriptions. Recent VSE models use complex methods to better contextualize and aggregate multi-modal features into holistic embeddings. However, we discover that surprisingly simple (but carefully selected) global pooling functions (e.g., max pooling) outperform those complex models, across different feature extractors. Despite its simplicity and effectiveness, seeking the best pooling function for each data modality and feature extractor is costly and tedious, especially when the size of features varies (e.g., text, video). Therefore, we propose a Generalized Pooling Operator (GPO), which learns to automatically adapt itself to the best pooling strategy for different features, requiring no manual tuning while staying effective and efficient. We extend the VSE model using this proposed GPO and denote it as VSE$\infty$. Without bells and whistles, VSE$\infty$ outperforms previous VSE methods significantly on image-text retrieval benchmarks across popular feature extractors. With a simple adaptation, variants of VSE$\infty$ further demonstrate its strength by achieving the new state of the art on two video-text retrieval datasets. Comprehensive experiments and visualizations confirm that GPO always discovers the best pooling strategy and can be a plug-and-play feature aggregation module for standard VSE models. Code and pre-trained models are available at https://vse-infty.github.io. (CVPR 2021 oral)
- Published
- 2021
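The pooling search that GPO automates can be pictured as a weighted sum over rank-sorted feature elements, with max and mean pooling as special cases of the weights. A toy sketch under that reading; in the actual model the coefficients are predicted by a small sequence model so they adapt to variable-length inputs, whereas `theta` here is just a fixed illustrative parameter vector:

```python
import numpy as np

def generalized_pooling(features, theta):
    """Pool a set of local feature vectors with per-rank weights.

    features: (N, d) N local features of dimension d
    theta:    (N,)   per-rank coefficients, softmax-normalised here.
    A one-hot weight on the top rank recovers max pooling; uniform
    weights recover mean pooling.
    """
    weights = np.exp(theta) / np.exp(theta).sum()   # normalise coefficients
    ranked = np.sort(features, axis=0)[::-1]        # each dim sorted descending
    return (weights[:, None] * ranked).sum(axis=0)  # (d,)
```

Because the weights act on ranks rather than positions, the same parameterisation covers max, mean, and k-max pooling without changing the code.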
7. Cross Media Routing and Clustering Algorithm for Autonomous Marine Systems
- Author
-
Changhu Wang, Lin Shuisheng, Haifen Yang, Ding Jiannan, and Meiqiu Jiang
- Subjects
Routing protocol, Node (networking), Real-time computing, Cross media, Underwater, Cluster analysis, Data transmission, Network simulation
- Abstract
In the air-sea environment, unmanned aerial vehicles (UAVs), unmanned surface vehicles (USVs), and autonomous underwater vehicles (AUVs) constitute autonomous marine systems (AMS). The challenge is that the performance of underwater acoustic networks is much lower than that of overwater networks. Existing clustering algorithms do not adapt well to AMS, and existing routing protocols have not considered exploiting the performance advantages of overwater networks to improve the transmission performance of the underwater acoustic networks. In this paper, we propose a LEACH-based cross-media clustering algorithm (LEACH-CM) and a vector-based cross-media routing protocol (VBCM), both of which are applicable to the AMS environment. LEACH-CM ensures the success rate of data transmission in a high-density node environment, and VBCM effectively selects a lower-delay link based on estimated link delays. We verify the effectiveness of LEACH-CM and VBCM through NS3 network simulation.
- Published
- 2021
8. Moflowgan: Video Generation With Flow Guidance
- Author
-
Changhu Wang, Zehuan Yuan, Xiangzhong Fang, and Wei Li
- Subjects
Facial expression, Contextual image classification, Feature extraction, Optical flow, Facial recognition system, Feature (computer vision), Computer vision, Artificial intelligence, Pose
- Abstract
In recent years, video generation has attracted a lot of attention in the computer vision community. Unlike image generation, which focuses only on appearance, video generation requires modeling both content information and motion dynamics. In this work, we propose MoFlowGAN, which explicitly models motion dynamics through a content-motion decomposition architecture with an additional flow generator. The decomposition architecture models content and motion separately and is instantiated by a compact variant of BigGAN [1]. The flow generator produces optical flow directly from high-level feature maps of adjacent frames as a strong supervision signal, so the search space of motion patterns is greatly reduced. Our proposed MoFlowGAN achieves state-of-the-art results on both the MUG facial expression and UCF-101 datasets.
- Published
- 2020
9. Improving Convolutional Networks With Self-Calibrated Convolutions
- Author
-
Jiashi Feng, Jiang-Jiang Liu, Ming-Ming Cheng, Qibin Hou, and Changhu Wang
- Subjects
Object detection, Convolution, Discriminative model, Kernel (image processing), Computer engineering, Segmentation, Artificial intelligence, Feature learning
- Abstract
Recent advances on CNNs are mostly devoted to designing more complex architectures to enhance their representation learning capacity. In this paper, we consider how to improve the basic convolutional feature transformation process of CNNs without tuning the model architectures. To this end, we present novel self-calibrated convolutions that explicitly expand the field-of-view of each convolutional layer through internal communications and hence enrich the output features. In particular, unlike standard convolutions that fuse spatial and channel-wise information using small kernels (e.g., 3x3), self-calibrated convolutions adaptively build long-range spatial and inter-channel dependencies around each spatial location through a novel self-calibration operation. They thus help CNNs generate more discriminative representations by explicitly incorporating richer information. Our self-calibrated convolution design is simple and generic, and can be easily applied to augment standard convolutional layers without introducing extra parameters and complexity. Extensive experiments demonstrate that when self-calibrated convolutions are applied to different backbones, our networks significantly improve the baseline models on a variety of vision tasks, including image recognition, object detection, instance segmentation, and keypoint detection, with no need to change the network architectures. We hope this work provides a promising direction for future research in designing novel convolutional feature transformations for improving convolutional networks. Code is available on the project page.
- Published
- 2020
10. Unsupervised Teacher-Student Model for Large-Scale Video Retrieval
- Author
-
Yei-Wei Chen, Dong Liang, Changhu Wang, Jie Shao, Rui Wang, and Lanfen Lin
- Subjects
Event (computing), Frame (networking), Machine learning, Pipeline (software), Domain (software engineering), Feature (computer vision), Social media, Artificial intelligence
- Abstract
With the growth of video-sharing platforms and social media applications, video retrieval plays an important role in many areas, such as copyright infringement detection, event classification, and personalized recommendation. Content-based video retrieval presents two main challenges: (i) distribution inconsistency of feature representations from the source domain to the target domain; (ii) the difficulty of aggregating videos while sufficiently incorporating frame-based information. In this paper, we propose an unsupervised teacher-student model (UTS Net) to improve the performance of content-based video retrieval: (i) a teacher-student model maintaining the global consistency of feature representations across domains while retaining the local inconsistency within intra-batch data; (ii) a simple but effective video retrieval pipeline integrating frame-level binarized features. Our proposed framework experimentally outperforms the state-of-the-art approach on the DSVR, CSVR, and ISVR tasks of the FIVR datasets, achieving mean average precisions of 76%, 72%, and 61%, respectively.
- Published
- 2019
11. Temporal Feature Augmented Network for Video Instance Segmentation
- Author
-
Minghui Dong, Changhu Wang, Yuanyuan Huang, Jie Shao, Shiping Wen, Kai Su, Dongdong Yu, Kaihui Zhou, and Jian Wang
- Subjects
Feature (computer vision), Motion blur, Computer vision, Segmentation, Image segmentation, Artificial intelligence, Object (computer science)
- Abstract
In this paper, we propose a temporal feature augmented network for video instance segmentation. The video instance segmentation task can be split into two subtasks: instance segmentation and tracking. Similar to previous work, a track head is added to an instance segmentation network to track object instances across frames, so that the network can perform detection, segmentation, and tracking simultaneously. We choose Cascade R-CNN as the basic instance segmentation network. In addition, to make better use of the rich information contained in the video, a temporal feature augmented module is introduced to the network. When performing instance segmentation on a single frame, information from other frames in the same video is included, which effectively improves instance segmentation performance. Moreover, experiments show that the temporal feature augmented module can effectively alleviate the problems of motion blur and pose variation.
- Published
- 2019
12. Generative Dual Adversarial Network for Generalized Zero-Shot Learning
- Author
-
Changhu Wang, He Huang, Philip S. Yu, and Chang-Dong Wang
- Subjects
Computer Vision and Pattern Recognition (cs.CV), Deep learning, Pattern recognition, Semantic mapping, Categorization, Feature (computer vision), Metric (mathematics), Embedding, Artificial intelligence
- Abstract
This paper studies the problem of generalized zero-shot learning, which requires the model to train on image-label pairs from seen classes and then classify new images from both seen and unseen classes. Most previous models try to learn a fixed one-directional mapping between visual and semantic space, while some recently proposed generative methods generate image features for unseen classes so that zero-shot learning becomes a traditional fully-supervised classification problem. In this paper, we propose a novel model that provides a unified framework for three different approaches: visual->semantic mapping, semantic->visual mapping, and metric learning. Specifically, our proposed model consists of a feature generator that can generate various visual features given class embeddings as input, a regressor that maps each visual feature back to its corresponding class embedding, and a discriminator that learns to evaluate the closeness of an image feature and a class embedding. All three components are trained under a combination of cyclic consistency loss and dual adversarial loss. Experimental results show that our model not only preserves higher accuracy in classifying images from seen classes, but also performs better than existing state-of-the-art models in classifying images from unseen classes.
- Published
- 2019
13. Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information
- Author
-
Kai Su, Dongdong Yu, Xin Geng, Zhenqi Xu, and Changhu Wang
- Subjects
Computer Vision and Pattern Recognition (cs.CV), Deep learning, Feature extraction, Context (language use), Feature (computer vision), Pyramid (image processing), Benchmark (computing), Computer vision, Artificial intelligence, Spatial analysis, Pose, Gesture, Communication channel
- Abstract
Multi-person pose estimation is an important but challenging problem in computer vision. Although current approaches have achieved significant progress by fusing multi-scale feature maps, they pay little attention to enhancing the channel-wise and spatial information of the feature maps. In this paper, we propose two novel modules to perform this enhancement for multi-person pose estimation. First, a Channel Shuffle Module (CSM) is proposed to apply the channel shuffle operation to feature maps at different levels, promoting cross-channel information communication among the pyramid feature maps. Second, a Spatial, Channel-wise Attention Residual Bottleneck (SCARB) is designed to boost the original residual unit with an attention mechanism, adaptively highlighting the information of the feature maps in both the spatial and channel-wise context. The effectiveness of our proposed modules is evaluated on the COCO keypoint benchmark, and experimental results show that our approach achieves state-of-the-art results. (CVPR 2019)
- Published
- 2019
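The Channel Shuffle Module above builds on the generic channel shuffle operation popularized by ShuffleNet: split the channels into groups, then interleave them so information mixes across groups. A minimal NumPy sketch of that operation; the paper applies it across pyramid feature maps, while this shows only the core reshape-transpose trick:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave the channels of `groups` channel groups.

    x: (N, C, H, W) feature map, with C divisible by `groups`.
    Reshape to (N, g, C//g, H, W), swap the two channel axes,
    then flatten back to (N, C, H, W).
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    return (x.reshape(n, groups, c // groups, h, w)
              .transpose(0, 2, 1, 3, 4)
              .reshape(n, c, h, w))
```

For example, six channels in two groups `[0 1 2 | 3 4 5]` come out interleaved as `[0 3 1 4 2 5]`.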
14. Jersey Number Recognition with Semi-Supervised Spatial Transformer Network
- Author
-
Li Lei, Changhu Wang, Shikun Xu, Li Gen, and Xiang Liu
- Subjects
Artificial neural network, Detector, Feature extraction, Pattern recognition, Object detection, Data modeling, Task analysis, Artificial intelligence, Transformer (machine learning model)
- Abstract
It is still a challenging task to recognize the jersey numbers of players on the court in soccer match videos, as the jersey numbers are very small for the object detection task and annotated data are not easy to collect. Based on the object detection results for all the players on the court, a CNN model is first introduced to classify the numbers on the detected players' images. To localize the jersey number more precisely without involving another digit detector and extra consumption, we then improve the former network into an end-to-end framework by fusing it with a spatial transformer network (STN). To further improve accuracy, we bring extra supervision to the STN and upgrade the model to a semi-supervised multi-task learning system by labeling a small portion of the number areas in the dataset with quadrangles. Extensive experiments illustrate the effectiveness of the proposed framework.
- Published
- 2018
15. Image segmentation using contour, surface, and depth cues
- Author
-
C.-C. Jay Kuo, Jian Li, Chen Chen, Xiang Fu, and Changhu Wang
- Subjects
Segmentation, Computer vision, Image segmentation, Artificial intelligence, Depth perception
- Abstract
We aim to solve the problem of automatic image segmentation. Although 1D contour and 2D surface cues have been widely utilized in existing work, the 3D depth information of an image, a necessary cue according to human visual perception, is overlooked in automatic image segmentation. In this paper, we study how to fully utilize 1D contour, 2D surface, and 3D depth cues for image segmentation. First, three elementary segmentation modules are developed for these cues respectively. The proposed 3D depth cue is able to segment differently textured regions even with similar color, and also to merge similarly textured areas, which cannot be achieved using state-of-the-art approaches. Then, a content-dependent spectral (CDS) graph is proposed for layered affinity models to produce the final segmentation. CDS is designed to build a more reliable relationship between neighboring surface nodes based on the three elementary cues in the spectral graph. Extensive experiments not only show the superior performance of the proposed algorithm over state-of-the-art approaches, but also verify the necessity of these three cues in image segmentation.
- Published
- 2017
16. Surveillance Video Parsing with Single Frame Supervision
- Author
-
Han Yu, Si Liu, Renda Bao, Changhu Wang, Yao Sun, and Ruihe Qian
- Subjects
Parsing, Frame (networking), Feature extraction, Optical flow, Image segmentation, Object detection, Optical flow estimation, Video tracking, Computer vision, Artificial intelligence, Reference frame, Block-matching algorithm
- Abstract
Surveillance video parsing, which segments video frames into several labels, e.g., face, pants, left-leg, has wide applications [41, 8]. However, annotating all frames pixel-wise is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in the training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (i) roughly parses the frames within the video segment, (ii) estimates the optical flow between frames, and (iii) fuses the rough parsing results warped by optical flow to produce the refined parsing result. The three components of SVP, namely frame parsing, optical flow estimation, and temporal fusion, are integrated in an end-to-end manner. Experimental results on two surveillance video datasets show the superiority of SVP over the state-of-the-art. The collected video parsing datasets can be downloaded via http://liusi-group.com/projects/SVP for further study.
- Published
- 2017
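Step (iii) of SVP fuses rough parsing results after warping them with the estimated optical flow. A small NumPy sketch of the warping step alone, under illustrative conventions (backward flow, nearest-neighbour sampling so labels stay discrete) that are assumptions rather than details from the paper:

```python
import numpy as np

def warp_labels(labels, flow):
    """Warp a per-pixel label map with a backward optical-flow field.

    labels: (H, W) integer labels of a reference frame
    flow:   (H, W, 2) flow (dy, dx) mapping each target pixel back to
            its source coordinate in the reference frame.
    Nearest-neighbour sampling keeps the labels discrete.
    """
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return labels[src_y, src_x]
```

In a full pipeline the warped maps from several preceding frames would then be fused (e.g., by voting or a learned module) with the current frame's rough parse.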
17. Barycentric coordinates based soft assignment for object classification
- Author
-
Chang Wen Chen, Changhu Wang, and Tao Wei
- Subjects
Fisher kernel, Visual Word, Algorithm, Barycentric coordinates
- Abstract
For object classification, soft assignment (SA) is capable of improving the bag-of-visual-words (BoVW) model and has the advantage of conceptual simplicity. However, the performance of soft assignment is inferior to recently developed encoding schemes. In this paper, we propose a novel scheme called barycentric coordinates based soft assignment (BCSA) for the classification of object images. While maintaining conceptual simplicity, this scheme is shown to outperform most existing encoding schemes, including sparse and local coding schemes. Furthermore, with only single-scale features, it achieves performance comparable or even superior to the current state-of-the-art Fisher kernel (FK) encoding scheme. In particular, the proposed BCSA scheme enjoys the following properties: 1) preservation of linear order precision in encoding, which makes BCSA robust to linear transform distortions; 2) natural inheritance of visual word uncertainty, which leads to a more expressive model; 3) generation of linearly classifiable codes that can be learned with significantly less computational cost and storage. Extensive experiments on the widely used Caltech-101 and Caltech-256 datasets demonstrate the effectiveness of the proposed BCSA scheme in both performance and simplicity.
- Published
- 2016
18. Robust Image Segmentation Using Contour-Guided Color Palettes
- Author
-
Chien-Yi Wang, C.-C. Jay Kuo, Chen Chen, Xiang Fu, and Changhu Wang
- Subjects
Color histogram, Color image, Segmentation-based object categorization, Scale-space segmentation, Pattern recognition, Image segmentation, Image texture, Region growing, Computer vision, Artificial intelligence, Histogram equalization
- Abstract
The contour-guided color palette (CCP) is proposed for robust image segmentation. It efficiently integrates the contour and color cues of an image. To find representative colors of an image, color samples along long contours between regions, similar in spirit to machine learning methodologies that focus on samples near decision boundaries, are collected, followed by the mean-shift (MS) algorithm in the sampled color space to obtain an image-dependent color palette. This color palette provides a preliminary segmentation in the spatial domain, which is further fine-tuned by post-processing techniques such as leakage avoidance, fake boundary removal, and small-region merging. Segmentation performances of CCP and MS are compared and analyzed. While CCP offers an acceptable standalone segmentation result, it can be further integrated into the framework of layered spectral segmentation to produce a more robust segmentation. The superior performance of the CCP-based segmentation algorithm is demonstrated by experiments on the Berkeley Segmentation Dataset.
- Published
- 2015
19. Trip Mining and Recommendation from Geo-tagged Photos
- Author
-
Lei Zhang, Changhu Wang, Nenghai Yu, and Huagang Yin
- Subjects
Metadata, World Wide Web, Search engine, Similarity (geometry), Recommender system, Tourism
- Abstract
Trip planning is generally a very time-consuming task due to complex trip requirements and the lack of convenient tools/systems to assist the planning. In this paper, we propose a travel path search system based on geo-tagged photos to facilitate tourists' trip planning, covering not only where to visit but also how to visit. The large-scale geo-tagged photos publicly available on the web make this system possible, as geo-tagged photos encode rich travel-related metadata and can be used to mine travel paths from previous tourists. In this work, about 20 million geo-tagged photos were crawled from Panoramio.com, and a substantial number of travel paths were mined from them. A search system was then built to index and search the paths, and the Sparse Chamfer Distance is proposed to measure the similarity of two paths. The search system supports various types of queries, including (1) a destination name, (2) a user-specified region on the map, and (3) user-preferred locations. Users can interact with the system by specifying a region or several points of interest on the map to find paths. Extensive experiments show the effectiveness of the proposed framework.
- Published
- 2012
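The Sparse Chamfer Distance above is a variant of the classic symmetric Chamfer distance between point sets. The standard form, which the paper adapts, can be sketched as follows (the function name and 2D-coordinate convention are illustrative, not from the paper):

```python
import numpy as np

def chamfer_distance(path_a, path_b):
    """Symmetric Chamfer distance between two paths given as point sets.

    path_a: (m, 2) and path_b: (n, 2) arrays of (lat, lon)-like points.
    For each point, take the distance to its nearest neighbour in the
    other path, then average those distances in both directions.
    """
    diff = path_a[:, None, :] - path_b[None, :, :]   # (m, n, 2)
    d = np.sqrt((diff ** 2).sum(axis=-1))            # pairwise distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

Because it matches each point to its nearest neighbour rather than aligning indices, the measure tolerates paths sampled at different rates or with different numbers of photos.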
20. The scale of edges
- Author
-
Changhu Wang, Xianming Liu, Hongxun Yao, and Lei Zhang
- Subjects
Parsing, Open problem, Scale-invariant feature transform, Edge (geometry), Edge detection, Visualization, Computer vision, Artificial intelligence, Algorithm
- Abstract
Although the scale of isotropic visual elements such as blobs and interest points, e.g., SIFT [12], has been well studied and adopted in various applications, how to determine the scale of anisotropic elements such as edges is still an open problem. In this paper, we study the scale of edges and try to answer two questions: 1) what is the scale of edges, and 2) how can it be calculated? From the standpoints of human cognition and physical interpretation, we illustrate the existence of the scale of edges and provide a quantitative definition. Then, an automatic edge scale selection approach is proposed. Finally, a cognitive experiment is conducted to validate the rationality of the detected scales. Moreover, the importance of identifying the scale of edges is also shown in applications such as boundary detection and hierarchical edge parsing.
- Published
- 2012
21. Shape-based web image clustering for unsupervised object detection?
- Author
-
Xilin Chen, Changhu Wang, and Wei Zheng
- Subjects
Boosting (machine learning), Detector, Pattern recognition, Object detection, Support vector machine, Object-class detection, Discriminative model, Viola–Jones object detection framework, Computer vision, Artificial intelligence, Cluster analysis
- Abstract
Automatic object detection for an arbitrary class is an important but very challenging problem, due to the countless kinds of objects in the world and the large amount of labeling work required for each object. In this work, we aim to solve the problem of automatic object detection for an arbitrary class without laborious human effort. Motivated by the explosive growth of Web images and the phenomenal success of search techniques, we develop an unsupervised object detection framework that automatically trains the object detector on the top returns of an image search engine queried with the name of the object class. In order to automatically isolate the objects from the Web images for training, only clipart images with simple backgrounds are used, which preserve most of the shape information of the objects. A two-stage shape-based clustering algorithm is proposed to mine typical shapes of the object, in which the intra-class variance of object shapes is considered and undesired images are filtered out. In order to reduce the gap between clipart images and real-world images, we introduce an efficient algorithm to synthesize real-world images from clipart images, and only shape features are used in detector training. Finally, the synthetic images can be used to train object detectors with an off-the-shelf discriminative algorithm, e.g., boosting or SVM. Extensive experiments show the effectiveness of the proposed framework on objects with simple and representative shapes, and the framework can be considered a good starting point for solving this challenging problem.
- Published
- 2011
22. Edgel index for large-scale sketch-based image search
- Author
-
Changhu Wang, Yang Cao, Lei Zhang, and Liqing Zhang
- Subjects
Information retrieval, Search engine indexing, Sketch, Search engine, Feature (computer vision), Computer vision, Artificial intelligence, Image retrieval
- Abstract
Retrieving images that match a hand-drawn sketch query is a highly desired feature, especially with the popularity of devices with touch screens. Although query-by-sketch has been extensively studied since the 1990s, it is still very challenging to build a real-time sketch-based image search engine on a large-scale database due to the lack of effective and efficient matching/indexing solutions. The explosive growth of web images and the phenomenal success of search techniques have encouraged us to revisit this problem and target web-scale sketch-based image retrieval. In this work, a novel index structure and the corresponding raw contour-based matching algorithm are proposed to calculate the similarity between a sketch query and natural images, making sketch-based image retrieval scalable to millions of images. The proposed solution simultaneously considers storage cost, retrieval accuracy, and efficiency, based on which we have developed a real-time sketch-based image search engine indexing more than 2 million images. Extensive experiments on various retrieval tasks (basic shape search, specific image search, and similar image search) show better accuracy and efficiency than state-of-the-art methods.
- Published
- 2011
23. Robust semantic sketch based specific image retrieval
- Author
-
Bo Zhang, Dong Wang, Xiaobing Liu, Cailiang Liu, Changhu Wang, and Lei Zhang
- Subjects
Information retrieval ,Robustness (computer science) ,Computer science ,Search algorithm ,Histogram ,LabelMe ,Image retrieval ,Episodic memory ,Blossom algorithm ,Sketch ,Semantic gap - Abstract
Specific images are images one has a certain episodic memory of, e.g., a picture one has seen before. Specific image retrieval is a frequent daily information need, and episodic memory is the key to finding a specific image. In this paper, we propose a novel semantic sketch-based interface that incorporates episodic memory for specific image retrieval. The interface allows a user to specify the semantic category and the rough area/color of the objects in his or her memory. To bridge the semantic gap between the query sketch and database images, in the back end a sampling method selects exemplars from a reference dataset containing many object instances with user-provided tags and bounding boxes. After that, an exemplar matching algorithm ranks images to retrieve the target image matching the user's memory. In practice, we have observed that query sketches are usually error-prone: the position or the color of an object may not be accurate. Meanwhile, the annotations in the reference dataset are also noisy. Thus, the search algorithm has to handle two kinds of errors: 1) label noise in the reference dataset, and 2) user sketch errors in position or scale. For the former, we propose a robust sampling method; for the latter, we derive an efficient spatial reranking algorithm that tolerates inaccurate user sketches. Detailed experimental results on the LabelMe dataset show that the proposed approach is robust to both kinds of errors.
- Published
- 2010
24. Probabilistic models for supervised dictionary learning
- Author
-
Bao-Liang Lu, Zhiwei Li, Lei Zhang, Changhu Wang, and Xiaochen Lian
- Subjects
K-SVD ,Contextual image classification ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Probabilistic logic ,Statistical model ,Pattern recognition ,Mixture model ,Logistic regression ,Machine learning ,computer.software_genre ,ComputingMethodologies_PATTERNRECOGNITION ,Discriminative model ,Categorization ,Computer Science::Computer Vision and Pattern Recognition ,Histogram ,Pyramid ,Pyramid (image processing) ,Artificial intelligence ,business ,computer - Abstract
Dictionary generation is a core technique of bag-of-visual-words (BOV) models applied to image categorization. Most previous approaches generate dictionaries with unsupervised clustering techniques, e.g., k-means. However, the features obtained with such dictionaries may not be optimal for image classification. In this paper, we propose a probabilistic model for supervised dictionary learning (SDLM) that seamlessly combines an unsupervised model (a Gaussian mixture model) and a supervised model (a logistic regression model) in one probabilistic framework. In the model, image category information directly affects the generation of the dictionary. A dictionary obtained by this approach is a trade-off between minimizing the distortion of clusters and maximizing the discriminative power of image-wise representations, i.e., histogram representations of images. We further extend the model to incorporate spatial information during dictionary learning, in a spatial-pyramid-matching-like manner. We extensively evaluated the two models on various benchmark datasets and obtained promising results.
- Published
- 2010
25. Spatial-bag-of-features
- Author
-
Changhu Wang, Lei Zhang, Yang Cao, Zhiwei Li, and Liqing Zhang
- Subjects
Computer science ,business.industry ,Search engine indexing ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Feature selection ,Pattern recognition ,Automatic image annotation ,Image texture ,Discriminative model ,Histogram ,Computer vision ,Visual Word ,Artificial intelligence ,Representation (mathematics) ,business ,Image retrieval ,Image resolution ,Feature detection (computer vision) - Abstract
In this paper, we study the problem of large-scale image retrieval by developing a new class of bag-of-features that encodes the geometric information of objects within an image. Going beyond the existing orderless bag-of-features, local features of an image are first projected onto different directions or points to generate a series of ordered bags-of-features, based on which different families of spatial bag-of-features are designed to capture invariance to object translation, rotation, and scaling. The most representative features are then selected by a boosting-like method to generate a new bag-of-features-like vector representation of the image. The proposed framework works well in image retrieval owing to three properties: 1) the encoding of objects' geometric information, capturing their spatial transformations; 2) the supervised feature selection and combination strategy, enhancing discriminative power; and 3) the bag-of-features representation, enabling effective image matching and indexing at large scale. Extensive experiments on 5,000 Oxford building images and 1 million Panoramio images show the effectiveness and efficiency of the proposed features and retrieval framework.
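The projection step described above can be sketched in a few lines, under simplifying assumptions: local features are taken as already quantized to visual-word ids, and a single linear projection direction with uniform bins stands in for the paper's families of projections. The function name and parameters are illustrative:

```python
import numpy as np

def spatial_bof(positions, words, direction, n_bins, vocab_size):
    """Project feature positions onto a direction, split the projected range
    into n_bins ordered segments, and build one visual-word histogram per
    segment; concatenating them yields an ordered bag-of-features vector."""
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words)
    proj = positions @ np.asarray(direction, dtype=float)
    # Uniform bin edges along the projection axis (tiny epsilon keeps the
    # maximum projection inside the last bin).
    edges = np.linspace(proj.min(), proj.max() + 1e-9, n_bins + 1)
    bins = np.clip(np.searchsorted(edges, proj, side="right") - 1, 0, n_bins - 1)
    hist = np.zeros((n_bins, vocab_size))
    for b, w in zip(bins, words):
        hist[b, w] += 1
    return hist.reshape(-1)
```

Unlike an orderless histogram, two images with the same words in different spatial layouts now produce different vectors, which is the geometric information the abstract refers to.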
- Published
- 2010
26. Multi-label sparse coding for automatic image annotation
- Author
-
Shuicheng Yan, Lei Zhang, Hong-Jiang Zhang, and Changhu Wang
- Subjects
Contextual image classification ,business.industry ,Computer science ,Dimensionality reduction ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,Iterative reconstruction ,Image segmentation ,Sparse approximation ,Mixture model ,Automatic image annotation ,Computer Science::Computer Vision and Pattern Recognition ,Computer vision ,Artificial intelligence ,Neural coding ,business ,Image retrieval ,Subspace topology - Abstract
In this paper, we present a multi-label sparse coding framework for feature extraction and classification within the context of automatic image annotation. First, each image is encoded into a so-called supervector, derived from universal Gaussian mixture models over orderless image patches. Then, a label-sparse-coding-based subspace learning algorithm is derived to effectively harness multi-label information for dimensionality reduction. Finally, a sparse coding method for multi-label data is proposed to propagate the multi-labels of the training images to the query image via the sparse l1 reconstruction coefficients. Extensive image annotation experiments on the Corel5k and Corel30k databases show the superior performance of the proposed multi-label sparse coding framework over state-of-the-art algorithms.
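The label-propagation step can be sketched generically: reconstruct the query feature as a sparse combination of training features, then transfer labels weighted by the coefficients. This sketch uses plain ISTA for the l1 problem and a simple normalized score; the function names, the use of ISTA, and the scoring rule are assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding operator, the proximal map of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def propagate_labels(F, Y, f_query, lam=0.05, iters=1000):
    """F: (d, n) training feature matrix; Y: (L, n) binary multi-label matrix.
    Sparsely reconstruct the query feature from training features via ISTA,
    then transfer labels with the reconstruction coefficients."""
    lr = 1.0 / np.linalg.norm(F, 2) ** 2           # step size from spectral norm
    w = np.zeros(F.shape[1])
    for _ in range(iters):                          # min 0.5||Fw-f||^2 + lam||w||_1
        w = soft(w - lr * (F.T @ (F @ w - f_query)), lr * lam)
    scores = Y @ np.abs(w)                          # label mass from sparse weights
    return scores / (scores.sum() + 1e-12)          # normalized relevance per label
```

Because only a few coefficients are nonzero, labels are inherited from a handful of visually similar training images rather than from the whole database.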
- Published
- 2009
27. A lexica family with small semantic gap
- Author
-
Jiemin Liu, Xiaokang Yang, Changhu Wang, Qi Tian, Lei Zhang, Shipeng Li, and Yijuan Lu
- Subjects
Consistency (database systems) ,Information retrieval ,Text mining ,business.industry ,Computer science ,Feature extraction ,Feature selection ,Space (commercial competition) ,Lexicon ,business ,Construct (philosophy) ,Image retrieval ,Semantic gap - Abstract
Defining a lexicon of high-level concepts is the first step for data collection and model construction in concept-based image retrieval. Differences of semantic gaps among concepts are well worth considering. By measuring consistency in visual space and textual space, concepts with small semantic gap can be obtained. Considering so many diverse concepts in large-scale image dataset, we construct a lexica family of high-level concepts with small semantic gap based on different low-level features and different consistency measurements. In this lexica family, the lexica are independent to each other and mutually complementary. It provides helpful suggestions about data collection, feature selection and search model construction for large-scale image retrieval.
- Published
- 2009
28. Multiplicative nonnegative graph embedding
- Author
-
Shuicheng Yan, Hong-Jiang Zhang, Changhu Wang, Zheng Song, and Lei Zhang
- Subjects
Combinatorics ,Discrete mathematics ,Factorization ,Graph embedding ,Multiplicative function ,MathematicsofComputing_NUMERICALANALYSIS ,Embedding ,Graph theory ,Tensor ,Nonnegative matrix ,MathematicsofComputing_DISCRETEMATHEMATICS ,Matrix decomposition ,Mathematics - Abstract
In this paper, we study the problem of nonnegative graph embedding, originally investigated in [J. Yang et al., 2008] to reap the benefits of both nonnegative data factorization and the specific purposes characterized by the intrinsic and penalty graphs. Our contributions are twofold. On the one hand, we present a multiplicative iterative procedure for nonnegative graph embedding, which significantly reduces the computational cost compared with the iterative procedure in [14], which involves the matrix inverse of an M-matrix. On the other hand, the nonnegative graph embedding framework is expressed more generally by encoding each datum as a tensor of arbitrary order, which yields a group of byproducts, e.g., a nonnegative discriminative tensor factorization algorithm with admissible time and memory cost. Extensive experiments against state-of-the-art algorithms for nonnegative data factorization, graph embedding, and tensor representation demonstrate the algorithm's computation speed, sparsity, discriminative power, and robustness to realistic image occlusions.
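The paper's specific update rules are not reproduced here; as a generic illustration of the "multiplicative iterative procedure" flavor of algorithm, the classic Lee-Seung updates for nonnegative matrix factorization show how elementwise multiplicative steps preserve nonnegativity without projections or matrix inverses:

```python
import numpy as np

def nmf_multiplicative(V, k, iters=300, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for V ~= W @ H with W, H >= 0.
    Each factor is scaled elementwise by a ratio of nonnegative terms,
    so nonnegativity is maintained automatically at every iteration."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1                   # positive initialization
    H = rng.random((k, n)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)       # update H, then W
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The embedding algorithm in the paper follows the same principle but with numerator and denominator terms derived from the intrinsic and penalty graphs rather than from a plain reconstruction objective.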
- Published
- 2009
29. Large scale natural image classification by sparsity exploration
- Author
-
Changhu Wang, Shuicheng Yan, and Hong-Jiang Zhang
- Subjects
Scale (ratio) ,Contextual image classification ,Computer science ,business.industry ,WordNet ,Sample (statistics) ,Sparse approximation ,computer.software_genre ,Machine learning ,k-nearest neighbors algorithm ,Discriminative model ,Robustness (computer science) ,Data mining ,Artificial intelligence ,business ,computer - Abstract
We consider in this paper the problem of large-scale natural image classification. With the explosion and popularity of images on the Internet, there is increasing interest in utilizing millions or even billions of these images to aid image-related research. Beyond the opportunities brought by nearly unlimited data, a great challenge is how to design more effective classification methods for these large-scale scenarios. Most existing attempts are based on the k-nearest-neighbor method. However, despite its optimistic performance in some tasks, this strategy suffers from the fact that a single fixed global parameter k is not robust across object classes from different semantic levels. In this paper, we propose an alternative method, called l1-nearest-neighbor, based on a sparse representation computed by l1-minimization. We first treat a testing sample as a sparse linear combination of all training samples, and then consider the related samples as the nearest neighbors of the testing sample. Finally, we classify the testing sample based on the majority class among these neighbors. We conduct extensive experiments on a 1.6 million natural image database at different semantic levels defined with WordNet, which demonstrate that the proposed l1-nearest-neighbor algorithm outperforms k-nearest-neighbor in two aspects: 1) robustness of parameter selection across semantic levels, and 2) discriminative capability for large-scale image classification.
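The l1-nearest-neighbor idea can be sketched directly from the description: solve an l1-regularized reconstruction of the test sample over the training columns, then vote with the classes carrying the reconstruction weight. This sketch uses a basic ISTA solver and weights votes by coefficient magnitude; the solver choice and the weighted vote are assumptions, since the abstract specifies neither:

```python
import numpy as np

def soft(x, t):
    # Soft-thresholding, the proximal operator of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(A, b, lam=0.05, iters=1000):
    """ISTA for min_x 0.5||Ax - b||^2 + lam ||x||_1 over training columns A."""
    lr = 1.0 / np.linalg.norm(A, 2) ** 2           # safe step from spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x - lr * (A.T @ (A @ x - b)), lr * lam)
    return x

def l1_nearest_neighbor(A, labels, b, lam=0.05):
    """Classify b by the class whose training samples (columns of A) carry
    the most l1 reconstruction weight: the sparse nonzero entries play the
    role of the neighbors, with no global k to tune."""
    x = sparse_code(A, b, lam)
    classes = np.unique(labels)
    votes = [np.abs(x[labels == c]).sum() for c in classes]
    return classes[int(np.argmax(votes))]
```

The point of the method is visible here: the number of "neighbors" adapts to the sample via sparsity, instead of being fixed by a global k.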
- Published
- 2009
30. Content-Based Image Annotation Refinement
- Author
-
Changhu Wang, Hong-Jiang Zhang, Feng Jing, and Lei Zhang
- Subjects
Information retrieval ,business.industry ,Computer science ,Markov process ,Set (abstract data type) ,Support vector machine ,symbols.namesake ,Annotation ,Automatic image annotation ,Feature (computer vision) ,symbols ,business ,Hidden Markov model ,Image retrieval ,Content management - Abstract
Automatic image annotation has been an active research topic due to its great importance in image retrieval and management. However, the results of state-of-the-art image annotation methods are often unsatisfactory. Besides continuous efforts to invent new annotation algorithms, it would be advantageous to have a dedicated approach that refines imprecise annotations. In this paper, such an approach to automatically refining the original annotations of images is proposed. For a query image, an existing image annotation method is first employed to obtain a set of candidate annotations. Then, the candidate annotations are re-ranked and only the top ones are retained as the final annotations. By formulating the refinement process as a Markov process, with the candidate annotations as the states of a Markov chain, a content-based image annotation refinement (CIAR) algorithm is proposed to re-rank the candidate annotations. It leverages both corpus information and the content features of the query image. Experimental results on a typical Corel dataset show not only the validity of the refinement but also the superiority of the proposed algorithm over existing ones.
- Published
- 2007
31. Realistic 3D Face Modeling by Fusing Multiple 2D Images
- Author
-
Shuicheng Yan, Hong-Jiang Zhang, Wei-Ying Ma, and Changhu Wang
- Subjects
2d images ,business.industry ,Estimation theory ,Computer science ,Efficient algorithm ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Texture model ,Pattern recognition ,Texture (geology) ,Face (geometry) ,Computer vision ,Artificial intelligence ,Quaternion ,business ,Representation (mathematics) ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
In this paper, we propose a fully automatic and efficient algorithm for realistic 3D face reconstruction by fusing multiple 2D face images. First, an efficient multi-view 2D face alignment algorithm is used to localize the facial points in the face images; then the intrinsic shape and texture models are inferred by the proposed Syncretized Shape Model (SSM) and Syncretized Texture Model (STM), respectively. Compared with related work, the proposed algorithm has the following characteristics: 1) the inferred shape and texture are more realistic, owing to the constraints and co-enhancement among the multiple images; 2) it is fully automatic, without any user interaction; and 3) shape and pose parameter estimation is efficient, via an EM approach and a unit-quaternion-based pose representation, and robust, thanks to the dynamic correspondence approach. Experimental results show the effectiveness of the proposed algorithm for 3D face reconstruction.
- Published
- 2005
32. Edgel index for large-scale sketch-based image search.
- Author
-
Yang Cao, Changhu Wang, Liqing Zhang, and Lei Zhang
- Published
- 2011
- Full Text
- View/download PDF
33. Shape-based web image clustering for unsupervised object detection
- Author
-
Wei Zheng, Changhu Wang, and Xilin Chen
- Published
- 2011
- Full Text
- View/download PDF
34. Spatial-bag-of-features.
- Author
-
Yang Cao, Changhu Wang, Zhiwei Li, Liqing Zhang, and Lei Zhang
- Published
- 2010
- Full Text
- View/download PDF
35. Probabilistic models for supervised dictionary learning.
- Author
-
Xiao-Chen Lian, Zhiwei Li, Changhu Wang, Bao-Liang Lu, and Lei Zhang
- Published
- 2010
- Full Text
- View/download PDF
36. Robust semantic sketch based specific image retrieval.
- Author
-
Cailiang Liu, Dong Wang, Xiaobing Liu, Changhu Wang, Lei Zhang, and Bo Zhang
- Published
- 2010
- Full Text
- View/download PDF
37. Multi-label sparse coding for automatic image annotation.
- Author
-
Changhu Wang, Shuicheng Yan, Lei Zhang, and Hong-Jiang Zhang
- Published
- 2009
- Full Text
- View/download PDF
38. Multiplicative nonnegative graph embedding.
- Author
-
Changhu Wang, Zheng Song, Shuicheng Yan, Lei Zhang, and Hong-Jiang Zhang
- Published
- 2009
- Full Text
- View/download PDF
39. Realistic 3D Face Modeling by Fusing Multiple 2D Images.
- Author
-
Changhu Wang, Shuicheng Yan, Hongjiang Zhang, and Weiying Ma
- Published
- 2005
- Full Text
- View/download PDF
40. Viewpoint-Aware Representation for Sketch-Based 3D Model Retrieval.
- Author
-
Changqing Zou, Changhu Wang, Yafei Wen, Lei Zhang, and Jianzhuang Liu
- Subjects
IMAGE retrieval ,THREE-dimensional modeling ,DESCRIPTOR systems ,EDGE detection (Image processing) ,QUERY (Information retrieval system) - Abstract
We study the problem of sketch-based 3D model retrieval and propose a solution powered by a new query-to-model distance metric and a powerful feature descriptor based on the bag-of-features framework. The main idea of the query-to-model distance metric is to represent a query sketch using a compact set of sample views (called basic views) of each model and to rank the models in ascending order of representation error. To better differentiate between relevant and irrelevant models, the representation is constrained to be essentially a combination of basic views with similar viewpoints. In addition, we propose a mid-level descriptor (called BOF-JESC), which robustly characterizes the edge information within junction-centered patches, to extract salient shape features from sketches and model views. The combination of the query-to-model distance metric and the BOF-JESC descriptor achieves effective results on two recent benchmark datasets.
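The core of the query-to-model distance can be sketched as follows: fit the query sketch feature as a linear combination of each model's basic-view features and rank models by the residual. This sketch drops the paper's constraint that the combination use views with similar viewpoints and assumes plain least squares; all names are illustrative:

```python
import numpy as np

def rank_models(query, model_views):
    """Rank 3D models by how well a compact set of basic-view feature
    vectors linearly represents the query sketch feature: a smaller
    least-squares residual means a better-matching model."""
    errors = []
    for name, views in model_views:
        V = np.asarray(views, dtype=float).T          # (d, n_views) view matrix
        coef, *_ = np.linalg.lstsq(V, query, rcond=None)
        errors.append((name, float(np.linalg.norm(V @ coef - query))))
    return sorted(errors, key=lambda kv: kv[1])       # ascending representation error
```

The viewpoint-consistency constraint in the paper matters because an unconstrained fit can mix views from opposite sides of a model; this sketch shows only the representation-error ranking itself.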
- Published
- 2014
- Full Text
- View/download PDF