77 results for "HanQing Lu"
Search Results
2. ROSE: Robust Caches for Amazon Product Search
- Author
-
Chen Luo, Vihan Lakshman, Anshumali Shrivastava, Tianyu Cao, Sreyashi Nag, Rahul Goutam, Hanqing Lu, Yiwei Song, and Bing Yin
- Published
- 2022
- Full Text
- View/download PDF
3. QUEACO
- Author
-
Qiang Yang, Yiwei Song, Tianyu Cao, Danqing Zhang, Bing Yin, Hanqing Lu, Chen Luo, Zheng Li, Tony Wu, and Tuo Zhao
- Subjects
Computation and Language (cs.CL), Information Retrieval (cs.IR), Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Named-entity recognition, Normalization, Canonical form, Data mining - Abstract
We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface-form attribute values and then transform them into formally canonical forms. The problem consists of two phases: named entity recognition (NER) and attribute value normalization (AVN). However, existing works focus only on the NER phase and neglect the equally important AVN. To bridge this gap, this paper proposes QUEACO, a unified query attribute value extraction system for e-commerce search that covers both phases. Moreover, by leveraging large-scale weakly-labeled behavior data, we further improve extraction performance at a lower supervision cost. Specifically, for the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network trained on the strongly-labeled data generates pseudo-labels to refine the weakly-labeled data for training a student network. Meanwhile, the teacher network can be dynamically adapted based on the student's performance on strongly-labeled data to maximally denoise the noisy supervision from the weak labels. For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface-form attribute values from queries into canonical forms from products. Extensive experiments on a real-world large-scale e-commerce dataset demonstrate the effectiveness of QUEACO. (Comment: The 30th ACM International Conference on Information and Knowledge Management, CIKM 2021, Applied Research Track.) A minimal code sketch of the teacher-student refinement follows this entry.
- Published
- 2021
- Full Text
- View/download PDF
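The teacher-student refinement described in the QUEACO abstract can be illustrated with a small sketch. This is not the authors' code: the tagger stub, data shapes, and loss weight below are hypothetical placeholders, and only the pseudo-labeling flow summarized in the abstract is reproduced.

```python
# Minimal sketch of teacher-student NER refinement with weak labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_tags, hidden = 9, 128

class TaggerStub(nn.Module):
    """Stand-in for a real NER encoder (e.g., a BiLSTM or BERT tagger)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(hidden, num_tags)
    def forward(self, token_feats):          # (batch, seq, hidden)
        return self.proj(token_feats)        # (batch, seq, num_tags)

teacher, student = TaggerStub(), TaggerStub()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def train_step(strong_x, strong_y, weak_x, pseudo_weight=0.5):
    # 1) Teacher refines the weakly-labeled batch into pseudo tag sequences.
    with torch.no_grad():
        pseudo_y = teacher(weak_x).argmax(-1)
    # 2) Student learns from strong labels plus refined pseudo labels.
    loss = F.cross_entropy(student(strong_x).flatten(0, 1), strong_y.flatten())
    loss = loss + pseudo_weight * F.cross_entropy(
        student(weak_x).flatten(0, 1), pseudo_y.flatten())
    opt.zero_grad(); loss.backward(); opt.step()
    # 3) In the paper the teacher is further adapted using the student's
    #    performance on strongly-labeled data; that feedback loop is omitted here.
    return loss.item()

strong_x = torch.randn(4, 16, hidden)
strong_y = torch.randint(0, num_tags, (4, 16))
weak_x = torch.randn(4, 16, hidden)
print(train_step(strong_x, strong_y, weak_x))
```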
4. Dual Hierarchical Temporal Convolutional Network with QA-Aware Dynamic Normalization for Video Story Question Answering
- Author
-
Richang Hong, Hanqing Lu, Xinxin Zhu, Fei Liu, and Jing Liu
- Subjects
Question answering, Multimodal interaction, Temporal scales, Normalization (image processing), Machine learning - Abstract
Video story question answering (video story QA) is a challenging problem, as it requires a joint understanding of diverse data sources (i.e., video, subtitle, question, and answer choices). Existing approaches for video story QA share several common defects: (1) a single temporal scale; (2) static and coarse multimodal interaction; and (3) insufficient (or shallow) exploitation of both question and answer choices. In this paper, we propose a novel framework named Dual Hierarchical Temporal Convolutional Network (DHTCN) to address these defects together. The proposed DHTCN explores multiple temporal scales by building a hierarchical temporal convolutional network. In each temporal convolutional layer, two key components, AttLSTM and QA-Aware Dynamic Normalization, are introduced to capture temporal dependency and multimodal interaction in a dynamic and fine-grained manner. To enable sufficient exploitation of both question and answer choices, we increase the depth of QA pairs with a stack of non-linear layers and exploit QA pairs in each layer of the network. Extensive experiments are conducted on two widely used datasets, TVQA and MovieQA, demonstrating the effectiveness of DHTCN. Our model obtains state-of-the-art results on both datasets.
- Published
- 2020
- Full Text
- View/download PDF
5. Gate-based Bidirectional Interactive Decoding Network for Scene Text Recognition
- Author
-
Hanqing Lu, Yunze Gao, Yingying Chen, and Jinqiao Wang
- Subjects
Text recognition, Encoder, Decoding methods, Context (language use), Image - Abstract
Scene text recognition has attracted rapidly increasing attention from the research community. Recent dominant approaches typically follow an attention-based encoder-decoder framework that uses a unidirectional decoder to decode in a left-to-right manner, ignoring the equally important right-to-left grammar information. In this paper, we propose a novel Gate-based Bidirectional Interactive Decoding Network (GBIDN) for scene text recognition. First, the backward decoder decodes from right to left and generates the reverse language context. The forward decoder then simultaneously utilizes the visual context from the image encoder and the reverse language context from the backward decoder through two attention modules. In this way, the bidirectional decoders interact effectively to fully fuse the bidirectional grammar information and further improve decoding quality. In addition, to relieve the adverse effect of noise, we devise a gated context mechanism to adaptively make use of the visual context and the reverse language context. Extensive experiments on various challenging benchmarks demonstrate the effectiveness of our method.
- Published
- 2019
- Full Text
- View/download PDF
6. Erasing-based Attention Learning for Visual Question Answering
- Author
-
Hanqing Lu, Fei Liu, Richang Hong, and Jing Liu
- Subjects
Question answering, Discriminative model, Margin (machine learning), Feature (machine learning), Constraint (information theory), Inference - Abstract
Attention learning for visual question answering remains a challenging task, and most existing methods treat the attention and the non-attention parts in isolation. In this paper, we propose to enforce the correlation between the attention and the non-attention parts as a constraint for attention learning. We first adopt an attention-guided erasing scheme to obtain the attention and the non-attention parts respectively, and then learn to separate them by an appropriate distance margin in a feature embedding space. Furthermore, we associate a typical classification loss with the above distance constraint to learn a more discriminative attention map for answer prediction. The proposed approach introduces no extra model parameters or inference complexity and can be combined with any attention-based model. Extensive ablation experiments validate the effectiveness of our method, and new state-of-the-art or competitive results are achieved on four publicly available datasets. (An illustrative sketch of the erasing-based margin constraint follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
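Below is an illustrative sketch of the distance-margin idea from the abstract: the attended and erased (non-attended) features are pushed apart in an embedding space while a standard classification loss predicts the answer. The shapes, the margin value, and the erasing rule (dropping the single most-attended region) are assumptions, not the paper's exact formulation.

```python
# Sketch: erasing-guided margin constraint plus answer classification loss.
import torch
import torch.nn.functional as F

def erasing_margin_loss(region_feats, attn, answer_logits, answer, margin=1.0):
    # region_feats: (B, R, D), attn: (B, R) softmax weights over image regions
    attended = (attn.unsqueeze(-1) * region_feats).sum(1)            # (B, D)
    # "Erase" the most-attended region and pool what remains.
    erased_attn = attn.clone()
    erased_attn.scatter_(1, attn.argmax(1, keepdim=True), 0.0)
    erased_attn = erased_attn / erased_attn.sum(1, keepdim=True).clamp_min(1e-6)
    non_attended = (erased_attn.unsqueeze(-1) * region_feats).sum(1)
    # Keep the two parts at least `margin` apart (hinge on their distance).
    dist = F.pairwise_distance(attended, non_attended)
    separation = F.relu(margin - dist).mean()
    classification = F.cross_entropy(answer_logits, answer)
    return classification + separation

B, R, D, A = 8, 36, 512, 1000
loss = erasing_margin_loss(torch.randn(B, R, D),
                           torch.softmax(torch.randn(B, R), 1),
                           torch.randn(B, A), torch.randint(0, A, (B,)))
print(loss.item())
```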
7. Enhancing Visual Question Answering Using Dropout
- Author
-
Yong Li, Jing Liu, Zhiwei Fang, Hanqing Lu, Yanyuan Qiao, and Qu Tang
- Subjects
Question answering, Dropout (neural networks), Overfitting, Variance, Inference, Machine learning - Abstract
Using dropout in Visual Question Answering (VQA) is a common practice to prevent overfitting. However, in multi-path networks, the usual way of applying dropout may cause two problems: the co-adaptation of neurons and the explosion of output variance. In this paper, we propose coherent dropout and siamese dropout to solve these two problems, respectively. Specifically, in coherent dropout, all relevant dropout layers in multiple paths are forced to work coherently to maximize the ability to prevent neuron co-adaptation. We show that coherent dropout is simple to implement yet very effective in overcoming overfitting. As for the explosion of output variance, we develop a siamese dropout mechanism to explicitly minimize the difference between the two output vectors produced from the same input data during the training phase. This mechanism reduces the gap between the training and inference phases and makes the VQA model more robust. Extensive experiments are conducted to verify the effectiveness of coherent dropout and siamese dropout, and the results show that our methods bring additional improvements to state-of-the-art VQA models. (A rough sketch of the two dropout variants follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
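The following is a rough sketch of the two dropout variants named in the abstract. "Coherent dropout" is read here as sharing one dropout mask across parallel paths, and "siamese dropout" as penalizing the difference between two stochastic forward passes of the same input; the rates, shapes, and MSE penalty are assumptions rather than the paper's implementation.

```python
# Sketch: coherent dropout (shared mask) and a siamese-dropout consistency penalty.
import torch
import torch.nn.functional as F

def coherent_dropout(paths, p=0.3, training=True):
    """Apply the SAME Bernoulli mask to every path of a multi-path network."""
    if not training:
        return paths
    mask = (torch.rand_like(paths[0]) > p).float() / (1.0 - p)
    return [x * mask for x in paths]

def siamese_dropout_penalty(model, x):
    """Two dropout-perturbed passes of the same input should agree."""
    out1, out2 = model(x), model(x)      # dropout is stochastic inside `model`
    return F.mse_loss(out1, out2)

# Tiny usage example with made-up tensors.
visual, textual = torch.randn(4, 256), torch.randn(4, 256)
v, t = coherent_dropout([visual, textual], p=0.3)
model = torch.nn.Sequential(torch.nn.Dropout(0.3), torch.nn.Linear(256, 10))
model.train()
print(siamese_dropout_penalty(model, visual).item())
```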
8. Pseudo Label based Unsupervised Deep Discriminative Hashing for Image Retrieval
- Author
-
Jiaxiang Wu, Jian Cheng, Hanqing Lu, Qinghao Hu, and Lifang Wu
- Subjects
Hash function, Locality-sensitive hashing, Feature hashing, Quantization (signal processing), Discriminative model, Image retrieval, Pattern recognition - Abstract
Hashing methods play an important role in large-scale image retrieval. Traditional hashing methods use hand-crafted features to learn hash functions, which cannot capture high-level semantic information. Deep hashing algorithms use deep neural networks to learn feature representations and hash functions simultaneously. Most of these algorithms exploit supervised information to train the deep network; however, supervised information is expensive to obtain. In this paper, we propose a pseudo-label based unsupervised deep discriminative hashing algorithm. First, we cluster images via K-means and treat the cluster labels as pseudo labels. Then we train a deep hashing network with the pseudo labels by minimizing a classification loss and a quantization loss. Experiments on two datasets demonstrate that our unsupervised deep discriminative hashing method outperforms state-of-the-art unsupervised hashing methods. (A toy sketch of this pipeline follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
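A toy sketch of the pipeline described in the abstract: cluster pre-extracted features with K-means, treat cluster ids as pseudo labels, then train a hashing layer with a classification loss plus a quantization loss pushing codes toward ±1. The single-layer network, bit length, and loss weight are illustrative choices, not the paper's architecture.

```python
# Sketch: pseudo-label (K-means) based unsupervised deep hashing.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

feat_dim, n_bits, n_clusters = 512, 48, 10
features = torch.randn(1000, feat_dim)                 # pre-extracted image features

pseudo_labels = torch.as_tensor(
    KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    .fit_predict(features.numpy())).long()

hash_layer = nn.Linear(feat_dim, n_bits)
classifier = nn.Linear(n_bits, n_clusters)
opt = torch.optim.Adam(list(hash_layer.parameters()) + list(classifier.parameters()),
                       lr=1e-3)

for _ in range(5):                                      # a few toy epochs
    codes = torch.tanh(hash_layer(features))            # relaxed binary codes in (-1, 1)
    cls_loss = F.cross_entropy(classifier(codes), pseudo_labels)
    quant_loss = (codes.abs() - 1.0).pow(2).mean()      # push activations toward ±1
    loss = cls_loss + 0.1 * quant_loss
    opt.zero_grad(); loss.backward(); opt.step()

binary_codes = torch.sign(torch.tanh(hash_layer(features)))   # final retrieval codes
print(binary_codes.shape)
```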
9. RSVP
- Author
-
Han Yu, Ruihe Qian, Guanghui Ren, Hanqing Lu, Changhu Wang, Si Liu, and Yao Sun
- Subjects
Parsing, Segmentation, Video tracking, Deep learning - Abstract
In this demo, we present a real-time surveillance video parsing (RSVP) system. Surveillance video parsing, which aims to segment video frames into several labeled regions, e.g., face, pants, left leg, has wide applications, especially in the security field. However, it is very tedious and time-consuming to annotate all the frames in a video. Our RSVP system parses surveillance videos in real time and requires only one labeled frame in the training stage. It jointly considers the segmentation of preceding frames when parsing a particular frame within the video, and it proves effective and efficient in real applications.
- Published
- 2017
- Full Text
- View/download PDF
10. Sketch-based Image Retrieval using Generative Adversarial Networks
- Author
-
Longteng Guo, Hanqing Lu, Wei Wen, Yuhang Wang, Jing Liu, and Zhonghua Luo
- Subjects
Sketch, Real image, Feature (computer vision), Encoder, Consistency, Image retrieval - Abstract
For sketch-based image retrieval (SBIR), we propose a generative adversarial network trained on a large number of sketches and their corresponding real images. To imitate the human search process, we attempt to match candidate images with the imaginary image in the user's mind rather than with the sketch query itself, i.e., not only the shape information of sketches but also their possible content information is considered in SBIR. Specifically, a conditional generative adversarial network (cGAN) is employed to enrich the content information of sketches and recover the imaginary images, and two VGG-based encoders, which work on real and imaginary images respectively, are used to constrain their perceptual consistency from the view of feature representations. During SBIR, we first generate an imaginary image from a given sketch via the cGAN, and then take the output of the learned encoder for imaginary images as the feature of the query sketch. Finally, we build an interactive SBIR system that shows encouraging performance. (A retrieval-time sketch of this pipeline follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
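A retrieval-time sketch of the pipeline in the abstract: a cGAN generator turns the query sketch into an "imaginary" image, a learned encoder embeds it, and gallery images are ranked by feature similarity. The generator/encoder stubs and the cosine ranking below are assumptions standing in for the trained models.

```python
# Sketch: SBIR query flow — sketch -> imaginary image -> feature -> ranked gallery.
import torch
import torch.nn.functional as F

class GeneratorStub(torch.nn.Module):        # stands in for the trained cGAN generator
    def forward(self, sketch):               # (B, 1, 64, 64) -> (B, 3, 64, 64)
        return sketch.repeat(1, 3, 1, 1)

class EncoderStub(torch.nn.Module):          # stands in for the VGG-based encoder
    def forward(self, image):                # (B, 3, 64, 64) -> (B, 128)
        return image.flatten(1)[:, :128]

def retrieve(sketch, gallery_feats, generator, encoder, topk=5):
    imaginary = generator(sketch)                        # enrich sketch content
    query = F.normalize(encoder(imaginary), dim=1)       # feature of the query
    sims = query @ F.normalize(gallery_feats, dim=1).T   # cosine similarity
    return sims.topk(topk, dim=1).indices                # ranked gallery indices

sketch = torch.rand(1, 1, 64, 64)
gallery = torch.randn(1000, 128)                         # pre-computed image features
print(retrieve(sketch, gallery, GeneratorStub(), EncoderStub()))
```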
11. Learning Max-Margin GeoSocial Multimedia Network Representations for Point-of-Interest Suggestion
- Author
-
Fei Wu, Jun Xiao, Min Yang, Hanqing Lu, Zhou Zhao, Yueting Zhuang, and Qifan Yang
- Subjects
Point of interest, Multimedia network, Social relation, Margin (machine learning), Metric (mathematics), Web service, Mobile device, Feature learning - Abstract
With the rapid development of mobile devices, point-of-interest (POI) suggestion has become a popular online web service that provides attractive and interesting locations to users. To provide interesting POIs, many existing POI recommendation works learn latent representations of users and POIs from users' past visits, which suffer from the sparsity of POI data. In this paper, we consider the problem of POI suggestion from the viewpoint of learning geosocial multimedia network representations. We propose a novel max-margin metric geosocial multimedia network representation learning framework that exploits users' check-in behavior and their social relations, and we develop a random-walk based learning method with max-margin metric network embedding. We evaluate the performance of our method on a large-scale geosocial multimedia network dataset and show that it outperforms other state-of-the-art solutions.
- Published
- 2017
- Full Text
- View/download PDF
12. Deep learning driven hypergraph representation for image-based emotion recognition
- Author
-
Hanqing Lu and Yuchi Huang
- Subjects
Hypergraph, Deep learning, Convolutional neural network, Ground truth, Feature (machine learning), Representation - Abstract
In this paper, we propose a two-stage framework for image-based emotion recognition that combines the advantages of deep convolutional neural networks (D-CNN) and hypergraphs. To exploit the representational power of the D-CNN, we remodel its last hidden feature layer as an 'attribute' layer in which each hidden unit produces a probability for a specific semantic attribute. To describe the high-order relationships among facial images, each face is assigned to various hyperedges according to the computed probabilities on different D-CNN attributes. In this way, we tackle emotion prediction with a transductive learning approach, which tends to assign the same label to faces that share many incidental hyperedges (attributes), under the constraint that the predicted labels of training samples should be similar to their ground-truth labels. We compared the proposed approach to state-of-the-art methods, and its effectiveness was demonstrated by extensive experimentation.
- Published
- 2016
- Full Text
- View/download PDF
13. Objectness-aware Semantic Segmentation
- Author
-
Junjie Yan, Hanqing Lu, Yuhang Wang, Jing Liu, and Yong Li
- Subjects
Convolutional neural network, Artificial neural network, Upsampling, Segmentation, Test data - Abstract
Recent advances in semantic segmentation are driven by the success of fully convolutional networks (FCN). However, the coarse label map produced by the network and its limited object discrimination ability weaken the performance of FCN-based models. To address these issues, we propose an objectness-aware semantic segmentation framework (OA-Seg) that jointly learns an object proposal network (OPN) and a lightweight deconvolutional neural network (Light-DCNN). First, the OPN is learned with a fully convolutional architecture to simultaneously predict object bounding boxes and their objectness scores. Second, we design a Light-DCNN to provide a finer upsampling path than FCN. The Light-DCNN is constructed from the convolutional layers of VGG-net and their mirrored deconvolutional structure, with all fully-connected layers removed, and hierarchical classification layers are added to multi-scale deconvolutional features to introduce more contextual information for pixel-wise label prediction. Compared with previous works, our approach markedly reduces model size and convergence time. Thorough evaluations are performed on the PASCAL VOC 2012 benchmark, and our model yields impressive results on its validation data (70.3% mean IoU) and test data (74.1% mean IoU).
- Published
- 2016
- Full Text
- View/download PDF
14. Partial Multi-Modal Sparse Coding via Adaptive Similarity Structure Regularization
- Author
-
Hanqing Lu, Yueting Zhuang, Xiaofei He, Cai Deng, and Zhou Zhao
- Subjects
Sparse approximation, Neural coding, Regularization (mathematics), Similarity (network science), Modality (human–computer interaction) - Abstract
Multi-modal sparse coding has played an important role in many multimedia applications, where data usually come with multiple modalities. Recently, various multi-modal sparse coding approaches have been proposed to learn sparse codes of multi-modal data, which assume that data appear in all modalities, or at least that there is one modality containing all data. However, in real applications it is often the case that some modalities of the data suffer from missing information, resulting in partial multi-modality data. In this paper, we propose to solve the partial multi-modal sparse coding problem via multi-modal similarity structure regularization. Specifically, we propose a partial multi-modal sparse coding framework termed Adaptive Partial Multi-Modal Similarity Structure Regularization for Sparse Coding (AdaPM2SC), which preserves the similarity structure within the same modality and between different modalities. Experimental results on two real-world datasets demonstrate that AdaPM2SC significantly outperforms state-of-the-art methods under the partial multi-modality scenario.
- Published
- 2016
- Full Text
- View/download PDF
15. Deep People Counting with Faster R-CNN and Correlation Tracking
- Author
-
Jinqiao Wang, Hanqing Lu, Baocai Yin, Zhiqiang Li, Yikai Fang, Huazhong Xu, and La Zhang
- Subjects
Crowd counting, Detector, Correlation filter, Convolutional neural network, Deep learning, Minimum bounding box - Abstract
Crowd counting is a key problem for many computer vision tasks, yet most existing methods count people by regression with hand-crafted features. Recently, the fast development of deep learning has produced many promising detectors for generic object classes. In this paper, to effectively leverage the discriminability of convolutional neural networks, we propose a people counting method based on Faster R-CNN [9] head-shoulder detection and correlation tracking. First, we train a Faster R-CNN head-shoulder detector with the Zeiler model to detect people with multiple poses and views. Next, we employ a kernelized correlation filter (KCF) [7] to track the people and obtain their trajectories. Considering the results of both detection and tracking, we fuse the two sets of bounding boxes to obtain continuous and stable trajectories. Extensive experiments and comparisons show the promise of the proposed approach. (A simplified sketch of the detection-tracking box fusion follows this entry.)
- Published
- 2016
- Full Text
- View/download PDF
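Below is a simplified sketch of fusing per-frame head-shoulder detections with tracked boxes, in the spirit of the abstract: a tracked box is kept when no detection overlaps it, otherwise the detection takes precedence. The IoU threshold and the (x1, y1, x2, y2) box format are assumptions, not the paper's exact fusion rule.

```python
# Sketch: merge detector boxes and tracker boxes by IoU de-duplication.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_boxes(detections, tracked, thr=0.5):
    """Union of detections and tracked boxes, de-duplicated by IoU."""
    fused = list(detections)
    for t in tracked:
        if all(iou(t, d) < thr for d in detections):
            fused.append(t)          # tracker bridges a missed detection
    return fused

dets = [(10, 10, 50, 60), (100, 40, 140, 100)]
trks = [(12, 11, 52, 62), (200, 50, 240, 110)]   # second track has no matching detection
print(len(fuse_boxes(dets, trks)))               # -> 3 people in this frame
```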
16. Object-aware Deep Network for Commodity Image Retrieval
- Author
-
Yong Li, Jing Liu, Hanqing Lu, Song Hang, Jinhui Tang, Zhiwei Fang, and Yuhang Wang
- Subjects
Commodity, Image retrieval, Object detection, Feature learning, Ranking (information retrieval), Information retrieval - Abstract
In recent years, with the development of e-commerce and the popularity of mobile phones, image-based commodity retrieval has attracted much attention. This paper proposes a deep framework for commodity image retrieval (CMIR) that matches commodities of the same design. Our framework captures as many design details as possible by exploring object detection and ranking-sensitive feature learning, where the former is performed with Faster R-CNN and the latter is learned with a multi-task Siamese network. We also optimize the processing speed of the framework to make it a live system. The framework is implemented as an Android application with a client/server architecture whose server response time is about 150 ms per query.
- Published
- 2016
- Full Text
- View/download PDF
17. Personalized Recommendation Meets Your Next Favorite
- Author
-
Ting Yuan, Hanqing Lu, Jian Cheng, and Qiang Song
- Subjects
Topic model, Recommender system, Collaborative filtering, Ranking (information retrieval), Personalization, Graph (abstract data type) - Abstract
A comprehensive understanding of users' item selection behavior is not only essential to many scientific disciplines but also has a profound business impact on online recommendation. Recent research has found that a user's favorites can be divided into two categories, long-term and short-term, and that item selection behavior is a mixed decision of the two. In this paper, we propose a unified model, the States Transition pAir-wise Ranking model (STAR), to mine users' favorites for sequential-set recommendation. Our method utilizes a transition graph for collaborative filtering that mines a user's short-term favorites, combined with a generative topic model that expresses the user's long-term favorites. Furthermore, a user-specific prior is introduced into the unified model to better capture personalization. Technically, we develop a pair-wise ranking loss function for parameter learning. Empirically, we measure the effectiveness of our method on two real-world datasets, and the results show that it outperforms state-of-the-art methods.
- Published
- 2015
- Full Text
- View/download PDF
18. Exclusive Constrained Discriminative Learning for Weakly-Supervised Semantic Segmentation
- Author
-
Hanqing Lu, Songde Ma, Jin Liu, and Peng Ying
- Subjects
Segmentation, Discriminative learning, Constraint (information theory), Smoothing, LabelMe - Abstract
How to use image-level labels as weak supervision to direct the region-level labeling task is the core problem of weakly-supervised semantic segmentation. In this paper, we focus on designing an effective yet simple weakly-supervised constraint and propose an exclusive constrained discriminative learning model for image semantic segmentation. Specifically, we employ a discriminative linear regression model to assign subsets of superpixels with different labels. During the assignment, we construct an exclusive weakly-supervised constraint term to suppress the labeling responses of each superpixel on labels outside its parent image-level label set. In addition, a spectral smoothing term is integrated to encourage both visually and semantically similar superpixels to have similar labels. Combining these terms, we formulate the problem as a convex objective function, which can be easily optimized via alternating iterations. Extensive experiments on the MSRC-21 and LabelMe datasets demonstrate the effectiveness of the proposed model.
- Published
- 2015
- Full Text
- View/download PDF
19. Semi- and Weakly- Supervised Semantic Segmentation with Deep Convolutional Neural Networks
- Author
-
Hanqing Lu, Yong Li, Jing Liu, and Yuhang Wang
- Subjects
Semi-supervised learning, Convolutional neural network, Segmentation, Labeled data, LabelMe - Abstract
Successful semantic segmentation methods typically rely on training datasets containing a large number of pixel-wise labeled images. To alleviate the dependence on such fully annotated training data, in this paper we propose a semi- and weakly-supervised learning framework that explores images mostly with only image-level labels and very few with pixel-level labels, in which two stages of Convolutional Neural Network (CNN) training are included. First, a pixel-level supervised CNN is trained on the very few fully annotated images. Second, given a large number of images with only image-level labels available, a collaborative-supervised CNN is designed to jointly perform the pixel-level and image-level classification tasks, with the pixel-level labels predicted by the fully-supervised network from the first stage. The collaborative-supervised network retains the discriminative ability of the fully-supervised model learned with fully labeled images and further enhances performance by importing more weakly labeled data. Our experiments on two challenging datasets, i.e., PASCAL VOC 2007 and LabelMe LMO, demonstrate the satisfactory performance of our approach, nearly matching the results achieved when all training images have pixel-level labels.
- Published
- 2015
- Full Text
- View/download PDF
20. Spatio-Temporal Triangular-Chain CRF for Activity Recognition
- Author
-
Yifan Zhang, Congqi Cao, and Hanqing Lu
- Subjects
Activity recognition, CRFs, Hierarchical model - Abstract
Understanding human activities in video is a fundamental problem in computer vision. In real life, human activities are composed of temporal and spatial arrangements of actions. Understanding such complex activities requires recognizing not only each individual action but, more importantly, capturing their spatio-temporal relationships. This paper addresses the problem of complex activity recognition with a unified hierarchical model. We expand triangular-chain CRFs (TriCRFs) to the spatial dimension. The proposed architecture can be perceived as a spatio-temporal version of the TriCRFs, in which the labels of actions and activities are modeled jointly and their complex dependencies are exploited. Experiments show that our model generates promising results, outperforming competing methods significantly. The framework can also be applied to model other structured sequential data.
- Published
- 2015
- Full Text
- View/download PDF
21. Learning Multi-view Deep Features for Small Object Retrieval in Surveillance Scenarios
- Author
-
Zheng-Jun Zha, Hanqing Lu, Haiyun Guo, Min Xu, and Jinqiao Wang
- Subjects
Hash function, Convolutional neural network, Discriminative model, Feature (computer vision), Object (computer science), Representation, Visual objects - Abstract
With the explosive growth of surveillance videos, object retrieval has become a significant task for security monitoring. However, visual objects in surveillance videos are usually of small size with complex lighting conditions, view changes and partial occlusions, which increases the difficulty of efficiently retrieving objects of interest in a large-scale dataset. Although deep features have achieved promising results on object classification and retrieval and have been verified to contain rich semantic structure information, they lack adequate color information, which is as crucial as structure information for effective object representation. In this paper, we propose to leverage a discriminative Convolutional Neural Network (CNN) to learn deep structure and color features to form an efficient multi-view object representation. Specifically, we utilize a CNN trained on ImageNet to abstract rich semantic structure information. Meanwhile, we propose a CNN model supervised by 11 color names to extract deep color features. Compared with traditional color descriptors, deep color features can capture the common color property across different illumination conditions. Then, the complementary multi-view deep features are encoded into short binary codes by Locality-Sensitive Hashing (LSH) and fused to retrieve objects. Retrieval experiments are performed on a dataset of 100k objects extracted from multi-camera surveillance videos. Comparison results with several popular visual descriptors show the effectiveness of the proposed approach.
- Published
- 2015
- Full Text
- View/download PDF
22. Incremental Matrix Factorization via Feature Space Re-learning for Recommender System
- Author
-
Qiang Song, Hanqing Lu, and Jian Cheng
- Subjects
Matrix factorization, Matrix decomposition, Recommender system, Feature vector - Abstract
Matrix factorization is widely used in recommender systems. Although existing incremental matrix factorization methods are effective in reducing time complexity, they simply assume that the similarity between items or users is invariant. For instance, they keep the item feature matrix unchanged and just update the user matrix without re-training the entire model. However, as new users arrive continuously, the fitting error accumulates because the extra distribution information of items is not utilized. In this paper, we present an alternative and reasonable approach, with the relaxed assumption that the similarity between items (users) is relatively stable after updating. Concretely, utilizing the prediction error on the new data as auxiliary features, our method updates both feature matrices simultaneously, so users' preferences can be modeled better than by merely adjusting one corresponding feature matrix. Our method also keeps the feature dimension small by taking advantage of matrix sketching. Experimental results show that our proposal outperforms existing incremental matrix factorization methods.
- Published
- 2015
- Full Text
- View/download PDF
23. 60 Hz self-tuning background modeling
- Author
-
Hanqing Lu, Jun Luo, La Zhang, Jinqiao Wang, Yingying Chen, and Huazhong Xu
- Subjects
Background subtraction, Change detection, Pixel, Self-tuning, Preprocessor - Abstract
Background modeling, or change detection, is often used as a preprocessing step in many computer vision tasks, especially intelligent surveillance. Although various methods have been proposed for this problem, they often involve complex parameter settings and adapt poorly to scene changes. In this paper, we propose a fast and robust approach for background modeling with self-adaptive ability. As in ViBe [7], each pixel model is represented by a set of historical samples based on sample consensus. To adapt to various changes in complex scenes, a flexible feedback scheme is presented to automatically adjust the model parameters. Moreover, a selective diffusion method is employed to overcome problems such as incomplete foregrounds or false detections caused by intermittent moving objects. Experimental results on the ChangeDetection 2014 benchmark show that the proposed approach outperforms state-of-the-art approaches at a speed of 60 fps on a CPU for a 640 × 480 image sequence. (A toy sketch of the sample-consensus model with threshold feedback follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
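A NumPy toy sketch of the sample-consensus idea the abstract builds on (as in ViBe): a pixel is background if it is close to at least a few of its stored samples, and the decision threshold is nudged per pixel as a crude stand-in for the paper's feedback scheme. The sample count, radii, and update rates are illustrative assumptions, not the paper's parameters.

```python
# Sketch: per-pixel sample-consensus background model with a simple feedback rule.
import numpy as np

H, W, N = 120, 160, 20                       # frame size, samples per pixel
rng = np.random.default_rng(0)
samples = rng.integers(0, 256, size=(N, H, W)).astype(np.float32)
radius = np.full((H, W), 20.0, dtype=np.float32)   # per-pixel decision threshold

def segment(frame, min_matches=2, lr=0.05):
    matches = (np.abs(samples - frame[None]) < radius[None]).sum(axis=0)
    foreground = matches < min_matches
    # Feedback: relax the threshold where the scene looks dynamic, tighten elsewhere.
    radius[foreground] *= (1.0 + lr)
    radius[~foreground] = np.maximum(radius[~foreground] * (1.0 - lr), 10.0)
    # Conservative model update: overwrite one random sample at background pixels.
    idx = rng.integers(0, N)
    samples[idx][~foreground] = frame[~foreground]
    return foreground

frame = rng.integers(0, 256, size=(H, W)).astype(np.float32)
print(segment(frame).mean())                 # fraction of pixels flagged as foreground
```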
24. Concurrent group activity classification with context modeling
- Author
-
Wei Fu, Jian Cheng, Hanqing Lu, Jinqiao Wang, Jing Liu, and Chaoyang Zhao
- Subjects
Group activity, Context model, Feature (machine learning), Duration time - Abstract
Group activity classification is the task of identifying activities involving multiple persons, which often requires context information such as person relationships and person interactions. In this paper, we propose a novel approach to jointly model three co-existing cues: activity duration, individual action features, and the context information shared among person interactions. Our approach infers the group activity labels of all persons together with their activity durations, especially in situations where multiple group activities co-exist. Experimental results show that our approach outperforms the state of the art by 10%.
- Published
- 2015
- Full Text
- View/download PDF
25. When Personalization Meets Conformity
- Author
-
Shuang Qiu, Hanqing Lu, Zhenfeng Zhu, Jian Cheng, and Xi Zhang
- Subjects
Recommender system, Personalization, Conformity, Similarity, Context (language use) - Abstract
Existing recommender systems emphasize personalization to achieve promising accuracy. However, in a multi-domain context, users are likely to follow the behaviors of domain authorities. This conformity effect provides a wealth of prior knowledge for multi-domain recommendation, but it has not been fully exploited. In particular, users whose behaviors are significantly similar to the public tastes can be viewed as domain authorities. To detect these users and, at the same time, embed conformity into recommendation, a domain-specific similarity matrix is employed, so that a collective similarity is obtained to combine conformity with personalization. In this paper, we establish a Collective Structure Sparse Representation (CSSR) method for multi-domain recommendation. Based on an adaptive k-Nearest-Neighbor framework, we impose lasso and group-lasso penalties together with a least-squares loss to jointly optimize the collective similarity. Experimental results on real-world data confirm the effectiveness of the proposed method.
- Published
- 2015
- Full Text
- View/download PDF
26. Mobile Media Thumbnailing
- Author
-
Yingying Chen, Jinqiao Wang, Hanqing Lu, and Jing Liu
- Subjects
Thumbnail, Video browsing, Mobile media, Mobile device, Image warping, Display device - Abstract
With the development of multimedia and Internet techniques, rapidly growing visual data, such as images and videos, need to be shown and browsed as thumbnails on various digital display platforms, such as PCs and cell phones. This demonstration presents a grid-based adaptive media thumbnailing approach to maximize the user experience of mobile image and video browsing. After representative frame extraction by spectral clustering and salient region detection, we obtain thumbnails with three resizing operators, cropping, warping and scaling, and adaptively fuse them into a unified grid-based convex programming problem that can be solved efficiently through numerical optimization. Extensive experiments and comparisons on a HUAWEI Honor 6 and a Samsung S5 demonstrate that the proposed method achieves excellent information preservation for thumbnails on mobile devices.
- Published
- 2015
- Full Text
- View/download PDF
27. Exploring Heterogeneity for Multi-Domain Recommendation with Decisive Factors Selection
- Author
-
Hanqing Lu, Shuang Qiu, Xi Zhang, and Jian Cheng
- Subjects
Multi-domain, Consistency, Data mining - Abstract
To address recommendation in multi-domain scenarios, we propose a novel method, HMRec, which models both the consistency and the heterogeneity of users' multiple behaviors in a unified framework. Moreover, the decisive factors of each domain can also be captured by our approach. Experiments on a real multi-domain dataset demonstrate the effectiveness of our model.
- Published
- 2015
- Full Text
- View/download PDF
28. Mask Assisted Object Coding with Deep Learning for Object Retrieval in Surveillance Videos
- Author
-
Hanqing Lu, Min Xu, Jinqiao Wang, and Kezhen Teng
- Subjects
Autoencoder, Deep learning, Background noise, Robustness, Video tracking, Visual objects - Abstract
Retrieving visual objects from a large-scale video dataset is a major focus of multimedia research, but it remains challenging due to imprecise object extraction and partial occlusion. This paper presents a novel approach to efficiently encode and retrieve visual objects that addresses some practical complications in surveillance videos. Specifically, we take advantage of mask information to assist object representation, and develop an encoding method that utilizes the highly nonlinear mapping of a deep neural network. Furthermore, we add occlusion noise into the learning process to enhance robustness to background noise and partial occlusion. A real-life surveillance video dataset containing over 10 million objects is built to evaluate the proposed approach. Experimental results show that our approach significantly outperforms state-of-the-art solutions for object retrieval in large-scale video datasets.
- Published
- 2014
- Full Text
- View/download PDF
29. Supervised Hashing with Soft Constraints
- Author
-
Cong Leng, Jian Cheng, Jiaxiang Wu, Xi Zhang, and Hanqing Lu
- Subjects
Hash function, Hamming distance, Hamming space, Semantic similarity, Regularization (mathematics), Semantic gap, Boosting (machine learning) - Abstract
Due to its ability to preserve semantic similarity in Hamming space, supervised hashing has been extensively studied recently. Most existing approaches encourage two dissimilar samples to have maximum Hamming distance. This may lead to an unexpected consequence: two samples that are not necessarily similar end up with the same code if they are both dissimilar to a third sample. Besides, in existing methods all labeled pairs are treated with equal importance without considering the semantic gap, which is not conducive to thoroughly leveraging the supervised information. We present a general framework for supervised hashing that addresses these two limitations. We do not strictly require a dissimilar pair to have maximum Hamming distance; instead, a soft constraint, which can be viewed as a regularization to avoid over-fitting, is utilized. Moreover, we impose different weights on different training pairs, and these weights can be automatically adjusted during learning. Experiments on two benchmarks show that the proposed method easily outperforms other state-of-the-art methods. (A toy sketch of a weighted, soft-margin pairwise hashing loss follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
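A toy sketch of the two ideas in the abstract: dissimilar pairs are only pushed apart up to a soft margin rather than to maximum Hamming distance, and each labeled pair carries its own weight. The tanh-relaxed codes, margin, and weighting scheme are assumptions, not the paper's formulation.

```python
# Sketch: weighted pairwise hashing loss with a soft constraint on dissimilar pairs.
import torch
import torch.nn.functional as F

def soft_pairwise_hash_loss(codes_a, codes_b, similar, weights, margin=0.0):
    # codes_*: (P, bits) tanh-relaxed codes for the two sides of each pair
    # similar: (P,) 1 for similar pairs, 0 for dissimilar; weights: (P,)
    bits = codes_a.size(1)
    sim = (codes_a * codes_b).sum(1) / bits            # code agreement in [-1, 1]
    pos = 1.0 - sim                                     # pull similar pairs together
    neg = F.relu(sim - margin)                          # push dissimilar apart, softly
    per_pair = torch.where(similar.bool(), pos, neg)
    return (weights * per_pair).mean()

P, bits = 32, 48
a, b = torch.tanh(torch.randn(P, bits)), torch.tanh(torch.randn(P, bits))
similar = torch.randint(0, 2, (P,)).float()
weights = torch.ones(P)                                 # could be adapted during training
print(soft_pairwise_hash_loss(a, b, similar, weights).item())
```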
30. Less is More
- Author
-
Hanqing Lu, Jian Cheng, and Xi Zhang
- Subjects
Preference elicitation, Preference learning, Cold start, Recommender system, MovieLens - Abstract
Cold-start recommendation is a challenging but crucial problem for recommender systems. Preference elicitation, a commonly used approach to this problem, solicits the preferences of cold users by interviewing them with a set of carefully selected items. Selecting as few items as possible while reflecting user preferences as fully as possible is the essential goal of preference elicitation. In this paper, we propose a novel Structured Sparse Representative Selection (SSRS) model to select a sparse set of items based on their representativeness. Moreover, an ℓ2,1-norm is applied to both the loss function and the regularization to make the model insensitive to outliers and to avoid selecting redundant queries, respectively. Empirical results on the benchmark movie rating datasets MovieLens and Flixster verify the promising performance of our preference elicitation method for cold-start recommendation.
- Published
- 2014
- Full Text
- View/download PDF
31. Group latent factor model for recommendation with multiple user behaviors
- Author
-
Jinqiao Wang, Jian Cheng, Ting Yuan, and Hanqing Lu
- Subjects
Collaborative filtering, Recommender system, Matrix decomposition, Linear subspaces - Abstract
Recently, some recommendation methods have tried to relieve the data sparsity problem of collaborative filtering by exploiting data from users' multiple types of behaviors. However, most existing methods mainly model the correlation between different behaviors and ignore their heterogeneity, which may cause improper information to be transferred and harm the recommendation results. To address this problem, we propose a novel recommendation model, named Group Latent Factor Model (GLFM), which learns a factorization of the latent factor space into subspaces that are shared across multiple behaviors and subspaces that are specific to each type of behavior. Thus, the correlation and heterogeneity of multiple behaviors can be modeled by these shared and specific latent factors. Experiments on a real-world dataset demonstrate that our model better integrates users' multiple types of behaviors into recommendation.
- Published
- 2014
- Full Text
- View/download PDF
32. Item group based pairwise preference learning for personalized ranking
- Author
-
Ting Yuan, Cong Leng, Hanqing Lu, Shuang Qiu, and Jian Cheng
- Subjects
Preference learning, Pairwise comparison, Collaborative filtering, Ranking, k-nearest neighbors - Abstract
Collaborative filtering with implicit feedback has been steadily receiving more attention, since abundant implicit feedback is easily collected while explicit feedback is not always available. Several recent works address this problem with pairwise ranking methods under the fundamental assumption that a user prefers items with positive feedback to items without observed feedback, which also implies that items without observed feedback are treated equally without distinction. However, users prefer different items to different degrees, and these degrees can be modeled as a ranking relationship. In this paper, we exploit this prior information about a user's preferences from the nearest-neighbor set via the neighbors' implicit feedback, which splits items into different groups with specific ranking relations. We propose a novel PRIGP (Personalized Ranking with Item Group based Pairwise preference learning) algorithm to integrate item-based pairwise preference and item-group-based pairwise preference into the same framework. Experimental results on three real-world datasets demonstrate that the proposed method outperforms competitive baselines on several ranking-oriented evaluation metrics. (A sketch of a group-based pairwise ranking step follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
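A sketch of group-based pairwise preference in the spirit of the abstract: a user's items are split into ordered groups (own positives > items liked only by nearest neighbors > the rest), and a BPR-style log-sigmoid loss is applied to pairs drawn from a higher versus a lower group. The factor sizes and the grouping rule are illustrative assumptions.

```python
# Sketch: one update of matrix-factorization scores under a group ranking constraint.
import torch

n_users, n_items, dim = 100, 500, 16
U = torch.randn(n_users, dim, requires_grad=True)
V = torch.randn(n_items, dim, requires_grad=True)
opt = torch.optim.SGD([U, V], lr=0.05)

def group_pairwise_step(user, higher_items, lower_items):
    """Every item in the higher group should outscore every item in the lower group."""
    scores_hi = U[user] @ V[higher_items].T                  # (H,)
    scores_lo = U[user] @ V[lower_items].T                   # (L,)
    diff = scores_hi.unsqueeze(1) - scores_lo.unsqueeze(0)   # (H, L) pairwise gaps
    loss = -torch.nn.functional.logsigmoid(diff).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

positives = torch.tensor([3, 17, 42])          # user's own positive feedback
neighbor_liked = torch.tensor([5, 9])          # liked by nearest neighbors only
print(group_pairwise_step(user=7, higher_items=positives, lower_items=neighbor_liked))
```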
33. Random subspace for binary codes learning in large scale image retrieval
- Author
-
Hanqing Lu, Jian Cheng, and Cong Leng
- Subjects
Hash function, Binary code, Nearest neighbor search, Feature vector, Concatenation, Image retrieval - Abstract
Due to their fast query speed and low storage cost, hashing-based approximate nearest neighbor search methods have attracted much attention recently. Many state-of-the-art methods are based on eigenvalue decomposition. In these approaches, the information captured in different dimensions is unbalanced, and most of it is concentrated in the top eigenvectors. We demonstrate that this leads to an unexpected phenomenon: a longer hashing code does not necessarily yield better performance. In this work, we introduce a random subspace strategy to address this limitation. Each time, a small fraction of the whole feature space is randomly sampled to train the hashing algorithm, and only the top eigenvectors are kept to generate one piece of short code. This process is repeated several times, and the resulting pieces of short code are concatenated into one long code. Theoretical analysis and experiments on two benchmarks confirm the effectiveness of the proposed strategy for hashing. (A NumPy sketch of this strategy follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
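A NumPy sketch of the random-subspace strategy described in the abstract: repeatedly sample a fraction of the feature dimensions, keep only the top eigenvectors of that subspace (plain PCA hashing is used here as a simple stand-in for the base hasher), and concatenate the resulting short codes into one long code. The fraction, round count, and bits per round are assumptions.

```python
# Sketch: random-subspace short codes concatenated into one long binary code.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 512))          # database features

def random_subspace_codes(X, rounds=8, frac=0.25, bits_per_round=8):
    pieces = []
    for _ in range(rounds):
        dims = rng.choice(X.shape[1], int(frac * X.shape[1]), replace=False)
        sub = X[:, dims] - X[:, dims].mean(0)
        # Top eigenvectors of the subspace covariance -> one piece of short code.
        _, vecs = np.linalg.eigh(np.cov(sub, rowvar=False))
        proj = sub @ vecs[:, -bits_per_round:]
        pieces.append((proj > 0).astype(np.uint8))
    return np.concatenate(pieces, axis=1)     # long code = concatenated short codes

codes = random_subspace_codes(X)
print(codes.shape)                            # (2000, 64) binary code
```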
34. Fusing multi-modal features for gesture recognition
- Author
-
Hanqing Lu, Jian Cheng, Jiaxiang Wu, and Chaoyang Zhao
- Subjects
Gesture recognition, Sign language, Hidden Markov model, Dynamic time warping, Levenshtein distance, Precision and recall - Abstract
This paper proposes a novel multi-modal gesture recognition framework and introduces its application to continuous sign language recognition. A Hidden Markov Model is used to construct the audio feature classifier, and a skeleton feature classifier based on Dynamic Time Warping is trained to provide complementary information. The confidence scores generated by the two classifiers are first normalized and then combined into a weighted sum for the final recognition. Experimental results show that the precision and recall of our multi-modal recognition framework over 20 classes reach 0.8829 and 0.8890, respectively, which indicates that our method can correctly reject false detections made by a single classifier. Our approach scored 0.12756 in mean Levenshtein distance and was ranked 1st in the 2013 Multi-modal Gesture Recognition Challenge. (A minimal sketch of the score fusion follows this entry.)
- Published
- 2013
- Full Text
- View/download PDF
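A minimal sketch of the late fusion described in the abstract: the HMM (audio) and DTW (skeleton) classifiers each produce per-class confidence scores, which are normalized and combined by a weighted sum before taking the best class. The min-max normalization and the 0.6/0.4 weights are assumptions.

```python
# Sketch: normalize two classifiers' confidence scores and fuse by weighted sum.
import numpy as np

def fuse_scores(audio_scores, skeleton_scores, w_audio=0.6):
    def norm(s):                      # min-max normalize each classifier's scores
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min() + 1e-9)
    fused = w_audio * norm(audio_scores) + (1 - w_audio) * norm(skeleton_scores)
    return int(fused.argmax()), fused

audio = np.random.rand(20)            # confidence over 20 gesture classes
skeleton = np.random.rand(20)
label, fused = fuse_scores(audio, skeleton)
print(label)
```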
35. Object co-segmentation via discriminative low rank matrix recovery
- Author
-
Yong Li, Jing Liu, Hanqing Lu, Zechao Li, and Yang Liu
- Subjects
Segmentation, Low-rank approximation, Discriminative model, Saliency, Object (computer science) - Abstract
The goal of this paper is to simultaneously segment the object regions appearing in a set of images of the same object class, known as object co-segmentation. Different from typical methods, which simply assume that the regions common among images are the object regions, we additionally consider the disturbance from consistent backgrounds and require object regions to be not only common but also salient among images. To this end, we propose a Discriminative Low Rank matrix Recovery (DLRR) algorithm to divide the over-segmented regions (i.e., superpixels) of a given image set into object and non-object ones. In DLRR, a low-rank matrix recovery term is adopted to detect salient regions in an image, while a discriminative learning term is used to distinguish the object regions from all the superpixels. An additional regularization term is introduced to jointly measure the disagreement between the predicted saliency and the objectness probability of each superpixel in the image set. For the unified learning problem connecting the above three terms, we design an efficient optimization procedure based on block-coordinate descent. Extensive experiments are conducted on two public datasets, MSRC and iCoseg, and comparisons with several state-of-the-art methods demonstrate the effectiveness of our work.
- Published
- 2013
- Full Text
- View/download PDF
36. TCRec
- Author
-
Jing Liu, Yu Jiang, Zechao Li, Hanqing Lu, and Xi Zhang
- Subjects
Product category, Social trust, Information retrieval - Abstract
In this paper, we develop a novel product recommendation method called TCRec, which takes advantage of consumers' rating histories, the social-trust network and product category information simultaneously. Comparative experiments are conducted on two real-world datasets and outstanding performance is achieved, which demonstrates the effectiveness of TCRec.
- Published
- 2013
- Full Text
- View/download PDF
37. TopRec
- Author
-
Hanqing Lu, Biao Niu, Xi Zhang, Jian Cheng, and Ting Yuan
- Subjects
Topic model, Social network, Collaborative filtering, Recommender system, Preference - Abstract
Traditionally, collaborative filtering assumes that similar users have similar responses to similar items. However, human activities exhibit heterogeneous features across multiple domains, such that users who have similar tastes in one domain may behave quite differently in other domains. Moreover, highly sparse data presents a crucial challenge for preference prediction. Intuitively, if users' domains of interest are captured first, the recommender system is more likely to provide items they enjoy while filtering out uninteresting ones. Therefore, it is necessary to learn preference profiles from the correlated domains instead of the entire user-item matrix. In this paper, we propose a unified framework, TopRec, which detects topical communities to construct interpretable domains for domain-specific collaborative filtering. To mine communities as well as their corresponding topics, a semi-supervised probabilistic topic model is utilized, integrating user guidance with the social network. Experimental results on real-world data from Epinions and Ciao demonstrate the effectiveness of the proposed framework.
- Published
- 2013
- Full Text
- View/download PDF
38. Real-time multiple object instances detection
- Author
-
Jinqiao Wang, Hanqing Lu, Yifan Zhang, and Chengli Xie
- Subjects
Template matching, Pattern recognition, Classifier - Abstract
In this paper, we present a novel real-time system for multiple object instance detection via template matching and pairwise classification. Instance detection aims to find and locate exactly the same object instances as those specified. Our system is composed of two heterogeneous stages: the first adopts instance-specific detection to generate candidates, and the second uses a pairwise classifier across instance categories to test and verify these candidates with respect to the templates. Experiments show the superiority of our approach.
- Published
- 2012
- Full Text
- View/download PDF
39. Hi, magic closet, tell me what to wear!
- Author
-
Shuicheng Yan, Zheng Song, Hanqing Lu, Jiashi Feng, Si Liu, Tianzhu Zhang, and Changsheng Xu
- Subjects
Banquet ,Information retrieval ,Computer science ,business.industry ,media_common.quotation_subject ,Magic (programming) ,Closet ,Clothing ,business ,Magic (paranormal) ,GeneralLiterature_MISCELLANEOUS ,media_common - Abstract
In this paper, we aim at a practical system, magic closet, for automatic occasion-oriented clothing recommendation. Given a user-input occasion, e.g., wedding, shopping or dating, magic closet intelligently suggests the most suitable clothing from the user's own clothing photo album, or automatically pairs user-specified reference clothing (upper-body or lower-body) with the most suitable item from online shops. Two key criteria are explicitly considered in the magic closet system. One criterion is to wear properly, e.g., compared to suit pants, a cocktail dress is more decent for a banquet occasion. The other criterion is to wear aesthetically, e.g., a red T-shirt matches white pants better than green pants. To narrow the semantic gap between the low-level features of clothing and the high-level occasion categories, we adopt middle-level clothing attributes (e.g., clothing category, color, pattern) as a bridge. More specifically, the clothing attributes are treated as latent variables in our proposed latent Support Vector Machine (SVM) based recommendation model. The wearing-properly criterion is captured through a feature-occasion potential and an attribute-occasion potential, while the wearing-aesthetically criterion is expressed by an attribute-attribute potential. To learn a model that generalizes well and to evaluate it comprehensively, we collect a large clothing What-to-Wear (WoW) dataset and thoroughly annotate it with 7 multi-value clothing attributes and 10 occasion categories via Amazon Mechanical Turk. Extensive experiments on the WoW dataset demonstrate the effectiveness of the magic closet system for both occasion-oriented clothing recommendation and pairing.
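A toy sketch of the scoring structure described above: attributes act as latent variables, and the score of an (outfit, occasion) pair is the sum of a feature-occasion, an attribute-occasion and an attribute-attribute potential, maximized over latent attribute assignments. The potentials below are random stand-ins, not the learned latent-SVM weights from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n_feat, n_attr_vals, n_occ = 16, 4, 3                       # toy sizes (assumptions)
w_feat_occ = rng.standard_normal((n_occ, n_feat))           # feature-occasion potential
w_attr_occ = rng.standard_normal((n_occ, 2, n_attr_vals))   # attribute-occasion (upper/lower)
w_attr_attr = rng.standard_normal((n_attr_vals, n_attr_vals))  # attribute-attribute

def score(x_upper, x_lower, occasion):
    """Score an outfit for an occasion, maximizing over latent attribute assignments."""
    feat_term = w_feat_occ[occasion] @ np.concatenate([x_upper, x_lower])
    best = -np.inf
    for a_up, a_low in itertools.product(range(n_attr_vals), repeat=2):
        s = (feat_term
             + w_attr_occ[occasion, 0, a_up]      # "wear properly": attribute vs. occasion
             + w_attr_occ[occasion, 1, a_low]
             + w_attr_attr[a_up, a_low])          # "wear aesthetically": attribute matching
        best = max(best, s)
    return best

x_up, x_low = rng.standard_normal(8), rng.standard_normal(8)
print(score(x_up, x_low, occasion=0))
```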
- Published
- 2012
- Full Text
- View/download PDF
40. Social tag alignment with image regions by sparse reconstructions
- Author
-
Yang Liu, Zechao Li, Hanqing Lu, Jing Liu, and Biao Niu
- Subjects
Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer vision ,Artificial intelligence ,Neural coding ,business ,Image retrieval ,Image (mathematics) ,Feature detection (computer vision) ,Task (project management) - Abstract
How to align social tags with image regions without additional human intervention is a challenging but valuable task, since it can provide more detailed image semantic information and improve the accuracy of image retrieval. To this end, we propose a novel tag-to-region method with two phases of sparse reconstruction that explores large-scale user-contributed resources. Given an image with social tags, we first exploit the tagging information of large-scale social images to sparsely reconstruct the label vector of the given image, and use the reconstruction weights as the semantic relevance of those images to the given one. With the top T semantically relevant images, we further employ a group sparse coding algorithm to reconstruct each region of the given image, in which the regions from the social images sharing a common label are treated as one label group. The group sparsity encodes the assumption that one image region corresponds to as few tags as possible. Finally, the region-level tags are predicted based on the reconstruction errors in the corresponding label groups. Extensive experiments on the MSRC and SAIAPR TC-12 datasets demonstrate the encouraging performance of our method in comparison with other baselines.
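A minimal sketch of the first phase only: sparsely reconstruct the query image's tag vector from the tag vectors of the social image collection and read the weights as semantic relevance. An L1-penalized regression (scikit-learn's Lasso) stands in for the paper's sparse coder, and the group sparse coding over regions is omitted; the dictionary here is random toy data.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n_tags, n_social_images = 50, 200
# Columns: binary tag vectors of the large-scale social image collection (toy data).
D = (rng.random((n_tags, n_social_images)) < 0.1).astype(float)
y = (rng.random(n_tags) < 0.1).astype(float)         # tag vector of the query image

# Sparse, non-negative reconstruction; the weights serve as semantic relevance scores.
lasso = Lasso(alpha=0.01, positive=True, max_iter=5000)
lasso.fit(D, y)
relevance = lasso.coef_                               # one weight per social image
top_T = np.argsort(relevance)[::-1][:10]              # the T most relevant images
print(top_T, relevance[top_T])
```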
- Published
- 2012
- Full Text
- View/download PDF
41. Low rank metric learning for social image retrieval
- Author
-
Zechao Li, Yu Jiang, Jinhui Tang, Jing Liu, and Hanqing Lu
- Subjects
Computer science ,business.industry ,Pattern recognition ,Overfitting ,Machine learning ,computer.software_genre ,Contextual design ,Semantic similarity ,Convex optimization ,Metric (mathematics) ,Leverage (statistics) ,Artificial intelligence ,business ,computer ,Image retrieval - Abstract
With the popularity of social media applications, large amounts of social images associated with rich context are available, which is helpful for many applications. In this paper, we propose a Low Rank distance Metric Learning (LRML) algorithm that discovers knowledge from these rich contextual data to boost the performance of CBIR. Different from traditional approaches that often use must-links and cannot-links between images, the proposed method exploits information from both the visual and textual domains. We assume that the visual similarity estimated by the learned metric should be consistent with the semantic similarity in the textual domain. Since tags are often noisy, misspelled or meaningless, we also leverage the preservation of visual structure to prevent overfitting to those noisy tags. In addition, the metric is directly constrained to be low rank. We formulate the problem as a convex optimization with nuclear norm minimization and propose an effective optimization algorithm based on the proximal gradient method. Experimental evaluations of image retrieval with the learned metric on a real-world dataset demonstrate that our approach outperforms related work.
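The nuclear-norm part of such an optimization reduces to singular value soft-thresholding inside a proximal gradient loop. The sketch below shows that generic building block with a placeholder quadratic loss; it is not the paper's actual objective built from textual/visual consistency constraints.

```python
import numpy as np

def prox_nuclear(M, tau):
    """Proximal operator of tau * ||M||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def proximal_gradient_step(M, grad_loss, step, tau):
    """One step of proximal gradient descent for  loss(M) + tau * ||M||_*."""
    return prox_nuclear(M - step * grad_loss(M), step * tau)

# Toy example: pull the matrix toward an arbitrary target (placeholder loss gradient).
rng = np.random.default_rng(5)
target = rng.standard_normal((20, 20))
grad = lambda M: M - target               # gradient of 0.5 * ||M - target||_F^2
M = np.zeros((20, 20))
for _ in range(100):
    M = proximal_gradient_step(M, grad, step=0.5, tau=1.0)
print(np.linalg.matrix_rank(M, tol=1e-6))  # typically much lower than 20
```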
- Published
- 2012
- Full Text
- View/download PDF
42. Point-context descriptor based region search for logo recognition
- Author
-
Hanqing Lu, Jianlong Fu, Yifan Zhang, and Jinqiao Wang
- Subjects
Logo recognition ,Point (typography) ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Mode (statistics) ,Context (language use) ,Logo ,Inverted index ,Image (mathematics) ,Constraint (information theory) ,Computer vision ,Artificial intelligence ,business - Abstract
We propose a novel approach for logo recognition in this paper. Firstly, we adopt a point-context descriptor for region modeling, which is a highly correlated integration of three kinds of features: point, shape, and patch. Secondly, after the query image is segmented into region trees, an asymmetric region-to-image search approach is utilized for visual logo recognition. A weak region-based geometric constraint is encoded into the inverted file structure to accelerate the search. We then apply global features to refine the results in the re-ranking stage. Finally, we combine the region scores in both max-response and accumulate-response modes to obtain the final results. To evaluate the performance, we test the proposed approach on both our challenging logo dataset and the Flickr_Logos dataset. Experiments and comparisons show that our approach is superior to state-of-the-art approaches.
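A toy sketch of an inverted file keyed by visual word, where each posting also carries the region it came from so that region-level checks can be applied at query time. The visual word ids, positions and scoring are placeholders; the point-context descriptor and quantizer are not reproduced.

```python
from collections import defaultdict

# Posting list: visual_word -> list of (image_id, region_id, keypoint_position).
inverted_index = defaultdict(list)

def index_region(image_id, region_id, visual_words_with_pos):
    for word, pos in visual_words_with_pos:
        inverted_index[word].append((image_id, region_id, pos))

def query(region_words):
    """Vote for (image, region) pairs sharing visual words with the query region."""
    votes = defaultdict(int)
    for word, _ in region_words:
        for image_id, region_id, _ in inverted_index[word]:
            votes[(image_id, region_id)] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

# Toy usage with made-up visual word ids and 2-D keypoint positions.
index_region("img1", 0, [(3, (10, 12)), (7, (40, 8))])
index_region("img2", 1, [(3, (5, 5)), (9, (22, 30))])
print(query([(3, (0, 0)), (9, (1, 1))]))
```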
- Published
- 2012
- Full Text
- View/download PDF
43. Chat with illustration
- Author
-
Hanqing Lu, Changsheng Xu, Zechao Li, Yu Jiang, and Jing Liu
- Subjects
User studies ,Scheme (programming language) ,World Wide Web ,Service (systems architecture) ,Index (publishing) ,Computer science ,First language ,Language barrier ,Context (language use) ,computer ,Sentence ,computer.programming_language - Abstract
Traditional instant messaging services mainly transfer textual messages, while visual messages are ignored to a great extent. In this paper, we propose a novel instant messaging scheme with visual aids named Chat with Illustration (CWI), which automatically presents users with visual messages associated with the chat content. When users start their chat, the system first identifies meaningful keywords from the dialogue content and analyzes context relations. CWI then performs keyword-based image search in an image database with a cluster-based index. Finally, according to the context relations, CWI assembles these images properly and presents an optimal visual message for each dialogue sentence. With the combination of textual and visual messages, users can enjoy a more interesting and vivid communication experience. Especially for speakers of different native languages, CWI can help cross the language barrier to some degree. In-depth user studies demonstrate the effectiveness of our approach.
- Published
- 2012
- Full Text
- View/download PDF
44. Multiple features fusion for crowd density estimation
- Author
-
Hanqing Lu, Zhenchong Wang, Jinqiao Wang, and Zi Ye
- Subjects
Fusion ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Wavelet transform ,Pattern recognition ,Rule-based system ,Support vector machine ,Background noise ,Robustness (computer science) ,Computer vision ,Artificial intelligence ,Crowd density ,business ,Statistic - Abstract
Crowd density estimation is valuable for intelligent crowd monitoring. Traditional approaches based on static texture analysis of single frames do not cope well with complex backgrounds, and rule-based statistical approaches lack robustness to background noise. In this paper, we propose a crowd density estimation approach that fuses statistical features with texture analysis. After extracting foreground objects with frame differencing, we learn SVM classifiers on GLCM and statistical features. The experimental results show the superiority of the proposed method, which can be applied in complex environments.
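A minimal sketch of the texture half of such a pipeline: GLCM (gray-level co-occurrence matrix) statistics per frame patch fed to an SVM. Foreground extraction by frame differencing and the fusion with crowd-count statistics are omitted, and the training patches and labels are synthetic.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # named 'greycomatrix' in older scikit-image
from sklearn.svm import SVC

def glcm_features(patch):
    """Contrast/homogeneity/energy/correlation of an 8-bit grayscale patch."""
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

# Synthetic patches stand in for low/medium/high crowd density frames.
rng = np.random.default_rng(6)
X, y = [], []
for label, scale in enumerate([10, 60, 120]):
    for _ in range(20):
        patch = (rng.random((64, 64)) * scale).astype(np.uint8)
        X.append(glcm_features(patch))
        y.append(label)

clf = SVC(kernel="rbf").fit(np.array(X), np.array(y))
print(clf.predict([glcm_features((rng.random((64, 64)) * 60).astype(np.uint8))]))
```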
- Published
- 2012
- Full Text
- View/download PDF
45. Ordinal preserving projection
- Author
-
Jing Liu, Changsheng Xu, Hanqing Lu, Yan Liu, Changsheng Li, and Qingshan Liu
- Subjects
Computer science ,business.industry ,Dimensionality reduction ,Pattern recognition ,computer.software_genre ,Image (mathematics) ,Data set ,Ranking ,Learning to rank ,Artificial intelligence ,Data mining ,Projection (set theory) ,business ,computer ,Subspace topology ,Curse of dimensionality - Abstract
Learning to rank has been demonstrated as a powerful tool for image ranking, but the "curse of dimensionality" is a key challenge when learning a ranking model from a large image database. This paper proposes a novel dimensionality reduction algorithm named ordinal preserving projection (OPP) for learning to rank. We first define two matrices, which work in the row direction and the column direction respectively, aiming to leverage the global structure of the data set and the ordinal information of the observations. By maximizing the corresponding objective functions, we obtain two optimal projection matrices that map the original data points into a low-dimensional subspace in which both the global structure and the ordinal information are preserved. Experiments are conducted on the publicly available MSRA-MM image data set and the "Web Queries" image data set, and the experimental results demonstrate the effectiveness of the proposed method.
- Published
- 2012
- Full Text
- View/download PDF
46. News contextualization with geographic and visual information
- Author
-
Changsheng Xu, Jing Liu, Meng Wang, Zechao Li, and Hanqing Lu
- Subjects
Set (abstract data type) ,Contextualization ,Information retrieval ,Computer science ,Reading (process) ,media_common.quotation_subject ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Relevance (information retrieval) ,Matrix decomposition ,Image (mathematics) ,media_common - Abstract
In this paper, we investigate the contextualization of news documents with geographic and visual information. We propose a matrix factorization approach to analyze the location relevance for each news document. We also propose a method to enrich the document with a set of web images. For location relevance analysis, we first perform toponym extraction and expansion to obtain a toponym list from news documents. We then propose a matrix factorization method to estimate the location-document relevance scores while simultaneously capturing the correlation of locations and documents. For image enrichment, we propose a method to generate multiple queries from each news document for image search and then employ an intelligent fusion approach to collect a set of images from the search results. Based on the location relevance analysis and image enrichment, we introduce a news browsing system named NewsMap which can support users in reading news via browsing a map and retrieving news with location queries. The news documents with the corresponding enriched images are presented to help users quickly get information. Extensive experiments demonstrate the effectiveness of our approaches.
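The paper's factorization model is not reproduced here; as a rough stand-in, the sketch below completes a sparse location-document relevance matrix with an off-the-shelf non-negative matrix factorization, so that documents sharing correlated locations reinforce each other. The matrix, its interpretation and all parameters are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(7)
n_locations, n_docs = 30, 100
# Sparse initial relevance scores (e.g., toponym match counts); zeros mean "unknown".
R = rng.poisson(0.2, (n_locations, n_docs)).astype(float)

model = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(R)               # location factors
H = model.components_                    # document factors
R_hat = W @ H                            # dense, smoothed location-document relevance
print(R_hat[:, 0].argsort()[::-1][:3])   # top locations for document 0
```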
- Published
- 2011
- Full Text
- View/download PDF
47. Snap & play
- Author
-
Changsheng Xu, Si Liu, Hanqing Lu, Jian Dong, Shuicheng Yan, and Qiang Chen
- Subjects
Entertainment ,Game design ,Video game development ,Multimedia ,Computer science ,ComputingMilieux_PERSONALCOMPUTING ,computer.software_genre ,Game Developer ,Adaptation (computer science) ,computer ,Mobile device - Abstract
According to the Entertainment Software Association's 2010 report [5], 42% of USA heads of households reported playing games on mobile devices, rising quickly from 20% in 2002 and creating a huge market for mobile games. In this paper, taking the popular game Find-the-Difference (FiDi) as a concrete example, we explore new mobile game design principles and techniques for enhancing the player's gaming experience in personalized, automatic, and dynamic aspects. Unlike the traditional FiDi game, where image pairs (source image vs. target image) with M different patches are manually produced by the game developer and players may feel bored or cheat after practicing all image pairs, our proposed Personalized FiDi (P-FiDi) mobile game can be played in a new Snap & Play mode. The player first takes photos with one's mobile device (or selects from one's own albums). These photos serve as source images, and the P-FiDi system automatically generates the counterpart target images by sequential operations of aesthetic image quality enhancement, joint selection of image patches and differentiating styles, music adaptation, dynamic difficulty level determination, and finally automatic image editing with a rich set of popular differentiating styles used in the traditional FiDi game. The player thus enjoys unique gaming with one's own (instant) photos and music, and the freedom to obtain new gaming image pairs at any time. User studies show that the P-FiDi mobile game is satisfying in terms of player experience.
- Published
- 2011
- Full Text
- View/download PDF
48. Specific vehicle detection and tracking in road environment
- Author
-
Yang Zhang, Hanqing Lu, Jinqiao Wang, Wei Fu, and Huazhong Xu
- Subjects
Background subtraction ,Svm classifier ,Kernel (image processing) ,Computer science ,business.industry ,Vehicle detection ,Histogram ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer vision ,Artificial intelligence ,business - Abstract
In this paper, we propose a real-time method to detect and track specific vehicles, toward monitoring abnormal activities in the traffic environment. Firstly, a novel background subtraction approach is used to obtain accurate foreground segmentation with shadow suppression. Then an HIK (Histogram Intersection Kernel) based SVM classifier is trained to recognize whether a vehicle is suspicious. Finally, CamShift-based tracking is used to rapidly track the specific vehicles. Experiments in a real traffic scenario show the promise of the proposed approach.
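The HIK SVM piece can be reproduced with scikit-learn's support for callable kernels, as sketched below on synthetic histogram features. The background subtraction and CamShift stages (available in OpenCV as createBackgroundSubtractorMOG2 and CamShift) are not shown, and the labels are toy data rather than vehicle annotations.

```python
import numpy as np
from sklearn.svm import SVC

def histogram_intersection(X, Y):
    """Gram matrix of the histogram intersection kernel: sum_k min(x_k, y_k)."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

# Synthetic normalized histograms standing in for vehicle appearance features.
rng = np.random.default_rng(8)
X = rng.random((120, 32))
X /= X.sum(axis=1, keepdims=True)
y = rng.integers(0, 2, 120)              # 1 = suspicious vehicle (toy labels)

clf = SVC(kernel=histogram_intersection).fit(X, y)
print(clf.predict(X[:5]))
```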
- Published
- 2011
- Full Text
- View/download PDF
49. Feature selection under learning to rank model for multimedia retrieval
- Author
-
Ling Shao, Changsheng Xu, Hanqing Lu, and Changsheng Li
- Subjects
Multimedia ,business.industry ,Computer science ,Dimensionality reduction ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Feature extraction ,Pattern recognition ,Feature selection ,computer.software_genre ,Feature model ,Feature (computer vision) ,Ranking SVM ,Learning to rank ,Artificial intelligence ,business ,Feature learning ,computer - Abstract
Most multimedia retrieval problems can be described by a ranking model, i.e., the images in the database are ranked according to their similarity to the query image. Existing ranking models generally use features that are pre-defined by experts. This paper uses machine learning techniques to automatically select useful features for ranking. We first generate a set of feature subsets by putting each feature into an individual subset. Then we sort these feature subsets according to their ranking performance. Third, neighboring feature subsets in the ranked order are paired to generate new feature subsets, which are again sorted by ranking performance. This process iterates until a pre-defined stopping point is reached. Experimental results on the .gov dataset and the Caltech101 development set show the effectiveness and efficiency of the proposed algorithm.
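A toy sketch of the greedy procedure described above: score each feature subset with a ranking metric (NDCG here), sort the subsets, merge neighbours in the ranked order, and repeat. The scoring rule (summed feature values as the ranking score), the data and the stopping point are all simplifications, not the paper's setup.

```python
import numpy as np
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(9)
n_docs, n_feats = 200, 12
X = rng.random((n_docs, n_feats))
relevance = rng.integers(0, 4, n_docs)              # graded relevance labels (toy)

def subset_ndcg(subset):
    """Rank documents by the summed feature values of the subset and score with NDCG."""
    scores = X[:, list(subset)].sum(axis=1)
    return ndcg_score(relevance.reshape(1, -1), scores.reshape(1, -1))

subsets = [frozenset([f]) for f in range(n_feats)]  # start: one subset per feature
for _ in range(3):                                  # placeholder stopping point
    subsets.sort(key=subset_ndcg, reverse=True)
    # Pair neighbouring subsets in the ranked order to form the next generation.
    subsets = [subsets[i] | subsets[i + 1] for i in range(len(subsets) - 1)]

best = max(subsets, key=subset_ndcg)
print(sorted(best), subset_ndcg(best))
```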
- Published
- 2010
- Full Text
- View/download PDF
50. Multi-modal multi-correlation person-centric news retrieval
- Author
-
Jing Liu, Zechao Li, Xiaobin Zhu, and Hanqing Lu
- Subjects
Correlation ,Information retrieval ,Modal ,Ranking ,Event (computing) ,Computer science ,Face (geometry) ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Natural language ,Image (mathematics) ,Ranking (information retrieval) ,Visualization - Abstract
In this paper, we propose a framework for multi-modal multi-correlation person-centric news retrieval, which integrates news event correlations, news entity correlations, and event-entity correlations simultaneously by exploring both text and image information. The proposed framework is confined to person-name queries and enables a more vivid and informative person-centric news retrieval by providing two views of result presentation, namely a query-oriented multi-correlation map and a ranking list of news items with necessary descriptions, including the news image, news title and summary, central entities and relevant news events. First, we pre-process news articles with natural language techniques, and initialize the three correlations by statistical analysis of events and entities in the news articles and face images. Second, a Multi-correlation Probabilistic Matrix Factorization (MPMF) algorithm is proposed to complete and refine the three correlations. Different from traditional Probabilistic Matrix Factorization (PMF), the proposed MPMF additionally considers the event correlations and the entity correlations, as well as the event-entity correlations, during the factor analysis. Third, result ranking and visualization are conducted to present search results relevant to a target news topic. Experimental results on a news dataset collected from multiple news websites demonstrate the attractive performance of the proposed solution for news retrieval.
- Published
- 2010
- Full Text
- View/download PDF