Author: "Ling Shao" / Publisher: acm - Searchworks@Jio Institute Digital Library Search Results

Your search for "Ling Shao" returned 34 results

1. From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder

Author: Hang Dai, Ling Shao, Yong Ding, and Jiale Li
Subjects: FOS: Computer and information sciences, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Point cloud, computer.software_genre, Object (computer science), Object detection, Voxel, Region of interest, Code (cryptography), Point (geometry), Computer vision, Segmentation, Artificial intelligence, business, computer
Abstract: In this paper, we present an Intersection-over-Union (IoU) guided two-stage 3D object detector with a voxel-to-point decoder. To preserve the necessary information from all raw points and maintain the high box recall of the voxel-based Region Proposal Network (RPN), we propose a residual voxel-to-point decoder to extract the point features in addition to the map-view features from the voxel-based RPN. We use a 3D Region of Interest (RoI) alignment to crop and align the features with the proposal boxes for accurately perceiving the object position. The RoI-Aligned features are finally aggregated with the corner geometry embeddings that can provide the potentially missing corner information in the box refinement stage. We propose a simple and efficient method to align the estimated IoUs to the refined proposal boxes as a more relevant localization confidence. Comprehensive experiments on KITTI and the Waymo Open Dataset demonstrate that our method achieves significant improvements with novel architectures against the existing methods. The code is available on GitHub at https://github.com/jialeli1/From-Voxel-to-Point. Comment: This is a pre-print of our paper published in the proceedings of the 29th ACM International Conference on Multimedia (MM'21).
Published: 2021
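
Note on the IoU measure mentioned in the abstract above: the following is a minimal sketch, not the authors' implementation. It assumes axis-aligned boxes given as (x1, y1, z1, x2, y2, z2); detection benchmarks such as KITTI use boxes with a heading angle, which this simplification ignores. The function name `iou_3d_axis_aligned` is hypothetical.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """Intersection-over-Union of two axis-aligned 3D boxes.

    Each box is (x1, y1, z1, x2, y2, z2) with x1 < x2, y1 < y2, z1 < z2.
    Assumption: no rotation (heading angle) is modeled.
    """
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])          # start of the overlap on this axis
        hi = min(box_a[axis + 3], box_b[axis + 3])  # end of the overlap on this axis
        if hi <= lo:
            return 0.0                              # no overlap on this axis
        inter *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - inter
    return inter / union


if __name__ == "__main__":
    # Two 2x2x2 cubes overlapping in a 1x1x1 corner.
    print(iou_3d_axis_aligned((0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3)))
```

A score like this could serve as a localization confidence only if it is predicted at inference time, since ground-truth boxes are not available then; how the paper aligns estimated IoUs to refined boxes is not specified in this record.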

2. Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud

Author: Jiale Li, Ling Shao, Hang Dai, and Yong Ding
Subjects: FOS: Computer and information sciences, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Detector, Computer Science - Computer Vision and Pattern Recognition, Point cloud, Construct (python library), Object (computer science), Object detection, Convolution, Minimum bounding box, Feature (computer vision), Computer vision, Artificial intelligence, business
Abstract: Most of the existing single-stage and two-stage 3D object detectors are anchor-based methods, while the efficient but challenging anchor-free single-stage 3D object detection is not well investigated. Recent studies on 2D object detection show that anchor-free methods also have great potential. However, the unordered and sparse properties of point clouds prevent us from directly leveraging the advanced 2D methods on 3D point clouds. We overcome this by converting the voxel-based sparse 3D feature volumes into sparse 2D feature maps. We propose an attentive module to fit the sparse feature maps to dense mostly on the object regions through the deformable convolution tower and the supervised mask-guided attention. By directly regressing the 3D bounding box from the enhanced and dense feature maps, we construct a novel single-stage 3D detector for point clouds in an anchor-free manner. We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression. Our code is publicly available at https://github.com/jialeli1/MGAF-3DSSD. Comment: This is a pre-print of our paper published in the proceedings of the 29th ACM International Conference on Multimedia (MM'21).
Published: 2021

3. Surpassing Real-World Source Training Data: Random 3D Characters for Generalizable Person Re-Identification

Author: Ling Shao, Shengcai Liao, and Yanan Wang
Subjects: FOS: Computer and information sciences, Training set, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Training (meteorology), 020207 software engineering, 02 engineering and technology, Machine learning, computer.software_genre, Clothing, Re identification, Camera network, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), 020201 artificial intelligence & image processing, Artificial intelligence, business, Texture mapping, computer
Abstract: Person re-identification has seen significant advancement in recent years. However, the ability of learned models to generalize to unknown target domains still remains limited. One possible reason for this is the lack of large-scale and diverse source training data, since manually labeling such a dataset is very expensive and privacy sensitive. To address this, we propose to automatically synthesize a large-scale person re-identification dataset following a set-up similar to real surveillance but with virtual environments, and then use the synthesized person images to train a generalizable person re-identification model. Specifically, we design a method to generate a large number of random UV texture maps and use them to create different 3D clothing models. Then, an automatic code is developed to randomly generate various different 3D characters with diverse clothes, races and attributes. Next, we simulate a number of different virtual environments using Unity3D, with customized camera networks similar to real surveillance systems, and import multiple 3D characters at the same time, with various movements and interactions along different paths through the camera networks. As a result, we obtain a virtual dataset, called RandPerson, with 1,801,816 person images of 8,000 identities. By training person re-identification models on these synthesized person images, we demonstrate, for the first time, that models trained on virtual data can generalize well to unseen target images, surpassing the models trained on various real-world datasets, including CUHK03, Market-1501, DukeMTMC-reID, and almost MSMT17. The RandPerson dataset is available at https://github.com/VideoObjectSearch/RandPerson. Comment: This is the ACMMM 2020 version, including the appendix.
Published: 2020

4. Deep Local Binary Coding for Person Re-Identification by Delving into the Details

Author: Ling Shao, Fan Zhu, Jiaxin Chen, Yichao Yan, Li Liu, Jie Qin, and Lei Huang
Subjects: Matching (statistics), Theoretical computer science, Computer science, Binary number, 020206 networking & telecommunications, 02 engineering and technology, Mutual information, Set (abstract data type), Discriminative model, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Binary code, Block (data storage)
Abstract: Person re-identification (ReID) has recently received extensive research interest due to its diverse applications in multimedia analysis and computer vision. However, the majority of existing works focus on improving matching accuracy, while ignoring matching efficiency. In this work, we present a novel binary representation learning framework for efficient person ReID, namely Deep Local Binary Coding (DLBC). Different from existing deep binary ReID approaches, DLBC attempts to learn discriminative binary codes by explicitly interacting with local visual details. Specifically, DLBC first extracts a set of local features from spatially salient regions of pedestrian images. Subsequently, DLBC formulates a new binary-local semantic mutual information (BSMI) maximization term, based on which a self-lifting (SL) block is built to further exploit the semantic importance of local features. The BSMI term together with the SL block simultaneously enhances the dependency of binary codes on selected local features as well as their robustness to cross-view visual inconsistency. In addition, an efficient optimization method is developed to train the proposed deep models with orthogonal and binary constraints. Extensive experiments reveal that DLBC significantly narrows the accuracy gap between binary ReID methods and the state-of-the-art real-valued ones, whilst remarkably reducing query time and memory cost.
Published: 2020

5. Box Guided Convolution for Pedestrian Detection

Author: Ling Shao, Jinpeng Li, Shengcai Liao, and Hangzhi Jiang
Subjects: 0209 industrial biotechnology, Forcing (recursion theory), Computer science, business.industry, Pedestrian detection, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Inference, Pattern recognition, 02 engineering and technology, Convolution, 020901 industrial engineering & automation, Kernel (image processing), Bounding overwatch, 0202 electrical engineering, electronic engineering, information engineering, False positive paradox, 020201 artificial intelligence & image processing, Artificial intelligence, Performance improvement, business
Abstract: Occlusions, scale variation and numerous false positives still represent fundamental challenges in pedestrian detection. Intuitively, different sizes of receptive fields and more attention to the visible parts are required for detecting pedestrians with various scales and occlusion levels, respectively. However, these challenges have not been addressed well by existing pedestrian detectors. This paper presents a novel convolutional network, denoted as box guided convolution network (BGCNet), to tackle these challenges simultaneously in a unified framework. In particular, we propose a box guided convolution (BGC) that can dynamically adjust the sizes of convolution kernels guided by the predicted bounding boxes. In this way, BGCNet provides position-aware receptive fields to address the challenge of large variations of scales. In addition, for the issue of heavy occlusion, the kernel parameters of BGC are spatially localized around the salient and mostly visible key points of a pedestrian, such as the head and foot, to effectively capture high-level semantic features to help detection. Furthermore, a local maximum (LM) loss is introduced to suppress false positives and highlight true positives by forcing positives, rather than negatives, to be local maxima, without any additional inference burden. We evaluate BGCNet on popular pedestrian detection benchmarks and achieve state-of-the-art results, with significant performance improvements on heavily occluded and small-scale pedestrians.
Published: 2020

6. K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering

Author: Xiaopeng Hong, Gen Luo, Ling Shao, Yiyi Zhou, Jinsong Su, Xiaoshuai Sun, Xinghao Ding, and Rongrong Ji
Subjects: Network architecture, Theoretical computer science, Computer science, Computation, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Modal, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), Question answering, Graph (abstract data type), Layer (object-oriented design), Architecture, 0105 earth and related environmental sciences
Abstract: In this paper, we propose a cross-modal network architecture search (NAS) algorithm for VQA, termed k-Armed Bandit based NAS (KAB-NAS). KAB-NAS regards the design of each layer as a k-armed bandit problem and updates the preference of each candidate via numerous samplings in a single-shot search framework. To establish an effective search space, we further propose a new architecture termed Automatic Graph Attention Network (AGAN), and extend the popular self-attention layer with three graph structures, denoted as dense-graph, co-graph and separate-graph. These graph layers are used to form the direction of information propagation in the graph network, and their optimal combinations are searched by KAB-NAS. To evaluate KAB-NAS and AGAN, we conduct extensive experiments on two VQA benchmark datasets, i.e., VQA2.0 and GQA, and also test AGAN with the popular BERT-style pre-training. The experimental results show that with the help of KAB-NAS, AGAN can achieve state-of-the-art performance on both benchmark datasets with far fewer parameters and much less computation.
Published: 2020

7. MetaNER: Named Entity Recognition with Meta-Learning

Author: Shuo Shang, Jing Li, and Ling Shao
Subjects: 0301 basic medicine, Training set, Meta learning (computer science), Computer science, business.industry, 02 engineering and technology, Overfitting, Machine learning, computer.software_genre, Sequence labeling, Domain (software engineering), 03 medical and health sciences, 030104 developmental biology, Named-entity recognition, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), Artificial intelligence, business, computer
Abstract: Recent neural architectures in named entity recognition (NER) have yielded state-of-the-art performance on single domain data such as newswires. However, they still suffer from (i) requiring massive amounts of training data to avoid overfitting; (ii) huge performance degradation when there is a domain shift in the data distribution between training and testing. In this paper, we investigate the problem of domain adaptation for NER under homogeneous and heterogeneous settings. We propose MetaNER, a novel meta-learning approach for domain adaptation in NER. Specifically, MetaNER incorporates meta-learning and adversarial training strategies to encourage robust, general and transferable representations for sequence labeling. The key advantage of MetaNER is that it is capable of adapting to new unseen domains with a small amount of annotated data from those domains. We extensively evaluate MetaNER on multiple datasets under homogeneous and heterogeneous settings. The experimental results show that MetaNER achieves state-of-the-art performance against eight baselines. Impressively, MetaNER surpasses the in-domain performance using only 16.17% and 34.76% of target domain data on average for homogeneous and heterogeneous settings, respectively.
Published: 2020

8. Generative Reconstructive Hashing for Incomplete Video Analysis

Author: Fumin Shen, Ling Shao, Zhen Wei, Heng Tao Shen, Jingyi Zhang, Ionut Cosmin Duta, Fan Zhu, Li Liu, and Xing Xu
Subjects: Computer science, business.industry, Feature vector, Motion blur, Hash function, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Action (philosophy), Discriminative model, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Generative grammar, 0105 earth and related environmental sciences
Abstract: In the video analysis literature, most research, such as that on retrieval and recognition, assumes that each input video contains at least one complete semantic entity, e.g. an activity, action or event. However, this hypothesis does not hold in many realistic scenarios for two main reasons. First, complete videos whose quality is good enough for automatic analysis are not always accessible because of heavy motion blur, occlusions, interruptions, etc. Second, extracting features from complete videos always fails to meet speed and storage requirements in large-scale use cases. To tackle these challenges, incomplete videos are more useful, but research on them is scarce. In this paper, we propose a novel and effective hashing framework specialized in large-scale incomplete video analysis called Generative Reconstructive Hashing (GRH). To begin with, an adversarial generative network is specially designed to map incomplete video features to the feature distributions of complete videos, so that features of incomplete videos become indistinguishable from those of complete videos. Then, the discriminative hashing module further fills the gap between full video features and estimated features from partial videos by projecting both features into a common binary feature space, which allows improvement in efficiency compared with real-value based methods. GRH is the first end-to-end framework for incomplete video analysis. Extensive experiments on various datasets demonstrate GRH's superior effectiveness and efficiency on retrieval and recognition tasks. GRH outperforms the recent state-of-the-art methods by 5.44/3.22/4.82 in terms of MAPs on HMDB51/UCF101/CCV datasets, respectively.
Published: 2019

9. Learning to Synthesize 3D Indoor Scenes from Monocular Images

Author: Li Liu, Yi Fang, Fumin Shen, Jin Xie, Fan Zhu, and Ling Shao
Subjects: Spatial contextual awareness, Monocular, Computer science, business.industry, Perspective (graphical), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, Pattern recognition, 02 engineering and technology, Object (computer science), Convolutional neural network, Object detection, 0202 electrical engineering, electronic engineering, information engineering, RGB color model, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: Depth images have always played a critical role in indoor scene understanding problems, and are particularly important for tasks in which 3D inferences are involved. However, since depth images are not universally available, removing them from the testing stage can significantly improve the generality of a method. In this work, we consider the scenarios where depth images are not available in the testing data, and propose to learn a convolutional long short-term memory (Conv LSTM) network and a regression convolutional neural network (regression ConvNet) using only monocular RGB images. The proposed networks benefit from 2D segmentations, object-level spatial context, object-scene dependencies and objects' geometric information, where optimization is governed by the semantic label loss, which measures the label consistencies of both objects and scenes, and the 3D geometrical loss, which measures the correctness of objects' 6DoF estimation. Conv LSTM and regression ConvNet are applied to scene/object classification, object detection and 6DoF estimation tasks respectively, where we utilize the joint inference from both networks and further provide the perspective of synthesizing fully rigged 3D scenes according to objects' arrangements in monocular images. Both quantitative and qualitative experimental results are provided on the NYU-v2 dataset, and we demonstrate that the proposed Conv LSTM can achieve state-of-the-art performance without requiring the depth information.
Published: 2018

10. Learning to Recognise Unseen Classes by A Few Similes

Author: Ling Shao and Yang Long
Subjects: Caltech 101, Contextual image classification, business.industry, Computer science, Cognitive neuroscience of visual object recognition, Inference, 02 engineering and technology, 010501 environmental sciences, Space (commercial competition), Machine learning, computer.software_genre, 01 natural sciences, Class (biology), Annotation, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, Embedding, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, 0105 earth and related environmental sciences
Abstract: Existing image classification systems often suffer from the need to re-train models for novel unseen classes. Zero-shot learning (ZSL) aims to recognise these unseen classes directly using trained models with a further inference procedure. However, existing approaches rely heavily on human-defined class-attribute associations to achieve the inference, which significantly increases the annotation cost. This paper aims to address ZSL on non-attribute tasks, i.e. only training images with labels are used, as in most supervised settings. Our main contributions are: 1) to circumvent expensive attributes, we propose to use semantic similes that directly indicate the unseen-to-seen associations; 2) a novel similarity-based representation is proposed to represent both visual images and semantic similes in a unified embedding space; 3) in order to reduce the annotation cost, we use only a few similes to infer a class-level prototype for each unseen class. On two popular benchmarks, AwA and aPY, extensive experiments show that our method can significantly improve the state-of-the-art results using only two similes for each unseen class. Furthermore, we revisit the Caltech 101 dataset without attributes. Our ZSL results can exceed those of previous supervised methods.
Published: 2017

11. Deep Self-taught Hashing for Image Retrieval

Author: Ling Shao, Ke Zhou, Yu Liu, Jingkuan Song, Fuhao Zou, Lingyu Yan, and Li Liu
Subjects: business.industry, Computer science, Universal hashing, Dynamic perfect hashing, Hash function, Pattern recognition, Machine learning, computer.software_genre, Hash table, Computer Science Applications, Hopscotch hashing, Locality-sensitive hashing, Human-Computer Interaction, Control and Systems Engineering, Artificial intelligence, Feature hashing, Electrical and Electronic Engineering, business, computer, Extendible hashing, Software, Double hashing, Information Systems
Abstract: Hashing algorithms have been widely used to speed up image retrieval due to their compact binary codes and fast distance calculation. The combination with deep learning boosts the performance of hashing by learning accurate representations and complicated hashing functions. So far, the most striking successes in deep hashing have mostly involved discriminative models, which require labels. To apply deep hashing on datasets without labels, we propose a deep self-taught hashing algorithm (DSTH), which generates a set of pseudo labels by analyzing the data itself, and then learns the hash functions for novel data using discriminative deep models. Furthermore, we generalize DSTH to support both supervised and unsupervised cases by adaptively incorporating label information. We use two different deep learning frameworks to train the hash functions to deal with the out-of-sample problem and reduce the time complexity without loss of accuracy. We have conducted extensive experiments to investigate different settings of DSTH, and compared it with state-of-the-art counterparts on six publicly available datasets. The experimental results show that DSTH outperforms the others on all datasets.
Published: 2015
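
Note on the fast distance calculation mentioned in the abstract above: a minimal sketch of why binary codes are cheap to compare, assuming each hash code is stored as a Python integer. It illustrates Hamming-distance ranking in general, not the DSTH method; the function names `hamming_distance` and `rank_by_hamming` are hypothetical.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    # XOR leaves a 1 wherever the codes disagree; counting the 1s gives the distance.
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query: int, database: list) -> list:
    """Return database indices sorted from nearest to farthest from the query."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


if __name__ == "__main__":
    db = [0b0000, 0b1011, 0b1010]
    print(rank_by_hamming(0b1011, db))
```

Because the comparison is a single XOR followed by a bit count, it avoids the floating-point arithmetic needed for real-valued features.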

12. Cross-Modality Submodular Dictionary Learning for Information Retrieval

Author: Ling Shao, Fan Zhu, and Mengyang Yu
Subjects: Set (abstract data type), Information retrieval, K-SVD, Optimization problem, Computer science, Computer Science::Information Retrieval, Feature vector, Greedy algorithm, Image (mathematics), Submodular set function, Term (time)
Abstract: This paper addresses the problem of joint modeling of multimedia components in different media forms. We consider the information retrieval task across both text and image documents, which includes retrieving relevant images that closely match the description in a text query and retrieving text documents that best explain the content of an image query. A greedy dictionary construction approach is introduced for learning an isomorphic feature space, to which cross-modality data can be adapted while data smoothness is guaranteed. The proposed objective function consists of two reconstruction error terms for both modalities and a Maximum Mean Discrepancy (MMD) term that measures the cross-modality discrepancy. Optimization of the reconstruction terms and the MMD term yields a compact and modality-adaptive dictionary pair. We formulate the joint combinatorial optimization problem by maximizing variance reduction over a candidate signal set while constraining the dictionary size and coefficients' sparsity. By exploiting the submodularity and the monotonicity property of the proposed objective function, the optimization problem can be solved by a highly efficient greedy algorithm, and is guaranteed to be at least a (e - 1)/e ≈ 0.632-approximation to the optimum. The proposed method achieves state-of-the-art performance on the Wikipedia dataset.
Published: 2014
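
Note on the greedy guarantee mentioned in the abstract above: for a monotone submodular objective under a size constraint, repeatedly adding the element with the largest marginal gain is known to reach at least (e - 1)/e of the optimum. The sketch below illustrates that greedy pattern only. It swaps in set coverage as a simple stand-in objective, not the paper's variance-reduction objective, and the name `greedy_max_coverage` is hypothetical.

```python
def greedy_max_coverage(candidates: dict, k: int):
    """Pick up to k candidate sets that greedily maximize the number of covered items.

    Set coverage is monotone and submodular, so this greedy choice is a
    (1 - 1/e)-approximation to the best possible selection of k sets.
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for name, items in candidates.items():
            if name in chosen:
                continue
            gain = len(items - covered)   # marginal gain of adding this set
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:                  # nothing left adds new coverage
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


if __name__ == "__main__":
    sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
    print(greedy_max_coverage(sets, 2))
```

In a dictionary-learning setting, the candidates would be signals and the gain would be the reduction in reconstruction variance; that mapping is an assumption here, since this record does not give the exact objective.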

13. Multimodal Dynamic Networks for Gesture Recognition

Author: Ling Shao and Di Wu
Subjects: Deep belief network, Computer science, Gesture recognition, Speech recognition, Layer (object-oriented design), Sign language, Perceptron, Feature learning
Abstract: Multimodal input is a real-world situation in gesture recognition applications such as sign language recognition. In this paper, we propose a novel bi-modal (audio and skeleton joints) dynamic network for gesture recognition. First, state-of-the-art dynamic Deep Belief Networks are deployed to extract high-level audio and skeletal joint representations. Then, instead of traditional late fusion, we adopt another perceptron layer for cross-modality learning, taking its input from each individual net's penultimate layer. Finally, to account for temporal dynamics, the learned shared representations are used for estimating the emission probability to infer action sequences. In particular, we demonstrate that multimodal feature learning extracts semantically meaningful shared representations that outperform individual modalities, and we show the efficacy of the early fusion scheme over the traditional method of late fusion.
Published: 2014

14. Session details: Oral session 1

Author: Ling Shao
Subjects: medicine.medical_specialty, medicine, Medical physics, Session (computer science), Psychology
Published: 2013

15. Session details: Oral session 2

Author: Ling Shao
Subjects: medicine.medical_specialty, medicine, Medical physics, Session (computer science), Psychology
Published: 2013

16. Building holistic descriptors for scene recognition

Author: Xuelong Li, Li Liu, and Ling Shao
Subjects: Optimization problem, business.industry, Computer science, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Cognitive neuroscience of visual object recognition, Genetic programming, Pattern recognition, Machine learning, computer.software_genre, Multi-objective optimization, Set (abstract data type), Range (mathematics), Tree (data structure), Feature descriptor, Artificial intelligence, business, computer
Abstract: Real-world scene recognition has been one of the most challenging research topics in computer vision, due to the tremendous intraclass variability and the wide range of scene categories. In this paper, we successfully apply an evolutionary methodology to automatically synthesize domain-adaptive holistic descriptors for the task of scene recognition, instead of using hand-tuned descriptors. We address this as an optimization problem by using multi-objective genetic programming (MOGP). Specifically, a set of primitive operators and filters are first randomly assembled in the MOGP framework as tree-based combinations, which are then evaluated by two objective fitness criteria, i.e., the classification error and the tree complexity. Finally, the best-so-far solution selected by MOGP is regarded as the (near-)optimal feature descriptor for scene recognition. We have evaluated our approach on three realistic scene datasets: MIT urban and nature, SUN and UIUC Sport. Experimental results consistently show that our MOGP-generated descriptors achieve significantly higher recognition accuracies compared with state-of-the-art hand-crafted and machine-learned features.
Published: 2013

17. The third ACM international workshop on interactive multimedia on mobile and portable devices (IMMPD'13)

Author: Jiebo Luo, Ling Shao, Caifeng Shan, and Minoru Etoh
Subjects: World Wide Web, Multimedia, Computer science, business.industry, Pattern recognition (psychology), User interface, computer.software_genre, Speech processing, business, computer, Interactive media
Abstract: With mobile and portable devices becoming ubiquitous in people's daily lives, how to design user interfaces for these products that enable natural, intuitive and fun interaction is one of the main challenges the multimedia community is facing. Following previous successful events, the third ACM International Workshop on Interactive Multimedia on Mobile and Portable Devices (IMMPD'13) aims to bring together researchers from both academia and industry in domains including computer vision, audio and speech processing, machine learning, pattern recognition, communications, human-computer interaction, and media technology to share and discuss recent advances in interactive multimedia.
Published: 2013

18. One shot learning gesture recognition with Kinect sensor

Author: Hui Zhang, Ling Shao, Fan Zhu, and Di Wu
Subjects: Gesture recognition, business.industry, Computer science, Sketch recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Feature (machine learning), Computer vision, Artificial intelligence, One-shot learning, Representation (mathematics), business, Motion (physics), Gesture
Abstract: Gestures are both natural and intuitive for Human-Computer Interaction (HCI), and the one-shot learning scenario is one of the real-world situations in gesture recognition problems. In this demo, we present a hand gesture recognition system using the Kinect sensor, which addresses the problem of one-shot learning gesture recognition with a user-defined training and testing system. Such a system can behave like a remote control where the user can allocate a specific function using a preferred gesture by performing it only once. To adopt the gesture recognition framework, the system first automatically segments an action sequence into atomic tokens, and then adopts the Extended-Motion-History-Image (Extended-MHI) for motion feature representation. We evaluate the performance of our system quantitatively in the ChaLearn Gesture Challenge, and apply it to a virtual one-shot learning gesture recognition system.
Published: 2012

19. The second ACM international workshop on interactive multimedia on mobile and portable devices

Author: Ling Shao, Caifeng Shan, and Minoru Etoh
Subjects: World Wide Web, Multimedia, business.industry, Computer science, Pattern recognition (psychology), User interface, computer.software_genre, Speech processing, business, computer, Interactive media
Abstract: As mobile and portable devices become ubiquitous in people's daily lives, how to design multimedia user interfaces for these products that enable natural, intuitive and fun interaction is one of the main challenges the multimedia community is facing. Following several successful events, the 2nd ACM International Workshop on Interactive Multimedia on Mobile and Portable Devices (IMMPD'12) aims to bring together researchers from both academia and industry in domains including computer vision, audio and speech processing, machine learning, pattern recognition, communications, human-computer interaction, and media technology to share and discuss recent advances in interactive multimedia.
Published: 2012

20. ACM international workshop on interactive multimedia on mobile and portable devices (IMMPD'11)

Author: Caifeng Shan, Ling Shao, Minoru Etoh, and Jiebo Luo
Subjects: Multimedia, Computer science, Human–computer interaction, business.industry, Pattern recognition (psychology), Natural (music), User interface, Speech processing, computer.software_genre, business, computer, Interactive media
Abstract: With mobile and portable devices becoming ubiquitous in people's daily lives, how to design user interfaces for these products that enable natural, intuitive and fun interaction is one of the main challenges the multimedia community is facing. Following several successful events, the ACM International Workshop on Interactive Multimedia on Mobile and Portable Devices (IMMPD'11) aims to bring together researchers from both academia and industry in domains including computer vision, audio and speech processing, machine learning, pattern recognition, communications, human-computer interaction, and media technology to share and discuss recent advances in interactive multimedia.
Published: 2011

21. Action retrieval with relevance feedback on YouTube videos

Author: Simon Jones and Ling Shao
Subjects: Information retrieval, Action (philosophy), business.industry, Computer science, Feature (machine learning), Relevance feedback, The Internet, AdaBoost, Pruning (decision trees), business, Cluster analysis, Motion (physics)
Abstract: Content-based retrieval systems are becoming increasingly relevant for managing large multimedia databases, such as those found on the Internet. In this paper, we investigate applying content-based retrieval with relevance feedback to the popular YouTube human action dataset [8], using a variety of methods to extract and compare features, in order to determine the most accurate techniques in this setting. Among other techniques, we explore soft-assignment code-book clustering, feature pruning, motion and static features, and AdaBoost and ABRS-SVM for relevance feedback. We evaluate the performance of several different systems to find the best combination of techniques for human action retrieval. We demonstrate that existing relevance feedback methods do not work well for YouTube media, and that a naive algorithm consistently outperforms them.
Published: 2011

22. Water reflection recognition via minimizing reflection cost based on motion blur invariant moments

Author: Fu-Lai Chung, Ling Shao, Sheng-hua Zhong, and Yan Liu
Subjects: Contextual image classification, Computer science, business.industry, Feature vector, Motion blur, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Image content, Real image, Reflection symmetry, Computer Science::Computer Vision and Pattern Recognition, Computer vision, Artificial intelligence, Invariant (mathematics), business, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Water reflection, a typical imperfect reflection symmetry problem, plays an important role in image content analysis. However, existing symmetry recognition techniques cannot recognize water reflection images correctly because of the complex and varied distortions caused by water waves. To address this difficulty, we construct a novel feature space composed of motion blur invariant moments. Moreover, we propose an efficient detection algorithm to determine the reflection axis in images with water reflection. In experiments on a real image dataset with different tasks, the proposed techniques demonstrate impressive results in water reflection image classification, reflection axis detection, and retrieval of images with water reflection.
Published: 2011

23. Feature selection under learning to rank model for multimedia retrieve

Author: Ling Shao, Changsheng Xu, Hanqing Lu, and Changsheng Li
Subjects: Multimedia, business.industry, Computer science, Dimensionality reduction, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Feature extraction, Pattern recognition, Feature selection, computer.software_genre, Feature model, Feature (computer vision), Ranking SVM, Learning to rank, Artificial intelligence, business, Feature learning, computer
Abstract: Most multimedia retrieval problems can be described by a ranking model, i.e. the images in the database can be ranked according to their similarity to the query image. Existing ranking models generally use features that are pre-defined by experts. This paper uses machine learning techniques to automatically select useful features for ranking. First, we generate a set of feature subsets by putting each feature into an individual feature subset. Second, we sort these feature subsets according to their ranking performance. Third, neighboring feature subsets in the ranked order are paired to generate new feature subsets, which are then sorted by their new ranking performance. These steps are iterated until a pre-defined stopping point is reached. Experimental results on the .gov dataset and the Caltech101 development set show the effectiveness and efficiency of the proposed algorithm.
Published: 2010
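
The iterative procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the name `select_features`, the `score` callback (standing in for "ranking performance"), the choice to merge every pair of adjacent subsets in the sorted order, and the fixed number of rounds used as the stopping point are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Sequence


def select_features(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_rounds: int = 3,
) -> FrozenSet[str]:
    """Return the best-scoring feature subset found (hypothetical sketch)."""
    # Step 1: put each feature into its own subset.
    subsets = [frozenset([f]) for f in features]
    best = max(subsets, key=score)
    for _ in range(max_rounds):  # assumed stopping point: fixed round count
        # Step 2: sort subsets by ranking performance, best first.
        subsets.sort(key=score, reverse=True)
        if len(subsets) < 2:
            break
        # Step 3: merge each pair of neighbours in the sorted order
        # (assumption: overlapping adjacent pairs).
        merged = [subsets[i] | subsets[i + 1] for i in range(len(subsets) - 1)]
        subsets = list(dict.fromkeys(merged))  # drop duplicates, keep order
        candidate = max(subsets, key=score)
        if score(candidate) > score(best):
            best = candidate
    return best


# Toy scoring function (an assumption): reward useful features,
# lightly penalise irrelevant ones.
USEFUL = {"a", "b"}


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


chosen = select_features(["a", "b", "c", "d"], toy_score)
print(sorted(chosen))
```

In a real system the `score` callback would train and evaluate a ranking model on each candidate subset, which is the expensive part; the sketch only shows the control flow of the subset generation.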

24. DMATiler

Author: Ling Shao, John Kevin Patrick O'Brien, Tong Chen, Huoding Li, Tao Liu, Haibo Lin, and Lakshminarayanan Renganarayana
Subjects: Loop unrolling, Loop fission, Loop inversion, Computer science, Loop fusion, Loop nest optimization, Loop interchange, Parallel computing, Cache-oblivious algorithm, Loop tiling
Abstract: In this paper we present the design and implementation of DMATiler, which combines compiler analysis and runtime management to optimize local memory performance. In traditional cache-model-based loop tiling optimizations, the compiler approximates runtime cache misses as the number of distinct cache lines touched by a loop nest. In contrast, DMATiler has full control of the addresses, sizes, and sequences of data transfers. DMATiler uses a simplified DMA performance model to formulate the cost model for DMA-tiled loop nests, then solves it using a custom gradient descent algorithm with heuristics guided by DMA characteristics. Given a loop nest, DMATiler uses loop interchange to make the loop order friendlier to data movement. Moreover, DMATiler applies compressed data buffers and advanced DMA commands to further optimize data transfers. We have implemented DMATiler in the IBM XL C/C++ for Multi-core Acceleration for Linux, and have conducted experiments with a set of loop nest benchmarks. The results show that DMATiler is much more efficient than a software-controlled cache (average speedup of 9.8x) and single-level loop blocking (average speedup of 6.2x) on the Cell BE processor.
Published: 2010

25. A set of co-occurrence matrices on the intrinsic manifold of human silhouettes for action recognition

Author: Feng Zheng, Ling Shao, and Zhan Song
Subjects: Computer science, business.industry, Graph embedding, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Co-occurrence, Pattern recognition, Manifold, Set (abstract data type), ComputingMethodologies_PATTERNRECOGNITION, Computer Science::Computer Vision and Pattern Recognition, Histogram, Action recognition, Computer vision, Artificial intelligence, business, Representation (mathematics), Spatial analysis
Abstract: Recognizing actions from monocular video has recently become a very active topic in computer vision. In this paper, we propose a new representation of actions on the intrinsic shape manifold learned by various graph embedding algorithms. The co-occurrence matrices descriptor captures more temporal information than the histogram descriptor, which only considers spatial information. In addition, we compare the performance of the co-occurrence matrices descriptor on different manifolds learned by various graph embedding methods. The results show that nonlinear algorithms are more robust than linear ones. Furthermore, we conclude that label information plays a critical role in learning more discriminating manifolds.
Published: 2010
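
The contrast this abstract draws between histogram and co-occurrence descriptors can be shown with a small Python sketch. It rests on assumptions not stated in the record: each frame is taken to be already quantized into a discrete state label, and the co-occurrence matrix counts label pairs at a fixed temporal lag. The function names are hypothetical and this is not the paper's descriptor.

```python
from collections import Counter
from typing import List, Sequence


def histogram(labels: Sequence[int], n_states: int) -> List[int]:
    """Count how often each state occurs; frame order is ignored."""
    counts = Counter(labels)
    return [counts.get(s, 0) for s in range(n_states)]


def cooccurrence(labels: Sequence[int], n_states: int, lag: int = 1) -> List[List[int]]:
    """Count how often state a is followed by state b `lag` frames later."""
    matrix = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels, labels[lag:]):
        matrix[a][b] += 1
    return matrix


# Two toy sequences with the same states in a different temporal order.
seq_slow = [0, 0, 1, 1]
seq_fast = [0, 1, 0, 1]

print(histogram(seq_slow, 2), histogram(seq_fast, 2))
print(cooccurrence(seq_slow, 2), cooccurrence(seq_fast, 2))
```

Both sequences yield the same histogram, while their co-occurrence matrices differ, which is the sense in which a co-occurrence descriptor retains temporal information that a histogram discards.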

26. Feature detector and descriptor evaluation in human action recognition

Author: Riccardo Mattivi and Ling Shao
Subjects: business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Scale-invariant feature transform, Pattern recognition, Feature description, Field (computer science), Feature (computer vision), Bag-of-words model, Action recognition, Computer vision, Artificial intelligence, business, Feature detection
Abstract: In this paper, we evaluate and compare different feature detection and feature description methods for part-based approaches in human action recognition. Different methods have been proposed in the literature for both feature detection of space-time interest points and description of local video patches. It is however unclear which method performs better in the field of human action recognition. We compare, in the feature detection section, Dollar's method [18], Laptev's method [22], a bank of 3D-Gabor filters [6] and a method based on Space-Time Differences of Gaussians. We also compare and evaluate different descriptors such as Gradient [18], HOG-HOF [22], 3D SIFT [24] and an enhanced version of LBP-TOP [15]. We show the combination of Dollar's detection method and the improved LBP-TOP descriptor to be computationally efficient and to reach the best recognition accuracy on the KTH database.
Published: 2010

27. Eigen-space learning using semi-supervised diffusion maps for human action recognition

Author: Feng Zheng, Ling Shao, and Zhan Song
Subjects: Computer science, Property (programming), business.industry, Dimensionality reduction, Nonlinear dimensionality reduction, Diffusion map, Pattern recognition, Silhouette, Computer Science::Computer Vision and Pattern Recognition, Histogram, Trajectory, Computer vision, Artificial intelligence, business, Eigenvalues and eigenvectors
Abstract: Human actions can be seen as a trajectory in the eigen-space of the silhouette of the human body. In this paper, the silhouette is first represented as a vector using the R-transform. Then, we exploit semi-supervised diffusion maps (SSDM) for dimensionality reduction and for learning the eigen-space of the silhouette. Semi-supervised diffusion maps characterize the spatiotemporal property of the action while preserving much of the local geometric structure and label information. We use the K-nearest neighbor classifier to recognize actions represented as histograms of occurrence of the silhouette in the eigen-space. Experimental results show that the proposed approach performs significantly better than other manifold learning based action recognition techniques.
Published: 2010

28. A descriptor combining MHI and PCOG for human motion classification

Author: Ling Ji and Ling Shao
Subjects: business.industry, Computer science, Local binary patterns, GLOH, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Human motion, ComputingMethodologies_PATTERNRECOGNITION, Histogram of oriented gradients, Robustness (computer science), Computer Science::Computer Vision and Pattern Recognition, Feature descriptor, Computer vision, Artificial intelligence, business, Correlogram
Abstract: The performance of human motion classification and recognition systems is highly dependent on the distinctiveness and robustness of the feature descriptor. In this paper, a new descriptor containing motion, shape and spatial layout information is proposed; it is therefore more effective for action modeling and is suitable for detecting and recognizing a variety of actions. Experiments show that the proposed descriptor outperforms other existing methods, such as Moment Invariants and Histogram of Oriented Gradients, in recognizing human motions in an indoor environment with a stationary camera.
Published: 2010

29. Spatio-temporal shape contexts for human action retrieval

Author: Ling Shao and Yuanjia Du
Subjects: Brightness, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer vision, Pattern recognition, Artificial intelligence, Invariant (mathematics), business, Mathematics
Abstract: A novel method using the combination of spatio-temporal interest points and spatio-temporal shape contexts for human action description and retrieval is proposed. Both feature point detection and description are invariant to geometric and photometric changes and intra-class variations. The experimental results show that the proposed descriptor is more effective than brightness-gradient-based motion descriptors for classifying and retrieving challenging human actions.
Published: 2009

30. 1st ACM international workshop on interactive multimedia for consumer electronics (IMCE'09)

Author: Ling Shao, Caifeng Shan, Minoru Etoh, and Jiebo Luo
Subjects: Multimedia, Human–computer interaction, business.industry, Computer science, Context awareness, Electronics, User interface, Speech processing, business, computer.software_genre, computer, Interactive media, Haptic technology
Abstract: The ACM International workshop on Interactive Multimedia for Consumer Electronics (IMCE) aims to bring together researchers from both academia and industry in domains including computer vision, machine learning, audio and speech processing, communications, artificial intelligence and media technology to share and discuss recent advances in interactive user interfaces and multimedia applications. Multimedia interaction is becoming a technology applied in many consumer electronics devices and can make user interfaces more intuitive and controllable. Multiple modalities including audio, video and haptics can be utilized and fused for media interaction.
Published: 2009

31. DBDB

Author: Haibo Lin, Tao Liu, Ling Shao, Tong Chen, and John Kevin Patrick O'Brien
Subjects: Hardware_MEMORYSTRUCTURES, Computer science, Working set, Cache-only memory architecture, Uniform memory access, Parallel computing, Loop tiling, computer.software_genre, Memory map, Non-uniform memory access, Operating system, Cache, computer, Compile time
Abstract: In heterogeneous multi-core systems, such as the Cell BE or certain embedded systems, the accelerator core has its own fast local memory without hardware-supported coherence. It is the software's responsibility to dynamically transfer the working set when the total data set is too large to fit in the local memory. The data can be transferred through a software-controlled cache, which maintains correctness and exploits reuse among references, especially when complicated aliasing or data dependence exists. However, the software-controlled cache introduces the extra overhead of cache lookup. In this paper we present the design and implementation of a Direct Blocking Data Buffer (DBDB), which combines compiler analysis and runtime management to optimize local memory utilization. We use compile-time analysis to identify regular references in a loop body, block the innermost loop according to the access patterns and available local memory space, insert DMA operations for the blocked loop, and substitute references to local buffers. The runtime is responsible for allocating local memory for DBDB, especially for disambiguating aliased memory accesses which could not be resolved at compile time. We further optimize noncontiguous references by taking advantage of the DMA-list feature provided by the Cell BE. A practical performance model is presented to guide the DMA transfer scheme selection among single-DMA, multi-DMA and DMA-list. We have implemented DBDB in the IBM XL C/C++ for Multicore Acceleration for Linux, and have conducted experiments with selected test cases from the NAS OpenMP and SPEC benchmarks. The results show that our method performs well compared with the traditional software cache approach. We have observed a speedup of up to 5.3x, and 4x on average.
Published: 2009
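
The blocking transformation described in this abstract (block the innermost loop to fit local memory, insert DMA transfers, and redirect references to a local buffer) can be mimicked in a few lines of Python. This is only a conceptual sketch: `dma_get`, `dma_put` and `scale_blocked` are hypothetical stand-ins that copy list slices, not Cell BE intrinsics or anything produced by the IBM XL compiler.

```python
from typing import List


def dma_get(main: List[float], start: int, count: int) -> List[float]:
    """Stand-in for a DMA read: copy a block from main memory to a local buffer."""
    return main[start:start + count]


def dma_put(main: List[float], start: int, local: List[float]) -> None:
    """Stand-in for a DMA write: copy the local buffer back to main memory."""
    main[start:start + len(local)] = local


def scale_blocked(main: List[float], factor: float, local_capacity: int) -> List[float]:
    """Blocked version of `for i in range(n): main[i] *= factor`."""
    if local_capacity <= 0:
        raise ValueError("local_capacity must be positive")
    # Outer loop walks over blocks that fit in the (assumed) local memory.
    for start in range(0, len(main), local_capacity):
        count = min(local_capacity, len(main) - start)
        local = dma_get(main, start, count)
        # Inner loop touches only the local buffer, never main memory.
        for i in range(count):
            local[i] = local[i] * factor
        dma_put(main, start, local)
    return main


data = [float(i) for i in range(10)]
print(scale_blocked(data, 2.0, local_capacity=4))
```

The block size here is a plain parameter; the abstract indicates that the real system derives it from access patterns and the available local memory space, and chooses among several DMA transfer schemes using a performance model.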

32. Proceedings of the 3rd ACM international workshop on Interactive multimedia on mobile & portable devices, IMMPD@ACM Multimedia 2013, Barcelona, Spain, October 22, 2013

Author: Jiebo Luo, Caifeng Shan, Ling Shao, and Minoru Etoh
Published: 2013
Full Text: View/download PDF

33. Proceedings of the 2nd ACM international workshop on Interactive multimedia on mobile and portable devices, IMMPD@ACM Multimedia 2012, Nara, Japan, November 2, 2012

Author: Ling Shao, Caifeng Shan, and Minoru Etoh
Published: 2012
Full Text: View/download PDF

34. Proceedings of the 2011 international ACM workshop on Interactive multimedia on mobile and portable devices, IMMPD@ACM Multimedia 2011, Scottsdale, AZ, USA, November 29, 2011

Author: Jiebo Luo, Caifeng Shan, Ling Shao, and Minoru Etoh
Published: 2011
Full Text: View/download PDF
