Author: "Guo-Jun Qi" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Guo-Jun Qi"' showing total 432 results

201. Interleaved Structured Sparse Convolutional Neural Networks

Author: Jingdong Wang, Jianhuang Lai, Richang Hong, Guo-Jun Qi, Ting Zhang, and Guotian Xie
Subjects: Computer science, business.industry, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Convolutional neural network, Convolution, Kernel (linear algebra), Kernel (image processing), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Algorithm, 0105 earth and related environmental sciences, Sparse matrix
Abstract: In this paper, we study the problem of designing efficient convolutional neural network architectures with the aim of eliminating the redundancy in convolution kernels. In addition to structured sparse kernels, low-rank kernels, and the product of low-rank kernels, the product of structured sparse kernels, which is a framework for interpreting the recently developed interleaved group convolutions (IGC) and their variants (e.g., Xception), has been attracting increasing interest. Motivated by the observation that the convolutions contained in a group convolution in IGC can be further decomposed in the same manner, we present a modularized building block, IGC-V2: interleaved structured sparse convolutions. It generalizes interleaved group convolutions, which are composed of two structured sparse kernels, to the product of more structured sparse kernels, further eliminating the redundancy. We present the complementary condition and the balance condition to guide the design of structured sparse kernels, obtaining a balance among three aspects: model size, computation complexity, and classification accuracy. Experimental results demonstrate the advantage in balancing these three aspects compared to interleaved group convolutions and Xception, and competitive performance compared to other state-of-the-art architecture design methods.
Published: 2018

202. CapProNet: Deep Feature Learning via Orthogonal Projections onto Capsule Subspaces

Author: Zhang, L., Edraki, M., and Guo-Jun Qi
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we formalize the idea behind capsule nets of using a capsule vector rather than a neuron activation to predict the label of samples. To this end, we propose to learn a group of capsule subspaces onto which an input feature vector is projected. Then the lengths of the resultant capsules are used to score the probability of belonging to different classes. We train such a Capsule Projection Network (CapProNet) by learning an orthogonal projection matrix for each capsule subspace, and show that each capsule subspace is updated until it contains input feature vectors corresponding to the associated class. We also show that the capsule projection can be viewed as normalizing the multiple columns of the weight matrix simultaneously to form an orthogonal basis, which makes it more effective in incorporating novel components of input features to update capsule representations. In other words, the capsule projection can be viewed as a multi-dimensional weight normalization in capsule subspaces, where the conventional weight normalization is simply a special case of the capsule projection onto 1D lines. Only a negligible computing overhead is incurred to train the network in low-dimensional capsule subspaces or through an alternative hyper-power iteration to estimate the normalization matrix. Experimental results on image datasets show the presented model can greatly improve the performance of the state-of-the-art ResNet backbones by 10-20% and that of the DenseNet by 5-7%, respectively, at the same level of computing and memory expenses. The CapProNet establishes the competitive state-of-the-art performance for the family of capsule nets by significantly reducing test errors on the benchmark datasets.
Comment: Liheng Zhang, Marzieh Edraki, Guo-Jun Qi. CapProNet: Deep Feature Learning via Orthogonal Projections onto Capsule Subspaces, in Proceedings of the Thirty-second Conference on Neural Information Processing Systems (NIPS 2018), Palais des Congrès de Montréal, Montréal, Canada, December 3-8, 2018
Published: 2018
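
The projection step described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each capsule subspace is given by a few linearly independent basis vectors, orthonormalizes them with Gram-Schmidt, and scores an input by the length of its orthogonal projection. All function names and the toy data are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(basis):
    """Gram-Schmidt: turn linearly independent vectors into an orthonormal basis."""
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        ortho.append([wi / norm for wi in w])
    return ortho

def capsule_length(x, basis):
    """Length of the orthogonal projection of x onto span(basis)."""
    q = orthonormalize(basis)
    return math.sqrt(sum(dot(x, qi) ** 2 for qi in q))

def predict(x, subspaces):
    """Return the index of the subspace with the longest projection, plus all lengths."""
    lengths = [capsule_length(x, b) for b in subspaces]
    best = max(range(len(lengths)), key=lengths.__getitem__)
    return best, lengths

# Toy example (assumed data): class 0 spans the xy-plane, class 1 spans the z-axis.
subspaces = [[[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]], [[0.0, 0.0, 2.0]]]
label, lengths = predict([3.0, 4.0, 1.0], subspaces)
```

For a one-dimensional subspace the length reduces to |w·x| / ‖w‖, which matches the abstract's remark that conventional weight normalization is the special case of projection onto a line.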

203. Weakly Supervised Facial Attribute Manipulation via Deep Adversarial Network

Author: Baoxin Li, Guo-Jun Qi, Jiliang Tang, Yilin Wang, and Suhang Wang
Subjects: Parsing, Discriminator, Pixel, Artificial neural network, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, 02 engineering and technology, 010501 environmental sciences, computer.software_genre, Semantics, 01 natural sciences, Expression (mathematics), Image (mathematics), Visualization, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, 0105 earth and related environmental sciences
Abstract: Automatically manipulating facial attributes is challenging because it requires modifying the facial appearance while preserving not only the person's identity but also the realism of the resultant images. Unlike prior work on facial attribute parsing, we aim at an inverse and more challenging problem called attribute manipulation: modifying a facial image in line with a reference facial attribute. Given a source input image and reference images with a target attribute, our goal is to generate a new image (i.e., target image) that not only possesses the new attribute but also keeps the same or similar content as the source image. In order to generate new facial attributes, we train a deep neural network with a combination of a perceptual content loss and two adversarial losses, which ensure the global consistency of the visual content while implementing the desired attributes, which often affect only local pixels. The model automatically adjusts the visual attributes on facial appearances and keeps the edited images as realistic as possible. The evaluation shows that the proposed model can provide a unified solution to both local and global facial attribute manipulation such as expression change and hair style transfer. Moreover, we further demonstrate that the learned attribute discriminator can be used for attribute localization.
Published: 2018

204. Automated Pulmonary Nodule Detection: High Sensitivity with Few Candidates

Author: Yongdong Zhang, Lixi Deng, Liheng Zhang, Bin Wang, Guo-Jun Qi, and Sheng Tang
Subjects: Computer science, business.industry, 020207 software engineering, Pattern recognition, 02 engineering and technology, medicine.disease, 030218 nuclear medicine & medical imaging, 03 medical and health sciences, 0302 clinical medicine, Feature (computer vision), Pulmonary nodule, 0202 electrical engineering, electronic engineering, information engineering, medicine, False positive paradox, Pyramid (image processing), Sensitivity (control systems), Artificial intelligence, Lung cancer, business, True positive rate
Abstract: Automated pulmonary nodule detection plays an important role in lung cancer diagnosis. In this paper, we propose a pulmonary nodule detection framework that can achieve high sensitivity with few candidates. First, the Feature Pyramid Network (FPN), which leverages multi-level features, is applied to detect nodule candidates that cover almost all true positives. Then redundant candidates are removed by a simple but effective Conditional 3-Dimensional Non-Maximum Suppression (Conditional 3D-NMS). Moreover, a novel Attention 3D CNN (Attention 3D-CNN), which efficiently utilizes contextual information, is proposed to further remove the overwhelming majority of false positives. The proposed method yields a sensitivity of 95.8% at 2 false positives per scan on the LUng Nodule Analysis 2016 (LUNA16) dataset, which is competitive with the currently published state-of-the-art methods.
Published: 2018
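
As a rough illustration of the candidate-pruning step mentioned in this abstract, the sketch below implements plain greedy non-maximum suppression over axis-aligned 3D boxes. It is not the paper's Conditional 3D-NMS, whose conditions the abstract does not specify; the box format, the IoU threshold, and the function names are assumptions made for the example.

```python
def volume(box):
    """Volume of an axis-aligned box given as (z1, y1, x1, z2, y2, x2)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    return inter / (volume(a) + volume(b) - inter)

def nms_3d(candidates, iou_threshold=0.5):
    """Greedy NMS. candidates is a list of (score, box); returns the kept ones,
    highest score first."""
    kept = []
    for score, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Keep a candidate only if it does not overlap a higher-scoring kept box.
        if all(iou_3d(box, k[1]) <= iou_threshold for k in kept):
            kept.append((score, box))
    return kept

# Toy candidates (assumed data): the second box heavily overlaps the first.
candidates = [
    (0.9, (0, 0, 0, 10, 10, 10)),
    (0.8, (1, 1, 1, 11, 11, 11)),
    (0.7, (20, 20, 20, 30, 30, 30)),
]
kept = nms_3d(candidates)
```

In this toy input the second candidate is suppressed because its IoU with the top-scoring box exceeds the threshold, while the distant third candidate survives.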

205. Generalized Loss-Sensitive Adversarial Learning with Manifold Margins

Author: Guo-Jun Qi and Marzieh Edraki
Subjects: Generalization, Computer science, 02 engineering and technology, 010501 environmental sciences, Net (mathematics), 01 natural sciences, Measure (mathematics), Manifold, law.invention, Ambient space, law, Margin (machine learning), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Algorithm, Manifold (fluid mechanics), Distribution (differential geometry), 0105 earth and related environmental sciences
Abstract: The classic Generative Adversarial Net and its variants can be roughly categorized into two large families: the unregularized versus regularized GANs. By relaxing the non-parametric assumption on the discriminator in the classic GAN, the regularized GANs have better generalization ability to produce new samples drawn from the real distribution. It is well known that real data such as natural images are not uniformly distributed over the whole data space. Instead, they are often restricted to a low-dimensional manifold of the ambient space. Such a manifold assumption suggests that the distance over the manifold should be a better measure to characterize the distinction between real and fake samples. Thus, we define a pullback operator to map samples back to their data manifold, and a manifold margin is defined as the distance between the pullback representations to distinguish between real and fake samples and learn the optimal generators. We justify the effectiveness of the proposed model both theoretically and empirically.
Published: 2018

206. A Study of Question Effectiveness Using Reddit 'Ask Me Anything' Threads

Author: Arumae, K., Guo-Jun Qi, and Liu, F.
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Asking effective questions is a powerful social skill. In this paper we seek to build computational models that learn to discriminate effective questions from ineffective ones. Armed with such a capability, future advanced systems can evaluate the quality of questions and provide suggestions for effective question wording. We create a large-scale, real-world dataset that contains over 400,000 questions collected from Reddit "Ask Me Anything" threads. Each thread resembles an online press conference where questions compete with each other for attention from the host. This dataset enables the development of a class of computational models for predicting whether a question will be answered. We develop a new convolutional neural network architecture with variable-length context and demonstrate the efficacy of the model by comparing it with state-of-the-art baselines and human judges.
Comment: 6 pages
Published: 2018
Full Text: View/download PDF

207. Large-scale Bisample Learning on ID Versus Spot Face Recognition

Author: Xiangyu Zhu, Hao Liu, Hailin Shi, Zhen Lei, Stan Z. Li, Fan Yang, Guo-Jun Qi, and Dong Yi
Subjects: FOS: Computer and information sciences, Computer science, business.industry, Deep learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Machine learning, computer.software_genre, Facial recognition system, Class (biology), Artificial Intelligence, Face (geometry), Softmax function, Pattern recognition (psychology), Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Scale (map), computer, Software
Abstract: In real-world face recognition applications, there is a tremendous amount of data with two images for each person. One is an ID photo for face enrollment, and the other is a probe photo captured on the spot. Most existing methods are designed for training data with limited breadth (a relatively small number of classes) and sufficient depth (many samples for each class). They would meet great challenges on ID versus Spot (IvS) data, including under-represented intra-class variations and an excessive demand on computing devices. In this paper, we propose a deep learning based large-scale bisample learning (LBL) method for IvS face recognition. To tackle the bisample problem with only two samples for each class, a classification-verification-classification (CVC) training strategy is proposed to progressively enhance the IvS performance. In addition, a dominant prototype softmax (DP-softmax) is incorporated to make deep learning scalable to a large number of classes. We conduct LBL on an IvS face dataset with more than two million identities. Experimental results show the proposed method achieves superior performance to previous ones, validating the effectiveness of LBL on IvS face recognition.
Comment: Accepted by special issue on Deep Learning for Face Analysis. International Journal of Computer Vision (IJCV), 2019
Published: 2018
Full Text: View/download PDF

208. Prior-Knowledge and Attention-based Meta-Learning for Few-Shot Learning

Author: Weiguo Zhang, Xiangyu Zhu, Zhen Lei, Jingping Shi, Yunxiao Qin, Zezheng Wang, Guo-Jun Qi, and Chenxu Zhao
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Information Systems and Management, Meta learning (computer science), Process (engineering), Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Space (commercial competition), Machine learning, computer.software_genre, Management Information Systems, Machine Learning (cs.LG), Artificial Intelligence, 020204 information systems, Generalization (learning), 0202 electrical engineering, electronic engineering, information engineering, Representation (mathematics), business.industry, Cognition, Metric (mathematics), Key (cryptography), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Software
Abstract: Recently, meta-learning has been shown to be a promising way to solve few-shot learning. In this paper, inspired by the human cognition process, which utilizes both prior knowledge and visual attention in learning new knowledge, we present a novel meta-learning paradigm with three developments that introduce an attention mechanism and prior knowledge into meta-learning. In our approach, prior knowledge helps the meta-learner express the input data in a high-level representation space, and the attention mechanism enables the meta-learner to focus on key features of the data in that representation space. Compared with existing meta-learning approaches that pay little attention to prior knowledge and visual attention, our approach alleviates the meta-learner's few-shot cognition burden. Furthermore, we identify a Task-Over-Fitting (TOF) problem, in which the meta-learner generalizes poorly across different K-shot learning tasks, and we propose a Cross-Entropy across Tasks (CET) metric to model and solve it. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on several few-shot learning benchmarks while also greatly relieving the TOF problem.
Comment: 15 pages
Published: 2018
Full Text: View/download PDF

209. Sharp Attention Network via Adaptive Sampling for Person Re-identification

Author: Rongxin Jiang, Xian-Sheng Hua, Guo-Jun Qi, Zhongming Jin, Hongwei Yong, Chen Shen, and Yaowu Chen
Subjects: FOS: Computer and information sciences, Adaptive sampling, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, Sampling (statistics), Pattern recognition, Bernoulli sampling, 02 engineering and technology, Convolutional neural network, Discriminative model, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, Media Technology, 020201 artificial intelligence & image processing, Differentiable function, Artificial intelligence, Electrical and Electronic Engineering, business
Abstract: In this paper, we present novel sharp attention networks that adaptively sample feature maps from convolutional neural networks (CNNs) for the person re-identification (re-ID) problem. Due to the introduction of sampling-based attention models, the proposed approach can adaptively generate sharper attention-aware feature masks. This greatly differs from the gating-based attention mechanism, which relies on soft gating functions to select the relevant features for person re-ID. In contrast, the proposed sampling-based attention mechanism allows us to effectively trim irrelevant features by forcing the resultant feature masks to focus on the most discriminative features. It can produce sharper attention that is more assertive in localizing subtle features relevant to re-identifying people across cameras. For this purpose, a differentiable Gumbel-Softmax sampler is employed to approximate Bernoulli sampling when training the sharp attention networks. Extensive experimental evaluations demonstrate the superiority of this new sharp attention model for person re-ID over other state-of-the-art methods on three challenging benchmarks: CUHK03, Market-1501, and DukeMTMC-reID.
Comment: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
Published: 2018
Full Text: View/download PDF
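
The Gumbel-Softmax approximation of Bernoulli sampling mentioned in this abstract can be sketched in a few lines. This is a generic two-class relaxation written for illustration, not the authors' network code; the function names, the temperature value, and the keep probability are assumptions.

```python
import math
import random

def sample_gumbel(rng):
    """Draw standard Gumbel noise as -log(-log(U)) with U ~ Uniform(0, 1)."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def relaxed_bernoulli(p, temperature, rng):
    """Soft sample in [0, 1] approximating a Bernoulli(p) keep/drop decision.
    Requires 0 < p < 1 and temperature > 0."""
    logit_keep = math.log(p) + sample_gumbel(rng)
    logit_drop = math.log(1.0 - p) + sample_gumbel(rng)
    # Two-class softmax over the perturbed logits, written as a stable sigmoid.
    z = (logit_keep - logit_drop) / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

rng = random.Random(0)
mask = [relaxed_bernoulli(0.3, 0.5, rng) for _ in range(8)]  # soft feature mask
```

Lower temperatures push the soft samples toward 0 or 1, which is what makes the resulting mask "sharper" than a plain sigmoid gate; thresholding a sample at 0.5 recovers an exact Bernoulli(p) draw by the Gumbel-max argument.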

210. Large-scale supervised similarity learning in networks

Author: Charu C. Aggarwal, Jiayu Zhou, Yingzhen Yang, Thomas S. Huang, Shiyu Chang, Guo-Jun Qi, and Meng Wang
Subjects: Computer science, business.industry, Node (networking), Context (language use), 02 engineering and technology, Semi-supervised learning, Recommender system, Machine learning, computer.software_genre, Human-Computer Interaction, Similarity (network science), SimRank, Artificial Intelligence, Hardware and Architecture, 020204 information systems, Hinge loss, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Software, Similarity learning, Information Systems
Abstract: The problem of similarity learning is relevant to many data mining applications, such as recommender systems, classification, and retrieval. This problem is particularly challenging in the context of networks, which contain different aspects such as the topological structure, content, and user supervision. These different aspects need to be combined effectively, in order to create a holistic similarity function. In particular, while most similarity learning methods in networks such as SimRank utilize the topological structure, the user supervision and content are rarely considered. In this paper, a factorized similarity learning (FSL) is proposed to integrate the link, node content, and user supervision into a uniform framework. This is learned by using matrix factorization, and the final similarities are approximated by the span of low-rank matrices. The proposed framework is further extended to a noise-tolerant version by adopting a hinge loss alternatively. To facilitate efficient computation on large-scale data, a parallel extension is developed. Experiments are conducted on the DBLP and CoRA data sets. The results show that FSL is robust and efficient and outperforms the state of the art. The code for the learning algorithm used in our experiments is available at http://www.ifp.illinois.edu/~chang87/.
Published: 2015

211. Ontological Random Forests for Image Classification

Author: Thomas S. Huang, Weiyao Lin, Ning Xu, Jiangping Wang, and Guo-Jun Qi
Subjects: Contextual image classification, business.industry, media_common.quotation_subject, Decision tree, Machine learning, computer.software_genre, Random forest, Semantic similarity, Perception, Ontology, Leverage (statistics), Artificial intelligence, business, computer, media_common, Mathematics
Abstract: Previous image classification approaches mostly neglect semantics, which has two major limitations. First, categories are simply treated independently while in fact they have semantic overlaps. For example, “sedan” is a specific kind of “car”. Therefore, it is unreasonable to train a classifier to distinguish between “sedan” and “car”. Second, image feature representations used for classifying different categories are the same. However, the human perception system is believed to use different features for different objects. In this paper, the authors leverage semantic ontologies to solve the aforementioned problems. The authors propose an ontological random forest algorithm where the splitting of decision trees is determined by semantic relations among categories. Then hierarchical features are automatically learned by multiple-instance learning to capture visual dissimilarities at different concept levels. Their approach is tested on two image classification datasets. Experimental results demonstrate that their approach not only outperforms state-of-the-art results but also identifies semantic visual features.
Published: 2015

212. Breaking the Barrier to Transferring Link Information across Networks

Author: Thomas S. Huang, Charu C. Aggarwal, and Guo-Jun Qi
Subjects: Dynamic network analysis, Social network, business.industry, Computer science, Inference, computer.software_genre, Graph, Computer Science Applications, Network formation, Network simulation, Computational Theory and Mathematics, Graph (abstract data type), Data mining, business, computer, Information Systems
Abstract: Link prediction is one of the most fundamental problems in graph modeling and mining. It has been studied in a wide range of scenarios, from uncovering missing links between different entities in databases, to recommending relations between people in social networks. In this problem, we wish to predict unseen links in a growing target network by exploiting existing structures in source networks. Most existing methods assume that abundant links are available in the target network to build a model for link prediction. However, in many scenarios, the target network may be too sparse to enable a robust inference process, which makes link prediction challenging given the paucity of link data. On the other hand, in many cases, other (more densely linked) auxiliary networks may be available that contain similar link structure relevant to that in the target network. The linkage information in the existing networks can be used in conjunction with the node attribute information in both networks in order to make more accurate link recommendations. Thus, this paper proposes the use of learning methods to perform link inference by transferring the link information from the source network to the target network. We also note that the source network may contain link information irrelevant to the target network. This leads to cross-network bias between the networks, which makes the link model built upon the source network misaligned with the link structure of the target network. Therefore, we re-sample the source network to rectify such cross-network bias by maximizing the cross-network relevance measured by the node attributes, while preserving as much link information as possible to avoid the loss of source link structure caused by the re-sampling algorithm. The link model based on the re-sampled source network can make more accurate link predictions on the target network with aligned link structures across the networks. We present experimental results illustrating the effectiveness of the approach.
Published: 2015

213. A low-cost, ligand exchange-free strategy to synthesize large-grained Cu2ZnSnS4 thin-films without a fine-grain underlayer from nanocrystals

Author: Hao Gong, Tang Jiao Huang, Guo-Jun Qi, Xuesong Yin, and Chunhua Tang
Subjects: Formamide, Materials science, Renewable Energy, Sustainability and the Environment, Annealing (metallurgy), Nanotechnology, General Chemistry, engineering.material, symbols.namesake, chemistry.chemical_compound, Nanocrystal, chemistry, Oleylamine, symbols, engineering, General Materials Science, CZTS, Kesterite, Thin film, Raman spectroscopy
Abstract: The first direct synthesis of CZTS nanocrystals in a formamide solvent system without using long hydrocarbon chain organic ligands is reported. The kesterite CZTS nanocrystals possess a mean size of 5.2 ± 1.2 nm. No secondary phases have been detected within the known limitations of XRD and Raman measurements. Experimental evidence suggests that excess S²⁻ is present on the surface of the nanocrystals, accounting for their dispersibility in polar solvents. The nanocrystals also exhibit a smaller weight loss of 8.7% at 500 °C compared to 24.4% for those capped by oleylamine. A description of the formation of CZTS FA nanocrystals and the role of formamide during synthesis is proposed. Annealing of spin-coated nanocrystal thin-films highlighted the difficulty of forming dense films from loose nanocrystal films. This work shows that this can be overcome using compaction with a combination of a reasonably soft metal and silicone. A means to compact the film uniformly on a centimeter scale with reduced delamination is thus demonstrated. Annealed compacted films possess crystal grains with a favorable size on the order of microns. More significantly, a large-grain layer is formed without an unwanted residual fine-grain underlayer. The absence of a fine-grain underlayer shows that this ligand exchange-free strategy is effective in resolving a key challenge associated with the nanocrystal approach of making CZTS thin-films while simultaneously being low-cost and having a smaller environmental footprint. The strategy presented here is equally applicable to other nanocrystal approaches requiring the synthesis of dense thin-films from nanocrystal films.
Published: 2015

214. Accretion disks around naked singularities.

Author: Guo, Jun-Qi, Joshi, Pankaj S, Narayan, Ramesh, and Zhang, Lin
Subjects: *SCHWARZSCHILD black holes, *BLACK holes, *ENERGY conversion, *THERMAL properties, *STELLAR luminosity function, *ACCRETION disks
Abstract: We investigate here the thermal properties of accretion disks in a spacetime for some galactic density profiles in spherical symmetry. The matter distributions have a finite outer radius with a naked central singularity. The luminosities of the accretion disks for some density profile models are found to be higher than those for a Schwarzschild black hole of the same mass. The slopes of the luminosity distributions with respect to frequency are significantly different, especially at higher frequencies, from those in the Schwarzschild black hole case. Such features may be used to distinguish black holes from naked singularities. The efficiencies for the conversion of mass energy of the accreting gas into radiation and the strength of naked singularities are analyzed. The novel feature we find is that the strength of the singularity differs depending on the profiles considered, and the stronger the singularity, the higher the efficiency of the accretion disk.
Published: 2021
Full Text: View/download PDF

215. Interleaved Group Convolutions

Author: Bin Xiao, Ting Zhang, Guo-Jun Qi, and Jingdong Wang
Subjects: Contextual image classification, Artificial neural network, Computer science, business.industry, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Convolutional neural network, Convolution, Kernel (image processing), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Algorithm, Group theory, 0105 earth and related environmental sciences
Abstract: In this paper, we present a simple and modularized neural network architecture, named interleaved group convolutional neural networks (IGCNets). The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution. The two group convolutions are complementary: (i) the convolution on each partition in primary group convolution is a spatial convolution, while on each partition in secondary group convolution, the convolution is a point-wise convolution; (ii) the channels in the same secondary partition come from different primary partitions. We discuss one representative advantage: Wider than a regular convolution with the number of parameters and the computation complexity preserved. We also show that regular convolutions, group convolution with summation fusion, and the Xception block are special cases of interleaved group convolutions. Empirical results over standard benchmarks, CIFAR-10, CIFAR-100, SVHN and ImageNet demonstrate that our networks are more efficient in using parameters and computation complexity with similar or higher accuracy.
Published: 2017
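
The interleaving property (ii) and the parameter budget described in this abstract can be checked with a short sketch. It assumes L primary partitions of M channels each, equal input and output widths, a kernel with `kernel_size` spatial positions (9 for a 3x3 kernel), and no biases; the function names and the example sizes are hypothetical and not taken from the paper's code.

```python
def primary_partition(channel, m):
    """Primary partitions are contiguous blocks of m channels."""
    return channel // m

def secondary_partitions(l, m):
    """Interleave: secondary partition j takes the j-th channel of every primary partition."""
    return [[p * m + j for p in range(l)] for j in range(m)]

def igc_block_params(l, m, kernel_size):
    """Weight count of one block: l spatial group convolutions over m channels
    plus m point-wise group convolutions over l channels."""
    primary = l * (m * m * kernel_size)
    secondary = m * (l * l)
    return primary + secondary

def regular_conv_params(channels, kernel_size):
    """Weight count of a dense convolution with equal input and output width."""
    return channels * channels * kernel_size

groups = secondary_partitions(4, 2)           # 8 channels in total
sources = [[primary_partition(c, 2) for c in g] for g in groups]
igc = igc_block_params(24, 2, 9)              # width 24 * 2 = 48 channels
dense = regular_conv_params(48, 9)            # dense 3x3 convolution at the same width
```

Under these assumptions every secondary partition draws exactly one channel from each primary partition, and the block at width 48 uses far fewer weights than a dense 3x3 convolution of the same width, which is the sense in which the block can be wider for a fixed parameter budget.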

216. Mixture Factorized Ornstein-Uhlenbeck Processes for Time-Series Forecasting

Author: Jingdong Wang, Jiliang Tang, Guo-Jun Qi, and Jiebo Luo
Subjects: Series (mathematics), Process (engineering), Ornstein–Uhlenbeck process, 02 engineering and technology, White noise, Autoregressive model, 020204 information systems, Market data, 0202 electrical engineering, electronic engineering, information engineering, Econometrics, 020201 artificial intelligence & image processing, Time series, Brownian motion, Mathematics
Abstract: Forecasting the future observations of time-series data can be performed by modeling the trend and fluctuations from the observed data. Many classical time-series analysis models, like the Autoregressive (AR) model and its variants, have been developed to achieve such forecasting ability. While they are often based on the white noise assumption to model the data fluctuations, a more general Brownian motion has been adopted that results in the Ornstein-Uhlenbeck (OU) process. The OU process has been highly successful in predicting future observations across many kinds of time series; however, it is limited to modeling simple diffusion dynamics driven by a single persistent factor that never evolves over time. In many real problems, by contrast, a mixture of hidden factors is usually present, and when and how frequently they appear or disappear is unknown ahead of time. This challenge inspires us to develop a Mixture Factorized OU process (MFOUP) to model evolving factors. The new model is able to capture the changing states of multiple mixed hidden factors, from which we can infer their roles in driving the movements of time series. We conduct experiments on three forecasting problems, covering sensor and market data streams. The results show its competitive performance in predicting future observations and capturing the evolution patterns of hidden factors as compared with other algorithms.
Published: 2017
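
For orientation, the single-factor Ornstein-Uhlenbeck process that this abstract takes as its starting point can be simulated and forecast as follows. This is a minimal sketch of the classical OU process, dx = θ(μ − x)dt + σ dW, not of the proposed MFOUP model; the parameter names and values are assumptions chosen for the example.

```python
import math
import random

def ou_expected(x_now, mu, theta, horizon):
    """Conditional mean of an OU process `horizon` time units ahead."""
    return mu + (x_now - mu) * math.exp(-theta * horizon)

def simulate_ou(x0, mu, theta, sigma, dt, steps, rng):
    """Euler-Maruyama discretization of dx = theta * (mu - x) dt + sigma dW."""
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        # Brownian increment over a step of length dt has variance dt.
        noise = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        path.append(x + theta * (mu - x) * dt + sigma * noise)
    return path

rng = random.Random(42)
path = simulate_ou(x0=2.0, mu=0.0, theta=1.0, sigma=0.3, dt=0.01, steps=500, rng=rng)
forecast = ou_expected(path[-1], mu=0.0, theta=1.0, horizon=1.0)
```

The forecast always decays toward the single long-run mean μ at the fixed rate θ, which illustrates the limitation the abstract points out: one persistent factor with no mechanism for factors to appear or disappear.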

217. Context-patch based face hallucination via thresholding locality-constrained representation and reproducing learning

Author: Jiayi Ma, Junjun Jiang, Yi Yu, Guo-Jun Qi, Akiko Aizawa, and Suhua Tang
Subjects: Face hallucination, business.industry, Computer science, Deep learning, Stability (learning theory), 020207 software engineering, Pattern recognition, Context (language use), 02 engineering and technology, Iterative reconstruction, Thresholding, Robustness (computer science), Face (geometry), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Image resolution
Abstract: Face hallucination, which refers to predicting a High-Resolution (HR) face image from an observed Low-Resolution (LR) one, is a challenging problem. Most state-of-the-art methods employ a local face structure prior to estimate the optimal representations for each patch by the training patches of the same position, and achieve good reconstruction performance. However, they do not take into account the contextual information of the image patch, which is very useful for the expression of the human face. Different from position-patch based methods, in this paper we leverage the contextual information and develop a robust and efficient context-patch face hallucination algorithm, called Thresholding Locality-constrained Representation with Reproducing learning (TLcR-RL). In TLcR-RL, we use a thresholding strategy to enhance the stability of patch representation and the reconstruction accuracy. Additionally, we develop a reproducing learning scheme to iteratively enhance the estimated result by adding the estimated HR face to the training set. Experiments demonstrate that the performance of our proposed framework has a substantial increase when compared to state-of-the-art methods, including a recently proposed deep learning based method.
Published: 2017

218. Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition

Author: Yan Song, Xiangbo Shu, Jinhui Tang, Guo-Jun Qi, Liyan Zhang, and Zechao Li
Subjects: Focus (computing), business.industry, Computer science, Frame (networking), Concurrence, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Motion (physics), Term (time), Human–computer interaction, Dynamics (music), Pattern recognition (psychology), 0202 electrical engineering, electronic engineering, information engineering, Action recognition, 020201 artificial intelligence & image processing, Artificial intelligence, business, 0105 earth and related environmental sciences
Abstract: Recently, Long Short-Term Memory (LSTM) has become a popular choice to model individual dynamics for single-person action recognition. However, existing RNN models only focus on capturing the temporal dynamics of the person-person interactions by naively combining the activity dynamics of individuals or modeling them as a whole. This neglects the inter-related dynamics of how person-person interactions change over time. To this end, we propose a novel Concurrent Long Short-Term Memories (Co-LSTM) to model the long-term inter-related dynamics between two interacting people on the bounding boxes covering people. Specifically, for each frame, two sub-memory units store individual motion information, while a concurrent LSTM unit selectively integrates and stores inter-related motion information between interacting people from these two sub-memory units via a new co-memory cell. In experiments, we show the superior performance of Co-LSTM compared with the state-of-the-art methods.
Published: 2017

219. Temporal Domain Neural Encoder for Video Representation Learning

Author: Zhaowen Wang, Zhe Lin, Hao Hu, Guo-Jun Qi, and Joon-Young Lee
Subjects: Network architecture, business.industry, Computer science, Feature extraction, Pattern recognition, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Visualization, Recurrent neural network, Discriminative model, Pattern recognition (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Encoder, Feature learning, 0105 earth and related environmental sciences
Abstract: We address the challenge of learning good video representations by explicitly modeling the relationship between visual concepts in time space. We propose a novel Temporal Preserving Recurrent Neural Network (TPRNN) that extracts and encodes visual dynamics with frame-level features as input. The proposed network architecture captures temporal dynamics by keeping track of the ordinal relationship of co-occurring visual concepts, and constructs video representations with their temporal order patterns. The resultant video representations effectively encode temporal information of dynamic patterns, which makes them more discriminative to human actions performed with different sequences of action patterns. We evaluate the proposed model on several real video datasets, and the results show that it successfully outperforms the baseline models. In particular, we observe significant improvement on action classes that can only be distinguished by capturing the temporal orders of action patterns.
Published: 2017

220. CLARE: A Joint Approach to Label Classification and Tag Recommendation

Author: Wang, Y., Wang, S., Tang, J., Guo-Jun Qi, Liu, H., and Li, B.
Subjects: General Medicine
Abstract: Data classification and tag recommendation are both important and challenging tasks in social media. These two tasks are often considered independently and most efforts have been made to tackle them separately. However, labels in data classification and tags in tag recommendation are inherently related. For example, a Youtube video annotated with NCAA, stadium, pac12 is likely to be labeled as football, while a video/image with the class label of coast is likely to be tagged with beach, sea, water and sand. The existence of relations between labels and tags motivates us to jointly perform classification and tag recommendation for social media data in this paper. In particular, we provide a principled way to capture the relations between labels and tags, and propose a novel framework CLARE, which fuses data CLAssification and tag REcommendation into a coherent model. With experiments on three social media datasets, we demonstrate that the proposed framework CLARE achieves superior performance on both tasks compared to the state-of-the-art methods.
Published: 2017

221. LEGO-MM: LEarning Structured Model by Probabilistic loGic Ontology Tree for MultiMedia

Author: Jinhui Tang, Shiyu Chang, Guo-Jun Qi, Qi Tian, Yong Rui, and Thomas S. Huang
Abstract: Recent advances in multimedia ontology have resulted in a number of concept models, e.g., large-scale concept for multimedia and Mediamill 101, which are accessible and public to other researchers. However, most current research effort still focuses on building new concepts from scratch; very few works explore the appropriate method to construct new concepts upon the existing models already in the warehouse. To address this issue, we propose a new framework in this paper, termed LEarning Structured Model by Probabilistic loGic Ontology Tree for MultiMedia (LEGO
Published: 2017

222. Tri-Clustered Tensor Completion for Social-Aware Image Tag Refinement

Author: Meng Wang, Ramesh Jain, Xiangbo Shu, Jinhui Tang, Guo-Jun Qi, Shuicheng Yan, and Zechao Li
Subjects: User information, Focus (computing), Computer science, Applied Mathematics, computer.software_genre, Visualization, Image (mathematics), Computational Theory and Mathematics, Artificial Intelligence, Tensor (intrinsic definition), Computer Vision and Pattern Recognition, Data mining, Tensor, Cluster analysis, computer, Software
Abstract: Social image tag refinement, which aims to improve tag quality by automatically completing the missing tags and rectifying the noise-corrupted ones, is an essential component for social image search. Conventional approaches mainly focus on exploring the visual and tag information, without considering the user information, which often reveals important hints on the (in)correct tags of social images. Towards this end, we propose a novel tri-clustered tensor completion framework to collaboratively explore these three kinds of information to improve the performance of social image tag refinement. Specifically, the inter-relations among users, images and tags are modeled by a tensor, and the intra-relations between users, images and tags are explored by three regularizations respectively. To address the challenges of the super-sparse and large-scale tensor factorization that demands expensive computing and memory cost, we propose a novel tri-clustering method to divide the tensor into a certain number of sub-tensors by simultaneously clustering users, images and tags into a bunch of tri-clusters. And then we investigate two strategies to complete these sub-tensors by considering (in)dependence between the sub-tensors. Experimental results on a real-world social image database demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
Published: 2017

223. Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities

Author: Guo-Jun Qi
Subjects: FOS: Computer and information sciences, Contextual image classification, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Function (mathematics), Lipschitz continuity, Adversarial system, Artificial Intelligence, Test set, Pattern recognition (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Algorithm, Software, Generative grammar, Generator (mathematics)
Abstract: In this paper, we present the Lipschitz regularization theory and algorithms for a novel Loss-Sensitive Generative Adversarial Network (LS-GAN). Specifically, it trains a loss function to distinguish between real and fake samples by designated margins, while learning a generator alternately to produce realistic samples by minimizing their losses. The LS-GAN further regularizes its loss function with a Lipschitz regularity condition on the density of real data, yielding a regularized model that can better generalize to produce new data from a reasonable number of training examples than the classic GAN. We will further present a Generalized LS-GAN (GLS-GAN) and show it contains a large family of regularized GAN models, including both LS-GAN and Wasserstein GAN, as its special cases. Compared with the other GAN models, we will conduct experiments to show both LS-GAN and GLS-GAN exhibit competitive ability in generating new images in terms of the Minimum Reconstruction Error (MRE) assessed on a separate test set. We further extend the LS-GAN to a conditional form for supervised and semi-supervised learning problems, and demonstrate its outstanding performance on image classification tasks. The source code for both LS-GAN and GLS-GAN is available at https://github.com/maple-research-lab . LS-GAN is also supported by Microsoft CNTK at https://www.cntk.ai/pythondocs/CNTK_206C_WGAN_LSGAN.html . The original code of LS-GAN and GLS-GAN is also available at https://github.com/guojunq/lsgan/ and https://github.com/guojunq/glsgan/
Published: 2017
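
The margin-based training described in the abstract above can be sketched numerically. The snippet below assumes a hinge-style objective in which the loss of a real sample should be smaller than the loss of a generated sample by at least a margin; the exact form, the function names, and the weight `lam` are assumptions for illustration and are not stated in the catalog record.

```python
import numpy as np

def loss_function_objective(loss_real, loss_fake, margin, lam=1.0):
    """Objective minimized when training the loss function (illustrative).

    loss_real: loss values assigned to real samples.
    loss_fake: loss values assigned to generated samples.
    margin:    designated margin between each real/fake pair.
    The hinge term penalizes pairs where the real loss is not smaller
    than the fake loss by at least the margin.
    """
    hinge = np.maximum(0.0, margin + loss_real - loss_fake)
    return loss_real.mean() + lam * hinge.mean()

def generator_objective(loss_fake):
    """The generator is trained to minimize the loss of its own samples."""
    return loss_fake.mean()

# Toy values: the first pair satisfies the margin, the second violates it.
loss_real = np.array([0.0, 1.0])
loss_fake = np.array([2.0, 1.0])
margin = np.array([1.0, 1.0])
value = loss_function_objective(loss_real, loss_fake, margin)
```

In an alternating scheme, one would update the loss function with the first objective and the generator with the second, as the abstract's wording "learning a generator alternately" suggests.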

224. Global versus Localized Generative Adversarial Nets

Author: Liheng Zhang, Hao Hu, Marzieh Edraki, Guo-Jun Qi, Xian-Sheng Hua, and Jingdong Wang
Subjects: FOS: Computer and information sciences, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Tangent, 02 engineering and technology, 010501 environmental sciences, Topology, 01 natural sciences, Manifold, Data point, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Orthonormality, Classifier (UML), Subspace topology, 0105 earth and related environmental sciences
Abstract: In this paper, we present a novel localized Generative Adversarial Net (GAN) to learn on the manifold of real data. Compared with the classic GAN that globally parameterizes a manifold, the Localized GAN (LGAN) uses local coordinate charts to parameterize distinct local geometry of how data points can transform at different locations on the manifold. Specifically, around each point there exists a local generator that can produce data following diverse patterns of transformations on the manifold. The locality nature of LGAN enables local generators to adapt to and directly access the local geometry without need to invert the generator in a global GAN. Furthermore, it can prevent the manifold from being locally collapsed to a dimensionally deficient tangent subspace by imposing an orthonormality prior between tangents. This provides a geometric approach to alleviating mode collapse at least locally on the manifold by imposing independence between data transformations in different tangent directions. We will also demonstrate the LGAN can be applied to train a robust classifier that prefers locally consistent classification decisions on the manifold, and the resultant regularizer is closely related with the Laplace-Beltrami operator. Our experiments show that the proposed LGANs can not only produce diverse image transformations, but also deliver superior classification performances.
Published: 2017
Full Text: View/download PDF

225. CZTS-based materials and interfaces and their effects on the performance of thin film solar cells

Author: Tang Jiao Huang, Hao Gong, Guo-Jun Qi, and Xuesong Yin
Subjects: Materials science, Nanotechnology, Quantum dot solar cell, Condensed Matter Physics, Copper indium gallium selenide solar cells, Engineering physics, chemistry.chemical_compound, chemistry, Layer interface, General Materials Science, Grain boundary, Thin film solar cell, Plasmonic solar cell, CZTS, Thin film
Abstract: Cu2ZnSnS4 (CZTS) and its related materials such as Cu2ZnSnSe4 (CZTSe) and Cu2ZnSn(S,Se)4 (CZTSSe) have attracted considerable attention as absorber materials for thin film solar cells due to the non-toxicity, elemental abundance, and large production capacity of their constituents. Despite the similarities between CZTS-based materials and Cu(In,Ga)Se2 (CIGS), the record efficiency of CZTS-based solar cells remains significantly lower than that of CIGS solar cells. Considering that the difference between the two lies in the choice of the absorber material, the cause of the lower efficiency of CZTS-based solar cells can be isolated to the issues associated with CZTS-based materials and their related interfaces. Herein, these issues and the work done to understand and resolve them are reviewed. Unlike existing review papers, every unique region of CZTS-based solar cells that contributes to its lower efficiency, namely (1) the bulk of the absorber, (2) the grain boundaries of the absorber, (3) the absorber/buffer layer interface, and (4) the absorber/back contact interface, is surveyed. This review also intends to identify the major unresolved issues and the potential approaches to realizing sizable improvements in the solar cells' efficiency, thus providing a guide as to where research efforts should be focused. (© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
Published: 2014

226. Multi-Label Image Categorization With Sparse Factor Representation

Author: Fuming Sun, Thomas S. Huang, Haojie Li, Guo-Jun Qi, and Jinhui Tang
Subjects: Dependency (UML), Documentation, Machine learning, computer.software_genre, Sensitivity and Specificity, Occam's razor, Pattern Recognition, Automated, Image (mathematics), Correlation, symbols.namesake, Artificial Intelligence, Terminology as Topic, Image Interpretation, Computer-Assisted, Representation (mathematics), Natural Language Processing, Mathematics, Contextual image classification, business.industry, Reproducibility of Results, Pattern recognition, Image Enhancement, Computer Graphics and Computer-Aided Design, Categorization, symbols, Artificial intelligence, business, computer, Algorithms, Software, Test data
Abstract: The goal of multilabel classification is to reveal the underlying label correlations to boost the accuracy of classification tasks. Most of the existing multilabel classifiers attempt to exhaustively explore dependency between correlated labels. It increases the risk of involving unnecessary label dependencies, which are detrimental to classification performance. Actually, not all the label correlations are indispensable to multilabel model. Negligible or fragile label correlations cannot be generalized well to the testing data, especially if there exists label correlation discrepancy between training and testing sets. To minimize such negative effect in the multilabel model, we propose to learn a sparse structure of label dependency. The underlying philosophy is that as long as the multilabel dependency cannot be well explained, the principle of parsimony should be applied to the modeling process of the label correlations. The obtained sparse label dependency structure discards the outlying correlations between labels, which makes the learned model more generalizable to future samples. Experiments on real world data sets show the competitive results compared with existing algorithms.
Published: 2014

227. Supervised Ranking Hash for Semantic Similarity Search

Author: Kien A. Hua, Guo-Jun Qi, Jun Ye, Kai Li, and Tuoerhongjiang Yusuph
Subjects: Computer science, Universal hashing, Dynamic perfect hashing, Hash function, 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Hash table, Locality-sensitive hashing, Ranking SVM, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, Feature hashing, computer, Double hashing, 0105 earth and related environmental sciences
Abstract: The era of big data has spawned unprecedented interest in developing hashing algorithms for their storage efficiency and effectiveness in fast nearest neighbor search in large-scale databases. Most of the existing hash learning algorithms focus on learning hash functions which generate binary codes by numeric quantization of some projected feature space. In this work, we propose a novel hash learning framework that encodes features' ranking orders instead of quantizing their numeric values in a number of optimal low-dimensional ranking subspaces. We formulate the ranking-based hash learning problem as the optimization of a continuous probabilistic error function using softmax approximation and present an efficient learning algorithm to solve the problem. We extensively evaluate the proposed algorithm on several datasets and demonstrate superior performance against several state-of-the-art methods.
Published: 2016
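
One plausible reading of "encoding ranking orders in low-dimensional subspaces" from the abstract above is sketched below: each code symbol is the index of the largest response within a small subspace, and a softmax gives a continuous relaxation of that choice. The random projections stand in for the learned subspaces, and the function names and shapes are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def ranking_hash(x, projections):
    """Encode x by the index of the top-ranked response in each subspace.

    projections has shape (n_codes, subspace_dim, feature_dim); each
    (subspace_dim, feature_dim) slice spans one low-dimensional subspace.
    """
    responses = projections @ x  # shape (n_codes, subspace_dim)
    return responses.argmax(axis=1)

def soft_ranking_hash(x, projections, temperature=1.0):
    """Softmax relaxation of the argmax, usable as a differentiable surrogate."""
    responses = projections @ x / temperature
    responses = responses - responses.max(axis=1, keepdims=True)  # stability
    weights = np.exp(responses)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
projections = rng.normal(size=(8, 4, 16))  # random stand-in, not learned
x = rng.normal(size=16)

codes = ranking_hash(x, projections)
soft_codes = soft_ranking_hash(x, projections)
```

Because only the ordering of responses matters, `ranking_hash(3.0 * x, projections)` yields the same codes as `ranking_hash(x, projections)`, which is the kind of scale invariance that motivates rank-based codes over numeric quantization.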

228. Induction of microRNA-let-7a inhibits lung adenocarcinoma cell growth by regulating cyclin D1

Author: Wei Zhao, Jin-Xia Hu, Rui-Min Hao, Qian Zhang, Jun-Qi Guo, You-Jie Li, Ning Xie, Lu-Ying Liu, Ping-Yu Wang, Can Zhang, and Shu-Yang Xie
Published: 2018
Full Text: View/download PDF

229. MultiMedia Modeling : 22nd International Conference, MMM 2016, Miami, FL, USA, January 4-6, 2016, Proceedings, Part I

Author: Qi Tian, Nicu Sebe, Guo-Jun Qi, Benoit Huet, Richang Hong, and Xueliang Liu
Subjects: Multimedia systems, Information storage and retrieval systems, Pattern recognition systems, Data mining, Application software
Abstract: The two-volume set LNCS 9516 and LNCS 9517 constitutes the refereed proceedings of the 22nd International Conference on Multimedia Modeling, MMM 2016, held in Miami, FL, USA, in January 2016. The 32 revised full papers and 52 poster papers presented were carefully reviewed and selected from 117 submissions. In addition 20 papers were accepted for five special sessions out of 38 submissions as well as 7 demonstrations (from 11 submissions) and 9 video showcase papers. The papers are organized in topical sections on video content analysis, social media analysis, object recognition and system, multimedia retrieval and ranking, multimedia representation, machine learning in multimedia, and interaction and mobile. The special sessions are: good practices in multimedia modeling; semantics discovery from multimedia big data; perception, aesthetics, and emotion in multimedia quality modeling; multimodal learning and computing for human activity understanding; and perspectives on multimedia analytics.
Published: 2016

230. Cross-Space Affinity Learning with Its Application to Movie Recommendation

Author: Jinhui Tang, Guo-Jun Qi, Changsheng Xu, and Liyan Zhang
Subjects: Theoretical computer science, business.industry, Computer science, Recommender system, Machine learning, computer.software_genre, Computer Science Applications, Set (abstract data type), Kernel (linear algebra), Computational Theory and Mathematics, Product (mathematics), Collaborative filtering, Benchmark (computing), Artificial intelligence, Quadratic programming, Tensor, business, computer, Information Systems
Abstract: In this paper, we propose a novel cross-space affinity learning algorithm over different spaces with heterogeneous structures. Unlike most affinity learning algorithms on the homogeneous space, we construct a cross-space tensor model to learn the affinity measures on heterogeneous spaces subject to a set of order constraints from the training pool. We further enhance the model with a factorization form which greatly reduces the number of parameters of the model with a controlled complexity. Moreover, from the practical perspective, we show the proposed factorized cross-space tensor model can be efficiently optimized by a series of simple quadratic optimization problems in an iterative manner. The proposed cross-space affinity learning algorithm can be applied to many real-world problems, which involve multiple heterogeneous data objects defined over different spaces. In this paper, we apply it to a recommendation system to measure the affinity between users and the product items, where a higher affinity means a higher rating of the user on the product. For an empirical evaluation, a widely used benchmark movie recommendation data set, MovieLens, is used to compare the proposed algorithm with other state-of-the-art recommendation algorithms, and we show that very competitive results can be obtained.
Published: 2013

231. MultiMedia Modeling : 22nd International Conference, MMM 2016, Miami, FL, USA, January 4-6, 2016, Proceedings, Part II

Author: Qi Tian, Nicu Sebe, Guo-Jun Qi, Benoit Huet, Richang Hong, and Xueliang Liu
Subjects: Multimedia systems, Information storage and retrieval systems, Pattern recognition systems, Data mining, Application software
Abstract: The two-volume set LNCS 9516 and 9517 constitutes the thoroughly refereed proceedings of the 22nd International Conference on Multimedia Modeling, MMM 2016, held in Miami, FL, USA, in January 2016. The 32 revised full papers and 52 poster papers were carefully reviewed and selected from 117 submissions. In addition 20 papers were accepted for five special sessions out of 38 submissions as well as 7 demonstrations (from 11 submissions) and 9 video showcase papers. The papers are organized in topical sections on video content analysis, social media analysis, object recognition and system, multimedia retrieval and ranking, multimedia representation, machine learning in multimedia, and interaction and mobile. The special sessions are: good practices in multimedia modeling; semantics discovery from multimedia big data; perception, aesthetics, and emotion in multimedia quality modeling; multimodal learning and computing for human activity understanding; and perspectives on multimedia analytics.
Published: 2015

232. Temporal Order-based First-Take-All Hashing for Fast Attention-Deficit-Hyperactive-Disorder Detection

Author: Guo-Jun Qi, Hao Hu, and Joey Velez-Ginorio
Subjects: business.industry, Computer science, Speech recognition, Nearest neighbor search, Hash function, 02 engineering and technology, Machine learning, computer.software_genre, 03 medical and health sciences, 0302 clinical medicine, Neuroimaging, Order (exchange), mental disorders, 0202 electrical engineering, electronic engineering, information engineering, Attention deficit, Key (cryptography), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, 030217 neurology & neurosurgery
Abstract: Attention Deficit Hyperactive Disorder (ADHD) is one of the most common childhood disorders and can continue through adolescence and adulthood. Although the root cause of the problem still remains unknown, recent advancements in brain imaging technology reveal that there exist differences between the neural activities of Typically Developing Children (TDC) and ADHD subjects. Inspired by this, we propose a novel First-Take-All (FTA) hashing framework to investigate the problem of fast detection of ADHD subjects through the fMRI time-series of neuron activities. By hashing time courses from regions of interest (ROIs) in the brain into fixed-size hash codes, FTA can compactly encode the temporal order differences between the neural activity patterns that are key to distinguishing TDC and ADHD subjects. Such patterns can be directly learned via minimizing the training loss incurred by the generated FTA codes. By conducting similarity search on the resultant FTA codes, data-driven ADHD detection can be achieved in an efficient fashion. The experimental results on real-world ADHD detection benchmarks demonstrate that FTA can outperform the state-of-the-art baselines using only neural activity time series without any phenotypic information.
Published: 2016

233. Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes

Author: Guo-Jun Qi, Wei Liu, Charu C. Aggarwal, and Thomas S. Huang
Subjects: Computer science, Process (engineering), 02 engineering and technology, computer.software_genre, Machine learning, Semantics, Image (mathematics), Text mining, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Intramodal dispersion, Interpretability, Contextual image classification, business.industry, Applied Mathematics, 020206 networking & telecommunications, Class (biology), Visualization, ComputingMethodologies_PATTERNRECOGNITION, Computational Theory and Mathematics, Feature (computer vision), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, computer, Software, Natural language processing
Abstract: In this paper, we present a label transfer model from texts to images for image classification tasks. The problem of image classification is often much more challenging than text classification. On one hand, labeled text data is more widely available than labeled images for classification tasks. On the other hand, text data tends to have natural semantic interpretability, and it is often more directly related to class labels. On the contrary, the image features are not directly related to concepts inherent in class labels. One of our goals in this paper is to develop a model for revealing the functional relationships between text and image features so as to directly transfer intermodal and intramodal labels to annotate the images. This is implemented by learning a transfer function as a bridge to propagate the labels between two multimodal spaces. However, the intermodal label transfers could be undermined by blindly transferring the labels of noisy texts to annotate images. To mitigate this problem, we present an intramodal label transfer process, which complements the intermodal label transfer by transferring the image labels instead when relevant text is absent from the source corpus. In addition, we generalize the intermodal label transfer to the zero-shot learning scenario where there are only text examples available to label unseen classes of images without any positive image examples. We evaluate our algorithm on an image classification task and show the effectiveness with respect to the other compared algorithms.
Published: 2016

234. Cross-modal hashing through ranking subspace learning

Author: Guo-Jun Qi, Jun Ye, Kien A. Hua, and Kai Li
Subjects: Theoretical computer science, Computer science, Nearest neighbor search, Hash function, 02 engineering and technology, 010501 environmental sciences, Rolling hash, 01 natural sciences, Locality-sensitive hashing, K-independent hashing, 0202 electrical engineering, electronic engineering, information engineering, Hamming space, 0105 earth and related environmental sciences, Universal hashing, business.industry, Dynamic perfect hashing, Pattern recognition, 2-choice hashing, Hash table, Hopscotch hashing, Hash tree, Cuckoo hashing, Locality preserving hashing, 020201 artificial intelligence & image processing, Feature hashing, Artificial intelligence, business, Perfect hash function, Double hashing
Abstract: Hashing has been widely used for approximate nearest neighbor search of high-dimensional multimedia data. In this paper, we propose a novel hash learning framework that maps high-dimensional multimodal data into a common Hamming space where the cross-modal similarity can be measured using Hamming distance. Unlike existing cross-modal hashing methods that learn hash functions in the form of numeric quantization of linear projections, the proposed hash learning algorithm encodes features' ranking properties and takes advantage of rank correlations which are known to be scale-invariant, numerically stable and highly nonlinear. Specifically, we learn two groups of subspaces jointly, one for each modality, so that the ranking orders in those subspaces maximally preserve the cross-modal similarity. Extensive experiments on real-world datasets demonstrate the superiority of the proposed methods compared to state-of-the-art methods.
Published: 2016
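
To make "measuring cross-modal similarity with Hamming distance" from the abstract above concrete, the sketch below ranks a few image codes against a text query code in a shared Hamming space. The codes are hand-written toy values rather than outputs of any learned hash function, and the function name `hamming_distances` is an assumption for illustration.

```python
import numpy as np

def hamming_distances(query_code, database_codes):
    """Count differing positions between one code and each database code."""
    return np.count_nonzero(database_codes != query_code, axis=1)

# Toy codes in a shared Hamming space (illustrative values only).
image_codes = np.array(
    [[0, 1, 1, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 1]],
    dtype=np.uint8,
)
text_query = np.array([0, 1, 1, 0], dtype=np.uint8)

distances = hamming_distances(text_query, image_codes)
# Smaller distance means more similar; a stable sort keeps ties in index order.
ranking = np.argsort(distances, kind="stable")
```

Here the first image code matches the query exactly, so it is ranked first, followed by the code that differs in one position.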

235. Web-Scale Multimedia Information Networks

Author: Shen-Fu Tsai, Thomas S. Huang, Min-Hsuan Tsai, Liangliang Cao, and Guo-Jun Qi
Subjects: Web server, education.field_of_study, Information retrieval, business.industry, Computer science, Population, Information quality, Ontology (information science), computer.software_genre, Semantics, Software portability, Information extraction, Electrical and Electronic Engineering, education, business, computer, Content management
Abstract: The abundance of multimedia data on the Web presents both challenges (how to annotate, search, and mine) and opportunities (crawling the Web to create large structured multimedia databases which can be used to do inference effectively). Because of the huge data volume, considering all semantic concepts as on the same (flat) level is not viable. In this paper, we introduce a unified STRUCTURED representation called multimedia information networks (MINets), which incorporates ontology and cross-media links, covering both content and context knowledge. Ontology and cross-media structures are constructed and expanded by automatically constructing MINets from web-scale data by state-of-the-art information extraction and knowledge-based population techniques. The resultant MINet will contain a wide range of linkages, including logical, statistical, and semantic relations among informative concept nodes, which connect proliferative ontology as well as cross-media web-scale resources together. The raw data collected in the construction phase often contain noisy, incomplete, or even conflicting information which could be detrimental to information extraction and utilization. The redundant link structure can then be utilized to distill MINets and improve quality of information (QoI). Moreover, advanced inference theory and systems can be built upon the linked MINets, and then high-level ontological knowledge can be inferred and integrated in a logically harmonious network structure in MINets which is consistent with human cognition. Furthermore, as information channels, the ontology and cross-media links in MINets connect informative knowledge resources together, which makes it possible to increase the portability of information between different resources and thereby increase information utilization levels.
Published: 2012

236. Recommending Flickr groups with social topic model

Author: Jiazhen Zhou, Jingdong Wang, Zhe Zhao, Hao Wang, Bin Cui, and Guo-Jun Qi
Subjects: Topic model, Exploit, Computer science, media_common.quotation_subject, Social media network, Probabilistic logic, Library and Information Sciences, World Wide Web, Pattern recognition (psychology), Social media, Function (engineering), Media content, Information Systems, media_common
Abstract: The explosion of multimedia content in social media networks creates great demand for tools that facilitate producing, sharing and viewing media content. Flickr groups, self-organized communities with declared common interests, help users conveniently participate in social media networks. In this paper, we address the problem of automatically recommending groups to users. We propose to simultaneously exploit media contents and link structures between users and groups. To this end, we present a probabilistic latent topic model to model them in an integrated framework, expecting to jointly discover the latent interests for users and groups and simultaneously learn the recommendation function. We demonstrate the proposed approach on a dataset crawled from Flickr.com.
Published: 2012

237. Sustainable recovery of nickel from spent hydrogenation catalyst: economics, emissions and wastes assessment

Author: Q.Z. Yang, H.C. Low, Bin Song, and Guo-Jun Qi
Subjects: Waste management, Renewable Energy, Sustainability and the Environment, Strategy and Management, Resource efficiency, Environmental engineering, chemistry.chemical_element, Industrial and Manufacturing Engineering, Toxic waste, Nickel, chemistry, Sustainability, Economics, Carbon footprint, Production (economics), Carbon, health care economics and organizations, General Environmental Science, Efficient energy use
Abstract: Economic viability, carbon emission profile and waste management associated with nickel recovery from spent hydrogenation catalysts are studied from sustainability perspectives. The purpose is to determine and compare the economic, environmental and social implications of different nickel reclamation techniques towards clean, safe and sustainable recovery of nickel from spent catalysts. Sustainability evaluation models are formulated to understand and improve the cost, carbon footprint and resource efficiency of a closed-loop nickel recovery process. The economic viability of the process highly depends on market values of recovered nickel and the production batch size. At a selling price higher than $12.60/kg, an operation with a batch size as small as 50 kg/batch would be profitable. The current rising nickel market, at ∼$18–24/kg, favors recovery operations although it also casts a dual effect on production costs. About 73–82% of carbon emission of the process is from the use of energy in the recovery operation. Energy efficiency is therefore identified as the most critical factor to improve the carbon footprint. The closed-loop process also improves resource use efficiency and minimizes toxic waste generation.
Published: 2011

238. Image annotation by kNN-sparse graph-based label propagation over noisily tagged web images

Author: Jinhui Tang, Ramesh Jain, Richang Hong, Shuicheng Yan, Guo-Jun Qi, and Tat-Seng Chua
Subjects: Dense graph, Exploit, business.industry, Computer science, Pattern recognition, Semi-supervised learning, Machine learning, computer.software_genre, Regularization (mathematics), Theoretical Computer Science, k-nearest neighbors algorithm, Automatic image annotation, Discriminative model, Artificial Intelligence, Artificial intelligence, business, computer, Label propagation
Abstract: In this article, we explore the problem of annotating a large-scale image corpus by label propagation over noisily tagged web images. To annotate the images more accurately, we propose a novel kNN-sparse graph-based semi-supervised learning approach for harnessing the labeled and unlabeled data simultaneously. The sparse graph constructed by datum-wise one-vs-kNN sparse reconstructions of all samples can remove most of the semantically unrelated links among the data, and thus it is more robust and discriminative than the conventional graphs. Meanwhile, we apply the approximate k nearest neighbors to accelerate the sparse graph construction without losing its effectiveness. More importantly, we propose an effective training label refinement strategy within this graph-based learning framework to handle the noise in the training labels, by bringing in a dual regularization for both the quantity and sparsity of the noise. We conduct extensive experiments on a real-world image database consisting of 55,615 Flickr images and noisily tagged training labels. The results demonstrate both the effectiveness and efficiency of the proposed approach and its capability to deal with the noise in the training labels.
Published: 2011

239. Economic Viability of Nickel Recovery from Waste Catalyst

Author: Y.P. Zhang, Guo-Jun Qi, H.C. Low, Q.Z. Yang, and Ruisheng Ng
Subjects: Waste management, Mechanical Engineering, Economic feasibility, chemistry.chemical_element, Catalysis, Nickel, chemistry, Economic viability, Mechanics of Materials, Production (economics), Revenue, Environmental science, General Materials Science, Profitability index, Market value
Abstract: This paper investigates the economic viability of a closed-loop process for nickel recovery from roasted catalytic wastes. The effects of process parameters and market factors that drive the bottom-line profitability of nickel recovery are identified and analyzed using a cost and revenue evaluation model developed in the study. The main factors include the production batch size, material cost, and nickel selling price. With a nickel market value higher than S$18.85 per kg, the process is economically viable even with a batch size as small as 50 kg/batch. Given that the current nickel selling price at the metal exchange market has reached around S$24-28/kg, the economic feasibility of the process is confirmed.
Published: 2010

240. Chemical Surface Treatment for Enhanced Bonding Strength between Polymer Coating and Aluminum Alloy

Author: Guo-Jun Qi, Yaoyu Feng, and Linda Y.L. Wu
Subjects: chemistry.chemical_classification, Materials science, Mechanical Engineering, Polymer, engineering.material, Isotropic etching, Contact angle, Coating, chemistry, Mechanics of Materials, visual_art, engineering, Aluminium alloy, visual_art.visual_art_medium, Surface roughness, General Materials Science, Profilometer, Composite material, Porosity
Abstract: Polymer materials are sometimes molded to aluminum alloy to form integrated industrial parts for special applications, such as high strength, heavy duty applications. To ensure the bonding strength for long term applications, chemical treatment of the aluminum surface is a quick and efficient method prior to polymer injection molding onto the aluminum part. In this study, chemical etching and anodizing processes were studied to obtain suitable surface roughness and porous structures to enhance the penetration of polymer into the pores, leading to interfacial anchoring and high bonding strength. Surface roughness, surface contact angles, and surface porous structures were investigated by surface profilometer, contact angle tester, and scanning electron microscopy. Bending tests and cracking analyses were carried out to determine the bonding strength. An optimized chemical treatment recipe and process parameters were established. Preferred surface characteristics are defined. The failure mechanism of the bending test was analysed and correlated to the bending fracture phenomena.
Published: 2010

241. Image Classification With Kernelized Spatial-Context

Author: Yong Rui, Guo-Jun Qi, Jinhui Tang, Xian-Sheng Hua, and Hong-Jiang Zhang
Subjects: Context model, Spatial contextual awareness, Contextual image classification, Computer science, business.industry, Feature extraction, Pattern recognition, Machine learning, computer.software_genre, Computer Science Applications, Kernel method, Kernel (image processing), Feature (computer vision), Signal Processing, Media Technology, Artificial intelligence, Electrical and Electronic Engineering, Hidden Markov model, business, computer, Image resolution, Image retrieval
Abstract: The goal of image classification is to classify a collection of unlabeled images into a set of semantic classes. Many methods have been proposed to approach this goal by leveraging visual appearances of local patches in images. However, the spatial context between these local patches also provides significant information to improve the classification accuracy. Traditional spatial contextual models, such as the two-dimensional hidden Markov model, attempt to construct one common model for each image category to depict the spatial structures of the images in this class. However, due to large intra-class variances in an image category, one single model has difficulties in representing various spatial contexts in different images. In contrast, we propose to construct a prototype set of spatial contextual models by leveraging kernel methods rather than only one model. Such an algorithm combines the rich representation ability of spatial contextual models with the powerful classification ability of kernel methods. In particular, we propose a new distance measure between different spatial contextual models by integrating joint appearance-spatial image features. Such a distance measure can be efficiently computed in a recursive formulation that scales well to image size. Extensive experiments demonstrate that the proposed approach significantly outperforms the state-of-the-art approaches.
Published: 2010

242. Image Annotation by Graph-Based Inference With Integrated Multiple/Single Instance Representations

Author: Tat-Seng Chua, Guo-Jun Qi, Jinhui Tang, and Haojie Li
Subjects: business.industry, Computer science, Supervised learning, Graph theory, Image processing, Machine learning, computer.software_genre, Computer Science Applications, Digital image, Annotation, Automatic image annotation, Signal Processing, Media Technology, Graph (abstract data type), Artificial intelligence, Electrical and Electronic Engineering, business, Image retrieval, computer
Abstract: In most of the learning-based image annotation approaches, images are represented using multiple-instance (local) or single-instance (global) features. Their performances, however, are mixed as for certain concepts, the single-instance representations of images are more suitable, while for others, the multiple-instance representations are better. Thus this paper explores a unified learning framework that combines the multiple-instance and single-instance representations for image annotation. More specifically, we propose an integrated graph-based semi-supervised learning framework to utilize these two types of representations simultaneously. We further explore three strategies to convert from multiple-instance representation into a single-instance one. Experiments conducted on the COREL image dataset demonstrate the effectiveness and efficiency of the proposed integrated framework and the conversion strategies.
Published: 2010

243. Video semantic analysis based on structure-sensitive anisotropic manifold ranking

Author: Xian-Sheng Hua, Meng Wang, Guo-Jun Qi, and Jinhui Tang
Subjects: Computer Science::Machine Learning, Partial differential equation, Theoretical computer science, Anisotropic diffusion, business.industry, Supervised learning, Video content analysis, Semi-supervised learning, Machine learning, computer.software_genre, TRECVID, Euclidean distance, Control and Systems Engineering, Signal Processing, Graph (abstract data type), Computer Vision and Pattern Recognition, Artificial intelligence, Electrical and Electronic Engineering, business, computer, Software, Computer Science::Cryptography and Security, Mathematics
Abstract: As a major family of semi-supervised learning (SSL), graph-based SSL has recently attracted considerable interest in the machine learning community along with application areas such as video semantic analysis. In this paper, we analyze the connections between graph-based SSL and partial differential equation- (PDE) based diffusion. From the viewpoint of PDE-based diffusion, the label propagation in normal graph-based SSL is isotropic accompanied with distance. However, according to the structural assumption, which is one of the two basic assumptions in graph-based SSL, we need to enhance the label propagation between the samples in the same structure while weakening the counterpart between the samples in different structures. Accordingly, we deduce a novel graph-based SSL framework, named structure-sensitive anisotropic manifold ranking (SSAniMR), from PDE-based anisotropic diffusion. Instead of using Euclidean distance only, SSAniMR takes local structural difference into account to make the label propagation anisotropic, which is intrinsically different from the isotropic label propagation process in general graph-based SSL methods. Experiments conducted on the TREC Video Retrieval Evaluation (TRECVID) dataset show that this approach significantly outperforms existing graph-based SSL methods and is effective for video semantic annotation.
Published: 2009

244. Design and fabrication of a novel integrated shadow mask for passive matrix OLED devices

Author: W.M. Su, Z.H. Huang, D. Lukito, X.T. Zeng, and Guo-Jun Qi
Subjects: Shadow mask, Fabrication, Materials science, business.industry, Metals and Alloys, Surfaces and Interfaces, Surfaces, Coatings and Films, Electronic, Optical and Magnetic Materials, law.invention, Matrix (mathematics), Optics, Mask set, law, Materials Chemistry, OLED, Deposition (phase transition), Thin film, Photolithography, business
Abstract: This paper reports a novel structural design of an integrated shadow mask and its fabrication technique by thin film processes. The proposed Γ-shaped mask structure provides passive matrix OLED devices with a higher aperture ratio than the conventional T-shaped profiled masks. The Γ-shaped mask structure can be realized with low-cost patternable sol–gel and photolithography processes. Full angle deposition can be realized with the new mask structure, which is important for mass production of devices with automatic systems.
Published: 2009

245. Unified Video Annotation via Multigraph Learning

Author: Xian-Sheng Hua, Yan Song, Meng Wang, Jinhui Tang, Guo-Jun Qi, and Richang Hong
Subjects: Training set, Computer science, business.industry, Multigraph, Feature extraction, Supervised learning, Semi-supervised learning, Machine learning, computer.software_genre, TRECVID, Regularization (mathematics), Graph, Media Technology, Artificial intelligence, Electrical and Electronic Engineering, business, computer, Data compression, Curse of dimensionality
Abstract: Learning-based video annotation is a promising approach to facilitating video retrieval and it can avoid the intensive labor costs of pure manual annotation. But it frequently encounters several difficulties, such as insufficiency of training data and the curse of dimensionality. In this paper, we propose a method named optimized multigraph-based semi-supervised learning (OMG-SSL), which aims to simultaneously tackle these difficulties in a unified scheme. We show that various crucial factors in video annotation, including multiple modalities, multiple distance functions, and temporal consistency, all correspond to different relationships among video units, and hence they can be represented by different graphs. Therefore, these factors can be simultaneously dealt with by learning with multiple graphs, namely, the proposed OMG-SSL approach. Different from the existing graph-based semi-supervised learning methods that only utilize one graph, OMG-SSL integrates multiple graphs into a regularization framework in order to sufficiently explore their complementation. We show that this scheme is equivalent to first fusing multiple graphs and then conducting semi-supervised learning on the fused graph. Through an optimization approach, it is able to assign suitable weights to the graphs. Furthermore, we show that the proposed method can be implemented through a computationally efficient iterative process. Extensive experiments on the TREC video retrieval evaluation (TRECVID) benchmark have demonstrated the effectiveness and efficiency of our proposed approach.
Published: 2009

246. Semi-supervised kernel density estimation for video annotation

Author: Tao Mei, Li-Rong Dai, Richang Hong, Yan Song, Guo-Jun Qi, Xian-Sheng Hua, and Meng Wang
Subjects: Computer science, business.industry, Supervised learning, Kernel density estimation, Density estimation, Semi-supervised learning, Machine learning, computer.software_genre, Kernel method, Variable kernel density estimation, Kernel (statistics), Signal Processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, computer, Software, Parametric statistics
Abstract: Insufficiency of labeled training data is a major obstacle for automatic video annotation. Semi-supervised learning is an effective approach to this problem by leveraging a large amount of unlabeled data. However, existing semi-supervised learning algorithms have not demonstrated promising results in large-scale video annotation due to several difficulties, such as large variation of video content and intractable computational cost. In this paper, we propose a novel semi-supervised learning algorithm named semi-supervised kernel density estimation (SSKDE) which is developed based on kernel density estimation (KDE) approach. While only labeled data are utilized in classical KDE, in SSKDE both labeled and unlabeled data are leveraged to estimate class conditional probability densities based on an extended form of KDE. It is a non-parametric method, and it thus naturally avoids the model assumption problem that exists in many parametric semi-supervised methods. Meanwhile, it can be implemented with an efficient iterative solution process. So, this method is appropriate for video annotation. Furthermore, motivated by existing adaptive KDE approach, we propose an improved algorithm named semi-supervised adaptive kernel density estimation (SSAKDE). It employs local adaptive kernels rather than a fixed kernel, such that broader kernels can be applied in the regions with low density. In this way, more accurate density estimates can be obtained. Extensive experiments have demonstrated the effectiveness of the proposed methods.
Published: 2009

247. Correlative multilabel video annotation with temporal kernels

Author: Meng Wang, Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, Hong-Jiang Zhang, and Tao Mei
Subjects: Correlative, Computer Networks and Communications, Computer science, business.industry, Frame (networking), Machine learning, computer.software_genre, TRECVID, Data set, Discriminative model, Binary classification, Hardware and Architecture, Video browsing, Artificial intelligence, Hidden Markov model, business, computer
Abstract: Automatic video annotation is an important ingredient for semantic-level video browsing, search and navigation. Much attention has been paid to this topic in recent years. This research has evolved through two paradigms. In the first paradigm, each concept is individually annotated by a pre-trained binary classifier. However, this method ignores the rich information between the video concepts and only achieves limited success. Evolved from the first paradigm, the methods in the second paradigm add an extra step on top of the first individual classifiers to fuse the multiple detections of the concepts. However, the performance of these methods can be degraded by the propagation of errors from the first step to the second fusion step. In this article, another paradigm of the video annotation method is proposed to address these problems. It simultaneously annotates the concepts and models the correlations between them in one step by the proposed Correlative Multilabel (CML) method, which benefits from the compensation of complementary information between different labels. Furthermore, since video clips are composed of temporally ordered frame sequences, we extend the proposed method to exploit the rich temporal information in the videos. Specifically, a temporal kernel is incorporated into the CML method based on the discriminative information between Hidden Markov Models (HMMs) that are learned from the videos. We compare the performance of the proposed approach with the state-of-the-art approaches in the first and second paradigms on the widely used TRECVID data set. The proposed method achieves superior performance.
Published: 2008

248. Video Annotation Based on Kernel Linear Neighborhood Propagation

Author: Yan Song, Guo-Jun Qi, Xiuqing Wu, Xian-Sheng Hua, and Jinhui Tang
Subjects: Computer Science::Machine Learning, Training set, Computer science, business.industry, Supervised learning, Feature extraction, Graph theory, Pattern recognition, Semi-supervised learning, TRECVID, Computer Science Applications, Kernel (linear algebra), Kernel method, Signal Processing, Media Technology, Graph (abstract data type), Embedding, Artificial intelligence, Electrical and Electronic Engineering, business
Abstract: The insufficiency of labeled training data for representing the distribution of the entire dataset is a major obstacle in automatic semantic annotation of large-scale video database. Semi-supervised learning algorithms, which attempt to learn from both labeled and unlabeled data, are promising to solve this problem. In this paper, a novel graph-based semi-supervised learning method named kernel linear neighborhood propagation (KLNP) is proposed and applied to video annotation. This approach combines the consistency assumption, which is the basic assumption in semi-supervised learning, and the local linear embedding (LLE) method in a nonlinear kernel-mapped space. KLNP improves a recently proposed method linear neighborhood propagation (LNP) by tackling the limitation of its local linear assumption on the distribution of semantics. Experiments conducted on the TRECVID data set demonstrate that this approach outperforms other popular graph-based semi-supervised learning methods for video semantic annotation.
Published: 2008

249. A novel high birefringence equal diameter circular-hole photonic crystal fiber

Author: Jun-qi Guo, Yi Zhong, Yu Liu, Xiao-hui Yang, Ming-lang Xiao, and Min Zhou
Published: 2017

250. Interactive Video Annotation by Multi-Concept Multi-Modality Active Learning

Author: Yan Song, Jinhui Tang, Tao Mei, Guo-Jun Qi, Li-Rong Dai, Meng Wang, and Xian-Sheng Hua
Subjects: Linguistics and Language, Video annotation, Modalities, Computer Networks and Communications, Computer science, Interactive video, business.industry, Machine learning, computer.software_genre, TRECVID, Multi modality, Computer Science Applications, Annotation, Artificial Intelligence, Graph (abstract data type), Multimedia annotation, Artificial intelligence, business, computer, Software, Information Systems
Abstract: Active learning has been demonstrated to be an effective approach to reducing human labeling effort in multimedia annotation tasks. However, most of the existing active learning methods for video annotation are studied in a relatively simple context where concepts are sequentially annotated with fixed effort and only a single modality is applied. In practice, we usually have to deal with multiple modalities, and sequentially annotating concepts without preference cannot suitably assign annotation effort. To address these two issues, in this paper we propose a multi-concept multi-modality active learning method for video annotation in which multiple concepts and multiple modalities can be simultaneously taken into consideration. In each round of active learning, this method selects the concept that is expected to get the highest performance gain and a batch of suitable samples to be annotated for this concept. Then, graph-based semi-supervised learning is conducted on each modality for the selected concept. The proposed method is able to make full use of human effort by considering both the learnabilities of different concepts and the potentials of different modalities. Experimental results on the TRECVID 2005 benchmark have demonstrated its effectiveness and efficiency.
Published: 2007
