33 results for "Tianbao Yang"
Search Results
2. Advancing non-convex and constrained learning
- Author
-
Tianbao Yang
- Subjects
Operations research, Computer science, Deep learning, Big data, General Medicine, Constraint (information theory), Computation theory & mathematics, Robustness (computer science), Paradigm shift, Artificial intelligence, Interpretability - Abstract
As data gets more complex and applications of machine learning (ML) algorithms for decision-making broaden and diversify, traditional ML methods that minimize an unconstrained or simply constrained convex objective are becoming increasingly unsatisfactory. To address this new challenge, recent ML research has sparked a paradigm shift in learning predictive models toward non-convex learning and heavily constrained learning. Non-Convex Learning (NCL) refers to a family of learning methods that involve optimizing non-convex objectives. Heavily Constrained Learning (HCL) refers to a family of learning methods whose constraints are much more complicated than the simple norm constraints of conventional learning (e.g., data-dependent functional constraints, non-convex constraints). This paradigm shift has already created many promising outcomes: (i) non-convex deep learning has brought breakthroughs for learning representations from large-scale structured data (e.g., images, speech) (LeCun, Bengio, & Hinton, 2015; Krizhevsky, Sutskever, & Hinton, 2012; Amodei et al., 2016; Deng & Liu, 2018); (ii) non-convex regularizers (e.g., for enforcing sparsity or low rank) can be more effective than their convex counterparts for learning high-dimensional structured models (C.-H. Zhang & Zhang, 2012; J. Fan & Li, 2001; C.-H. Zhang, 2010; T. Zhang, 2010); (iii) constrained learning is being used to learn predictive models that satisfy various constraints to respect social norms (e.g., fairness) (B. E. Woodworth, Gunasekar, Ohannessian, & Srebro, 2017; Hardt, Price, Srebro, et al., 2016; Zafar, Valera, Gomez Rodriguez, & Gummadi, 2017; A. Agarwal, Beygelzimer, Dudík, Langford, & Wallach, 2018), to improve interpretability (Gupta et al., 2016; Canini, Cotter, Gupta, Fard, & Pfeifer, 2016; You, Ding, Canini, Pfeifer, & Gupta, 2017), and to enhance robustness (Globerson & Roweis, 2006a; Sra, Nowozin, & Wright, 2011; T. Yang, Mahdavi, Jin, Zhang, & Zhou, 2012), among other goals. In spite of the great promise of these new learning paradigms, they also bring emerging challenges to the design of computationally efficient algorithms for big data and to the analysis of their statistical properties.
- Published
- 2019
- Full Text
- View/download PDF
3. Effects of power ultrasound on the activity and structure of β‐D‐glucosidase with potentially aroma‐enhancing capability
- Author
-
Yuanzhong Xue, Yujing Sun, Peilong Sun, Tianbao Yang, Li Zeng, and Zongming Cheng
- Subjects
β‐d‐Glucosidase, aroma enhancement, activity, ultrasound, structure, Hydrolysis, Agricultural biotechnology, Flavor, Aroma, Chromatography, Biology, Chemistry, Analytical chemistry, Food and beverages, Enzyme assay, Enzyme, Nutrition. Foods and food supply, Food Science - Abstract
β‐d‐Glucosidase can release aroma precursors to improve the flavor of plant foods, but the hydrolysis efficiency of the enzyme is low; the purpose of this study was to improve the enzyme activity using ultrasound. The effects of ultrasound parameters on β‐d‐glucosidase activity were investigated, and the respective structures of the activated and inhibited enzyme were further analyzed. Low temperature (20–45°C), low ultrasonic intensity (181.53 W/cm²), and long treatment time (>15 min) led to enzyme inhibition. Application of ultrasound lowered the optimum temperature for β‐d‐glucosidase activity from 50 to 40°C. Ultrasound did not change the primary structure of the enzyme, but changed the secondary structures. When ultrasound activated β‐d‐glucosidase, the α‐helix content increased while the β‐fold and irregular coil contents were reduced. When ultrasound inhibited β‐d‐glucosidase, the β‐fold content increased while the α‐helix and irregular coil contents were reduced. In summary, activation or inhibition of β‐d‐glucosidase under ultrasound was determined by the ultrasound conditions. This study suggests that ultrasound combined with β‐d‐glucosidase can be used for aroma enhancement.
- Published
- 2019
4. Thoracoscopic radical esophagectomy combined with left inferior pulmonary ligament lymphadenectomy for esophageal carcinoma via the right thoracic approach: A single-center retrospective study of 30 cases
- Author
-
Shijie Huang, Guozhong Huang, Wu Wang, Boyang Chen, Jinbiao Xie, Pengfei Chen, Douli Ke, Tianbao Yang, and Wenhua Huang
- Subjects
Male, China, Esophageal Neoplasms, Operative Time, Observational Study, Left inferior pulmonary ligament lymph nodes, Clinical medicine, Postoperative Complications, Carcinoma, Humans, Right thoracic approach, Esophagus, Aged, Neoplasm Staging, Retrospective Studies, Thoracic Surgery, Video-Assisted, Mediastinum, Chylothorax, General Medicine, Length of Stay, Prognosis, Surgery, Esophageal squamous cell carcinoma, Esophagectomy, Dissection, Cardiothoracic surgery, Oncology & carcinogenesis, Lymph Node Excision, Lymphadenectomy, Female, Lymph Nodes, Neoplasm Recurrence, Local, Video-assisted thoracic surgery, Research Article - Abstract
To evaluate the necessity, safety, and feasibility of left inferior pulmonary ligament lymphadenectomy during video-assisted thoracic surgery (VATS) radical esophagectomy via the right thoracic approach. Thirty patients (20 men, 10 women) with thoracic esophageal squamous cell carcinoma (ESCC) were recruited for this study. The patients' ages ranged from 50 to 80 years, with an average of 66.17 ± 7.47 years. After the patients underwent VATS radical esophagectomy and left inferior pulmonary ligament lymph node dissection (LIPLND) via the right thoracic approach, the operative outcomes, including operative time, length of hospital stay, postoperative complications, number of lymph nodes removed, and postoperative pathologic results, were evaluated. There were no massive hemorrhages of the left inferior pulmonary vein during the operation. The operative time of LIPLND was 8.67 ± 2.04 minutes, and the length of postoperative hospital stay was 12.23 ± 2.36 days. The postoperative complications included 2 cases of left pneumothorax and 4 cases of pulmonary infection, with no chylothorax. Moreover, 68 left inferior pulmonary ligament lymph nodes (LIPLNs) were dissected, 5 of which were positive, for a metastasis rate of 7.4%. The postoperative pathologic results showed that 3 cases had positive LIPLNs, a per-patient metastasis rate of 10.0%. Among them, 2 cases were SCC of the lower thoracic esophagus, and 1 case was SCC of the middle thoracic esophagus involving the lower segment. Thoracoscopic esophagectomy combined with left inferior pulmonary ligament lymphadenectomy for esophageal carcinoma via the right thoracic approach does not increase the difficulty of the operation or the incidence of postoperative complications, does not prolong the postoperative hospital stay, and can theoretically reduce tumor recurrence. Therefore, we believe that LIPLND is necessary, safe, and feasible and is worthy of clinical popularization and application.
- Published
- 2020
5. A Simple and Effective Framework for Pairwise Deep Metric Learning
- Author
-
Yan Yan, Tianbao Yang, Zixuan Wu, Xiaoyu Wang, and Qi Qi
- Subjects
Heuristic, Computer science, Deep learning, Robust optimization, Machine learning, Set (abstract data type), Variable (computer science), Binary classification, Metric (mathematics), Pairwise comparison, Artificial intelligence - Abstract
Deep metric learning (DML) has received much attention in deep learning due to its wide applications in computer vision. Previous studies have focused on designing complicated losses and hard-example mining methods, which are mostly heuristic and lack theoretical understanding. In this paper, we cast DML as a simple pairwise binary classification problem that classifies a pair of examples as similar or dissimilar. This formulation exposes the most critical issue in the problem: imbalanced data pairs. To tackle this issue, we propose a simple and effective framework to sample pairs in a batch of data for updating the model. The key to this framework is to define a robust loss for all pairs over a mini-batch of data, formulated by distributionally robust optimization. The flexibility in constructing the uncertainty decision set of the dual variable allows us to recover state-of-the-art complicated losses and also to induce novel variants. Empirical studies on several benchmark data sets demonstrate that our simple and effective method outperforms the state-of-the-art results.
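For illustration, here is a minimal NumPy sketch of the general idea only, not the paper's exact formulation: pair losses over a mini-batch are reweighted by the worst-case distribution from a KL-divergence uncertainty set, which has the closed form of a softmax over the losses, so hard pairs are up-weighted. The margin and temperature values are placeholder assumptions.

```python
import numpy as np

def dro_pairwise_loss(emb, labels, margin=0.5, lam=1.0):
    """Distributionally robust average of pair losses in a mini-batch (sketch)."""
    sim = emb @ emb.T                                    # rows of emb assumed L2-normalized
    iu = np.triu_indices(len(labels), k=1)               # each unordered pair once
    same = (labels[:, None] == labels[None, :])[iu]
    losses = np.where(same,
                      np.maximum(0.0, 1.0 - sim[iu]),    # similar pairs: push similarity up
                      np.maximum(0.0, sim[iu] - margin)) # dissimilar pairs: push it down
    w = np.exp(losses / lam)                             # KL-constrained worst case
    w /= w.sum()
    return w @ losses

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
print(dro_pairwise_loss(emb, rng.integers(0, 3, size=8)))
```

Taking lam to infinity recovers the plain average over pairs, while a small lam concentrates the weight on the hardest pairs, which is how the choice of uncertainty set can interpolate between known losses.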
- Published
- 2020
- Full Text
- View/download PDF
6. Accelerating Deep Learning with Millions of Classes
- Author
-
Xiaotian Yu, Zhishuai Guo, Tianbao Yang, Zhuoning Yuan, and Xiaoyu Wang
- Subjects
Distributed computing, Computer science, Random projection, Deep learning, Machine learning, Softmax function, Artificial intelligence, Classifier (UML), Feature learning - Abstract
Deep learning has achieved remarkable success in many classification tasks because of its great power of representation learning for complex data. However, it remains challenging when extended to classification tasks with millions of classes. Previous studies have focused on solving this problem in a distributed fashion or using a sampling-based approach to reduce the computational cost caused by the softmax layer. However, these approaches still require large amounts of GPU memory to work with large models, and it is non-trivial to extend them to parallel settings. To address these issues, we propose an efficient training framework for extreme classification tasks based on random projection. The key idea is to first train a slimmed model with a randomly projected softmax classifier and then recover the original classifier from it. We also show a theoretical guarantee that this recovered classifier can approximate the original classifier with a small error. We then extend our framework to parallel settings by adopting a communication-reduction technique. In our experiments, we demonstrate that the proposed framework is able to train deep learning models with millions of classes and achieves above 10× speedup compared to existing approaches.
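A minimal sketch of the slimmed-classifier idea as the abstract describes it; the shapes, scaling, and recovery step here are assumptions for illustration, and the paper's training procedure and guarantees are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, C = 512, 64, 10_000       # feature dim, projected dim, #classes (millions in the paper)

R = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))   # fixed random projection
V = rng.normal(scale=0.01, size=(k, C))               # slimmed classifier, trained in k dims

def class_scores(h):
    """Logits through the projected classifier: cost O(dk + kC) per example
    instead of O(dC) for a full d x C softmax layer."""
    return (h @ R) @ V

W_recovered = R @ V             # recover a d x C classifier after training the slim model

h = rng.normal(size=(1, d))
print(np.allclose(class_scores(h), h @ W_recovered))  # the two parameterizations agree
```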
- Published
- 2020
- Full Text
- View/download PDF
7. Hybrid safe–strong rules for efficient optimization in lasso-type problems
- Author
-
Yaohui Zeng, Tianbao Yang, and Patrick Breheny
- Subjects
Statistics and Probability, Elastic net regularization, Computer science, Applied Mathematics, Model selection, Big data, Type (model theory), Machine learning, Group lasso, Computational Mathematics, Computational Theory and Mathematics, Lasso (statistics), Generalizability theory, Statistical analysis, Artificial intelligence - Abstract
The lasso model has been widely used for model selection in data mining, machine learning, and high-dimensional statistical analysis. However, with the ultrahigh-dimensional, large-scale data sets now collected in many real-world applications, it is important to develop algorithms to solve the lasso that efficiently scale up to problems of this size. Discarding features from certain steps of the algorithm is a powerful technique for increasing efficiency and addressing the Big Data challenge. This paper proposes a family of hybrid safe–strong rules (HSSR) which incorporate safe screening rules into the sequential strong rule (SSR) to remove unnecessary computational burden. Two instances of HSSR are presented, SSR-Dome and SSR-BEDPP, for the standard lasso problem. SSR-BEDPP is further extended to the elastic net and group lasso problems to demonstrate the generalizability of the hybrid screening idea. Extensive numerical experiments with synthetic and real data sets are conducted for both the standard lasso and the group lasso problems. Results show that the proposed hybrid rules can substantially outperform existing state-of-the-art rules.
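The sequential strong rule (SSR) that HSSR builds on reduces to a one-line test; a hedged NumPy sketch follows. The safe rules (Dome, BEDPP) and the post-fit KKT check that make the hybrid exact are not reproduced here.

```python
import numpy as np

def ssr_survivors(X, resid_prev, lam_new, lam_prev):
    """Sequential strong rule: keep feature j only if |x_j' r(lam_prev)| >= 2*lam_new - lam_prev.
    The rule is heuristic and can (rarely) discard an active feature, which is why
    HSSR layers it behind a safe rule and checks the KKT conditions afterwards."""
    c = np.abs(X.T @ resid_prev)
    return np.flatnonzero(c >= 2.0 * lam_new - lam_prev)

rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.normal(size=(n, p))
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=n)
lam_max = np.abs(X.T @ y).max() / n       # smallest lambda with an all-zero solution
keep = ssr_survivors(X / n, y, 0.9 * lam_max, lam_max)   # residual at lam_max is y itself
print(len(keep), "of", p, "features survive screening")
```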
- Published
- 2021
- Full Text
- View/download PDF
8. Hetero-ConvLSTM
- Author
-
Zhuoning Yuan, Xun Zhou, and Tianbao Yang
- Subjects
Artificial neural network, Computer science, Deep learning, Spatial heterogeneity, Temporal database, Domain (software engineering), Traffic volume, Data mining, Artificial intelligence, Routing (electronic design automation), Baseline (configuration management) - Abstract
Predicting traffic accidents is crucial to improving transportation and public safety as well as enabling safe routing. The problem is challenging due to the rarity of accidents in space and time and the spatial heterogeneity of the environment (e.g., urban vs. rural). Most previous research on traffic accident prediction conducted by domain researchers simply applied classical prediction models to limited data without addressing the above challenges properly, leading to unsatisfactory performance. A small number of recent works have attempted to use deep learning for traffic accident prediction. However, they either ignore time information or use only data from a small and homogeneous study area (a city), without handling spatial heterogeneity and temporal auto-correlation properly at the same time. In this paper, we perform a comprehensive study of the traffic accident prediction problem using the Convolutional Long Short-Term Memory (ConvLSTM) neural network model. A number of detailed features, such as weather, environment, road condition, and traffic volume, are extracted from big datasets over the state of Iowa across 8 years. To address the spatial heterogeneity challenge in the data, we propose a Hetero-ConvLSTM framework in which a few novel ideas are implemented on top of the basic ConvLSTM model, such as incorporating spatial graph features and a spatial model ensemble. Extensive experiments on the 8-year data over the entire state of Iowa show that the proposed framework makes reasonably accurate predictions and significantly improves the prediction accuracy over baseline approaches.
- Published
- 2018
- Full Text
- View/download PDF
9. A Unified Analysis of Stochastic Momentum Methods for Deep Learning
- Author
-
Qihang Lin, Yi Yang, Tianbao Yang, Yan Yan, and Zhe Li
- Subjects
Momentum (technical analysis), Optimization problem, Computer science, Generalization, Deep learning, Stability (learning theory), Term (time), Convergence (routing), Deep neural networks, Applied mathematics, Artificial intelligence - Abstract
Stochastic momentum methods have been widely adopted in training deep neural networks. However, the theoretical analysis of their convergence on the training objective and of their generalization error for prediction is still under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method and the stochastic momentum methods, including two famous variants, i.e., the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive the convergence rates of the norm of the gradient for the non-convex optimization problem and analyze the generalization performance through the uniform stability approach. In particular, the convergence analysis of the training objective shows that SHB and SNAG have no advantage over SG. However, the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve the generalization performance. These theoretical insights verify the common wisdom and are also corroborated by our empirical analysis on deep learning.
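For concreteness, here are the three methods in their common textbook forms on a toy quadratic. This is a sketch only; the paper's single unified update rule, which subsumes all three with one extra parameter, is not reproduced.

```python
import numpy as np

def run(grad, x0, variant, alpha=0.05, beta=0.9, steps=200, seed=0):
    """SG, stochastic heavy-ball, and stochastic Nesterov momentum (common forms)."""
    rng = np.random.default_rng(seed)
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        noise = rng.normal(scale=0.01, size=x.shape)    # stochastic gradient noise
        if variant == "sg":                             # plain stochastic gradient
            x = x - alpha * (grad(x) + noise)
        elif variant == "shb":                          # heavy ball: reuse the last step
            v = beta * v - alpha * (grad(x) + noise)
            x = x + v
        elif variant == "snag":                         # Nesterov: gradient at a look-ahead point
            v = beta * v - alpha * (grad(x + beta * v) + noise)
            x = x + v
    return x

x0 = np.ones(5)                                          # f(x) = 0.5 ||x||^2, so grad(x) = x
for m in ("sg", "shb", "snag"):
    print(m, np.linalg.norm(run(lambda x: x, x0, m)))
```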
- Published
- 2018
- Full Text
- View/download PDF
10. EIGEN: Ecologically-Inspired GENetic Approach for Neural Network Structure Searching from Scratch
- Author
-
Tianbao Yang, Zhe Li, Ning Xu, David J. Foran, Jian Ren, and Jianchao Yang
- Subjects
Structure (mathematical logic), Secondary succession, Artificial neural network, Computer science, Process (engineering), Deep learning, Population, Neural and Evolutionary Computing (cs.NE), Task (project management), Domain (software engineering), Artificial intelligence - Abstract
Designing the structure of neural networks is considered one of the most challenging tasks in deep learning, especially when there is little prior knowledge about the task domain. In this paper, we propose an Ecologically-Inspired GENetic (EIGEN) approach that uses the concepts of succession, extinction, mimicry, and gene duplication to search for neural network structures from scratch, starting with a poorly initialized simple network and imposing few constraints during the evolution, as we assume no prior knowledge about the task domain. Specifically, we first use primary succession to rapidly evolve a population of poorly initialized neural network structures into a more diverse population, followed by a secondary succession stage for fine-grained searching based on the networks from the primary succession. Extinction is applied in both stages to reduce computational cost. Mimicry is employed during the entire evolution process to help inferior networks imitate the behavior of a superior network, and gene duplication is utilized to duplicate the learned blocks of novel structures, both of which help to find better network structures. Experimental results show that our proposed approach can achieve similar or better performance compared to existing genetic approaches at dramatically reduced computational cost. For example, the network discovered by our approach on the CIFAR-100 dataset achieves 78.1% test accuracy in 120 GPU hours, compared to 77.0% test accuracy in more than 65,536 GPU hours in [35].
- Published
- 2018
11. How Local Is the Local Diversity? Reinforcing Sequential Determinantal Point Processes with Dynamic Ground Sets for Supervised Video Summarization
- Author
-
Boqing Gong, Liqiang Wang, Tianbao Yang, and Yandong Li
- Subjects
Computer science, Property (programming), Control (management), Volume (computing), Statistical model, Machine learning, Automatic summarization, Point process, Key (cryptography), Artificial intelligence, Diversity (business) - Abstract
The large volume of video content and high viewing frequency demand automatic video summarization algorithms, of which a key property is the capability of modeling diversity. If videos are lengthy, like hours-long egocentric videos, it is necessary to track the temporal structure of the videos and enforce local diversity. Local diversity means that the shots selected within a short time window are diverse, while visually similar shots may co-exist in the summary if they appear far apart in the video. In this paper, we propose a novel probabilistic model, built upon SeqDPP, to dynamically control the time span of a video segment upon which the local diversity is imposed. In particular, we enable SeqDPP to learn to automatically infer how local the local diversity is supposed to be from the input video. The resulting model is difficult to train by the hallmark maximum likelihood estimation (MLE), which further suffers from exposure bias and non-differentiable evaluation metrics. To tackle these problems, we instead devise a reinforcement learning algorithm for training the proposed model. Extensive experiments verify the advantages of our model and the new learning algorithm over MLE-based methods.
- Published
- 2018
- Full Text
- View/download PDF
12. Development of a Portable Aerosol Collector and Spectrometer (PACS)
- Author
-
Jae Hong Park, Tianbao Yang, Sivaram P. Gogineni, Thomas M. Peters, Geb Thomas, and Changjie Cai
- Subjects
Materials science, Meteorology & atmospheric sciences, Particle number, Spectrometer, Measure (physics), Pollution, Aerosol, Optics, Environmental Chemistry, General Materials Science, Diffusion - Abstract
This article presents the development of a Portable Aerosol Collector and Spectrometer (PACS), an instrument designed to measure particle number, surface area, and mass concentrations continuously, and time-weighted mass concentration by composition, from 10 nm to 10 µm. The PACS consists of a six-stage particle size selector, a valve system, a water condensation particle counter to detect number concentrations, and a photometer to detect mass concentrations. The stages of the selector include three impactor and two diffusion stages, which resolve particles by size and collect particles for later chemical analysis. Particle penetration by size was measured through each stage to determine actual collection performance and account for particle losses. The data inversion algorithm uses an adaptive grid-search process with a constrained linear least-squares solver to fit a tri-modal (ultrafine, fine, and coarse) log-normal distribution to the input data (number and mass concentration exiting each stage). The measured 50% cutoff diameter of each stage was similar to the design. The pressure drop of each stage was sufficiently low to permit operation with portable air pumps. Sensitivity studies were conducted to explore the influence of unknown particle density (range of 500 to 3,000 kg/m³) and shape factor (range of 1.0 to 3.0) on algorithm output. Assuming standard-density spheres, the aerosol size distributions fit well, with a normalized mean bias of −4.9% to 3.5%, a normalized mean error of 3.3% to 27.6%, and R² values of 0.90 to 1.00. The fitted number and mass concentration biases were within ±10% regardless of uncertainties in density and shape. However, fitted surface area concentrations were more likely to be underestimated or overestimated due to variation in particle density and shape. The PACS represents a novel way to simultaneously assess airborne aerosol composition and concentration by number, surface area, and mass over a wide size range.
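To illustrate only the model being fitted in the inversion step, the sketch below fits the tri-modal log-normal form with a generic nonlinear least-squares routine on synthetic data. The paper's adaptive grid search with a constrained linear solver is not reproduced, and all sizes and mode parameters are placeholders.

```python
import numpy as np
from scipy.optimize import curve_fit

def trimodal(dp, n1, m1, s1, n2, m2, s2, n3, m3, s3):
    """dN/dlogDp as a sum of three log-normal modes; m_i = ln(median diameter),
    s_i = ln(geometric standard deviation)."""
    out = np.zeros_like(dp)
    for n, m, s in ((n1, m1, s1), (n2, m2, s2), (n3, m3, s3)):
        out = out + n / (np.sqrt(2 * np.pi) * s) * np.exp(-(np.log(dp) - m) ** 2 / (2 * s ** 2))
    return out

dp = np.logspace(np.log10(0.01), np.log10(10.0), 60)     # 10 nm to 10 um, in micrometres
truth = (2e3, np.log(0.03), 0.5, 8e2, np.log(0.3), 0.6, 50.0, np.log(3.0), 0.5)
rng = np.random.default_rng(1)
y = trimodal(dp, *truth) * (1.0 + 0.05 * rng.normal(size=dp.size))   # synthetic measurement

p0 = (1e3, np.log(0.02), 0.4, 1e3, np.log(0.2), 0.5, 100.0, np.log(2.0), 0.4)
popt, _ = curve_fit(trimodal, dp, y, p0=p0, maxfev=20000)
print(np.round(popt, 3))                                  # recovered mode parameters
```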
- Published
- 2018
- Full Text
- View/download PDF
13. Improving Sequential Determinantal Point Processes for Supervised Video Summarization
- Author
-
Tianbao Yang, Aidean Sharghi, Boqing Gong, Ali Borji, and Chengtao Li
- Subjects
Scheme (programming language), Computer science, Search engine indexing, Machine learning, Automatic summarization, Point process, Artificial intelligence - Abstract
It is now much easier than ever before to produce videos. While ubiquitous video data is a great source for information discovery and extraction, the computational challenges are unparalleled. Automatically summarizing videos has become a substantial need for browsing, searching, and indexing visual content. This paper is in the vein of supervised video summarization using sequential determinantal point processes (SeqDPPs), which model diversity by a probabilistic distribution. We improve this model in two respects. In terms of learning, we propose a large-margin algorithm to address the exposure bias problem in SeqDPP. In terms of modeling, we design a new probabilistic distribution such that, when it is integrated into SeqDPP, the resulting model accepts user input about the expected length of the summary. Moreover, we significantly extend a popular video summarization dataset by (1) more egocentric videos, (2) dense user annotations, and (3) a refined evaluation scheme. We conduct extensive experiments on this dataset (about 60 h of videos in total) and compare our approach to several competitive baselines.
- Published
- 2018
- Full Text
- View/download PDF
14. A Machine Learning Approach for Air Quality Prediction: Model Regularization and Optimization
- Author
-
Dixian Zhu, Changjie Cai, Xun Zhou, and Tianbao Yang
- Subjects
Air pollutant prediction, Multi-task learning, Regularization, Analytical solution, Computer science, Big data, Matrix norm, Machine learning, Regularization (mathematics), Management Information Systems, Artificial Intelligence, Air quality index, Basis (linear algebra), Regression analysis, Computer Science Applications, Nonlinear system, Norm (mathematics), Predictive modelling, Information Systems - Abstract
In this paper, we tackle air quality forecasting by using machine learning approaches to predict the hourly concentration of air pollutants (e.g., ozone, particulate matter (PM2.5), and sulfur dioxide). Machine learning, as one of the most popular techniques, is able to efficiently train a model on big data by using large-scale optimization algorithms. Although some works apply machine learning to air quality prediction, most of the prior studies are restricted to several years of data and simply train standard regression models (linear or nonlinear) to predict the hourly air pollution concentration. In this work, we propose refined models to predict the hourly air pollution concentration on the basis of meteorological data from previous days by formulating the prediction over 24 h as a multi-task learning (MTL) problem. This enables us to select a good model with different regularization techniques. We propose a useful regularization that enforces the prediction models of consecutive hours to be close to each other, and compare it with several typical regularizations for MTL, including standard Frobenius norm regularization, nuclear norm regularization, and ℓ2,1-norm regularization. Our experiments show that the proposed parameter-reducing formulations and consecutive-hour-related regularizations achieve better performance than existing standard regression models and existing regularizations.
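The consecutive-hour regularization admits a compact sketch. Below is a minimal gradient-descent implementation of multi-task ridge regression over 24 hourly tasks with an extra penalty gamma * sum_h ||w_{h+1} - w_h||^2 tying neighbouring hours together; the notation, data shapes, and hyperparameters are assumptions, and constant factors are absorbed into gamma.

```python
import numpy as np

def fit_mtl_smooth(X, Y, lam=1e-2, gamma=1.0, lr=1e-3, iters=2000):
    """One weight vector per hour (columns of W); the smoothness term pulls the
    models of consecutive hours toward each other."""
    n, d = X.shape
    H = Y.shape[1]                       # 24 hourly prediction tasks
    W = np.zeros((d, H))
    for _ in range(iters):
        G = X.T @ (X @ W - Y) / n + lam * W
        D = np.zeros_like(W)             # gradient of the consecutive-hour penalty
        D[:, :-1] += W[:, :-1] - W[:, 1:]
        D[:, 1:] += W[:, 1:] - W[:, :-1]
        W -= lr * (G + gamma * D)
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # meteorological features from previous days
Y = rng.normal(size=(200, 24))           # hourly pollutant concentrations (synthetic)
W = fit_mtl_smooth(X, Y)
print(np.linalg.norm(W[:, 1:] - W[:, :-1]))   # neighbouring columns end up close
```

Setting gamma to zero recovers independent ridge regressions per hour, which makes the effect of the coupling term easy to isolate.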
- Published
- 2017
- Full Text
- View/download PDF
15. Pre-harvest calcium application increases biomass and delays senescence of broccoli microgreens
- Author
-
Luhong Huang, Liping Kou, Xianjin Liu, Yaguang Luo, Tianbao Yang, and Eton E. Codling
- Subjects
Chemistry, Biomass, Horticulture, Bacterial growth, Calcium, Shelf life, Microgreen, Biotechnology, Modified atmosphere, Postharvest, Food science, Agronomy and Crop Science, Flavor, Food Science - Abstract
Microgreen consumption has been steadily increasing in recent years due to consumer awareness of microgreens' unique color, rich flavor, and concentrated bioactive compounds. However, industrial production and marketing are limited by their short shelf-life, associated with rapid deterioration in product quality. This study investigated the effect of pre-harvest calcium application on the post-harvest quality and shelf-life of broccoli microgreens. Broccoli microgreen seedlings were sprayed daily with calcium chloride at concentrations of 1, 10, and 20 mM, or with water (control), for 10 days. The fresh-cut microgreens were packaged in sealed polyethylene film bags. Package headspace atmospheric conditions, overall visual quality, and tissue membrane integrity were evaluated on days 0, 7, 14, and 21 during storage at 5 °C. Results indicated that the 10 mM calcium chloride treatment increased the biomass by more than 50% and tripled the calcium content as compared to the water-treated controls. Microgreens treated with the 10 mM calcium chloride spray exhibited higher superoxide dismutase and peroxidase activities, lower tissue electrolyte leakage, improved overall visual quality, and reduced microbial growth during storage. Furthermore, calcium treatment significantly affected the expression of the senescence-associated genes BoSAG12, BoGPX6, and BoCAT3. These results provide important information for commercial growers to enhance productivity and improve postharvest quality and shelf-life, potentially enabling a broadening of the retail marketing of broccoli microgreens.
- Published
- 2014
- Full Text
- View/download PDF
16. A Multisource Domain Generalization Approach to Visual Attribute Detection
- Author
-
Boqing Gong, Tianbao Yang, and Chuang Gan
- Subjects
Computer science, Generalization, Perspective (graphical), Cognitive neuroscience of visual object recognition, Domain (software engineering), Machine learning, Data mining, Artificial intelligence, Image retrieval - Abstract
Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval. Whereas the existing work mainly pursues utilizing attributes for various computer vision problems, we contend that the most basic problem—how to accurately and robustly detect attributes from images—has been left underexplored. Especially, the existing work rarely explicitly tackles the need that attribute detectors should generalize well across different categories, including those previously unseen. Noting that this is analogous to the objective of multisource domain generalization, if we treat each category as a domain, we provide a novel perspective to attribute detection and propose to gear the techniques in multisource domain generalization for the purpose of learning cross-category generalizable attribute detectors. We validate our understanding and approach with extensive experiments on four challenging datasets and two different problems.
- Published
- 2017
- Full Text
- View/download PDF
17. Online Asymmetric Active Learning with Imbalanced Data
- Author
-
Xiaoxuan Zhang, Padmini Srinivasan, and Tianbao Yang
- Subjects
Information retrieval, Computer science, Active learning (machine learning), Probabilistic logic, Machine learning, Personalization, Statistical classification, Web query classification, Artificial intelligence, F1 score - Abstract
This paper considers online learning with imbalanced streaming data under a query budget that limits the number of label queries. We study different active querying strategies for classification. In particular, we propose an asymmetric active querying strategy that assigns different query probabilities to examples predicted as positive and negative. To corroborate the proposed asymmetric query model, we provide a theoretical analysis of a weighted mistake bound. We conduct extensive evaluations of the proposed asymmetric active querying strategy in comparison with several baseline querying strategies and with previous online learning algorithms for imbalanced data. In particular, we perform two types of evaluations according to which examples appear as "positive"/"negative". In push evaluation, only the positive predictions given to the user are taken into account; in push-and-query evaluation, the decision to query is also considered. The push-and-query evaluation strategy is particularly suited for a recommendation setting because the items selected for label queries may go to the end-user to enable customization and personalization. These would not be shown any differently to the end-user than recommended content (i.e., the examples predicted as positive). Additionally, given our interest in imbalanced data, we measure the F-score instead of the accuracy traditionally considered by online classification algorithms. We also compare the querying strategies on five classification tasks from different domains and show that the probabilistic query strategy achieves higher F-scores on both types of evaluation than the deterministic strategy, especially when the budget is small, and that the asymmetric query model further improves performance. When compared to the state-of-the-art cost-sensitive online learning algorithm under a budget, our online classification algorithm with asymmetric querying achieves a higher F-score on four of the five tasks, especially on the push evaluation.
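The asymmetry itself reduces to a one-line decision rule; a hedged sketch under a budget follows. The probabilities are placeholders, and the paper's exact schedule and analysis are not reproduced.

```python
import numpy as np

def asymmetric_query(score, budget, p_pos=0.8, p_neg=0.1, rng=np.random.default_rng()):
    """Decide whether to query the label of the current example.

    The sign of the classifier's score selects which query probability applies,
    so the two sides of an imbalanced stream spend the budget at different rates."""
    if budget <= 0:
        return False, budget
    p = p_pos if score >= 0 else p_neg
    if rng.random() < p:
        return True, budget - 1          # query issued, budget consumed
    return False, budget

budget = 10
for score in np.random.default_rng(0).normal(size=50):
    ask, budget = asymmetric_query(score, budget)
```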
- Published
- 2016
- Full Text
- View/download PDF
18. Learning Attributes Equals Multi-Source Domain Generalization
- Author
-
Tianbao Yang, Boqing Gong, and Chuang Gan
- Subjects
Generalization, Computer science, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Cognitive neuroscience of visual object recognition, Machine learning, Domain (software engineering), Visualization, Kernel (image processing), Robustness (computer science), Artificial intelligence, Image retrieval - Abstract
Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval. Whereas the existing work mainly pursues utilizing attributes for various computer vision problems, we contend that the most basic problem, how to accurately and robustly detect attributes from images, has been left underexplored. Especially, the existing work rarely explicitly tackles the need that attribute detectors should generalize well across different categories, including those previously unseen. Noting that this is analogous to the objective of multi-source domain generalization, if we treat each category as a domain, we provide a novel perspective to attribute detection and propose to gear the techniques in multi-source domain generalization for the purpose of learning cross-category generalizable attribute detectors. We validate our understanding and approach with extensive experiments on four challenging datasets and three different problems.
- Published
- 2016
19. Detecting communities and their evolutions in dynamic social networks—a Bayesian approach
- Author
-
Rong Jin, Yihong Gong, Yun Chi, Shenghuo Zhu, and Tianbao Yang
- Subjects
Computer science, Posterior probability, Bayesian probability, Bayesian network, Bayesian inference, Machine learning, Data modeling, Artificial Intelligence, Stochastic block model, Point estimation, Gibbs sampling, Software - Abstract
Although a large body of work is devoted to finding communities in static social networks, only a few studies examined the dynamics of communities in evolving social networks. In this paper, we propose a dynamic stochastic block model for finding communities and their evolution in a dynamic social network. The proposed model captures the evolution of communities by explicitly modeling the transition of community memberships for individual nodes in the network. Unlike many existing approaches for modeling social networks that estimate parameters by their most likely values (i.e., point estimation), in this study, we employ a Bayesian treatment for parameter estimation that computes the posterior distributions for all the unknown parameters. This Bayesian treatment allows us to capture the uncertainty in parameter values and therefore is more robust to data noise than point estimation. In addition, an efficient algorithm is developed for Bayesian inference to handle large sparse social networks. Extensive experimental studies based on both synthetic data and real-life data demonstrate that our model achieves higher accuracy and reveals more insights in the data than several state-of-the-art algorithms.
- Published
- 2010
- Full Text
- View/download PDF
20. In-Situ Measurement and Prediction of Hearing Aid Outcomes Using Mobile Phones
- Author
-
Ryan Brummet, Syed Shabih Hasan, Yu-Hsiang Wu, Octav Chipara, and Tianbao Yang
- Subjects
Hearing aid, Engineering, Social environment, Context (language use), Audiology, Outcome (game theory), Mobile phone, Otorhinolaryngologic diseases, Candidacy, Auditory system, Active listening - Abstract
Audiologists have devised a battery of clinical tests to measure auditory abilities. While these tests can help determine the candidacy of patients for amplification intervention, they do not accurately predict the degree to which a patient would benefit from using a hearing aid (i.e., the hearing aid outcome). Measuring hearing aid outcomes in the real world is challenging, as outcomes depend not only on a patient's auditory abilities but also on auditory contexts that include characteristics of the listening activity, social context, and acoustic environment. This paper explores the problem of creating predictive models for hearing aid outcomes that incorporate information about auditory abilities, hearing-aid features, and auditory contexts. Our models are built on a dataset collected using a mobile phone application that measures auditory contexts and hearing aid outcomes using Ecological Momentary Assessments. The use of a mobile application allowed us to collect fine-grained hearing aid outcome measures in different auditory contexts. The dataset includes 5671 surveys from 34 patients collected over two years. Our analysis focuses on identifying the features necessary for predicting hearing aid outcomes in different clinical scenarios. Most importantly, we show that models that include only measures of auditory ability as features cannot predict the hearing aid outcome of a patient with accuracy better than chance. Incorporating information about auditory contexts increases the prediction accuracy to 68%. More excitingly, accuracies as high as 90% can be achieved when a small amount of training data is collected from a patient in situ. These results suggest that audiologists could prescribe a mobile phone application at the time of dispensing the hearing aid in order to accurately predict a patient's likelihood of becoming a successful and satisfied hearing aid user.
- Published
- 2015
- Full Text
- View/download PDF
21. Big Data Analytics
- Author
-
Tianbao Yang, Qihang Lin, and Rong Jin
- Subjects
Theoretical computer science, Computer science, Big data, Online machine learning, Approximation algorithm, Machine learning, Randomized algorithm, Matrix (mathematics), Stochastic gradient descent, Artificial intelligence, Coordinate descent, Curse of dimensionality - Abstract
As the scale and dimensionality of data continue to grow in many applications of data analytics (e.g., bioinformatics, finance, computer vision, medical informatics), it becomes critical to develop efficient and effective algorithms to solve numerous machine learning and data mining problems. This tutorial will focus on simple yet practically effective techniques and algorithms for big data analytics. In the first part, we plan to present the state-of-the-art large-scale optimization algorithms, including various stochastic gradient descent methods, stochastic coordinate descent methods and distributed optimization algorithms, for solving various machine learning problems. In the second part, we will focus on randomized approximation algorithms for learning from large-scale data. We will discuss i) randomized algorithms for low-rank matrix approximation; ii) approximation techniques for solving kernel learning problems; iii) randomized reduction methods for addressing the high-dimensional challenge. Along with the description of algorithms, we will also present some empirical results to facilitate understanding of different algorithms and comparison between them.
- Published
- 2015
- Full Text
- View/download PDF
22. Hyper-class augmented and regularized deep learning for fine-grained image classification
- Author
-
Xiaoyu Wang, Yuanqing Lin, Tianbao Yang, and Saining Xie
- Subjects
Training set, Contextual image classification, Computer science, Deep learning, Cognitive neuroscience of visual object recognition, Multi-task learning, Pattern recognition, Machine learning, Class (biology), Regularization (mathematics), Convolutional neural network, Image (mathematics), Subject matter expert, Artificial intelligence - Abstract
Deep convolutional neural networks (CNN) have seen tremendous success in large-scale generic object recognition. In comparison with generic object recognition, fine-grained image classification (FGIC) is much more challenging because (i) fine-grained labeled data is much more expensive to acquire (usually requiring domain expertise), and (ii) there exist large intra-class and small inter-class variances. Most recent work exploiting deep CNNs for image recognition with small training data adopts a simple strategy: pre-train a deep CNN on a large-scale external dataset (e.g., ImageNet) and fine-tune on the small-scale target data to fit the specific classification task. In this paper, going beyond the fine-tuning strategy, we propose a systematic framework for learning a deep CNN that addresses the challenges from two new perspectives: (i) identifying easily annotated hyper-classes inherent in the fine-grained data, acquiring a large number of hyper-class-labeled images from readily available external sources (e.g., image search engines), and formulating the problem as multi-task learning; (ii) a novel learning model that exploits a regularization between the fine-grained recognition model and the hyper-class recognition model. We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.
- Published
- 2015
- Full Text
- View/download PDF
23. On Data Preconditioning for Regularized Loss Minimization
- Author
-
Qihang Lin, Rong Jin, Shenghuo Zhu, and Tianbao Yang
- Subjects
Mathematical optimization, Boosting (machine learning), Big data, Machine Learning (stat.ML), Machine Learning (cs.LG), Convexity, Bottleneck, Artificial Intelligence, Coherence (signal processing), Condition number, Mathematics, Operations research, Preconditioner, Numerical Analysis (math.NA), Lipschitz continuity, Software - Abstract
In this work, we study data preconditioning, a well-known and long-standing technique, for boosting the convergence of first-order methods for regularized loss minimization in machine learning. It is well understood that the condition number of the problem, i.e., the ratio of the Lipschitz constant to the strong convexity modulus, has a harsh effect on the convergence of first-order optimization methods. Therefore, minimizing a small regularized loss to achieve good generalization performance, which yields an ill-conditioned problem, becomes the bottleneck for big data problems. We provide a theory of data preconditioning for regularized loss minimization. In particular, our analysis exhibits an appropriate data preconditioner that is similar to zero component analysis (ZCA) whitening. Exploiting the concepts of numerical rank and coherence, we characterize the conditions on the loss function and on the data under which data preconditioning can reduce the condition number and therefore boost convergence for minimizing the regularized loss. To make data preconditioning practically useful, we propose an efficient preconditioning method through random sampling. Preliminary experiments on simulated and real data sets validate our theory.
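A minimal sketch of a ZCA-whitening-style preconditioner, the kind of transform the analysis points to; the smoothing constant is an assumption, and the paper's random-sampling approximation is simplified away.

```python
import numpy as np

def zca_precondition(X, eps=1e-5):
    """Return X @ Sigma^{-1/2}: flattening the spectrum of the data covariance
    reduces the condition number seen by a first-order method."""
    Sigma = X.T @ X / X.shape[0]                          # empirical covariance
    evals, evecs = np.linalg.eigh(Sigma)
    P = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return X @ P

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))   # ill-conditioned design
Xp = zca_precondition(X)
print(np.linalg.cond(X.T @ X), "->", np.linalg.cond(Xp.T @ Xp))   # condition number drops
```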
- Published
- 2014
- Full Text
- View/download PDF
24. Combining a popularity-productivity stochastic block model with a discriminative-content model for general structure detection
- Author
-
Caiyan Jia, Jian Yu, Tianbao Yang, Yawen Jiang, and Bian-fang Chai
- Subjects
Structure (mathematical logic), Flexibility (engineering), Exploit, Computer science, Machine learning, Discriminative model, Stochastic block model, Content model, Artificial intelligence, Random variable, Block (data storage) - Abstract
Latent community discovery that combines the links and contents of a text-associated network has drawn more attention with the advance of social media. Most previous studies aim at detecting densely connected communities and are not able to identify general structures, e.g., bipartite structure. Several variants based on the stochastic block model are more flexible in exploring general structures by introducing link probabilities between communities. However, these variants cannot reproduce the degree distributions of real networks due to a lack of modeling of the differences among nodes, and they are not suitable for discovering communities in text-associated networks because they ignore the contents of nodes. In this paper, we propose a popularity-productivity stochastic block (PPSB) model that introduces two random variables, popularity and productivity, to model the differences among nodes in receiving and producing links, respectively. This model has the flexibility of existing stochastic block models in discovering general community structures and inherits the richness of previous models that also exploit popularity and productivity in modeling real scale-free networks with power-law degree distributions. To incorporate the contents in text-associated networks, we propose a combined model that combines the PPSB model with a discriminative model that models the community memberships of nodes by their contents. We then develop expectation-maximization (EM) algorithms to infer the parameters in the two models. Experiments on synthetic and real networks demonstrate that the proposed models yield better performance than previous models, especially on networks with general structures.
- Published
- 2013
- Full Text
- View/download PDF
25. A Directed Inference Approach towards Multi-class Multi-model Fusion
- Author
-
Piero P. Bonissone, Lei Wu, and Tianbao Yang
- Subjects
Support vector machine, Fusion, Data point, Training set, Computer science, Test point, Inference, Bias correction, Artificial intelligence, Class (biology) - Abstract
In this paper, we propose a directed inference approach for multi-class multi-model fusion. Unlike traditional approaches that learn a model in the training stage and apply it to new data points in the testing stage, the directed inference approach constructs a general direction of inference in the training stage and constructs an individual (ad hoc) rule for each given test point in the testing stage. In the present work, we propose a framework for applying the directed inference approach to multiple-model fusion problems that consists of three components: (i) learning individual models on the training samples, (ii) nearest-neighbour search for constructing individual bias-correction rules, and (iii) learning optimal combination weights of the individual models for model fusion. For inference on a test sample, the prediction scores of the individual models are first corrected with the bias estimated from the nearest training data points, and then the corrected scores are combined using the learned optimal weights. We conduct extensive experiments and demonstrate the effectiveness of the proposed approach for multi-class multi-model fusion.
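The three components map directly to a few lines of NumPy. The sketch below uses a scalar score per model for brevity; all names and shapes are assumptions, and the learning of the fusion weights is taken as given.

```python
import numpy as np

def directed_fusion(scores_test, scores_train, y_train, X_train, x_test, weights, k=5):
    """(ii) nearest-neighbour search around the test point, per-model bias estimated
    from those neighbours' residuals, then (iii) weighted fusion of corrected scores."""
    d = np.linalg.norm(X_train - x_test, axis=1)
    nn = np.argsort(d)[:k]
    bias = (scores_train[nn] - y_train[nn, None]).mean(axis=0)   # one bias per model
    return weights @ (scores_test - bias)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = X_train @ rng.normal(size=4)
scores_train = y_train[:, None] + rng.normal(scale=0.3, size=(100, 2))  # two noisy models
x_test = rng.normal(size=4)
scores_test = np.array([1.0, 1.2])
print(directed_fusion(scores_test, scores_train, y_train, X_train, x_test,
                      weights=np.array([0.6, 0.4])))
```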
- Published
- 2013
- Full Text
- View/download PDF
26. Robust Ensemble Clustering by Matrix Completion
- Author
-
Rong Jin, Jinfeng Yi, Anil K. Jain, Tianbao Yang, and Mehrdad Mahdavi
- Subjects
Fuzzy clustering, Single-linkage clustering, Correlation clustering, Constrained clustering, Pattern recognition, Data stream clustering, CURE data clustering algorithm, Canopy clustering algorithm, Artificial intelligence, Cluster analysis, Mathematics - Abstract
Data clustering is an important task and has found applications in numerous real-world problems. Since no single clustering algorithm is able to identify all the different types of cluster shapes and structures, ensemble clustering was proposed to combine different partitions of the same data generated by multiple clustering algorithms. The key idea of most ensemble clustering algorithms is to find a partition that is consistent with most of the available partitions of the input data. One problem with these algorithms is their inability to handle uncertain data pairs, i.e., data pairs for which about half of the partitions put them into the same cluster and the other half do the opposite. When the number of uncertain data pairs is large, they can mislead the ensemble clustering algorithm in generating the final partition. To overcome this limitation, we propose an ensemble clustering approach based on the technique of matrix completion. The proposed algorithm constructs a partially observed similarity matrix from the data pairs whose cluster memberships are agreed upon by most of the clustering algorithms in the ensemble. It then deploys a matrix completion algorithm to complete the similarity matrix. The final data partition is computed by applying an efficient spectral clustering algorithm to the completed matrix. Our empirical studies with multiple real-world datasets show that the proposed algorithm performs significantly better than state-of-the-art ensemble clustering algorithms.
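A sketch of the pipeline described above, with a simple iterative low-rank SVD projection standing in for the paper's matrix-completion solver; the agreement threshold and rank are placeholder assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def ensemble_cluster(partitions, n_clusters, agree=0.8, iters=100):
    P = np.array(partitions)                                  # (n_algorithms, n_points)
    same = (P[:, :, None] == P[:, None, :]).mean(axis=0)      # fraction saying "same cluster"
    M = np.where(same >= agree, 1.0, np.where(same <= 1 - agree, 0.0, np.nan))
    obs = ~np.isnan(M)                                        # confident (observed) pairs only
    A = np.where(obs, M, 0.5)
    for _ in range(iters):                                    # hard-impute: rank-r projection
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        A = (U[:, :n_clusters] * s[:n_clusters]) @ Vt[:n_clusters]
        A[obs] = M[obs]                                       # keep observed entries fixed
    A = np.clip((A + A.T) / 2, 0.0, 1.0)
    return SpectralClustering(n_clusters, affinity="precomputed").fit_predict(A)

parts = [[0, 0, 0, 1, 1, 1, 2, 2, 2],
         [1, 1, 1, 0, 0, 2, 2, 2, 2],
         [0, 0, 1, 1, 1, 1, 2, 2, 2]]
print(ensemble_cluster(parts, n_clusters=3))
```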
- Published
- 2012
- Full Text
- View/download PDF
27. Learning kernel combination from noisy pairwise constraints
- Author
-
Anil K. Jain, Tianbao Yang, and Rong Jin
- Subjects
Noise measurement, Probabilistic logic, Constrained clustering, Contrast (statistics), Machine learning, Kernel (statistics), Pairwise comparison, Artificial intelligence, Noise (video), Cluster analysis, Algorithm, Mathematics - Abstract
We consider the problem of learning the combination of multiple kernels given noisy pairwise constraints, which is in contrast to most of the existing studies that assume perfect pairwise constraints. This problem is particularly important when the pairwise constraints are derived from side information such as hyperlinks and paper citations. We propose a probabilistic approach for learning the combination of multiple kernels and show that under appropriate assumptions, the combination weights learned by the proposed approach from the noisy pairwise constraints converge to the optimal weights learned from perfectly labeled pairwise constraints. Empirical studies on data clustering using the learned combined kernel verify the effectiveness of the proposed approach.
- Published
- 2012
- Full Text
- View/download PDF
28. Online multiple kernel classification
- Author
-
Rong Jin, Tianbao Yang, Peilin Zhao, Steven C. H. Hoi, and School of Computer Engineering
- Subjects
Graph kernel, Computer science, Online machine learning, Machine learning, Kernel method, Artificial Intelligence, Kernel embedding of distributions, Polynomial kernel, Kernel (statistics), Radial basis function kernel, Artificial intelligence, Tree kernel, Software - Abstract
Although both online learning and kernel learning have been studied extensively in machine learning, there is limited effort in addressing the intersecting research problems of these two important topics. As an attempt to fill the gap, we address a new research problem, termed Online Multiple Kernel Classification (OMKC), which learns a kernel-based prediction function by selecting a subset of predefined kernel functions in an online learning fashion. OMKC is in general more challenging than typical online learning because both the kernel classifiers and the subset of selected kernels are unknown, and more importantly the solutions to the kernel classifiers and their combination weights are correlated. The proposed algorithms are based on the fusion of two online learning algorithms, i.e., the Perceptron algorithm that learns a classifier for a given kernel, and the Hedge algorithm that combines classifiers by linear weights. We develop stochastic selection strategies that randomly select a subset of kernels for combination and model updating, thus improving the learning efficiency. Our empirical study with 15 data sets shows promising performance of the proposed algorithms for OMKC in both learning efficiency and prediction accuracy.
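A minimal sketch of the deterministic variant of this fusion: one kernel Perceptron per predefined kernel, combined by Hedge weights. The paper's stochastic kernel-selection strategies, which are its efficiency contribution, are omitted, and the kernels and hyperparameters here are assumptions.

```python
import numpy as np

def omkc(X, y, kernels, eta=0.5):
    """One kernel Perceptron per kernel, combined by Hedge weights."""
    m, n = len(kernels), len(y)
    alpha = np.zeros((m, n))                  # Perceptron coefficients per kernel
    w = np.ones(m)                            # Hedge combination weights
    mistakes = 0
    for t in range(n):
        if t:
            f = np.array([alpha[i, :t] @ kernels[i](X[:t], X[t]) for i in range(m)])
        else:
            f = np.zeros(m)
        pred = 1.0 if (w / w.sum()) @ np.sign(f) >= 0 else -1.0
        mistakes += pred != y[t]
        for i in range(m):
            if np.sign(f[i]) != y[t]:         # kernel classifier i erred on this round
                w[i] *= eta                   # Hedge: discount its vote
                alpha[i, t] = y[t]            # Perceptron update for kernel i
    return w, mistakes

lin = lambda A, b: A @ b                      # linear kernel against one point
poly = lambda A, b: (A @ b + 1.0) ** 2        # degree-2 polynomial kernel
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.sign(X[:, 0] + 0.2)
w, mistakes = omkc(X, y, [lin, poly])
print(w, mistakes, "mistakes in", len(y), "rounds")
```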
- Published
- 2012
29. A kernel density based approach for large scale image retrieval
- Author
-
Wei Tong, Anil K. Jain, Rong Jin, Fengjie Li, and Tianbao Yang
- Subjects
Computer science, Binary image, Pattern recognition, Content-based image retrieval, Automatic image annotation, Kernel (image processing), Image texture, Computer vision, Mean-shift, Visual word, Artificial intelligence, Image retrieval - Abstract
Local image features, such as SIFT descriptors, have been shown to be effective for content-based image retrieval (CBIR). In order to achieve efficient image retrieval using local features, most existing approaches represent an image by a bag-of-words model in which every local feature is quantized into a visual word. Given the bag-of-words representation for images, a text search engine is then used to efficiently find the matched images for a given query. The main drawback of these approaches is that the two key steps, i.e., key point quantization and image matching, are separated, leading to sub-optimal performance in image retrieval. In this work, we present a statistical framework for large-scale image retrieval that unifies key point quantization and image matching by introducing a kernel density function. The key ideas of the proposed framework are (a) each image is represented by a kernel density function from which the observed key points are sampled, and (b) the similarity of a gallery image to a query image is estimated as the likelihood of generating the key points in the query image by the kernel density function of the gallery image. We present efficient algorithms for kernel density estimation as well as for effective image matching. Experiments with large-scale image retrieval confirm that the proposed method is not only more effective but also more efficient than the state-of-the-art approaches in identifying visually similar images for given queries from large image databases. (A brute-force sketch of the density-based scoring follows this record.)
- Published
- 2011
- Full Text
- View/download PDF
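As a rough Python illustration of point (b) in the abstract, the brute-force sketch below scores a gallery image by the average log-likelihood of the query's local descriptors under a Gaussian kernel density estimate built from the gallery's descriptors. The paper's efficient estimation algorithms are not reproduced here; kde_similarity, bandwidth, and the Gaussian kernel choice are assumptions.

import numpy as np

def kde_similarity(gallery_desc, query_desc, bandwidth=0.2):
    # gallery_desc: (g, d) local descriptors of one gallery image
    # query_desc:   (q, d) local descriptors of the query image
    # squared Euclidean distances between every query/gallery descriptor pair
    d2 = ((query_desc[:, None, :] - gallery_desc[None, :, :]) ** 2).sum(-1)
    # density of each query descriptor under the gallery's Gaussian KDE
    dens = np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)
    return np.log(dens + 1e-12).mean()        # average log-likelihood as the score

def retrieve(gallery, query_desc, top_k=5):
    # gallery: list of per-image descriptor arrays; returns indices of the
    # top_k gallery images ranked by the KDE likelihood score
    scores = [kde_similarity(g, query_desc) for g in gallery]
    return np.argsort(scores)[::-1][:top_k]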
30. Unsupervised transfer classification
- Author
-
Wei Tong, Rong Jin, Tianbao Yang, Anil K. Jain, and Yang Zhou
- Subjects
business.industry ,Principle of maximum entropy ,Pattern recognition ,Construct (python library) ,Machine learning ,computer.software_genre ,Class (biology) ,Transfer (group theory) ,Empirical research ,Problem domain ,One-class classification ,Artificial intelligence ,Transfer of learning ,business ,computer ,Mathematics - Abstract
We study the problem of building the classification model for a target class in the absence of any labeled training example for that class. To address this difficult learning problem, we extend the idea of transfer learning by assuming that the following side information is available: (i) a collection of labeled examples belonging to other classes in the problem domain, called the auxiliary classes; (ii) the class information, including the prior of the target class and the correlation between the target class and the auxiliary classes. Our goal is to construct the classification model for the target class by leveraging the above data and information. We refer to this learning problem as unsupervised transfer classification. Our framework is based on the generalized maximum entropy model, which is effective in transferring the label information of the auxiliary classes to the target class. A theoretical analysis shows that, under certain assumptions, the classification model obtained by the proposed approach converges to the optimal model learned from labeled examples for the target class. An empirical study on text categorization over four different data sets verifies the effectiveness of the proposed approach. (A sketch of the evidence-mixing intuition follows this record.)
- Published
- 2010
- Full Text
- View/download PDF
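The abstract describes the inputs (auxiliary-class posteriors, a target prior, and target/auxiliary correlations) but not the generalized maximum entropy machinery, so the Python sketch below only captures the mixing intuition: score each example by combining auxiliary-class posteriors through the given correlations, then threshold so the predicted positive rate matches the target prior. All names are hypothetical, and this is not the paper's model.

import numpy as np

def target_scores(aux_post, corr, prior):
    # aux_post: (n, K) posteriors p(aux class k | x_i) from classifiers trained
    #           on the labeled auxiliary examples
    # corr:     (K,) correlations p(target | aux class k) given as side information
    # prior:    scalar p(target), used to set the decision threshold
    scores = aux_post @ corr                      # mix the auxiliary evidence
    thresh = np.quantile(scores, 1.0 - prior)     # match the target class prior
    return scores, scores >= thresh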
31. Online Multiple Kernel Learning: Algorithms and Mistake Bounds
- Author
-
Rong Jin, Tianbao Yang, and Steven C. H. Hoi
- Subjects
Graph kernel ,Multiple kernel learning ,business.industry ,Computer science ,Online machine learning ,Machine learning ,computer.software_genre ,Kernel method ,Kernel embedding of distributions ,Polynomial kernel ,Radial basis function kernel ,Artificial intelligence ,Tree kernel ,business ,computer ,Algorithm - Abstract
Online learning and kernel learning are two active research topics in machine learning. Although each has been studied extensively, limited effort has been devoted to the research at their intersection. In this paper, we introduce a new research problem, termed Online Multiple Kernel Learning (OMKL), which aims to learn a kernel-based prediction function from a pool of predefined kernels in an online learning fashion. OMKL is generally more challenging than typical online learning because both the kernel classifiers and their linear combination weights must be learned simultaneously. In this work, we consider two setups for OMKL, i.e., combining binary predictions or real-valued outputs from multiple kernel classifiers, and we propose both deterministic and stochastic approaches for each setup. The deterministic approach updates all kernel classifiers for every misclassified example, while the stochastic approach randomly chooses one or more classifiers to update according to some sampling strategy. Mistake bounds are derived for all the proposed OMKL algorithms. (A sketch of the deterministic variant follows this record.)
- Published
- 2010
- Full Text
- View/download PDF
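For contrast with the stochastic variant sketched under entry 28 above, the following Python fragment shows one plausible reading of the deterministic approach: on every misclassified example, every kernel classifier whose own prediction was wrong gets a Perceptron update and a Hedge discount. Again a sketch under stated assumptions, not the paper's exact algorithm.

import numpy as np

def omkl_deterministic(kernel_mats, y, beta=0.8):
    # kernel_mats: list of m precomputed (n, n) kernel matrices; y: labels in {-1, +1}
    m, n = len(kernel_mats), len(y)
    alpha = np.zeros((m, n))                      # dual Perceptron coefficients
    w = np.ones(m) / m                            # linear combination weights
    mistakes = 0
    for t in range(n):
        f = np.array([alpha[k, :t] @ kernel_mats[k][:t, t] for k in range(m)])
        yhat = 1.0 if w @ f >= 0 else -1.0        # combine real-valued outputs
        if yhat != y[t]:                          # a global mistake triggers updates
            mistakes += 1
            err = np.sign(f) != y[t]              # kernels whose classifiers erred
            alpha[err, t] = y[t]                  # Perceptron update for each of them
            w[err] *= beta                        # Hedge discount for each of them
            w /= w.sum()                          # renormalize the weights
    return w, mistakes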
32. Combining link and content for community detection
- Author
-
Rong Jin, Tianbao Yang, Yun Chi, and Shenghuo Zhu
- Subjects
Optimization problem ,business.industry ,computer.software_genre ,Machine learning ,Set (abstract data type) ,Generative model ,Projection (relational algebra) ,Discriminative model ,Expectation–maximization algorithm ,Data mining ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Link analysis ,Mathematics - Abstract
In this paper, we consider the problem of combining link and content analysis for community detection from networked data, such as paper citation networks and the World Wide Web. Most existing approaches combine link and content information through a generative model that generates both links and contents via a shared set of community memberships. These generative models have shortcomings in that they fail to consider additional factors that could affect the community memberships and to isolate the contents that are irrelevant to community memberships. To explicitly address these shortcomings, we propose a discriminative model that combines link and content analysis for community detection. First, we propose a conditional model for link analysis in which we introduce hidden variables to explicitly model the popularity of nodes. Second, to alleviate the impact of irrelevant content attributes, we develop a discriminative model for content analysis. The two models are unified seamlessly via the community memberships. We present efficient algorithms to solve the related optimization problems based on bound optimization and alternating projection. Extensive experiments with benchmark data sets show that the proposed framework significantly outperforms state-of-the-art approaches for combining link and content analysis for community detection. (A sketch of a simplified link model follows this record.)
- Published
- 2009
- Full Text
- View/download PDF
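The abstract sketches the structure of the conditional link model (hidden popularity variables tied to community memberships) without equations, so the Python fragment below is only a guess at its flavor: a logistic link model whose score adds the target node's popularity to the similarity of the two nodes' soft memberships, fit by plain gradient ascent. The content model, bound optimization, and alternating projection from the paper are omitted; every name here is hypothetical.

import numpy as np

def fit_link_model(adj, n_comm=4, lr=0.05, n_iter=300, seed=0):
    # adj: (n, n) 0/1 adjacency matrix of observed links
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    pop = np.zeros(n)                                 # hidden popularity per node
    memb = 0.1 * rng.standard_normal((n, n_comm))     # soft community memberships
    for _ in range(n_iter):
        logits = pop[None, :] + memb @ memb.T         # score of a link i -> j
        p = 1.0 / (1.0 + np.exp(-logits))             # link probabilities
        g = adj - p                                   # Bernoulli log-likelihood gradient
        pop += lr * g.sum(axis=0) / n                 # ascend in popularity ...
        memb += lr * (g + g.T) @ memb / n             # ... and in memberships
    return pop, memb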
33. Deep unsupervised binary coding networks for multivariate time series retrieval
- Author
-
Bo Zong, Jingchao Ni, Yuncong Chen, Tianbao Yang, Wei Cheng, Dongjin Song, Cristian Lumezanu, Dixian Zhu, Haifeng Chen, and Mizoguchi Takehiko
- Subjects
Multivariate statistics ,Computer science ,business.industry ,Feature vector ,Pattern recognition ,02 engineering and technology ,General Medicine ,010501 environmental sciences ,01 natural sciences ,Identification (information) ,Encoding (memory) ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,020201 artificial intelligence & image processing ,Binary code ,Anomaly detection ,Artificial intelligence ,Time series ,business ,Cluster analysis ,0105 earth and related environmental sciences - Abstract
Multivariate time series data are becoming increasingly ubiquitous in various real-world applications, such as smart cities, power plant monitoring, and wearable devices. Given a current time series segment, retrieving similar segments from the historical data in an efficient and effective manner is becoming increasingly important, as it can facilitate downstream applications such as system status identification and anomaly detection. Although various binary coding techniques can be applied to this task, few of them are specifically designed for multivariate time series data in an unsupervised setting. To this end, we present Deep Unsupervised Binary Coding Networks (DUBCNs) for multivariate time series retrieval. DUBCNs employ the Long Short-Term Memory (LSTM) encoder-decoder framework to capture the temporal dynamics within the input segment and consist of three key components: a temporal encoding mechanism to capture the temporal order of different segments within a mini-batch, a clustering loss on the hidden feature space to capture the hidden feature structure, and an adversarial loss based on Generative Adversarial Networks (GANs) to enhance the generalization capability of the generated binary codes. Thorough empirical studies on three public datasets demonstrate that the proposed DUBCNs outperform state-of-the-art unsupervised binary coding techniques. (A simplified architecture sketch follows below.)
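To make the architecture concrete, here is a heavily simplified PyTorch sketch of the encoder-decoder path with a straight-through sign for the binary codes; the paper's temporal encoding mechanism, clustering loss, and adversarial loss are deliberately left out, and all class and parameter names are hypothetical.

import torch
import torch.nn as nn

class BinaryCodingNet(nn.Module):
    def __init__(self, n_vars, hidden=64, code_bits=32):
        super().__init__()
        self.enc = nn.LSTM(n_vars, hidden, batch_first=True)
        self.to_code = nn.Linear(hidden, code_bits)
        self.dec = nn.LSTM(code_bits, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_vars)

    def forward(self, x):                             # x: (batch, time, n_vars)
        _, (h, _) = self.enc(x)                       # final hidden state summarizes x
        soft = torch.tanh(self.to_code(h[-1]))        # relaxed code in (-1, 1)
        # straight-through sign: binary values forward, tanh gradient backward
        code = soft + (torch.sign(soft) - soft).detach()
        rep = code.unsqueeze(1).repeat(1, x.size(1), 1)   # feed the code at every step
        dec_out, _ = self.dec(rep)
        return self.out(dec_out), code

# usage: a reconstruction loss drives the codes to preserve segment content
model = BinaryCodingNet(n_vars=5)
x = torch.randn(8, 20, 5)
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)
loss.backward()

Retrieval then reduces to ranking stored segments by the Hamming distance between their codes and the query segment's code.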