Back to Search Start Over

Online Asymmetric Active Learning with Imbalanced Data

Authors :
Xiaoxuan Zhang
Padmini Srinivasan
Tianbao Yang
Source :
KDD
Publication Year :
2016
Publisher :
ACM, 2016.

Abstract

This paper considers online learning with imbalanced streaming data under a query budget, where the act of querying for labels is constrained to a budget limit. We study different active querying strategies for classification. In particular, we propose an asymmetric active querying strategy that assigns different probabilities for query to examples predicted as positive and negative. To corroborate the proposed asymmetric query model, we provide a theoretical analysis on a weighted mistake bound. We conduct extensive evaluations of the proposed asymmetric active querying strategy in comparison with several baseline querying strategies and with previous online learning algorithms for imbalanced data. In particular, we perform two types of evaluations according to which examples appear as ``positive"/``negative''. In push evaluation only the positive predictions given to the user are taken into account; in push and query evaluation the decision to query is also considered for evaluation. The push and query evaluation strategy is particularly suited for a recommendation setting because the items selected for querying for labels may go to the end-user to enable customization and personalization. These would not be shown any differently to the end-user compared to recommended content (i.e., the examples predicated as positive). Additionally, given our interest in imbalanced data we measure F-score instead of accuracy that is traditionally considered by online classification algorithms. We also compare the querying strategies on five classification tasks from different domains, and show that the probabilistic query strategy achieves higher F-scores on both types of evaluation than deterministic strategy, especially when the budget is small, and the asymmetric query model further improves performance. When compared to the state-of-the-art cost-sensitive online learning algorithm under a budget, our online classification algorithm with asymmetric querying achieves a higher F-score on four of the five tasks, especially on the push evaluation.

Details

Database :
OpenAIRE
Journal :
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Accession number :
edsair.doi...........8f6c1fcc7659d9e11a372e1805652dca
Full Text :
https://doi.org/10.1145/2939672.2939854