Online Asymmetric Active Learning with Imbalanced Data
- Source :
- KDD
- Publication Year :
- 2016
- Publisher :
- ACM, 2016.
Abstract
- This paper considers online learning with imbalanced streaming data under a query budget, where the number of label queries is constrained by a budget limit. We study different active querying strategies for classification. In particular, we propose an asymmetric active querying strategy that assigns different query probabilities to examples predicted as positive and to examples predicted as negative. To corroborate the proposed asymmetric query model, we provide a theoretical analysis of a weighted mistake bound. We conduct extensive evaluations of the proposed asymmetric active querying strategy in comparison with several baseline querying strategies and with previous online learning algorithms for imbalanced data. In particular, we perform two types of evaluation according to which examples count as "positive" or "negative". In push evaluation, only the positive predictions given to the user are taken into account; in push and query evaluation, the decision to query is also considered. The push and query evaluation is particularly suited to a recommendation setting, because the items selected for label queries may go to the end-user to enable customization and personalization. These items would not be shown any differently to the end-user than recommended content (i.e., the examples predicted as positive). Additionally, given our interest in imbalanced data, we measure F-score instead of the accuracy that is traditionally considered by online classification algorithms. We also compare the querying strategies on five classification tasks from different domains, and show that the probabilistic query strategy achieves higher F-scores on both types of evaluation than the deterministic strategy, especially when the budget is small, and that the asymmetric query model further improves performance. When compared to the state-of-the-art cost-sensitive online learning algorithm under a budget, our online classification algorithm with asymmetric querying achieves a higher F-score on four of the five tasks, especially on the push evaluation.
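The asymmetric querying idea in the abstract can be illustrated with a small sketch. The abstract does not give the form of the query probability, so the code below assumes a margin-based rule of the shape c / (c + |margin|), which is common in selective-sampling perceptrons, with a separate constant for each predicted class. The names `query_probability`, `run_stream`, `c_pos`, and `c_neg`, and the plain perceptron update, are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def query_probability(margin, predicted_positive, c_pos=1.0, c_neg=0.1):
    """Probability of asking for a label (assumed form, not from the paper).

    A larger constant means more queries for that predicted class, so
    c_pos > c_neg spends more of the budget on predicted positives.
    """
    c = c_pos if predicted_positive else c_neg
    return c / (c + abs(margin))


def run_stream(stream, weights, budget, c_pos=1.0, c_neg=0.1, rng=None):
    """Online linear classifier with budgeted, asymmetric label queries.

    stream  : iterable of (features, label) pairs, label in {+1, -1}
    weights : initial weight vector (list of floats)
    budget  : maximum number of label queries allowed
    Returns the final weights and the number of queries made.
    """
    rng = rng or random.Random(0)
    queries = 0
    for x, y in stream:
        margin = sum(w * xi for w, xi in zip(weights, x))
        predicted_positive = margin >= 0
        p = query_probability(margin, predicted_positive, c_pos, c_neg)
        # Query only while budget remains, and only with probability p.
        if queries < budget and rng.random() < p:
            queries += 1
            # The label y is observed only here; update on a mistake
            # (simple perceptron step, an assumption for this sketch).
            if y * margin <= 0:
                weights = [w + y * xi for w, xi in zip(weights, x)]
    return weights, queries
```

With `c_pos` larger than `c_neg`, an example predicted positive is queried more often than an example predicted negative at the same margin, which is one way to realize the asymmetry the abstract describes.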
- Subjects :
- Information retrieval
Computer science
Active learning (machine learning)
Probabilistic logic
02 engineering and technology
Machine learning
Personalization
Statistical classification
Web query classification
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Artificial intelligence
F1 score
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- Accession number :
- edsair.doi...........8f6c1fcc7659d9e11a372e1805652dca
- Full Text :
- https://doi.org/10.1145/2939672.2939854