Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting

Authors :
Daniel Zeiberg
Predrag Radivojac
Shantanu Jain
Source :
AAAI
Publication Year :
2020
Publisher :
Association for the Advancement of Artificial Intelligence (AAAI), 2020.

Abstract

Estimating class proportions has emerged as an important direction in positive-unlabeled learning. Well-estimated class priors are key to accurate approximation of posterior distributions and are necessary for the recovery of true classification performance. While significant progress has been made in the past decade, there remains a need for accurate strategies that scale to big data. Motivated by this need, we propose an intuitive and fast nonparametric algorithm to estimate class proportions. Unlike any of the previous methods, our algorithm uses a sampling strategy to repeatedly (1) draw an example from the set of positives, (2) record the minimum distance to any of the unlabeled examples, and (3) remove the nearest unlabeled example. We show that the point of sharp increase in the recorded distances corresponds to the desired proportion of positives in the unlabeled set and train a deep neural network to identify that point. Our distance-based algorithm is evaluated on forty datasets and compared to all currently available methods. We provide evidence that this new approach results in the most accurate performance and can be readily used on large datasets.
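The three-step sampling loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `distance_curve`, the choice of Euclidean distance, sampling positives with replacement, and the synthetic two-cluster data are all assumptions made for illustration. The sketch only builds the curve of recorded distances; it does not attempt the neural-network step that the abstract says is used to locate the point of sharp increase.

```python
import math
import random


def distance_curve(positives, unlabeled, seed=0):
    """Record nearest-unlabeled distances while removing matched examples.

    Hypothetical helper: Euclidean distance and sampling with replacement
    are assumptions, not details confirmed by the abstract.
    """
    rng = random.Random(seed)
    remaining = list(unlabeled)  # copy so the caller's list is untouched
    curve = []
    while remaining:
        # (1) draw an example from the set of positives
        p = rng.choice(positives)
        # (2) find the minimum distance to any remaining unlabeled example
        idx, dist = min(
            ((i, math.dist(p, u)) for i, u in enumerate(remaining)),
            key=lambda pair: pair[1],
        )
        curve.append(dist)
        # (3) remove the nearest unlabeled example
        remaining.pop(idx)
    return curve


# Synthetic toy data (an assumption): positives cluster near (0, 0),
# negatives near (20, 20); 30 of the 100 unlabeled points are positives.
data_rng = random.Random(42)


def cluster(center, n):
    return [(data_rng.gauss(center, 1.0), data_rng.gauss(center, 1.0))
            for _ in range(n)]


positives = cluster(0.0, 200)
unlabeled = cluster(0.0, 30) + cluster(20.0, 70)
curve = distance_curve(positives, unlabeled)
```

On this well-separated toy data, the recorded distances stay small while positive-like unlabeled points remain and jump once they are exhausted, so the jump falls at roughly 30 of 100 removals. Real data will rarely separate this cleanly, which is presumably why the authors train a model to identify the point rather than reading it off directly.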

Details

ISSN :
2374-3468 and 2159-5399
Volume :
34
Database :
OpenAIRE
Journal :
Proceedings of the AAAI Conference on Artificial Intelligence
Accession number :
edsair.doi...........cfc6b20ddc37e82bff47af94d292920a
Full Text :
https://doi.org/10.1609/aaai.v34i04.6151