
Efficient Model Selection for Regularized Classification by Exploiting Unlabeled Data

Authors :
Eric Gaussier
Ioannis Partalas
Rohit Babbar
Georgios Balikas
Massih-Reza Amini
Analyse de données, Modélisation et Apprentissage automatique [Grenoble] (AMA)
Laboratoire d'Informatique de Grenoble (LIG)
Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)
VISEO
Max Planck Institute for Intelligent Systems
Max-Planck-Gesellschaft
ANR-11-LABX-0025, PERSYVAL-lab, Systèmes et Algorithmes Pervasifs au confluent des mondes physique et numérique (2011)
Source :
14th International Symposium on Intelligent Data Analysis (IDA 2015), Oct 2015, Saint-Etienne, France. Advances in Intelligent Data Analysis XIV, ISBN 978-3-319-24464-8. ⟨10.1007/978-3-319-24465-5_3⟩
Publication Year :
2015
Publisher :
HAL CCSD, 2015.

Abstract

Hyper-parameter tuning is a resource-intensive task when optimizing classification models. The commonly used k-fold cross-validation can become intractable in large-scale settings, where a classifier has to learn billions of parameters. At the same time, real-world applications often involve multi-class classification scenarios with only a few labeled examples; model selection approaches offer little improvement in such cases, and the learners' default values are used instead. We propose bounds on accuracy and on the macro measures (precision, recall, F1) for classification that motivate efficient schemes for model selection and that can benefit from the existence of unlabeled data. We demonstrate the advantages of these schemes by comparing them with k-fold cross-validation and hold-out estimation in the setting of large-scale classification.
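As a point of reference, the following is a minimal Python sketch of the k-fold cross-validation baseline the abstract compares against, written with scikit-learn's GridSearchCV; the dataset, hyper-parameter grid, and fold count are illustrative assumptions, and the paper's own bound-based selection scheme is not reproduced here. It makes the cost argument concrete: every candidate value of the regularization parameter requires k full training runs.

```python
# Minimal sketch of the k-fold cross-validation baseline discussed in the
# abstract: tuning the regularization strength C of a linear classifier.
# Each candidate value of C triggers k full training runs, which is what
# becomes intractable when the classifier has billions of parameters.
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative multi-class text dataset (not the one used in the paper).
X, y = fetch_20newsgroups_vectorized(subset="train", return_X_y=True)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # candidate regularization values

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,                # k = 5 folds: 5 * 4 = 20 training runs in total
    scoring="f1_macro",  # macro-F1, one of the measures the paper bounds
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
```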

Details

Language :
English
ISBN :
978-3-319-24464-8
Database :
OpenAIRE
Journal :
14th International Symposium on Intelligent Data Analysis (IDA 2015), Advances in Intelligent Data Analysis XIV
Accession number :
edsair.doi.dedup.....085e9dbb4e81d8d33ed655b83ff8e21d
Full Text :
https://doi.org/10.1007/978-3-319-24465-5_3