Back to Search Start Over

The Cost of Dichotomizing Continuous Labels for Binary Classification Problems: Deriving a Bayesian-Optimal Classifier

Authors :
Carlos Busso
Soroosh Mariooryad
Source :
IEEE Transactions on Affective Computing. 8:119-130
Publication Year :
2017
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2017.

Abstract

Many pattern recognition problems involve characterizing samples with continuous labels instead of discrete categories. While regression models are suitable for these learning tasks, these labels are often discretized into binary classes to formulate the problem as a conventional classification task (e.g., classes with low versus high values). This methodology brings intrinsic limitations on the classification performance. The continuous labels are typically normally-distributed, with many samples close to the boundary threshold, resulting in poor classification rates. Previous studies only use the discretized labels to train binary classifiers, neglecting the original, continuous labels. This study demonstrates that, even in binary classification problems, exploiting the original labels before splitting the classes can lead to better classification performance. This work proposes an optimal classifier based on the Bayesian maximum a posterior (MAP) criterion for these problems, which effectively utilizes the real-valued labels. We derive the theoretical average performance of this classifier, which can be considered as the expected upper bound performance for the task. Experimental evaluations on synthetic and real data sets show the improvement achieved by the proposed classifier, in contrast to conventional classifiers trained with binary labels. These evaluations clearly demonstrate the optimality of the proposed classifier, and the precision of the expected upper bound obtained by our derivation.

Details

ISSN :
19493045
Volume :
8
Database :
OpenAIRE
Journal :
IEEE Transactions on Affective Computing
Accession number :
edsair.doi...........9647bdc4572d4a7bc6025abd4398d23f