The Cost of Dichotomizing Continuous Labels for Binary Classification Problems: Deriving a Bayesian-Optimal Classifier
- Source :
- IEEE Transactions on Affective Computing. 8:119-130
- Publication Year :
- 2017
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2017.
Abstract
- Many pattern recognition problems involve characterizing samples with continuous labels instead of discrete categories. While regression models are suitable for these learning tasks, the labels are often discretized into binary classes to formulate the problem as a conventional classification task (e.g., classes with low versus high values). This methodology imposes intrinsic limitations on classification performance. The continuous labels are typically normally distributed, with many samples close to the boundary threshold, resulting in poor classification rates. Previous studies use only the discretized labels to train binary classifiers, neglecting the original continuous labels. This study demonstrates that, even in binary classification problems, exploiting the original labels before splitting the classes can lead to better classification performance. This work proposes an optimal classifier for these problems based on the Bayesian maximum a posteriori (MAP) criterion, which effectively utilizes the real-valued labels. We derive the theoretical average performance of this classifier, which can be regarded as the expected upper bound on performance for the task. Experimental evaluations on synthetic and real data sets show the improvement achieved by the proposed classifier over conventional classifiers trained with binary labels. These evaluations demonstrate the optimality of the proposed classifier and the precision of the expected upper bound obtained by our derivation.
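The MAP idea described in the abstract can be illustrated with a toy sketch. It is not the paper's model or derivation: it assumes, hypothetically, that a single feature x and the continuous label y are jointly Gaussian, estimates that joint distribution from the real-valued labels (rather than from thresholded classes), and predicts the "high" class when the posterior probability P(y > t | x) exceeds 0.5. The helper names `simulate`, `fit_joint_gaussian`, `posterior_high`, and `map_classify`, the threshold `t`, and the generative model are all illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def simulate(n, rng, slope=1.0, noise_sd=1.0):
    """Hypothetical generative model (an assumption, not the paper's):
    continuous label y ~ N(0, 1), feature x = slope * y + N(0, noise_sd^2)."""
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xs = [slope * y + rng.gauss(0.0, noise_sd) for y in ys]
    return xs, ys


def fit_joint_gaussian(xs, ys):
    """Estimate means, variances and covariance of (x, y).

    Uses the continuous labels ys directly instead of binarized classes."""
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vy = sum((y - my) ** 2 for y in ys) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, vx, vy, cxy


def posterior_high(x, params, threshold):
    """P(y > threshold | x) under the fitted bivariate Gaussian."""
    mx, my, vx, vy, cxy = params
    mean = my + cxy / vx * (x - mx)       # E[y | x]
    var = max(vy - cxy ** 2 / vx, 1e-12)  # Var[y | x], floored to keep sigma > 0
    return 1.0 - NormalDist(mean, var ** 0.5).cdf(threshold)


def map_classify(x, params, threshold):
    """MAP rule: return 1 (high) if P(high | x) > P(low | x), else 0 (low)."""
    return 1 if posterior_high(x, params, threshold) > 0.5 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    t = 0.0  # split continuous labels into low (y <= 0) vs high (y > 0)
    x_train, y_train = simulate(2000, rng)
    x_test, y_test = simulate(2000, rng)
    params = fit_joint_gaussian(x_train, y_train)
    preds = [map_classify(x, params, t) for x in x_test]
    truth = [1 if y > t else 0 for y in y_test]
    acc = sum(p == c for p, c in zip(preds, truth)) / len(truth)
    print(f"MAP accuracy on simulated test data: {acc:.3f}")
```

For this toy model with the threshold at the label mean, the Bayes-optimal accuracy has the closed form 1/2 + arcsin(ρ)/π, where ρ is the correlation between x and y. With slope 1 and unit noise, ρ = 1/√2, so the expected accuracy is 0.75 and the printed value should land near it. That closed form is specific to this bivariate Gaussian illustration and is not the upper bound derived in the paper.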
- Subjects :
- Structured support vector machine
Computer science
business.industry
05 social sciences
050401 social sciences methods
Pattern recognition
Bayes classifier
Quadratic classifier
Machine learning
computer.software_genre
050105 experimental psychology
Human-Computer Interaction
Support vector machine
ComputingMethodologies_PATTERNRECOGNITION
0504 sociology
Binary classification
Margin classifier
Maximum a posteriori estimation
0501 psychology and cognitive sciences
Artificial intelligence
business
Classifier (UML)
computer
Software
Details
- ISSN :
- 1949-3045
- Volume :
- 8
- Database :
- OpenAIRE
- Journal :
- IEEE Transactions on Affective Computing
- Accession number :
- edsair.doi...........9647bdc4572d4a7bc6025abd4398d23f