
The design of variable-length coding matrix for improving error correcting output codes.

Authors :
Feng, Kai-Jie
Liong, Sze-Teng
Liu, Kun-Hong
Source :
Information Sciences. Sep 2020, Vol. 534, p192-217. 26p.
Publication Year :
2020

Abstract

• We propose, for the first time, a variable-length codeword strategy for the ECOC encoding algorithm, which treats easy and hard classes separately by adding more columns for the hard classes.
• A two-phase strategy is introduced: the first phase generates columns for all classes, and the second phase tackles only the hard classes.
• The centroids of the two toughest classes are set as the group centroids, driving the dichotomizers to concentrate on classifying the hard classes.
• Both random sampling and the random feature subspace technique are used to generate training sets for the dichotomizers, aiming to enhance diversity within the ECOC ensemble.
• A nearest neighbor search is used to gather hard samples in the hard-class stage, which also reinforces the discriminating power of the dichotomizers on the hard classes.

To date, all existing Error Correcting Output Codes (ECOC) algorithms produce coding matrices in which every class has a codeword of the same length. In contrast, this paper proposes a variable-length codeword based ECOC (VL-ECOC), which generates longer codes for hard classes than for easy classes. VL-ECOC consists of two phases: the overall-class phase and the hard-class phase. In the first phase, the centroids of the two toughest classes are selected as the centroids of the positive group and the negative group, respectively, and every other class is assigned to the nearer group. The remaining hard classes with high error rates proceed to the second phase, in which the K nearest neighbors of the misclassified samples are used to generate new columns. The codewords generated in the second phase are used when decoding the hard classes. Consequently, easy and hard classes have codewords of different lengths. To verify the performance of VL-ECOC, comprehensive experiments are carried out on UCI data sets and microarray data sets.
The experimental results demonstrate that, owing to the additional codewords for the hard classes, our algorithm handles the class imbalance problem better and achieves higher performance in most cases. [ABSTRACT FROM AUTHOR]
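The two phases and the variable-length decoding described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`overall_class_column`, `gather_hard_samples`, `decode`) are hypothetical, "toughest" is taken to mean the highest per-class error rate, all distances are Euclidean, and the length-normalized Hamming rule in `decode` is only one plausible way to compare codewords of different lengths; the paper's own rule may differ.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def overall_class_column(X: List[Vector], y: List[int],
                         class_error: Dict[int, float]) -> Dict[int, int]:
    """Phase 1 sketch: build one +1/-1 column of the coding matrix.

    Assumptions: "toughest" means highest per-class error rate, and each
    class joins the group whose seed centroid is nearer (Euclidean) to its
    own class centroid.
    """
    classes = sorted(set(y))
    cents = {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in classes}
    # The two toughest classes seed the positive and negative groups.
    pos, neg = sorted(classes, key=lambda c: class_error[c], reverse=True)[:2]
    column = {}
    for c in classes:
        d_pos = math.dist(cents[c], cents[pos])
        d_neg = math.dist(cents[c], cents[neg])
        column[c] = 1 if d_pos <= d_neg else -1
    return column


def gather_hard_samples(X: List[Vector], misclassified: List[int], k: int) -> List[int]:
    """Phase 2 sketch: indices of the misclassified samples plus the k nearest
    neighbours (Euclidean) of each, to be used as training data for a new
    hard-class column. Including the sample itself is an assumption."""
    hard = set()
    for i in misclassified:
        hard.add(i)
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        hard.update(others[:k])
    return sorted(hard)


def decode(outputs: Dict[int, int], codewords: Dict[int, Dict[int, int]]) -> int:
    """Variable-length decoding sketch (hypothetical rule): compare each class
    only on the columns its codeword covers, and divide the Hamming distance
    by the codeword length so longer codewords are not penalised."""
    def distance(cw: Dict[int, int]) -> float:
        mismatches = sum(1 for col, bit in cw.items() if outputs[col] != bit)
        return mismatches / len(cw)
    return min(codewords, key=lambda c: distance(codewords[c]))


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 5.0], [0.0, 5.2]]
    y = [0, 0, 1, 1, 2, 2]
    errors = {0: 0.05, 1: 0.30, 2: 0.25}  # hypothetical per-class error rates
    print(overall_class_column(X, y, errors))  # {0: -1, 1: 1, 2: -1}
    print(gather_hard_samples(X, [2], 1))      # [2, 3]
    # Class 0 is "easy" (column 0 only); classes 1 and 2 also use column 1.
    codewords = {0: {0: -1}, 1: {0: 1, 1: 1}, 2: {0: -1, 1: -1}}
    print(decode({0: 1, 1: 1}, codewords))     # 1
```

The normalization in `decode` is a design choice of this sketch: without it, hard classes with longer codewords would accumulate more mismatches simply because they are compared on more columns.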

Details

Language :
English
ISSN :
00200255
Volume :
534
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
143825231
Full Text :
https://doi.org/10.1016/j.ins.2020.04.021