Should supervised discretisation always be trusted unreservedly? On combining characteristics of supervised and unsupervised discretisation algorithms in two-step processing.

Authors :: Stańczyk, Urszula
Baron, Grzegorz
Source :: Procedia Computer Science; 2023, Vol. 225, p2136-2145, 10p
Publication Year :: 2023
Abstract: The paper presents a description of the research methodology dedicated to a two-step discretisation process applied to the input numeric data, combining the characteristics of selected supervised and unsupervised algorithms, which leads to extended processing of some attributes in train and test sets. The methodology was illustrated with investigations carried out in the domain of stylometric analysis of texts, for two datasets prepared for the task of binary authorship attribution. The resulting variants of transformed input data were explored using two selected machine learning methods capable of inducing knowledge from both continuous and categorical forms, namely the PART and J48 classifiers. The results from the experiments indicate that, as can be expected, supervised transformations of data work well enough; however, they do not always return the best outcome. The two-step processing of some attributes shows sufficient promise to warrant a closer study, as opposed to always unconditionally relying only on supervised algorithms as outperforming all other approaches. [ABSTRACT FROM AUTHOR]

Subjects :: ATTRIBUTION of authorship
ALGORITHMS
PATTERN recognition systems
MACHINE learning
PRODUCT returns
