1. Impact of evaluation methods on decision tree accuracy
- Author
Baykara, Batuhan, Informaatiotieteiden yksikkö - School of Information Sciences, and University of Tampere
- Subjects
machine learning, accuracy, evaluation methods, decision tree, data mining, MDP in Software Development
- Abstract
Decision trees are one of the most powerful and commonly used supervised learning algorithms in the field of data mining. It is important that a decision tree performs accurately when employed on unseen data; therefore, evaluation methods are used to measure the predictive performance of a decision tree classifier. However, the predictive accuracy of a decision tree is also dependent on the evaluation method chosen, because the evaluation method determines how the training and testing sets of the decision tree model are selected. The aim of this thesis was to study and understand how different evaluation methods affect accuracy when they are applied to different decision tree algorithms. Consequently, comprehensive research was conducted on decision trees and evaluation methods. Additionally, an experiment was conducted using ten different datasets, five decision tree algorithms and five different evaluation methods in order to study the relationship between evaluation methods and decision tree accuracies. The decision tree inducers were tested with the Leave-one-out, 5-Fold Cross Validation, 10-Fold Cross Validation, Holdout 50 split and Holdout 66 split evaluation methods. According to the results, cross validation methods were superior to holdout methods overall. Moreover, Holdout 50 split performed the poorest on most of the datasets. The possible reasons behind these results are also discussed in the thesis.
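As an illustration of how the five evaluation methods named in the abstract select training and testing sets, the following is a minimal sketch in plain Python. It is not taken from the thesis. The function names are hypothetical, the splits are unstratified, and it assumes that "Holdout 50" and "Holdout 66" mean that 50% and 66% of the samples are used for training; the abstract does not state either detail.

```python
import random


def holdout_split(n, train_fraction, seed=0):
    """Return a single (train, test) pair of index lists.

    Assumption: train_fraction is the share of samples used for training.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_fraction))
    return [(indices[:cut], indices[cut:])]


def k_fold_splits(n, k, seed=0):
    """Return k (train, test) pairs; every sample is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, folds[i]))
    return splits


def leave_one_out_splits(n):
    """Leave-one-out is k-fold cross validation with k equal to n."""
    return k_fold_splits(n, n)


def evaluation_splits(n):
    """Map each evaluation method from the abstract to its list of splits."""
    return {
        "Leave-one-out": leave_one_out_splits(n),
        "5-Fold Cross Validation": k_fold_splits(n, 5),
        "10-Fold Cross Validation": k_fold_splits(n, 10),
        "Holdout 50 split": holdout_split(n, 0.50),
        "Holdout 66 split": holdout_split(n, 0.66),
    }


if __name__ == "__main__":
    for name, splits in evaluation_splits(100).items():
        train, test = splits[0]
        print(f"{name}: {len(splits)} round(s), "
              f"{len(train)} train / {len(test)} test")
```

Under these assumptions, the holdout methods train and test once, so a single split that holds back many samples leaves the tree less data to learn from, while the cross validation methods average over several rounds in which every sample is tested exactly once.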
- Published
2015