An examination of the effect of discretization on a naive Bayes model's performance

Authors :
Davood Poursina
Arezoo Aghaei Chadegani
Source :
Scientific Research and Essays. 8:2181-2186
Publication Year :
2013
Publisher :
Academic Journals, 2013.

Abstract

A Bayesian network (BN), or belief network, is a probabilistic graphical model that represents a set of variables and their probabilistic independencies. Research often involves continuous random variables. To use such variables in BN models, they must be converted into discrete variables with a limited number of states, often two. One problem researchers face during discretization is deciding how many states to use. Does the number of states chosen for discretization affect a model's predictive power? This study examines the question empirically in the field of financial distress prediction. The sample consists of 144 firms listed on the Tehran Stock Exchange from 1997 to 2007. Two methods of variable selection were used to develop naive Bayes models: the first is based on conditional correlation between variables and the second on conditional likelihood. The first naive Bayes model, based on conditional correlation, predicts financial distress with 90% accuracy; the second, based on conditional likelihood, achieves 93%. The results thus show that the model based on conditional likelihood performs better than the one based on conditional correlation. Further analyses showed that the number of states chosen for discretization affects model performance. When continuous variables were discretized into two, three, four and five states, the naive Bayes model's performance increased as the number of states rose from two to three and from three to four, but decreased when the number of states rose from four to five.
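The kind of experiment the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' procedure. The record does not say how the variables were discretized, so equal-frequency (quantile) binning is an assumption here, as are the Laplace smoothing constant `alpha`, the synthetic two-feature data, and every function name. The sketch discretizes each continuous variable into k states, fits a categorical naive Bayes classifier, and reports test accuracy for k = 2, 3, 4 and 5.

```python
import math
import random
from collections import Counter, defaultdict


def quantile_cut_points(values, n_states):
    """Return n_states - 1 cut points for equal-frequency binning (assumed method)."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_states] for i in range(1, n_states)]


def discretize(value, cut_points):
    """Map a continuous value to a state index in 0..len(cut_points)."""
    return sum(value >= c for c in cut_points)


def fit_naive_bayes(rows, labels, n_states, alpha=1.0):
    """Count classes and per-feature states; alpha is a Laplace smoothing constant."""
    class_counts = Counter(labels)
    state_counts = defaultdict(Counter)  # (feature index, label) -> state counts
    for row, label in zip(rows, labels):
        for j, state in enumerate(row):
            state_counts[(j, label)][state] += 1
    return class_counts, state_counts, n_states, alpha


def predict(model, row):
    """Return the label with the highest smoothed log-posterior score."""
    class_counts, state_counts, n_states, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for j, state in enumerate(row):
            numerator = state_counts[(j, label)][state] + alpha
            score += math.log(numerator / (count + alpha * n_states))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_for_states(train, test, n_states):
    """Discretize with cut points learned on the training set, then score the test set."""
    (x_train, y_train), (x_test, y_test) = train, test
    n_features = len(x_train[0])
    cuts = [quantile_cut_points([r[j] for r in x_train], n_states)
            for j in range(n_features)]

    def encode(row):
        return [discretize(v, cuts[j]) for j, v in enumerate(row)]

    model = fit_naive_bayes([encode(r) for r in x_train], y_train, n_states)
    hits = sum(predict(model, encode(r)) == y for r, y in zip(x_test, y_test))
    return hits / len(x_test)


def make_synthetic(n, rng):
    """Hypothetical data: two Gaussian features whose means depend on the class."""
    rows, labels = [], []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        shift = float(label)
        rows.append([rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0)])
        labels.append(label)
    return rows, labels


rng = random.Random(0)
train = make_synthetic(400, rng)
test = make_synthetic(200, rng)
results = {k: accuracy_for_states(train, test, k) for k in (2, 3, 4, 5)}
for k, acc in results.items():
    print(f"{k} states: accuracy {acc:.3f}")
```

On synthetic data the accuracies need not follow the pattern reported in the abstract. The sketch only shows how the number of states enters the pipeline: it fixes both the cut points and the denominator of each smoothed conditional probability.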

Details

ISSN :
19922248
Volume :
8
Database :
OpenAIRE
Journal :
Scientific Research and Essays
Accession number :
edsair.doi...........26df53268ac3e5a17b41580a2c52c4f5
Full Text :
https://doi.org/10.5897/sre09.174