Weighted naïve Bayes text classification algorithm based on improved distance correlation coefficient

Authors :
Shufen Ruan
Kunfang Song
Baozhou Chen
Hongwei Li
Source :
Neural Computing and Applications. 34:2729-2738
Publication Year :
2021
Publisher :
Springer Science and Business Media LLC, 2021.

Abstract

This paper proposes a method that improves attribute weighting for naive Bayes text classifiers by using an improved distance correlation coefficient. The resulting model is called improved distance correlation coefficient attribute weighted multinomial naive Bayes, denoted IDCWMNB. Unlike traditional statistical correlation measurements, which consider the cumulative distribution function of random vectors, the improved distance correlation coefficient tests the joint correlation of random vectors by describing the distance between the joint characteristic function and the product of the marginal characteristic functions. Specifically, a measurement of inverse document frequency is proposed that takes into account how concentrated or scattered a term's documents are. This measurement is then combined with the distance correlation coefficient between attributes and categories to measure the importance of attributes to categories, so that different terms receive different weights. The learned attribute weights are also incorporated into the posterior probability estimates of the multinomial naive Bayes model, an approach known as deep attribute weighting. This measurement is more effective than traditional statistical measurements when the relationship between two random vectors is nonlinear. Experimental results on benchmark and real-world data indicate that the new attribute weighting method achieves an effective balance between classification accuracy and execution time.
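
As a rough illustration of the idea of weighting terms by their distance correlation with a category, the following Python sketch computes the classical sample distance correlation between a term's per-document counts and a binary class indicator. This is an assumption-laden sketch, not the paper's method: the abstract does not give the formula for the improved coefficient or for the proposed inverse document frequency measurement, so neither is reproduced here, and the names `distance_correlation` and `term_weights` are hypothetical.

```python
import math


def _centered_distances(values):
    """Double-centered pairwise distance matrix of a 1-D sample."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Classical sample distance correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("samples must have equal length of at least 2")
    a = _centered_distances(x)
    b = _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar2_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar2_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def term_weights(doc_term_counts, labels, target_class):
    """Hypothetical helper: one weight per term for one category.

    Each weight is the distance correlation between the term's counts
    across documents and the 0/1 indicator of membership in target_class.
    """
    indicator = [1.0 if label == target_class else 0.0 for label in labels]
    n_terms = len(doc_term_counts[0])
    return [distance_correlation([doc[t] for doc in doc_term_counts], indicator)
            for t in range(n_terms)]
```

For example, `distance_correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])` is clearly positive even though the Pearson correlation of the same data is zero, which is the kind of nonlinear dependence the abstract refers to. In a weighted multinomial naive Bayes classifier, weights of this kind would typically scale each term's contribution to the class score; how IDCWMNB does this exactly is described in the full text.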

Details

ISSN :
1433-3058 and 0941-0643
Volume :
34
Database :
OpenAIRE
Journal :
Neural Computing and Applications
Accession number :
edsair.doi...........6853784958c5edc420aa752d6b81cf0f
Full Text :
https://doi.org/10.1007/s00521-021-05989-6