Weighted naïve Bayes text classification algorithm based on improved distance correlation coefficient
- Source :
- Neural Computing and Applications. 34:2729-2738
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- This paper proposes a method that improves attribute weighting for naive Bayes text classifiers by using an improved distance correlation coefficient. The resulting model is called improved distance correlation coefficient attribute weighted multinomial naive Bayes (IDCWMNB). Unlike traditional statistical correlation measures, which are based on the cumulative distribution functions of random vectors, the improved distance correlation coefficient tests the joint correlation of random vectors by measuring the distance between their joint characteristic function and the product of their marginal characteristic functions. Specifically, the paper proposes an inverse document frequency measure that incorporates distribution information on how documents are concentrated or scattered. This measure is then combined with the distance correlation coefficient between attributes and categories to quantify the importance of each attribute to the categories, so that different terms receive different weights. The learned attribute weights are also incorporated into the posterior probability estimates of the multinomial naive Bayes model, an approach known as deep attribute weighting. The proposed measure is more effective than traditional statistical measures when the relationship between two random vectors is nonlinear. Experimental results on benchmark and real-world datasets indicate that the new attribute weighting method can achieve an effective balance between classification accuracy and execution time.
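The weighting scheme summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard sample distance correlation of Székely, Rizzo and Bakirov in place of the paper's improved coefficient (which the abstract does not define), omits the proposed inverse document frequency measure, and scores each term only by its distance correlation with one-hot class labels. The names `distance_correlation`, `term_weights` and `DeepWeightedMNB`, the rescaling of weights to a mean of 1, and the Laplace smoothing are hypothetical choices. Deep attribute weighting is modeled here by applying the weights both when estimating the conditional probabilities and when scoring a document.

```python
import numpy as np


def distance_correlation(x, y) -> float:
    """Standard sample distance correlation (Szekely, Rizzo & Bakirov, 2007).

    Assumption: this is the classical statistic, not the paper's "improved"
    coefficient, whose definition is not given in the abstract.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)  # n x p
    y = np.asarray(y, dtype=float).reshape(len(y), -1)  # n x q
    # Pairwise Euclidean distance matrices within each sample.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    # Double-centre: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    B = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()  # squared sample distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())  # sqrt(dVar^2(X) * dVar^2(Y))
    if denom <= 0.0:
        return 0.0  # a constant sample carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def term_weights(X, y) -> np.ndarray:
    """Hypothetical weighting: dCor between each term column and one-hot labels."""
    classes = np.unique(y)
    onehot = (y[:, None] == classes[None, :]).astype(float)  # n x C
    w = np.array([distance_correlation(X[:, i], onehot) for i in range(X.shape[1])])
    # Assumed rescaling so the mean weight is 1 (keeps weighted counts on the raw scale).
    total = w.sum()
    return w * len(w) / total if total > 0 else np.ones_like(w)


class DeepWeightedMNB:
    """Multinomial NB where term weights enter both training and prediction."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.w_ = term_weights(X, y)
        WX = X * self.w_  # weighted term frequencies
        n, m = X.shape
        k = len(self.classes_)
        # Laplace-smoothed class priors.
        self.log_prior_ = np.log(
            np.array([(y == c).sum() + 1.0 for c in self.classes_]) / (n + k)
        )
        # Weighted, Laplace-smoothed conditional probabilities P(term | class).
        counts = np.array([WX[y == c].sum(axis=0) for c in self.classes_])  # k x m
        self.log_cond_ = np.log((counts + 1.0) / (counts.sum(axis=1, keepdims=True) + m))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Weights also scale each term's log-likelihood contribution when scoring.
        scores = self.log_prior_ + (X * self.w_) @ self.log_cond_.T
        return self.classes_[np.argmax(scores, axis=1)]


if __name__ == "__main__":
    # Toy term-frequency matrix: 4 documents x 3 terms, two classes.
    X = np.array([[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 0]])
    y = np.array(["a", "a", "b", "b"])
    model = DeepWeightedMNB().fit(X, y)
    print(model.w_)
    print(model.predict(np.array([[2, 0, 1], [0, 2, 1]])))  # expected: ['a' 'b']
```

In the toy data, the third term occurs equally often in both classes, so its distance correlation with the labels is zero and it receives essentially no weight, while the two class-specific terms share the remaining weight.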
- Subjects :
- 0209 industrial biotechnology
- Characteristic function (probability theory)
- business.industry
- Cumulative distribution function
- Posterior probability
- Pattern recognition
- 02 engineering and technology
- Weighting
- Correlation
- Distance correlation
- Naive Bayes classifier
- 020901 industrial engineering & automation
- Artificial Intelligence
- 0202 electrical engineering, electronic engineering, information engineering
- 020201 artificial intelligence & image processing
- Artificial intelligence
- tf–idf
- business
- Software
- Mathematics
Details
- ISSN :
- 1433-3058 and 0941-0643
- Volume :
- 34
- Database :
- OpenAIRE
- Journal :
- Neural Computing and Applications
- Accession number :
- edsair.doi...........6853784958c5edc420aa752d6b81cf0f
- Full Text :
- https://doi.org/10.1007/s00521-021-05989-6