A Framework for Titled Document Categorization with Modified Multinomial Naivebayes Classifier.

Authors :
Carbonell, Jaime G.
Siekmann, Jörg
Alhajj, Reda
Hong Gao
Xue Li
Jianzhong Li
Zaïane, Osmar R.
Hang Guo
Lizhu Zhou
Source :
Advanced Data Mining & Applications; 2007, p335-344, 10p
Publication Year :
2007

Abstract

Titled Documents (TDs) are short text documents that are segmented into two parts: Heading Part and Excerpt Part. With the development of the Internet, TDs are widely used as papers, news, messages, etc. In this paper we discuss the problem of automatic TD categorization. Unlike traditional text documents, TDs have short headings that contain fewer useless words than their excerpts. Though headings are usually short, their words are more important than other words. Based on this observation, we propose a titled document classification framework using the widely used multinomial naive Bayes (MNB) classifier. This framework puts higher weight on the heading words at the cost of some excerpt words. By this means, heading words play a more important role in classification than in the traditional method. According to our experiments on four datasets that cover three types of documents, the performance of the classifier is improved by our approach. [ABSTRACT FROM AUTHOR]
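
The idea of weighting heading words more heavily in an MNB classifier can be illustrated with a minimal sketch. The abstract does not give the paper's actual weighting scheme, so everything below is an assumption made for illustration: the class name HeadingWeightedMNB, the heading_weight parameter, the rule of multiplying heading token counts by a constant, the Laplace smoothing value alpha, and the toy training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def weighted_counts(heading, excerpt, heading_weight=3.0):
    """Merge token counts, scaling heading tokens by heading_weight.

    The constant multiplier is an assumed scheme, not the paper's.
    """
    counts = Counter()
    for tok in heading.lower().split():
        counts[tok] += heading_weight
    for tok in excerpt.lower().split():
        counts[tok] += 1.0
    return counts


class HeadingWeightedMNB:
    """Multinomial naive Bayes over heading-weighted counts (hypothetical sketch)."""

    def __init__(self, heading_weight=3.0, alpha=1.0):
        self.heading_weight = heading_weight
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for (heading, excerpt), label in zip(docs, labels):
            counts = weighted_counts(heading, excerpt, self.heading_weight)
            self.word_counts[label].update(counts)
            self.vocab.update(counts)
        self.class_totals = {
            c: sum(wc.values()) for c, wc in self.word_counts.items()
        }
        return self

    def predict(self, heading, excerpt):
        counts = weighted_counts(heading, excerpt, self.heading_weight)
        n_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for c in self.class_docs:
            # log prior + weighted sum of smoothed log likelihoods
            score = math.log(self.class_docs[c] / n_docs)
            denom = self.class_totals[c] + self.alpha * len(self.vocab)
            for tok, cnt in counts.items():
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                score += cnt * math.log(
                    (self.word_counts[c][tok] + self.alpha) / denom
                )
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy (heading, excerpt) pairs, invented for this example only.
docs = [
    ("football match report", "the team won the game yesterday"),
    ("stock market news", "shares fell as the market closed"),
]
labels = ["sports", "finance"]

plain = HeadingWeightedMNB(heading_weight=1.0).fit(docs, labels)
weighted = HeadingWeightedMNB(heading_weight=3.0).fit(docs, labels)
print(plain.predict("market", "team game"))
print(weighted.predict("market", "team game"))
```

With heading_weight set to 1.0 the sketch reduces to ordinary MNB over the concatenated text, so comparing the two fitted models on the same document shows how much the heading alone can shift a prediction.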

Details

Language :
English
ISBNs :
9783540738701
Database :
Complementary Index
Journal :
Advanced Data Mining & Applications
Publication Type :
Book
Accession number :
33088245
Full Text :
https://doi.org/10.1007/978-3-540-73871-8_31