Probabilistic Methods for Structured Document Classification at INEX'07.

Authors :
de Campos, Luis M.
Fernández-Luna, Juan M.
Huete, Juan F.
Romero, Alfonso E.
Source :
Focused Access to XML Documents; 2008, p195-206, 12p
Publication Year :
2008

Abstract

This paper presents the results of our participation in the Document Mining track at INEX'07. We focused on the task of classifying XML documents. Our approach to handling structured document representations applies plain-text classification methods to flattened versions of the documents, in which some of their structural properties have been translated into plain text. We explored several options for converting structured documents into flat documents, in combination with two probabilistic methods for text categorization. The main conclusion of our experiments is that taking advantage of document structure to improve classification results is a difficult task. [ABSTRACT FROM AUTHOR]
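
To illustrate what "flattening" a structured document might look like, the following is a minimal sketch of two possible conversions: one that discards the markup entirely, and one that encodes structure by prefixing each word with the tag of its enclosing element. The function names, the `tag_word` token format, and the sample document are assumptions made for illustration; the abstract does not specify which transformations the authors used.

```python
import xml.etree.ElementTree as ET


def flatten_text_only(xml_string):
    """Discard all markup and keep only the text, in document order."""
    root = ET.fromstring(xml_string)
    # itertext() yields every text fragment; re-split to normalise whitespace.
    return " ".join(" ".join(root.itertext()).split())


def _walk(elem, tokens):
    """Collect words, each prefixed with the tag of its enclosing element."""
    for word in (elem.text or "").split():
        tokens.append(f"{elem.tag}_{word}")
    for child in elem:
        _walk(child, tokens)
        # Text after a child's closing tag still belongs to the parent element.
        for word in (child.tail or "").split():
            tokens.append(f"{elem.tag}_{word}")


def flatten_with_tag_prefix(xml_string):
    """Hypothetical structure-aware flattening: 'tag_word' tokens."""
    tokens = []
    _walk(ET.fromstring(xml_string), tokens)
    return " ".join(tokens)


# Hypothetical sample document, not taken from the INEX collection.
doc = "<article><title>XML mining</title><body>Flat <b>text</b> works</body></article>"
print(flatten_text_only(doc))        # XML mining Flat text works
print(flatten_with_tag_prefix(doc))  # title_XML title_mining body_Flat b_text body_works
```

Either output is an ordinary string, so it could be passed unchanged to any plain-text classifier.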

Details

Language :
English
ISBNs :
9783540859017
Database :
Complementary Index
Journal :
Focused Access to XML Documents
Publication Type :
Book
Accession number :
76823715
Full Text :
https://doi.org/10.1007/978-3-540-85902-4_18