Probabilistic Methods for Structured Document Classification at INEX'07.
- Source :
- Focused Access to XML Documents; 2008, p195-206, 12p
- Publication Year :
- 2008
Abstract
- This paper presents the results of our participation in the Document Mining track at INEX'07. We focused on the task of classifying XML documents. Our approach to structured document representations applies plain-text classification methods to flattened versions of the documents, in which some of their structural properties have been translated into plain text. We explored several options for converting structured documents into flat documents, in combination with two probabilistic methods for text categorization. The main conclusion of our experiments is that taking advantage of document structure to improve classification results is a difficult task. [ABSTRACT FROM AUTHOR]
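- The idea of flattening a structured document can be illustrated with a minimal sketch. The abstract does not say which conversions the authors used, so the two options below (dropping the markup entirely, and prefixing each word with the tag of its enclosing element) are hypothetical examples, and the function names and the sample document are assumptions made for illustration only. Either output could then be passed to any plain-text classifier.

```python
import xml.etree.ElementTree as ET


def flatten_text_only(xml_string):
    """Hypothetical option 1: discard all markup, keep only the text."""
    root = ET.fromstring(xml_string)
    # itertext() yields every text fragment in document order;
    # split/join normalises the whitespace between fragments.
    return " ".join(" ".join(root.itertext()).split())


def _walk(elem, tokens):
    """Collect 'tag_word' tokens, attributing each word to its enclosing element."""
    for word in (elem.text or "").split():
        tokens.append(f"{elem.tag}_{word}")
    for child in elem:
        _walk(child, tokens)
        # Text after a child's closing tag still belongs to the parent element.
        for word in (child.tail or "").split():
            tokens.append(f"{elem.tag}_{word}")


def flatten_with_tag_prefix(xml_string):
    """Hypothetical option 2: encode structure by prefixing words with their tag."""
    tokens = []
    _walk(ET.fromstring(xml_string), tokens)
    return " ".join(tokens)


# Illustrative document, not taken from the INEX collection.
sample = "<article><title>XML mining</title><body>Naive <b>Bayes</b> works</body></article>"
plain = flatten_text_only(sample)
tagged = flatten_with_tag_prefix(sample)
```

  With the second option, the same word occurring in a title and in the body becomes two distinct features, which is one simple way of exposing structure to a classifier that only sees plain text.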
Details
- Language :
- English
- ISBNs :
- 9783540859017
- Database :
- Complementary Index
- Journal :
- Focused Access to XML Documents
- Publication Type :
- Book
- Accession number :
- 76823715
- Full Text :
- https://doi.org/10.1007/978-3-540-85902-4_18