An Overview of Similarity Measures for Clustering XML Documents
- Publication Year :
- 2007
- Publisher :
- IGI Publishing, 2007.
- Abstract :
- The large number and heterogeneity of XML documents on the Web call for clustering techniques that group similar documents together. Documents can be grouped according to their content, their structure, and the links within and among them. For instance, grouping documents with similar structures has applications in information extraction, heterogeneous data integration, personalized content delivery, access control definition, Web site structural analysis, and the comparison of RNA secondary structures. Many approaches have been proposed for evaluating structural and content similarity between tree-based and vector-based representations of XML documents, and link-based similarity approaches developed for Web data clustering have been adapted to XML documents. This chapter discusses and compares the most relevant similarity measures and their use in XML document clustering.
- Subjects :
- Document Structure Description
Information retrieval
computer.internet_protocol
Computer science
Efficient XML Interchange
computer.file_format
computer.software_genre
XML database
Similarity (network science)
Streaming XML
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Binary XML
Cluster analysis
computer
XML
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....7d6457f31195e26246a504fba84225c8