Dealing with structural patterns of XML documents

Authors :
Fabio Vitali
Angelo Di Iorio
Francesco Poggi
Silvio Peroni
Source :
Journal of the Association for Information Science and Technology. 65:1884-1900
Publication Year :
2014
Publisher :
Wiley, 2014.

Abstract

Evaluating collections of XML documents without paying attention to the schema they were written in may give interesting insights into the expected characteristics of a markup language, as well as into any regularities that span vocabularies and languages and that are more fundamental and frequent than plain content models. In this paper we explore the idea of structural patterns in XML vocabularies by examining the characteristics of elements as they are used, rather than as they are defined. We introduce from the ground up a formal theory of 8 plus 3 structural patterns for XML elements, and verify their identifiability in a number of different XML vocabularies. The results allowed the creation of visualization and content extraction tools that are completely independent of the schema and require no previous knowledge of the semantics and organization of the XML vocabulary of the documents.

Details

ISSN :
2330-1635
Volume :
65
Database :
OpenAIRE
Journal :
Journal of the Association for Information Science and Technology
Accession number :
edsair.doi...........ec1e9ea1c5e96ce8a1e1c0331ba7894d