Regular expression types for XML
- Authors
Haruo Hosoya; Benjamin C. Pierce; Jérôme Vouillon; Preuves, Programmes et Systèmes (PPS); Centre National de la Recherche Scientifique (CNRS)-Université Paris Diderot - Paris 7 (UPD7)
- Subjects
Document Structure Description; XML Encryption; Theoretical computer science; Computer science; Generalization; computer.internet_protocol; Efficient XML Interchange; XML Signature; Well-formed document; 0102 computer and information sciences; 02 engineering and technology; Document type definition; computer.software_genre; 01 natural sciences; Simple API for XML; XML Schema Editor; 0202 electrical engineering, electronic engineering, information engineering; Natural (music); RELAX NG; Tree automaton; XML schema; Regular expression; computer.programming_language; [INFO.INFO-PL]Computer Science [cs]/Programming Languages [cs.PL]; Programming language; XML validation; 020207 software engineering; computer.file_format; Computer Graphics and Computer-Aided Design; Subtyping; XML framework; XML Schema (W3C); Document Schema Definition Languages; 010201 computation theory & mathematics; Regular Language description for XML; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; computer; Software; XML; XML Catalog
- Abstract
We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*) and alternation (|) to describe XML documents. The novelty of our type system is a semantic presentation of subtyping, as inclusion between the sets of documents denoted by two types. We give several examples illustrating the usefulness of this form of subtyping in XML processing. The decision problem for the subtype relation reduces to the inclusion problem between tree automata, which is known to be EXPTIME-complete. To avoid this high complexity in typical cases, we develop a practical algorithm that, unlike classical algorithms based on determinization of tree automata, checks the inclusion relation by a top-down traversal of the original type expressions. The main advantage of this algorithm is that it can exploit the fact that the type expressions being compared often share portions of their representations. Our algorithm is a variant of Aiken and Murphy's set-inclusion constraint solver, to which we add several new implementation techniques; we also give correctness proofs and preliminary performance measurements on some small programs in the domain of typed XML processing.
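The idea of subtyping as set inclusion, decided by a top-down search over the original expressions rather than by determinizing first, can be illustrated in a much simplified setting with the minimal Python sketch below. It is an assumption-laden analogue, not the authors' algorithm. It flattens XML to sequences of element labels, so nested element content is ignored, and it swaps in Antimirov-style partial derivatives to decide inclusion between ordinary regular expressions. The search explores pairs made of one left-hand expression and a set of right-hand expressions, and a set of visited pairs lets it stop on repeated goals. All names (`Sym`, `Seq`, `Alt`, `Star`, `pd`, `included`) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# Hypothetical AST: regular expressions over element labels (tree structure flattened away).
@dataclass(frozen=True)
class Empty:   # the empty set of sequences
    pass

@dataclass(frozen=True)
class Eps:     # the empty sequence ()
    pass

@dataclass(frozen=True)
class Sym:     # one element; its label is treated as an opaque symbol
    label: str

@dataclass(frozen=True)
class Seq:     # concatenation r, s
    left: "Re"
    right: "Re"

@dataclass(frozen=True)
class Alt:     # alternation r | s
    left: "Re"
    right: "Re"

@dataclass(frozen=True)
class Star:    # repetition r*
    body: "Re"

Re = Union[Empty, Eps, Sym, Seq, Alt, Star]

def nullable(r: Re) -> bool:
    """True if the empty sequence belongs to L(r)."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    return False  # Empty, Sym

def seq(r: Re, s: Re) -> Re:
    """Concatenation with the simplification (), s = s."""
    return s if isinstance(r, Eps) else Seq(r, s)

def pd(a: str, r: Re) -> FrozenSet[Re]:
    """Antimirov partial derivatives of r with respect to label a."""
    if isinstance(r, Sym):
        return frozenset([Eps()]) if r.label == a else frozenset()
    if isinstance(r, Alt):
        return pd(a, r.left) | pd(a, r.right)
    if isinstance(r, Seq):
        front = frozenset(seq(d, r.right) for d in pd(a, r.left))
        return front | pd(a, r.right) if nullable(r.left) else front
    if isinstance(r, Star):
        return frozenset(seq(d, r) for d in pd(a, r.body))
    return frozenset()  # Empty, Eps

def labels(r: Re) -> FrozenSet[str]:
    """All labels occurring in r."""
    if isinstance(r, Sym):
        return frozenset([r.label])
    if isinstance(r, (Seq, Alt)):
        return labels(r.left) | labels(r.right)
    if isinstance(r, Star):
        return labels(r.body)
    return frozenset()

def included(r: Re, s: Re) -> bool:
    """Decide L(r) <= L(s). A pair (p, S) is the goal L(p) <= union of L(q) for q in S."""
    sigma = labels(r)  # sequences in L(r) only use labels that occur in r
    seen = set()
    todo = [(r, frozenset([s]))]
    while todo:
        p, S = todo.pop()
        if (p, S) in seen:
            continue  # goal already explored
        seen.add((p, S))
        if nullable(p) and not any(nullable(q) for q in S):
            return False  # p accepts () but no right-hand expression does
        for a in sigma:
            S_a = frozenset(d for q in S for d in pd(a, q))
            for p_a in pd(a, p):
                todo.append((p_a, S_a))
    return True

if __name__ == "__main__":
    a, b = Sym("a"), Sym("b")
    lhs = Seq(Star(Seq(a, b)), a)   # (a, b)*, a
    rhs = Seq(a, Star(Seq(b, a)))   # a, (b, a)*
    print(included(lhs, rhs), included(rhs, lhs))  # True True
    print(included(Star(a), Seq(a, Star(a))))      # False: () is in a* but not in a, a*
```

The first pair shows why a semantic presentation of subtyping is useful: `(a, b)*, a` and `a, (b, a)*` denote the same set of sequences, although a purely structural comparison of the two expressions would not relate them. The search terminates because partial derivatives of an expression form a finite set, but the number of right-hand sets it visits can still grow exponentially in the worst case.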
- Published
2000