59 results on '"Document type declaration"'
Search Results
2. Inference Document Type (Dtd) From Xml Document: Web Structure Mining
- Author
-
R. K. Chauhan, Nanhay Singh, and Raghuraj Singh
- Subjects
Document Structure Description ,XML Encryption ,Computer science ,computer.internet_protocol ,Relational database ,Efficient XML Interchange ,XML Signature ,Well-formed document ,XML Base ,Document type definition ,Simple API for XML ,XML Schema Editor ,Schema (psychology) ,Streaming XML ,XML schema ,Foreign key ,computer.programming_language ,XHTML ,Information retrieval ,Document type declaration ,InformationSystems_DATABASEMANAGEMENT ,XML validation ,computer.file_format ,XML framework ,XML Schema (W3C) ,Web mining ,Document Schema Definition Languages ,Data exchange ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Functional dependency ,computer ,XML ,XML Catalog - Abstract
XML is becoming a prevalent format and de facto standard for data exchange in many applications, while large amounts of data are traditionally stored and managed in relational databases. There is an urgent need for efficient methods to convert data stored in relational databases to XML when integrating and exchanging it. The semantics of XML schemas are crucial to designing, querying, and storing XML documents, and functional dependencies are very important representations of the semantic information of XML schemas. As DTDs are among the most frequently used schemas for XML documents, we use DTDs as the schemas of XML documents here. This paper studies the problem of schema conversion from relational schemas to XML DTDs. As functional dependencies play an important role in the schema conversion process, the concept of functional dependency for XML DTDs is used to preserve the semantics implied by functional dependencies and keys of relational schemas. A conversion method is proposed to convert relational schemas to XML DTDs in the presence of functional dependencies, keys and foreign keys. The methods presented here can preserve the semantics implied by functional dependencies, keys and foreign keys of relational schemas and can convert multiple relational tables to XML DTDs at the same time.
- Published
- 2010
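The abstract above does not give the conversion rules, so the following is only a minimal sketch of one common way to carry keys and foreign keys into a DTD: a primary key becomes an `ID` attribute and a foreign key becomes an `IDREF` attribute. The `Table` class, the `table_to_dtd` function, and the single-column-key restriction are assumptions made for illustration, not the paper's method.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical description of one relational table."""
    name: str
    columns: list[str]
    key: str  # assumption: a single-column primary key
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> referenced table


def table_to_dtd(table: Table) -> str:
    """Emit DTD declarations for one table.

    Non-key columns become #PCDATA child elements, the primary key becomes
    an ID attribute, and each foreign-key column becomes an IDREF attribute.
    """
    children = [c for c in table.columns
                if c != table.key and c not in table.foreign_keys]
    lines = []
    if children:
        lines.append(f"<!ELEMENT {table.name} ({', '.join(children)})>")
    else:
        lines.append(f"<!ELEMENT {table.name} EMPTY>")
    attrs = [f"{table.key} ID #REQUIRED"]
    attrs += [f"{col} IDREF #REQUIRED" for col in table.foreign_keys]
    lines.append(f"<!ATTLIST {table.name} {' '.join(attrs)}>")
    lines += [f"<!ELEMENT {c} (#PCDATA)>" for c in children]
    return "\n".join(lines)


dept = Table("department", ["dept_id", "dname"], key="dept_id")
emp = Table("employee", ["emp_id", "ename", "dept_id"], key="emp_id",
            foreign_keys={"dept_id": "department"})
print("\n".join(table_to_dtd(t) for t in (dept, emp)))
```

The sketch has known limits: an `IDREF` cannot name the element type it must point to, so the foreign-key target is only partly preserved, and column names must be unique across tables because a DTD may declare each element type only once.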
3. Generating XML structure using examples and constraints
- Author
-
Sara Cohen
- Subjects
Document Structure Description ,XML Encryption ,computer.internet_protocol ,Computer science ,Efficient XML Interchange ,XML Signature ,Well-formed document ,Document type definition ,Simple API for XML ,XML Schema Editor ,Streaming XML ,XML schema ,XPath ,computer.programming_language ,Information retrieval ,Document type declaration ,cXML ,General Engineering ,XML validation ,computer.file_format ,XML framework ,XML Schema (W3C) ,Document Schema Definition Languages ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,XML ,XML Catalog - Abstract
This paper presents a framework for automatically generating structural XML documents. The user provides a target DTD and an example of an XML document, called a Generate-XML-By-Example Document , or a GxBE document , for short. GxBE documents use a natural declarative syntax, which includes XPath expressions and the function count. Using GxBE documents, users can express important global and local characteristics for the desired target documents, and can require satisfaction of XPath expressions from a given workload. This paper explores the problem of efficiently generating a document that satisfies a given DTD and GxBE document.
- Published
- 2008
4. Logical structure analysis: From HTML to XML
- Author
-
Kyong-Ho Lee, Minhyung Lee, and Yeon-Seok Kim
- Subjects
Document Structure Description ,Information retrieval ,Computer science ,Document type declaration ,Well-formed document ,XML validation ,Document type definition ,Simple API for XML ,Hardware and Architecture ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Law ,Software ,Document layout analysis ,XML Catalog - Abstract
This paper presents an efficient method for extracting a logical structure from a Web document. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a logical structure more accurately, the proposed method defines a document model that is able to describe logical structure information of a specific document class. Since the proposed method is based on a visual structure from the visual grouping phase as well as a document model that describes logical structure information of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method has performed logical structure analysis successfully, compared with previous work. Particularly, the method generates XML documents as the result of structure analysis, so that it enhances the reusability of documents.
- Published
- 2007
5. Validating Scripted Web-Pages
- Author
-
Roger G. Stone
- Subjects
Document Structure Description ,XHTML ,General Computer Science ,Computer science ,computer.internet_protocol ,Programming language ,Document type declaration ,XML validation ,Well-formed document ,WML ,PHP ,computer.software_genre ,HTML element ,VALIDATION ,DTD ,Theoretical Computer Science ,XML framework ,Simple API for XML ,Web page ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,XML ,computer.programming_language ,Computer Science(all) - Abstract
The validation of XML documents against a DTD is well understood and tools exist to accomplish this task. But the problem considered here is the validation of a generator of XML documents. The desired outcome is to establish for a particular generator that it is incapable of producing invalid output. Many (X)HTML web pages are generated from a document containing embedded scripts written in languages such as PHP. Existing tools can validate any particular instance of the XHTML generated from the document. However, there is no tool for validating the document itself, guaranteeing that all instances that might be generated are valid. A prototype validating tool for scripted-documents has been developed which uses a notation developed to capture the generalised output from the document and a systematically augmented DTD.
- Published
- 2006
6. Representing Annotations in XML Document using String-Trees Model
- Author
-
Keng Hoon Gan
- Subjects
Document Structure Description ,Information retrieval ,Simple API for XML ,Computer science ,Document type declaration ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,XML validation ,Well-formed document ,Document type definition ,XML schema ,computer ,XML Catalog ,computer.programming_language - Abstract
The flexibility of XML allows documents to be annotated easily. However, these annotations come from different sources such as the WordNet thesaurus, POS, DTD, semantic roles, etc. These annotations can either be combined in the same document or captured separately in different documents. The former, though richer in annotations, may look messy and requires more parsing time. The latter needs control of document consistency. This paper proposes a string-trees model to represent XML documents for multiple sources of annotations. This model extends the existing string-tree structure for linguistic content in order to support the structured contents of XML documents. In this paper, we describe how this model is refined and applied to XML documents.
- Published
- 2014
7. A Design and Implementation of the Tree-based Document Editing System for XML Application
- Author
-
Young Chul Kim and Chun Kil Kang
- Subjects
Document Structure Description ,Database ,Document type declaration ,Programming language ,Computer science ,Well-formed document ,XML validation ,Document type definition ,computer.software_genre ,Simple API for XML ,Document Schema Definition Languages ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,XML Catalog - Abstract
This paper describes the design and implementation of a tree-based document editing system for XML applications, available in a structure-oriented environment. The system converts a DTD to an ASTD (Syntax Tree Definition) to support syntax-directed editing of valid documents, allows for extensibility so that new tools can be added, and supports a multiple-entry parser for real-time document validation. This paper is expected to contribute a development model for related XML document editing systems.
- Published
- 2004
8. Managing very large document collections using semantics
- Author
-
Yubin Bao, Ge Yu, Guoren Wang, and Hongjun Lu
- Subjects
Information retrieval ,Document type declaration ,Computer science ,Well-formed document ,Document management system ,Document clustering ,computer.software_genre ,Semantics ,Computer Science Applications ,Theoretical Computer Science ,Set (abstract data type) ,Computational Theory and Mathematics ,Hardware and Architecture ,Document Schema Definition Languages ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,Software ,Document layout analysis - Abstract
In this paper, a system is presented where documents are no longer identified by their file names. Instead, a document is represented by its semantics in terms of descriptor and content vector. The descriptor of a document consists of a set of attributes, such as date of creation, its type, its size, annotations, etc. The content vector of a document consists of a set of terms extracted from the document. In this paper, a semantic document management system XBASE is designed and implemented based on the semantics and the functions of three main modules, X-Loader, X-Explorer and X-Query.
- Published
- 2003
9. Clustering DTDs: An interactive two-level approach
- Author
-
Long Zhang, Weining Qian, Hailei Qian, Wen Jin, Yuqi Liang, and Aoying Zhou
- Subjects
Document Structure Description ,RuleML ,computer.internet_protocol ,Computer science ,Well-formed document ,Document type definition ,computer.software_genre ,Theoretical Computer Science ,RELAX NG ,SGML ,Cluster analysis ,computer.programming_language ,XHTML ,Document type declaration ,XML validation ,computer.file_format ,Computer Science Applications ,XML Schema (W3C) ,Computational Theory and Mathematics ,Categorization ,Hardware and Architecture ,Data exchange ,Document Definition Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Data mining ,computer ,Software ,XML ,PCDATA - Abstract
XML (eXtensible Markup Language) is a standard which is widely applied in data representation and data exchange. However, as an important concept of XML, DTD (Document Type Definition) is not taken full advantage of in current applications. In this paper, a new method for clustering DTDs is presented, and it can be used in XML document clustering. The two-level method clusters the elements in DTDs and clusters DTDs separately. Element clustering forms the first level and provides element clusters, which are the generalization of relevant elements. DTD clustering utilizes the generalized information and forms the second level in the whole clustering process. The two-level method has the following advantages: 1) It takes into consideration both the content and the structure within DTDs; 2) The generalized information about elements is more useful than the separated words in the vector model; 3) The two-level method facilitates the searching of outliers. The experiments show that this method is able to categorize the relevant DTDs effectively.
- Published
- 2002
10. Succession in standardization: grafting XML onto SGML
- Author
-
A. G. A. J. Loeffen and Tineke M. Egyedi
- Subjects
DocBook ,Standardization ,Document type declaration ,Computer science ,computer.internet_protocol ,business.industry ,Efficient XML Interchange ,XML validation ,Document type definition ,computer.file_format ,Ecological succession ,HTML ,World Wide Web ,XML Schema Editor ,Hardware and Architecture ,Compatibility (mechanics) ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,SGML ,Software engineering ,business ,Law ,computer ,Software ,XML ,computer.programming_language - Abstract
Succession in standardization is usually a problem. The advantages of improvements are weighed against those of compatibility. If compatibility considerations dominate, a grafting process takes place. This process need not lead to compatibility. According to our taxonomy of successor standards, there are three types of succession (outcomes). Type I, where grafting is achieved, entails compatibility between successors, technical paradigm-compliance, and continuity in the standards trajectory. In this paper, we examine issues of succession and focus on the Extensible Markup Language (XML). It was to be grafted on the Standard Generalized Markup Language (SGML), a stable standard since 1988. However, XML was a profile, a subset and an extension of SGML (1988). Adaptation of SGML was needed (SGML1999) to forge full (downward) compatibility with XML (1998). We describe the grafting efforts and analyze their outcomes. We conclude that XML largely fits the SGML paradigm. SGML was a technical exemplar for XML developers. In contrast, widespread use of HTML exemplified the desirability of simplicity in XML standardization. The latter issue and HTML's user market largely explain discontinuity in SGML-XML succession.
- Published
- 2002
11. WWW (World Wide Web) Communication and Publishing of Structural Formulas by XyMML (XyM Markup Language)
- Subjects
World Wide Web ,XHTML ,Markup language ,Computer science ,Document type declaration ,Document Definition Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Well-formed document ,Document type definition ,HTML ,computer ,computer.programming_language ,PCDATA - Abstract
A tool for displaying and communicating chemical structural formulas has been developed on the basis of XyMML (XyM Markup Language), where a XyMML document according to the XML (Extensible Markup Language) specification has been transformed into an HTML (HyperText Markup Language) document by means of a translator program due to XSLT (Extensible Stylesheet Language Transformations). During this process, XyMML data written in such a XyMML document have been converted into XyM notations embedded in such an HTML document, which is browsed by virtue of a World Wide Web (WWW) browser including the XyMJava system. Another tool for printing chemical structural formulas has been developed so that the same XyMML document has been transformed into a XyMTeX document by means of XSLT. The resulting XyMTeX document has been used to print a document containing structural formulas through the TeX/LaTeX typesetting system. Thereby, the XyMML and the related techniques have been shown to have the potentiality of serving as a kernel for integrating WWW communication, electronic publishing, and conventional publishing in chemistry.
- Published
- 2002
12. Interoperable Document Collaboration
- Author
-
Patrick Durusau, Svante Schubert, and Sebastian Rönnau
- Subjects
World Wide Web ,Multiple document interface ,Computer science ,Document type declaration ,Document Schema Definition Languages ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Well-formed document ,Document management system ,Document engineering ,computer.software_genre ,User requirements document ,computer ,Vision document - Abstract
To provide office applications with an easy interoperable document merge capability and to enable the usage of document revision across applications, it is necessary to not only standardize the representations of a document state, but also of the changes made to the document during the editing process. Tracking the changes during editing retains the information usually being recovered afterwards. This avoids costly and time-consuming processes like document comparison and diff heuristics [1]. To this day, file formats such as the OpenDocument file format (ODF) do only specify all possible document variations of a document representing the final state of user data. Interoperability is therefore only given on a document level: One ODF application saves a document and a different application is able to load and continue work on the same document state. Common scenarios of document exchange have been by floppy disc, attached to email and exchange across networks via file services such as Dropbox. Nowadays, the Internet is ubiquitous and multiple users want to work simultaneously on the same document. In that context the transfer of a whole document from user to user is inefficient. Additionally, finding and merging changes in XML-based documents appears to be complex and possibly error-prone [2]. For this reason, the OASIS Advanced Document Collaboration subcommittee has started to simplify collaboration by specifying the changes applicable to an ODF document and raising ODF application interoperability from a full document level to a more granular document change level. In this paper, we present an approach to ODF change representation called "Merge enabled Change-Tracking" (MCT), which is based on the Operational Transformation approach [3].
- Published
- 2014
13. Transforming paper documents into XML format with WISDOM++
- Author
-
Donato Malerba, O. Altamura, and Floriana Esposito
- Subjects
Document Structure Description ,Information retrieval ,Document type declaration ,Computer science ,Document classification ,Well-formed document ,XML validation ,computer.software_genre ,Computer Science Applications ,XML framework ,Simple API for XML ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Computer Vision and Pattern Recognition ,computer ,Software ,Document layout analysis - Abstract
The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. The application of an OCR to some parts of the document image is only one of the problems. In fact, the generation of documents in HTML format is easier when the layout structure of a page has been extracted by means of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents in the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps, namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps: document analysis, document classification, document understanding, text recognition with an OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation, the acquisition of block classification rules using techniques from machine learning, the layout analysis based on general layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of the system components implementing these innovative aspects is reported.
- Published
- 2001
14. Securing XML documents with Author-X
- Author
-
Elisa Bertino, Elena Ferrari, and Silvana Castano
- Subjects
Document Structure Description ,Database ,Computer Networks and Communications ,Computer science ,computer.internet_protocol ,Document type declaration ,XML validation ,Well-formed document ,Document type definition ,computer.software_genre ,User requirements document ,World Wide Web ,Simple API for XML ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,XML - Abstract
Author-X is a Java-based system that addresses the security issues of access control and policy design for XML document administration. Author-X supports the specification of policies at varying granularity levels and the specification of user credentials as a way to enforce access control. Access control is available according to both push and pull document distribution policies, and document updates are distributed through a combination of hash functions and digital signature techniques. The Author-X approach to distributed updates allows a user to verify a document's integrity without contacting the document server.
- Published
- 2001
15. Mapping the XML data model into the object model of the SYNTHESIS language
- Author
-
Leonid A. Kalinichenko, O. L. Machul'sky, and M. A. Osipov
- Subjects
Document Structure Description ,Programming language ,Document type declaration ,Computer science ,computer.internet_protocol ,XML validation ,Well-formed document ,Document type definition ,computer.software_genre ,Simple API for XML ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,XML schema ,computer ,Software ,XML ,computer.programming_language - Abstract
In this paper, a mapping of the XML document structure into the canonical data model is studied [5]. The XML document structure is specified by the Document Type Definition (DTD). DTD serves as a basis for a specification in the SYNTHESIS language; each DTD element declaration is mapped into some data type of SYNTHESIS.
- Published
- 2000
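The abstract above states that each DTD element declaration is mapped to a data type but does not show the mapping. The sketch below is a rough illustration of that idea in Python rather than in the SYNTHESIS language: it reads `<!ELEMENT ...>` declarations with a regular expression and turns each into a list of typed fields. The function name `element_types` and the field kinds (`one`, `optional`, `list`, `nonempty list`) are hypothetical, and occurrence indicators on parenthesised groups are ignored.

```python
import re

# Matches declarations such as: <!ELEMENT book (title, author+, year?)>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+(.*?)\s*>", re.DOTALL)

# How a child's occurrence indicator is read as a field kind (illustrative names).
KINDS = {"": "one", "?": "optional", "*": "list", "+": "nonempty list"}


def element_types(dtd: str) -> dict[str, list[tuple[str, str]]]:
    """Map each declared element to a list of (field name, field kind) pairs."""
    types = {}
    for name, model in ELEMENT_DECL.findall(dtd):
        if "#PCDATA" in model:
            # Text and mixed content are simplified to a single string field.
            types[name] = [("text", "string")]
        elif model in ("EMPTY", "ANY"):
            types[name] = []
        else:
            types[name] = [(child, KINDS[occ])
                           for child, occ in re.findall(r"([\w.-]+)([?*+]?)", model)]
    return types


dtd = """
<!ELEMENT book (title, author+, year?)>
<!ELEMENT title (#PCDATA)>
"""
for element, fields in element_types(dtd).items():
    print(element, fields)
```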
16. Structured storage and retrieval of SGML documents using Grove
- Author
-
Hak-Gyoon Kim and Sung-Bae Cho
- Subjects
Document Structure Description ,Information retrieval ,Database ,Document type declaration ,Computer science ,Search engine indexing ,Document type definition ,computer.file_format ,Library and Information Sciences ,Management Science and Operations Research ,computer.software_genre ,Computer Science Applications ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Media Technology ,Information system ,Processing Instruction ,HyTime ,SGML ,computer ,Information Systems - Abstract
SGML, standardized in ISO 8879 [International Organization for Standardization (1986)], has proliferated because it can provide various styles and transform documents on different platforms. The SGML document has logical structure information in addition to the contents. As SGML documents are widely used, there is an increasing demand for a storage and retrieval system to use the logical structure of documents efficiently. However, traditional retrieval systems based on document indexes cannot exploit the logical structure appropriately. In this paper, we have developed a document storage and retrieval system based on structure information, where the SGML document is transformed into Grove, which is the document model for DSSSL and HyTime, and stored at an element level by an object-oriented DBMS, Object Store. It supports structured documents and provides a query interface to retrieve information contained in the structures.
- Published
- 2000
17. Standard Generalized Markup Language for self-defining structured reports
- Author
-
Charles E. Kahn
- Subjects
Information retrieval ,Medical Records Systems, Computerized ,Standardization ,Computer science ,Document type declaration ,Unified Medical Language System ,Information Storage and Retrieval ,Health Informatics ,SGML entity ,computer.file_format ,Document type definition ,Open standard ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Programming Languages ,SGML ,computer ,Information Systems ,PCDATA - Abstract
Structured reporting is the process of using standardized data elements and predetermined data-entry formats to record observations. The Standard Generalized Markup Language (SGML; International Standards Organization (ISO) 8879:1986)—an open, internationally accepted standard for document interchange—was used to encode medical observations acquired in an Internet-based structured reporting system. The resulting report is self-documenting: it includes a definition of its allowable data fields and values encoded as a report-specific SGML document type definition (DTD). The data-entry forms, DTD, and report document instances are based on report specifications written in a simple, SGML-based language designed for that purpose. Reporting concepts can be linked with those of external vocabularies such as the Unified Medical Language System (UMLS) Metathesaurus. The use of open standards such as SGML is an important step in the creation of open, universally comprehensible structured reports.
- Published
- 1999
18. The origin of (document) species
- Author
-
Adam Rifkin and Rohit Khare
- Subjects
XHTML ,RuleML ,HTML5 ,Markup language ,Information retrieval ,Document type declaration ,computer.internet_protocol ,Computer science ,Electronic document ,General Engineering ,Well-formed document ,computer.file_format ,Document type definition ,HTML ,World Wide Web ,Metadata ,Document Definition Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,SGML ,computer ,XML ,PCDATA ,computer.programming_language - Abstract
The World Wide Web's extraordinary reach is based in part on its open assimilation of document formats. Although Web transfer protocols and addressing can accommodate any kinds of resources, the unique application context of a truly global hypermedia system favours the adoption of certain Web-adapted formats. In this paper we consider the evolutionary record that has led to the ascent of the eXtensible Markup Language (XML). We present a taxonomy of document species in the Web according to their syntax, style, structure, and semantics. We observe the preferential adoption of SGML, CSS, HTML, and XML, respectively, which leverage a parsimonious evolutionary strategy favouring declarative encodings over Turing-complete languages; separable styles over inline formatting; declarative markup over presentational markup; and well-defined semantics over operational behavior. The paper concludes with an evolutionary walkthrough of citation formats. Ultimately, combined with the self-referential power of the Web to document itself, we believe XML can catalyze a critical shift of the Web from a global information space into a universal knowledge network.
- Published
- 1998
19. SGML and patent document processing. Part II: Experience in the EPO
- Author
-
Paul Brewin
- Subjects
Markup language ,Information retrieval ,Renewable Energy, Sustainability and the Environment ,Document type declaration ,Computer science ,Process Chemistry and Technology ,Energy Engineering and Power Technology ,Bioengineering ,computer.file_format ,Document type definition ,European patent office ,Library and Information Sciences ,Computer Science Applications ,World Wide Web ,Fuel Technology ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Processing Instruction ,SGML ,computer ,Patent document ,PCDATA - Abstract
In this article, the practical use of SGML (Standard Generalized Markup Language) in the EPO (European Patent Office) is described: it discusses the history of the SGML project and how SGML is used in the production of patent documents, databases and CD-ROMs.
- Published
- 1997
20. A formal language model for parsing SGML
- Author
-
R. W. Matzen, K. M. George, and G. E. Hedrick
- Subjects
Parsing ,Computer science ,Programming language ,Document type declaration ,business.industry ,Document type definition ,computer.file_format ,SGML entity ,computer.software_genre ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Hardware and Architecture ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Regular expression ,Language model ,Artificial intelligence ,SGML ,business ,computer ,Software ,Natural language processing ,Information Systems - Abstract
The Standard Generalized Markup Language (SGML) is an international standard for document definition (ISO 8879) that was adopted in 1986 and is rapidly gaining acceptance in industry and government. It is a meta-language system for document design rather than a specific scheme for document processing; almost any kind of document can be described using SGML. Productions called element declarations are used to define arbitrary elements of documents and the context in which they can occur. A finite set of element declarations called a document type definition (DTD) defines the high-level syntax of a set of documents. DTDs are similar to context-free grammars, but the productions are more complex. The standard does not describe a formal language model for SGML, and there is little work in the literature on this topic. This article defines a formal language model for SGML; systems of finite automata from systems of regular expressions. This model is applied in two ways: a parser is constructed for DTDs, and methods are shown for automatically constructing parsers for the documents defined by a DTD. These methods for parsing SGML are new, and they include features of DTDs that have not previously been included in a static language model. The model applies directly to the syntactic constructs of SGML, and thus, the methods shown in this article have distinct advantages for parsing SGML over traditional context-free parsing methods.
- Published
- 1997
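The abstract above treats the content model of an element declaration as a regular expression over child element names. A minimal sketch of that view follows, assuming element content only (no `#PCDATA`, no SGML `&` connector). It translates a content model into a Python `re` pattern and checks a sequence of child names against it. Python's `re` is a backtracking matcher, so this illustrates the regular-language reading of a content model and not the automaton construction the article describes; the function names are hypothetical.

```python
import re


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a content model such as '(title, author+, year?)' into a regex.

    Each child name is encoded as 'name;' so that a sequence of children
    becomes one string, e.g. ['title', 'author'] -> 'title;author;'.
    """
    tokens = re.findall(r"[\w.-]+|[(),|?*+]", model)
    parts = []
    for tok in tokens:
        if tok == "(":
            parts.append("(?:")          # grouping only, no capture
        elif tok == ",":
            continue                     # sequence is plain concatenation
        elif tok in ")|?*+":
            parts.append(tok)            # same meaning in both notations
        else:
            parts.append("(?:" + re.escape(tok) + ";)")
    return re.compile("".join(parts))


def children_match(pattern: re.Pattern, children: list[str]) -> bool:
    """Check whether a list of child element names satisfies the content model."""
    return pattern.fullmatch("".join(name + ";" for name in children)) is not None


book = content_model_to_regex("(title, author+, year?)")
print(children_match(book, ["title", "author", "author"]))
print(children_match(book, ["author", "title"]))
```

Because every encoded name ends in `;`, one element name can never be mistaken for a prefix of another.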
21. [Untitled]
- Author
-
Ron Sacks-Davis, Brian Lowe, and Justin Zobel
- Subjects
Structure (mathematical logic) ,Markup language ,Information retrieval ,Document type declaration ,business.industry ,Computer science ,Representation (systemics) ,computer.file_format ,Basis (universal algebra) ,computer.software_genre ,Query language ,Expression (mathematics) ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,General Earth and Planetary Sciences ,Artificial intelligence ,business ,SGML ,computer ,Natural language processing - Abstract
Most documents have a hierarchical structure, which can be made explicit by markup languages such as SGML. In this paper we propose a formal model for representation of hierarchically structured documents, to be used as the basis for document query languages. The model uses a redundant representation of the document elements to simplify the expression of common queries. As an illustration of the power of the model we show how queries might be expressed, both as set-theoretic expressions and in a simple algebra, and outline how queries might be evaluated in a practical system.
- Published
- 1997
22. A Practical Method for Compatibility Evaluation of Portable Document Formats
- Author
-
Dariusz Król and Michał Łopatka
- Subjects
Document Structure Description ,Information retrieval ,Database ,Document type declaration ,Computer science ,Well-formed document ,Document management system ,Document type definition ,computer.software_genre ,Simple API for XML ,Document Schema Definition Languages ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,Document layout analysis - Abstract
This paper presents a method for verification of PDF documents for compatibility with publication models provided by scientific publishers. We first consider the problem of converting a document from PDF to XML format. Subsequently, we present an analysis of the document's graphical layout which operates in two phases. The first phase develops a model using a semi-automatic process with limited user interaction. This is followed by comparing and matching of submitted documents. The experimental results demonstrate the degree of document compatibility with the model along with a report of errors and warning messages.
- Published
- 2013
23. SGML and patent document processing. Part I: WIPO Standard ST.32
- Author
-
Paul Brewin
- Subjects
Markup language ,Information retrieval ,Renewable Energy, Sustainability and the Environment ,Computer science ,Document type declaration ,Process Chemistry and Technology ,Energy Engineering and Power Technology ,Bioengineering ,computer.file_format ,European patent office ,Document type definition ,Library and Information Sciences ,Computer Science Applications ,World Wide Web ,Fuel Technology ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,SGML ,Electronic filing ,Patent document ,computer - Abstract
A description of SGML (Standard Generalised Markup Language) is given together with a detailed description of WIPO Standard ST. 32. The benefits of the use of SGML are highlighted — its system independence and flexibility in building publication systems and full-text databases. The use of SGML for patent document processing and how it might be beneficial for patent departments and representatives to use SGML in their own document systems, as well as for electronic filing of applications, is discussed. Reference is made to its use in the European Patent Office.
- Published
- 1996
24. HTML to the max: a manifesto for adding SGML intelligence to the World-Wide Web
- Author
-
C. M. Sperberg-Mcqueen and Robert F. Goldstein
- Subjects
Numeric character reference ,Markup language ,Style sheet ,Document type declaration ,Programming language ,Computer science ,General Engineering ,Document type definition ,computer.file_format ,SGML entity ,computer.software_genre ,World Wide Web ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Processing Instruction ,SGML ,computer - Abstract
HTML demonstrates that SGML markup is useful for networked information. How can it be made even more useful? One way is to extend the tag set from HTML to HTML2, etc. We argue here for a more radical approach: full SGML awareness in WWW. We believe the difficulties are small, the cost affordable, and the advantages overwhelming. SGML is a metalanguage for defining markup languages; HTML is just one instance of this infinite family. At present, documents in other SGML document types must be translated into HTML for display by a Mosaic client—sometimes this imposes unacceptable information loss. WWW browsers could handle other SGML document types without translation by launching a general-purpose SGML browser to view them, as they now launch graphics viewers; a better solution overall would be to build SGML display into the WWW browsers themselves. Either way, display of an SGML document would be controlled by a style sheet using a small number of display primitives (“bold”, “line break”, etc.) to specify the rendition of each element type. For “well-known” document type definitions (DTDs) like HTML, style sheets could be distributed with the browser, or built in. For other DTDs, the browser would fetch a style sheet from the server. Using style sheets, browser software can also make it easy to customize document display. DTDs and style sheets can be designed to accommodate extensions, ensuring that authors can make small extensions to the tag set with no change whatsoever in the target browsers and virtually no performance penalty.
- Published
- 1995
25. Transformation list for SGML application
- Author
-
Hong Gao
- Subjects
Numeric character reference ,Information retrieval ,Computer science ,Interface (Java) ,Programming language ,Document type declaration ,Document type definition ,SGML entity ,computer.file_format ,computer.software_genre ,Computer Science Applications ,Theoretical Computer Science ,Computational Theory and Mathematics ,Hardware and Architecture ,Application domain ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Processing Instruction ,SGML ,computer ,Software - Abstract
SGML (Standard Generalized Markup Language) is an ISO standard for document description (ISO 8879). The main idea in SGML is to specify a document both by its text and by its structure, without reference to a particular processing system. This kind of document description makes document interchange practical. However, very few SGML systems have a friendly interface and are portable across many applications. In this paper, various approaches to implementing SGML are assessed and the transformation list for SGML applications is introduced. This approach is not limited to specific application fields. It is suitable for any application domain and is friendly to users. Users can understand it without any training and can use it as easily as doing their routine work. It will accelerate the development of document interchange.
- Published
- 1995
26. The qwertz synthesis of SGML and LaTEX
- Author
-
Thomas F. Gordon
- Subjects
Unix ,Markup language ,Computer science ,Programming language ,Document type declaration ,Document type definition ,computer.file_format ,Document processing ,computer.software_genre ,troff ,Hardware and Architecture ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,SGML ,Law ,computer ,Software ,De facto standard - Abstract
Markup languages are used to identify and delimit the components of manuscripts. The principal application of these languages is to provide a means for authors to mark up their manuscripts with the information required by publishers for typesetting. LaTeX is a popular de facto standard markup language in some technical communities, such as academic computer science. SGML is an official ISO standard for defining markup languages. The qwertz document processing system is an SGML application we have developed for our own use, intended to combine the advantages of SGML and LaTeX. It consists of a model of LaTeX as an SGML Document Type Declaration (DTD) and Unix tools for translating SGML documents using this DTD into LaTeX, as well as troff. This article discusses our experiences in building and using the system.
- Published
- 1995
27. HTML
- Author
-
Mario Heiderich, Eduardo Alberto Vela Nava, Gareth Heyes, and David Lindsay
- Subjects
Markup language ,Computer science ,business.industry ,Document type declaration ,Scalable Vector Graphics ,Document type definition ,computer.file_format ,HTML ,JavaScript ,World Wide Web ,Web page ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Web application ,business ,computer ,computer.programming_language - Abstract
Publisher Summary This chapter discusses HTML (HyperText Markup Language), the markup language for structuring Web pages. Mastering HTML from a security point of view, in terms of both attack and defense, is complicated and requires almost encyclopedic knowledge. This chapter attempts to provide that knowledge. In addition to discussing the HTML family and its hidden gems for attackers and trapdoors for defenders, this chapter sheds some light on the differences between the different HTML standards and their actual implementations. The history and basic elements of HTML and markup languages are discussed to get a better understanding of how and where to obfuscate. Some ways to obfuscate markup include execution of JavaScript, the obfuscation of a URL, or even a DoS attack against the client rendering the markup. Markup and HTML are difficult to parse and secure, and the user agents make this task difficult by allowing crazy combinations of characters, attributes, and tags to execute JavaScript. HTML is usually part of an attack against Web applications; although it is called a “markup language,” it is very powerful and should be treated with respect.
- Published
- 2011
28. A novel XML-based document format with printing quality for web publishing
- Author
-
Yinyan Yu, Liangcai Gao, Zhi Tang, and Ruiheng Qiu
- Subjects
Document Structure Description ,Computer science ,computer.internet_protocol ,Document type declaration ,business.industry ,XML validation ,Well-formed document ,Document management system ,computer.software_genre ,World Wide Web ,Simple API for XML ,Publishing ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Single source publishing ,Document engineering ,business ,computer ,XML - Abstract
Although many XML-based document formats are available for printing or publishing on the Internet, none of them is well designed to support both high quality printing and web publishing. Therefore, we propose a novel XML-based document format for web publishing, called CEBX, in this paper. The proposed format is a fixed-layout document supporting high quality printing, which has optimized document content organization, physical structure and protection scheme to support web publishing. There are four noteworthy features of CEBX documents: (1) CEBX provides original fixed layout by graphic units for printing quality. (2) The content in a CEBX document can be reflowed to fit the display device based on the content blocks and additional fluid information. (3) XML Document Archiving model (XDA), the packaging model used in CEBX, supports document linearization and incremental editing well. (4) By introducing a segment-based content protection scheme into CEBX, some part of a document can be previewed directly while the remaining part is protected effectively, such that readers only need to purchase the partial content of a book that they are interested in. This will be very helpful to document distribution and support flexible business models such as try-before-buy, on-demand reading, superdistribution, etc.
- Published
- 2010
29. Document recognition
- Author
-
Nenad Marovac
- Subjects
Multiple document interface ,Information retrieval ,Document type declaration ,Computer science ,Programming language ,Well-formed document ,Document management system ,computer.software_genre ,User requirements document ,Document processing ,Document Schema Definition Languages ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,General Materials Science ,computer ,Document layout analysis - Abstract
Document recognition is a task in which a document in its physical presentation format is transformed into a structured author-oriented model of the document. The presentation format can be bitmaps of document pages, a description of the document in a Page Description Language (PDL), or encoding of the document in a printer or graphics language. The structured model is a format allowing for addition to the document, manipulation of the document, and reformatting the layout and the output appearance of the document. Fully automatic document recognition is not possible, in general, for the same reason that it is not possible to de-translate computer programs automatically. However, it is possible to develop a man-assisted semi-automatic document recognition method. This method uses two passes. The first pass is completely automatic; it produces a document format called Interactive Document Model. The Interactive Document Model comprises recognized typesetting and descriptive structures together with derived ODA logical and layout structures for the document. The model generated in the first pass is enough for most purposes and applications. However, if it is not acceptable, the user can then enter the second pass and interactively edit the logical structure. This paper has three objectives. The first is to formalize the concept of document recognition. The second is to subdivide the problem of document recognition and classify it into a number of subproblems, each dealing with different aspects of the problem. The third objective is to introduce a problem which we wish to solve, and then to present a High Level Document Recognition method and the experience in developing and using a number of implementations of the method.
- Published
- 1992
30. Organizational Hypermedia Document Management Through Metadata
- Author
-
Garp Choong Kim and Woojong Suh
- Subjects
Computer science ,Document type declaration ,Hypermedia ,Well-formed document ,Document management system ,Document type definition ,computer.software_genre ,law.invention ,Metadata ,World Wide Web ,law ,Document Definition Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Meta element ,computer - Abstract
Web business systems, the most popular application of hypermedia, typically include a lot of hypermedia documents (hyperdocuments), which are also called Web pages. These systems have been conceived as an essential instrument in obtaining various beneficial opportunities for CRM (customer relationship management), SCM (supply chain management), e-banking or e-stock trading, and so forth (Turban et al., 2004). Most companies have made a continuous effort to build such systems. As a result, today the hyperdocuments in organizations are growing explosively. The hyperdocuments employed for business tasks in Web business systems may be referred to as organizational hyperdocuments (OHDs). The OHDs typically play a critical role in business, including the forms of invoices, checks, orders, and so forth. The organization’s ability to adapt the OHDs rapidly to ever-changing business requirements may impact business performance. However, the maintenance of the continuously increasing OHDs is becoming a burdensome task for many organizations; managing them is as important to economic success as is software maintenance (Brereton et al., 1998). One approach to the challenge of managing OHDs is to use metadata. Metadata are generally known as data about data (or information about information). Concerning this approach, this article first reviews the previous studies and discusses perspectives desirable for managing the OHDs and then provides metadata classification and elements. Finally, this article discusses future trends and draws a conclusion.
- Published
- 2009
31. The X Factor: From HTML to XHTML
- Author
-
N. Perlin
- Subjects
XHTML ,Computer science ,Document type declaration ,computer.file_format ,Character encodings in HTML ,HTML element ,World Wide Web ,XML framework ,Wireless Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Document Object Model ,computer ,computer.programming_language ,XForms - Abstract
Created by Tim Berners-Lee in 1989/1990, HTML was the heart of the World Wide Web. Today, HTML is dead, replaced by XHTML which is HTML reformulated as an instance of XML. As technical communication moves into an era in which Flare works in native XHTML, ePublisher Professional outputs XHTML, and so on, it's important to know what XHTML is in order to understand how it will affect your work and choice of tools. This paper summarizes XHTML. The conference presentation will go into more detail.
- Published
- 2006
32. Document Markup for the Web
- Author
-
Michael Kohlhase
- Subjects
World Wide Web ,XHTML ,Markup language ,Document type declaration ,Computer science ,Document Definition Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Well-formed document ,Document type definition ,HTML ,computer ,PCDATA ,computer.programming_language - Abstract
Document markup is the process of adding codes to a document to identify the structure of a document and to specify the format in which its fragments are to appear. We will discuss two conflicting aspects, structure and appearance, in document markup. As the Internet imposes special constraints on markup formats, we will reflect on its influence.
- Published
- 2006
33. Integrating Translation Services within a Structured Editor
- Author
-
Ali Choumane, Cécile Roisin, Hervé Blanchon, Communication Langagière et Interaction Personne-Système (CLIPS - IMAG), Université Joseph Fourier - Grenoble 1 (UJF)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS), Web, adaptation and multimedia (WAM), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), P. King, and Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Université Joseph Fourier - Grenoble 1 (UJF)
- Subjects
Machine translation ,Process (engineering) ,Computer science ,computer.internet_protocol ,Well-formed document ,02 engineering and technology ,010501 environmental sciences ,computer.software_genre ,01 natural sciences ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,World Wide Web ,Structured document ,0202 electrical engineering, electronic engineering, information engineering ,Dialog box ,0105 earth and related environmental sciences ,Information retrieval ,Document type declaration ,ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing/I.2.7.4: Machine translation ,ACM: I.: Computing Methodologies/I.7: DOCUMENT AND TEXT PROCESSING/I.7.2: Document Preparation/I.7.2.1: Format and notation ,[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing ,Document Schema Definition Languages ,ACM: I.: Computing Methodologies/I.7: DOCUMENT AND TEXT PROCESSING/I.7.2: Document Preparation/I.7.2.5: Markup languages ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,020201 artificial intelligence & image processing ,computer ,XML - Abstract
Fully automatic machine translation cannot produce high quality translation; Dialog-Based Machine Translation (DBMT) is the only way to provide authors with a means of translating documents into languages they have not mastered, or do not even know. With such an environment, the author must help the system to understand the document by means of an interactive disambiguation step. In this paper we study the consequences of integrating the DBMT services within a structured document editor (Amaya). The source document (named edited document) needs a companion document enriched with different data produced during the interactive translation process (question trees, answers of the author, translations). The edited document also needs to be enriched (annotated) in order to enable access to the question trees. The enriched edited document and the companion document have to be synchronized in case the edited document is further updated.
- Published
- 2005
34. Separating XHTML content from navigation clutter using DOM-structure block analysis
- Author
-
Mehmet A. Orgun, Constantine Mantratzis, and Steve Cassidy
- Subjects
XHTML ,Information retrieval ,Computer science ,Document type declaration ,Short paper ,Document clustering ,Hyperlink ,World Wide Web ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Clutter ,Clipping (computer graphics) ,Document Object Model ,computer ,computer.programming_language - Abstract
This short paper gives an overview of the principles behind an algorithm that separates the core-content of a web document from hyperlinked-clutter such as text advertisements and long links of syndicated references to other resources. Its advantage over other approaches is its ability to identify both loosely as well as tightly defined "table-like" or "list-like" structures of hyperlinks (from nested tables to simple, bullet-pointed lists) by operating at various levels within the DOM tree. The resulting data can then be used to extract the core-content from a web document for semantic analysis or other information retrieval purposes as well as to aid in the process of "clipping" a web document to its bare essentials for use with hardware-limited devices such as PDAs and cell phones.
- Published
- 2005
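The idea in the abstract above, flagging "list-like" or "table-like" DOM blocks that consist mostly of hyperlinks, can be illustrated with a simple link-density heuristic. This is a minimal sketch and not the authors' algorithm: the set of block tags, the `clutter_blocks` helper, and the 0.8 threshold are all assumptions made for illustration, using only Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Assumed set of block containers to score; the paper does not specify one.
BLOCK_TAGS = {"ul", "ol", "table", "p", "div"}


class LinkDensityParser(HTMLParser):
    """Record, per block element, total text length and linked text length."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open blocks as [tag, text_chars, link_chars]
        self.blocks = []     # closed blocks as (tag, text_chars, link_chars)
        self.link_depth = 0  # greater than 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            self.stack.append([tag, 0, 0])

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1
        elif tag in BLOCK_TAGS and self.stack and self.stack[-1][0] == tag:
            closed = self.stack.pop()
            self.blocks.append(tuple(closed))
            if self.stack:  # propagate counts to the enclosing block
                self.stack[-1][1] += closed[1]
                self.stack[-1][2] += closed[2]

    def handle_data(self, data):
        n = len(data.strip())
        if n and self.stack:
            self.stack[-1][1] += n
            if self.link_depth:
                self.stack[-1][2] += n


def clutter_blocks(html, threshold=0.8):
    """Return tags of blocks whose text is mostly link text (hypothetical threshold)."""
    parser = LinkDensityParser()
    parser.feed(html)
    parser.close()
    return [tag for tag, text, linked in parser.blocks
            if text and linked / text >= threshold]


sample = (
    "<div><ul><li><a href='#'>Home</a></li><li><a href='#'>News</a></li></ul>"
    "<p>This is the main article text with one <a href='#'>link</a>.</p></div>"
)
print(clutter_blocks(sample))  # the navigation list is flagged, the paragraph is not
```

Because counts propagate to enclosing blocks, the same ratio can be read at several levels of the tree, which loosely mirrors the abstract's point about operating at various levels within the DOM.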
35. Conversion of PDF documents into HTML: a case study of document image analysis
- Author
-
Hassan Alam and Fuad Rahman
- Subjects
HTML5 ,Information retrieval ,Computer science ,Document type declaration ,Well-formed document ,Document management system ,Document clustering ,HTML ,computer.software_genre ,World Wide Web ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,Document layout analysis ,computer.programming_language - Abstract
Portable document format (PDF) has become the de facto standard in many fields because of its independence of local formatting restrictions and its accurate reproducibility. On the other hand, HTML documents are becoming an integral form of our lives by being the dominant form for information exchange within the World Wide Web environment. This paper discusses how image-processing techniques can be used to perform document layout analysis of complex multiple-column PDF documents. This analysis allows the conversion of these documents into the HTML format keeping the logical and physical layout intact.
- Published
- 2004
36. Document transformation system from papers to XML data based on pivot XML document method
- Author
-
Y. Ishitani
- Subjects
Document Structure Description ,XML Encryption ,computer.internet_protocol ,Computer science ,Efficient XML Interchange ,XML Signature ,Well-formed document ,XSLT ,Document type definition ,computer.software_genre ,Simple API for XML ,XML Schema Editor ,Streaming XML ,XML namespace ,XML schema ,computer.programming_language ,XHTML ,Information retrieval ,Document type declaration ,XML validation ,computer.file_format ,XML framework ,XML database ,XML Schema (W3C) ,Document Schema Definition Languages ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Document Object Model ,computer ,XML ,XML Catalog - Abstract
This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transformation strategy based on a pivot XML document. Firstly, document elements such as title, authors, abstract, headings, paragraphs, lists, captions, tables and figures are extracted from document images. Secondly, the hierarchical structure of document elements is extracted and is described using a DOM tree. Thirdly, this document structure is converted into a pivot XML document described as an XHTML document by an XML parser. Finally, this pivot XML document is transformed into the target XML document by the XML parser with XSLT scripts or specific programs. Experimental results show the method is effective in transforming printed documents to various XML documents.
- Published
- 2004
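The pivot strategy described in the abstract above (extracted elements, then a pivot XHTML document, then a target XML document) can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: the element kinds, both tag mappings, and the target vocabulary are hypothetical, the XHTML namespace is omitted for brevity, and a plain Python function stands in for the XSLT step because the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Assumed input: (kind, text) pairs such as OCR layout analysis might yield.
extracted = [
    ("title", "A Sample Paper"),
    ("heading", "Introduction"),
    ("paragraph", "First paragraph."),
]

# Hypothetical mapping from element kinds to pivot (XHTML-like) tags.
PIVOT_TAGS = {"title": "h1", "heading": "h2", "paragraph": "p"}

# Hypothetical mapping from pivot tags to a target XML vocabulary.
TARGET_TAGS = {"h1": "article-title", "h2": "section-title", "p": "para"}


def build_pivot(elements):
    """Build the pivot document tree from the extracted elements."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for kind, text in elements:
        ET.SubElement(body, PIVOT_TAGS[kind]).text = text
    return html


def pivot_to_target(pivot):
    """Rename pivot elements into the target vocabulary (stand-in for XSLT)."""
    article = ET.Element("article")
    for node in pivot.find("body"):
        ET.SubElement(article, TARGET_TAGS[node.tag]).text = node.text
    return article


pivot = build_pivot(extracted)
target = pivot_to_target(pivot)
print(ET.tostring(target, encoding="unicode"))
```

The design point is that only `pivot_to_target` changes per output format; supporting another target vocabulary means swapping the second mapping while the extraction and pivot steps stay the same.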
37. A Correspondence between UML Diagrams and SGML/XML DTDs
- Author
-
Anne Eerola and Eila Kuikka
- Subjects
Document Structure Description ,Markup language ,RuleML ,computer.internet_protocol ,Computer science ,Well-formed document ,SGML entity ,Document type definition ,computer.software_genre ,Unified Modeling Language ,SGML ,Object Constraint Language ,computer.programming_language ,XHTML ,Document type declaration ,Programming language ,XML validation ,computer.file_format ,Geography Markup Language ,XML Schema (W3C) ,Extensible markup ,Document Definition Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,XML ,XML Catalog ,Collaborative Application Markup Language ,PCDATA - Abstract
In this paper, we compare the semantics and structure of the conceptual information presented in the Unified Modeling Language (UML), which is used in analyzing object-oriented systems, and Document Type Definitions (DTD), which define the structures of SGML and XML documents. SGML (Standard Generalized Markup Language) and XML (Extensible Markup Language) are international standards for specifying the notations used for defining structured documents. We present correspondence rules for generating DTDs semiautomatically from UML diagrams. The rules have been developed as a part of the analysis and design method to create the structure definition for a document. As an example, we use a patient record.
- Published
- 2004
38. Automatic generation algorithm of uniform DTD for structured documents
- Author
-
Chun-Sik Yoo, Seon-Mi Woo, and Yong-Sung Kim
- Subjects
Structure (mathematical logic) ,Intranet ,Information retrieval ,Finite-state machine ,Computer science ,Document type declaration ,computer.file_format ,Document type definition ,Tree structure ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Processing Instruction ,SGML ,computer ,Algorithm - Abstract
SGML is the international standard for digital documents to be used in fields like intranet, CALS/EC, and so on. On the other hand, there is a notable problem that in spite of having a similar structure and being conceptually the same kind of document, many SGML documents have different DTDs and are stored in different databases. We propose an algorithm that automatically unifies DTDs of these SGML documents using a tree structure and finite automata. Constructing the SGML document database to apply the proposed algorithm reduces the number of database accesses and increases the efficiency of information retrieval. It provides a more effective management and operation environment for SGML document databases.
- Published
- 2003
39. Sharing SGML/XML document information conformable to business standards
- Author
-
K. Suzuki, M. Imamura, H. Tsuji, and O. Moriguchi
- Subjects
Markup language ,Database ,computer.internet_protocol ,Document type declaration ,Computer science ,Well-formed document ,Document type definition ,computer.file_format ,computer.software_genre ,World Wide Web ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Processing Instruction ,Information system ,SGML ,computer ,XML - Abstract
In CALS or EC, it is important to share product information conformable to business standards. This paper proposes a conformance-testing method for authorizing SGML/XML documents to follow business standards about document content constraints, such as standard vocabularies, datatypes, ranges of numerical values and units of measures. This method can promote reliable document sharing in information systems by circulating documents which passed the conformance-test.
- Published
- 2003
40. XML and XHTML
- Author
-
John Cowell
- Subjects
XHTML ,Markup language ,Document type declaration ,Computer science ,Document type definition ,computer.file_format ,XML framework ,World Wide Web ,Wireless Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,SGML ,computer ,computer.programming_language ,XForms - Abstract
The most common use of HTML is to create pages on the World Wide Web, although it can be used for developing local applications. It is a language which is used to describe the layout of documents and also allows you to do a variety of other jobs such as link to other documents, describe forms and produce emails. Since we can already do this with HTML why do we need XHTML? XML is another hot topic, but what is the relationship between XML and XHTML? Other markup languages such as SGML are also often talked about at the same time as XHTML, what are they? How does a scripting language such as JavaScript relate to XHTML?
- Published
- 2003
41. Once Upon a Time a DTD Evolved into Another DTD
- Author
-
Fatmé El-Moukaddem and Lina Al-Jadir
- Subjects
Document Structure Description ,computer.internet_protocol ,Computer science ,Schematron ,Efficient XML Interchange ,Well-formed document ,Document type definition ,External Data Representation ,computer.software_genre ,Simple API for XML ,XML Schema Editor ,Schema (psychology) ,Streaming XML ,RELAX NG ,XML schema ,computer.programming_language ,Information retrieval ,Document type declaration ,cXML ,Database schema ,InformationSystems_DATABASEMANAGEMENT ,XML validation ,computer.file_format ,XML framework ,XML Schema (W3C) ,Document Schema Definition Languages ,Data exchange ,Document Definition Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Data mining ,computer ,XML ,XML Catalog - Abstract
XML has become an emerging standard for data representation and data exchange over the web. In many applications a schema is associated with an XML document to specify and enforce the structure of the document. The schema may change over time to reflect a change in the real-world, a change in the user’s requirements, mistakes or missing information in the initial design. In this paper, we consider DTDs as XML schema mechanism, and present an approach to manage DTD evolution. We build a set of DTD changes. We identify invariants which must be preserved across DTD changes. We define the semantics of each DTD change such that the new DTD is valid, existing documents conform to the new DTD, and data is not lost if possible. We illustrate our approach with a scenario.
- Published
- 2003
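One invariant named in the abstract above is that existing documents still conform after a DTD change. A minimal sketch of checking this for a single element's content model is given below. It is not the authors' approach: `model_to_regex` and `conforms` are hypothetical helpers, and the translation only covers element names with `,`, `|`, `?`, `*`, `+` and parentheses (no `#PCDATA`, `EMPTY`, or `ANY`).

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model):
    """Translate a simple DTD content model into a regex over child names.

    Each element name becomes a group matching "name;", so a child sequence
    can be tested as one string such as "title;author;author;".
    """
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", model)
    # Sequence commas and whitespace carry no meaning once names are grouped.
    return re.sub(r"[,\s]", "", pattern)


def conforms(children, model):
    """Check whether a list of child element names matches the content model."""
    sequence = "".join(name + ";" for name in children)
    return re.fullmatch(model_to_regex(model), sequence) is not None


existing_doc = ["title", "author", "author"]

old_model = "(title, author+)"
safe_change = "(title, author+, year?)"   # adds an optional element
breaking_change = "(title, author+, year)"  # adds a required element

print(conforms(existing_doc, old_model))
print(conforms(existing_doc, safe_change))
print(conforms(existing_doc, breaking_change))
```

Adding an optional child keeps the existing document valid, whereas adding a required child does not; a change of the second kind would have to be paired with a document update to preserve the invariant.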
42. SGML nets: integrating document and workflow modeling
- Author
-
W. Weitz
- Subjects
Document type declaration ,Programming language ,Computer science ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Processing Instruction ,Well-formed document ,Document type definition ,computer.file_format ,SGML entity ,Petri net ,SGML ,computer.software_genre ,computer - Abstract
We introduce so-called SGML nets as a new formalism for an integrated modeling of document structures as well as document manipulation processes. SGML nets are a variant of high level Petri nets where each place (passive element "document store") is typed using an SGML document type definition (DTD). Each place may be marked with a set of DTD conforming document instances. Each transition (active element) specifies a class of operations on these document stores. Edges in SGML nets are inscribed with document templates. The incoming arcs of a transition select a set of instances to be read from the input places, while outgoing arcs define insertions into output places. The definition of the occurrence rule ensures DTD conformance of the document instances in all places of the net at every moment.
- Published
- 2002
43. Object databases for SGML document management
- Author
-
Byung Suk Lee and M.R. Olson
- Subjects
Information retrieval ,Database ,Computer science ,Document type declaration ,Relational database ,SGML entity ,computer.file_format ,Document type definition ,Document management system ,computer.software_genre ,Object (computer science) ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Processing Instruction ,SGML ,computer - Abstract
We have investigated the use of an object database as a platform for storing and retrieving Standard Generalized Markup Language (SGML) documents. Qualitative studies convinced us that object databases are a perfect fit for supporting SGML document management. Unfortunately, quantitative benchmark results showed that the particular object database management system (ODBMS) product we used was not capable of supporting large scale SGML applications due to certain defects in its system architecture. The most critical defect was a weak support for location-independent persistent object identifiers. We strongly believe however, ODBMSs in general are perfect platforms and continue the experiment using another ODBMS product. We explain why and how an ODBMS fits well with SGML document management applications, describe how the benchmark experiment was performed and what were the results, and finally present a list of features as a recommendation to those interested in developing or using an ODBMS in support for SGML document management.
- Published
- 2002
44. Structured document framework for design patterns based on SGML
- Author
-
M. Ohtsuki, N. Yoshida, A. Makinouchi, and J. Segawa
- Subjects
Markup language ,Information retrieval ,Document type declaration ,Programming language ,Computer science ,Design pattern ,computer.file_format ,Document type definition ,Connascence ,computer.software_genre ,Structured document ,Software design pattern ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,SGML ,computer - Abstract
Design patterns are abstract software components for system structures and functions used in OOA/OOD. They are currently described as texts (with figures), and are difficult to catalog, maintain and handle. The paper presents a framework to describe patterns as structured documents based on SGML (Standard Generalized Markup Language). A design pattern, in general, has three elements: texts, configuration charts and pseudo-codes; therefore SGML schemes are proposed for each of them so that they are integrated into a single structured document. The document also includes links to related pattern documents and corresponding class source codes. This SGML-based pattern document is visualized by automatic HTML conversion and automatic chart generation. The document is also used for interactive code generation.
- Published
- 2002
45. ODIL: an SGML description language of the layout structure of documents
- Author
-
P. Lefevre and F. Reynaud
- Subjects
Structure (mathematical logic) ,Information retrieval ,Document type declaration ,Computer science ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Processing Instruction ,Document type definition ,SGML entity ,computer.file_format ,Graphics ,SGML ,computer - Abstract
This paper describes a coding format in SGML for the output of a document recognition prototype. Our proposal is a DTD named "ODIL" (Office Document Image description Language) that describes precisely the layout structure of a document after all recognition phases, including OCR. All layout objects of a document are defined in the form of SGML elements, and their characteristics are defined by SGML attributes. The basic objects are blocks, containing homogeneous information. Five types of information are supported by the ODIL language: texts, photos, line graphics, tables, and mathematical formulas. The ODIL representation of the recognition results is well adapted to further logical structure recognition. Starting from the ODIL DTD and using the RAINBOW transit DTD will permit the use of SGML tools for the logical structure recognition, which is viewed as an SGML up-conversion problem.
- Published
- 2002
46. Architecture of a content management server for XML document applications
- Author
-
Ron Sacks-Davis, Alan J. Kent, N. Sharman, Timothy Arnold-Moore, and Michael Fuller
- Subjects
Database ,computer.internet_protocol ,Document type declaration ,Computer science ,XML validation ,computer.file_format ,Document type definition ,computer.software_genre ,XML framework ,Streaming XML ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,XML schema ,SGML ,computer ,XML ,computer.programming_language - Abstract
Describes the data model that is used to implement the SIM content management server (CMS), an SGML/XML-native content server that is designed to support extremely fast data access to and dynamic updating of 100-GByte collections under high loads. This paper describes the requirements for supporting text-intensive applications and for building XML/SGML document management solutions. The SIM CMS employs a data model that is designed to directly support SGML and XML; this model is described, and a comparison with other models based on general-purpose database management systems is made.
- Published
- 2002
47. Extending Java for High-Level Web Service Construction
- Author
-
Aske Simon Christensen, Michael I. Schwartzbach, and Anders Møller
- Subjects
medicine.medical_specialty ,XHTML ,Java ,Document type declaration ,computer.internet_protocol ,Programming language ,Computer science ,Suite ,Program structure ,Document type definition ,HTML ,computer.software_genre ,Session (web analytics) ,XML framework ,Java API for XML-based RPC ,medicine ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Web service ,Web modeling ,computer ,Software ,XML ,computer.programming_language - Abstract
We incorporate innovations from the project into the Java language to provide high-level features for Web service programming. The resulting language, JWIG, contains an advanced session model and a flexible mechanism for dynamic construction of XML documents, in particular XHTML. To support program development we provide a suite of program analyses that at compile-time verify for a given program that no run-time errors can occur while building documents or receiving form input, and that all documents being shown are valid according to the document type definition for XHTML 1.0. We compare JWIG with Servlets and JSP which are widely used Web service development platforms. Our implementation and evaluation of JWIG indicate that the language extensions can simplify the program structure and that the analyses are sufficiently fast and precise to be practically useful.
- Published
- 2002
48. Matching an XML Document against a Set of DTDs
- Author
-
Marco Mesiti, Giovanna Guerrini, and Elisa Bertino
- Subjects
Document Structure Description ,XML Encryption ,Computer science ,computer.internet_protocol ,Efficient XML Interchange ,XML Signature ,Well-formed document ,Document type definition ,Simple API for XML ,XML Schema Editor ,Streaming XML ,XML schema ,computer.programming_language ,Information retrieval ,Document type declaration ,cXML ,XML validation ,computer.file_format ,XML framework ,XML Schema (W3C) ,Document Schema Definition Languages ,Document Definition Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,XML ,XML Catalog - Abstract
Sources of XML documents are proliferating on the Web and documents are more and more frequently exchanged among sources. At the same time, there is an increasing need to exploit database tools to manage this kind of data. An important novelty of XML is that information on document structures is available on the Web together with the document contents. However, in such a heterogeneous environment as the Web, it is not reasonable to assume that XML documents that enter a source always conform to a predefined DTD in the source. In this paper we address the problem of document classification by proposing a metric for quantifying the structural similarity between an XML document and a DTD. Based on this notion, we propose an approach to match a document entering a source against the set of DTDs available in the source, determining whether a DTD exists that is similar enough to the document.
- Published
- 2002
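The matching idea in entry 48 can be illustrated with a deliberately simple stand-in. The sketch below is not the metric proposed by the authors, which the abstract does not specify. It assumes a much cruder score, the fraction of element names in the document that a DTD declares, and the names `declared_elements`, `tag_overlap`, `best_match` and the `threshold` default are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def declared_elements(dtd_text):
    """Collect element names from <!ELEMENT ...> declarations in a DTD string."""
    return set(re.findall(r"<!ELEMENT\s+([^\s>]+)", dtd_text))


def tag_overlap(xml_text, dtd_text):
    """Assumed toy score: share of the document's distinct tags declared in the DTD."""
    doc_tags = {el.tag for el in ET.fromstring(xml_text).iter()}
    return len(doc_tags & declared_elements(dtd_text)) / len(doc_tags)


def best_match(xml_text, dtds, threshold=0.5):
    """Return the name of the highest-scoring DTD, or None if none is similar enough."""
    scores = {name: tag_overlap(xml_text, text) for name, text in dtds.items()}
    name = max(scores, key=scores.get)
    return name if scores[name] >= threshold else None


dtds = {
    "book": "<!ELEMENT book (title, author)>"
            "<!ELEMENT title (#PCDATA)><!ELEMENT author (#PCDATA)>",
    "memo": "<!ELEMENT memo (to, body)>"
            "<!ELEMENT to (#PCDATA)><!ELEMENT body (#PCDATA)>",
}
doc = "<book><title>T</title><author>A</author><isbn>1</isbn></book>"
print(best_match(doc, dtds))
```

The document here is not valid against either DTD (it carries an undeclared `isbn` element), yet it still scores higher against the `book` DTD, which is the kind of tolerant classification the abstract describes. A realistic metric would also account for nesting and content models, which this sketch ignores.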
49. Document image representation using XML technologies
- Author
-
Kusuma Harnath Atmakuri and Essam A. El-Kwae
- Subjects
Document Structure Description ,Style sheet ,Computer science ,computer.internet_protocol ,XSL ,Electronic document ,Well-formed document ,Document type definition ,Document management system ,User requirements document ,computer.software_genre ,World Wide Web ,Simple API for XML ,Document engineering ,computer.programming_language ,XHTML ,Information retrieval ,Document type declaration ,XML validation ,XML Schema (W3C) ,Document Schema Definition Languages ,Document Definition Markup Language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,computer ,XML ,XML Catalog ,PCDATA - Abstract
Electronic documents have gained wide acceptance due to the ease of editing and sharing of information. However, paper documents are still widely used in many environments. Moving into a paperless and distributed office has become a major goal for document image research. A new approach for form document representation is presented. This approach allows for electronic document sharing over the World Wide Web (WWW) using Extensible Markup Language (XML) technologies. Each document is mapped into three different views: an XML view to represent the preprinted and filled-in data, an XSL (Extensible Stylesheet Language) view to represent the structure of the document, and a DTD (Document Type Definition) view to represent the document grammar and field constraints. The XML and XSL views are generated from a document template, either automatically using image processing techniques, or semi-automatically with minimal user interaction. The DTD representation may be fixed for general documents or may be generated semi-automatically by mining a number of filled-in document examples. Document templates need to be entered once to create the proposed representation. Afterwards, documents may be displayed, updated, or shared over the web. The merits of this approach are demonstrated using a number of examples of widely used forms. © (2001) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.
- Published
- 2001
50. JChemTidy: a tool for converting chemical Web document collections to an XHTML representation
- Author
-
Henry Rzepa, Philip R. Kenway, and Georgios V. Gkoutos
- Subjects
Web browser ,XHTML ,Information retrieval ,Computer science ,Document type declaration ,Search engine indexing ,Representation (systemics) ,General Chemistry ,computer.file_format ,HTML element ,Computer Science Applications ,World Wide Web ,Computational Theory and Mathematics ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,RDFa ,computer ,Web document ,Information Systems ,computer.programming_language - Abstract
A robot-based procedure is described for traversing a collection of hyperlinked documents written in HTML and converting these to the XML-compliant and well-formed XHTML representation. Transcluded chemical content invoked using embed or applet HTML calls is converted to the XHTML-recommended object form. Additional attributes such as title or derived chemical attributes such as a SMILES descriptor are added to improve the indexing of the resulting document collection. Conformance tests for the popular Web browsers are reported.
- Published
- 2001
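The embed-to-object conversion mentioned in entry 50 can be sketched in a few lines. This is only an illustration under stated assumptions, not JChemTidy's implementation: it assumes the input is already well-formed markup without namespaces, handles `embed` only (not `applet`), and maps `src` to `data`. The function name `embed_to_object` and the sample file name are hypothetical.

```python
import xml.etree.ElementTree as ET


def embed_to_object(xhtml):
    """Rename every <embed> to <object> and move its src attribute to data."""
    root = ET.fromstring(xhtml)
    # Materialize the matches first so renaming does not disturb iteration.
    for el in list(root.iter("embed")):
        src = el.attrib.pop("src", None)
        el.tag = "object"
        if src is not None:
            el.set("data", src)
    return ET.tostring(root, encoding="unicode")


fragment = '<p><embed src="aspirin.mol" type="chemical/x-mdl-molfile"/></p>'
print(embed_to_object(fragment))
```

Other attributes, such as the MIME type, are carried over unchanged. Real-world HTML would first need tidying into well-formed markup before a parser like this could read it.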