Descriptor: "Document type declaration" - Searchworks@Jio Institute Digital Library Search Results

1. HTML5 Documents: Top-Level Document Definition

Author: Wallace Jackson
Subjects: Information retrieval, HTML5, Document type declaration, Computer science, Order (business), Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Well-formed document
Abstract: All three tags need to be in your HTML5 document, in the proper order and used for the proper purposes.
Published: 2016

2. Inference Document Type (Dtd) From Xml Document: Web Structure Mining

Author: R. K. Chauhan, Nanhay Singh, and Raghuraj Singh
Subjects: Document Structure Description, XML Encryption, Computer science, computer.internet_protocol, Relational database, Efficient XML Interchange, XML Signature, Well-formed document, XML Base, Document type definition, Simple API for XML, XML Schema Editor, Schema (psychology), Streaming XML, XML schema, Foreign key, computer.programming_language, XHTML, Information retrieval, Document type declaration, InformationSystems_DATABASEMANAGEMENT, XML validation, computer.file_format, XML framework, XML Schema (W3C), Web mining, Document Schema Definition Languages, Data exchange, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Functional dependency, computer, XML, XML Catalog
Abstract: XML is becoming a prevalent format and de facto standard for data exchange in many applications. Traditionally, however, large amounts of data are stored and managed in relational databases. There is an urgent need for efficient methods to convert data stored in relational databases to XML format when integrating and exchanging these data. The semantics of XML schemas are crucial to designing, querying, and storing XML documents, and functional dependencies are very important representations of the semantic information of XML schemas. As DTDs are among the most frequently used schemas for XML documents, we use DTDs as the schemas of XML documents here. This paper studies the problem of schema conversion from relational schemas to XML DTDs. As functional dependencies play an important role in the schema conversion process, the concept of functional dependency for XML DTDs is used to preserve the semantics implied by functional dependencies and keys of relational schemas. A conversion method is proposed to convert relational schemas to XML DTDs in the presence of functional dependencies, keys and foreign keys. The methods presented here can preserve the semantics implied by functional dependencies, keys and foreign keys of relational schemas and can convert multiple relational tables to XML DTDs at the same time.
Published: 2010

3. Generating XML structure using examples and constraints

Author: Sara Cohen
Subjects: Document Structure Description, XML Encryption, computer.internet_protocol, Computer science, Efficient XML Interchange, XML Signature, Well-formed document, Document type definition, Simple API for XML, XML Schema Editor, Streaming XML, XML schema, XPath, computer.programming_language, Information retrieval, Document type declaration, cXML, General Engineering, XML validation, computer.file_format, XML framework, XML Schema (W3C), Document Schema Definition Languages, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, XML Catalog
Abstract: This paper presents a framework for automatically generating structural XML documents. The user provides a target DTD and an example of an XML document, called a Generate-XML-By-Example Document, or a GxBE document, for short. GxBE documents use a natural declarative syntax, which includes XPath expressions and the function count. Using GxBE documents, users can express important global and local characteristics for the desired target documents, and can require satisfaction of XPath expressions from a given workload. This paper explores the problem of efficiently generating a document that satisfies a given DTD and GxBE document.
Published: 2008

4. Logical structure analysis: From HTML to XML

Author: Kyong-Ho Lee, Minhyung Lee, and Yeon-Seok Kim
Subjects: Document Structure Description, Information retrieval, Computer science, Document type declaration, Well-formed document, XML validation, Document type definition, Simple API for XML, Hardware and Architecture, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Law, Software, Document layout analysis, XML Catalog
Abstract: This paper presents an efficient method for extracting a logical structure from a Web document. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a logical structure more accurately, the proposed method defines a document model that is able to describe logical structure information of a specific document class. Since the proposed method is based on a visual structure from the visual grouping phase as well as a document model that describes logical structure information of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method has performed logical structure analysis successfully, compared with previous work. Particularly, the method generates XML documents as the result of structure analysis, so that it enhances the reusability of documents.
Published: 2007

5. Validating Scripted Web-Pages

Author: Roger G. Stone
Subjects: Document Structure Description, XHTML, General Computer Science, Computer science, computer.internet_protocol, Programming language, Document type declaration, XML validation, Well-formed document, WML, PHP, computer.software_genre, HTML element, VALIDATION, DTD, Theoretical Computer Science, XML framework, Simple API for XML, Web page, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, computer.programming_language, Computer Science(all)
Abstract: The validation of XML documents against a DTD is well understood and tools exist to accomplish this task. But the problem considered here is the validation of a generator of XML documents. The desired outcome is to establish for a particular generator that it is incapable of producing invalid output. Many (X)HTML web pages are generated from a document containing embedded scripts written in languages such as PHP. Existing tools can validate any particular instance of the XHTML generated from the document. However, there is no tool for validating the document itself, guaranteeing that all instances that might be generated are valid. A prototype validating tool for scripted-documents has been developed which uses a notation developed to capture the generalised output from the document and a systematically augmented DTD.
Published: 2006

6. On Finding an Edit Script between an XML Document and a DTD

Author: Nobutaka Suzuki
Subjects: Document Structure Description, Information retrieval, Simple API for XML, Computer science, XML Schema Editor, Document type declaration, Well-formed document, XML validation, XML schema, Document type definition, computer, computer.programming_language
Published: 2006

7. Representing Annotations in XML Document using String-Trees Model

Author: Keng Hoon Gan
Subjects: Document Structure Description, Information retrieval, Simple API for XML, Computer science, Document type declaration, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML validation, Well-formed document, Document type definition, XML schema, computer, XML Catalog, computer.programming_language
Abstract: The flexibility of XML allows documents to be annotated easily. However, these annotations come from different sources such as the WordNet thesaurus, POS, DTD, semantic roles, etc. These annotations can either be combined in the same document or captured separately in different documents. The former, though richer in annotations, may look messy and requires more parsing time. The latter requires control of document consistency. This paper proposes a string-trees model to represent XML documents with multiple sources of annotations. This model extends the existing string-tree structure for linguistic content in order to support the structured content of XML documents. In this paper, we describe how this model is refined and applied to XML documents.
Published: 2014

8. A New Technique for Authenticating Content in Evolving Marked-up Documents

Author: Phillip Berrie
Subjects: Document Structure Description, Linguistics and Language, Authentication, Markup language, Document type declaration, business.industry, Computer science, Electronic document, Language and Linguistics, World Wide Web, Structured document, Software, Transcription (software), business, Information Systems
Abstract: Accuracy of transcription is vital when preparing a scholarly version of an existing document. This process has not changed with the advent of electronic editions. In fact, ensuring the continued accuracy of a transcription in the digital realm is more difficult because a file, unlike a piece of paper, does not retain information about its previous states and it is therefore possible that accidental changes can go undetected unless the content is continually checked against the original. This article presents a new, character-set-independent, programming algorithm that allows for the ongoing authentication of the textual content of files being marked up with SGML-like languages. The study also describes an implementation of this algorithm and how it can be used with existing software tools to provide a more efficient and trusted editing environment for creating and editing marked-up files. The Just In Time Authentication Mechanism (JITAM) algorithm was developed in response to the need for some form of automated authentication mechanism for projects already employing embedded markup and is seen as a preparatory step that editors can take with their projects before making the leap to the more versatile Just In Time Markup (JITM) system.
Published: 2005

9. A Design and Implementation of the Tree-based Document Editing System for XML Application

Author: Young Chul Kim and Chun Kil Kang
Subjects: Document Structure Description, Database, Document type declaration, Programming language, Computer science, Well-formed document, XML validation, Document type definition, computer.software_genre, Simple API for XML, Document Schema Definition Languages, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML Catalog
Abstract: This paper describes the design and implementation of a tree-based document editing system for XML applications, available in a structure-oriented environment. The system converts a DTD to an ASTD (Syntax Tree Definition) to support syntax-directed editing of valid documents, is designed to be extensible with new tools, and supports a multiple-entry parser for real-time document validation. It is expected that this paper contributes a development model for related XML application document editing systems.
Published: 2004

10. Managing very large document collections using semantics

Author: Yubin Bao, Ge Yu, Guoren Wang, and Hongjun Lu
Subjects: Information retrieval, Document type declaration, Computer science, Well-formed document, Document management system, Document clustering, computer.software_genre, Semantics, Computer Science Applications, Theoretical Computer Science, Set (abstract data type), Computational Theory and Mathematics, Hardware and Architecture, Document Schema Definition Languages, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, Software, Document layout analysis
Abstract: In this paper, a system is presented where documents are no longer identified by their file names. Instead, a document is represented by its semantics in terms of descriptor and content vector. The descriptor of a document consists of a set of attributes, such as date of creation, its type, its size, annotations, etc. The content vector of a document consists of a set of terms extracted from the document. In this paper, a semantic document management system XBASE is designed and implemented based on the semantics and the functions of three main modules, X-Loader, X-Explorer and X-Query.
Published: 2003

11. Clustering DTDs: An interactive two-level approach

Author: Long Zhang, Weining Qian, Hailei Qian, Wen Jin, Yuqi Liang, and Aoying Zhou
Subjects: Document Structure Description, RuleML, computer.internet_protocol, Computer science, Well-formed document, Document type definition, computer.software_genre, Theoretical Computer Science, RELAX NG, SGML, Cluster analysis, computer.programming_language, XHTML, Document type declaration, XML validation, computer.file_format, Computer Science Applications, XML Schema (W3C), Computational Theory and Mathematics, Categorization, Hardware and Architecture, Data exchange, Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Data mining, computer, Software, XML, PCDATA
Abstract: XML (eXtensible Markup Language) is a standard which is widely applied in data representation and data exchange. However, as an important concept of XML, DTD (Document Type Definition) is not taken full advantage of in current applications. In this paper, a new method for clustering DTDs is presented, and it can be used in XML document clustering. The two-level method clusters the elements in DTDs and clusters DTDs separately. Element clustering forms the first level and provides element clusters, which are the generalization of relevant elements. DTD clustering utilizes the generalized information and forms the second level in the whole clustering process. The two-level method has the following advantages: 1) It takes into consideration both the content and the structure within DTDs; 2) The generalized information about elements is more useful than the separated words in the vector model; 3) The two-level method facilitates the searching of outliers. The experiments show that this method is able to categorize the relevant DTDs effectively.
Published: 2002

12. Succession in standardization: grafting XML onto SGML

Author: A. G. A. J. Loeffen and Tineke M. Egyedi
Subjects: DocBook, Standardization, Document type declaration, Computer science, computer.internet_protocol, business.industry, Efficient XML Interchange, XML validation, Document type definition, computer.file_format, Ecological succession, HTML, World Wide Web, XML Schema Editor, Hardware and Architecture, Compatibility (mechanics), ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, SGML, Software engineering, business, Law, computer, Software, XML, computer.programming_language
Abstract: Succession in standardization is usually a problem. The advantages of improvements are weighed against those of compatibility. If compatibility considerations dominate, a grafting process takes place. This process need not lead to compatibility. According to our taxonomy of successor standards, there are three types of succession (outcomes). Type I, where grafting is achieved, entails compatibility between successors, technical paradigm-compliance, and continuity in the standards trajectory. In this paper, we examine issues of succession and focus on the Extensible Markup Language (XML). It was to be grafted on the Standard Generalized Markup Language (SGML), a stable standard since 1988. However, XML was a profile, a subset and an extension of SGML (1988). Adaptation of SGML was needed (SGML1999) to forge full (downward) compatibility with XML (1998). We describe the grafting efforts and analyze their outcomes. We conclude that XML largely fits the SGML paradigm. SGML was a technical exemplar for XML developers. In contrast, widespread use of HTML exemplified the desirability of simplicity in XML standardization. The latter issue and HTML's user market largely explain discontinuity in SGML-XML succession.
Published: 2002

13. WWW (World Wide Web) Communication and Publishing of Structural Formulas by XyMML (XyM Markup Language)

Subjects: World Wide Web, XHTML, Markup language, Computer science, Document type declaration, Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Well-formed document, Document type definition, HTML, computer, computer.programming_language, PCDATA
Abstract: A tool for displaying and communicating chemical structural formulas has been developed on the basis of XyMML (XyM Markup Language), where a XyMML document according to the XML (Extensible Markup Language) specification has been transformed into an HTML (HyperText Markup Language) document by means of a translator program due to XSLT (Extensible Stylesheet Language Transformations). During this process, XyMML data written in such a XyMML document have been converted into XyM notations embedded in such an HTML document, which is browsed by virtue of a World Wide Web (WWW) browser including the XyMJava system. Another tool for printing chemical structural formulas has been developed so that the same XyMML document has been transformed into a XyMTeX document by means of XSLT. The resulting XyMTeX document has been used to print a document containing structural formulas through the TeX/LaTeX typesetting system. Thereby, the XyMML and the related techniques have been shown to have the potentiality of serving as a kernel for integrating WWW communication, electronic publishing, and conventional publishing in chemistry.
Published: 2002

14. Interoperable Document Collaboration

Author: Patrick Durusau, Svante Schubert, and Sebastian Rönnau
Subjects: World Wide Web, Multiple document interface, Computer science, Document type declaration, Document Schema Definition Languages, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Well-formed document, Document management system, Document engineering, computer.software_genre, User requirements document, computer, Vision document
Abstract: To provide office applications with an easy interoperable document merge capability and to enable the usage of document revision across applications, it is necessary to not only standardize the representations of a document state, but also of the changes made to the document during the editing process. Tracking the changes during editing retains the information usually being recovered afterwards. This avoids costly and time consuming processes like document comparison and diff heuristics [1]. To this day, file formats such as the OpenDocument file format (ODF) do only specify all possible document variations of a document representing the final state of user data. Interoperability is therefore only given on a document level: One ODF application saves a document and a different application is able to load and continue work on the same document state. Common scenarios of document exchange have been by floppy disc, attached to email and exchange across networks via file services such as Dropbox. Nowadays, the Internet is ubiquitous and multiple users want to work simultaneously on the same document. In that context the transfer of a whole document from user to user is inefficient. Additionally, finding and merging changes in XML-based documents appears to be complex and possibly error-prone [2]. For this reason, the OASIS Advanced Document Collaboration subcommittee has started to simplify collaboration by specifying the changes applicable to an ODF document and raising ODF application interoperability from a full document level to a more granular document change level. In this paper, we present an approach to ODF change representation called "Merge enabled Change-Tracking" (MCT), which is based on the Operational Transformation approach [3].
Published: 2014

15. Transforming paper documents into XML format with WISDOM++

Author: Donato Malerba, O. Altamura, and Floriana Esposito
Subjects: Document Structure Description, Information retrieval, Document type declaration, Computer science, Document classification, Well-formed document, XML validation, computer.software_genre, Computer Science Applications, XML framework, Simple API for XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Computer Vision and Pattern Recognition, computer, Software, Document layout analysis
Abstract: The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. The application of an OCR to some parts of the document image is only one of the problems. In fact, the generation of documents in HTML format is easier when the layout structure of a page has been extracted by means of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents in the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps, namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps: document analysis, document classification, document understanding, text recognition with an OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation, the acquisition of block classification rules using techniques from machine learning, the layout analysis based on general layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of the system components implementing these innovative aspects is reported.
Published: 2001

16. Securing XML documents with Author-X

Author: Elisa Bertino, Elena Ferrari, and Silvana Castano
Subjects: Document Structure Description, Database, Computer Networks and Communications, Computer science, computer.internet_protocol, Document type declaration, XML validation, Well-formed document, Document type definition, computer.software_genre, User requirements document, World Wide Web, Simple API for XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML
Abstract: Author-X is a Java-based system that addresses the security issues of access control and policy design for XML document administration. Author-X supports the specification of policies at varying granularity levels and the specification of user credentials as a way to enforce access control. Access control is available according to both push and pull document distribution policies, and document updates are distributed through a combination of hash functions and digital signature techniques. The Author-X approach to distributed updates allows a user to verify a document's integrity without contacting the document server.
Published: 2001

17. 6th JSIK SGML/XML Forum

Author: Tim Bray
Subjects: World Wide Web, DocBook, Information retrieval, Computer science, Document type declaration, XML Schema Editor, computer.internet_protocol, Efficient XML Interchange, computer.file_format, Document type definition, SGML, computer, XML
Published: 2001

18. Mapping the XML data model into the object model of the SYNTHESIS language

Author: Leonid A. Kalinichenko, O. L. Machul'sky, and M. A. Osipov
Subjects: Document Structure Description, Programming language, Document type declaration, Computer science, computer.internet_protocol, XML validation, Well-formed document, Document type definition, computer.software_genre, Simple API for XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, computer, Software, XML, computer.programming_language
Abstract: In this paper, a mapping of the XML document structure into the canonical data model is studied [5]. The XML document structure is specified by the Document Type Definition (DTD). DTD serves as a basis for a specification in the SYNTHESIS language; each DTD element declaration is mapped into some data type of SYNTHESIS.
Published: 2000

19. Structured storage and retrieval of SGML documents using Grove

Author: Hak-Gyoon Kim and Sung-Bae Cho
Subjects: Document Structure Description, Information retrieval, Database, Document type declaration, Computer science, Search engine indexing, Document type definition, computer.file_format, Library and Information Sciences, Management Science and Operations Research, computer.software_genre, Computer Science Applications, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Media Technology, Information system, Processing Instruction, HyTime, SGML, computer, Information Systems
Abstract: SGML, standardized in ISO 8879 [International Organization for Standardization (1986)], has proliferated because it can provide various styles and transform documents on different platforms. The SGML document has logical structure information in addition to the contents. As SGML documents are widely used, there is an increasing demand for a storage and retrieval system to use the logical structure of documents efficiently. However, traditional retrieval systems based on document indexes cannot exploit the logical structure appropriately. In this paper, we have developed a document storage and retrieval system based on structure information, where the SGML document is transformed into Grove, which is the document model for DSSSL and HyTime, and stored at an element level by an object-oriented DBMS, Object Store. It supports structured documents and provides a query interface to retrieve information contained in the structures.
Published: 2000

20. Standard Generalized Markup Language for self-defining structured reports

Author: Charles E. Kahn
Subjects: Information retrieval, Medical Records Systems, Computerized, Standardization, Computer science, Document type declaration, Unified Medical Language System, Information Storage and Retrieval, Health Informatics, SGML entity, computer.file_format, Document type definition, Open standard, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Programming Languages, SGML, computer, Information Systems, PCDATA
Abstract: Structured reporting is the process of using standardized data elements and predetermined data-entry formats to record observations. The Standard Generalized Markup Language (SGML; International Standards Organization (ISO) 8879:1986)—an open, internationally accepted standard for document interchange—was used to encode medical observations acquired in an Internet-based structured reporting system. The resulting report is self-documenting: it includes a definition of its allowable data fields and values encoded as a report-specific SGML document type definition (DTD). The data-entry forms, DTD, and report document instances are based on report specifications written in a simple, SGML-based language designed for that purpose. Reporting concepts can be linked with those of external vocabularies such as the Unified Medical Language System (UMLS) Metathesaurus. The use of open standards such as SGML is an important step in the creation of open, universally comprehensible structured reports.
Published: 1999

21. Document structure and markup in the FRESS hypertext system

Author: Steven J. DeRose and Andries van Dam
Subjects: Document Structure Description, XHTML, Information retrieval, Markup language, Document type declaration, Computer science, Well-formed document, law.invention, World Wide Web, law, Hypertext, computer, computer.programming_language, PCDATA
Published: 1999

22. [Untitled]

Author: Xien Fan, Qianhong Liu, and Peter A. Ng
Subjects: Information retrieval, Database, Computer science, Document type declaration, Frame (networking), Word processing, Well-formed document, Document type definition, Document clustering, computer.software_genre, Document processing, Data_FILES, General Earth and Planetary Sciences, Document retrieval, computer
Abstract: TEXPROS (TEXt PROcessing System) is an automatic document processing system which supports text-based information representation and manipulation, conveying meanings from stored information within office document texts. A dual modeling approach is employed to describe office documents and support document search and retrieval. The frame templates for representing document classes are organized to form a document type hierarchy. Based on its document type, the synopsis of a document is extracted to form its corresponding frame instance. According to the user predefined criteria, these frame instances are stored in different folders, which are organized as a folder organization (i.e., repository of frame instances associated with their documents). The concept of linking folders establishes filing paths for automatically filing documents in the folder organization. By integrating document type hierarchy and folder organization, the dual modeling approach provides efficient frame instance access by limiting the searches to those frame instances of a document type within those folders which appear to be the most similar to the corresponding queries. This paper presents an agent-based document filing system using folder organization. A storage architecture is presented to incorporate the document type hierarchy, folder organization and original document storage into a three-level storage system. This folder organization supports effective filing strategy and allows rapid frame instance searches by confining the search to the actual predicate-driven retrieval method. A predicate specification is proposed for specifying criteria on filing paths in terms of user predefined predicates for governing the document filing. A method for evaluating whether a given frame instance satisfies the criteria of a filing path is presented. The basic operations for constructing and reorganizing a folder organization are proposed.
Published: 1999

23. The origin of (document) species

Author: Adam Rifkin and Rohit Khare
Subjects: XHTML, RuleML, HTML5, Markup language, Information retrieval, Document type declaration, computer.internet_protocol, Computer science, Electronic document, General Engineering, Well-formed document, computer.file_format, Document type definition, HTML, World Wide Web, Metadata, Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, SGML, computer, XML, PCDATA, computer.programming_language
Abstract: The World Wide Web's extraordinary reach is based in part on its open assimilation of document formats. Although Web transfer protocols and addressing can accommodate any kinds of resources, the unique application context of a truly global hypermedia system favours the adoption of certain Web-adapted formats. In this paper we consider the evolutionary record that has led to the ascent of the eXtensible Markup Language (XML). We present a taxonomy of document species in the Web according to their syntax, style, structure, and semantics. We observe the preferential adoption of SGML, CSS, HTML, and XML, respectively, which leverage a parsimonious evolutionary strategy favouring declarative encodings over Turing-complete languages; separable styles over inline formatting; declarative markup over presentational markup; and well-defined semantics over operational behavior. The paper concludes with an evolutionary walkthrough of citation formats. Ultimately, combined with the self-referential power of the Web to document itself, we believe XML can catalyze a critical shift of the Web from a global information space into a universal knowledge network.
Published: 1998

24. SGML and patent document processing. Part II: Experience in the EPO

Author: Paul Brewin
Subjects: Markup language, Information retrieval, Renewable Energy, Sustainability and the Environment, Document type declaration, Computer science, Process Chemistry and Technology, Energy Engineering and Power Technology, Bioengineering, computer.file_format, Document type definition, European patent office, Library and Information Sciences, Computer Science Applications, World Wide Web, Fuel Technology, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Processing Instruction, SGML, computer, Patent document, PCDATA
Abstract: In this article, the practical use of SGML (Standard Generalized Markup Language) in the EPO (European Patent Office) is described: it discusses the history of the SGML project and how SGML is used in the production of patent documents, databases and CD-ROMs.
Published: 1997

25. A formal language model for parsing SGML

Author: R. W. Matzen, K. M. George, and G. E. Hedrick
Subjects: Parsing, Computer science, Programming language, Document type declaration, business.industry, Document type definition, computer.file_format, SGML entity, computer.software_genre, TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES, Hardware and Architecture, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Regular expression, Language model, Artificial intelligence, SGML, business, computer, Software, Natural language processing, Information Systems
Abstract: The Standard Generalized Markup Language (SGML) is an international standard for document definition (ISO 8879) that was adopted in 1986 and is rapidly gaining acceptance in industry and government. It is a meta-language system for document design rather than a specific scheme for document processing; almost any kind of document can be described using SGML. Productions called element declarations are used to define arbitrary elements of documents and the context in which they can occur. A finite set of element declarations called a document type definition (DTD) defines the high-level syntax of a set of documents. DTDs are similar to context-free grammars, but the productions are more complex. The standard does not describe a formal language model for SGML, and there is little work in the literature on this topic. This article defines a formal language model for SGML; systems of finite automata from systems of regular expressions. This model is applied in two ways: a parser is constructed for DTDs, and methods are shown for automatically constructing parsers for the documents defined by a DTD. These methods for parsing SGML are new, and they include features of DTDs that have not previously been included in a static language model. The model applies directly to the syntactic constructs of SGML, and thus, the methods shown in this article have distinct advantages for parsing SGML over traditional context-free parsing methods.
Published: 1997
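
Note on record 25: the abstract's observation that DTD content models behave like regular expressions can be illustrated with a minimal sketch. This is not the authors' parser construction. It assumes a simplified content-model syntax (element names, `,`, `|`, `*`, `+`, `?`, and parentheses only; no `#PCDATA`, no SGML `&` connector, no inclusions or exclusions). The function names `content_model_to_regex` and `children_match` and the sample model are hypothetical, introduced only for illustration.

```python
import re

# An element name in the simplified content-model syntax assumed here.
NAME = re.compile(r"[A-Za-z_][\w.-]*")


def content_model_to_regex(model):
    """Translate a simplified DTD content model into a compiled regex.

    Each element name becomes a group that matches the name followed by
    a ';' terminator, so a sequence of child names can be checked as one
    string. The operators | * + ? and parentheses mean the same thing in
    a content model and in a regex; ',' (sequence) becomes juxtaposition.
    """
    compact = "".join(model.split())  # drop all whitespace
    pattern = NAME.sub(
        lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact
    )
    pattern = pattern.replace(",", "")  # sequence is implicit in a regex
    return re.compile(pattern)


def children_match(regex, children):
    """Return True if the list of child element names fits the model."""
    encoded = "".join(name + ";" for name in children)
    return regex.fullmatch(encoded) is not None


if __name__ == "__main__":
    # Hypothetical declaration: <!ELEMENT section (title, para+, note?)>
    section = content_model_to_regex("(title, para+, note?)")
    print(children_match(section, ["title", "para", "para"]))  # True
    print(children_match(section, ["title", "note"]))          # False
```

Python's `re` module uses a backtracking matcher rather than an explicit finite automaton, so the sketch shows only the correspondence between content models and regular expressions, not the automaton construction the abstract describes.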

26. [Untitled]

Author: Ron Sacks-Davis, Brian Lowe, and Justin Zobel
Subjects: Structure (mathematical logic), Markup language, Information retrieval, Document type declaration, business.industry, Computer science, Representation (systemics), computer.file_format, Basis (universal algebra), computer.software_genre, Query language, Expression (mathematics), ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, General Earth and Planetary Sciences, Artificial intelligence, business, SGML, computer, Natural language processing
Abstract: Most documents have a hierarchical structure, which can be made explicit by markup languages such as SGML. In this paper we propose a formal model for representation of hierarchically structured documents, to be used as the basis for document query languages. The model uses a redundant representation of the document elements to simplify the expression of common queries. As an illustration of the power of the model we show how queries might be expressed, both as set-theoretic expressions and in a simple algebra, and outline how queries might be evaluated in a practical system.
Published: 1997

27. Extensible Markup Language Document Management

Author: Tatiana Kovacikova and Giovanni Bartolomeo
Subjects: XHTML, Markup language, RuleML, Computer science, Document type declaration, Programming language, computer.file_format, computer.software_genre, Synchronized Multimedia Integration Language, Document Definition Markup Language, computer, Collaborative Application Markup Language, PCDATA, computer.programming_language
Published: 2013

28. A Practical Method for Compatibility Evaluation of Portable Document Formats

Author: Dariusz Król and Michał Łopatka
Subjects: Document Structure Description, Information retrieval, Database, Document type declaration, Computer science, Well-formed document, Document management system, Document type definition, computer.software_genre, Simple API for XML, Document Schema Definition Languages, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, Document layout analysis
Abstract: This paper presents a method for verification of PDF documents for compatibility with publication models provided by scientific publishers. We first consider the problem of converting a document from PDF to XML format. Subsequently, we present an analysis of the document's graphical layout which operates in two phases. The first phase develops a model using a semi-automatic process with limited user interaction. This is followed by comparing and matching of submitted documents. The experimental results demonstrate the degree of document compatibility with the model along with a report of errors and warning messages.
Published: 2013

29. SGML and patent document processing. Part I: WIPO Standard ST.32

Author: Paul Brewin
Subjects: Markup language, Information retrieval, Renewable Energy, Sustainability and the Environment, Computer science, Document type declaration, Process Chemistry and Technology, Energy Engineering and Power Technology, Bioengineering, computer.file_format, European patent office, Document type definition, Library and Information Sciences, Computer Science Applications, World Wide Web, Fuel Technology, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, SGML, Electronic filing, Patent document, computer
Abstract: A description of SGML (Standard Generalised Markup Language) is given together with a detailed description of WIPO Standard ST.32. The benefits of the use of SGML are highlighted: its system independence and flexibility in building publication systems and full-text databases. The use of SGML for patent document processing and how it might be beneficial for patent departments and representatives to use SGML in their own document systems, as well as for electronic filing of applications, is discussed. Reference is made to its use in the European Patent Office.
Published: 1996

30. YAdumper: extracting and translating large information volumes from relational databases to structured flat files

Author: José M. Fernández and Alfonso Valencia
Subjects: Statistics and Probability, SQL, Databases, Factual, computer.internet_protocol, Relational database, Computer science, Information Storage and Retrieval, computer.software_genre, Biochemistry, Database design, Information schema, User-Computer Interface, Entity–relationship model, Object-relational impedance mismatch, Molecular Biology, computer.programming_language, Database model, Electronic Data Processing, Database, Information Dissemination, Document type declaration, Computational Biology, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, Relational model, Database Management Systems, Database theory, Semi-structured data, computer, Algorithms, Software, XML
Abstract: Summary: Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance. Availability: YAdumper is freely available.
Published: 2004

31. HTML to the max: a manifesto for adding SGML intelligence to the World-Wide Web

Author: C. M. Sperberg-McQueen and Robert F. Goldstein
Subjects: Numeric character reference, Markup language, Style sheet, Document type declaration, Programming language, Computer science, General Engineering, Document type definition, computer.file_format, SGML entity, computer.software_genre, World Wide Web, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Processing Instruction, SGML, computer
Abstract: HTML demonstrates that SGML markup is useful for networked information. How can it be made even more useful? One way is to extend the tag set from HTML to HTML2, etc. We argue here for a more radical approach: full SGML awareness in WWW. We believe the difficulties are small, the cost affordable, and the advantages overwhelming. SGML is a metalanguage for defining markup languages; HTML is just one instance of this infinite family. At present, documents in other SGML document types must be translated into HTML for display by a Mosaic client—sometimes this imposes unacceptable information loss. WWW browsers could handle other SGML document types without translation by launching a general-purpose SGML browser to view them, as they now launch graphics viewers; a better solution overall would be to build SGML display into the WWW browsers themselves. Either way, display of an SGML document would be controlled by a style sheet using a small number of display primitives (“bold”, “line break”, etc.) to specify the rendition of each element type. For “well-known” document type definitions (DTDs) like HTML, style sheets could be distributed with the browser, or built in. For other DTDs, the browser would fetch a style sheet from the server. Using style sheets, browser software can also make it easy to customize document display. DTDs and style sheets can be designed to accommodate extensions, ensuring that authors can make small extensions to the tag set with no change whatsoever in the target browsers and virtually no performance penalty.
Published: 1995
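
Note on record 31: the style-sheet idea in the abstract (a small set of display primitives assigned to each element type) can be sketched as follows. This is an illustration under stated assumptions, not the authors' design: it uses well-formed XML parsed with Python's standard `xml.etree.ElementTree` as a stand-in for SGML, and the `STYLE_SHEET` mapping, the `render` function, the sample element names, and the use of asterisks to stand for bold are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical style sheet: element type -> display primitives.
STYLE_SHEET = {
    "title": {"bold": True, "line_break": True},
    "para": {"line_break": True},
    "em": {"bold": True},
}


def render(element, style_sheet):
    """Render an element tree as plain text using the style sheet.

    Element types missing from the style sheet are rendered with no
    styling, so their content passes through unchanged.
    """
    style = style_sheet.get(element.tag, {})
    text = element.text or ""
    for child in element:
        text += render(child, style_sheet)
        text += child.tail or ""  # text that follows the child element
    if style.get("bold"):
        text = "*" + text + "*"   # asterisks stand in for bold
    if style.get("line_break"):
        text += "\n"
    return text


if __name__ == "__main__":
    doc = ET.fromstring(
        "<doc><title>Hi</title><para>A <em>b</em> c</para></doc>"
    )
    print(render(doc, STYLE_SHEET), end="")
```

Because the renderer consults only the mapping, a document type with different element names could be displayed by supplying a different mapping, without changing `render`.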

32. Transformation list for SGML application

Author: Hong Gao
Subjects: Numeric character reference, Information retrieval, Computer science, Interface (Java), Programming language, Document type declaration, Document type definition, SGML entity, computer.file_format, computer.software_genre, Computer Science Applications, Theoretical Computer Science, Computational Theory and Mathematics, Hardware and Architecture, Application domain, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Processing Instruction, SGML, computer, Software
Abstract: SGML (Standard Generalized Markup Language) is an ISO standard for document description (ISO 8879). The main idea in SGML is to specify a document both by its text and by its structure, without reference to a particular processing system. This kind of document description makes document interchange practical. But very few SGML systems have a friendly interface and are portable across many applications. In this paper, various approaches to implementing SGML are assessed and the transformation list for SGML application is introduced. This approach is not limited to specific application fields. It is suitable for any application domain and is friendly to users. Users can understand it without any training and can use it as easily as doing their routine work. It will accelerate the development of document interchange.
Published: 1995

33. The qwertz synthesis of SGML and LaTeX

Author: Thomas F. Gordon
Subjects: Unix, Markup language, Computer science, Programming language, Document type declaration, Document type definition, computer.file_format, Document processing, computer.software_genre, troff, Hardware and Architecture, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, SGML, Law, computer, Software, De facto standard
Abstract: Markup languages are used to identify and delimit the components of manuscripts. The principal application of these languages is to provide a means for authors to mark up their manuscripts with the information required by publishers for typesetting. LaTeX is a popular de facto standard markup language in some technical communities, such as academic computer science. SGML is an official ISO standard for defining markup languages. The qwertz document processing system is an SGML application we have developed for our own use, intended to combine the advantages of SGML and LaTeX. It consists of a model of LaTeX as an SGML Document Type Declaration (DTD) and Unix tools for translating SGML documents using this DTD into LaTeX, as well as troff. This article discusses our experiences in building and using the system.
Published: 1995

34. HTML5: The New Semantics and New Approaches to Document Markup

Author: Joshue O. Connor
Subjects: World Wide Web, XHTML, RuleML, Markup language, Information retrieval, HTML5, Computer science, Document type declaration, Document Definition Markup Language, computer, PCDATA, Collaborative Application Markup Language, computer.programming_language
Abstract: In this chapter, you’ll start to look at the HTML5 specification in more detail, especially the aspects of it that most relate to the development of accessible interfaces. There are many new APIs that do background client/server processing and data storage that can be leveraged for rich, responsive applications, but you’ll be seeing mostly the aspects of HTML5 that impact accessibility for users.
Published: 2012

35. The latest information related technology. SGML and full-text database

Author: Hidehiro Ishizuka
Subjects: Information retrieval, Computer science, Document type declaration, Text database, business.industry, computer.file_format, Document type definition, law.invention, World Wide Web, law, Electronic publishing, Hypertext, SGML, business, computer, PCDATA
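Abstract: This article explains SGML (Standard Generalized Markup Language) and its application to full-text databases. Here, a full-text database means one that includes figures, tables, and images. In a full-text database based on SGML, the structure is expressed by a DTD (document type definition) written in SGML, and the text itself is described using generalized markup that conforms to the DTD. The article explains, with examples, how to represent document structure: hierarchical structures such as chapters, sections, and paragraphs, and non-hierarchical (reference) structures such as notes, figures, tables, and images. It also discusses the benefits of SGML, electronic publishing, retrieval systems, hypertext, and SGML-related tools.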
Published: 1994

36. Version-aware XML documents

Author: Ethan V. Munson and Cheng Thao
Subjects: Document Structure Description, Information retrieval, Document type declaration, computer.internet_protocol, Computer science, Well-formed document, Document management system, computer.software_genre, World Wide Web, Document Schema Definition Languages, computer, Software versioning, XML, Vision document
Abstract: A document often goes through many revisions before it is finalized. In the normal document creation process, newer revisions overwrite older ones and only the final revision is kept. At any stage of document creation, it might be desirable to see how the document came to its current form or to revert back to a previous revision. Conventional version control tools such as CVS could help authors do exactly this. However, these tools are unlikely to be adopted by non-technical document authors due to the overhead of managing a repository and the tools' learning curves. This paper presents an approach called version-aware documents that embeds versioning data within the document thus making version control for single documents a seamless part of the authoring process.
Published: 2011

37. HTML

Author: Mario Heiderich, Eduardo Alberto Vela Nava, Gareth Heyes, and David Lindsay
Subjects: Markup language, Computer science, business.industry, Document type declaration, Scalable Vector Graphics, Document type definition, computer.file_format, HTML, JavaScript, World Wide Web, Web page, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Web application, business, computer, computer.programming_language
Abstract: Publisher Summary: This chapter discusses HTML (HyperText Markup Language), the markup language for structuring Web pages. Mastering HTML from a security point of view—in terms of both attack and defense—is complicated and requires almost encyclopedic knowledge. This chapter attempts to provide that knowledge. In addition to discussing the HTML family and its hidden gems for attackers and trapdoors for defenders, this chapter sheds some light on the differences between the different HTML standards and their actual implementations. The history and basic elements of HTML and markup languages are discussed to get a better understanding of how and where to obfuscate. Some ways to obfuscate markup include execution of JavaScript, the obfuscation of a URL, or even a DoS attack against the client rendering the markup. Markup and HTML are difficult to parse and secure, and the user agents make this task difficult by allowing crazy combinations of characters, attributes, and tags to execute JavaScript. HTML is usually part of an attack against Web applications; although it is called a “markup language,” it is very powerful and should be treated with respect.
Published: 2011

38. A novel XML-based document format with printing quality for web publishing

Author: Yinyan Yu, Liangcai Gao, Zhi Tang, and Ruiheng Qiu
Subjects: Document Structure Description, Computer science, computer.internet_protocol, Document type declaration, business.industry, XML validation, Well-formed document, Document management system, computer.software_genre, World Wide Web, Simple API for XML, Publishing, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Single source publishing, Document engineering, business, computer, XML
Abstract: Although many XML-based document formats are available for printing or publishing on the Internet, none of them is well designed to support both high quality printing and web publishing. Therefore, we propose a novel XML-based document format for web publishing, called CEBX, in this paper. The proposed format is a fixed-layout document supporting high quality printing, which has optimized document content organization, physical structure and protection scheme to support web publishing. There are four noteworthy features of CEBX documents: (1) CEBX provides original fixed layout by graphic units for printing quality. (2) The content in a CEBX document can be reflowed to fit the display device based on the content blocks and additional fluid information. (3) XML Document Archiving model (XDA), the packaging model used in CEBX, supports document linearization and incremental editing well. (4) By introducing a segment-based content protection scheme into CEBX, some part of a document can be previewed directly while the remaining part is protected effectively such that readers only need to purchase partial content of a book that they are interested in. This will be very helpful to document distribution and support flexible business models such as try-before-buy, on-demand reading, superdistribution, etc.
Published: 2010

39. Document recognition

Author: Nenad Marovac
Subjects: Multiple document interface, Information retrieval, Document type declaration, Computer science, Programming language, Well-formed document, Document management system, computer.software_genre, User requirements document, Document processing, Document Schema Definition Languages, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, General Materials Science, computer, Document layout analysis
Abstract: Document recognition is a task in which a document in its physical presentation format is transformed into a structured author-oriented model of the document. The presentation format can be bitmaps of document pages, a description of the document in a Page Description Language (PDL), or encoding of the document in a printer or graphics language. The structured model is a format allowing for addition to the document, manipulation of the document, and reformatting the layout and the output appearance of the document. Fully automatic document recognition is not possible, in general, for the same reason that it is not possible to de-translate computer programs automatically. However, it is possible to develop a man-assisted semi-automatic document recognition method. This method uses two passes. The first pass is completely automatic; it produces a document format called Interactive Document Model. The Interactive Document Model comprises recognized typesetting and descriptive structures together with derived ODA logical and layout structures for the document. The model generated in the first pass is enough for most purposes and applications. However, if it is not acceptable, the user can then enter the second pass and interactively edit the logical structure. This paper has three objectives. The first is to formalize the concept of document recognition. The second is to subdivide the problem of document recognition and classify it into a number of subproblems, each dealing with different aspects of the problem. The third objective is to introduce a problem which we wish to solve, and then to present a High Level Document Recognition method and the experience in developing and using a number of implementations of the method.
Published: 1992

40. Organizational Hypermedia Document Management Through Metadata

Author: Garp Choong Kim and Woojong Suh
Subjects: Computer science, Document type declaration, Hypermedia, Well-formed document, Document management system, Document type definition, computer.software_genre, law.invention, Metadata, World Wide Web, law, Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Meta element, computer
Abstract: Web business systems, the most popular application of hypermedia, typically include a lot of hypermedia documents (hyperdocuments), which are also called Web pages. These systems have been conceived as an essential instrument in obtaining various beneficial opportunities for CRM (customer relationship management), SCM (supply chain management), e-banking or e-stock trading, and so forth (Turban et al., 2004). Most companies have made a continuous effort to build such systems. As a result, today the hyperdocuments in the organizations are growing explosively. The hyperdocuments employed for business tasks in the Web business systems may be referred to as organizational hyperdocuments (OHDs). The OHDs typically play a critical role in business, including the forms of invoices, checks, orders, and so forth. The organization’s ability to adapt the OHDs rapidly to ever-changing business requirements may impact on business performance. However, the maintenance of the OHDs increasing continuously is becoming a burdensome task to many organizations; managing them is as important to economic success as is software maintenance (Brereton et al., 1998). An approach to solve the challenge of managing OHDs is to use metadata. Metadata are generally known as data about data (or information about information). Concerning this approach, this article first reviews the previous studies and discusses perspectives desirable to manage the OHDs and then provides metadata classification and elements. Finally, this article discusses future trends and makes a conclusion.
Published: 2009

41. The X Factor: From HTML to XHTML

Author: N. Perlin
Subjects: XHTML, Computer science, Document type declaration, computer.file_format, Character encodings in HTML, HTML element, World Wide Web, XML framework, Wireless Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Document Object Model, computer, computer.programming_language, XForms
Abstract: Created by Tim Berners-Lee in 1989/1990, HTML was the heart of the World Wide Web. Today, HTML is dead, replaced by XHTML which is HTML reformulated as an instance of XML. As technical communication moves into an era in which Flare works in native XHTML, ePublisher Professional outputs XHTML, and so on, it's important to know what XHTML is in order to understand how it will affect your work and choice of tools. This paper summarizes XHTML. The conference presentation will go into more detail.
Published: 2006

42. Document Markup for the Web

Author: Michael Kohlhase
Subjects: World Wide Web, XHTML, Markup language, Document type declaration, Computer science, Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Well-formed document, Document type definition, HTML, computer, PCDATA, computer.programming_language
Abstract: Document markup is the process of adding codes to a document to identify the structure of a document and to specify the format in which its fragments are to appear. We will discuss two conflicting aspects, structure and appearance, in document markup. As the Internet imposes special constraints on markup formats, we will reflect its influence.
Published: 2006

43. Integrating Translation Services within a Structured Editor

Author: Ali Choumane, Cécile Roisin, Hervé Blanchon, Communication Langagière et Interaction Personne-Système (CLIPS - IMAG), Université Joseph Fourier - Grenoble 1 (UJF)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS), Web, adaptation and multimedia (WAM), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), P. King, and Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Université Joseph Fourier - Grenoble 1 (UJF)
Subjects: Machine translation, Process (engineering), Computer science, computer.internet_protocol, Well-formed document, 02 engineering and technology, 010501 environmental sciences, computer.software_genre, 01 natural sciences, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], World Wide Web, Structured document, 0202 electrical engineering, electronic engineering, information engineering, Dialog box, 0105 earth and related environmental sciences, Information retrieval, Document type declaration, ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing/I.2.7.4: Machine translation, ACM: I.: Computing Methodologies/I.7: DOCUMENT AND TEXT PROCESSING/I.7.2: Document Preparation/I.7.2.1: Format and notation, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Document Schema Definition Languages, ACM: I.: Computing Methodologies/I.7: DOCUMENT AND TEXT PROCESSING/I.7.2: Document Preparation/I.7.2.5: Markup languages, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, 020201 artificial intelligence & image processing, computer, XML
Abstract: Fully automatic machine translation cannot produce high quality translation; Dialog-Based Machine Translation (DBMT) is the only way to provide authors with a means of translating documents into languages they have not mastered, or do not even know. With such an environment, the author must help the system to understand the document by means of an interactive disambiguation step. In this paper we study the consequences of integrating the DBMT services within a structured document editor (Amaya). The source document (named edited document) needs a companion document enriched with different data produced during the interactive translation process (question trees, answers of the author, translations). The edited document also needs to be enriched (annotated) in order to enable access to the question trees. The enriched edited document and the companion document have to be synchronized in case the edited document is further updated.
Published: 2005

44. Separating XHTML content from navigation clutter using DOM-structure block analysis

Author: Mehmet A. Orgun, Constantine Mantratzis, and Steve Cassidy
Subjects: XHTML, Information retrieval, Computer science, Document type declaration, Short paper, Document clustering, Hyperlink, World Wide Web, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Clutter, Clipping (computer graphics), Document Object Model, computer, computer.programming_language
Abstract: This short paper gives an overview of the principles behind an algorithm that separates the core-content of a web document from hyperlinked-clutter such as text advertisements and long links of syndicated references to other resources. Its advantage over other approaches is its ability to identify both loosely as well as tightly defined "table-like" or "list-like" structures of hyperlinks (from nested tables to simple, bullet-pointed lists) by operating at various levels within the DOM tree. The resulting data can then be used to extract the core-content from a web document for semantic analysis or other information retrieval purposes as well as to aid in the process of "clipping" a web document to its bare essentials for use with hardware-limited devices such as PDAs and cell phones.
Published: 2005

45. Contextual Metadata for Document Databases

Author: Airi Salminen, Virpi Lyytikäinen, and Pasi Tiitinen
Subjects: Information retrieval, Database, Computer science, Document type declaration, Well-formed document, computer.software_genre, Metadata repository, World Wide Web, Metadata, Document Schema Definition Languages, Synonym ring, Geospatial metadata, computer, Database catalog
Abstract: Metadata has always been an important means to support accessibility of information in document collections. Metadata can be, for example, bibliographic data manually created for each document at the time of document storage. The indexes created by Web search engines serve as metadata about the content of Web documents. In the semantic Web solutions, ontologies are used to store semantic metadata (Berners-Lee et al., 2001). Attaching a common ontology to a set of heterogeneous document databases may be used to support data integration. Creation of the common ontology requires profound understanding of the concepts used in the databases. It is a demanding task, especially in cases where the content of the documents is written in various natural languages. In this chapter, we propose the use of contextual metadata as another means to add meaning to document collections, and as a way to support data integration. By contextual metadata, we refer to data about the context where documents are created (e.g., data about business processes, organizations involved, and document types). We will restrict our discussion to contextual metadata on the level of collections, leaving metadata about particular document instances out of the discussion. Thus, the contextual metadata can be created, like ontologies, independently of the creation of instances in the databases.
Published: 2005

46. Practical SGML as an introduction to SGML

Author: Lynne A. Price
Subjects: World Wide Web, Information retrieval, Computer science, Document type declaration, Processing Instruction, General Medicine, computer.file_format, Document type definition, SGML, computer
Published: 1996

47. Conversion of PDF documents into HTML: a case study of document image analysis

Author: Hassan Alam and Fuad Rahman
Subjects: HTML5, Information retrieval, Computer science, Document type declaration, Well-formed document, Document management system, Document clustering, HTML, computer.software_genre, World Wide Web, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, Document layout analysis, computer.programming_language
Abstract: Portable document format (PDF) has become the de facto standard in many fields because of its independence of local formatting restrictions and its accurate reproducibility. On the other hand, HTML documents are becoming an integral form of our lives by being the dominant form for information exchange within the World Wide Web environment. This paper discusses how image-processing techniques can be used to perform document layout analysis of complex multiple-column PDF documents. This analysis allows the conversion of these documents into the HTML format keeping the logical and physical layout intact.
Published: 2004

48. Document transformation system from papers to XML data based on pivot XML document method

Author: Y. Ishitani
Subjects: Document Structure Description, XML Encryption, computer.internet_protocol, Computer science, Efficient XML Interchange, XML Signature, Well-formed document, XSLT, Document type definition, computer.software_genre, Simple API for XML, XML Schema Editor, Streaming XML, XML namespace, XML schema, computer.programming_language, XHTML, Information retrieval, Document type declaration, XML validation, computer.file_format, XML framework, XML database, XML Schema (W3C), Document Schema Definition Languages, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Document Object Model, computer, XML, XML Catalog
Abstract: This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transformation strategy based on a pivot XML document. Firstly, document elements such as title, authors, abstract, headings, paragraphs, lists, captions, tables and figures are extracted from document images. Secondly, the hierarchical structure of document elements is extracted and is described using a DOM tree. Thirdly, this document structure is converted into a pivot XML document described as an XHTML document by an XML parser. Finally, this pivot XML document is transformed into the target XML document by the XML parser with XSLT scripts or specific programs. Experimental results show the method is effective in transforming printed documents to various XML documents.
Published: 2004

49. A Correspondence between UML Diagrams and SGML/XML DTDs

Author: Anne Eerola and Eila Kuikka
Subjects: Document Structure Description, Markup language, RuleML, computer.internet_protocol, Computer science, Well-formed document, SGML entity, Document type definition, computer.software_genre, Unified Modeling Language, SGML, Object Constraint Language, computer.programming_language, XHTML, Document type declaration, Programming language, XML validation, computer.file_format, Geography Markup Language, XML Schema (W3C), Extensible markup, Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, XML Catalog, Collaborative Application Markup Language, PCDATA
Abstract: In this paper, we compare the semantics and structure of the conceptual information presented in the Unified Modeling Language (UML), which is used in analyzing object-oriented systems, and Document Type Definitions (DTD), which define the structures of SGML and XML documents. SGML (Standard Generalized Markup Language) and XML (Extensible Markup Language) are international standards for specifying the notations used for defining structured documents. We present correspondence rules for generating DTDs semiautomatically from UML diagrams. The rules have been developed as a part of the analysis and design method to create the structure definition for a document. As an example, we use a patient record.
Published: 2004

50. Automatic generation algorithm of uniform DTD for structured documents

Author: Chun-Sik Yoo, Seon-Mi Woo, and Yong-Sung Kim
Subjects: Structure (mathematical logic), Intranet, Information retrieval, Finite-state machine, Computer science, Document type declaration, computer.file_format, Document type definition, Tree structure, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Processing Instruction, SGML, computer, Algorithm
Abstract: SGML is the international standard for digital documents used in fields such as intranets and CALS/EC. However, many SGML documents that share a similar structure and are conceptually the same kind of document have different DTDs and are stored in different databases. We propose an algorithm that automatically unifies the DTDs of these SGML documents using a tree structure and finite automata. Constructing the SGML document database to apply the proposed algorithm reduces the number of database accesses and increases the efficiency of information retrieval. It provides a more effective management and operation environment for SGML document databases.
Published: 2003

78 results on '"Document type declaration"'
