Descriptor: "XML validation" - Searchworks@Jio Institute Digital Library Search Results

101. Identifying XML Entities Via Virtual Keys

Author: Jing Tian
Subjects: Computer science, computer.internet_protocol, Relational database, Efficient XML Interchange, XML validation, Context (language use), computer.file_format, computer.software_genre, Identification (information), Information gain ratio, Data mining, computer, XML, Data integration
Abstract: Because data are acquired from a number of sources, a real-world entity commonly has multiple representations. Heterogeneity remains a prominent concern in data integration and data mining, and entity identification is an effective means of resolving duplicates. However, work on complex structures such as XML data is not as extensive as the work that has been done in the context of relational databases. This article focuses on duplicate detection for XML elements based on the Sorted Neighbor Method (SNM) and Multi-Pass SNM (MPS). The XEIVK (XML Entities Identification via Virtual Keys) algorithm first homogenizes the structures by mapping labels to a template. Subsequently, virtual keys are created by extracting the content of nodes after determining their weights. It calculates the degree of textual similarity with an orchestrated function within a set number of clusters in terms of the nodes’ information gain ratio. The experiment illustrates that XEIVK significantly outperforms both SNM and MPS in precision, while its lower running time benefits from the filtering strategies.
Published: 2017
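
The Sorted Neighbor Method named in the abstract above can be illustrated with a minimal Python sketch. It shows only the generic technique (sort by a key, then compare records inside a sliding window), not the XEIVK algorithm or its virtual keys. The record fields, the `title_key` function, the window size and the similarity threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def sorted_neighborhood(records, key, window=3, threshold=0.85):
    """Return index pairs of likely duplicates.

    Records are sorted by `key`; each record is then compared only with
    the next `window - 1` records in sorted order.
    """
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            similarity = SequenceMatcher(None, key(records[i]), key(records[j])).ratio()
            if similarity >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return pairs


def title_key(record):
    # Hypothetical sorting key: the lower-cased title.
    return record["title"].lower()


# Hypothetical records, e.g. extracted from XML elements.
records = [
    {"title": "XML in a Nutshell", "author": "Harold"},
    {"title": "Learning XML", "author": "Ray"},
    {"title": "XML in a Nutshel", "author": "Harold"},
]

duplicates = sorted_neighborhood(records, title_key)
print(duplicates)
```

A Multi-Pass variant would repeat the same procedure with several different keys and merge the resulting pairs.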

102. Document Type Definition (DTD)

Author: Judith Wusteman
Subjects: Document Structure Description, Information retrieval, computer.internet_protocol, Computer science, Programming language, InformationSystems_DATABASEMANAGEMENT, XML validation, Document type definition, computer.software_genre, XML Schema Editor, Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, RELAX NG, computer, XML, PCDATA
Abstract: Document Type Definitions (DTDs) are schemas that describe the structure and, to a limited extent, the content of Extensible Markup Language (XML) and Standard Generalized Markup Language (SGML) documents. At its inception, the XML standard inherited the DTD from SGML as its only schema language. Many alternative schema languages have subsequently been developed for XML, but the DTD is still alive and actively used to define narrative-based document types. This entry describes the basic syntax of the DTD and compares it to its two main rivals: W3C XML Schema and RELAX NG.
Published: 2017

103. Heterogeneous data security fusion system based on binary tree

Author: Cai Fu, Guohui Li, Deliang Xu, Lansheng Han, and Tao Lv
Subjects: XML Encryption, Binary tree, Computer science, computer.internet_protocol, Efficient XML Interchange, Data security, XML Signature, XML validation, 02 engineering and technology, computer.file_format, computer.software_genre, XML framework, XML database, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, computer, XML, Data integration
Abstract: Performance and security are perennial research topics in data fusion systems. In this paper, a more efficient and secure heterogeneous data fusion system is proposed and implemented. First, based on the global data pattern, which borrows the concept of a view in databases, the heterogeneous data is transformed into Extensible Markup Language (XML) documents, which serve as the intermediate data conversion format. Second, these XML documents are organized as binary trees and stored in a trifurcate chain-table, combined with the XML document tree node encoding. Then the appropriate node attributes are selected as index entries with security level information; together these make up the innovative and dynamic index file with security labels, which is the basis of data querying, fusing and showing. Finally, the system is verified by experiments, and the results show that it can fuse heterogeneous data correctly and guarantee the safety of the system.
Published: 2017

104. Modeling and Storage of XML Data as a Graph and Processing with Graph Processor

Author: G. Suganthi and A. Sana
Subjects: XML Encryption, Database, Programming language, Computer science, Efficient XML Interchange, XML Signature, XML validation, computer.file_format, computer.software_genre, XML framework, XML database, XML Schema Editor, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer
Abstract: XML is a standard format for data exchange over the internet, and a huge amount of information is tagged and stored in XML format. Processing XML data has its difficulties due to the schema-centric and semi-structured nature of the major portion of existing XML data. The data-embedded tree structure makes it more complicated to process. XML processing using RDBMS systems and native XML databases like BaseX and eXist-DB has its own limitations. Native XML databases are not suitable for distributed processing, so they are bound to a single system's resources, which are not enough for big data processing. Graph databases and graph database technologies have been emerging in the recent past. They are also suitable for processing big data due to the parallel processing features of graph data processors. Modeling XML data as a graph and processing it with graph processors is beneficial in many contexts. In this paper the graph modeling, storage and processing possibilities of XML data are analysed. The major graph database Neo4j and the GraphX graph processor extension embedded in the Apache Spark distributed in-memory processing system are utilized for querying XML data.
Published: 2017

105. A review on XML keyword query processing

Author: Prashant R. Lambole and P. N. Chatur
Subjects: Document Structure Description, XML Encryption, Information retrieval, Computer science, computer.internet_protocol, Efficient XML Interchange, XML validation, 02 engineering and technology, computer.file_format, computer.software_genre, XML database, XML Schema Editor, 020204 information systems, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, computer, XPath
Abstract: Keyword search is gaining popularity for querying XML data nowadays, as it relieves users from understanding the complex schemas of XML documents and query languages such as XQuery and XPath. Various query processing techniques and efficient algorithms have recently been proposed to address keyword search over XML data. The most popular techniques for XML keyword search today use the query semantics ELCA (Exclusive LCA) and SLCA (Smallest LCA), both based on LCA (Lowest Common Ancestor). Among these, ELCA captures more meaningful results compared with LCA and SLCA. However, these techniques can result in redundant computation due to problems like common-ancestor-repetition (CAR) and visiting-useless-node (VUN). Irregular schemas of a given XML document and missing elements in it are also problems to consider in keyword query processing over XML data. In this paper we review various XML keyword query processing techniques. We also highlight some of the important issues associated with the respective techniques and the improvements made to address those issues, thereby improving the overall efficiency of XML keyword search query processing.
Published: 2017
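
The SLCA (Smallest LCA) semantics mentioned in the abstract above can be illustrated with a brute-force Python sketch: an element qualifies when its subtree contains every keyword and none of its child subtrees does. This is a teaching sketch under simplifying assumptions (whitespace tokenization, element text only, a hypothetical sample document), not one of the optimized algorithms the review surveys.

```python
import xml.etree.ElementTree as ET


def subtree_words(elem):
    """Lower-cased words in the text of elem and all its descendants."""
    return set(" ".join(elem.itertext()).lower().split())


def slca(root, keywords):
    """Return elements whose subtree covers all keywords while no child subtree does."""
    wanted = {k.lower() for k in keywords}
    results = []
    for elem in root.iter():
        if not wanted <= subtree_words(elem):
            continue  # this subtree misses at least one keyword
        if any(wanted <= subtree_words(child) for child in elem):
            continue  # a smaller subtree already covers all keywords
        results.append(elem)
    return results


# Hypothetical sample document.
doc = ET.fromstring(
    "<library>"
    "<book id='b1'><title>XML Basics</title><author>Smith</author></book>"
    "<book id='b2'><title>Graph Search</title><author>Jones</author></book>"
    "<note>Smith reviewed XML tools</note>"
    "</library>"
)

hits = [(e.tag, e.get("id")) for e in slca(doc, ["xml", "smith"])]
print(hits)
```

Checking only direct children is sufficient here because keyword coverage is monotone: if any descendant covers all keywords, so does the child on the path to it.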

106. Introduction to XML

Author: Jonathan Hartwell
Subjects: Document Structure Description, XML Encryption, Database, Computer science, XML Signature, XML validation, Document type definition, computer.file_format, computer.software_genre, Linguistics, XML database, XML Schema Editor, SGML, computer
Abstract: So when should you use attributes and when should you use elements? Well, that is largely up to you, but convention usually says that when you have descriptive information, it should go as an attribute. Conversely, information that is part of the data should be an element. In the above examples it makes more sense for us to put the title and author as attributes than it does as children, or elements. This is because those two pieces of information are directly tied to the book. If we were to add chapters to the example, then it would make sense for those chapters to be child elements of the book element.
Published: 2017
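
The convention described in the abstract above (title and author as attributes of a book, chapters as child elements) can be sketched in Python with the standard library's `xml.etree.ElementTree`. The title, author and chapter values are placeholders, not taken from the book's own examples.

```python
import xml.etree.ElementTree as ET

# Descriptive information about the book goes into attributes.
book = ET.Element("book", {"title": "Example Title", "author": "Example Author"})

# Chapters are part of the data, so they become child elements.
for heading in ("Getting Started", "Going Further"):
    chapter = ET.SubElement(book, "chapter")
    chapter.text = heading

xml_text = ET.tostring(book, encoding="unicode")
print(xml_text)
```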

107. Search and Aggregation in XML Documents

Author: Abdelmalek Habi, Hamamache Kheddouci, and Brice Effantin (Graphes, AlgOrithmes et AppLications (GOAL), LIRIS, Université de Lyon; ANR project CAIR, ANR-14-CE23-0006)
Subjects: Document Structure Description, Computer science, computer.internet_protocol, Efficient XML Interchange, XML Signature, Well-formed document, 02 engineering and technology, computer.software_genre, Simple API for XML, XML Schema Editor, 0202 electrical engineering, electronic engineering, information engineering, XML schema, computer.programming_language, Information retrieval, 05 social sciences, XML validation, computer.file_format, XML framework, XML database, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, 020201 artificial intelligence & image processing, [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR], 0509 other social sciences, 050904 information & library sciences, computer, XML, XML Catalog
Abstract: Information retrieval is migrating from the traditional paradigm (returning an ordered list of responses) to the aggregate search paradigm (grouping the most comprehensive and relevant answers into one final aggregated document). Extensible Markup Language (XML) is now an important standard for information exchange and representation. Documents and queries are usually processed through their tree representation, which allows XML document retrieval to be treated as a tree matching problem between the document trees and the query tree. Several paradigms for retrieving XML documents have been proposed in the literature, but only a few of them try to aggregate a set of XML documents in order to provide more significant answers for a given query. In this paper, we propose and evaluate an aggregated search method to obtain the most accurate and richest answers in XML fragment search. Our search method is based on the Top-k Approximate Subtree Matching (TASM) algorithm, and a new similarity function is proposed to improve the returned fragments. An aggregation process is then presented to generate a single aggregate response containing the most relevant, exhaustive and non-redundant information given by the fragments. The method is evaluated on two real-world datasets. Experiments show that it generates good results in terms of relevance and quality.
Published: 2017

108. SpiderX

Author: Jianguo Wang and Chunbin Lin
Subjects: Document Structure Description, XML Encryption, computer.internet_protocol, Computer science, Efficient XML Interchange, XML Signature, computer.software_genre, Query language, Web query classification, XML Schema Editor, Schema (psychology), Entity–relationship model, Streaming XML, XML schema, RDF, computer.programming_language, Information retrieval, Web search query, business.industry, cXML, Semantic search, XML validation, computer.file_format, XML framework, XML database, business, computer, XML, XML Catalog
Abstract: Keyword search in XML has gained popularity as it enables users to easily access XML data without the need to learn query languages or study complex data schemas. In XML keyword search, query semantics is based on the concept of Lowest Common Ancestor (LCA), e.g., SLCA and ELCA. However, LCA-based search methods depend heavily on the hierarchical structures of XML data, which may result in meaningless answers. To obtain desired answers, a successful system should be able to (i) match a semantic entity for each keyword, (ii) discover the relationships of the matched entities, (iii) support efficient query processing, (iv) release users from having to know the XML content, and (v) visualize the search results. None of the existing XML keyword search systems completely meets the above requirements. In this paper, we design a system called SpiderX to completely solve the above challenges. We propose a query semantics, Entity-Relationship Graph (ERG), which adopts the RDF subject-predicate-object semantics to capture the information of search entities along with associated attributes and the relationships between entities. SpiderX proposes a novel index structure, which has a small space cost by combining the optimizations of column databases and data compression schemes. In addition, SpiderX processes queries in a bottom-up way to achieve high performance, which is about 100X faster than the state-of-the-art algorithms. To demonstrate the high performance of SpiderX, we implement an online demo, which operates on three real-life datasets. The demo also provides (1) query auto-completion to guide users in formulating queries; and (2) a visualization panel to display the query answers, which interacts with users by providing zoom-in and zoom-out exploration features. Demo link: http://chunbinlin.com/spiderx.
Published: 2017

109. Generating Customized PDF Document Based XML Source Data

Author: Shisheng Zhou, Rubai Luo, and Meng Wang
Subjects: Document Structure Description, Information retrieval, Database, Computer science, computer.internet_protocol, Well-formed document, XML validation, Document type definition, computer.software_genre, XML framework, Simple API for XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, XML Catalog
Abstract: This work studies a method for generating customized PDF documents in a publishing on demand (POD) system from a single data source. First, based on a study of the requirements for source data description in a POD system, a publishing data description in XML is proposed. Then the customized content and layout are described using XSL-FO. Finally, the XML source data is converted to a customized PDF document using XSL-FO in the open-source FOP transformation engine. Customized PDF documents were successfully generated using FOP in an experiment. The experimental result shows that the XSL-FO-based method for generating customized PDF documents is feasible.
Published: 2017

110. XML in .NET Framework

Author: Bipin Joshi
Subjects: XML Encryption, computer.internet_protocol, Computer science, Programming language, Efficient XML Interchange, XML validation, computer.file_format, computer.software_genre, XML framework, XML Schema Editor, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, computer, XML, computer.programming_language
Abstract: Up until now, you have learned how to work with your own XML data. This includes reading, writing, validating, serializing, and querying XML data. Microsoft has used XML extensively in the .NET Framework. This use of XML comes in different flavors, such as XAML markup of note, server control markup of ASP.NET, and configuration system of the .NET Framework. Understanding the use of XML in the .NET Framework is therefore essential for any .NET developer.
Published: 2017

111. A Method of XML Twig Query Processing based on XML Document Schema

Author: Yi Yu
Subjects: Document Structure Description, XML Encryption, Information retrieval, XML Schema Editor, Computer science, Streaming XML, Efficient XML Interchange, Well-formed document, XML validation, RELAX NG, computer.file_format, computer
Published: 2017

112. Research of Core Configuration File for Integrated SSH Framework

Author: HouHua SHen and Yongchang Ren
Subjects: XML Encryption, Database, computer.internet_protocol, Computer science, XML Signature, XML validation, computer.software_genre, XML framework, Simple API for XML, Operating system, XML schema, computer, XML, XML Catalog, computer.programming_language
Published: 2017

113. Pattern-Based Misalignment Symptom Detection with XML Validation: A Case Study

Author: Dóra Őri
Subjects: Structure (mathematical logic), Correctness, Strategic alignment, Schematron, Computer science, Enterprise architecture, XML validation, 02 engineering and technology, computer.software_genre, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Information system, 020201 artificial intelligence & image processing, State (computer science), Data mining, computer
Abstract: In this paper, an analytical solution is built to approach the topic of strategic misalignment from an enterprise architecture (EA)-based perspective. The study aims to accomplish an EA-based, systematic analysis of mismatches between business and information systems. The research takes a pattern-based approach to reveal the symptoms of malfunctioning alignment areas. In this study, the analytical potential of pattern generation and rule testing is utilized in a complex EA environment. Misalignment symptoms, defined as formal patterns, are detected in the underlying EA models by using XML validation tools. Pattern generation and rule testing are supported by Schematron, a pattern-based XML validation language. The operation, the correctness and the significance of the approach are validated via a compound case study at a road management authority. The proposed research has the potential to extend our understanding of how to assess the state of misalignment in a complex EA model structure by applying rule testing and XML validation techniques in an EA environment.
Published: 2017

114. Research and Application of Word Format Checking Technology based on Java and XML

Author: Lu Han, Jinmin Jiang, and Kun Liu
Subjects: 021110 strategic, defence & security studies, XML Encryption, Programming language, computer.internet_protocol, Computer science, Efficient XML Interchange, 0211 other engineering and technologies, 020207 software engineering, XML validation, 02 engineering and technology, computer.file_format, computer.software_genre, XML framework, Java API for XML-based RPC, Streaming XML, 0202 electrical engineering, electronic engineering, information engineering, Fast Infoset, computer, XML
Published: 2017

115. XML in SQL Server

Author: Bipin Joshi
Subjects: XML Encryption, Database, Computer science, Efficient XML Interchange, XML validation, computer.file_format, Data Transformation Services, computer.software_genre, Language Integrated Query, XML database, Streaming XML, computer, Business Intelligence Markup Language, computer.programming_language
Abstract: Most business applications store data in some kind of datastore, which is usually a relational database. To that end, SQL Server is one of Microsoft’s flagship products. Since many applications rely on XML data, Microsoft found it necessary to incorporate strong support for XML in its database engine.
Published: 2017

116. XML in ADO.NET

Author: Bipin Joshi
Subjects: XML Encryption, Computer science, Efficient XML Interchange, XML Signature, XML validation, computer.file_format, computer.software_genre, World Wide Web, XML database, XML Schema Editor, Streaming XML, Hardware_INTEGRATEDCIRCUITS, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, computer, computer.programming_language
Abstract: ADO.NET is a technology for accessing and manipulating databases. XML has been integrated in several ways in ADO.NET. In this chapter, you are going to see how ADO.NET has harnessed the power of XML in data representation.
Published: 2017

117. A Framework for Clustering and Dynamic Maintenance of XML Documents

Author: Chengfei Liu, Rui Zhou, Ahmed Al-Shammari, Tarique Anwar, Mehdi Naseriparsa, and Bao Quoc Vo
Subjects: Document Structure Description, Database, Computer science, cXML, Efficient XML Interchange, XML validation, 02 engineering and technology, computer.file_format, computer.software_genre, XML framework, 020204 information systems, Streaming XML, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, XML schema, Cluster analysis, computer, computer.programming_language
Abstract: Web data clustering has been widely studied in the data mining communities. However, dynamic maintenance of the web data clusters is still a challenging task. In this paper, we propose a novel framework called XClusterMaint which serves for both clustering and maintenance of the XML documents. For clustering, we take both structure and content into account and propose an efficient solution for grouping the documents based on the combination of structure and content similarity. For maintenance, we propose an incremental approach for maintaining the existing clusters dynamically when we receive new incoming XML documents. Since the dynamic maintenance of the clusters is computationally expensive, we also propose an improved approach which uses a lazy maintenance scheme to improve the performance of the clusters maintenance. The experimental results on real datasets verify the efficiency of the proposed clustering and maintenance model.
Published: 2017

118. Accessing XML Documents Using the XPath Data Model

Author: Bipin Joshi
Subjects: Information retrieval, Database, computer.internet_protocol, Computer science, XPath 2.0, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, XML validation, XSLT, computer.software_genre, XML database, XML Schema (W3C), Simple API for XML, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, computer.programming_language, XPath
Abstract: These classes allow you to access the underlying documents, but by themselves they hardly provide a way to query and retrieve the data. That is why we need something that allows us to navigate, query, and retrieve data from XML documents easily and efficiently. The XPath standard is designed to do just that.
Published: 2017
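
As a small illustration of the XPath idea in the abstract above: the chapter itself covers the .NET Framework's XPath classes, but the same kind of query can be sketched in Python, whose standard `xml.etree.ElementTree` module supports a limited XPath subset. The catalog document below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<catalog>"
    "<book year='2017'><title>First</title></book>"
    "<book year='2014'><title>Second</title></book>"
    "</catalog>"
)

# Child step with an attribute predicate: titles of books published in 2017.
titles_2017 = [t.text for t in doc.findall("./book[@year='2017']/title")]

# Descendant step: every <title> anywhere below the root.
all_titles = [t.text for t in doc.findall(".//title")]

print(titles_2017, all_titles)
```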

119. Towards the XML schema measurement based on mapping between XML and OO domain

Author: Gordana Rakic, Marjan Hericko, Maja Pusnik, and Zoran Budimac
Subjects: Document Structure Description, XML Encryption, Programming language, Computer science, Efficient XML Interchange, XML validation, computer.file_format, computer.software_genre, XML Schema Editor, Streaming XML, RELAX NG, XML schema, computer, computer.programming_language
Abstract: Measuring the quality of IT solutions is a priority in software engineering. Although numerous metrics for measuring object-oriented code already exist, measuring the quality of UML models or XML Schemas is still developing. One of the research questions in the overall research, led by the ideas described in this paper, is whether already defined object-oriented design metrics can be applied to XML schemas based on predefined mappings. In this paper, the basic ideas for this mapping are presented. The mapping is a prerequisite for a future approach to measuring XML schema quality with object-oriented metrics.
Published: 2017

120. Transformation of XML Data Sources for Sequential Path Mining

Author: Guoze Zhao, Yaxin Bi, Bing Han, and Ruth McNerlan
Subjects: XML tree, Theoretical computer science, Computer science, Efficient XML Interchange, XML Signature, XML validation, computer.file_format, computer.software_genre, XML framework, XML database, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, Data mining, computer, computer.programming_language
Abstract: In recent years XML has become one of the most promising ways to define semi-structured data. Data mining techniques devised for detecting interesting patterns in semi-structured data have also grown in popularity, but carrying out such techniques on XML data can be problematic due to its hierarchical structure. Therefore, it has become necessary to transform XML into flattened path data, so as to enable data mining to be carried out efficiently. However, problems may arise when the XML tree needs to be reconstructed from the traversal path. There are currently many transformation techniques for XML data, many of which take advantage of its tree-like hierarchical structure, but most of these approaches do not allow the XML tree to be reconstructed from the traversal path. In this paper we propose a new approach to the transformation of XML data into path data. The new approach employs a 5-step transformation process along with a new ‘Postorder Sequencing’ method of traversing the XML tree. The proposed method, on the one hand, can be seen as an efficient and effective way of transforming XML data into collections of paths, and on the other hand enables XML trees to be generated from the traversal paths.
Published: 2017

121. Manipulating XML Documents Using the Document Object Model

Author: Bipin Joshi
Subjects: Document Structure Description, Information retrieval, computer.internet_protocol, Computer science, XML validation, Well-formed document, Document type definition, World Wide Web, Simple API for XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, computer, XML, XML Catalog, computer.programming_language
Abstract: You also learned that the Document Object Model (DOM) is a set of APIs for manipulating XML documents. To that end, this chapter covers all the essential functionality offered by the .NET Framework’s DOM classes.
Published: 2017

122. IDEF0-DIAGRAM INTO DATABASE CONVERSION APPROACH DEVELOPMENT

Author: A.A. Vakalyuk and S.N. Basmanov
Subjects: IDEF0-diagram, Document Structure Description, computer.internet_protocol, Computer science, Efficient XML Interchange, functional blocks, XML Signature, XML file, computer.software_genre, information system, interface arrows, XML schema, computer.programming_language, Database, UDC 004.9, XML validation, General Medicine, computer.file_format, XML framework, XML database, computer, XML
Abstract: Andrey Aleksandrovich Vakalyuk, Cand. Sci. (Eng.), associate professor, Department of Mechatronics, Ural State University of Railway Transport, Ekaterinburg, Russian Federation; avakalyuk@yandex.ru. Sergey Nikolaevich Basmanov, postgraduate student, Department of Mechatronics, Ural State University of Railway Transport, Ekaterinburg, Russian Federation; seregabasmanov@rambler.ru. An approach is developed for converting an IDEF0 diagram, created in the business-process design tool CA ERwin Process Modeler, into a database by analyzing the *.XML file, in order to model an organization's business processes within an information system. The main tags of the *.XML file were analyzed to recognize the names of, and links between, functional blocks, their decompositions and interface connections. Based on the analysis of the *.XML file, a database structure was developed in which all tables are related one-to-many by means of a foreign key. The results reflect current tasks facing organizations; solving them would allow an enterprise to move to a new technological and organizational level and compete more effectively in modern economic conditions. The work was carried out within specialty 05.13.01, Systems Analysis, Control and Information Processing (by branch).
Published: 2017

123. Validating XML Documents

Author: Bipin Joshi
Subjects: Document Structure Description, Information retrieval, Database, Computer science, Efficient XML Interchange, XML Signature, XML validation, computer.file_format, computer.software_genre, XML database, XML Schema Editor, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, computer, computer.programming_language
Abstract: However, in many real-world cases this assumption may not be true. For example, a purchase order application might be accepting orders from various customers in XML format. What is the guarantee that each submitted order adheres to the agreed-on XML structure? What if somebody deviates from the agreed-on structure? This is where XML Schema comes into the picture.
Published: 2017
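
The purchase-order scenario in the abstract above can be sketched in Python. Note the substitution: the chapter is about XML Schema, but Python's standard library has no XML Schema validator, so this sketch uses a hand-written structural check as a simplified stand-in. The `order` root element, the required child names and the quantity rule are hypothetical, not taken from the book.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreed-on structure for a purchase order.
REQUIRED_CHILDREN = ("customer", "item", "quantity")


def check_order(xml_text):
    """Return a list of problems; an empty list means the order has the expected shape."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    if root.tag != "order":
        problems.append(f"unexpected root element <{root.tag}>")
    for name in REQUIRED_CHILDREN:
        if root.find(name) is None:
            problems.append(f"missing <{name}>")
    quantity = root.findtext("quantity", default="")
    if quantity and not quantity.strip().isdigit():
        problems.append("quantity is not a non-negative integer")
    return problems


good = "<order><customer>ACME</customer><item>Widget</item><quantity>3</quantity></order>"
bad = "<order><customer>ACME</customer><quantity>three</quantity></order>"

print(check_order(good))
print(check_order(bad))
```

A real schema expresses these same rules declaratively, so they can be shared between the parties exchanging orders instead of being re-coded by each receiver.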

124. An efficient XML query pattern mining algorithm for ebXML applications in e-commerce

Author: Tsui-Ping Chang
Subjects: XML Encryption, Information retrieval, Computer science, Efficient XML Interchange, XML Signature, XML validation, computer.file_format, computer.software_genre, XML database, ebXML, XML Schema Editor, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, General Earth and Planetary Sciences, computer, General Environmental Science
Abstract: Providing efficient querying of XML data for ebXML applications in e-commerce is crucial, as XML has become the most important technique for exchanging data over the Internet. ebXML is a set of specifications for companies to exchange their data in e-commerce. Following the ebXML specifications, companies have a standard method to exchange business messages and communicate data and business rules in e-commerce. Due to its tree-structure paradigm, XML is well suited to storing and querying complex data for ebXML applications. Therefore, discovering frequent XML query patterns has become an interesting topic for XML data management in ebXML applications. The study presents an efficient mining algorithm, namely ebX2Miner, to discover the frequent XML query patterns for ebXML applications. Unlike the existing algorithms, the study proposes a new idea: encoding the XML user queries and then storing these codes to generate the frequent XML user query patterns. Furthermore, the simulation results show that ebX2Miner outperforms other algorithms in execution time and memory usage. Key words: XML query pattern mining, XML query, encoding scheme, ebXML, e-commerce.
Published: 2014

125. Does discarding XML declarations and changing file extensions improve the indexability and visibility of metadata tag names in web search engines?

Author: Sayyed Ramatollah Fattahi, Sayyed Mahdi Taheri, and Nadjla Hariri
Subjects: Information retrieval, computer.internet_protocol, Computer science, Efficient XML Interchange, XML Signature, XML validation, XML Base, computer.file_format, Library and Information Sciences, Metadata, World Wide Web, XML Schema Editor, Web search engine, computer, XML, Information Systems
Abstract: The aim of the study was to find out whether discarding XML declarations and changing file extensions (i.e. .xml) improve the indexability and visibility of DCXML, MARCXML and MODS element tag names in Web search engines. Two groups of metadata records were included in an experimental study: an experimental group composed of 300 XML-based records without XML declarations and with file extensions according to the name of related metadata standards, and a control group composed of 300 XML-based records with the normal XML structure. These were analysed through an experimental approach. The two sets of records were published on two separate websites and then the sites were introduced to Google and Yahoo!. Findings showed that Google and Yahoo! indexed and retrieved all the tag names relating to the experimental group. However they did not index the tag names in the control group’s records. Based on the findings, some patterns are suggested to metadata creators and Web search engine developers.
Published: 2014

126. A Labeling Methods for Keyword Search over Large XML Documents

Author: Dong-Han Sun and Soo-Chan Hwang
Subjects: Document Structure Description, XML Encryption, Information retrieval, Computer science, Efficient XML Interchange, XML Signature, XML validation, computer.file_format, computer.software_genre, XML database, XML Schema Editor, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, computer, computer.programming_language
Abstract: As XML documents are getting bigger and more complex, a keyword-based search method that does not require structural information is needed to search these large XML documents. In order to use this method, not only must all keywords expressed as nodes in the XML document be labeled for indexing, but structural information should also be well represented. However, the existing labeling methods either keep only very simple information about XML documents in the index or represent the structural information in a way that copes poorly with the growth of XML documents' size. As XML documents get larger, this causes either poor keyword search performance or an exponential increase in space usage. In this paper, we present the Repetitive Prime Labeling Scheme (RPLS) to address these problems of the existing labeling methods for keyword-based search of large XML documents. This method is based on the existing prime number labeling method and allows a parent's prime number to be used repeatedly at a lower level so that the number of prime numbers being generated can be reduced. Then, we show an experimental result of the comparison between our method and the existing methods.
Published: 2014

127. Tree Based Association Rules for Mining in XML Query -Answerin

Author: Theepigaa. Th, Suganya. P, Rajeswari. S, and Radha. M
Subjects: Document Structure Description, Information retrieval, Database, computer.internet_protocol, Computer science, Efficient XML Interchange, XML validation, computer.file_format, Query optimization, computer.software_genre, XML database, XML Schema Editor, XML schema, computer, XML, computer.programming_language
Published: 2014

128. The Research and Application of Web Data Mining Based on XML

Author: Li Juan Du
Subjects: Computer science, computer.internet_protocol, SOAP, Efficient XML Interchange, XML Signature, Concept mining, XML Base, computer.software_genre, World Wide Web, Text mining, XML Schema Editor, Streaming XML, Binary XML, business.industry, Data stream mining, XML validation, General Medicine, computer.file_format, XML framework, XML Schema (W3C), XML database, Data model, Web mining, The Internet, business, computer, XML
Abstract: With the development of computer and network technology, data mining based on database tables can no longer satisfy demand. The Internet has brought huge amounts of information resources, but the knowledge implied in them has not been fully used; Web mining technology has therefore become a hotspot in high-tech research. XML allows structured data from different sources to be combined easily, making it possible to search diverse, incompatible databases, which brings new opportunities for Web data mining. Through a study of the application of XML in Web data mining, this paper proposes an XML-based data mining system structure.
Published: 2014

129. The XML Data Mining Research Based on the Multi-Level Technology

Author: Ping Fang Hu and Su Yu Huang
Subjects: Document Structure Description, XML Encryption, Information retrieval, Database, computer.internet_protocol, Data stream mining, Computer science, Relational database, cXML, Efficient XML Interchange, XML Signature, XML validation, General Medicine, computer.file_format, computer.software_genre, XML framework, Simple API for XML, XML database, XML Schema (W3C), Data exchange, XML Schema Editor, Streaming XML, computer, XML
Abstract: XML has become the standard form of data exchange, and more and more data is stored in this form. These data contain a great deal of implicit knowledge, which calls for data mining. At present, most XML data mining methods first convert the XML data into relational data in a preprocessing step and then apply traditional methods; the mining process is complex and the results are not ideal. Therefore, effective methods for mining XML data directly are urgently needed.
Published: 2014

130. An Expansion Method of XML Element Retrieval Techniques into Web Documents

Author: Jun Miyazaki, Atsushi Keyaki, and Kenji Hatano
Subjects: Document Structure Description, XML Encryption, Information retrieval, Computer science, computer.internet_protocol, Efficient XML Interchange, Well-formed document, XML validation, computer.file_format, World Wide Web, XML Schema Editor, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, XML Catalog
Abstract: In this paper, we propose a method to expand XML element retrieval techniques into Web documents. XML element retrieval techniques return partial (sub) documents as search results, and are expected to be able to apply to other structured documents, namely, Web documents besides XML documents. The point is that physical document structures of Web documents are literally disorganized because Web documents are generated for not managing data but rendering on a Web browser. As another feature of Web documents, they contain many incomprehensive contents for human readers. To address challenges caused by these features, we propose 1) a reconstruction method of document structures according to logical structures of contents and 2) a filter for removing unimportant content which does not convey useful information to users. Our experimental evaluations showed that our proposed method improved search accuracy compared with both naive XML element retrieval approach and document retrieval approach.
Published: 2014

131. Research on XML Documents and Relational Database Mapping Based on XML Schema

Author: Xia Zhou
Subjects: Document Structure Description, XML Encryption, Computer science, computer.internet_protocol, Relational database, Semi-structured model, Efficient XML Interchange, XML Signature, XML Base, computer.software_genre, XML Schema Editor, Streaming XML, RELAX NG, XML schema, computer.programming_language, Database model, Information retrieval, Database, cXML, Database schema, XML validation, General Medicine, computer.file_format, XML framework, XML database, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, XML Catalog
Abstract: In order to implement the mapping between XML documents and relational databases, this paper proposes a mapping method based on XML documents and relational databases. While preserving the documents' hierarchy, order and uniqueness, this method can map XML documents to the corresponding relational database very quickly and implement the reconstruction of XML documents.
Published: 2014

132. A Sliding-Window Method to Discover Recent Frequent Query Patterns from XML Query Streams

Author: Tsui-Ping Chang
Subjects: Document Structure Description, XML Encryption, Information retrieval, Computer Networks and Communications, Computer science, Efficient XML Interchange, InformationSystems_DATABASEMANAGEMENT, XML Signature, XML validation, computer.file_format, computer.software_genre, Query optimization, Computer Graphics and Computer-Aided Design, XML database, Artificial Intelligence, Streaming XML, Data mining, computer, Software
Abstract: Providing efficient mining algorithm to discover recent frequent XML user query patterns is crucial, as many applications use XML to represent data in their disciplines over the Internet. These recent frequent XML user query patterns can be used to design an index mechanism or cached and thus enhance XML query performance. Several XML query pattern stream mining algorithms have been proposed to record user queries in the system and thus discover the recent frequent XML query patterns over a stream. By using these recent frequent XML query patterns, the query performance of XML data stream is improved. In this paper, user queries are modeled as a stream of XML queries and the recent frequent XML query patterns are thus mined over the stream. Data-stream mining differs from traditional data mining since its input of mining is data streams, while the latter focuses on mining static databases. To facilitate the one-pass mining process, novel schemes (i.e. XstreamCode and XstreamList) are devised in the mining algorithm (i.e. X2StreamMiner) in this paper. X2StreamMiner not only reduces the memory space, but also improves the mining performance. The simulation results also show that X2StreamMiner algorithm is both efficient and scalable. There are two major contributions in this paper. First, the novel schemes are proposed to encode and store the information of user queries in an XML query stream. Second, based on the two schemes, an efficient XML query stream mining algorithm, X2StreamMiner, is proposed to discover the recent frequent XML query patterns.
Published: 2014

133. Semantic-based Structural and Content indexing for the efficient retrieval of queries over large XML data repositories

Author: Eric Pardede, Wenny Rahayu, and Norah Saleh Alghamdi
Subjects: Document Structure Description, XML Encryption, Computer Networks and Communications, computer.internet_protocol, Computer science, Efficient XML Interchange, XML Signature, External Data Representation, computer.software_genre, Query optimization, Twig, Simple API for XML, Data retrieval, Schema (psychology), Streaming XML, Binary XML, XML schema, computer.programming_language, Information retrieval, Search engine indexing, XML validation, computer.file_format, XML framework, XML Schema (W3C), XML database, Hardware and Architecture, Data mining, computer, Software, XML
Abstract: The emergence of XML adoption as semi-structured data representation in multi-disciplinary domains has highlighted the need to support the optimization of complex data retrieval processing. In a Big Data environment, the need to speed up data retrieval processes has further grown significantly. In this paper, we have adopted an optimization approach that takes into consideration the semantics of the dataset in order to deal with the complexity of multi-disciplinary domains in Big Data, in particular when the data is represented as XML documents. Our method particularly addresses a twig XML query (or a branched path query), as it is one of the most costly query tasks due to the complexity of the join operation between multiple paths. Our work focuses on optimizing the structural and the content part of XML queries by presenting a method for indexing and processing XML data based on the concept of objects that is formed from the semantic connectivity between XML data nodes. Our method performs object-based data partitioning, which aims at leveraging the notion of frequently-accessed data subsets and putting these subsets together into adjacent partitions. Then, it evaluates branched queries through two essential components: (i) Structural and Content indexing, which use an object-based connection to construct indices i.e. Schema Index, Data Index and Value Index; and (ii) query processing to produce the final results in optimal time. At the end of this paper, a set of experimental results for the proposed approach on a range of real and synthetic XML data, as well as a comparative study with other related work in the area, are presented to demonstrate the effectiveness of our proposed method in terms of CPU cost, matching and merging cost, scalability (size and number of branches) and total number of scanned elements. Our evaluation demonstrates the benefit of the proposed index in terms of performance speed as well as scalability which is critical in a large data repository.
Published: 2014

134. Mining Approximate Keys based on Reasoning from XML Data

Author: Yijun Liu, Sheng He, Jixue Liu, and Feiyue Ye
Subjects: Document Structure Description, Numerical Analysis, XML Encryption, Information retrieval, Database, Computer science, Applied Mathematics, Efficient XML Interchange, key implication, XML Signature, XML validation, computer.file_format, XML, computer.software_genre, Computer Science Applications, Computational Theory and Mathematics, XML Schema Editor, Streaming XML, keys, XML schema, computer, support and confidence, Analysis, computer.programming_language
Abstract: Keys are very important for data management. Due to the hierarchical structure and syntactic flexibility of XML, mining keys from XML data is a more complex and difficult task than from relational databases. In discovering keys from XML data there are some challenges in practice such as unclearness of keys, storage of enormous keys, efficient mining algorithms, etc. In this paper, in order to fill the gap between theory and practice, we propose a novel approximate measure of the support and confidence for XML keys on the basis of the number of null values on key paths. In the mining process, inference rules are used to derive new keys. Through the two-phase reasoning, a target set of approximate keys and its reduced set are obtained. Our research conducted experiments over ten benchmark XML datasets from XMark and four files in the UW XML Repository. The results show that the approach is feasible and efficient, with which effective keys in various XML data can be discovered. Refereed/Peer-reviewed
Published: 2014

135. eXtensible Markup Language access control model with filtering privacy based on matrix storage

Author: Haitao Wu, Lihong Guo, Jian Wang, and He Du
Subjects: Document Structure Description, XML Encryption, RuleML, computer.internet_protocol, Computer science, Efficient XML Interchange, XML Signature, Well-formed document, Document type definition, computer.software_genre, Simple API for XML, XML Schema Editor, Streaming XML, XML schema, Electrical and Electronic Engineering, SGML, computer.programming_language, XHTML, Information retrieval, Database, XML validation, computer.file_format, Computer Science Applications, XML framework, XML database, XML Schema (W3C), Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, PCDATA
Abstract: With eXtensible Markup Language (XML) becoming a ubiquitous language for data storage and transmission in various domains, effectively safeguarding XML documents containing sensitive information is a critical issue. In this study, the authors propose a new access control model with filtering privacy. Based on the idea of separating the structure and content of the XML document, they provide a method to extract the main structure of the XML document and use a matrix to save the structure information. At the same time, start-end region encoding is used to combine the corresponding structure and content. This not only saves storage space but also speeds up the search and makes it convenient to find the relevant elements, especially the related content. To evaluate the security and efficiency of this model, its performance is verified through security analysis and a simulation experiment.
Published: 2014

136. A study of PL / SQL Procedure for the Automatic Generation of XML Documents

Author: Bong-Im Jang, Chang-Su Kim, and Hoe-Kyung Jung
Subjects: Document Structure Description, SQL, Information retrieval, Database, Relational database, Computer science, XML validation, PL/SQL, computer.software_genre, Anesthesiology and Pain Medicine, Streaming XML, Table (database), Stored procedure, computer, computer.programming_language
Abstract: Currently, XML is a standard language used to exchange data. Most of the data in the file system is not stored in the database system. Data stored in an object-oriented database can be represented by a hierarchical structure. In a relational database, however, tables exist independently of one another, so hierarchical data cannot be expressed directly. In this paper, a PL/SQL procedure is designed that defines the structure of existing data and automatically generates XML documents from it, without changing the data in the database or building a new database.
Published: 2014

137. Unleashing XQuery for Data-Independent Programming

Author: Caetano Sauer and Sebastian Bächle
Subjects: SQL, Computer science, computer.internet_protocol, Programming language, Efficient XML Interchange, XML validation, computer.file_format, computer.software_genre, XQuery, XML database, Streaming XML, Data control language, computer, XML, computer.programming_language
Abstract: The XQuery language was initially developed as an SQL equivalent for XML data, but its roots in functional programming make it also a perfect choice for processing almost any kind of structured and semi-structured data. Apart from standard XML processing, however, advanced language features make it hard to efficiently implement the complete language for large data volumes. This work proposes a novel compilation strategy that provides both flexibility and efficiency to unleash XQuery’s potential as data programming language. It combines the simplicity and versatility of a storage-independent data abstraction with the scalability advantages of set-oriented processing. Expensive iterative sections in a query are unrolled to a pipeline of relational-style operators, which is open for optimized join processing, index use, and parallelization. The remaining aspects of the language are processed in a standard fashion, yet can be compiled anytime to more efficient native operations of the actual runtime environment. This hybrid compilation mechanism yields an efficient and highly flexible query engine that is able to drive any computation from simple XML transformation to complex data analysis, even on non-XML data. Experiments with our prototype and state-of-the-art competitors in classic XML query processing and business analytics over relational data attest the generality and efficiency of the design.
Published: 2014

138. IETM Database Design Based on Native XML Database Technology

Author: Jian Qiang Zhang, Hong Yan Zhao, and Jun Zhang
Subjects: Information retrieval, Database, Relational database, computer.internet_protocol, Computer science, General Engineering, Database schema, XML validation, computer.software_genre, Database design, XML database, Streaming XML, computer, IETM, XML
Abstract: The database is the foundation of making an IETM, and the database structure has great influence on the IETM production method and use efficiency. Aiming at the defects of using a relational database to process XML documents, and according to the characteristics of IETM data modules and information objects under the S1000D standard, the conditions that a native XML database needs to meet are analyzed, and native XML database technology is used to structure IETM data modules and information sets. The data module code, information objects, and information control composition structure codes of the data module structure are given, together with a design method for the IETM database storage and index model, which can effectively avoid the deficiencies of traditional IETM databases in supporting XML technology.
Published: 2014

139. Research of Information Integration Based on XML Schema Matching

Author: Ya Ling Zhu, He Ping Gou, and Yong Xia Jing
Subjects: Document Structure Description, computer.internet_protocol, Computer science, SOAP, Semi-structured model, Efficient XML Interchange, computer.software_genre, Schema matching, Information schema, XML Schema Editor, Schema (psychology), Streaming XML, XML schema, computer.programming_language, Information retrieval, Database, Ontology-based data integration, XML validation, General Medicine, computer.file_format, Enterprise information integration, XML Schema (W3C), Data exchange, Document Schema Definition Languages, Star schema, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Web service, computer, XML, Information integration
Abstract: Information integration is one of the core problems for enterprises, and the automatic integration of heterogeneous information is important to solving it. This paper therefore presents a method for information integration based on XML schema matching to realize information transport and integration across heterogeneous platforms. In this method, Web services are adopted to resolve the heterogeneity of data platforms, the information is transformed to an XML model, schema integration is completed according to XML schema matching, and the mapping relationship between different XML schema elements is created accordingly. When the user accesses the heterogeneous information, the mediator implements the instance integration according to this mapping relationship. Experiments indicate that this method can realize information integration automatically.
Published: 2014

140. Efficiently Subtree Matching between XML and Probabilistic XML Documents

Author: Chang Yong Yu, Miao Fang, Hai Tao Ma, and Chang Ming Xu
Subjects: Document Structure Description, XML tree, Uncertain data, Computer science, computer.internet_protocol, Computer Science::Information Retrieval, XML validation, General Medicine, Similarity measure, computer.software_genre, Tree (data structure), XML database, Simple API for XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Edit distance, Data mining, XML schema, computer, Computer Science::Databases, XML, computer.programming_language
Abstract: We explored the subtree matching problem of probabilistic XML documents: finding the matches of an XML query tree over a probabilistic XML document, using the canonical tree edit distance as a similarity measure between subtrees. Probabilistic XML is a probability distribution model capturing uncertainty of both value and structure. Querying probabilistic XML documents is difficult: a naive algorithm has exponential complexity because it directly computes the tree edit distance between the query tree and each certain XML tree represented by the probabilistic XML document. Based on the method of tree edit distance computation over certain XML subtrees, we defined a minimum solution to the edit distance computation, which means the minimum cost to translate the query tree to the probabilistic XML tree. Furthermore, we developed an algorithm, ASM (Algorithm of Subtree Matching), to compute the minimum solution. Finally, we proved that the complexity of ASM is linear in the size of the probabilistic XML document.
Published: 2014

141. Air Indexing for On-Demand XML Data Broadcast

Author: Peng Liu, Baihua Zheng, Weiwei Sun, Ping Yu, Jian Zhang, Zhuoyao Zhang, Yongrui Qin, and Jingjing Wu
Subjects: XML Encryption, Database, Computer science, computer.internet_protocol, Search engine indexing, Efficient XML Interchange, cXML, XML Signature, XML validation, computer.file_format, computer.software_genre, XML framework, XML database, Simple API for XML, Computational Theory and Mathematics, Hardware and Architecture, Signal Processing, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Binary XML, computer, XML
Abstract: XML data broadcast is an efficient way to disseminate semistructured information in wireless mobile environments. In this paper, we propose a novel two-tier index structure to facilitate the access of XML document in an on-demand broadcast system. It provides the clients with an overall image of all the XML documents available at the server side and hence enables the clients to locate complete result sets accordingly. A pruning strategy is developed to cut down the index size and a two-tier structure is proposed to further remove any redundant information. In addition, two index distribution strategies, namely naive distribution and partial distribution, have been designed to interleave the index information with the XML documents in the wireless channels. Theoretical analysis and simulation experiments are also put forward to show the benefits of our indexing methods.
Published: 2014

142. Using Personalization to Improve XML Retrieval

Author: Eduardo Vicente-López, Luis M. de Campos, Juan M. Fernández-Luna, and Juan F. Huete
Subjects: Document Structure Description, computer.internet_protocol, Computer science, Efficient XML Interchange, XML Signature, Well-formed document, computer.software_genre, Personalization, World Wide Web, Query expansion, Simple API for XML, XML Schema Editor, Streaming XML, XML schema, Document retrieval, computer.programming_language, User profile, Information retrieval, XML validation, computer.file_format, Computer Science Applications, Personalized search, XML framework, XML database, Computational Theory and Mathematics, Human–computer information retrieval, computer, XML, Information Systems, XML Catalog, XML retrieval
Abstract: As the amount of information increases every day and the users normally formulate short and ambiguous queries, personalized search techniques are becoming almost a must. Using the information about the user stored in a user profile, these techniques retrieve results that are closer to the user preferences. On the other hand, the information is being stored more and more in a semi-structured way, and XML has emerged as a standard for representing and exchanging this type of data. XML search allows a higher retrieval effectiveness, due to its ability to retrieve and to show the user specific parts of the documents instead of the full document. In this paper we propose several personalization techniques in the context of XML retrieval. We try to combine the different approaches where personalization may be applied: query reformulation, re-ranking of results and retrieval model modification. The experimental results obtained from a user study using a parliamentary document collection support the validity of our approach.
Published: 2014

143. XML Data Mining Model Based on Rough Set Theory

Author: Gang Wang, Wei Ping Li, and Jie Yang
Subjects: Document Structure Description, Information retrieval, Data stream mining, Computer science, computer.internet_protocol, Relational database, Efficient XML Interchange, InformationSystems_DATABASEMANAGEMENT, XML validation, General Medicine, computer.file_format, computer.software_genre, Data warehouse, XML database, XML Schema (W3C), Simple API for XML, XML Schema Editor, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Predictive Model Markup Language, Data mining, computer, XML
Abstract: The emergence of a large amount of XML data poses a new challenge to the data mining field. Traditional data mining is based on relational databases and data warehouses, so how to mine data in XML form has become a hot research issue. Because an XML document is a kind of semi-structured data, traditional data mining methods are not applicable to XML data. This paper puts forward an XML mining model based on rough set theory and carries out an experiment; the results show that applying rough set theory to XML data mining is feasible.
Published: 2014

144. The Research of Heterogeneous Database Migration Based on XML and Middleware

Author: Yao Wen Xia and Sai Dong Lv
Subjects: Document Structure Description, XML Encryption, Computer science, computer.internet_protocol, XSL, Semi-structured model, Efficient XML Interchange, Data transformation, XML Signature, XML Base, computer.software_genre, Database design, Database testing, Simple API for XML, XML Schema Editor, Streaming XML, XML schema, computer.programming_language, Database model, Database, Database schema, XML validation, General Medicine, computer.file_format, XML framework, Open Database Connectivity, XML database, Data extraction, Middleware, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, Data migration
Abstract: To address the problem of transferring data between heterogeneous databases when government departments or enterprises upgrade to new systems, this paper puts forward a heterogeneous database data migration model based on XML and middleware. The model uses XML technology to build a middleware that connects to the source database and target database through the ODBC interface. After extracting data from the source database into unified XML documents, it converts them, according to the mapping relationship between the old and new databases and an XSL document, into XML documents that meet the needs of the target database, and then loads the data into the target database. The model includes data extraction, transformation, validation, writing, and other functions, and is cross-platform, easily extensible, and reusable.
Published: 2014

145. The Study & Implementation of the Model of Heterogeneous Data Exchange Based on XML

Author: Yao Wen Xia
Subjects: Document Structure Description, XML Encryption, Computer science, SOAP, computer.internet_protocol, Semi-structured model, Efficient XML Interchange, XML Signature, XML Base, computer.software_genre, Simple API for XML, XML Schema Editor, Streaming XML, Binary XML, XML schema, computer.programming_language, Database, Programming language, XML validation, General Medicine, computer.file_format, Geography Markup Language, XML framework, XML database, XML Schema (W3C), Data exchange, computer, XML
Abstract: Facing the "information island" problem in real applications, and based on an analysis of the conflicts in data exchange between heterogeneous databases, this paper proposes a data exchange model with XML as the intermediate format. It makes full use of the features and advantages of XML, which is easy to extend, offers good interactivity and strong semantics, can be formatted, is easy to process, and is platform-independent, and it uses the C# language and XML-related technologies to carry out the detailed design and implementation of the model's main modules. Each function module in the model is relatively independent, extensible, and highly general.
Published: 2014

146. Performance Enhancement of XML Parsing By Using Artificial Neural Network

Author: Yugandhara V. Dhepe and G.R. Bamnote
Subjects: Document Structure Description, XML Encryption, Information retrieval, Computer science, Computer Science::Information Retrieval, Computer Science::Neural and Evolutionary Computation, Efficient XML Interchange, General Engineering, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), XML validation, computer.file_format, computer.software_genre, Simple API for XML, XML database, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, Data mining, computer, Computer Science::Databases, computer.programming_language
Abstract: XML is used for data representation and exchange. XML data processing is becoming more and more important for server workloads such as web servers and database servers. One of the most time-consuming parts is XML document parsing. Parsing is a core operation performed before an XML document can be navigated, queried, or manipulated. Recently, high-performance XML parsing has become a topic of considerable interest. This paper proposes a mechanism for efficiently processing XML documents with the help of an Artificial Neural Network (ANN). A set of XML documents is provided to the system, and the parsing results are stored in the database. With the help of the ANN, the performance of XML parsing is expected to improve through reduced parsing time. The Levenberg-Marquardt algorithm (LMA) is used to train the neurons in the ANN to recognize XML document patterns.
Published: 2014

147. Fast Computational Mining Technique for XML Query Answering Support

Author: J. Jabez and R. Brindhadevi
Subjects: Document Structure Description, Information retrieval, Computer science, computer.internet_protocol, Efficient XML Interchange, XML Signature, XML validation, computer.file_format, computer.software_genre, XML framework, XML Schema (W3C), XML database, XML Schema Editor, Schema (psychology), Streaming XML, XML schema, Binary XML, computer, XML, computer.programming_language, XML Catalog
Abstract: The database research field has focused on the Extensible Markup Language (XML) because of its flexible, hierarchical nature, which can be used to represent huge amounts of data; moreover, XML does not have an absolute and fixed schema, and its structure may be irregular and incomplete. Extracting information from semi-structured documents is a hard task, and it is set to become more challenging as the amount of digital information available on the Internet grows. Indeed, because documents are often so large, the data set returned in response to a query may be too big to convey interpretable knowledge. This paper describes an approach based on Tree-Based Association Rules (TARs), which provide approximate, intensional information about both the structure and the contents of XML documents, and which can themselves be stored in XML format. This mined knowledge is used to provide a concise idea of both the structure and the content of the XML document, and quick, approximate answers to queries whenever needed.
Published: 2014

148. Dealing with structural patterns of XML documents

Author: Fabio Vitali, Angelo Di Iorio, Francesco Poggi, and Silvio Peroni
Subjects: Document Structure Description, Information Systems and Management, Markup language, Computer Networks and Communications, Computer science, computer.internet_protocol, Efficient XML Interchange, XML Signature, Document type definition, Library and Information Sciences, Ontology (information science), computer.software_genre, XML Schema Editor, Schema (psychology), Streaming XML, RELAX NG, XML schema, SGML, computer.programming_language, Information retrieval, cXML, XML validation, computer.file_format, XML framework, XML database, XML Schema (W3C), Document Definition Markup Language, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, computer, XML, Information Systems, XML Catalog
Abstract: Evaluating collections of XML documents without paying attention to the schema they were written in may give interesting insights into the expected characteristics of a markup language, as well as any regularity that may span vocabularies and languages, and that are more fundamental and frequent than plain content models. In this paper we explore the idea of structural patterns in XML vocabularies, by examining the characteristics of elements as they are used, rather than as they are defined. We introduce from the ground up a formal theory of 8 plus 3 structural patterns for XML elements, and verify their identifiability in a number of different XML vocabularies. The results allowed the creation of visualization and content extraction tools that are completely independent of the schema and without any previous knowledge of the semantics and organization of the XML vocabulary of the documents.
Published: 2014

149. Integrating Semantic Web technologies with XML Schema using role-mapping annotations

Author: Jang Yang Lee, I-Ching Hsu, Kuan-Yang Lai, and Der-Chen Huang
Subjects: Document Structure Description, XML Encryption, Information retrieval, Computer science, Efficient XML Interchange, XML Signature, XML validation, computer.file_format, Library and Information Sciences, Computer Science Applications, World Wide Web, XML Schema Editor, Streaming XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, RELAX NG, computer
Abstract: Purpose – XML Schema is used to define the schema of XML documents, which have become standards for data exchange in various Web-based information applications. The main problem of XML Schema is that it emphasizes syntax and format rather than semantics and knowledge representation. Hence, even though it has the advantage of describing the structure and constraining the contents of XML documents, XML Schema lacks the computer-interpretability needed to support knowledge representation for existing information systems. The purpose of this study is to propose role-mapping annotations for XML Schema (RMAXS) to integrate the Semantic Web with XML Schema, which facilitates interoperability between adjoining layers of the Semantic Web stack. Design/methodology/approach – XML, XML Schema, ontologies, and rules can be completely integrated into a multi-layered intelligent framework (MIF) for XML-based applications in the current web environment. This work presents a semantic-role-mapping intelligent system, called SRMIS, based on the MIF. SRMIS consists of an XML-based document repository, a search engine, an inference engine, and a transformation engine, which provide different approaches to presenting the various metadata and knowledge representations. Findings – The traditional Semantic Web stack has three gaps between adjoining layers. The first gap, between the XML and XML Schema layers, can be bridged with an XMLSchema-instance mechanism. The third gap, between the ontology and rule layers, can be bridged by building rules on top of ontologies. This study proposes RMAXS to bridge the second gap, between the XML Schema and ontology layers. The proposed MIF adopts these coupling technologies to facilitate interoperability between adjoining layers. Therefore, XML, XML Schema, ontologies, and rules can be completely integrated into the MIF for intelligent applications in the web environment. Practical implications – To demonstrate the SRMIS applications, this work implements a prototype that helps researchers find papers of interest. Originality/value – This work presents a semantic-role-mapping intelligent system, called SRMIS, based on the MIF. The proposed SRMIS can be applied in various application domains.
Published: 2014

150. Modest XML for Corpora: Not a standard, but a suggestion

Author: Andrew Hardie
Subjects: Information retrieval, Computer science, business.industry, computer.internet_protocol, Efficient XML Interchange, XML validation, PE1-3729, Document type definition, computer.file_format, computer.software_genre, English language, XML Schema Editor, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, XML schema, Artificial intelligence, Binary XML, business, computer, Natural language processing, XML, computer.programming_language, De facto standard
Abstract: This paper argues for, and presents, a modest approach to XML encoding for use by the majority of contemporary linguists who need to engage in corpus construction. While extensive standards for corpus encoding exist - most notably, the Text Encoding Initiative’s Guidelines and the Corpus Encoding Standard based on them - these are rather heavyweight approaches, implicitly intended for major corpus-building projects, which are rather different from the increasingly common efforts in corpus construction undertaken by individual researchers in support of their personal research goals. Therefore, there is a clear benefit to be had from a set of recommendations (not a standard) that outlines general best practices in the use of XML in corpora without going into any of the more technical aspects of XML or the full weight of TEI encoding. This paper presents such a set of suggestions, dubbed Modest XML for Corpora, and posits that such a set of pointers to a limited level of XML knowledge could work as part of the normal, general training of corpus linguists. The Modest XML recommendations cover the following set of things, which, according to the foregoing argument, are sufficient knowledge about XML for most corpus linguists’ day-to-day needs: use of tags; adding attribute value pairs; recommended use of attributes; nesting of tags; encoding of special characters; XML well-formedness; a collection of de facto standard tags and attributes; going beyond the basic de facto standard tags; and text headers.
Published: 2014
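
Two of the items this abstract lists, XML well-formedness and the encoding of special characters, can be checked mechanically. The sketch below is a minimal illustration using Python's standard `xml.etree.ElementTree` parser; it is not taken from the paper, and the helper name `is_well_formed` and the tag and attribute names in the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested tags, a quoted attribute value, and an escaped ampersand.
good = '<text id="t1"><s n="1">Tom &amp; Jerry</s></text>'

# Overlapping (improperly nested) tags are not well-formed.
bad_nesting = "<text><s>overlapping <b>tags</s></b></text>"

# A bare ampersand must be written as &amp; to be well-formed.
bad_entity = "<text><s>Tom & Jerry</s></text>"

results = {
    "good": is_well_formed(good),
    "bad_nesting": is_well_formed(bad_nesting),
    "bad_entity": is_well_formed(bad_entity),
}
```

A check like this confirms only well-formedness; it does not validate a corpus file against any particular tag set.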

4,571 results on '"XML validation"'
