38 results for "property graphs"
2. Property Graphs at Scale: A Roadmap and Vision for the Future (Short Paper)
- Author
Kondylakis, Haridimos, Efthymiou, Vassilis, Troullinou, Georgia, Ymeralli, Elisjana, and Plexousakis, Dimitris; series editors: van der Aalst, Wil, Ram, Sudha, Rosemann, Michael, Szyperski, Clemens, and Guizzardi, Giancarlo; editors: Almeida, João Paulo A., Di Ciccio, Claudio, and Kalloniatis, Christos
- Published
- 2024
3. The Property Graph Data Format (PGDF)
- Author
Renzo Angles, Sebastian Ferrada, and Ignacio Burgos
- Subjects
Graph databases, property graphs, graph data formats, PGDF, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Property graphs are popular in both industry and academia due to their versatility in modeling complex data across diverse application domains, ranging from social networks to knowledge graphs. Despite their popularity, there is no standardized data format for storing and exchanging property graphs. This paper introduces PGDF, a text-based data format for property graphs, designed to be both simple and flexible, while remaining expressive and efficient. The simplicity of PGDF comes from its tabular-like structure, where each line in a PGDF file contains a single schema or data declaration. PGDF offers great flexibility by allowing schema and data declarations to be combined in any order. This means that nodes and edges can each have their own distinct properties, providing greater adaptability and customization. The expressiveness of PGDF is defined by its ability to represent a wide range of property graph features. In this article, we describe the syntax and semantics of PGDF, outline methods for converting property graphs stored in multiple CSV files to PGDF and other graph data formats, and present an experimental evaluation comparing PGDF, YARS-PG, GraphML, and JSON-Neo4j. The experiments show that PGDF enables the production of smaller files more quickly compared to other graph data formats.
- Published
- 2024
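The PGDF abstract describes a tabular-like file in which each line is a single schema or data declaration, and schema and data lines may be interleaved so that different groups of nodes and edges carry different properties. The sketch below illustrates that idea only. Its pipe-separated syntax (an `@`-prefixed line declares the columns for the data lines that follow; `@id`, `@label`, `@src`, `@dst` are structural columns) is hypothetical and invented for this example, not the actual PGDF grammar.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)  # id -> {"label": str, "props": dict}
    edges: list = field(default_factory=list)  # {"src", "dst", "label", "props"}


def parse(text: str) -> PropertyGraph:
    """Parse a hypothetical line-oriented format (NOT the real PGDF grammar).

    '@'-prefixed lines are schema declarations naming the columns of the data
    lines that follow; a new schema line may appear anywhere, so each group of
    nodes or edges can carry its own property set.
    """
    g = PropertyGraph()
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        cols = line.split("|")
        if cols[0].startswith("@"):
            header = cols  # schema declaration for the lines that follow
            continue
        if header is None:
            raise ValueError("data line before any schema declaration")
        row = dict(zip(header, cols))
        # Columns starting with '@' are structural; empty cells mean "no value".
        props = {k: v for k, v in row.items() if not k.startswith("@") and v != ""}
        if "@src" in row:  # edge declaration
            g.edges.append({"src": row["@src"], "dst": row["@dst"],
                            "label": row.get("@label"), "props": props})
        else:  # node declaration
            g.nodes[row["@id"]] = {"label": row.get("@label"), "props": props}
    return g


SAMPLE = """\
@id|@label|name|age
n1|Person|Alice|30
n2|Person|Bob|
@id|@label|name|founded
c1|Company|Acme|1999
@src|@dst|@label|since
n1|c1|WORKS_AT|2020
"""

g = parse(SAMPLE)
print(len(g.nodes), len(g.edges))  # 3 1
print(g.nodes["n2"]["props"])       # {'name': 'Bob'}
```

Because every line stands alone, such a file can be streamed line by line; the only state the parser keeps is the most recent schema declaration.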
4. Model-to-Model Transformation: From UML Class Diagrams to Labeled Property Graphs.
- Author
León, Ana, Santos, Maribel Yasmina, García, Alberto, Casamayor, Juan Carlos, and Pastor, Oscar
- Abstract
Conceptual schemas are the basis for building well-grounded information systems, representing the main concepts of a domain of knowledge as well as the relationships among them. Since conceptual schemas focus on the concepts, they are independent of the specific technological platform used to implement them. This allows a single conceptual schema to be transformed into different platform-specific models according to the implementation requirements. This is a non-trivial process that is crucial for the performance and maintainability of the system, as well as for meeting the domain data requirements. Much research has been done on transforming conceptual schemas into relational data models. Nevertheless, less work has been done on transforming conceptual schemas into property graphs, a data structure indispensable for building appropriate and efficient systems based on graph databases. This work proposes a systematic approach to transform conceptual schemas, represented as UML class diagrams, into property graphs using a set of transformation rules and patterns. In addition to a practical example that illustrates the proposed approach, the evaluation measures different quality dimensions such as semantic equivalence, readability, maintainability, complexity, size, and performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
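To make the notion of rule-based model-to-model transformation concrete, the sketch below applies a few simple rules to a toy class diagram: each class becomes a node label, each attribute a property key, each association a relationship type, and a subclass also carries its superclass's label and keys. These rules and the `UmlClass`/`Association` types are assumptions for illustration, not the authors' rule set or patterns.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UmlClass:
    name: str
    attributes: list = field(default_factory=list)
    parent: Optional[str] = None  # generalization; single inheritance assumed


@dataclass
class Association:
    name: str
    source: str
    target: str


def to_lpg_schema(classes, associations):
    """Map a UML class diagram to a labeled-property-graph schema.

    Hypothetical rules for this sketch only:
      class -> node label, attribute -> property key,
      association -> relationship type,
      generalization -> subclass nodes also carry the superclass label and keys.
    """
    by_name = {c.name: c for c in classes}
    node_types = {}
    for c in classes:
        labels, keys, cur = [], set(), c
        while cur is not None:  # walk up the generalization chain
            labels.append(cur.name)
            keys.update(cur.attributes)
            cur = by_name.get(cur.parent) if cur.parent else None
        node_types[c.name] = {"labels": labels, "keys": sorted(keys)}
    rel_types = [
        {"type": a.name.upper(), "from": a.source, "to": a.target}
        for a in associations
    ]
    return node_types, rel_types


classes = [
    UmlClass("Person", ["name", "birthDate"]),
    UmlClass("Employee", ["salary"], parent="Person"),
    UmlClass("Company", ["name"]),
]
nodes, rels = to_lpg_schema(classes, [Association("worksFor", "Employee", "Company")])
print(nodes["Employee"])  # {'labels': ['Employee', 'Person'], 'keys': ['birthDate', 'name', 'salary']}
print(rels)               # [{'type': 'WORKSFOR', 'from': 'Employee', 'to': 'Company'}]
```

Mapping generalization to multiple labels is only one option; an alternative is a separate node per class linked by an "is-a" edge, which trades query simplicity for schema fidelity.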
5. Temporal graph patterns by timed automata.
- Author
Aghasadeghi, Amir, Van den Bussche, Jan, and Stoyanovich, Julia
- Abstract
Temporal graphs represent graph evolution over time, and have been receiving considerable research attention. Work on expressing temporal graph patterns or discovering temporal motifs typically assumes relatively simple temporal constraints, such as journeys or, more generally, existential constraints, possibly with finite delays. In this paper we propose to use timed automata to express temporal constraints, leading to a general and powerful notion of temporal basic graph pattern (BGP). The new difficulty is the evaluation of the temporal constraint on a large set of matchings. An important benefit of timed automata is that they support an iterative state assignment, which can be useful for early detection of matches and pruning of non-matches. We introduce algorithms to retrieve all instances of a temporal BGP match in a graph, and present results of an extensive experimental evaluation, demonstrating interesting performance trade-offs. We show that an on-demand algorithm that processes total matchings incrementally over time is preferable when dealing with cyclic patterns on sparse graphs. On acyclic patterns or dense graphs, and when connectivity of partial matchings can be guaranteed, the best performance is achieved by maintaining partial matchings over time and allowing automaton evaluation to be fully incremental. The code and datasets used in our analysis are available at http://github.com/amirpouya/TABGP. [ABSTRACT FROM AUTHOR]
- Published
- 2024
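The abstract proposes timed automata for expressing temporal constraints over graph events. As a minimal illustration of the formalism itself, the sketch below runs a deterministic automaton with a single clock over timestamped events and accepts only if every `a` event is answered by a `b` event within `D` time units. The event symbols, the bound `D`, and the transition encoding are hypothetical; the paper's evaluation of temporal basic graph patterns over many matchings is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Transition:
    source: str
    symbol: str
    guard: Callable[[float], bool]  # condition on the single clock value
    reset: bool                      # whether the clock is reset to 0
    target: str


def run(transitions, start, accepting, events):
    """Run a deterministic one-clock timed automaton over (time, symbol) events.

    Time is assumed to start at 0. Returns True iff some transition is enabled
    at every event and the run ends in an accepting location.
    """
    location, last_reset = start, 0.0
    for time, symbol in events:
        clock = time - last_reset
        step: Optional[Transition] = next(
            (t for t in transitions
             if t.source == location and t.symbol == symbol and t.guard(clock)),
            None,
        )
        if step is None:  # no enabled transition: the constraint is violated
            return False
        if step.reset:
            last_reset = time
        location = step.target
    return location in accepting


D = 5  # hypothetical delay bound
T = [
    Transition("idle", "a", lambda x: True, True, "waiting"),   # start the clock
    Transition("idle", "b", lambda x: True, False, "idle"),
    Transition("waiting", "a", lambda x: x <= D, False, "waiting"),
    Transition("waiting", "b", lambda x: x <= D, False, "idle"),
]
print(run(T, "idle", {"idle"}, [(1, "a"), (4, "b"), (10, "a"), (14, "b")]))  # True
print(run(T, "idle", {"idle"}, [(1, "a"), (8, "b")]))                        # False
```

The automaton processes events one at a time, which is what makes the "iterative state assignment" mentioned in the abstract possible: a run can be rejected as soon as a guard fails, without waiting for the rest of the sequence.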
6. GRANDPA: GeneRAtive network sampling using degree and property augmentation applied to the analysis of partially confidential healthcare networks
- Author
Carly A. Bobak, Yifan Zhao, Joshua J. Levy, and A. James O’Malley
- Subjects
Generative graphs, Simulation, Property graphs, Confidentiality, Healthcare data privacy, Applied mathematics. Quantitative methods, T57-57.97
- Abstract
Protecting medical privacy can create obstacles in the analysis and distribution of healthcare graphs and the statistical inferences accompanying them. We propose a graph simulation model that generates networks using degree and property augmentation, and we provide a flexible R package that allows users to create graphs that preserve vertex attribute relationships and approximately retain the topological properties observed in the original graph (e.g., community structure). We illustrate our proposed algorithm using a case study based on Zachary’s karate network and a patient-sharing graph generated from Medicare claims data in 2019. In both cases, we find that community structure is preserved, and the normalized root mean square error between the cumulative degree distributions of the generated and original graphs is low (0.0508 and 0.0514, respectively).
- Published
- 2023
7. Indexing Structures for the Efficient Multi-Resolution Visualization of Big Graphs
- Author
Marco Mesiti, Mario Pennacchioni, and Paolo Perlasca
- Subjects
Property graphs, node indices, edge indices, aggregations according to a cluster hierarchy, multi-resolution visualization, zoom-in and zoom-out operations, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
There is currently great interest in the visualization of property graphs to make their navigation, inspection, and visual analysis easier. However, property graphs can be quite large, and rendering them in a web browser can produce a dark cloud of points that is difficult to explore visually. To reduce the size of the visualized graph, several approaches substitute clusters of related vertices with aggregated meta-nodes and introduce meta-edges among them, but they usually keep the graph in main memory and do not adopt efficient data structures for extracting parts of it from disk. The purpose of this paper is to optimize the preparation of the graph to be visualized at a given resolution level by introducing refined data structures and specifically tailored algorithms. With these, the rendering time is reduced when the current visualization changes through zoom-in, zoom-out, and related operations. Starting from a cluster hierarchy that represents the possible aggregations of graph nodes, we characterize a visualization as a horizontal slice of the hierarchy and propose indexing structures and incremental algorithms for quickly moving to a new visualization with minimal changes to the current one. In this process, we ensure a consistent and efficient aggregation of additive properties associated with nodes and edges. An extensive experimental analysis has been conducted to assess the quality of the proposed solution.
- Published
- 2023
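The abstract mentions consistent aggregation of additive properties when clusters of nodes are replaced by meta-nodes. A minimal sketch, assuming a flat cluster assignment (one horizontal slice of the hierarchy) and a hypothetical numeric property on nodes and edges, sums node values per meta-node and merges edges between clusters into meta-edges with summed values. Dropping intra-cluster edges is a choice made for this sketch; the paper's disk-based indexing structures and incremental algorithms are not modeled.

```python
from collections import defaultdict


def aggregate(node_weight, edges, cluster_of):
    """Collapse nodes into meta-nodes according to cluster_of.

    node_weight: {node: additive value}
    edges: [(u, v, additive value)] for a directed graph
    cluster_of: {node: cluster id}, i.e. one slice of the cluster hierarchy
    Edges inside a single cluster are dropped from the meta-graph;
    edges between two clusters are merged into one meta-edge.
    """
    meta_nodes = defaultdict(float)
    for n, w in node_weight.items():
        meta_nodes[cluster_of[n]] += w
    meta_edges = defaultdict(float)
    for u, v, w in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            meta_edges[(cu, cv)] += w
    return dict(meta_nodes), dict(meta_edges)


weights = {"a": 1, "b": 2, "c": 3, "d": 4}
edges = [("a", "b", 5), ("a", "c", 1), ("b", "c", 2), ("b", "d", 3), ("c", "d", 7)]
slice_ = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
print(aggregate(weights, edges, slice_))
# ({'X': 3.0, 'Y': 7.0}, {('X', 'Y'): 6.0})
```

Because the property is additive, the total node weight is the same at every resolution level, which is the consistency requirement the abstract refers to.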
8. Models and Query Languages for Temporal Property Graph Databases
- Author
Soliani, Valeria; editorial board members: Filipe, Joaquim, Ghosh, Ashish, Prates, Raquel Oliveira, and Zhou, Lizhu; editors: Chiusano, Silvia, Cerquitelli, Tania, Wrembel, Robert, Nørvåg, Kjetil, Catania, Barbara, Vargas-Solar, Genoveva, and Zumpano, Ester
- Published
- 2022
9. ProGS: Property Graph Shapes Language
- Author
Seifer, Philipp, Lämmel, Ralf, and Staab, Steffen; founding editors: Goos, Gerhard, and Hartmanis, Juris; editorial board members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, Woeginger, Gerhard, and Yung, Moti; editors: Hotho, Andreas, Blomqvist, Eva, Dietze, Stefan, Fokoue, Achille, Ding, Ying, Barnaghi, Payam, Haller, Armin, Dragoni, Mauro, and Alani, Harith
- Published
- 2021
10. GRANDPA: GeneRAtive network sampling using degree and property augmentation applied to the analysis of partially confidential healthcare networks
- Author
Bobak, Carly A., Zhao, Yifan, Levy, Joshua J., and O’Malley, A. James
- Published
- 2023
11. Hypergraph Based Data Model for Complex Health Data Exploration and Its Implementation in PREDIMED Clinical Data Warehouse.
- Author
Cancé, Christophe, Lenne, Christian, Artemova, Svetlana, Mossuz, Pascal, and Moreau-Gaudry, Alexandre
- Abstract
Within the PREDIMED Clinical Data Warehouse (CDW) of Grenoble Alpes University Hospital (CHUGA), we have developed a hypergraph-based operational data model that aims to empower physicians to interactively explore, visualize, and qualitatively analyze the complex and massive information of the patients treated at CHUGA. This model constitutes a central target structure, expressed in a dual form, both graphical and formal, which gathers the concepts and their semantic relations into a hypergraph whose implementation can easily be manipulated by medical experts. The implementation is based on a property graph database linked to an interactive graphical interface that allows users to navigate through the data and to interact in real time with a search engine and with visualization and analysis tools. This model and its agile implementation allow for easy structural changes inherent to the evolution of techniques and practices in the health field. This flexibility provides adaptability to the evolution of interoperability standards. [ABSTRACT FROM AUTHOR]
- Published
- 2022
12. PGO: Describing Property Graphs in RDF
- Author
Dominik Tomaszuk, Renzo Angles, and Harsh Thakkar
- Subjects
RDF, property graphs, data transformation, OWL, ontology, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
RDF and property graphs are data models that are used to represent knowledge graphs. The definition of methods to transform RDF data into property graph data is fundamental to enabling interoperability among the systems that use these models. Although both models are based on a graph structure, they have special features that complicate the definition of data transformation methods. This article presents an ontology-based approach to automatically transform property graphs into RDF graphs. The ontology, called PGO, defines a set of terms for describing the elements of a property graph. The algorithm corresponding to the transformation method is described, and some properties of the method are discussed (complexity, data preservation, and monotonicity). The results of an experimental evaluation are also presented.
- Published
- 2020
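A key difficulty the PGO abstract alludes to is that property graph edges can carry properties while a plain RDF triple cannot. One common workaround is to turn every node and every edge into an RDF resource. The sketch below does this and emits N-Triples-style lines; the `http://example.org/pg#` namespace and the terms `Node`, `Edge`, `label`, `startNode`, and `endNode` are placeholders invented for this example, not the vocabulary that PGO actually defines.

```python
BASE = "http://example.org/pg#"  # hypothetical vocabulary, not PGO's namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(local: str) -> str:
    return f"<{BASE}{local}>"


def lit(value) -> str:
    # Plain literal; a fuller version would add datatypes and escape quotes.
    return f'"{value}"'


def pg_to_triples(nodes, edges):
    """nodes: {id: (label, props)}; edges: {id: (src, dst, label, props)}."""
    triples = []
    for nid, (label, props) in nodes.items():
        s = iri(nid)
        triples.append((s, f"<{RDF_TYPE}>", iri("Node")))
        triples.append((s, iri("label"), lit(label)))
        for k, v in props.items():
            triples.append((s, iri(k), lit(v)))
    for eid, (src, dst, label, props) in edges.items():
        e = iri(eid)  # the edge becomes a resource, so it can carry properties
        triples.append((e, f"<{RDF_TYPE}>", iri("Edge")))
        triples.append((e, iri("startNode"), iri(src)))
        triples.append((e, iri("endNode"), iri(dst)))
        triples.append((e, iri("label"), lit(label)))
        for k, v in props.items():
            triples.append((e, iri(k), lit(v)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


nodes = {"n1": ("Person", {"name": "Alice"}), "n2": ("Person", {"name": "Bob"})}
edges = {"e1": ("n1", "n2", "knows", {"since": 2015})}
for line in pg_to_triples(nodes, edges):
    print(line)
```

Reifying edges this way preserves all data but makes a single property graph edge cost several triples; the abstract's discussion of complexity and data preservation concerns exactly such trade-offs.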
13. Hypergraph Based Data Model for Complex Health Data Exploration and Its Implementation in PREDIMED Clinical Data Warehouse.
- Author
Cancé, Christophe, Lenne, Christian, Artemova, Svetlana, Mossuz, Pascal, and Moreau-Gaudry, Alexandre
- Subjects
DATA warehousing, SEMANTICS, DATABASES, MEDICAL information storage & retrieval systems, INFORMATION display systems, USER interfaces, DATA analytics, GRAPHICAL user interfaces
- Abstract
Within the PREDIMED Clinical Data Warehouse (CDW) of Grenoble Alpes University Hospital (CHUGA), we have developed a hypergraph-based operational data model that aims to empower physicians to interactively explore, visualize, and qualitatively analyze the complex and massive information of the patients treated at CHUGA. This model constitutes a central target structure, expressed in a dual form, both graphical and formal, which gathers the concepts and their semantic relations into a hypergraph whose implementation can easily be manipulated by medical experts. The implementation is based on a property graph database linked to an interactive graphical interface that allows users to navigate through the data and to interact in real time with a search engine and with visualization and analysis tools. This model and its agile implementation allow for easy structural changes inherent to the evolution of techniques and practices in the health field. This flexibility provides adaptability to the evolution of interoperability standards. [ABSTRACT FROM AUTHOR]
- Published
- 2021
14. Reasoning on property graphs with graph generating dependencies.
- Author
Shimomura, Larissa C., Yakovets, Nikolay, and Fletcher, George
- Subjects
*DATA management, *VALUATION of real property, *COMPUTATIONAL complexity, *DATA quality, *PROBLEM solving
- Abstract
Data dependencies are a key concept in data management and have been researched in data integration, data quality, and query optimization. With the increasing use of graph-structured data in diverse applications, there is also increasing interest in the study of graph data dependencies, and different classes of graph data dependencies have been proposed in the literature. In this work, we study the class of Graph Generating Dependencies (GGDs). GGDs informally express constraints between two (possibly different) graph patterns, enforcing relationships on both the graph's data (via property value constraints) and its structure (via topological constraints). While most previously proposed classes of graph data dependencies focus on generalizing equality-generating dependencies to graph data, GGDs can express both tuple- and equality-generating dependencies on property graphs, both of which find broad application in graph data management. In this paper, we study reasoning about GGDs on property graphs. We propose algorithms to solve three main reasoning problems for GGDs (satisfiability, implication, and validation) and analyze their complexity. Studying these problems clarifies the expressiveness and the limitations of GGDs in practical applications. To demonstrate the practical use of GGDs, we propose an algorithm that finds inconsistencies in data through validation of GGDs. Our experiments show that even though the validation of GGDs has high computational complexity, GGDs can be used to find data inconsistencies in a feasible execution time on both synthetic and real-world data.
• A new class of dependencies for property graphs, named Graph Generating Dependencies (GGDs), is introduced.
• The reasoning problems of satisfiability, implication, and validation on property graphs with GGDs are studied.
• A chase procedure for GGDs is proposed to solve the satisfiability and implication problems.
• Experimental results show the feasibility of using GGDs to identify data inconsistencies in practice.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
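A GGD pairs a source pattern with a target pattern that must also be present whenever the source matches. The sketch below validates one hand-coded dependency of that shape on a toy graph: every `Person` node with an `employer` property must have a `WORKS_AT` edge to a `Company` node whose `name` equals that value. The dependency, labels, and keys are hypothetical, and this brute-force check of a single rule is not the paper's general validation algorithm or its chase procedure.

```python
def validate(nodes, edges):
    """Return ids of Person nodes violating the example dependency.

    nodes: {id: {"label": str, **props}}; edges: [(src, label, dst)].
    Source pattern: (p:Person) with property p.employer.
    Target pattern: (p)-[:WORKS_AT]->(c:Company) with c.name == p.employer.
    """
    violations = []
    for pid, p in nodes.items():
        if p["label"] != "Person" or "employer" not in p:
            continue  # source pattern does not match this node
        satisfied = any(
            src == pid and lab == "WORKS_AT"
            and nodes[dst]["label"] == "Company"
            and nodes[dst].get("name") == p["employer"]
            for src, lab, dst in edges
        )
        if not satisfied:
            violations.append(pid)
    return violations


nodes = {
    "p1": {"label": "Person", "employer": "Acme"},
    "p2": {"label": "Person", "employer": "Initech"},
    "p3": {"label": "Person"},
    "c1": {"label": "Company", "name": "Acme"},
    "c2": {"label": "Company", "name": "Globex"},
}
edges = [("p1", "WORKS_AT", "c1"), ("p2", "WORKS_AT", "c2")]
print(validate(nodes, edges))  # ['p2']
```

The dependency combines a topological requirement (the edge must exist) with a property value constraint (the names must agree), which is the combination the abstract attributes to GGDs.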
15. Rewriting ontology-mediated navigational queries into cypher
- Author
Dragovic, Nikola, Okulmus, Cem, and Ortiz, Magdalena
- Abstract
The ontology-based data access (OBDA) paradigm has grown successfully over the last decade as a powerful means to access data from possibly diverse and incomplete sources, using a domain ontology as a mediator. The ability to query generic graph-structured data is often highlighted as an advantage of OBDA, but in practice, existing solutions do not allow access to data in popular graph database management systems (DBMSs), such as Neo4j, that adopt the so-called 'property graph' data model and support dedicated query languages such as Cypher. Towards overcoming this major limitation, we propose a technique for ontology-mediated querying (OMQ) of property graphs. We tailor a suitable query language that supports path navigation in a form that can be naturally expressed in Cypher and other important graph query languages. The language keeps the data complexity of query evaluation tractable even under trail semantics and is sufficient for our motivating use case in the autonomous driving domain. We address the semantic gap between the traditional path semantics adopted by most works on graph databases and the trail semantics used in Cypher, and identify cases where both semantics coincide. To our knowledge, OMQs with trail semantics had not been addressed before. We develop a rewriting algorithm for queries mediated by DL-Lite ontologies that enables query answering using plain Cypher. The experimental evaluation of our proof-of-concept prototype on a sample set of use case queries reveals that the approach is promising and can be a stepping stone to making OBDA applicable to data stored in graph DBMSs.
- Published
- 2023
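The difference between trail semantics and stricter path semantics shows up on any graph with a cycle. Under trail semantics a match may revisit a node but may not reuse an edge; under simple-path semantics no node may repeat either. (Arbitrary-walk semantics, which allows both, yields infinitely many walks on a cyclic graph, so it is usually used only for reachability answers.) The brute-force sketch below enumerates both kinds of matches on a made-up four-edge graph; it illustrates the semantics only and is not the rewriting algorithm from the paper.

```python
def enumerate_walks(edges, start, end, semantics):
    """Enumerate non-empty walks from start to end as lists of edge ids.

    edges: {edge_id: (src, dst)}
    semantics: "trail"  -> no edge may repeat (nodes may)
               "simple" -> no node may repeat (simple paths)
    """
    results = []

    def dfs(node, used_edges, visited, walk):
        if node == end and walk:
            results.append(list(walk))
        for eid, (src, dst) in edges.items():
            if src != node or eid in used_edges:
                continue  # not outgoing from here, or edge already used
            if semantics == "simple" and dst in visited:
                continue  # simple paths may not revisit a node
            walk.append(eid)
            dfs(dst, used_edges | {eid}, visited | {dst}, walk)
            walk.pop()

    dfs(start, frozenset(), frozenset({start}), [])
    return results


# Cycle a -> b -> c -> a, plus an exit edge a -> d.
edges = {"e1": ("a", "b"), "e2": ("b", "c"), "e3": ("c", "a"), "e4": ("a", "d")}
print(enumerate_walks(edges, "a", "d", "trail"))   # [['e1', 'e2', 'e3', 'e4'], ['e4']]
print(enumerate_walks(edges, "a", "d", "simple"))  # [['e4']]
```

Both semantics agree on whether `d` is reachable from `a` but differ in the set of matches returned, which is why answers can diverge once queries return or count paths.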
16. DONNA: a data model for enabling extensible and efficient metaverse applications
- Author
Bouloukakis, Georgios, and Kattepur, Ajay; affiliations: Institut Polytechnique de Paris (IP Paris); Département Informatique (TSP - INF); Institut Mines-Télécom [Paris] (IMT)-Télécom SudParis (TSP); Algorithmes, Composants, Modèles Et Services pour l'informatique répartie (ACMES-SAMOVAR); Services répartis, Architectures, MOdélisation, Validation, Administration des Réseaux (SAMOVAR); Middleware on the Move (MIMOVE); Inria de Paris; Institut National de Recherche en Informatique et en Automatique (Inria); Ericsson AI Research [Bangalore]
- Subjects
Data Model, Metaverse, Extensibility, Property Graphs, [INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE]
- Abstract
The advent of Metaverse applications, which exploit extended reality technologies, has the potential to disrupt multiple industries, including gaming, social networks, entertainment, and travel. While there has been initial work on the network, compute, and synchronization features needed for the Metaverse, a comprehensive data model that captures the interactions between the physical and virtual worlds has not been evaluated extensively. A formal data model would be crucial to ensure interoperability, model extensibility, and applicability to multiple use cases. In this paper, we propose DONNA: A Data Model for Enabling Extensible and Efficient Metaverse Applications. DONNA provides a detailed data model of interactions between physical space, virtual spaces, sensors, devices, physical participants, and avatars. Using property graph schemas, we demonstrate the varied interactions between the physical and virtual worlds and the extensibility of the approach across multiple use cases. The data model is demonstrated specifically on a virtual museum visit use case to explain the nuances of sensing, dynamic property updates, and semantic interactions between physical and virtual objects.
- Published
- 2023
17. PG-Schema: Schemas for Property Graphs
- Author
Angles, Renzo, Bonifati, Angela, Dumbrava, Stefania, Fletcher, George, Green, Alastair, Hidders, Jan, Li, Bei, Libkin, Leonid, Marsault, Victor, Martens, Wim, Murlak, Filip, Plantikow, Stefan, Savković, Ognjen, Schmidt, Michael, Sequeda, Juan, Staworko, Slawek, Tomaszuk, Dominik, Voigt, Hannes, Vrgoc, Domagoj, Wu, Mingxi, and Zivkovic, Dusan
- Subjects
graph databases, schemas, property graphs
- Abstract
Property graphs have reached a high level of maturity, witnessed by multiple robust graph database systems as well as the ongoing ISO standardization effort aiming at creating a new standard Graph Query Language (GQL). Yet, despite documented demand, schema support is limited both in existing systems and in the first version of the GQL Standard. It is anticipated that the second version of the GQL Standard will include a rich DDL. Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. It features PG-Types with flexible type definitions supporting multi-inheritance, as well as expressive constraints based on the recently proposed PG-Keys formalism. We provide the formal syntax and semantics of PG-Schema, which meet principled design requirements grounded in contemporary property graph management scenarios, and offer a detailed comparison of its features with those of existing schema languages and graph database systems.
- Published
- 2023
18. Property Hypergraphs as an Attributed Predicate RDF
- Author
Wardani, Dewi W., and Küng, Josef; series editors: Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Pandu Rangan, C., Steffen, Bernhard, Terzopoulos, Demetri, Tygar, Doug, and Weikum, Gerhard; editors: Debruyne, Christophe, Panetto, Hervé, Meersman, Robert, Dillon, Tharam, Weichhart, Georg, An, Yuan, and Ardagna, Claudio Agostino
- Published
- 2015
19. DiscoPG
- Author
Angela Bonifati, Stefania Dumbrava, Emile Martinez, Fatemeh Ghasemi, Malo Jaffré, Pacôme Luton, and Thomas Pickles; affiliations: Université Claude Bernard Lyon 1 - Faculté des sciences (UCBL FS); Université Claude Bernard Lyon 1 (UCBL); Université de Lyon; Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS); Université Lumière - Lyon 2 (UL2); École Centrale de Lyon (ECL); Institut National des Sciences Appliquées de Lyon (INSA Lyon); Institut National des Sciences Appliquées (INSA); Centre National de la Recherche Scientifique (CNRS); Méthodes et modèles pour les réseaux (METHODES-SAMOVAR); Services répartis, Architectures, MOdélisation, Validation, Administration des Réseaux (SAMOVAR); Institut Mines-Télécom [Paris] (IMT)-Télécom SudParis (TSP); Institut Polytechnique de Paris (IP Paris); Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise (ENSIIE); École normale supérieure de Lyon (ENS de Lyon)
- Subjects
Graph databases, [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Nodes, General Engineering, ACM: H.: Information Systems/H.2: DATABASE MANAGEMENT, Property graphs, Graph applications
- Abstract
Property graphs are becoming pervasive in a variety of graph processing applications that use interconnected data. They can encode multi-labeled nodes and edges, as well as their properties, represented as key/value pairs. Although property graphs are widely used in several open-source and commercial graph databases, unlike their relational counterparts they lack a schema definition. The property graph schema discovery problem consists of extracting the underlying schema concepts and types from such graph datasets. We showcase DiscoPG, a system for efficiently and accurately discovering and exploring property graph schemas. To this end, it leverages hierarchical clustering using a Gaussian Mixture Model, which accounts for both node labels and properties. DiscoPG allows users to perform schema discovery on both static and dynamic graph datasets. Suitable visualization layouts and dedicated dashboards let users perceive the static and dynamic inferred schema on the node clusters, as well as the differences in runtimes and clustering quality. To the best of our knowledge, DiscoPG is the first system to tackle the property graph schema discovery problem. As such, it supports insightful exploration of the graph schema components and their evolving behavior, while revealing the underpinnings of the clustering-based discovery process.
- Published
- 2022
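DiscoPG discovers schemas by clustering nodes with a Gaussian Mixture Model over labels and properties. To make the underlying problem concrete, the sketch below implements a much simpler baseline: it groups nodes by their exact (label set, property-key set) signature and reports each group as a candidate type with its node count. The sample data is invented, and unlike clustering, exact-signature grouping cannot merge nodes whose signatures differ only slightly (for example, a `Person` missing an optional property).

```python
from collections import Counter


def discover_types(nodes):
    """Group nodes by their (labels, property keys) signature.

    nodes: iterable of (labels: set[str], props: dict).
    Returns a Counter mapping each signature to the number of nodes having it.
    """
    return Counter(
        (frozenset(labels), frozenset(props)) for labels, props in nodes
    )


data = [
    ({"Person"}, {"name": "Ann", "age": 31}),
    ({"Person"}, {"name": "Bo", "age": 45}),
    ({"Person"}, {"name": "Cy"}),
    ({"Company"}, {"name": "Acme", "founded": 1999}),
]
for (labels, keys), count in discover_types(data).items():
    print(sorted(labels), sorted(keys), count)
# ['Person'] ['age', 'name'] 2
# ['Person'] ['name'] 1
# ['Company'] ['founded', 'name'] 1
```

The two separate `Person` groups in the output are exactly the kind of near-duplicate types that a similarity-based clustering approach is meant to reconcile.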
20. Temporal graph patterns by timed automata
- Author
Amir Aghasadeghi, Jan Van den Bussche, and Julia Stoyanovich
- Subjects
Temporal graphs, Computer Science - Databases, Hardware and Architecture, Graph query languages, Property graphs, Timed automata, dataflow systems, Information Systems
- Abstract
Temporal graphs represent graph evolution over time, and have been receiving considerable research attention. Work on expressing temporal graph patterns or discovering temporal motifs typically assumes relatively simple temporal constraints, such as journeys or, more generally, existential constraints, possibly with finite delays. In this paper we propose to use timed automata to express temporal constraints, leading to a general and powerful notion of temporal basic graph pattern (BGP). The new difficulty is the evaluation of the temporal constraint on a large set of matchings. An important benefit of timed automata is that they support an iterative state assignment, which can be useful for early detection of matches and pruning of non-matches. We introduce algorithms to retrieve all instances of a temporal BGP match in a graph, and present results of an extensive experimental evaluation, demonstrating interesting performance trade-offs. We show that an on-demand algorithm that processes total matchings incrementally over time is preferable when dealing with cyclic patterns on sparse graphs. On acyclic patterns or dense graphs, and when connectivity of partial matchings can be guaranteed, the best performance is achieved by maintaining partial matchings over time and allowing automaton evaluation to be fully incremental. The code and datasets used in our analysis are available at http://github.com/amirpouya/TABGP.
- Published
- 2023
21. Compact and efficient representation of general graph databases.
- Author
Álvarez-García, Sandra, Freire, Borja, Ladra, Susana, and Pedreira, Óscar
- Subjects
REPRESENTATIONS of graphs, DATA structures, COMPACTING
- Abstract
In this paper, we propose a compact data structure to store labeled attributed graphs based on the k²-tree, a very compact data structure designed to represent a simple directed graph. The idea we propose can be seen as an extension of the k²-tree to support property graphs. In addition to the static approach, we also propose a dynamic version of the storage representation, which allows flexible schemas and the insertion or deletion of data. We provide an implementation of a basic set of operations, which can be combined to form complex queries over these graphs with attributes. We compare the performance of our proposal with that of existing graph database systems and show that our compact attributed graph representation also obtains competitive time results. [ABSTRACT FROM AUTHOR]
- Published
- 2019
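The k²-tree that this paper extends represents an n × n adjacency matrix by recursively splitting it into k² equal submatrices and storing one bit per submatrix (1 if it contains at least one edge), in level order; empty submatrices are not subdivided further. Bits of internal levels go into a bitmap T and bits of the last level (single cells) into a bitmap L, and the children of the 1-bit at position x start at position rank₁(T, x)·k² of the concatenation T:L. The sketch below builds both bitmaps for a plain static k²-tree and answers edge-existence queries. It does not model the attributes or the dynamic variant from the paper, and both construction and rank are computed naively rather than with the succinct structures a real implementation would use.

```python
from collections import deque


class K2Tree:
    """Static k^2-tree over a directed graph with nodes 0..num_nodes-1."""

    def __init__(self, num_nodes, edges, k=2):
        self.k = k
        n = k
        while n < num_nodes:  # pad the matrix side to a power of k
            n *= k
        self.n = n
        cells = set(edges)
        self.T, self.L = [], []
        # Breadth-first over non-empty submatrices: (top row, left col, side).
        queue = deque([(0, 0, n)])
        while queue:
            r0, c0, side = queue.popleft()
            sub = side // k
            for i in range(k):
                for j in range(k):
                    rr, cc = r0 + i * sub, c0 + j * sub
                    # Naive emptiness test: scan all edges.
                    bit = int(any(rr <= r < rr + sub and cc <= c < cc + sub
                                  for r, c in cells))
                    if sub == 1:
                        self.L.append(bit)  # last level: individual cells
                    else:
                        self.T.append(bit)  # internal level
                        if bit:
                            queue.append((rr, cc, sub))

    def _rank1(self, x):
        # Number of 1 bits in T[0..x] inclusive (naive linear-time rank).
        return sum(self.T[: x + 1])

    def has_edge(self, r, c):
        k, size, offset = self.k, self.n, 0
        while True:
            size //= k
            x = offset + (r // size) * k + (c // size)
            if x >= len(self.T):  # position falls in the last level
                return self.L[x - len(self.T)] == 1
            if self.T[x] == 0:  # empty submatrix: no edge below it
                return False
            offset = self._rank1(x) * k * k  # first child of position x
            r, c = r % size, c % size


t = K2Tree(4, [(0, 1), (2, 3), (3, 3)])
print(t.T, t.L)                            # [1, 0, 0, 1] [0, 1, 0, 0, 0, 1, 0, 1]
print(t.has_edge(3, 3), t.has_edge(1, 0))  # True False
```

In the 4 × 4 example, only the top-left and bottom-right quadrants contain edges, so only those two are expanded; the other two quadrants cost one bit each, which is where the compression comes from on sparse graphs.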
22. Foundations of Modern Query Languages for Graph Databases.
- Author
Angles, Renzo, Arenas, Marcelo, Barceló, Pablo, Hogan, Aidan, Reutter, Juan, and Vrgoč, Domagoj
- Subjects
*QUERY languages (Computer science), *GRAPHIC methods, *DATABASE industry, *BIG data, *DATA science
- Abstract
We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges, and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter, we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher, and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher, and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area. [ABSTRACT FROM AUTHOR]
- Published
- 2018
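Navigational expressions match paths of arbitrary length. The simplest useful case is the transitive closure of a single edge label, written `knows+` in SPARQL 1.1 property-path syntax. The sketch below evaluates that one expression by breadth-first search over a made-up edge-labelled graph, returning each reachable node once (set semantics). General regular path queries would instead search the product of the graph with an automaton for the expression; that is not shown here.

```python
from collections import deque


def plus_closure(edges, start, label):
    """Nodes reachable from start via one or more edges with the given label.

    edges: iterable of (src, label, dst) triples (an edge-labelled graph).
    Evaluated under set semantics: each reachable node is reported once.
    """
    adj = {}
    for s, lab, d in edges:
        if lab == label:
            adj.setdefault(s, []).append(d)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:  # first time reached via at least one edge
                seen.add(nxt)
                queue.append(nxt)
    return seen


g = [("ann", "knows", "bob"), ("bob", "knows", "cy"),
     ("cy", "knows", "ann"), ("cy", "worksAt", "acme")]
print(sorted(plus_closure(g, "ann", "knows")))  # ['ann', 'bob', 'cy']
```

Note that `ann` is in its own `knows+` result because of the cycle; the start node is added only when some non-empty path returns to it, which is what distinguishes `+` from `*`.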
23. How can graph databases and reasoning be combined and integrated?
- Author
Pasarella Sánchez, Ana Edelmira (Universitat Politècnica de Catalunya. Departament de Ciències de la Computació; Universitat Politècnica de Catalunya. ALBCOM - Algorísmia, Bioinformàtica, Complexitat i Mètodes Formals)
- Abstract
Nowadays, the graph data model has been accepted as one of the most suitable data models for formalizing relationships among entities in many domains. Deductive databases based on the Datalog language have been used to deduce new information from large amounts of data. Most attempts to combine logic and graph databases are based on translating knowledge in graph databases into Datalog and then using its inference engine. We aim to open the discussion about combining graph databases and a graph-oriented logic to define «native» deductive graph databases, that is, graph databases equipped with an inference mechanism based on a graph-based logic. Concretely, we plan to use the recently introduced graph navigational logic.
- Peer reviewed, postprint (published version)
- Published
- 2022
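The Datalog route that the abstract contrasts against can be illustrated with the classic reachability program, `reach(X, Y) :- edge(X, Y).` and `reach(X, Z) :- reach(X, Y), edge(Y, Z).` The sketch below evaluates it bottom-up in the naive way, reapplying the rules until no new facts appear (a fixpoint). The edge facts are invented, and the graph navigational logic the authors plan to use is not modeled here.

```python
def naive_reach(edge_facts):
    """Naive bottom-up evaluation of:
        reach(X, Y) :- edge(X, Y).
        reach(X, Z) :- reach(X, Y), edge(Y, Z).
    Iterates until a fixpoint is reached (no new facts derived).
    """
    reach = set(edge_facts)  # first rule
    while True:
        # Second rule: join reach(X, Y) with edge(Y, Z).
        derived = {(x, z) for (x, y) in reach for (y2, z) in edge_facts if y == y2}
        if derived <= reach:
            return reach
        reach |= derived


edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(naive_reach(edges)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

Naive evaluation recomputes every join in each round; semi-naive evaluation, which joins only the facts that are new in the previous round, is the usual optimization in Datalog engines.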
24. How can graph databases and reasoning be combined and integrated?
- Author
-
Pasarella Sánchez, Ana Edelmira, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, and Universitat Politècnica de Catalunya. ALBCOM - Algorísmia, Bioinformàtica, Complexitat i Mètodes Formals
- Subjects
Graph pattern queries, Graph Databases, 68 Computer science::68P Theory of data [AMS classification], Computer science [UPC subject areas], Deductive databases, Path queries, Graph navigational logic, Property graphs, Computer science
- Abstract
Nowadays the graph data model is accepted as one of the most suitable data models for formalizing relationships among entities in many domains. Deductive databases based on the Datalog language have been used to deduce new information from large amounts of data. Most attempts to combine logic and graph databases translate the knowledge in a graph database into Datalog and then use Datalog's inference engine. We aim to open a discussion about combining graph databases with a graph-oriented logic to define «native» deductive graph databases, that is, graph databases equipped with an inference mechanism based on a graph-based logic. Concretely, we plan to use the recently introduced graph navigational logic.
- Published
- 2022
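For contrast with the «native» approach proposed above, here is a minimal sketch of the translation-based alternative the abstract describes: graph edges become Datalog edge/2 facts, and a recursive rule set derives reachability. The rules and `transitive_closure` are illustrative assumptions; semi-naive evaluation is a standard Datalog technique and is not the graph navigational logic mentioned in the abstract.

```python
# Illustrative rules (not taken from the paper):
#   reach(X, Y) :- edge(X, Y).
#   reach(X, Z) :- reach(X, Y), edge(Y, Z).


def transitive_closure(edges):
    """Semi-naive evaluation of the two reach/2 rules above."""
    successors = {}
    for x, y in edges:
        successors.setdefault(x, set()).add(y)
    reach = set(edges)   # facts produced by the first (non-recursive) rule
    delta = set(edges)   # facts derived in the previous round
    while delta:
        new_facts = set()
        # Join only the newly derived facts with edge/2, not the whole relation.
        for x, y in delta:
            for z in successors.get(y, ()):
                if (x, z) not in reach:
                    new_facts.add((x, z))
        reach |= new_facts
        delta = new_facts
    return reach


EDGES = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(EDGES)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

Restricting each round's join to the facts derived in the previous round is what distinguishes semi-naive from naive evaluation; the loop stops once a round derives nothing new.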
25. Querying Property Graphs with Ontologies
- Author
-
Dragovic, Nikola
- Subjects
Graph Databases, Ontologies, Property Graphs, Ontology-Mediated Querying, Ontology-Based Data Access, Semantic Web
- Abstract
In ontology-mediated querying, users query their data by means of an ontology. An ontology not only provides a means to connect data from heterogeneous sources, but also a way to reason about incompleteness in the data. State-of-the-art systems allow users to query their data with SPARQL 1.1 while assuming that the data are stored in a relational schema. Although these systems assume graph-structured data, they do not support queries with navigational features. We aim to enable users to query a Neo4j property graph database with ontologies and a query language that has navigational features. In our work, we discuss the differences in semantics between SPARQL 1.1 and the property graph query language Cypher. Queries in our framework should be rewritable with respect to an input ontology, and evaluating them should be tractable in data complexity. We propose a framework for querying Neo4j property graphs with ontologies. Further, we define conditions which ensure that the answers given by the query languages under consideration coincide. Our work also includes a novel rewriting technique for queries in our framework. In addition, we show that our rewriting is complete and correct, and that query answering is feasible. Finally, we present an implementation of our rewriting and discuss the viability of our approach based on a use case from the autonomous driving industry, provided by Virtual Vehicle Research GmbH. Our results indicate that querying property graphs with ontologies is viable in practice. Furthermore, they show that property graph database management systems can be used to answer navigational queries with respect to ontologies.
- Published
- 2022
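A minimal sketch, under heavy simplifying assumptions, of what rewriting a query with respect to an ontology means: a query for one class is expanded, using hypothetical subclass axioms, into a Cypher query over every label whose nodes the ontology entails to be members of that class. The axioms, label names and helper functions are illustrative; the rewriting technique in the thesis also covers navigational features and is not reproduced here.

```python
# Hypothetical TBox as (subclass, superclass) pairs, e.g. Car is a subclass of Vehicle.
SUBCLASS_AXIOMS = [("Car", "Vehicle"), ("Truck", "Vehicle"), ("Vehicle", "RoadUser")]


def subclasses_of(label, axioms):
    """All labels whose instances are entailed to be instances of `label`
    (reflexive-transitive closure of the axioms, followed downwards)."""
    result, stack = {label}, [label]
    while stack:
        current = stack.pop()
        for sub, sup in axioms:
            if sup == current and sub not in result:
                result.add(sub)
                stack.append(sub)
    return result


def rewrite_label_query(label, axioms):
    """Rewrite 'all instances of class `label`' into a Cypher query over stored labels."""
    names = sorted(subclasses_of(label, axioms))
    condition = " OR ".join(f"n:{name}" for name in names)
    return f"MATCH (n) WHERE {condition} RETURN n"


print(rewrite_label_query("RoadUser", SUBCLASS_AXIOMS))
# MATCH (n) WHERE n:Car OR n:RoadUser OR n:Truck OR n:Vehicle RETURN n
```

Because the rewriting happens before the query reaches the database, the stored graph never needs to be materialised with inferred labels; the trade-off is that rewritten queries can grow with the size of the ontology.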
26. PG-Keys: Keys for Property Graphs
- Author
-
Keith W. Hare, George H. L. Fletcher, Sławek Staworko, Michael Schmidt, Leonid Libkin, Jan Hidders, Stefania Dumbrava, Angela Bonifati, Victor E. Lee, Bei Li, Filip Murlak, Juan F. Sequeda, Wim Martens, Ognjen Savkovic, Dominik Tomaszuk, Renzo Angles, Josh Perryman, Universidad de Talaca, Université Claude Bernard Lyon 1 (UCBL), Université Claude Bernard Lyon 1 - Faculté des sciences (UCBL FS), Université Claude Bernard Lyon 1 - Faculté des sciences et technologies (UCBL FST), Université de Lyon, Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2), École Centrale de Lyon (ECL), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Institut National des Sciences Appliquées (INSA), Centre National de la Recherche Scientifique (CNRS), Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise (ENSIIE), Institut Polytechnique de Paris (IP Paris), Eindhoven University of Technology [Eindhoven] (TU/e), JCC Consulting Inc, Neo4j, Birkbeck College [University of London], TigerGraph, Google LLC, Laboratory for the Foundations of Computer Science [Edinburgh] (LFCS), University of Edinburgh, École normale supérieure - Paris (ENS Paris, ENS-PSL), Université Paris sciences et lettres (PSL), University of Bayreuth, Uniwersytet Warszawski, Interos Inc., Free University of Bozen-Bolzano, Amazon Web Services [Seattle] (AWS), data.world, Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille, Université de Lille, Linking Dynamic Data (LINKS), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), University of Bialystok, and Réponse efficace aux requêtes sous mises à jour (EQUUS, ANR-19-CE48-0019, AAPG2019 - 2019)
- Subjects
Focus (computing), Property (philosophy), [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], Standardization, Computer science, business.industry, key constraints, 0102 computer and information sciences, 02 engineering and technology, Linked data, 01 natural sciences, 010201 computation theory & mathematics, 020204 information systems, Data integrity, Benchmark (surveying), 0202 electrical engineering, electronic engineering, information engineering, Position (finance), Use case, Software engineering, business, property graphs
- Abstract
International audience. We report on a community effort between industry and academia to shape the future of property graph constraints. The standardization of a property graph query language is currently underway through the ISO Graph Query Language (GQL) project. Our position is that this project should pay close attention to schemas and constraints, and should focus next on key constraints. The main purposes of keys are enforcing data integrity and allowing objects to be referenced and identified. Motivated by use cases from our industry partners, we argue that key constraints should be able to have different modes, which are combinations of basic restrictions that require the key to be exclusive, mandatory, and singleton. Moreover, keys should be applicable to nodes, edges, and properties, since all of these can represent valid real-life entities. Our result is PG-Keys, a flexible and powerful framework for defining key constraints that fulfills the above goals. PG-Keys is a design by the Linked Data Benchmark Council's Property Graph Schema Working Group, consisting of members from industry, academia, and the ISO GQL standards group, intended to bring the best of all worlds to property graph practitioners. PG-Keys aims to guide the evolution of the standardization efforts towards making systems more useful, powerful, and expressive. CCS CONCEPTS • Information systems → Integrity checking; • Theory of computation → Data modeling; Database constraints theory.
- Published
- 2021
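The three basic restrictions named in the abstract can be illustrated with a small checker. This sketch assumes the following reading: a key maps each object in its scope to a set of key values; "exclusive" means no key value is shared by two objects, "mandatory" means every object has at least one key value, and "singleton" means every object has at most one. The function and data are hypothetical; PG-Keys itself is a declarative formalism, not a Python API.

```python
def check_key_modes(key_values):
    """Check the three basic key modes, given a mapping from each object in
    the key's scope to the set of key values computed for it."""
    mandatory = all(len(values) >= 1 for values in key_values.values())
    singleton = all(len(values) <= 1 for values in key_values.values())
    owner = {}  # key value -> first object seen with that value
    exclusive = True
    for obj, values in key_values.items():
        for value in values:
            # setdefault returns the earlier owner if the value was already seen.
            if owner.setdefault(value, obj) != obj:
                exclusive = False
    return {"exclusive": exclusive, "mandatory": mandatory, "singleton": singleton}


# Hypothetical key: the email addresses of each :Person node.
PERSON_EMAILS = {
    "p1": {"ada@example.org"},
    "p2": {"alan@example.org", "turing@example.org"},
    "p3": set(),
}
print(check_key_modes(PERSON_EMAILS))
# {'exclusive': True, 'mandatory': False, 'singleton': False}
```

Keeping the modes independent mirrors the abstract's point that different use cases need different combinations: an identifier-like key would demand all three, while a key that tolerates absent values would drop "mandatory".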
27. Schema Inference for Property Graphs
- Author
-
Lbath, Hanâ, Bonifati, Angela, Harmer, Russ, Modèles statistiques bayésiens et des valeurs extrêmes pour données structurées et de grande dimension (STATIFY), Inria Grenoble - Rhône-Alpes, Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria), Centre National de la Recherche Scientifique (CNRS), Université Grenoble Alpes (UGA), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Preuves et Langages (PLUME), Laboratoire de l'Informatique du Parallélisme (LIP), Université de Lyon, Université Claude Bernard Lyon 1 (UCBL), École normale supérieure - Lyon (ENS Lyon, ENS de Lyon), Base de Données (BD), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2), École Centrale de Lyon (ECL), Institut National des Sciences Appliquées de Lyon (INSA Lyon), and Institut National des Sciences Appliquées (INSA)
- Subjects
Big Graph management, [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB], ACM: H.: Information Systems/H.2: DATABASE MANAGEMENT/H.2.1: Logical Design/H.2.1.2: Schema and subschema, graph databases, graph subtyping, schema inference, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], property graphs
- Abstract
Property graph instances are typically populated without defining a schema beforehand. Although this ensures great flexibility, the lack of a schema means missing opportunities for query optimization, data integration and analytics, to name a few. Since many graph instances already exist before any schema is defined, extracting the schema from those instances in a principled way can become a significant yet daunting task. In this paper, we present a novel end-to-end schema inference method for property graph schemas that handles complex and nested property values, multi-labeled nodes and node hierarchies. Our method consists of three main steps. The first builds on Cypher queries to extract the node and edge serialization of a property graph. The second builds on a MapReduce type inference system that works on the serialized output obtained in the first step. The third analyzes subtypes and supertypes to infer node hierarchies. We describe our schema inference pipeline and its implementation in two variants, one labels-oriented and one properties-oriented. Finally, we experimentally evaluate and compare the scalability and accuracy of our approaches on several real-life datasets. To the best of our knowledge, our work is the first to tackle the problem of schema inference for property graphs.
- Published
- 2021
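The labels-oriented idea behind steps two and three can be sketched on a single machine. This sketch assumes the node serialization from step one is already available as (label set, property map) pairs; it groups nodes by label set, records the observed value types and whether each property is mandatory, and derives subtypes by label-set inclusion. It ignores nested values and does not use MapReduce, so it illustrates the idea rather than the paper's pipeline; all names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical serialized nodes: (label set, property map).
NODES = [
    ({"Person"}, {"name": "Ada", "born": 1815}),
    ({"Person"}, {"name": "Alan"}),
    ({"Person", "Scientist"}, {"name": "Marie", "born": 1867, "field": "physics"}),
]


def infer_node_types(nodes):
    """Group nodes by label set; for each property record the observed value
    types and whether every node of that type has it (mandatory)."""
    groups = defaultdict(list)
    for labels, props in nodes:
        groups[frozenset(labels)].append(props)
    schema = {}
    for labels, prop_maps in groups.items():
        keys = set().union(*prop_maps)  # iterating a dict yields its keys
        schema[labels] = {
            key: (
                sorted({type(p[key]).__name__ for p in prop_maps if key in p}),
                all(key in p for p in prop_maps),
            )
            for key in keys
        }
    return schema


def infer_hierarchy(schema):
    """Treat type A as a subtype of B when A's label set strictly contains B's."""
    return [(sub, sup) for sub in schema for sup in schema if sup < sub]


schema = infer_node_types(NODES)
print(sorted(schema[frozenset({"Person"})].items()))
# [('born', (['int'], False)), ('name', (['str'], True))]
print(infer_hierarchy(schema))
```

Grouping by the exact label set is what makes this variant "labels-oriented"; a properties-oriented variant would instead cluster nodes by the shape of their property maps, which matters when labels are missing or unreliable.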
28. PG-explorer: Resource Description Framework data exploration with property graphs.
- Author
-
Jiang, Weihao, Yan, Li, Tu, Yaofeng, Zhou, Xiangsheng, and Ma, Zongmin
- Subjects
*RDF (Document markup language), *INFORMATION sharing
- Abstract
• A method for exploring RDF triples is proposed that lets users manipulate property graphs interactively.
• To improve exploration efficiency, RDF triples are persisted in graph databases.
• We implement an RDF exploration system, PG-Explorer, and evaluate its usability.
The Resource Description Framework (RDF) has been widely applied to represent and exchange domain information because it is machine-readable. With a huge amount of RDF data available, retrieving RDF data is essential, and many RDF query approaches have been developed. However, many traditional approaches require users to know the RDF model and a query language, which prevents a large number of non-expert users from obtaining information from RDF datasets. In this paper, we propose an approach with which users can explore massive RDF datasets by interactively manipulating property graphs. Our approach provides users with a series of operations for interactively constructing property graphs that describe their query intents. To explore massive RDF datasets efficiently, we convert RDF data into property graphs for storage. This can greatly reduce the size of massive RDF datasets and, more importantly, the converted property graphs help users understand the underlying structure of RDF datasets, which is very useful when they construct their query property graphs. The constructed query property graphs are finally transformed into expressions in the query language of the graph database. Using high-performance graph databases as query engines, we developed an RDF data exploration system, PG-Explorer. Through experiments over real-world datasets, we evaluated the effectiveness and superiority of our approach. The user study demonstrates that the proposed approach simplifies users' exploration of RDF data and can satisfy their exploration needs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
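One common way to store RDF as a property graph is sketched here under assumptions: triples whose object is a literal are folded into properties of the subject node, and triples whose object is an IRI become edges. Folding literal triples into nodes is one plausible reason such a conversion can shrink the data, since those triples no longer need records of their own. The paper's exact mapping may differ; the `Literal` wrapper, the prefixed names and `rdf_to_property_graph` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Marks an RDF object as a literal; plain strings are treated as IRIs."""
    value: object


def rdf_to_property_graph(triples):
    """Literal objects become (multi-valued) node properties; IRI objects become edges."""
    nodes, edges = {}, []
    for subject, predicate, obj in triples:
        props = nodes.setdefault(subject, {})
        if isinstance(obj, Literal):
            props.setdefault(predicate, []).append(obj.value)
        else:
            nodes.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return nodes, edges


TRIPLES = [
    ("ex:alice", "foaf:name", Literal("Alice")),
    ("ex:alice", "foaf:age", Literal(30)),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", Literal("Bob")),
]
nodes, edges = rdf_to_property_graph(TRIPLES)
print(nodes)
# {'ex:alice': {'foaf:name': ['Alice'], 'foaf:age': [30]}, 'ex:bob': {'foaf:name': ['Bob']}}
print(edges)
# [('ex:alice', 'foaf:knows', 'ex:bob')]
```

Property values are kept as lists because RDF allows a subject to have several values for the same predicate, whereas many property graph stores expect a single value per key.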
29. GGDs: Graph Generating Dependencies
- Author
-
George H. L. Fletcher, Larissa Capobianco Shimomura, Nikolay Yakovets, Database Group, and EAISI Foundational
- Subjects
FOS: Computer and information sciences, Theoretical computer science, business.industry, Computer science, Data management, H.2, Databases (cs.DB), 02 engineering and technology, tuple-generating dependencies, equality-generating dependencies, Graph, graph dependencies, Computer Science - Databases, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Tuple, business, Formal description, property graphs
- Abstract
We propose Graph Generating Dependencies (GGDs), a new class of dependencies for property graphs. Extending the expressivity of state-of-the-art constraint languages, GGDs can express both tuple- and equality-generating dependencies on property graphs, both of which find broad application in graph data management. We provide the formal definition of GGDs, analyze the validation problem for GGDs, and demonstrate their practical utility. (5 pages)
- Published
- 2020
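To make the validation problem concrete, here is a sketch that hard-codes a single dependency instead of implementing general GGD syntax: whenever two :Person nodes share an email (source pattern plus constraint), an edge (x)-[:SAME_AS]->(y) must exist (target pattern). This has the tuple-generating flavour; an equality-generating dependency would instead require, for instance, that two property values be equal. The graph, the dependency and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical property graph: node id -> (label, properties); edges as triples.
NODES = {
    1: ("Person", {"email": "ada@example.org"}),
    2: ("Person", {"email": "ada@example.org"}),
    3: ("Person", {"email": "alan@example.org"}),
    4: ("Person", {"email": "alan@example.org"}),
}
EDGES = {(1, "SAME_AS", 2)}


def source_matches(nodes):
    """Source side: two distinct :Person nodes x, y with x.email = y.email."""
    for x, y in combinations(nodes, 2):
        (label_x, props_x), (label_y, props_y) = nodes[x], nodes[y]
        if (label_x == label_y == "Person"
                and "email" in props_x
                and props_x.get("email") == props_y.get("email")):
            yield x, y


def violations(nodes, edges):
    """Source matches whose target side, an edge (x)-[:SAME_AS]->(y), is missing."""
    return [(x, y) for x, y in source_matches(nodes) if (x, "SAME_AS", y) not in edges]


print(violations(NODES, EDGES))  # [(3, 4)]
```

Validation reduces to enumerating source matches and checking the target side for each, so its cost is dominated by pattern matching on the source side; a general implementation would need a subgraph-matching engine rather than the hard-coded pair loop used here.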
30. Gradual Pattern Extraction from Property Graphs
- Author
-
Shah, Faaiz, WEB-CUBE, Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Université Montpellier, Anne Laurent, and Arnaud Castelltort
- Subjects
Gradual Patterns, [INFO.INFO-OH]Computer Science [cs]/Other [cs.OH], Data Mining, Property graphs
- Abstract
Graph databases (NoSQL-oriented graph databases) make it possible to manage highly connected data and complex database queries, with native graph storage and processing. A property graph in a NoSQL graph engine is a labeled directed graph composed of nodes connected through relationships, where both carry a set of attributes, or properties, in the form of (key:value) pairs. This makes it easy to represent data and knowledge that take the form of graphs. Graph database systems have found practical application in social networks, recommendation systems, fraud detection, and data journalism, as in the case of the Panama Papers. Such systems often suffer from missing data. In particular, these semi-structured NoSQL databases lead to situations where some attributes (properties) are filled in while others are not available, either because they exist but are missing (for instance, the unknown age of a person) or because they do not apply in a particular case (for instance, the year of military service of a woman in a country where it is mandatory only for men). Consequently, a key may be present on some nodes but not on others. Extracting knowledge from these new-generation database systems therefore requires dealing with missing data. Some approaches have been proposed that replace missing values so that data mining techniques can be applied. However, we argue that such approaches should be avoided because they introduce biases or errors. In our work, we focus on extracting gradual patterns from property graphs, which give end users tools for mining correlations in data that contain missing values.
Our approach first defines gradual patterns in the context of NoSQL property graphs and then extends existing algorithms to handle missing values, because the anti-monotonicity of support can no longer be assumed in a simple manner. We thus introduce a novel approach for mining gradual patterns in the presence of missing values and test it on real and synthetic data. Building on this work, we present our approach for mining such graphs to extract frequent gradual patterns of the form "the more/less A_1, ..., the more/less A_n", where each A_i is information taken from the graph, whether from the nodes or from the relationships. In order to retrieve more valuable patterns, we consider fuzzy gradual patterns of the form "the more/less the A_1 is F_1, ..., the more/less the A_n is F_n", where the A_i are attributes retrieved from the graph nodes or relationships and the F_i are fuzzy descriptions. For this purpose, we introduce the definitions of these concepts, the corresponding method for extracting the patterns, and the experiments we conducted on synthetic graphs produced by a synthetic property graph generator that we developed. We report the results in terms of running time, memory consumption and the number of patterns generated.
- Published
- 2019
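A minimal sketch of how the support of a gradual pattern such as "the more age, the more salary" can be measured on node properties, assuming a concordant-pair definition: the fraction of object pairs whose values all move in the directions the pattern states. Nodes missing an attribute are simply skipped; this is a simplifying assumption, not the missing-value handling developed in the thesis. Data and names are hypothetical.

```python
from itertools import combinations


def gradual_support(objects, pattern):
    """Concordant-pair support of a gradual pattern.

    objects: list of property dicts (e.g. node properties).
    pattern: list of (attribute, direction), direction "+" (the more) or "-" (the less).
    Objects lacking any attribute of the pattern are skipped (assumption).
    """
    complete = [o for o in objects if all(attr in o for attr, _ in pattern)]
    n = len(complete)
    if n < 2:
        return 0.0

    def respects(first, second):
        # True if going from `first` to `second` moves every attribute
        # strictly in the direction the pattern requires.
        return all(
            first[attr] < second[attr] if direction == "+" else first[attr] > second[attr]
            for attr, direction in pattern
        )

    concordant = sum(
        1 for o1, o2 in combinations(complete, 2) if respects(o1, o2) or respects(o2, o1)
    )
    return concordant / (n * (n - 1) / 2)


PEOPLE = [
    {"age": 25, "salary": 30},
    {"age": 35, "salary": 45},
    {"age": 45, "salary": 40},
    {"age": 50},  # salary missing, so this node is ignored
]
print(gradual_support(PEOPLE, [("age", "+"), ("salary", "+")]))  # 0.6666666666666666
```

Here two of the three complete pairs move together, while the pair (35, 45) and (45, 40) breaks the trend. Skipping incomplete nodes shrinks the denominator differently for different patterns, which hints at why anti-monotonicity of support stops holding "in a simple manner", as the abstract puts it.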
31. L’extraction de motifs graduels à partir de graphes de propriétés (Extraction of gradual patterns from property graphs)
- Author
-
Shah, Faaiz, WEB-CUBE, Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Université Montpellier, Anne Laurent, and Arnaud Castelltort
- Subjects
Gradual Patterns, [INFO.INFO-OH]Computer Science [cs]/Other [cs.OH], Data Mining, Property graphs
- Abstract
Graph databases (NoSQL-oriented graph databases) make it possible to manage highly connected data and complex database queries, with native graph storage and processing. A property graph in a NoSQL graph engine is a labeled directed graph composed of nodes connected through relationships, where both carry a set of attributes, or properties, in the form of (key:value) pairs. This makes it easy to represent data and knowledge that take the form of graphs. Graph database systems have found practical application in social networks, recommendation systems, fraud detection, and data journalism, as in the case of the Panama Papers. Such systems often suffer from missing data. In particular, these semi-structured NoSQL databases lead to situations where some attributes (properties) are filled in while others are not available, either because they exist but are missing (for instance, the unknown age of a person) or because they do not apply in a particular case (for instance, the year of military service of a woman in a country where it is mandatory only for men). Consequently, a key may be present on some nodes but not on others. Extracting knowledge from these new-generation database systems therefore requires dealing with missing data. Some approaches have been proposed that replace missing values so that data mining techniques can be applied. However, we argue that such approaches should be avoided because they introduce biases or errors. In our work, we focus on extracting gradual patterns from property graphs, which give end users tools for mining correlations in data that contain missing values.
Our approach first defines gradual patterns in the context of NoSQL property graphs and then extends existing algorithms to handle missing values, because the anti-monotonicity of support can no longer be assumed in a simple manner. We thus introduce a novel approach for mining gradual patterns in the presence of missing values and test it on real and synthetic data. Building on this work, we present our approach for mining such graphs to extract frequent gradual patterns of the form "the more/less A_1, ..., the more/less A_n", where each A_i is information taken from the graph, whether from the nodes or from the relationships. In order to retrieve more valuable patterns, we consider fuzzy gradual patterns of the form "the more/less the A_1 is F_1, ..., the more/less the A_n is F_n", where the A_i are attributes retrieved from the graph nodes or relationships and the F_i are fuzzy descriptions. For this purpose, we introduce the definitions of these concepts, the corresponding method for extracting the patterns, and the experiments we conducted on synthetic graphs produced by a synthetic property graph generator that we developed. We report the results in terms of running time, memory consumption and the number of patterns generated.
- Published
- 2019
32. Extracting Fuzzy Gradual Patterns from Property Graphs
- Author
-
Anne Laurent, Faaiz Shah, Arnaud Castelltort, WEB-CUBE, Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), and Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
- Subjects
Graph database ,Theoretical computer science ,[INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB] ,Computer science ,Fuzzy Gradual Patterns ,Fuzzy set ,02 engineering and technology ,Directed graph ,computer.software_genre ,NoSQL ,Fuzzy logic ,Graph ,Data modeling ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Property Graphs ,020201 artificial intelligence & image processing ,computer
- Abstract
A property graph in a NoSQL graph database engine provides an efficient way to manage data and knowledge thanks to its native graph-structured storage. A property graph is a labeled directed graph whose nodes and relationships carry a set of attributes, or properties, in the form of (key:value) pairs. In this work, we aim to mine such graphs in order to extract frequent gradual patterns of the form "the more/less A1, ..., the more/less An", where each Ai is an attribute taken from the graph, whether from its nodes or from its relationships. In order to retrieve more valuable patterns, we consider fuzzy gradual patterns of the form "the more/less A1 is F1, ..., the more/less An is Fn", where each Ai is an attribute retrieved from the graph nodes or relationships and each Fi is a fuzzy description. For this purpose, we introduce the definitions of these concepts, the corresponding method for extracting the patterns, and the experiments we conducted on synthetic graphs produced with a graph generator. We report the results in terms of time and memory consumption.
- Published
- 2019
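The fuzzy gradual patterns described in entry 32 can be illustrated with a minimal Python sketch, under several assumptions that are not taken from the paper: the property graph is held as in-memory `Node` and `Relationship` records (hypothetical classes, not a Neo4j API); each relationship is flattened into one row combining the properties of its source node, itself, and its target node; each fuzzy description Fi is a trapezoidal membership function; and "the more A is F" is read as comparing the membership degrees μF(A) between rows. `trapezoid`, `relationship_rows`, and `fuzzy_support` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable


@dataclass
class Node:
    id: str
    labels: set[str]
    props: dict[str, float] = field(default_factory=dict)


@dataclass
class Relationship:
    src: str
    dst: str
    type: str
    props: dict[str, float] = field(default_factory=dict)


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Fuzzy membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    def mu(x: float) -> float:
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    return mu


def relationship_rows(nodes: list[Node], rels: list[Relationship]) -> list[dict[str, float]]:
    """Flatten each relationship into one row of source, relationship and target properties."""
    by_id = {n.id: n for n in nodes}
    rows = []
    for r in rels:
        row = {f"src.{k}": v for k, v in by_id[r.src].props.items()}
        row |= {f"rel.{k}": v for k, v in r.props.items()}
        row |= {f"dst.{k}": v for k, v in by_id[r.dst].props.items()}
        rows.append(row)
    return rows


# A fuzzy gradual item: (attribute, "+" or "-", membership function of the fuzzy set F).
FuzzyItem = tuple[str, str, Callable[[float], float]]


def fuzzy_support(rows: list[dict[str, float]], pattern: list[FuzzyItem]) -> float:
    """Share of row pairs whose membership degrees move as every item requires."""
    rows = [r for r in rows if all(attr in r for attr, _, _ in pattern)]
    # "A is F" is evaluated as the membership degree mu_F(A) of each row.
    degrees = [[mu(r[attr]) for attr, _, mu in pattern] for r in rows]
    dirs = [d for _, d, _ in pattern]

    def follows(x: list[float], y: list[float]) -> bool:
        return all(xi < yi if d == "+" else xi > yi for xi, yi, d in zip(x, y, dirs))

    m = len(degrees)
    if m < 2:
        return 0.0
    hits = sum(1 for x, y in combinations(degrees, 2) if follows(x, y) or follows(y, x))
    return hits / (m * (m - 1) / 2)


people = [Node(f"p{i}", {"Person"}, {"age": age}) for i, age in enumerate([20, 30, 45, 60], 1)]
knows = [
    Relationship("p1", "p2", "KNOWS", {"years": 1}),
    Relationship("p2", "p3", "KNOWS", {"years": 5}),
    Relationship("p3", "p4", "KNOWS", {"years": 9}),
]
rows = relationship_rows(people, knows)
middle_aged = trapezoid(25, 40, 55, 70)
long_lasting = trapezoid(0, 10, 100, 110)
# "The more the source is middle-aged, the more the relationship is long-lasting"
print(fuzzy_support(rows, [("src.age", "+", middle_aged), ("rel.years", "+", long_lasting)]))  # 1.0
# "The more the source is middle-aged, the less the target is middle-aged"
print(fuzzy_support(rows, [("src.age", "+", middle_aged), ("dst.age", "-", middle_aged)]))  # 1/3
```

On raw ages, source and target ages rise together along this chain, so the second pattern would have no support; comparing membership degrees in "middle-aged" instead gives it partial support, which is the kind of nuance fuzzy descriptions are meant to capture.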
33. 4-8 March 2019
- Author
-
Allen, David, Hodler, Amy, Hunger, Michael, Knobloch, Martin, Lyon, William, Needham, Mark, and Voigt, Hannes
- Subjects
graph algorithms ,Graph databases ,graph analytics ,MathematicsofComputing_DISCRETEMATHEMATICS ,property graphs
- Abstract
Analytics of large graph data sets has become an important means of understanding and influencing the world. The use of graph database technology in the International Consortium of Investigative Journalists' (ICIJ) investigation of the Panama Papers and Paradise Papers, or in cancer research, illustrates how analysing graph-structured data helps to uncover important but hidden relationships. A very current example in this regard shows how graph analytics can help shed light on the operations of social media troll networks, e.g. on Twitter. In a similar fashion, graph analytics can help enterprises unearth hidden patterns and structures within connected data, in order to make more accurate predictions and faster decisions. All this requires efficient graph analytics that are well integrated with the management of graph data. The Neo4j Graph Platform provides such an environment: it supports both transactional and analytical processing of graph data, including data management and analytics tooling. A central element for graph analytics in the Graph Platform is the set of Neo4j graph algorithms, which provide efficiently implemented, parallel versions of common graph algorithms, integrated with and optimized for the Neo4j transactional database. In this paper, we describe the design and integration of Neo4j Graph Algorithms, demonstrate the utility of our approach with a Twitter troll analysis, and showcase its performance with a few experiments on large graphs.
- Published
- 2019
34. Towards a scalable generation of realistic property graphs with arbitrary schemas
- Author
-
Larriba Pey, Josep, Prat Pérez, Arnau, and Fernández Salas, Xavier
- Abstract
This Master's thesis provides the user with a comprehensive DSL for defining property graph generation tasks, including the node and edge property schemas and the graph structure. The DSL generates an Intermediate Language representation that is executed on implementations of distributed computing frameworks.
- Published
- 2018
35. Towards a scalable generation of realistic property graphs with arbitrary schemas
- Author
-
Fernández Salas, Xavier, Larriba Pey, Josep, and Prat Pérez, Arnau
- Subjects
generació de grafs ,graphs ,graph generation ,Grafs, Teoria de ,macros de compilador ,compiler macros ,property graph generation ,metallenguatge ,Graph theory ,DSL ,macro annotations ,generació de grafs sintètics ,Scala ,grafs ,Gnormalizer ,benchmarks ,graph benchmarks ,compile-time code enrichment ,meta-language ,Babel ,property graphs ,DataSynth ,Matemàtiques i estadística::Matemàtica discreta::Teoria de grafs [Àrees temàtiques de la UPC]
- Abstract
This Master's thesis provides the user with a comprehensive DSL for defining property graph generation tasks, including the node and edge property schemas and the graph structure. The DSL generates an Intermediate Language representation that is executed on implementations of distributed computing frameworks.
- Published
- 2018
36. Semantically Linking in Silico Cancer Models
- Author
-
Zhihui Wang, Steve McKeever, Tom Quaiser, Eliezer Shochat, Anthony J. Connor, David Johnson, and Thomas S. Deisboeck
- Subjects
Medicin och hälsovetenskap ,Cancer Research ,Markup language ,Theoretical computer science ,computer.internet_protocol ,Computer science ,Cancer Model ,interoperability ,Medical and Health Sciences ,lcsh:RC254-282 ,neo4j ,03 medical and health sciences ,0302 clinical medicine ,in silico oncology ,linking models ,Controlled vocabulary ,Cancer models ,semantics ,030304 developmental biology ,Cancer och onkologi ,0303 health sciences ,Computational model ,Methodology ,Domain model ,Semantic interoperability ,lcsh:Neoplasms. Tumors. Oncology. Including cancer and carcinogens ,Data science ,tumor modeling ,model exploration ,Subject-matter expert ,Oncology ,Cancer and Oncology ,030220 oncology & carcinogenesis ,computer ,XML ,property graphs ,online repositories
- Abstract
Multiscale models are commonplace in cancer modeling, where individual models acting on different biological scales are combined within a single, cohesive modeling framework. However, model composition gives rise to challenges in understanding the interfaces and interactions between the component models. These computational models are typically developed by separate research groups, each drawing on specific domain expertise and using different methodologies, programming languages, and parameters. This paper introduces a graph-based model for semantically linking computational cancer models via domain graphs, which can help us better understand and explore combinations of models spanning multiple biological scales. We take the data model encoded by TumorML, an XML-based markup language for storing cancer models in online repositories, and transpose its model description elements into a graph-based representation. This approach lets us link domain models, such as controlled vocabularies, taxonomic schemes, and ontologies, with cancer model descriptions to better understand and explore relationships between models. The union of these graphs creates a connected property graph that links cancer models by categorization, by computational compatibility, and by semantic interoperability, yielding a framework that opens up opportunities for exploring and discovering combinations of models.
- Published
- 2014
37. Semantically linking in silico cancer models.
- Author
-
Johnson D, Connor AJ, McKeever S, Wang Z, Deisboeck TS, Quaiser T, and Shochat E
- Abstract
Multiscale models are commonplace in cancer modeling, where individual models acting on different biological scales are combined within a single, cohesive modeling framework. However, model composition gives rise to challenges in understanding the interfaces and interactions between the component models. These computational models are typically developed by separate research groups, each drawing on specific domain expertise and using different methodologies, programming languages, and parameters. This paper introduces a graph-based model for semantically linking computational cancer models via domain graphs, which can help us better understand and explore combinations of models spanning multiple biological scales. We take the data model encoded by TumorML, an XML-based markup language for storing cancer models in online repositories, and transpose its model description elements into a graph-based representation. This approach lets us link domain models, such as controlled vocabularies, taxonomic schemes, and ontologies, with cancer model descriptions to better understand and explore relationships between models. The union of these graphs creates a connected property graph that links cancer models by categorization, by computational compatibility, and by semantic interoperability, yielding a framework that opens up opportunities for exploring and discovering combinations of models.
- Published
- 2014
38. Transforming RDF-star to Property Graphs: A Preliminary Analysis of Transformation Approaches
- Author
-
Ghadeer Abuoda, Daniele Dell'Aglio, Arthur Keen, and Katja Hose
- Subjects
Knowledge graph ,Property graphs ,Data management
- Abstract
RDF and property graph models have many similarities, such as their use of basic graph concepts like nodes and edges. However, the two models differ in their modeling approach, expressivity, serialization, and the nature of their applications. RDF is the de facto standard model for knowledge graphs on the Semantic Web and is supported by a rich ecosystem for inference and processing. The property graph model, in contrast, offers advantages in scalable graph analytical tasks, such as graph matching, path analysis, and graph traversal. RDF-star extends RDF and makes it possible to capture metadata as a first-class citizen. To exploit the advantages of each model, the literature proposes different ways of transforming knowledge graphs between property graphs and RDF. However, most of these approaches cannot provide complete transformations for RDF-star graphs. Hence, this paper provides a step towards transforming RDF-star graphs into property graphs. In particular, we identify different test cases for evaluating transformation approaches from RDF-star to property graphs. We then categorize transformation approaches into two classes and analyze them against these test cases. The insights obtained will form the foundation for building complete transformation approaches in the future.
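As a rough illustration of why RDF-star to property graph transformations can be incomplete, here is a minimal Python sketch of one naive mapping. It is an assumption made for illustration, not one of the two classes of approaches analysed in entry 38: IRIs are plain strings, literal objects become node properties, IRI objects become edges, and a quoted triple annotated with a literal becomes a property on the corresponding edge. Everything it cannot express, such as annotations whose value is an IRI or quoted triples in object position, is returned as unsupported. `Literal`, `Triple`, `PropertyGraph`, and `to_property_graph` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    s: "Term"  # an IRI, or a quoted Triple (RDF-star)
    p: str
    o: "Term"


Term = Union[str, Literal, Triple]  # assumption: IRIs are plain strings


@dataclass
class PropertyGraph:
    nodes: dict[str, dict[str, str]] = field(default_factory=dict)
    edges: dict[tuple[str, str, str], dict[str, str]] = field(default_factory=dict)


def to_property_graph(triples: list[Triple]) -> tuple[PropertyGraph, list[Triple]]:
    """Naive mapping; returns the graph plus the triples it could not express."""
    g = PropertyGraph()
    unsupported: list[Triple] = []
    for t in triples:
        s, p, o = t.s, t.p, t.o
        if isinstance(s, str) and isinstance(o, Literal):
            g.nodes.setdefault(s, {})[p] = o.value  # literal object -> node property
        elif isinstance(s, str) and isinstance(o, str):
            g.nodes.setdefault(s, {})
            g.nodes.setdefault(o, {})
            g.edges.setdefault((s, p, o), {})  # IRI object -> edge labelled p
        elif (isinstance(s, Triple) and isinstance(s.s, str)
              and isinstance(s.o, str) and isinstance(o, Literal)):
            g.nodes.setdefault(s.s, {})
            g.nodes.setdefault(s.o, {})
            # Caveat: this creates the edge even if the quoted triple was never asserted,
            # although in RDF-star quoting a triple does not assert it.
            g.edges.setdefault((s.s, s.p, s.o), {})[p] = o.value  # annotation -> edge property
        else:
            # e.g. an annotation whose value is an IRI, or a nested / object-position
            # quoted triple: a property value cannot point to another node or edge.
            unsupported.append(t)
    return g, unsupported


knows = Triple("ex:alice", "ex:knows", "ex:bob")
g, rest = to_property_graph([
    Triple("ex:alice", "ex:name", Literal("Alice")),
    knows,
    Triple(knows, "ex:since", Literal("2020")),
    Triple(knows, "ex:statedBy", "ex:survey1"),
])
print(g.edges)  # {('ex:alice', 'ex:knows', 'ex:bob'): {'ex:since': '2020'}}
print(rest)     # only the ex:statedBy annotation is left over
```

Handling the leftover case would need a different strategy, for example turning the annotated edge into an intermediate node, which is the kind of trade-off a complete transformation has to settle.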