Descriptor: "Data federation" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Data federation" returned 114 results.

1. Efficient and Secure Skyline Query Over Horizontal Data Federation

Author: Kuang, Yilun, Liu, An, Qu, Jianfeng, Fang, Junhua, Zhang, Xiao-Fang, Zhao, Lei, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Onizuka, Makoto, editor, Lee, Jae-Gil, editor, Tong, Yongxin, editor, Xiao, Chuan, editor, Ishikawa, Yoshiharu, editor, Amer-Yahia, Sihem, editor, Jagadish, H. V., editor, and Lu, Kejing, editor
Published: 2024
Full Text: View/download PDF

2. Segam: Secure and Efficient Group-by-Aggregation Queries across Multiple Private Database

Author: Cao, Zicheng, Ma, Qingzhi, Chen, Wei, Zhao, Lei, Liu, An, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Onizuka, Makoto, editor, Lee, Jae-Gil, editor, Tong, Yongxin, editor, Xiao, Chuan, editor, Ishikawa, Yoshiharu, editor, Amer-Yahia, Sihem, editor, Jagadish, H. V., editor, and Lu, Kejing, editor
Published: 2024
Full Text: View/download PDF

3. Confirming Secure Interoperability in Mobile Financial Services: Challenges of Data Federation and Cryptography-Based Solution

Author: Khan, Razib Hayat, Haque, Rakib Ul, Syeed, M. M. Mahbubul, Uddin, Mohammad Faisal, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Nagar, Atulya K., editor, Jat, Dharm Singh, editor, Mishra, Durgesh, editor, and Joshi, Amit, editor
Published: 2024
Full Text: View/download PDF

4. Gaia-X & Fiware: Implementation of a Federated Data Platform in Smart Cities.

Author: Lopes, Pedro M., Guimarães, Pedro, Pereira, Tiago F., and Machado, Ricardo J.
Subjects: SMART cities, CITIES & towns, INFRASTRUCTURE (Economics), MUNICIPAL lighting, URBANIZATION, INTELLIGENT transportation systems
Abstract: Using and connecting different technologies to improve the quality of urban life is a significant challenge for smart cities. The interconnection of transport, energy, water, public lighting, and other urban infrastructure systems is critical to providing quality services to citizens and reducing environmental impact. However, data interoperability from these systems is a significant challenge, as data is mainly stored in different formats and locations from various sources. This paper examines the challenges of integrating existing datasets in intelligent cities and proposes solutions to create secure and interoperable data-sharing environments. To this end, an approach called data federation is discussed by implementing a federated platform for integrating such data. However, it is essential to note that there may be challenges in terms of cybersecurity and interoperability. This is where context brokers play a crucial role, regulating access to data by enforcing rules and providing security standards. In this way, these challenges can be managed appropriately, ensuring data protection and the system's proper functioning. Finally, it is essential to mention that this work is linked to a smart cities project, which aims to promote innovation in smart cities by implementing several innovative solutions. The study presented in this research contributes to understanding the challenges in smart cities. It proposes solutions for creating secure and interoperable data-sharing environments, which are essential both for the project's success and for the development of smart cities worldwide. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Semantics-Enabled Data Federation: Bringing Materials Scientists Closer to FAIR Data

Author: Aggour, Kareem S., Kumar, Vijay S., Gupta, Vipul K., Gabaldon, Alfredo, Cuddihy, Paul, and Mulwad, Varish
Published: 2024
Full Text: View/download PDF

6. Cross trust domain federated k-dominant skyline query processing

Author: Yexuan Shi, Yongxin Tong, Hao Zhou, Ke Xu, and Weifeng Lyu
Subjects: k-dominant skyline, data federation, secure multi-party computation, homomorphic encryption, Electronic computers. Computer science, QA75.5-76.95
Abstract: k-dominant skyline is a prevailing skyline query which has widespread applications in multi-criteria decision making and recommendation. As these applications continuously scale up, there is an increasing demand to support k-dominant skyline over a data federation which consists of multiple data silos, each holding disjoint columns of the entire dataset. Yet it is challenging to support k-dominant skyline over a data federation. This is because strict security constraints are often imposed on query processing over data federations, whereas naively adopting security techniques leads to unacceptably inefficient queries. In this paper, we present an efficient and secure k-dominant skyline query for a data federation. Specifically, we devised a novel private vector aggregation-based solution with ciphertext compression-based optimization for efficient k-dominant skyline query processing while providing security guarantees. Extensive evaluations on both synthetic and real datasets showed the superiority of our method.
Published: 2023
Full Text: View/download PDF
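
To make the query semantics in this abstract concrete, the following is a minimal plaintext sketch of the standard k-dominant skyline definition, assuming smaller values are better: a point p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of them, and the k-dominant skyline keeps the points that no other point k-dominates. It deliberately omits the paper's security machinery (private vector aggregation, ciphertext compression). The two "silos" holding disjoint columns, their values, and the function names are hypothetical and only mimic the vertically partitioned setting the abstract describes.

```python
from typing import List, Sequence


def k_dominates(p: Sequence[float], q: Sequence[float], k: int) -> bool:
    """True if p k-dominates q, assuming smaller values are better."""
    # Dimensions where p is no worse than q.
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    # Any strictly better dimension is also a no-worse one, so it can always
    # be included in the chosen subset of k dimensions.
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of points that are not k-dominated by any other point."""
    # k-dominance is not transitive, so every point is compared with every
    # other point (a naive O(n^2) scan), not only with current survivors.
    return [
        i
        for i, p in enumerate(points)
        if not any(k_dominates(q, p, k) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical vertically partitioned data: each silo holds disjoint columns
# of the same rows. Joining them in plaintext is exactly what a secure
# federation avoids; it is done here only to show the query result.
silo_a = [[1], [2], [3], [4]]              # column 0
silo_b = [[4, 3], [2, 2], [1, 4], [4, 4]]  # columns 1 and 2
points = [a + b for a, b in zip(silo_a, silo_b)]

print(k_dominant_skyline(points, k=2))  # [1]
print(k_dominant_skyline(points, k=3))  # [0, 1, 2]
```

Lowering k makes dominance easier to satisfy, so in this toy data the skyline shrinks from three points (k = 3, the ordinary skyline in three dimensions) to one (k = 2).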

7. Federated k-Dominant Skyline Query Algorithm Across Trust Domains.

Author: Yexuan Shi (史烨轩), Yongxin Tong (童咏昕), Hao Zhou (周昊), Ke Xu (许可), and Weifeng Lyu (吕卫锋)
Abstract: k-dominant skyline is a prevailing skyline query which has widespread applications in multi-criteria decision making and recommendation. As these applications continuously scale up, there is an increasing demand to support k-dominant skyline over a data federation which consists of multiple data silos, each holding disjoint columns of the entire dataset. Yet it is challenging to support k-dominant skyline over a data federation. This is because strict security constraints are often imposed on query processing over data federations, whereas naively adopting security techniques leads to unacceptably inefficient queries. In this paper, we present an efficient and secure k-dominant skyline query for a data federation. Specifically, we devised a novel private vector aggregation-based solution with ciphertext compression-based optimization for efficient k-dominant skyline query processing while providing security guarantees. Extensive evaluations on both synthetic and real datasets showed the superiority of our method. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

8. Secure Multi-party θ-join Algorithms Toward Data Federation.

Author: Yuanyuan Zhang, Shuyuan Li, Yexuan Shi, Nan Zhou, Yi Xu, and Ke Xu
Subjects: DATA security, GENERAL Data Protection Regulation, 2016
Abstract: Recently, many countries and regions have enacted data security policies, such as the General Data Protection Regulation proposed by the EU. The release of related laws and regulations has aggravated the problem of data silos, making it difficult to share data among data owners. Data federation is a possible solution to this problem. Data federation refers to query tasks computed jointly by multiple data owners, without leaking the original data, using privacy-computing technologies such as secure multi-party computation. This concept has become a research trend in recent years, and a series of representative systems have been proposed, such as SMCQL and Conclave. However, for join queries, which are core to relational database systems, existing data federation systems still have the following problems. First, they support only a single join type, which makes it difficult to meet query requirements under complex join conditions. Second, there is considerable room to improve algorithm performance, because existing systems often call the security tool library directly, which incurs high runtime and communication overhead. Therefore, this paper proposes a join algorithm under data federation to address these issues. The main contributions of this paper are as follows. First, multi-party-oriented federation security operators are designed and implemented, which can support many operations. Second, a federated θ-join algorithm and an optimization strategy are proposed to significantly reduce the security computation cost. Finally, the performance of the proposed algorithm is verified on the TPC-H benchmark dataset. The experimental results show that the proposed algorithm can reduce the runtime and communication overhead by 61.33% and 95.26%, respectively, compared with the existing data federation systems SMCQL and Conclave. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
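
The distinction between equi-joins and θ-joins that this entry turns on can be shown with a minimal plaintext nested-loop sketch: a θ-join pairs rows from two relations whenever an arbitrary predicate θ (here an inequality) holds, whereas an equi-join only allows equality. The tables, column names, and the `theta_join` helper are hypothetical, and none of the paper's secure multi-party operators or optimizations are reproduced.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def theta_join(
    left: List[Row],
    right: List[Row],
    theta: Callable[[Row, Row], bool],
) -> List[Tuple[Row, Row]]:
    """Nested-loop theta-join: keep every (l, r) pair for which theta(l, r) holds."""
    return [(l, r) for l in left for r in right if theta(l, r)]


# Hypothetical relations owned by two different parties.
orders = [{"id": 1, "qty": 5}, {"id": 2, "qty": 12}]
bands = [{"band": "small", "max_qty": 10}, {"band": "large", "max_qty": 20}]

# A non-equality join condition that an equi-join cannot express.
pairs = theta_join(orders, bands, lambda o, b: o["qty"] <= b["max_qty"])
print([(o["id"], b["band"]) for o, b in pairs])
# [(1, 'small'), (1, 'large'), (2, 'large')]
```

In a secure setting, this naive approach would have to evaluate θ on protected values for all |left| × |right| pairs, which hints at why reducing the security computation cost matters.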

9. Puncturable Search: Enabling Authorized Search in Cross-data Federation

Author: Mei, Lin, Xu, Chungen, Li, Qianmu, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin (Sherman), Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Yuan, Xingliang, editor, Bao, Wei, editor, Yi, Xun, editor, and Tran, Nguyen Hoang, editor
Published: 2021
Full Text: View/download PDF

10. Data Federation Challenges in Remote Near-Real-Time Fusion Experiment Data Processing

Author: Choi, Jong, Wang, Ruonan, Churchill, R. Michael, Kube, Ralph, Choi, Minjun, Park, Jinseop, Logan, Jeremy, Mehta, Kshitij, Eisenhauer, Greg, Podhorszki, Norbert, Wolf, Matthew, Chang, C. S., Klasky, Scott, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Nichols, Jeffrey, editor, Verastegui, Becky, editor, Maccabe, Arthur ‘Barney’, editor, Hernandez, Oscar, editor, Parete-Koon, Suzanne, editor, and Ahearn, Theresa, editor
Published: 2020
Full Text: View/download PDF

11. A Semantic Model in the Context of Maintenance: A Predictive Maintenance Case Study.

Author: May, Gokan, Cho, Sangje, Majidirad, AmirHossein, and Kiritsis, Dimitris
Subjects: BUILDING maintenance, ONTOLOGY, DATA mapping, MAINTENANCE, INDUSTRY 4.0
Abstract: Advanced technologies in modern industry collect massive volumes of data from a plethora of sources, such as processes, machines, components, and documents. This also applies to predictive maintenance. To provide access to these data in a standard and structured way, researchers and practitioners need to design and develop a semantic model of maintenance entities to build a reference ontology for maintenance. To date, there have been numerous studies combining the domain of predictive maintenance and ontology engineering. However, such earlier works, which focused on semantic interoperability to exchange data with standardized meanings, did not fully leverage the opportunities provided by data federation to elaborate these semantic technologies further. Therefore, in this paper, we fill this research gap by addressing interoperability in smart manufacturing and the issue of federating different data formats effectively by using semantic technologies in the context of maintenance. Furthermore, we introduce a semantic model in the form of an ontology for mapping relevant data. The proposed solution is validated and verified using an industrial implementation. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

12. Data Federation in Healthcare for Artificial Intelligence Solutions.

Author: Castellanos, Julio, Raposo, Gonzalo, and Antunez, Lucia
Abstract: Data federation offers a way to get data moving from multiple sources, which is advantageous in healthcare systems, where medical data is often hard to reach because of regulations or the lack of reliable solutions that can integrate on top of protocols such as FHIR, HL7, and DICOM, among others. There is an increasing need for solutions that augment healthcare systems with artificial intelligence (AI) in fields such as genomics, cancer treatment, and radiology, all of which will require solutions that can provide data at scale while being traceable, safe, and regulatory compliant. This paper proposes an architectural solution that may provide the core capabilities to implement a data federation approach in a healthcare system to enable AI. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

13. Towards a Scottish Safe Haven Federation.

Author: Chuang Gao, Charlie Mayor, Sophie McCall, Katie Wilde, Christian Cole, and Emily Jefferson
Subjects: Clinic Laboratory Data, Electronic Health Records, Safe Havens, Data Federation, Data Discovery, Demography. Population. Vital events, HB848-3697
Abstract: Objectives: Building upon the Scottish Safe Haven Network’s (SSHN) collaborative experience, to develop an exemplar dataset on Scottish national-level laboratory tests within the SSHN, and to test the feasibility of a federated safe haven network in Scotland for clinical laboratory data using existing federation solutions. Approach: Consideration was given to the clinical tests commonly used across Scotland. An investigation of laboratory data structure was conducted using SQL with the assistance of White Rabbit, which captures the metadata information from each safe haven. Data heterogeneity and commonalities were investigated in detail in R. We examined the inter-relationships among clinical laboratories from different regions of NHS Scotland. A common data structure was proposed to facilitate the sharing of clinical test results. We investigated multiple existing federation solutions to streamline access to data across the SSHN. Results: A dataset of all laboratory tests for patients registered with the Scottish Health Research Register and Biobank (SHARE) was developed from laboratory data sources within the network (Grampian Data Safe Haven, Health Informatics Center, Glasgow Safe Haven, and Lothian/DataLoch Safe Haven). An open-source toolkit was created that includes SQL scripts to harmonise laboratory data from multiple safe havens and R scripts to conduct heterogeneity investigation, streamlining sharing and analysis. We have shown a working model for federation within Scotland for the first time, with a view to expanding into more routine projects. The project has developed a systematic concept mapping of data available across the SSHN to aid cohort discovery. Conclusions: The exemplar national-level laboratory data, together with the toolkit, will help provide researchers with information such as which data are available to access. It will also support and improve national-level research studies at a faster pace. The work has demonstrated the feasibility of a common process for federated data discovery and sharing across the SSHN.
Published: 2022
Full Text: View/download PDF

14. Smart tourism in Villages: Challenges and the Alpujarra Case Study.

Author: Flores-Crespo, Pedro, Bermudez-Edo, Maria, and Garrido, Jose Luis
Subjects: INFORMATION & communication technologies, TRAFFIC congestion, TOURISM, INTERNET of things, WASTE management
Abstract: Cities are making significant efforts to implement the Internet of Things (IoT) paradigm, in which sensors collect data from heterogeneous sources, and advanced software systems can provide an accurate city context. Most initiatives focus on improving the quality of life through energy-efficient buildings, waste management, or reducing traffic congestion. One of the main applications of IoT is smart tourism. This concept focuses on using Information and Communication Technologies to give visitors better experiences without interfering with the city's daily life. This paper studies related concepts and related works in smart tourism, sustainability, and IoT, identifying challenges and literature gaps. Finally, we provide a brief analysis of the real case study of the mountainous Alpujarra region in Spain based on its specific characteristics. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

15. Concept Bag: A New Method for Computing Concept Similarity in Biomedical Data

Author: Bradshaw, Richard L., Gouripeddi, Ramkiran, Facelli, Julio C., Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Rojas, Ignacio, editor, Valenzuela, Olga, editor, Rojas, Fernando, editor, and Ortuño, Francisco, editor
Published: 2019
Full Text: View/download PDF

16. An ontology-based approach for developing a harmonised data-validation tool for European cancer registration

Author: Nicholas Charles Nicholson, Francesco Giusti, Manola Bettio, Raquel Negrao Carvalho, Nadya Dimitrova, Tadeusz Dyba, Manuela Flego, Luciana Neamtiu, Giorgia Randi, and Carmen Martos
Subjects: Cancer registry, Ontology, Data validation, Data federation, Semantic web, Data harmonisation, Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Background: Population-based cancer registries constitute an important information source in cancer epidemiology. Studies collating and comparing data across regional and national boundaries have proved important for deploying and evaluating effective cancer-control strategies. A critical aspect in correctly comparing cancer indicators across regional and national boundaries lies in ensuring a good and harmonised level of data quality, which is a primary motivator for a centralised collection of pseudonymised data. The recent introduction of the European Union’s general data-protection regulation (GDPR) imposes stricter conditions on the collection, processing, and sharing of personal data. It also considers pseudonymised data as personal data. The new regulation motivates the need to find solutions that allow a continuation of the smooth processes leading to harmonised European cancer-registry data. One element in this regard would be the availability of a data-validation software tool based on a formalised depiction of the harmonised data-validation rules, allowing an eventual devolution of the data-validation process to the local level. Results: A semantic data model was derived from the data-validation rules for harmonising cancer-data variables at European level. The data model was encapsulated in an ontology developed using the Web-Ontology Language (OWL) with the data-model entities forming the main OWL classes. The data-validation rules were added as axioms in the ontology. The reasoning function of the resulting ontology demonstrated its ability to trap registry-coding errors and in some instances to be able to correct errors. Conclusions: Describing the European cancer-registry core data set in terms of an OWL ontology affords a tool based on a formalised set of axioms for validating a cancer-registry’s data set according to harmonised, supra-national rules. The fact that the data checks are inherently linked to the data model would lead to less maintenance overheads and also allow automatic versioning synchronisation, important for distributed data-quality checking processes.
Published: 2021
Full Text: View/download PDF

17. A Secure Multi-party Data Federation System.

Author: Shuyuan Li, Yudian Ji, Dingyuan Shi, Wangdong Liao, Lipeng Zhang, Yongxin Tong, and Ke Xu
Subjects: DATA mining, DATA analysis
Abstract: In the era of big data, data is of great value as an essential factor of production. Analyzing, mining, and utilizing large-scale data through data sharing is therefore of great significance. However, due to the heterogeneous dispersion of data and increasingly rigorous privacy protection regulations, data owners cannot share data arbitrarily, and thus become data silos. Since data federation can achieve collaborative queries while preserving the privacy of data silos, we present in this paper a secure multi-party relational data federation system based on the federated computation principle that "data stays, computation moves." The system is compatible with a variety of relational databases and can shield users from the heterogeneity of the underlying data from multiple data owners. On the basis of secret sharing, the system implements a secure multi-party operator library supporting basic secure multi-party operations, and optimizes the result-reconstruction process of the operators for higher execution efficiency. On this basis, the system supports query operations such as summation (SUM), averaging (AVG), minimization/maximization (MIN/MAX), equi-join, and θ-join, and makes full use of multi-party features to reduce data interactions among data owners and security overhead, thus effectively supporting efficient data sharing. Finally, experiments are conducted on the TPC-H benchmark dataset. The experimental results show that the system can support more data owners than the current data federation systems SMCQL and Conclave, and achieves higher execution efficiency in a variety of query operations, outperforming the existing systems by up to 3.75 times. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
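
The secret-sharing basis mentioned in this abstract can be illustrated with a minimal single-process simulation of additive secret sharing for SUM and AVG. Each owner splits its private value into random shares that add up to the value modulo a large prime, each party adds up only the shares it receives, and combining those partial sums reveals the total but no individual input. The modulus, the helper names, the SUM/COUNT decomposition of AVG, and the example values are assumptions for illustration; the system's actual operator library, network protocol, and reconstruction optimizations are not shown.

```python
import secrets
from typing import List

# Assumed share modulus (the Mersenne prime 2**61 - 1). Inputs must be
# non-negative and their total must stay below it for the result to be exact.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> List[int]:
    """Split value into n_parties additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Simulate a SUM over one private value per data owner."""
    n = len(private_values)
    # all_shares[i][j] is the share of owner i's value that party j receives.
    all_shares = [share(v, n) for v in private_values]
    # Each party j adds only the shares it received; alone these look random.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Reconstruction: adding the partial sums reveals only the total.
    return sum(partial_sums) % MODULUS


# Hypothetical local aggregates held by three data owners.
local_sums = [120, 45, 300]
local_counts = [4, 3, 5]

# AVG as secure SUM divided by secure COUNT (an assumed decomposition).
total = secure_sum(local_sums)
count = secure_sum(local_counts)
print(total, count, total / count)  # 465 12 38.75
```

Because the first n - 1 shares are drawn uniformly at random, each individual share a party receives is uniformly distributed on its own; only the combination of all partial sums yields a meaningful number.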

18. Merging NMR Data and Computation Facilitates Data-Centered Research

Author: Kumaran Baskaran, D. Levi Craft, Hamid R. Eghbalnia, Michael R. Gryk, Jeffrey C. Hoch, Mark W. Maciejewski, Adam D. Schuyler, Jonathan R. Wedell, and Colin W. Wilburn
Subjects: data federation, structural biology, data repositories, reproducible research, nuclear magnetic resonance, Biology (General), QH301-705.5
Abstract: The Biological Magnetic Resonance Data Bank (BMRB) has served the NMR structural biology community for 40 years, and has been instrumental in the development of many widely-used tools. It fosters the reuse of data resources in structural biology by embodying the FAIR data principles (Findable, Accessible, Interoperable, and Reusable). NMRbox is less than a decade old, but complements BMRB by providing NMR software and high-performance computing resources, facilitating the reuse of software resources. BMRB and NMRbox both facilitate reproducible research. NMRbox also fosters the development and deployment of complex meta-software. Combining BMRB and NMRbox helps speed and simplify workflows that utilize BMRB, and enables facile federation of BMRB with other data repositories. Utilization of BMRB and NMRbox in tandem will enable additional advances, such as machine learning, that are poised to become increasingly powerful.
Published: 2022
Full Text: View/download PDF

19. Data Federation in the Era of Digital, Consumer-Centric Cares and Empowered Citizens

Author: Nokkala, Tiina, Dahlberg, Tomi, Barbosa, Simone Diniz Junqueira, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Li, Hongxiu, editor, Pálsdóttir, Ágústa, editor, Trill, Roland, editor, Suomi, Reima, editor, and Amelina, Yevgeniya, editor
Published: 2018
Full Text: View/download PDF

20. A Semantic Model in the Context of Maintenance: A Predictive Maintenance Case Study

Author: Gokan May, Sangje Cho, AmirHossein Majidirad, and Dimitris Kiritsis
Subjects: predictive maintenance, Industry 4.0, semantic technologies, semantic interoperability, data federation, ontology engineering, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Advanced technologies in modern industry collect massive volumes of data from a plethora of sources, such as processes, machines, components, and documents. This also applies to predictive maintenance. To provide access to these data in a standard and structured way, researchers and practitioners need to design and develop a semantic model of maintenance entities to build a reference ontology for maintenance. To date, there have been numerous studies combining the domain of predictive maintenance and ontology engineering. However, such earlier works, which focused on semantic interoperability to exchange data with standardized meanings, did not fully leverage the opportunities provided by data federation to elaborate these semantic technologies further. Therefore, in this paper, we fill this research gap by addressing interoperability in smart manufacturing and the issue of federating different data formats effectively by using semantic technologies in the context of maintenance. Furthermore, we introduce a semantic model in the form of an ontology for mapping relevant data. The proposed solution is validated and verified using an industrial implementation.
Published: 2022
Full Text: View/download PDF

21. An ontology-based approach for developing a harmonised data-validation tool for European cancer registration.

Author: Nicholson, Nicholas Charles, Giusti, Francesco, Bettio, Manola, Negrao Carvalho, Raquel, Dimitrova, Nadya, Dyba, Tadeusz, Flego, Manuela, Neamtiu, Luciana, Randi, Giorgia, and Martos, Carmen
Subjects: *PERSONALLY identifiable information, *EPIDEMIOLOGY of cancer, *INFORMATION resources, *SOFTWARE development tools, *DATA modeling
Abstract: Background: Population-based cancer registries constitute an important information source in cancer epidemiology. Studies collating and comparing data across regional and national boundaries have proved important for deploying and evaluating effective cancer-control strategies. A critical aspect in correctly comparing cancer indicators across regional and national boundaries lies in ensuring a good and harmonised level of data quality, which is a primary motivator for a centralised collection of pseudonymised data. The recent introduction of the European Union's general data-protection regulation (GDPR) imposes stricter conditions on the collection, processing, and sharing of personal data. It also considers pseudonymised data as personal data. The new regulation motivates the need to find solutions that allow a continuation of the smooth processes leading to harmonised European cancer-registry data. One element in this regard would be the availability of a data-validation software tool based on a formalised depiction of the harmonised data-validation rules, allowing an eventual devolution of the data-validation process to the local level. Results: A semantic data model was derived from the data-validation rules for harmonising cancer-data variables at European level. The data model was encapsulated in an ontology developed using the Web-Ontology Language (OWL) with the data-model entities forming the main OWL classes. The data-validation rules were added as axioms in the ontology. The reasoning function of the resulting ontology demonstrated its ability to trap registry-coding errors and in some instances to be able to correct errors. Conclusions: Describing the European cancer-registry core data set in terms of an OWL ontology affords a tool based on a formalised set of axioms for validating a cancer-registry's data set according to harmonised, supra-national rules. The fact that the data checks are inherently linked to the data model would lead to less maintenance overheads and also allow automatic versioning synchronisation, important for distributed data-quality checking processes. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

22. Ontology-based data federation : a framework proposal

Author: Gu, Zhenzhen, Calvanese, Diego, Di Panfilo, Marco, Lanti, Davide, Mosca, Alessandro, and Xiao, Guohui
Abstract: Ontology-based data access (OBDA) is a well-established approach to information management that facilitates access to relational data sources through the mediation of a conceptual domain view, given in terms of an ontology, and the use of a declarative mapping between the data layer and the ontology. We formally introduce here the notion of ontology-based data federation (OBDF) to denote a framework that combines OBDA with a data federation layer where multiple heterogeneous sources are virtually exposed as a single relational database. We discuss opportunities and challenges of OBDF, and propose novel techniques to make query answering in the OBDF setting more efficient. Our techniques are validated through an extensive experimental evaluation based on the Berlin SPARQL Benchmark. This work is an abridged version of [1].
Published: 2023

23. Empowering citizens through data interoperability - data federation applied to consumer-centric healthcare

Author: Tiina Anneli Nokkala and Tomi Dahlberg
Subjects: patient empowerment, data federation, data interoperability, health literacy, consumer-centric healthcare, Computer applications to medicine. Medical informatics, R858-859.7, Public aspects of medicine, RA1-1270
Abstract: During the era of open systems, healthcare services and related data are in a constant flux caused by digital transformation. The amount, sources and dimensionality of data grow rapidly, and solutions for data governance, integration and interoperability are urgently needed. At the same time, digital data and information technology–enabled healthcare services are offered as a means to empower citizens. The objective is for active citizens to take better care of their own health. It is possible to support empowerment in many ways, such as with easy-to-use information systems (IS) or personal health records (PHR), or by supporting citizens’ participation in health data creation. In this article, we first present the federative approach to data governance with data federation matrixes in order to show how data are made interoperable by combining data from different data storages. Federation matrixes define shared attributes with their technical, information-flow and socio-contextual metadata. We then contemplate how the federative approach can be deployed to citizens’ healthcare data empowerment. We propose that data ontologies, e.g., data federation matrixes, are useful in bridging gaps between the social contexts of citizens and healthcare professionals and, by doing so, to promote citizen empowerment. The present article contributes to research on the federative approach to data governance, its deployment to citizens’ healthcare empowerment, and to the practice-oriented further development of the federation matrix tools for this and other use cases.
Published: 2019
Full Text: View/download PDF

24. Data system design alters meaning in ecological data: salmon habitat restoration across the U.S. Pacific Northwest

Author: Stephen L. Katz, Katie A. Barnas, Monica Diaz, and Stephanie E. Hampton
Subjects: applied epistemology, bioinformatics, crosswalk, data confederation, data federation, data synthesis, Ecology, QH540-549.5
Abstract: As an increasing variety and complexity of environmental issues confront scientists and natural resource managers, assembling the most relevant and informative data into accessible data systems becomes critical to timely problem solving. Data interoperability is the key criterion for succeeding in that assembly, and much informatics research is focused on data federation, or synthesis to produce interoperable data. However, when candidate data come from numerous, diverse, and high‐value legacy data sources, the issue of data variety or heterogeneity can be a significant impediment to interoperability. Research in informatics, computer science and philosophy has frequently focused on resolving data heterogeneity with automation, but subject matter expertise still plays a large role. In particular, human expertise is a large component in the development of tools such as data dictionaries, crosswalks, and ontologies. Such representations may not always match from one data system to another, presenting potentially inconsistent results even with the same data. Here, we use a long‐term data set on management actions designed to improve stream habitat for endangered salmon in the Pacific Northwest to illustrate how different representations can change the underlying information content in the data system. We pass the same data set, comprising 49,619 records, through three ontologies, each developed to address a rational management need, and show that the inferences drawn from the data can change with choice of data representation or ontology. One striking example shows that the use of one ontology would suggest water quality improvement projects are the rarest and most expensive restoration actions undertaken, while another would suggest that these actions are the most common and least expensive type of management action. The discrepancy relates to the origins of the data dictionaries themselves, with one designed to catalog management actions and the other focused on ecological processes. Thus, we argue that in data federation efforts humans are “in the loop” rationally, in the form of the ontologies they have chosen, and diminishing the human component in favor of automation carries risks. Consequently, data federation exercises should be accompanied by validations in order to evaluate and manage those risks.
Published: 2019
Full Text: View/download PDF

25. The Alleviate Advanced Pain Discovery Platform Data Hub

Author: Mulligan, Gordon, Hall, Christopher, Appleby, Philip, Masood, Erum, Martin, Gillian, Beggs, Jillian, Chuter, Antony, Giles, Thomas Charles, Villalon, Armando M, and Quinlan, Philip
Subjects: FAIR data, Pain, Chronic pain, Data federation, OMOP
Abstract: There are many clinical and non-clinical datasets investigating pain that have been collected by researchers over many years, however, finding and getting access to them is challenging. The data are siloed in hard-to-reach places, in non-standard formats, and it is not possible to assess how relevant they are before getting access. This is a barrier to the pain research community and results in duplication of effort. It does not have to be this way and there are alternative solutions available. Alleviate is an HDR UK Data Hub for the federated querying and secure sharing of UK pain data to researchers, analysts and clinicians at a national and international level. Alleviate is the Data Hub for the Advanced Pain Discovery Platform (APDP), a £24 million research initiative to break through the complexity of pain and reveal potential new treatment approaches to address a wide spectrum of chronic and debilitating clinical conditions.
Published: 2023
Full Text: View/download PDF

26. How to Address Master Data Complexity in Information Systems Development: A Federative Approach.

Author: Dahlberg, Tomi, Lagstedt, Altti, and Nokkala, Tiina
Subjects: INFORMATION storage & retrieval systems, DATA integration, METADATA, DATA management, INTERNETWORKING
Abstract: We investigated the failure of an IS project within a global industrial company. The complexity of product master data was one of the reasons for the failure. We detected a research gap in how to handle master data complexity in IS development, especially in data storage integrations. The traditional approach is to define so-called golden records, "single versions of truth", for each record, and then harmonize and cleanse data so that only or mainly golden record values will be used. We offer the federative approach as an alternative to the golden record approach. According to this approach, data interoperability is achieved by identifying shared attributes, by federating data on the basis of shared attributes' metadata, and by developing IS functionalities to process the metadata and their cross-references. We compare the ontological stances of the approaches theoretically and with figures. We present the results of a case where the federative approach was probed. Our study contributes to research by showing how to link data management to IS development to address the complexity of master data in data interoperability projects, by comparing the golden record and the federative approaches, and by showing how the federative approach can be used in real-life contexts. [ABSTRACT FROM AUTHOR]
Published: 2018

27. Cloud Data Federation for Scientific Applications

Author: Koulouzis, Spiros, Vasyunin, Dmitry, Cushing, Reginald, Belloum, Adam, Bubak, Marian, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Kobsa, Alfred, editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Weikum, Gerhard, editor, an Mey, Dieter, editor, Alexander, Michael, editor, Bientinesi, Paolo, editor, Cannataro, Mario, editor, Clauss, Carsten, editor, Costan, Alexandru, editor, Kecskemeti, Gabor, editor, Morin, Christine, editor, Ricci, Laura, editor, Sahuquillo, Julio, editor, Schulz, Martin, editor, Scarano, Vittorio, editor, Scott, Stephen L., editor, and Weidendorfer, Josef, editor
Published: 2014
Full Text: View/download PDF

28. Data system design alters meaning in ecological data: salmon habitat restoration across the U.S. Pacific Northwest.

Author: Katz, Stephen L., Barnas, Katie A., Diaz, Monica, and Hampton, Stephanie E.
Subjects: SYSTEMS design, PHILOSOPHY of science, DATA dictionaries, INTERNETWORKING, SALMON, PACIFIC salmon, ONTOLOGIES (Information retrieval), ESTUARINE restoration
Abstract: As an increasing variety and complexity of environmental issues confront scientists and natural resource managers, assembling the most relevant and informative data into accessible data systems becomes critical to timely problem solving. Data interoperability is the key criterion for succeeding in that assembly, and much informatics research is focused on data federation, or synthesis to produce interoperable data. However, when candidate data come from numerous, diverse, and high-value legacy data sources, the issue of data variety or heterogeneity can be a significant impediment to interoperability. Research in informatics, computer science and philosophy has frequently focused on resolving data heterogeneity with automation, but subject matter expertise still plays a large role. In particular, human expertise is a large component in the development of tools such as data dictionaries, crosswalks, and ontologies. Such representations may not always match from one data system to another, presenting potentially inconsistent results even with the same data. Here, we use a long-term data set on management actions designed to improve stream habitat for endangered salmon in the Pacific Northwest, to illustrate how different representations can change the underlying information content in the data system. We pass the same data set comprised of 49,619 records through three ontologies, each developed to address a rational management need, and show that the inferences drawn from the data can change with choice of data representation or ontology. One striking example shows that the use of one ontology would suggest water quality improvement projects are the rarest and most expensive restoration actions undertaken, while another will suggest these actions to be the most common and least expensive type of management actions. 
The discrepancy relates to the origins of the data dictionaries themselves, with one designed to catalog management actions and the other focused on ecological processes. Thus, we argue that in data federation efforts humans are "in the loop" rationally, in the form of the ontologies they have chosen, and diminishing the human component in favor of automation carries risks. Consequently, data federation exercises should be accompanied by validations in order to evaluate and manage those risks. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

29. NCBI’s Virus Discovery Codeathon: Building 'FIVE': The Federated Index of Viral Experiments API Index

Author: Joan Martí-Carreras, Alejandro Rafael Gener, Sierra D. Miller, Anderson F. Brito, Christiam E. Camacho, Ryan Connor, Ward Deboutte, Cody Glickman, David M. Kristensen, Wynn K. Meyer, Sejal Modha, Alexis L. Norris, Surya Saha, Anna K. Belford, Evan Biederstedt, James Rodney Brister, Jan P. Buchmann, Nicholas P. Cooley, Robert A. Edwards, Kiran Javkar, Michael Muchow, Harihara Subrahmaniam Muralidharan, Charles Pepe-Ranney, Nidhi Shah, Migun Shakya, Michael J. Tisza, Benjamin J. Tully, Bert Vanmechelen, Valerie C. Virta, JL Weissman, Vadim Zalunin, Alexandre Efremov, and Ben Busby
Subjects: data federation, CRISPR, protein domain, metagenomics, virus, genome graphs, Microbiology, QR1-502
Abstract: Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are lack of consensus or comparisons between feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was elaborated. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus–host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof-of-concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access to large quantities of data and is publicly accessible. Many projects of database or index federation fail to provide easier alternatives to access or query information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.
Published: 2020
Full Text: View/download PDF

30. Blue-Cloud final conference slides

Author: Julia Vera, Sara Pittonet, Dick Schaap, Pasquale Pagano, Patricia Cabrera, Jean-Olivier Irisson, Massimiliano Drudi, Anton Ellenbroek, Andreas Petzold, Marina Tonani, Akrivi Vivian Kiousi, Tiziana Ferrari, Laurent Delauney, and Marc Taconet
Subjects: EOSC, Open Science, DTO, VRE, Data infrastructure, Data federation, VLab, Demonstrators, Marine environment
Abstract: The Blue-Cloud final conference was organised on 8 December 2022 and represented an opportunity to learn about the main results, achievements and the road ahead of this key component of the EU “Future of Seas and Oceans Flagship Initiative”. By reading and downloading these presentations you can gain insight into the key conclusions stemming from this 3-year effort to build a “marine” thematic community within the European Open Science Cloud (EOSC) and into the role it can play in evolving a thriving EU digital knowledge system in support of the EU Green Deal, Mission Ocean & the UN Decade of Ocean Science for Sustainable Development.
Published: 2022
Full Text: View/download PDF

31. Distributed Archives, Databases and Data Portals: The Scene

Author: Lautenschlager, Michael, Hiller, Wolfgang, Budich, Reinhard, and Redler, René
Published: 2013
Full Text: View/download PDF

32. Federated Query Processing Service in Service Oriented Business Intelligence

Author: Hema, M. S., Chandramathi, S., Das, Vinu V, editor, Stephen, Janahanlal, editor, and Chaba, Yogesh, editor
Published: 2011
Full Text: View/download PDF

33. New Challenges in Information Integration

Author: Haas, Laura M., Soffer, Aya, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Pedersen, Torben Bach, editor, Mohania, Mukesh K., editor, and Tjoa, A Min, editor
Published: 2009
Full Text: View/download PDF

34. Ontology-based data federation

Author: Gu, Zhenzhen, Lanti, Davide, Mosca, Alessandro, Xiao, Guohui, Xiong, Jing, and Calvanese, Diego
Abstract: We formally introduce ontology-based data federation (OBDF), to denote a framework combining ontology-based data access (OBDA) with a data federation layer, which virtually exposes multiple heterogeneous sources as a single relational database. In this setting, the SQL queries generated by the OBDA component by translating user SPARQL queries are further transformed by the data federation layer so as to be efficiently executed over the data sources. The structure of these SQL queries directly affects their execution time in the data federation layer and their optimization is crucial for performance. We propose here novel optimizations specific for OBDF, which are based on “hints” about existing data redundancies in the sources, empty join operations, and the need for materialized views. Such hints can be systematically inferred by analyzing the OBDA mappings and ontology and exploited to simplify the query structure. We also carry out an experimental evaluation in which we show the effectiveness of our optimizations.
Published: 2022

35. Data Lakes in Hybrid Cloud/Edge Environments (Data Lakes em ambientes híbridos Cloud/Edge)

Author: Costa, Daniel Vilar da, Vilaça, Ricardo Manuel Pereira, Pereira, José, and Universidade do Minho
Subjects: Replication, Exploratory data analysis, Cloud/Edge environment, Data federation, Synchronization, Engineering and Technology::Electrical, Electronic and Computer Engineering
Abstract: Integrated master's dissertation in Informatics Engineering. Data analysis has traditionally been performed on dedicated servers in the cloud, where storage and processing capabilities are almost unlimited; edge devices, by contrast, have severe storage and processing limitations. Nonetheless, these devices are closer to where data is generated. Because of this, they usually handle transactional workloads, where reliability and interactivity are essential.
Due to the limitations of edge devices, data is generally extracted periodically to the cloud to be stored and processed. To allow exploratory analysis of heterogeneous data, the data is stored in a Data Lake infrastructure that manages raw data from multiple sources. Nonetheless, transferring all collected data to the cloud is unfeasible because the growth in the volume of collected data has outpaced network capacity. This thesis overcomes these challenges by employing a middleware component capable of storing previously transmitted data in the cloud and pushing down query fragments to the edge. Consequently, the volume of data transmitted to the cloud is reduced by uploading the required data, ideally, only once. Furthermore, the solution balances the impact on the network and the computational effort at the edge in order to minimize execution time. Partially funded by the project AIDA – Adaptive, Intelligent and Distributed Assurance Platform (POCI-01-0247-FEDER-045907), co-financed by the European Regional Development Fund (FEDER) through the Operational Programme for Competitiveness and Internationalisation (COMPETE 2020) and by the Fundação para a Ciência e Tecnologia (FCT) under the CMU Portugal programme.
Published: 2022

36. A Comparative Study of Data Federation Tools for Integration.

Author: Nabi, Zubair, Sabir, Naima, Bilal, Muhammad Atif, and Ayub, Nafees
Subjects: DATA integration, SEMANTICS, COMPUTER software
Abstract: Data federation is a type of data integration that provides the ability to query and integrate data from independent sources in a virtual database. In this research we focused on a comparative investigation of semantic data federation tools for data integration, identifying similarities and differences between these tools. The aim of this research is the creation of a methodology for evaluating Data Federation Measurement Tools. This research will help to understand the capabilities of data federation tools and the differences between vendors' offerings, match possible solutions to an organization's requirements, develop an implementation strategy, and optimize investment in data federation tools. [ABSTRACT FROM AUTHOR]
Published: 2017

37. FAIR and GDPR Compliant Population Health Data Generation, Processing and Analytics

Subjects: FAIR Data, Biomedical Ontologies, Data Stewardship, Clinical Data, Data Visiting, GDPR, Data Federation, Data management
Abstract: Generating and analysing patient data in clinical settings is an inherently sensitive process, requiring collaborative effort between clinicians and informaticians to generate value from these data, while mitigating risks to the data subject. As a result, efforts in utilizing external patient data pose significant challenges. We propose a data-centric framework based on the FAIR principles and GDPR guidelines to enhance data management at the point of care. By using the process of data visiting, a cross-facility method for federated data analytics, we can automate generation of novel aggregate data which was previously not realizable. In two sequential studies we show that these techniques, supported by a data stewardship programme, increase community-wide involvement in data generation, improve transparency and trust, provide direct value and data ownership, and enable regulatory and ethically compliant, cross-national data visiting under curated accessibility patterns for federated analytics.
Published: 2022

38. GAIA-X Compatible Data Flow Monitoring in Data Exchange System

Author: Akther, Shamshad, Faculty of Information Technology and Communication Sciences, and Tampere University
Subjects: Data Flow Monitoring, Data Exchange, Master's Programme in Computing Sciences, Federated Data Infrastructure, Data Federation
Abstract: The Internet of Things (IoT) age is here, thanks to the increased availability and affordability of small computing devices like sensors. The rapid expansion of IoT devices has resulted in larger volumes of data being generated and exchanged between numerous entities. It is challenging to manage such a massive volume of data created from various sources while also controlling how the data is shared or routed. This emphasizes the need for effective data management while maintaining data sovereignty and efficient data transmission between providers and customers. To this end, Nokia Bell Labs has developed a distributed data dissemination system based on the Publish/Subscribe messaging protocol and following a microservice architecture. Meanwhile, GAIA-X is working on a federated data architecture that will deliver a networked data infrastructure to meet the most demanding digital sovereignty requirements while staying future-proof. This thesis is part of a Nokia Bell Labs project whose goal is to determine whether Nokia's data exchange system is GAIA-X compatible. Nokia's data exchange system can currently monitor traffic between individual system components. However, for the system to be GAIA-X compatible, it must adhere to GAIA-X data exchange standards. One requirement for the data exchange system is that it has mechanisms to guarantee data sovereignty. The system must track data flow across microservices within components and identify unexpected breaches of data usage rules. Hence, data usage policy violations must be detected both in manual, unauthorized calls to system services and in internally misbehaving data flows. This thesis presents a monitoring approach based on a distributed tracing mechanism to implement such detection capabilities. Distributed tracing is a technique for profiling and monitoring systems, particularly those built on a microservices architecture, such as the current data exchange system. 
It introduces the benefits of logging and monitoring the distributed system. A proof-of-concept solution for the monitoring of the system using distributed tracing mechanism is presented and its implications on the system performance are discussed.
Published: 2022

40. ViPAR: a software platform for the Virtual Pooling and Analysis of Research Data.

Author: Carter, Kim W., Francis, Richard W., Bresnahan, M, Gissler, M, Grønborg, T K, Gross, R, Gunnes, N, Hammond, G, Hornig, M, Hultman, C M, Huttunen, J, Langridge, A, Leonard, H, Newman, S, Parner, E T, Petersson, G, Reichenberg, A, and Sandin, S
Subjects: *COMPUTER software, *STATISTICAL power analysis, *DATA analysis, *INTERNATIONAL relations, *QUANTITATIVE research
Abstract: Background: Research studies exploring the determinants of disease require sufficient statistical power to detect meaningful effects. Sample size is often increased through centralized pooling of disparately located datasets, though ethical, privacy and data ownership issues can often hamper this process. Methods that facilitate the sharing of research data that are sympathetic with these issues and which allow flexible and detailed statistical analyses are therefore in critical need. We have created a software platform for the Virtual Pooling and Analysis of Research data (ViPAR), which employs free and open source methods to provide researchers with a web-based platform to analyse datasets housed in disparate locations. Methods: Database federation permits controlled access to remotely located datasets from a central location. The Secure Shell protocol allows data to be securely exchanged between devices over an insecure network. ViPAR combines these free technologies into a solution that facilitates 'virtual pooling', where data can be temporarily pooled into computer memory and made available for analysis without the need for permanent central storage. Results: Within the ViPAR infrastructure, remote sites manage their own harmonized research dataset in a database hosted at their site, while a central server hosts the data federation component and a secure analysis portal. When an analysis is initiated, requested data are retrieved from each remote site and virtually pooled at the central site. The data are then analysed by statistical software and, on completion, results of the analysis are returned to the user and the virtually pooled data are removed from memory. Conclusions: ViPAR is a secure, flexible and powerful analysis platform built on open source technology that is currently in use by large international consortia, and is made publicly available at [http://bioinformatics.childhealthresearch.org.au/software/vipar/]. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

41. GA4GH: International policies and standards for data sharing across genomic research and healthcare

Author: Abigail Wexner Research Institute, Academy of Finland, Medical Research Future Fund, BioBank Japan, Canada Foundation for Innovation, Canadian Institutes of Health Research, European Commission, German Research Foundation, Genome Canada, Google, Howard Hughes Medical Institute, Instituto de Salud Carlos III, Japan Agency for Medical Research and Development, Mayo Clinic, Fundación la Caixa, Ministère de l'Économie et de l'Innovation (Québec), Monarch Initiative, National Human Genome Research Institute (US), National University of Singapore, Agency for Science, Technology and Research A*STAR (Singapore), National Health and Medical Research Council (Australia), National Institutes of Health (US), National Institute of General Medical Sciences (US), Swiss Institute of Bioinformatics, State Secretariat for Education, Research and Innovation (Switzerland), Terry Fox Research Institute, Canada Research Chairs, European Molecular Biology Laboratory, Ministry of Research, Innovation and Science (Ontario), Ontario Genomics Institute, Natural Sciences and Engineering Research Council of Canada, Wellcome Trust, National Taiwan University, Rehm, Heidi L., Page, Angela, Smith, Lindsay, Adams, Jeremy B., Alterovitz, Gil, Babb, Lawrence J., Barkley, Maxmillian P., Baudis, Michael, Beauvais, Michael J. S., Beck, Tim, Beckmann, Jacques S., Varma, Susheel, Vears, Danya F., Viner, Coby, Voisin, Craig, Wagner, Alex H., Wallace, Susan E., Walsh, Brian P., Williams, Marc S., Winkler, Eva C., Brudno, Michael, Kelleher, Jerome, Wold, Barbara J., Wood, Grant M., Woolley, J. Patrick, Yamasaki, Chisato, Yates, Andrew D., Yung, Christina K., Zass, Lyndon J., Zaytseva, Ksenia, Zhang, Junjun, Goodhand, Peter, Kerry, Giselle, Brush, Matthew H., North, Kathryn, Birney, Ewan, Bujold, David, Burdett, Tony, Buske, Orion J., Cabili, Moran N., Cameron, Daniel L., Carroll, Robert J., Casas-Silva, Esmeralda, Khor, Seik-Soon, Chakravarty, Debyani, Chaudhari, Bimal P., Chen, Shu Hui, Cherry, J. Michael, Chung, Justina, Cline, Melissa, Clissold, Hayley L., Cook-Deegan, Robert M., Courtot, Melanie, Cunningham, Fiona, Knoppers, Bartha M., Cupak, Miro, Davies, Robert M., Denisko, Danielle, Doerr, Megan J., Dolman, Lena I., Dove, Edward S., Dursi, Lewis Jonathan, Dyke, Stephanie O. M., Eddy, James A., Eilbeck, Karen, Konopko, Melissa A., Ellrott, Kyle P., Fairley, Susan, Fakhro, Khalid A., Firth, Helen V., Fitzsimons, Michael S., Fiume, Marc, Flicek, Paul, Fore, Ian M., Freeberg, Mallory A., Freimuth, Robert R., Kosaki, Kenjiro, Fromont, Lauren A., Fuerth, Jonathan, Gaff, Clara L., Gan, Weiniu, Ghanaim, Elena M., Glazer, David, Green, Robert C., Griffith, Malachi, Griffith, Obi L., Grossman, Robert L., Kuba, Martin, Groza, Tudor, Guidry Auvil, Jaime M., Guigó, Roderic, Gupta, Dipayan, Haendel, Melissa A., Hamosh, Ada, Hansen, David P., Hart, Reece K., Hartley, Dean Mitchell, Haussler, David, Lawson, Jonathan, Hendricks-Sturrup, Rachele M., Ho, Calvin W. L., Hobb, Ashley E., Hoffman, Michael M., Hofmann, Oliver M., Holub, Petr, Shujui Hsu, Jacob, Hubaux, Jean-Pierre, Hunt, Sarah E., Husami, Ammar, Leinonen, Rasko, Jacobsen, Julius O., Jamuar, Saumya S., Janes, Elizabeth L., Jeanson, Francis, Jene, Aina, Johns, Amber L., Joly, Yann, Jones, Steven J. M., Kanitz, Alexander, Kato, Kazuto, Li, Stephanie, Keane, Thomas M., Kekesi-Lafrance, Kristina, Beltran, Sergi, Lin, Michael F., Linden, Mikael, Liu, Xianglin, Udara Liyanage, Isuru, López, Javier, Lucassen, Anneke M., Lukowski, Michael, Mann, Alice L., Marshall, John, Mattioni, Michele, Bernick, David, Metke-Jiménez, Alejandro, Middleton, Anna, Milne, Richard J., Molnár-Gábor, Fruzsina, Mulder, Nicola, Muñoz-Torres, Mónica C., Nag, Rishi, Nakagawa, Hidewaki, Nasir, Jamal, Navarro, Arcadi, Bernier, Alexander, Nelson, Tristan H., Niewielska, Ania, Nisselle, Amy, Niu, Jeffrey, Nyrönen, Tommi H., O’Connor, Brian D., Oesterle, Sabine, Ogishima, Soichi, Wang, Vivian Ota, Paglione, Laura A. D., Bonfield, James K., Palumbo, Emilio, Parkinson, Helen E., Philippakis, Anthony A., Pizarro, Angel D., Prlic, Andreas, Rambla, Jordi, Rendon, Augusto, Rider, Renee A., Robinson, Peter N., Rodarmer, Kurt W., Boughtwood, Tiffany F., Lyman Rodríguez, Laura, Rubin, Alan F., Rueda, Manuel, Rushton, Gregory A., Ryan, Rosalyn S., Saunders, Gary I., Schuilenburg, Helen, Schwede, Torsten, Scollen, Serena, Senf, Alexander, Bourque, Guillaume, Sheffield, Nathan C., Skantharajah, Neerjah, Smith, Albert V., Sofia, Heidi J., Spalding, Dylan, Spurdle, Amanda B., Stark, Zornitza, Stein, Lincoln D., Suematsu, Makoto, Tan, Patrick, Bowers, Sarion R., Tedds, Jonathan A., Thomson, Alastair A., Thorogood, Adrian, Tickle, Timothy L., Tokunaga, Katsushi, Törnroos, Juha, Torrents, David, Upchurch, Sean, Valencia, Alfonso, Valls Guimera, Roman, Brookes, Anthony J., and Vamathevan, Jessica
Abstract: The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
Published: 2021

42. Trusted Data Spaces as a Viable and Sustainable Solution for Networks of Population-Based Patient Registries.

Author: Nicholson N, Caldeira S, Furtado A, and Nicholl C
Subjects: Humans, Registries, European Union, Information Systems, Public Health
Abstract: Harmonization and integration of health data remain the focus of many ongoing efforts toward the goal of optimizing health and health care policies. Population-based patient registries constitute a critical element of these endeavors. Although their main function is monitoring and surveillance of a particular disease within a given population, they are also an important data source for epidemiology. Comparing indicators across national boundaries brings an extra dimension to the use of registry data, especially in regions where supranational initiatives are or could be coordinated to leverage good practices; this is particularly relevant for the European Union. However, strict data protection laws can unintentionally hamper data harmonization efforts aimed at removing statistical bias in the individual data sets, thereby compromising the integrated value of registries' data. Consequently, there is motivation to create a new paradigm that ensures registries can operate in an environment that is not unnecessarily restrictive and that allows accurate comparison of data, to better ascertain the measures and practices most conducive to the public health of societies. The pan-European organizational model of cancer registries, owing to its long and successful establishment, was considered a sound basis from which to proceed toward such a paradigm. However, it has certain drawbacks, particularly regarding governance, scalability, and resourcing, which are essential elements to consider for a generic patient registry model. These issues are addressed in a proposal for an adapted model that promises a valuable pan-European data resource for epidemiological research, while providing a closely regulated environment for the processing of pseudonymized patient summary data on a broader scale than has hitherto been possible.
(© Nicholas Nicholson, Sandra Caldeira, Artur Furtado, Ciaran Nicholl. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 13.01.2023.)
Published: 2023
Full Text: View/download PDF

43. An ontology-based approach for developing a harmonised data-validation tool for European cancer registration

Author: M Bettio, Carmen Martos, L. Neamtiu, N. Dimitrova, N. Nicholson, M. Flego, Raquel Negrao Carvalho, T.A. Dyba, Giorgia Randi, and Francesco Giusti
Subjects: Computer Networks and Communications, Computer science, Population, Data validation, Health Informatics, Ontology (information science), Semantic data model, lcsh:Computer applications to medicine. Medical informatics, 03 medical and health sciences, Data harmonisation, Neoplasms, Humans, media_common.cataloged_instance, Data federation, European union, education, Language, 030304 developmental biology, 0505 law, computer.programming_language, media_common, 0303 health sciences, education.field_of_study, Ontology, 05 social sciences, Web Ontology Language, Cancer registry, Data science, Computer Science Applications, Data model, Data quality, 050501 criminology, lcsh:R858-859.7, computer, Software, Semantic web, Information Systems
Abstract: Background: Population-based cancer registries constitute an important information source in cancer epidemiology. Studies collating and comparing data across regional and national boundaries have proved important for deploying and evaluating effective cancer-control strategies. A critical aspect in correctly comparing cancer indicators across regional and national boundaries lies in ensuring a good and harmonised level of data quality, which is a primary motivator for a centralised collection of pseudonymised data. The recent introduction of the European Union’s General Data Protection Regulation (GDPR) imposes stricter conditions on the collection, processing, and sharing of personal data. It also treats pseudonymised data as personal data. The new regulation creates a need for solutions that preserve the smooth processes leading to harmonised European cancer-registry data. One element in this regard would be a data-validation software tool based on a formalised depiction of the harmonised data-validation rules, allowing an eventual devolution of the data-validation process to the local level. Results: A semantic data model was derived from the data-validation rules for harmonising cancer-data variables at European level. The data model was encapsulated in an ontology developed using the Web Ontology Language (OWL), with the data-model entities forming the main OWL classes. The data-validation rules were added as axioms in the ontology. The reasoning function of the resulting ontology demonstrated its ability to trap registry-coding errors and, in some instances, to correct them. Conclusions: Describing the European cancer-registry core data set in terms of an OWL ontology affords a tool based on a formalised set of axioms for validating a cancer registry’s data set according to harmonised, supra-national rules. Because the data checks are inherently linked to the data model, this approach would lower maintenance overheads and also allow automatic version synchronisation, which is important for distributed data-quality checking processes.
Published: 2021

44. Cloud and data federation in MobiDataLab

Author: Francesco Lettich, Patrizio Dazzi, Emanuele Carlini, Raffaele Perego, and Chiara Renso
Subjects: Mobility, Government, Process management, business.industry, Short paper, Cloud Federation, 020206 networking & telecommunications, Cloud computing, Cloud federation, 02 engineering and technology, Data Federation, 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), 020201 artificial intelligence & image processing, Data federation, business
Abstract: Today's innovative digital services dealing with the mobility of persons and goods produce huge amounts of data. To offer advanced and efficient mobility services, it is necessary to collect and aggregate new sources of data from various producers. The overall objective of the MobiDataLab H2020 project is to propose to the mobility stakeholders (transport organising authorities, operators, industry, government and innovators) reproducible methodologies and sustainable tools that foster the development of a data-sharing culture in Europe and beyond. This short paper introduces the key concepts driving the design and definition of the Cloud and Data Federation that stands at the basis of MobiDataLab.
Published: 2021

45. GA4GH: International policies and standards for data sharing across genomic research and healthcare

Author: Amber L. Johns, Ian Fore, Juha Törnroos, Melissa Haendel, Bimal Chaudhari, J. Patrick Woolley, Brian Walsh, Susan Fairley, Jonathan A. Tedds, Jessica Vamathevan, Martin Kuba, Clara L. Gaff, Ksenia Zaytseva, Sabine Oesterle, David Bujold, Sarion R. Bowers, Alexander Kanitz, Jordi Rambla, Anthony J. Brookes, Alice L. Mann, Gregory A. Rushton, Paul Flicek, Seik-Soon Khor, Khalid A. Fakhro, Aina Jene, Miro Cupak, Moran N. Cabili, Emilio Palumbo, Nathan C. Sheffield, Vivian Ota Wang, James K. Bonfield, Julius O.B. Jacobsen, Michael M. Hoffman, Neerjah Skantharajah, Ewan Birney, Rasko Leinonen, Anna Middleton, Anneke M. Lucassen, Ania Niewielska, Angela Page, Jeffrey Niu, Alastair A. Thomson, Elena M. Ghanaim, Albert V. Smith, Megan Doerr, Lena I. Dolman, Arcadi Navarro, Ada Hamosh, Sean Upchurch, Michael Baudis, Jerome Kelleher, Marc Fiume, Mikael Linden, Roderic Guigó, Orion J. Buske, Tristan H. Nelson, Kyle Ellrott, Lauren A. Fromont, Alex H. Wagner, Alexander Senf, Tommi Nyrönen, Michele Mattioni, David Haussler, Alejandro Metke-Jimenez, Francis Jeanson, Mélanie Courtot, David Hansen, Matthew H. Brush, Helen Parkinson, Peter Goodhand, Lindsay Smith, Jonathan Fuerth, Stephanie Li, Tim Beck, Debyani Chakravarty, Kristina Kekesi-Lafrance, Giselle Kerry, James A. Eddy, Torsten Schwede, Jaime M. Guidry Auvil, Xianglin Liu, Soichi Ogishima, Fiona Cunningham, Oliver Hofmann, Dean Hartley, Amy Nisselle, Katsushi Tokunaga, Alfonso Valencia, Hidewaki Nakagawa, Kurt W. Rodarmer, Lawrence J. Babb, Heidi J. Sofia, David Glazer, Angel Pizarro, Ammar Husami, Gil Alterovitz, Serena Scollen, J. Michael Cherry, Helen V. Firth, Zornitza Stark, Monica C. Munoz-Torres, Daniel L Cameron, Robert R. Freimuth, Manuel Rueda, Stephanie O.M. Dyke, Makoto Suematsu, Christina K. Yung, Rosalyn S. Ryan, Chisato Yamasaki, Michael S. Fitzsimons, Amanda B. Spurdle, Renee A. Rider, Karen Eilbeck, Ashley E. Hobb, Roman Valls Guimera, Calvin W. L. Ho, Robert L. Davies, Maxmillian P. 
Barkley, Malachi Griffith, Rishi Nag, Javier Lopez, Jacob Shujui Hsu, Isuru Udara Liyanage, Petr Holub, Dylan Spalding, Reece K. Hart, Barbara J. Wold, Fruzsina Molnár-Gábor, Sarah E. Hunt, Augusto Rendon, Danielle Denisko, Dipayan Gupta, Obi L. Griffith, Robert J. Carroll, Patrick Tan, Craig Voisin, Saumya Shekhar Jamuar, Mallory A. Freeberg, Michael Brudno, Andreas Prlic, Kenjiro Kosaki, Shu Hui Chen, Edward S. Dove, Tony Burdett, Anthony A. Philippakis, Richard Milne, Bartha Maria Knoppers, Kathryn North, David Torrents, Eva C. Winkler, Marc S. Williams, Melissa A. Konopko, Rachele M. Hendricks-Sturrup, Brian O'Connor, Grant M. Wood, Robert L. Grossman, Timothy L. Tickle, Michael F. Lin, Laura Lyman Rodriguez, Weiniu Gan, Laura A.D. Paglione, Justina Chung, Thomas M. Keane, Susan E. Wallace, Lyndon J. Zass, Heidi L. Rehm, Kazuto Kato, Alexander Bernier, Nicola Mulder, Jamal Nasir, Yann Joly, Junjun Zhang, Adrian Thorogood, Lincoln Stein, Guillaume Bourque, L. Jonathan Dursi, Tudor Groza, Jean-Pierre Hubaux, Coby Viner, Helen Schuilenburg, Sergi Beltran, Michael J.S. Beauvais, Hayley L. Clissold, Elizabeth L. Janes, Jacques S. Beckmann, Michael Lukowski, Melissa S. Cline, John F. Marshall, Alan F. Rubin, Tiffany Boughtwood, Peter N. Robinson, Robert C. Green, Robert Cook-Deegan, Esmeralda Casas-Silva, Jeremy Adams, Steven J.M. Jones, Gary I. Saunders, Danya F. Vears, Jonathan Lawson, Andrew D. 
Yates, David Bernick, Susheel Varma, Middleton, Anna [0000-0003-3103-8098], Milne, Richard [0000-0002-8770-2384], Apollo - University of Cambridge Repository, Abigail Wexner Research Institute, Academy of Finland, Medical Research Future Fund, BioBank Japan, Canada Foundation for Innovation, Canadian Institutes of Health Research, European Commission, German Research Foundation, Genome Canada, Google, Howard Hughes Medical Institute, Instituto de Salud Carlos III, Japan Agency for Medical Research and Development, Mayo Clinic, Fundación 'la Caixa', Ministère de l'Économie et de l'Innovation (Québec), Monarch Initiative, National Human Genome Research Institute (US), National University of Singapore, Agency for Science, Technology and Research A*STAR (Singapore), National Health and Medical Research Council (Australia), National Institutes of Health (US), National Institute of General Medical Sciences (US), Swiss Institute of Bioinformatics, State Secretariat for Education, Research and Innovation (Switzerland), Terry Fox Research Institute, Canada Research Chairs, European Molecular Biology Laboratory, Ministry of Research, Innovation and Science (Ontario), Ontario Genomics Institute, Natural Sciences and Engineering Research Council of Canada, Wellcome Trust, and National Taiwan University
Subjects: Standards, Knowledge management, data sharing, precision medicine, Interoperability, Technical standard, data federation, Article, 3105 Genetics, 03 medical and health sciences, 0302 clinical medicine, data access, learning health system, Clinical Research, Health care, genomics, Genetics, Data federation, Data access, 030304 developmental biology, 0303 health sciences, business.industry, Precision medicine, Human Genome, Learning health system, 3 Good Health and Well Being, Bioethics, Genomics, Health Services, 3. Good health, Data sharing, Data aggregator, Policy, 030220 oncology & carcinogenesis, FOS: Biological sciences, standards, Business, Generic health relevance, bioethics, policy, 31 Biological Sciences
Abstract: The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits. B.P.C. acknowledges funding from Abigail Wexner Research Institute at Nationwide Children’s Hospital; T.H. Nyrönen acknowledges funding from Academy of Finland grant #31996; A.M.-J., K.N., T.F.B., O.M.H., and Z.S. acknowledge funding from Australian Medical Research Future Fund; M.S. acknowledges funding from Biobank Japan; D. Bujold and S.J.M.J. acknowledge funding from Canada Foundation for Innovation; L.J.D. acknowledges funding from Canada Foundation for Innovation Cyber Infrastructure grant #34860; D. Bujold and G.B. acknowledge funding from CANARIE; L.J.D. acknowledges funding from CANARIE Research Data Management contract #RDM-090 (CHORD) and #RDM2-053 (ClinDIG); K.K.-L. 
acknowledges funding from CanSHARE; T.L.T. acknowledges funding from Chan Zuckerberg Initiative; T. Burdett acknowledges funding from Chan Zuckerberg Initiative grant #2017-171671; D. Bujold, G.B., and L.D.S. acknowledge funding from CIHR; L.J.D. acknowledges funding from CIHR grant #404896; M.J.S.B. acknowledges funding from CIHR grant #SBD-163124; M. Courtot and M. Linden acknowledge funding from CINECA project EU Horizon 2020 grant #825775; D. Bujold and G.B. acknowledge funding from Compute Canada; F.M.-G. acknowledges funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – NFDI 1/1 “GHGA – German Human Genome-Phenome Archive; R.M.H.-S. acknowledges funding from Duke-Margolis Center for Health Policy; S.B. and A.J.B. acknowledge funding from EJP-RD EU Horizon 2020 grant #825575; A. Niewielska, A.K., D.S., G.I.S., J.A.T., J.R., M.A.K., M. Baudis, M. Linden, S.B., S.S., T.H. Nyrönen, and T.M.K. acknowledge funding from ELIXIR; A. Niewielska acknowledges funding from EOSC-Life EU Horizon 2020 grant #824087; J.-P.H. acknowledges funding from ETH Domain Strategic Focal Area “Personalized Health and Related Technologies (PHRT)” grant #2017-201; F.M.-G. acknowledges funding from EUCANCan EU Horizon 2020 grant #825835; B.M.K., D. Bujold, G.B., L.D.S., M.J.S.B., N.S., S.E.W., and Y.J. acknowledge funding from Genome Canada; B.M.K., M.J.S.B., S.E.W., and Y.J. acknowledge funding from Genome Quebec; F.M.-G. acknowledges funding from German Human Genome-Phenome Archive; C. Voisin acknowledges funding from Google; A.J.B. acknowledges funding from Health Data Research UK Substantive Site Award; D.H. acknowledges funding from Howard Hughes Medical Institute; S.B. acknowledges funding from Instituto de Salud Carlos III; S.-S.K. and K.T. acknowledge funding from Japan Agency for Medical Research and Development (AMED); S. Ogishima acknowledges funding from Japan Agency for Medical Research and Development (AMED) grant #20kk0205014h0005; C.Y. and K. 
Kosaki acknowledge funding from Japan Agency for Medical Research and Development (AMED) grant #JP18kk0205012; GEM Japan acknowledges funding from Japan Agency for Medical Research and Development (AMED) grants #19kk0205014h0004, #20kk0205014h0005, #20kk0205013h0005, #20kk0205012h0005, #20km0405401h0003, and #19km0405001h0104; J.R. acknowledges funding from La Caixa Foundation under project #LCF/PR/GN13/50260009; R.R.F. acknowledges funding from Mayo Clinic Center for Individualized Medicine; Y.J. and S.E.W. acknowledge funding from Ministère de l’Économie et de l’Innovation du Québec for the Can-SHARE Connect Project; S.E.W. and S.O.M.D. acknowledge funding from Ministère de l’Économie et de l’Innovation du Québec for the Can-SHARE grant #141210; M.A.H., M.C.M.-T., J.O.J., H.E.P., and P.N.R. acknowledge funding from Monarch Initiative grant #R24OD011883 and Phenomics First NHGRI grant #1RM1HG010860; A.L.M. and E.B. acknowledge funding from MRC grant #MC_PC_19024; P.T. acknowledges funding from National University of Singapore and Agency for Science, Technology and Research; J.M.C. acknowledges funding from NHGRI; A.H.W. acknowledges funding from NHGRI awards K99HG010157, R00HG010157, and R35HG011949; A.M.-J., K.N., D.P.H., O.M.H., T.F.B., and Z.S. acknowledge funding from NHMRC grants #GNT1113531 and #GNT2000001; D.L.C. acknowledges funding from NHMRC Ideas grant #1188098; A.B.S. acknowledges funding from NHMRC Investigator Fellowship grant #APP177524; J.M.C. and L.D.S. acknowledge funding from NIH; A.A.P. acknowledges funding from NIH Anvil; A.V.S. acknowledges funding from NIH contract #HHSN268201800002I (TOPMed Informatics Research Center); S.U. acknowledges funding from NIH ENCODE grant #UM1HG009443; M.C.M.-T. and M.A.H. acknowledge funding from NIH grant #1U13CA221044; R.J.C. acknowledges funding from NIH grants #1U24HG010262 and #1U2COD023196; M.G. acknowledges funding from NIH grant #R00HG007940; J.B.A., S.L., P.G., E.B., H.L.R., and L.S. 
acknowledge funding from NIH grant #U24HG011025; K.P.E. acknowledges funding from NIH grant #U2C-RM-160010; J.A.E. acknowledges funding from NIH NCATS grant #U24TR002306; M.M. acknowledges funding from NIH NCI contract #HHSN261201400008c and ID/IQ Agreement #17X146 under contract #HHSN2612015000031 and #75N91019D00024; R.M.C.-D. acknowledges funding from NIH NCI grant #R01CA237118; M. Cline acknowledges funding from NIH NCI grant #U01CA242954; K.P.E. acknowledges funding from NIH NCI ITCR grant #1U24CA231877-01; O.L.G. acknowledges funding from NIH NCI ITCR grant #U24CA237719; R.L.G. acknowledges funding from NIH NCI task order #17X147F10 under contract #HHSN261200800001E; A.F.R. acknowledges funding from NIH NHGRI grant #RM1HG010461; N.M. and L.J.Z. acknowledge funding from NIH NHGRI grant #U24HG006941; R.R.F., T.H. Nelson, L.J.B., and H.L.R. acknowledge funding from NIH NHGRI grant #U41HG006834; B.J.W. acknowledges funding from NIH NHGRI grant #UM1HG009443A; M. Cline acknowledges funding from NIH NHLBI BioData Catalyst Fellowship grant #5118777; M.M. acknowledges funding from NIH NHLBI BioData Catalyst Program grant #1OT3HL142478-01; N.C.S. acknowledges funding from NIH NIGMS grant #R35-GM128636; M.C.M.-T., M.A.H., P.N.R., and R.R.F. acknowledge funding from NIH NLM contract #75N97019P00280; E.B. and A.L.M. acknowledge funding from NIHR; R.G. acknowledges funding from Project Ris3CAT VEIS; S.B. acknowledges funding from RD-Connect, Seventh Framework Program grant #305444; J.K. acknowledges funding from Robertson Foundation; S.B. and A.J.B. acknowledge funding from Solve-RD, EU Horizon 2020 grant #779257; T.S. and S. Oesterle acknowledge funding from Swiss Institute of Bioinformatics (SIB) and Swiss Personalized Health Network (SPHN), supported by the Swiss State Secretariat for Education, Research and Innovation SERI; S.J.M.J. acknowledges funding from Terry Fox Research Institute; A.E.H., M.P.B., M. Cupak, M.F., and J.F. 
acknowledge funding from the Digital Technology Supercluster; D.F.V. acknowledges funding from the Australian Medical Research Future Fund, as part of the Genomics Health Futures Mission grant #76749; M. Baudis acknowledges funding from the BioMedIT Network project of Swiss Institute of Bioinformatics (SIB) and Swiss Personalized Health Network (SPHN); B.M.K. acknowledges funding from the Canada Research Chair in Law and Medicine and CIHR grant #SBD-163124; D.S., G.I.S., M.A.K., S.B., S.S., and T.H. Nyrönen acknowledge funding from the EU Horizon 2020 Beyond 1 Million Genomes (B1MG) Project grant #951724; P.F., A.D.Y., F.C., H.S., I.U.L., D. Gupta, M. Courtot, S.E.H., T. Burdett, T.M.K., and S.F. acknowledge funding from the European Molecular Biology Laboratory; Y.J. and S.E.W. acknowledge funding from the Government of Canada; P.G. acknowledges funding from the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-206); J.Z. acknowledges funding from the Government of Ontario; C.K.Y. acknowledges funding from the Government of Ontario, Canada Foundation for Innovation; C. Viner and M.M.H. acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (grant #RGPIN-2015-03948 to M.M.H. and Alexander Graham Bell Canada Graduate Scholarship to C.V.); K.K.-L. acknowledges funding from the Program for Integrated Database of Clinical and Genomic Information; J.K. acknowledges funding from the Robertson Foundation; D.F.V. acknowledges funding from the Victorian State Government through the Operational Infrastructure Support (OIS) Program; A.M.L., R.N., and H.V.F. acknowledge funding from Wellcome (collaborative award); F.C., H.S., P.F., and S.E.H. acknowledge funding from Wellcome Trust grant #108749/Z/15/Z; A.D.Y., H.S., I.U.L., M. Courtot, H.E.P., P.F., and T.M.K. acknowledge funding from Wellcome Trust grant #201535/Z/16/Z; A.M., J.K.B., R.J.M., R.M.D., and T.M.K. 
acknowledge funding from Wellcome Trust grant #206194; E.B., P.F., P.G., and S.F. acknowledge funding from Wellcome Trust grant #220544/Z/20/Z; A. Hamosh acknowledges funding from NIH NHGRI grant U41HG006627 and U54HG006542; J.S.H. acknowledges funding from National Taiwan University #91F701-45C and #109T098-02; the work of K.W.R. was supported by the Intramural Research Program of the National Library of Medicine, NIH. For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. H.V.F. acknowledges funding from Wellcome Grant 200990/A/16/Z ‘Designing, developing and delivering integrated foundations for genomic medicine'.
Published: 2021
Full Text: View/download PDF

46. NCBI’s Virus Discovery Codeathon: Building 'FIVE' —The Federated Index of Viral Experiments API Index

Author: Christiam Camacho, Ben Busby, Evan Biederstedt, Benjamin J. Tully, David M. Kristensen, Nicholas P. Cooley, Ward Deboutte, Alexis L. Norris, Sierra D. Miller, Kiran Javkar, Cody Glickman, Alexandre Efremov, Ryan Connor, Anderson F. Brito, Nidhi Shah, Michael Muchow, Alejandro Rafael Gener, Bert Vanmechelen, Michael J. Tisza, Sejal Modha, Vadim Zalunin, Charles Pepe-Ranney, Jan P. Buchmann, Jake L. Weissman, Migun Shakya, Harihara Subrahmaniam Muralidharan, Surya Saha, Robert Edwards, Valerie C. Virta, Joan Martí-Carreras, James Rodney Brister, Wynn K. Meyer, and Anna K. Belford
Subjects: 0301 basic medicine, Computer science, DATABASE, data federation, lcsh:QR1-502, Genome, Viral, virus, Web Browser, Genome, Article, Virus, lcsh:Microbiology, User-Computer Interface, Viral Proteins, 03 medical and health sciences, 0302 clinical medicine, Data sequences, CRISPR, protein domain, metagenomics, genome graphs, HIV-1, Virology, RESOURCE, Databases, Genetic, Humans, Genome size, computer.programming_language, Information retrieval, Science & Technology, Computational Biology, Genetic Variation, Python (programming language), 030104 developmental biology, Infectious Diseases, Test case, Metagenomics, Host-Pathogen Interactions, Viruses, computer, Life Sciences & Biomedicine, 030217 neurology & neurosurgery
Abstract: Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, one consequence of previously decentralized (unfederated) data is a lack of consensus on, or comparison between, feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus-host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof of concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is implemented in BigQuery for efficient access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide simpler ways to access or query information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.
Is part of: VIRUSES-BASEL, vol. 12, issue 12 (location: Switzerland; status: published)
Published: 2020

47. Correspondence pattern attribute selection for consumption of federated data sources.

Author: Walshe, Brian, Brennan, Rob, and O'Sullivan, Declan
Abstract: When consuming data from federated domains, it is often necessary to identify the relationships that exist between the data schemas used in each domain. Discovering the exact nature of these relationships is difficult due to data set schema heterogeneity. Prior work has focused on inter-domain class equivalence. However, it is not always possible to find an equivalent class in both schemas, for example when instances are modeled as classes in one domain (e.g. router type) but as the attribute values of a single class in the other domain (e.g. router interface). This paper investigates whether, when classifying instances in one data set against a second schema, it may be more useful to perform this classification using some attribute (or attribute group) other than the original class type. A machine-learning-based classification approach to appropriate attribute selection is presented, and its operation is evaluated using two large data sets available on the web as Linked Data. The classification problem is compounded by the less formal semantics of Linked Data compared with full ontologies, but this also highlights the strength of our approach in dealing with noisy or under-specified data sets and schemas. The experimental results show that our attribute selection approach is capable of discovering appropriate mappings for cases where the correspondence is conditioned on one attribute, and that information gain provides a suitable scoring function for selecting correspondence patterns to describe these complex attribute-based mappings. [ABSTRACT FROM PUBLISHER]
Published: 2012
Full Text: View/download PDF

48. Brede Tools and Federating Online Neuroinformatics Databases.

Author: Nielsen, Finn
Abstract: As open science neuroinformatics databases the Brede Database and Brede Wiki seek to make distribution and federation of their content as easy and transparent as possible. The databases rely on simple formats and allow other online tools to reuse their content. This paper describes the possible interconnections on different levels between the Brede tools and other databases. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

49. National COVID Cohort Collaborative (N3C) Institutional Review Board (IRB) protocol

Author: Christopher G Chute
Subjects: EHR, SARS-CoV-2, IRB, data sharing, data federation, COVID-19, electronic health record, institutional review board
Abstract: This document details N3C's Institutional Review Board (IRB) protocol. The National COVID Cohort Collaborative (N3C) proposes to establish a central registry of patients who have been tested for COVID or have a clinical diagnosis of COVID. This registry will be derived by harmonizing COVID clinical data extracted from the federated clinical repositories associated with the Common Data Model (CDM) programs enumerated in this document. Creating the N3C registry of individual-level data (containing information specific to individual patients, sometimes called row-level data) as a limited, albeit protected, dataset of EHR data at a national level will be unprecedented in US clinical research. It will support novel machine learning analytics and the discovery of important predictors associated with emergency visits, hospitalizations, ICU transfer, ventilator dependency, and death, among a myriad of related outcomes. It will have the scale, statistical power, and computing platform to address most questions the clinical and research communities seek to answer.
Published: 2020
Full Text: View/download PDF

50. Sharing Heterogeneous Data: The National Database for Autism Research.

Author: Hall, Dan, Huerta, Michael, McAuliffe, Matthew, and Farber, Gregory
Abstract: The National Database for Autism Research (NDAR) is a secure research data repository designed to promote scientific data sharing and collaboration among autism spectrum disorder investigators. The goal of the project is to accelerate scientific discovery through data sharing, data harmonization, and the reporting of research results. Data from over 25,000 research participants are available to qualified investigators through the NDAR portal. Summary information about the available data is available to everyone through that portal. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF
