537 results for "Data standard"
Search Results
52. Investigating metadata adoptions for open government data portals in US cities.
- Author
- Xiao, Fanghui, Jeng, Wei, and He, Daqing
- Subjects
- *METADATA, *WEB portals, *TRANSPARENCY in government, *GOVERNMENT accountability, *USER-centered system design, *OPEN Data Protocol
- Abstract
Open government data (OGD) is a valuable resource for both policy transparency and government accountability. All levels of the United States government are working to promote open data and its portals. However, there is still a lack of studies on local-level OGD portals in the United States, particularly on the quality of metadata adopted by these portals. By examining 200 US cities, we sampled a list of 112 local-level portals and investigated the current usage of open data platforms for building local-level OGD portals. This study further investigates and discusses the adoption and potential issues of metadata on those OGD portals. Our findings describe the platform distributions among US local-level OGD portals and highlight several critical issues associated with metadata on the portals. We anticipate the results will inspire further studies on identifying solutions to improve the metadata and enhance the usability of open government data portals. [ABSTRACT FROM AUTHOR]
- Published
- 2018
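The kind of metadata audit this record describes can be prototyped against any city portal that publishes a DCAT-US style data.json catalog (common on Socrata- and CKAN-based platforms). The sketch below is a minimal illustration rather than the study's method; the portal URL and the checked field list are assumptions.

```python
# Minimal metadata-completeness audit for an open government data portal,
# assuming a DCAT-US style catalog published at /data.json.
import json
from collections import Counter
from urllib.request import urlopen

CATALOG_URL = "https://data.example-city.gov/data.json"  # hypothetical portal
CHECKED_FIELDS = ["title", "description", "keyword", "publisher",
                  "contactPoint", "modified", "license"]

def audit_catalog(url: str) -> Counter:
    """Count how many datasets omit each checked metadata field."""
    catalog = json.load(urlopen(url))
    missing = Counter()
    for dataset in catalog.get("dataset", []):
        for field in CHECKED_FIELDS:
            if not dataset.get(field):
                missing[field] += 1
    return missing

if __name__ == "__main__":
    for field, count in audit_catalog(CATALOG_URL).most_common():
        print(f"{field}: missing in {count} datasets")
```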
53. New Means of Data Collection and Accessibility
- Author
- Faught, I. Charie, Aspevig, James, Spear, Rita, Magnuson, J.A., editor, and Fu, Jr., Paul C., editor
- Published
- 2014
54. Global marine biodiversity in the context of achieving the Aichi Targets: ways forward and addressing data gaps.
- Author
- Saeedi, Hanieh, Reimer, James Davis, Brandt, Miriam I., Dumais, Philippe-Olivier, Jażdżewska, Anna Maria, Jeffery, Nicholas W., Thielen, Peter M., and Costello, Mark John
- Subjects
- MARINE biodiversity, SCIENTIFIC knowledge, BIODIVERSITY, KNOWLEDGE gap theory, CONFERENCES & conventions
- Abstract
In 2010, the Conference of the Parties of the Convention on Biological Diversity agreed on the Strategic Plan for Biodiversity 2011–2020 in Aichi Prefecture, Japan. As this plan approaches its end, we discussed whether marine biodiversity and prediction studies were nearing the Aichi Targets during the 4th World Conference on Marine Biodiversity, held in Montreal, Canada in June 2018. This article summarises the outcome of a five-day group discussion on how global marine biodiversity studies should be focused further to better understand the patterns of biodiversity. We discussed and reviewed seven fundamental biodiversity priorities related to nine Aichi Targets, focusing on global biodiversity discovery and predictions to improve and enhance biodiversity data standards (quantity and quality), tools and techniques, spatial and temporal scale framing, and stewardship and dissemination. We discuss how identifying biodiversity knowledge gaps and promoting efforts to close them has reduced, and will continue to reduce, such gaps, including via the use of new databases, tools and technology, and how these resources could be improved in the future. The group recognised significant progress toward Target 19 in relation to scientific knowledge, but negligible progress with regard to Targets 6 to 13, which aimed to safeguard and reduce human impacts on biodiversity. [ABSTRACT FROM AUTHOR]
- Published
- 2019
55. Harmonized outcome measures for use in atrial fibrillation patient registries and clinical practice: Endorsed by the Heart Rhythm Society Board of Trustees.
- Author
- Calkins, Hugh, Gliklich, Richard E., Leavy, Michelle B., Piccini, Jonathan P., Hsu, Jonathan C., Mohanty, Sanghamitra, Lewis, William, Nazarian, Saman, and Turakhia, Mintu P.
- Abstract
Background: Atrial fibrillation (AF) affects an estimated 33 million people worldwide, leading to increased mortality and an increased risk of heart failure and stroke. Many AF patient registries exist, but the ability to link and compare data across registries is hindered by differences in the outcome measures collected by each registry and a lack of harmonization. Objectives: The purpose of this project was to develop a minimum set of standardized outcome measures that could be collected in AF patient registries and clinical practice. Methods: AF patient registries were identified through multiple sources and invited to join the workgroup and submit outcome measures. Additional measures were identified through literature searches and reviews of consensus statements. Outcome measures were categorized using the Outcome Measures Framework (OMF) supported by the Agency for Healthcare Research and Quality. A minimum set of broadly relevant measures was identified. Measure definitions were harmonized through in-person and virtual meetings. Results: One hundred twelve outcome measures, including those from thirteen registries, were curated according to the OMF and then harmonized into a minimum set of measures in the OMF categories of survival (3 measures), clinical response (3 measures), events of interest (9 measures), patient-reported outcomes (2 measures), and resource utilization (3 measures). The harmonized definitions build on existing consensus statements. Conclusions: The harmonized measures represent a minimum set of outcomes that are relevant in AF research and clinical practice. Routine and consistent collection of these measures in registries and in other systems would support creation of a research infrastructure to efficiently address new questions and improve patient outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2019
56. Harmonizing plot data with collection data.
- Author
- Petersen, Mareike, Glöckler, Falko, and Hoffmann, Jana
- Subjects
- ACQUISITION of data, NATURAL history, BIODIVERSITY monitoring, ONTOLOGIES (Information retrieval), ECOLOGICAL research
- Abstract
Although plot or monitoring data are quite often associated with objects collected in the plot and stored in specific collections, currently available controlled vocabularies do not cover both disciplines. This situation limits the possibility of publishing common data sets and consequently leads to a loss of significant information when combining plot-based research with collection-object-associated data. To facilitate the exchange and publication of these important data sets, experts in natural history collection data, ecological research, and environmental science met for a one-day workshop in Berlin. The participants discussed data standards and ontologies relevant for each discipline and collected requirements for a first application schema covering terms important for both collection-object-related data and plot-based research. [ABSTRACT FROM AUTHOR]
- Published
- 2019
57. Access to Geosciences - Ways and Means to share and publish collection data.
- Author
- Petersen, Mareike, Hoffmann, Jana, and Glöckler, Falko
- Subjects
- EARTH sciences, ACQUISITION of data, NATURAL history, SCIENTIFIC community, PALEONTOLOGY
- Abstract
Natural history collections are invaluable tools for various questions regarding biodiversity, environmental, and cultural studies. All object metadata thus need to be findable, accessible, and interoperable for the scientific community and beyond. This requires well-structured data, appropriate exchange formats, and web sites or portals making all necessary information accessible. Collection managers, curators, and scientists from various institutions and nationalities were surveyed in order to understand the importance of open geoscientific collections for the respective holding institution and their daily work. In addition, particular requirements for the publication of geoscientific collection object metadata were gathered in a two-day workshop with international experts working with paleontological, mineralogical, petrological, and meteorite collections. The survey and workshop revealed that common data standards are of crucial importance though insufficiently used by most institutions. The extent and type of information necessary for publication, as discussed during the workshop, will be considered for a domain-specific application schema facilitating the publication and exchange of geoscientific object metadata. There is a high demand for comprehensive data portals covering all geoscientific disciplines. The gathered portal requirements will be taken into account when improving the already running GeoCASe aggregator platform. [ABSTRACT FROM AUTHOR]
- Published
- 2019
58. Data File Standard for Flow Cytometry, Version FCS 3.2.
- Author
- Spidlen, Josef, Moore, Wayne, Parks, David, Goldberg, Michael, Blenman, Kim, Cavenaugh, James S., and Brinkman, Ryan
- Abstract
FCS 3.2 is a revision of the flow cytometry data standard based on a decade of suggested improvements from the community as well as industry needs to capture instrument conditions and measurement features more precisely. The unchanged goal of the standard is to provide a uniform file format that allows files created by one type of acquisition hardware and software to be analyzed by any other type. The standard retains the overall FCS file structure and most features of previous versions, but also contains a few changes that were required to support new types of data and use cases efficiently. These changes are incompatible with existing FCS file readers. Notably, FCS 3.2 supports mixed data types to, for example, allow FCS measurements that are intrinsically integers (e.g., indices or class assignments) or measurements that are commonly captured as integers (e.g., time ticks) to be represented as integer values, while capturing other measurements as floating-point values in the same FCS data set. In addition, keywords explicitly specifying dyes, detectors, and analytes were added to avoid having to extract those heuristically and unreliably from measurement names. Types of measurements were formalized, several keywords were added, and others were removed or deprecated; various aspects of the specification were clarified. A reference implementation of the cyclic redundancy check (CRC) calculation is provided in two programming languages since a correct CRC implementation was problematic for many vendors. © 2020 International Society for Advancement of Cytometry [ABSTRACT FROM AUTHOR]
- Published
- 2021
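The segment layout that FCS 3.2 retains from earlier versions can be read with a few lines of code. The sketch below parses the header offsets and the delimited TEXT segment; it is a simplified illustration that ignores escaped (doubled) delimiters and supplemental TEXT segments, and the file path is a placeholder.

```python
# Sketch of reading the header and TEXT segment of an FCS file: the version
# string sits in bytes 0-5, ASCII-encoded TEXT-segment offsets in bytes
# 10-25, and the TEXT segment is a delimiter-separated list of $KEY/value
# pairs whose delimiter is its own first byte.
def read_fcs_text_segment(path: str) -> dict:
    with open(path, "rb") as f:
        header = f.read(58)
        version = header[0:6].decode("ascii")      # e.g. "FCS3.2"
        text_start = int(header[10:18])            # offsets stored as ASCII digits
        text_end = int(header[18:26])              # inclusive end offset
        f.seek(text_start)
        text = f.read(text_end - text_start + 1).decode("ascii", "replace")
    delim = text[0]                                # first byte is the delimiter
    tokens = text[1:].split(delim)                 # escaped delimiters not handled
    keywords = dict(zip(tokens[::2], tokens[1::2]))
    keywords["__version__"] = version
    return keywords

# keywords.get("$DATATYPE") would then reveal, e.g., mixed integer/float data.
```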
59. Challenges for the standardized reporting of NGS HLA genotyping: Surveying gaps between clinical and research laboratories
- Author
- Robert P. Milius, Kazutoyo Osoegawa, Kalyan C. Mallempati, Marcelo Fernandez-Vina, Martin Maiers, Miranda Bauer, Gonzalo Montero-Martín, and Steven J. Mack
- Subjects
- HLA genotype, Genotyping Techniques, Computer science, Histocompatibility Testing, Immunology, High-Throughput Nucleotide Sequencing, Sequence Analysis (DNA), General Medicine, Human leukocyte antigen, Data science, Turnaround time, Histocompatibility, Data Standard, HLA Antigens, HLA genotyping, Immunogenetics, Humans, Immunology and Allergy, Data reporting, Laboratories, Genotyping, Software
- Abstract
Next generation sequencing (NGS) is being applied for HLA typing in research and clinical settings. NGS HLA typing has made it feasible to sequence exons, introns and untranslated regions simultaneously, with significantly reduced labor and reagent cost per sample, rapid turnaround time, and improved HLA genotype accuracy. NGS technologies bring challenges for cost-effective computation, data processing and exchange of NGS-based HLA data. To address these challenges, guidelines and specifications such as Genotype List (GL) String, Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING), and Histoimmunogenetics Markup Language (HML) were proposed to streamline and standardize the reporting of HLA genotypes. As part of the 17th International HLA and Immunogenetics Workshop (IHIW), we implemented standards and systems for HLA genotype reporting that included GL String, MIRING and HML, and found that misunderstandings or misinterpretations of these standards led to inconsistencies in the reporting of NGS HLA genotyping results. This may be due in part to a historical lack of centralized data reporting standards in the histocompatibility and immunogenetics community. We have worked with software and database developers, clinicians and scientists to address these issues in a collaborative fashion as part of the Data Standard Hackathons (DaSH) for NGS. Here we report several categories of challenges to the consistent exchange of NGS HLA genotyping data that we have observed. We hope to address these challenges in future DaSH for NGS efforts.
- Published
- 2021
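GL String, one of the reporting specifications named in the preceding record, encodes a multi-locus genotype with a small set of operators. The sketch below is a simplified reading of the published grammar, not a validating parser; the example string is invented.

```python
# Sketch of decomposing a GL String. Operators, from outermost to innermost:
# '^' separates loci, '|' separates alternative genotypes, '+' separates the
# two gene copies, '~' joins alleles within one haplotype, and '/' lists
# ambiguous alleles.
def parse_gl_string(gl: str) -> list:
    return [
        [
            [
                [allele_list.split("/") for allele_list in copy.split("~")]
                for copy in genotype.split("+")
            ]
            for genotype in locus_block.split("|")
        ]
        for locus_block in gl.split("^")
    ]

example = "HLA-A*02:01/HLA-A*02:02+HLA-A*03:01"
print(parse_gl_string(example))
# -> one locus, one genotype, two gene copies; the first copy is ambiguous
#    between two alleles.
```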
60. SOME ASPECTS OF USING PASSENGER DATA (API/PNR) IN SUPPORT OF BORDER SECURITY
- Subjects
- Data collection, Passenger information, Control (management), Legislation, Passenger information system, Computer security, Data Standard, Data exchange, General Earth and Planetary Sciences, Confidentiality, Business, General Environmental Science
- Abstract
The article describes the logic, mechanism, and main parameters of using passenger data (API/PNR). The interrelation of the concepts of "Advance Passenger Information" and "Passenger Name Record" has been revealed. It has been determined that systems simplifying formalities also have a positive effect on security. For example, the purpose of data collection in the EU is to effectively combat illegal migration and strengthen border control, as well as to prevent, detect, investigate and prosecute terrorist and serious crimes. This applies both to Advance Passenger Information (API) and to additional passenger data, such as Passenger Name Record (PNR) data. There is no doubt that the successful application of API and PNR data exchange depends on a unified approach of all participants in information relations (both border agencies and airlines in different countries) regarding the data standard and the standard for its transmission. Ensuring such a unified approach is a problem even when there is unity of legal regulation on data format and data exchange procedures. There are no special regulations in the national legislation that would regulate relations regarding the circulation of passenger registration data, nor is there a specific entity that processes this type of information. Thus, there is a need for special legal regulation of the circulation of information about airline passengers crossing the state border of Ukraine. In particular, the legislation must regulate the procedure, grounds, and purpose of obtaining, processing, transmitting, storing and destroying information about passengers (its content), protect the right to confidential information of persons operating international flights, and establish a responsible controller of such information.
- Published
- 2021
61. DATA INTEROPERABILITY OF BUILDING INFORMATION MODELING AND GEOGRAPHIC INFORMATION SYSTEM IN CONSTRUCTION INDUSTRY
- Author
- S. Azri, W. N. F. W. A. Basir, Zulkepli Majid, and Uznir Ujang
- Subjects
- Technology, Geographic information system, Geospatial analysis, Process (engineering), Computer science, Civil engineering, Automation, Construction engineering, Data Standard, Building information modeling, Industry Foundation Classes, Data integration
- Abstract
Building Information Modeling (BIM) has been applied in the construction industry for many years, because it provides advantages in controlling and managing construction projects throughout their life cycle. The advantages provided by BIM focus on indoor planning tasks; however, outdoor planning is also an important part of a construction project that needs to be considered. To cover outdoor planning in construction projects, Geographic Information Systems (GIS) need to be applied. GIS can address this gap because it mainly targets outdoor planning through spatial analysis. GIS can offer a high degree of geospatial information and can provide detailed geometrical and semantic information about buildings to assist in improving automation. To produce better planning in construction projects, BIM and GIS should be integrated. To integrate both domains, the data interoperability between them needs to be investigated, because they use different data standards. This study focuses on achieving data interoperability through data integration between BIM and GIS, in order to solve the problems of data mismatch and missing data during the data translation process. Industry Foundation Classes (IFC) was used as the data standard to perform the data integration between BIM and GIS. The outcomes of this study show that when data interoperability is applied between BIM and GIS, these problems can be solved, and the data dimensions and their coordinate systems can also be controlled.
- Published
- 2021
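Extraction of BIM elements for hand-off to GIS, with IFC as the exchange standard as in the preceding record, can be prototyped with the open-source ifcopenshell library. This is a sketch only: the file path is a placeholder, and geometry handling and coordinate transformation to a geodetic reference system are omitted.

```python
# Sketch of pulling building elements out of an IFC model for use in GIS.
import ifcopenshell

model = ifcopenshell.open("building.ifc")  # placeholder path

records = []
for wall in model.by_type("IfcWall"):
    records.append({
        "guid": wall.GlobalId,   # stable IFC identifier
        "name": wall.Name,
        "type": wall.is_a(),     # entity class, here "IfcWall"
    })

# These records could then be joined with footprint geometry and written to
# a GIS layer (e.g., GeoJSON features) in the target coordinate system.
print(f"extracted {len(records)} walls")
```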
62. Development of Common Data Standard for Airports Facilities based on Building Information Modeling (BIM)
- Author
- Kee-Woong Kim, Youn-Chul Choi, and Euisoo Jung
- Subjects
- Data Standard, Building information modeling, Computer science, Construction engineering
- Published
- 2021
63. The Need for Data Standards and Implementation Policies to Integrate Insulin Delivery Data Into the Electronic Health Record.
- Author
- Espinoza JC, Yeung AM, Huang J, Seley JJ, Longo R, and Klonoff DC
- Subjects
- Humans, Insulin, Blood Glucose Self-Monitoring, Blood Glucose, Insulin, Regular, Human, Electronic Health Records, Diabetes Mellitus drug therapy
- Abstract
Integration of insulin dosing data into the electronic health record (EHR), combined with other patient-generated health care data, would facilitate the use of wirelessly connected insulin delivery systems, including smart insulin pens, insulin pumps, and advanced hybrid closed-loop systems. In 2022, Diabetes Technology Society developed the Integration of Continuous Glucose Monitoring Data into the EHR (iCoDE) Project, which is the first consensus standard for integrating data from a wearable device into the EHR. The iCoDE Standard is a comprehensive guide for any health care delivery organization or hospital for automatically integrating continuous glucose monitoring data into the EHR. Diabetes Technology Society is following iCoDE with the Integration of Connected Diabetes Device Data into the EHR (iCoDE-2) Project, to similarly provide guidance for integrating insulin delivery data into the EHR alongside continuous glucose monitoring data.
- Published
- 2023
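The preceding record does not prescribe a wire format, but EHR integrations of device data commonly use HL7 FHIR. As a generic illustration (not the iCoDE or iCoDE-2 specification), a single CGM glucose value could be represented as a FHIR Observation resource; the LOINC code, timestamp, and patient reference below are illustrative assumptions.

```python
# Illustrative FHIR Observation for one CGM glucose reading.
import json

observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "2339-0",  # commonly cited LOINC code for blood glucose
            "display": "Glucose [Mass/volume] in Blood",
        }]
    },
    "subject": {"reference": "Patient/example"},  # hypothetical patient id
    "effectiveDateTime": "2023-01-15T08:30:00Z",
    "valueQuantity": {
        "value": 104,
        "unit": "mg/dL",
        "system": "http://unitsofmeasure.org",
        "code": "mg/dL",
    },
}

print(json.dumps(observation, indent=2))
```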
64. Data Quality- and Utility-Compliant Anonymization of Common Data Model-Harmonized Electronic Health Record Data: Protocol for a Scoping Review.
- Author
- Kamdje Wabo G, Prasser F, Gierend K, Siegel F, and Ganslandt T
- Abstract
Background: The anonymization of Common Data Model (CDM)-converted EHR data is essential to ensure data privacy in the use of harmonized health care data. However, applying data anonymization techniques can significantly affect many properties of the resulting data sets and thus can bias research results. Few studies have reviewed these applications with attention to how data utility and quality concerns are managed in the context of CDM-formatted health care data. Objective: Our intended scoping review aims to identify and describe (1) how formal anonymization methods are carried out with CDM-converted health care data, (2) how data quality and utility concerns are considered, and (3) how the various CDMs differ in terms of their suitability for recording anonymized data. Methods: The planned scoping review is based on the framework of Arksey and O'Malley; following this framework, only articles published in English will be included. The retrieval of literature items will be based on a search string combining keywords related to data anonymization, CDM standards, and data quality assessment. The proposed search query will be validated by a librarian and accompanied by manual searches to include further informal sources. Eligible articles will first undergo a deduplication step, followed by the screening of titles. Second, a full-text reading will allow the 2 reviewers involved to reach the final decision about article selection, while a domain expert will support the resolution of citation selection conflicts. Additionally, key information will be extracted, categorized, summarized, and analyzed using a proposed template in an iterative process. Tabular and graphical analyses will be presented in alignment with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist. We also performed tentative searches on Web of Science to estimate the feasibility of identifying eligible articles. Results: Tentative searches on Web of Science resulted in 507 nonduplicated matches, suggesting the availability of potentially relevant articles. Further analysis and selection steps will allow us to derive a final literature set. The completion of this scoping review study is expected by the end of the fourth quarter of 2023. Conclusions: Outlining the approaches of applying formal anonymization methods to CDM-formatted health care data, while taking into account data quality and utility concerns, should provide useful insights into the existing approaches and future research directions based on identified gaps. This protocol describes a schedule for performing a scoping review, which should support follow-up investigations. International Registered Report Identifier (IRRID): PRR1-10.2196/46471. (©Gaetan Kamdje Wabo, Fabian Prasser, Kerstin Gierend, Fabian Siegel, Thomas Ganslandt. Originally published in JMIR Research Protocols (https://www.researchprotocols.org), 11.08.2023.)
- Published
- 2023
65. Building Essential Biodiversity Variable netCDFs with the ebvcube R Package
- Author
- Luise Quoß, Néstor Fernández, Christian Langer, Jose Valdez, Miguel Alejandro Fernández, and Henrique Pereira
- Subjects
- monitoring, EBV, data portal, data standard, interoperability, GEO BON, General Medicine, FAIR
- Abstract
The concept of Essential Biodiversity Variables (EBVs) was conceived to study, report, and manage biodiversity change. The EBV netCDF structure was developed in order to support publication and interoperability of biodiversity data. This standard is based on the Network Common Data Format (netCDF). Additionally, it follows the Climate and Forecast Conventions (CF, version 1.8) and the Attribute Convention for Data Discovery (ACDD, version 1.3). The standard allows several datacubes per netCDF file (see Fig. 1). These cubes have four dimensions: longitude, latitude, time and entity, whereby the last dimension can, for example, encompass different species or groups of species, ecosystem types or other aspects. The usage of hierarchical groups enables the coexistence of multiple EBV cubes (see Fig. 2). The first group level holds scenarios, e.g., the modelling for different Shared Socioeconomic Pathways (SSP) scenarios. The second group level holds metrics, e.g., the percentage of protected area per pixel and its proportional loss over a certain time span per pixel. All metrics are repeated per scenario, if any are present. The result is a rather complex raster dataset (see example dataset in Fig. 3). This is where the ebvcube R package comes into play. This R package enables scientists to create their own netCDFs in the EBV cube standard. Its functionality covers creating, opening and reading, and visualizing EBV netCDFs. The ebvcube package is part of the overall EBV infrastructure and works together with the EBV Data Portal. Users can work with the downloaded EBV netCDFs or upload their own EBV netCDFs to the portal. Generally, the package aims to condense the output for users and to assist in the understanding of the file structure, helping to overcome its complexity. The output is reduced to the necessary information, e.g., not displaying coordinate variables or any technical attributes. Moreover, functionality for quick data exploration is implemented.
- Published
- 2022
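The cube layout described in the preceding record (four dimensions; scenario and metric groups) can be sketched with the netCDF4 Python library, although ebvcube itself is an R package. Group names, dimension order, and sizes below are illustrative, not the normative layout.

```python
# Sketch of an EBV-style netCDF: four dimensions (longitude, latitude, time,
# entity) and hierarchical groups for scenarios and metrics.
import numpy as np
from netCDF4 import Dataset

with Dataset("ebv_example.nc", "w") as nc:
    nc.createDimension("lon", 360)
    nc.createDimension("lat", 180)
    nc.createDimension("time", 10)
    nc.createDimension("entity", 3)      # e.g., three species groups

    # First group level: scenario; second level: metric.
    scenario = nc.createGroup("scenario_ssp1")
    metric = scenario.createGroup("metric_protected_area_pct")
    cube = metric.createVariable(
        "ebv_cube", "f4", ("entity", "time", "lat", "lon"), zlib=True
    )
    cube[:] = np.random.rand(3, 10, 180, 360).astype("f4")
    cube.units = "percent"
```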
66. Establishment of Marine and Coastal Spatial Data Infrastructure in Indonesia
- Author
- Wahyu, Suwahyuono, Pramono, Gatot H., Purnawan, Bebas, and Green, D.R., editor
- Published
- 2010
67. flowIO: Flow cytometry standard conformance testing, editing, and export tool.
- Author
- Koblížek, Miroslav, Lebedeva, Anastasia, and Fišer, Karel
- Abstract
The Flow Cytometry Standard (FCS) format is a widely accepted norm for storing flow cytometry (FCM) data. Its goal as a standard is to allow FCM data sharing and re-analysis. Over more than three decades of its existence, FCS has evolved into a well-defined, flexible file format reflecting technical changes in the FCM field. Its flexibility, as well as the rising number of instrument vendors, leads in some cases to suboptimal implementations of FCS. Such situations compromise the primary goal of the standard and hinder the ability to reproduce FCM analyses. This is further underlined by the rapid rise of advanced FCM analyses, often carried out outside traditional software tools and relying heavily on standard data storage and presentation. We have developed flowIO, an R package which tests FCS file conformance with the standard as defined by the International Society for Advancement of Cytometry (ISAC) normative documents. Along with the package we provide a web-based application (also at http://bioinformin.cesnet.cz/flowIO/) allowing user-friendly access to the conformance testing as well as FCS file editing and export for further analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2018
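A conformance test of the kind flowIO performs can be sketched as a required-keyword check. flowIO itself is an R package; the Python sketch below is illustrative only, and the required set shown is an abbreviated subset of the ISAC keyword list. It reuses read_fcs_text_segment() from the sketch under result 58.

```python
# Sketch of a minimal FCS conformance check: verify that the TEXT segment
# declares required keywords. REQUIRED_KEYWORDS is an illustrative subset.
REQUIRED_KEYWORDS = {
    "$BEGINDATA", "$ENDDATA", "$DATATYPE", "$MODE",
    "$BYTEORD", "$PAR", "$TOT", "$NEXTDATA",
}

def check_required(keywords: dict) -> list:
    """Return the required keywords missing from a parsed TEXT segment."""
    present = {k.upper() for k in keywords}
    return sorted(REQUIRED_KEYWORDS - present)

# Usage, with the earlier sketch:
# missing = check_required(read_fcs_text_segment("sample.fcs"))
# print("missing required keywords:", missing or "none")
```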
68. Brazilian Network on Plant-Pollinator Interactions: an update on the initiative of a standard for plant-pollinator interactions data.
- Author
- Augusto Salim, José, Mauro Saraiva, Antonio, Agostini, Kayna, Wolowski, Marina, Veiga, Allan, Saragiotto Silva, Juliana, and Carvalheiro, Luisa G.
- Subjects
- POLLINATORS, BIODIVERSITY conservation, INFORMATION sharing
- Abstract
The Brazilian Plant-Pollinator Interactions Network*1 (REBIPP) aims to develop scientific and teaching activities in plant-pollinator interaction. The main goals of the network are to: 1. generate a diagnosis of plant-pollinator interactions in Brazil; 2. integrate knowledge in pollination of natural, agricultural, urban and restored areas; 3. identify knowledge gaps; 4. support public policy guidelines aimed at the conservation of biodiversity and ecosystem services for pollination and food production; and 5. encourage collaborative studies among REBIPP participants. To achieve these goals the group has resumed and built on previous work in data standard definition done under the auspices of the IABIN-PTN (Etienne Américo et al. 2007) and FAO (Saraiva et al. 2010) projects (Saraiva et al. 2017). The ultimate goal is to standardize the ways data on plant-pollinator interactions are digitized, to facilitate data sharing and aggregation. A database will be built with standardized data from Brazilian researchers who are members of the network, to be used by the national community and to allow sharing data with data aggregators. To achieve those goals, three task groups of specialists with similar interests and backgrounds (e.g., botanists, zoologists, pollination biologists) have been created. Each group is working on the definition of the terms to describe plants, pollinators and their interactions. The resulting glossary explains their meaning, maps the suggested terms to Darwin Core (DwC) terms where possible, and follows the TDWG Standards Documentation Standard*2 for definitions. Reaching a consensus on terms and their meaning among members of each group is challenging, since researchers have different views and concerns about which data are important to be included in a standard. That reflects the variety of research questions that underlie different projects and the data they collect. Thus, we ended up with a long list of terms, many of them useful only in very specialized research protocols and experiments, and sometimes rarely collected or measured. Nevertheless we opted to maintain a very comprehensive set of terms, so that a large number of researchers feel that the standard meets their needs and that the databases based on it are a suitable place to store their data, thus encouraging the adoption of the data standard. An update of the work will soon be available at the REBIPP website and will be open for comments and contributions. This proposal of a data standard is also being discussed within the TDWG Biological Interaction Data Interest Group*3 in order to propose an international standard for species interaction data. The importance of interaction data for guiding conservation practices and the management of ecosystem service provision has led to the proposal of defining Essential Biodiversity Variables (EBVs) related to biological interactions. Essential Biodiversity Variables (Pereira et al. 2013) were developed to identify key measurements that are required to monitor biodiversity change. EBVs act as an intermediate abstraction layer between primary observations (raw data) and indicators (Niemeijer 2002). Six EBV classes were defined in an initial stage: genetic composition, species populations, species traits, community composition, ecosystem function and ecosystem structure. Each EBV class defines a list of candidate EBVs for biodiversity change monitoring (Fig. 1). Consequently, digitizing such data and making them available online are essential.
Differences in sampling protocols may affect data scalability across space and time, hence imposing barriers to the full use of primary data and the calculation of EBVs (Henry et al. 2008). Thus, common protocols and methods should be adopted as the most straightforward approach to promote the integration of collected data and to allow the calculation of EBVs (Jürgens et al. 2011). Recently a workshop was held by GLOBIS-B*4 (GLOBal Infrastructures for Supporting Biodiversity research) to discuss species interaction EBVs (February 26-28, Bari, Italy). Plant-pollinator interactions received a lot of attention and REBIPP's work was presented there. As an outcome we expect to define specific EBVs for interactions, using plant-pollinator interactions as an example and considering pairwise interactions as well as interaction-network-related variables. The terms in the plant-pollinator data standard under discussion at REBIPP will provide information not only on the EBV related to interactions, but also on other EBV classes: species populations, species traits, community composition, ecosystem function and ecosystem structure. As noted above, some EBVs for specific ecosystem functions (e.g., pollination) lie beyond interaction network structures. The EBV 'Species interactions' (EBV class 'Community composition') should incorporate other aspects such as frequency (Vázquez et al. 2005), duration and empirical estimates of interaction strengths (Berlow et al. 2004). Overall, we think the proposed plant-pollinator interaction data standard currently being developed by REBIPP will contribute to data aggregation, filling many data gaps, and can also provide indicators for long-term monitoring, being an essential source of data for EBVs. [ABSTRACT FROM AUTHOR]
- Published
- 2018
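As an illustration of the kind of record the proposed standard targets (not the REBIPP standard itself), a single plant-pollinator interaction could be digitized as two Darwin Core occurrences joined by a ResourceRelationship row; the identifiers and the relationship value below are invented for illustration.

```python
# Illustrative Darwin Core representation of one plant-pollinator interaction.
pollinator_occurrence = {
    "occurrenceID": "urn:rebipp:occ:0001",  # hypothetical identifier
    "scientificName": "Apis mellifera",
    "eventDate": "2017-10-12",
    "country": "Brazil",
}

plant_occurrence = {
    "occurrenceID": "urn:rebipp:occ:0002",
    "scientificName": "Coffea arabica",
    "eventDate": "2017-10-12",
    "country": "Brazil",
}

interaction = {  # Darwin Core ResourceRelationship terms
    "resourceID": pollinator_occurrence["occurrenceID"],
    "relatedResourceID": plant_occurrence["occurrenceID"],
    "relationshipOfResource": "visitsFlowersOf",  # assumed vocabulary value
}
```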
69. Data format standards in analytical chemistry
- Author
- Rauh, D., Blankenburg, C., Fischer, T.G., Jung, N., Kuhn, S., Schatzschneider, U., Schulze, Tobias, and Neumann, S.
- Abstract
Research data is an essential part of research and of almost every publication in chemistry. The data itself can be valuable for reuse if sustainably deposited, annotated and archived. Thus, it is important to publish data following the FAIR principles, to make it findable, accessible, interoperable and reusable, not only for humans but also in machine-readable form. This also improves the transparency and reproducibility of research findings and fosters analytical work with scientific data to generate new insights, which are only accessible with manifold and diverse datasets. Research data requires complete and informative metadata and the use of open data formats to obtain interoperable data. Generic data formats like AnIML and JCAMP-DX have been used for many applications. Specialized formats for some analytical methods are already accepted, like mzML for mass spectrometry or nmrML and NMReDATA for NMR spectroscopy data. Other methods still lack common standards for data. Only a joint effort of chemists, instrument and software vendors, publishers and infrastructure maintainers can make sure that the analytical data will be of value in the future. In this review, we describe existing data formats in analytical chemistry and introduce guidelines for the development and use of standardized and open data formats.
- Published
- 2022
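Of the generic formats named in the preceding record, JCAMP-DX is simple enough to sketch: its header is a series of "##LABEL=value" labelled data records. The sketch below reads that header only; data tables and compressed (ASDF) forms are ignored, and the file name is a placeholder.

```python
# Sketch of reading the labelled-data-records header of a JCAMP-DX file.
def read_jcamp_header(path: str) -> dict:
    header = {}
    with open(path, encoding="ascii", errors="replace") as f:
        for line in f:
            line = line.strip()
            if line.startswith("##"):
                label, _, value = line[2:].partition("=")
                label = label.strip().upper()
                header[label] = value.strip()
                if label == "XYDATA":
                    break  # header ends where the data table begins
    return header

# read_jcamp_header("spectrum.jdx") might yield, e.g.,
# {"TITLE": "...", "JCAMP-DX": "4.24", "DATA TYPE": "INFRARED SPECTRUM", ...}
```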
70. Towards efficient use of data, models and tools in food microbiology
- Author
- Filter, Matthias, Nauta, Maarten, Pires, Sara M., Guillier, Laurent, and Buschhardt, Tasja
- Abstract
Food microbiology researchers, risk assessment agencies and food business operators rely heavily on the reuse of knowledge that is available as data, models and tools. Unfortunately, such knowledge reuse remains challenging, as food safety data sets, models and tools are usually only available in platform-dependent or software-dependent formats that rarely comply with the Findability, Accessibility, Interoperability, and Reusability (FAIR) data principles. In recent years, the Risk Assessment Modelling and Knowledge Integration Platform (RAKIP) Initiative developed the so-called Food Safety Knowledge Exchange (FSKX) format. This development was accompanied by the creation of open-source software that facilitates the adoption of FSKX. Future work within RAKIP will focus on creating semantic interoperability in FSKX-related solutions and on extending the FSKX format towards other food microbiology knowledge.
- Published
- 2022
71. Storing, Searching, and Disseminating Experimental Proteomics Data
- Author
- Paton, Norman W., Jones, Andrew R., Garwood, Chris, Garwood, Kevin, Oliver, Stephen, and Choi, Sangdun, editor
- Published
- 2007
72. 1st joint workshop LIFE MICA - LIFE RIPARIAS: Towards a data standard for reporting on IAS management interventions. Agenda and final report
- Author
- Oldoni, Damiano, Adriaens, Tim, Cartuyvels, Emma, Fromme, Lilja, Gethöffer, Friederike, Maistrelli, Claudia, Reyserhove, Lien, and Vermeersch, Xavier
- Subjects
- EU IAS Regulation, Invasive Alien Species, wildlife, data standard, field management
- Abstract
This workshop report reflects the discussions and outputs of the first joint LIFE MICA - LIFE RIPARIAS workshop, organized virtually on March 25, 2022. This workshop represented the kick-off of a longer-term process towards the development of a data exchange format for reporting on the management of invasive alien species. The aim of the workshop was to discuss reporting on management interventions against invasive alien species (IAS), building on the experiences of project managers. Participants in the workshop were scientists, practitioners, authorities and other parties involved in IAS management. Break-out groups discussed requirements for reporting on IAS management, relevant terms, and available controlled vocabularies for them. Presentations at the workshop are available upon request from their respective authors. The workshop was performed within the framework of the LIFE projects RIPARIAS: Reaching Integrated and Prompt Action in Response to Invasive Alien Species (LIFE19 NAT/BE/000953) and MICA: Management of Invasive Coypu and muskrAt in Europe (LIFE18 NAT/NL/001047), co-funded by the European Commission LIFE programme.
- Published
- 2022
73. geoChronR – an R package to model, analyze, and visualize age-uncertain data
- Author
- Julien Emile-Geay, Nicholas P. McKay, and Deborah Khider
- Subjects
- Uncertain data, Computer science, General Engineering, Transparency (human-computer interaction), Data science, Regression, Data Standard, Geology, Software, Stratigraphy, Principal component analysis, Use case, Reusability
- Abstract
Chronological uncertainty is a hallmark of the paleoenvironmental sciences and geosciences. While many tools have been made available to researchers to quantify age uncertainties suitable for various settings and assumptions, disparate tools and output formats often discourage integrative approaches. In addition, associated tasks like propagating age-model uncertainties to subsequent analyses, and visualizing the results, have received comparatively little attention in the literature and available software. Here, we describe geoChronR, an open-source R package to facilitate these tasks. geoChronR is built around an emerging data standard (Linked PaleoData, or LiPD) and offers access to four popular age-modeling techniques (Bacon, BChron, OxCal, BAM). The output of these models is used to conduct ensemble data analysis, quantifying the impact of chronological uncertainties on common analyses like correlation, regression, principal component, and spectral analyses by repeating the analysis across a large collection of plausible age models. We present five real-world use cases to illustrate how geoChronR may be used to facilitate these tasks, visualize the results in intuitive ways, and store the results for further analysis, promoting transparency and reusability.
- Published
- 2021
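The ensemble approach described in the preceding record can be illustrated compactly. The sketch below (in Python, though geoChronR itself is R) repeats a correlation across many plausible age models so that chronological uncertainty appears as a spread of results; the synthetic data and the simple jitter-based age model are assumptions, not geoChronR's algorithms.

```python
# Sketch of ensemble propagation of age uncertainty into a correlation.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_ensemble = 200, 1000

true_age = np.linspace(0, 10_000, n_samples)          # years BP
signal = np.sin(true_age / 1_500)                     # "target" series
proxy = signal + rng.normal(0, 0.3, n_samples)        # noisy proxy record

correlations = np.empty(n_ensemble)
for i in range(n_ensemble):
    # One plausible age model: jittered ages, sorted to stay monotonic.
    ages_i = np.sort(true_age + rng.normal(0, 20, n_samples))
    # Resample the proxy onto the common time axis under this age model.
    proxy_on_common = np.interp(true_age, ages_i, proxy)
    correlations[i] = np.corrcoef(proxy_on_common, signal)[0, 1]

lo, hi = np.percentile(correlations, [2.5, 97.5])
print(f"age-uncertain correlation, 95% interval: [{lo:.2f}, {hi:.2f}]")
```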
74. Design of Data Model for 3D Geospatial Information-based Highway Management using LandInfra Standard
- Author
- Shin, Sung Pil and Munkhbaatar Buuveibaatar
- Subjects
- Data Standard, Information management, Geospatial analysis, Database, Data model, Computer science, Management system, Database schema, Pavement management
- Abstract
PURPOSES: The purpose of this study is to contribute to the utilization of standards while considering the possible upgrade of a local system as a subject of the application. Therefore, this study aims to explore the possible application of LandInfra for a local road management (maintenance) system, in the context of enabling the basis of 3D geospatial road information management in Korea. METHODS: Based on a review of related literature and international standards, an analysis of the current system is performed. After reviewing the LandInfra standard, an examination of corresponding classes between each data model (HMS and LandInfra) is performed for the mapping process. After the mapping process, a LandInfra-based HMS pavement data model is proposed. RESULTS: To apply LandInfra to the HMS pavement part, an examination of each data model is performed. After this procedure, a LandInfra-based HMS pavement database schema is proposed in the context of enabling 3D geospatial road information management and maintenance, particularly for pavement management information. CONCLUSIONS: This paper presents how the LandInfra international open geospatial standard can be applied to the local road management system (HMS pavement part). As a result of this study, the LandInfra standard could be applied to the HMS; however, an encoding of the standard is required for conformance. Thus, further studies should address the encoding of the proposed data model for conformance with InfraGML encoding standards. In addition, a system prototype may be needed for complete application.
- Published
- 2021
75. Global Substance Registration System: consistent scientific descriptions for substances related to health
- Author
- Daniel Katzel, Frank Switzer, Tyler Peryea, Noel Southall, Ramez Ghazzaoui, Archana Newatia, Mitch Miller, Herman Diederik, Ðắc-Trung Nguyễn, Elaine Johanson, Jorge Neyra, Larry Callahan, Sarah Stemann, Niko Anderson, and Dammika Amugoda
- Subjects
- Prescription Drugs, Databases (Factual), Databases (Pharmaceutical), Polymers, Datasets as Topic, Registration system, Translational research, Biology, Xenobiotics, Small Molecule Libraries, World Wide Web, Nucleic Acids, Marketed products, Genetics, Humans, Biological Products, Internet, United States Food and Drug Administration, Proteins, Drugs (Investigational), Public benefit, United States, Data Standard, Identification (information), Public Health, Translational science, Databases (Chemical), Software
- Abstract
The US Food and Drug Administration (FDA) and the National Center for Advancing Translational Sciences (NCATS) have collaborated to publish rigorous scientific descriptions of substances relevant to regulated products. The FDA has adopted the global ISO 11238 data standard for the identification of substances in medicinal products and has populated a database to organize the agency's regulatory submissions and marketed products data. NCATS has worked with FDA to develop the Global Substance Registration System (GSRS) and produce a non-proprietary version of the database for public benefit. In 2019, more than half of all new drugs in clinical development were proteins, nucleic acid therapeutics, polymer products, structurally diverse natural products or cellular therapies. While multiple databases of small molecule chemical structures are available, this resource is unique in its application of regulatory standards for the identification of medicinal substances and its robust support for other substances in addition to small molecules. This public, manually curated dataset provides unique ingredient identifiers (UNIIs) and detailed descriptions for over 100 000 substances that are particularly relevant to medicine and translational research. The dataset can be accessed and queried at https://gsrs.ncats.nih.gov/app/substances.
- Published
- 2020
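The public dataset mentioned in the preceding record can be downloaded in bulk. The sketch below indexes such an export by UNII; the file name and the JSON field names ("approvalID" for the UNII, "names" for name records) are assumptions about the export layout, not a documented schema.

```python
# Sketch of indexing a downloaded GSRS export, assumed to be one JSON object
# per line, by UNII.
import json

def index_by_unii(path: str) -> dict:
    substances = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            unii = record.get("approvalID")          # assumed UNII field
            names = [n.get("name") for n in record.get("names", [])]
            if unii:
                substances[unii] = names
    return substances

# index = index_by_unii("gsrs_public_export.json")  # hypothetical filename
# print(index.get("R16CO5Y76E"))                    # UNII for aspirin
```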
76. The Neurodata Without Borders ecosystem for neurophysiological data science
- Author
- Satrajit Ghosh, Benjamin K Dichter, Ryan Ly, Andrew Tritt, Oliver Rübel, Lawrence Niu, Pamela Baker, Ivan Soltesz, Lydia Ng, Karel Svoboda, Loren Frank, and Kristofer E Bouchard
- Subjects
- data ecosystem, FAIR data, Metadata, archive, Mouse, General Immunology and Microbiology, General Neuroscience, Data Science, Neurosciences, Neurophysiology, General Medicine, General Biochemistry, Genetics and Molecular Biology, neuroscience, Networking and Information Technology R&D (NITRD), data standard, data language, Rat, Humans, human, Biochemistry and Cell Biology, Ecosystem, Software
- Abstract
The neurophysiology of cells and tissues is monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can coevolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our open-source software (Neurodata Without Borders, NWB) defines and modularizes the interdependent, yet separable, components of a data language. We demonstrate NWB's impact through unified description of neurophysiology data across diverse modalities and species. NWB exists in an ecosystem, which includes data management, analysis, visualization, and archive tools. Thus, the NWB data language enables reproduction, interchange, and reuse of diverse neurophysiology data. More broadly, the design principles of NWB are generally applicable to enhance discovery across biology through data FAIRness.

The brain is an immensely complex organ which regulates many of the behaviors that animals need to survive. To understand how the brain works, scientists monitor and record brain activity under different conditions using a variety of experimental techniques. These neurophysiological studies are often conducted on multiple types of cells in the brain as well as a variety of species, ranging from mice to flies, or even frogs and worms. Such a range of approaches provides us with highly informative, complementary ‘views’ of the brain. However, to form a complete, coherent picture of how the brain works, scientists need to be able to integrate all the data from these different experiments. For this to happen effectively, neurophysiology data need to meet certain criteria: namely, they must be findable, accessible, interoperable, and re-usable (or FAIR for short). However, the sheer diversity of neurophysiology experiments impedes the ‘FAIR’-ness of the information obtained from them. To overcome this problem, researchers need a standardized way to communicate their experiments and share their results – in other words, a ‘standard language’ to describe neurophysiology data. Rübel, Tritt, Ly, Dichter, Ghosh et al. therefore set out to create such a language that was not only FAIR, but could also co-evolve with neurophysiology research. First, they produced a computer software program (called Neurodata Without Borders, or NWB for short) which generated and defined the different components of the new standard language. Then, other tools for data management were created to expand the NWB platform using the standardized language. This included data analysis and visualization methods, as well as an ‘archive’ to store and access data. Testing the new language and associated tools showed that they indeed allowed researchers to access, analyze, and share information from many different types of experiments, in organisms ranging from flies to humans. The NWB software is open-source, meaning that anyone can obtain a copy and make changes to it. Thus, NWB and its associated resources provide the basis for a collaborative, community-based system for sharing neurophysiology data. Rübel et al. hope that NWB will inspire similar developments across other fields of biology that share similar levels of complexity with neurophysiology.
- Published
- 2022
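A minimal sketch of writing one acquisition series with pynwb, the Python reference implementation of the NWB data language described in the preceding record; the session fields, data, and sampling rate are illustrative.

```python
# Sketch: create an NWB file with one acquisition TimeSeries and save it.
from datetime import datetime, timezone
import numpy as np
from pynwb import NWBFile, NWBHDF5IO, TimeSeries

nwbfile = NWBFile(
    session_description="example recording session",
    identifier="example-session-001",
    session_start_time=datetime(2022, 1, 1, tzinfo=timezone.utc),
)

voltage = TimeSeries(
    name="membrane_potential",
    data=np.random.randn(1000),
    unit="volts",
    rate=1000.0,  # sampling rate in Hz
)
nwbfile.add_acquisition(voltage)

with NWBHDF5IO("example.nwb", "w") as io:
    io.write(nwbfile)
```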
77. Towards efficiency in rare disease research: what is distinctive and important?
- Author
- Jia, Jinmeng and Shi, Tieliu
- Abstract
Characterized by their low prevalence, rare diseases are often chronically debilitating or life threatening. Despite their low prevalence, the aggregate number of individuals suffering from a rare disease is estimated to be nearly 400 million worldwide. Over the past decades, efforts from researchers, clinicians, and pharmaceutical industries have been focused on both the diagnosis and therapy of rare diseases. However, because of the lack of data and medical records for individual rare diseases and the high cost of orphan drug development, only limited progress has been achieved. In recent years, the rapid development of next-generation sequencing (NGS)-based technologies, as well as the popularity of precision medicine has facilitated a better understanding of rare diseases and their molecular etiology. As a result, molecular subclassification can be identified within each disease more clearly, significantly improving diagnostic accuracy. However, providing appropriate care for patients with rare diseases is still an enormous challenge. In this review, we provide a brief introduction to the challenges of rare disease research and make suggestions on where and how our efforts should be focused. [ABSTRACT FROM AUTHOR]
- Published
- 2017
78. Brazilian Plant-Pollinator Interactions Network: definition of a data standard for digitization, sharing, and aggregation of plant-pollinator interaction data.
- Author
- Saraiva, Antonio M., Salim, José A., Agostini, Kayna, Wolowski, Marina, Silva, Juliana Saragiotto, Veiga, Allan K., and de Carvalho Albertini, Bruno
- Subjects
- BIODIVERSITY conservation, POLLINATION, DIGITIZATION, ECOSYSTEMS, SUSTAINABLE agriculture
- Abstract
Pollination is considered one of the most important processes for biodiversity conservation (Kremen 2005). Recently, the global community, by means of the Intergovernmental Platform on Biodiversity and Ecosystem Services (IPBES 2016) and the Convention on Biological Diversity (CBD 2002), recognized the importance of plant-pollinator interactions for ecosystem functioning and sustainable agriculture. The conservation of pollination depends on information about plant-pollinator interactions covering a great diversity of functional and taxonomic groups. Studies show that successful pollination can improve the amount and the quality of plant fecundation and fruit production (Kevan and Imperatriz-Fonseca 2002). However, the success of these actions depends on the knowledge of pollinators, their conservation, and their interactions with plants and the environment. In order to conserve and manage it, more information needs to be captured about plant-pollinator interactions. Primary data about pollinators is becoming increasingly available online and can be accessed at a number of websites and portals. Many initiatives have also been created to facilitate and to stimulate the dissemination of pollination data; examples are the Inter-American Biodiversity Information Network - Pollinators Thematic Network - IABIN-PTN (www.biocomp.org.br/iabinptn) and the WebBee (www.webbee.org.br) (Saraiva et al. 2003). One important aspect of this trend is the strong reliance on standardized data schemas and protocols (e.g., Darwin Core - DwC and the TDWG Access Protocol for Information Retrieval - TAPIR, respectively) that allow us to share and aggregate biological data, among which pollinator data are included. Although plant-pollinator interaction data are critically important to our understanding of the role, importance and effectiveness of (potential) pollinators, they cannot be adequately represented by the current standards for occurrence data (such as DwC). The ways that interaction data are recorded and stored worldwide, as well as their intended uses, are very diverse. Their lack of a common protocol and data schema, which would allow us to aggregate them in web portals and eventually use them to build decision support systems for conservation and sustainable use in agriculture, needs to be addressed. The IABIN-PTN adopted a simple solution to characterize and digitize plant-pollinator interaction data based on DwC (Cartolano Júnior et al. 2007), allowing the digitization of many Latin-American collections. Following that work, the Food and Agriculture Organization of the United Nations (FAO) produced a detailed survey of potential descriptors of plant-pollinator interactions. Although the ultimate goal of that work was to propose a data standard, the proposal did not evolve further (Cavalheiro et al. 2016). The FAO Global Pollination Project adopted in Brazil the same simplified model used by IABIN to digitize plant-pollinator interaction data (Saraiva et al. 2010). Recently many Brazilian scientists gathered around the Brazilian Plant-Pollinator Interactions Network (REBIPP - www.rebipp.org.br) with the aim of developing scientific and teaching activities in the field.
The main goals of the network are to: generate a diagnosis of plant-pollinator interactions in Brazil; integrate knowledge in pollination of natural, agricultural, urban and restored areas; identify knowledge gaps; support public policy guidelines aimed at the conservation of biodiversity and ecosystem services for pollination and food production; and encourage collaborative studies among REBIPP participants. To achieve these goals the group has resumed the previous work done under the auspices of the IABIN and FAO projects, and a data standard is being discussed. The ultimate goal is to adopt a standard and develop a database of plant-pollinator data in Brazil to be used by the national community. This proposal of a data standard (depicted in Fig. 1) can serve as a starting point for the definition of a global data standard for plant-pollinator interactions under the TDWG umbrella. [ABSTRACT FROM AUTHOR]
- Published
- 2017
79. A comprehensive open package format for preservation and distribution of geospatial data and metadata.
- Author
- Pons, X. and Masó, J.
- Subjects
- *GEOSPATIAL data, *METADATA, *GEOGRAPHIC information systems, *COMPUTER software, *COMPUTER files
- Abstract
The complexity of geospatial resources and formats makes the preservation and distribution of GIS data difficult even among experts. The proliferation of, for instance, KML, Internet map services, etc., reflects the need for sharing geodata, but a comprehensive solution for dealing with data and metadata of a certain complexity is not currently provided. Original geospatial data is usually divided into several parts to record its different aspects (spatial and thematic features, etc.), plus additional files containing metadata, symbolization specifications, tables, etc.; these parts are encoded in different formats, both standard and proprietary. To simplify data access, software providers encourage the use of an additional element that we call generically a "map project", which contains links to the other parts (local or remote). Consequently, in order to distribute the data and metadata referred to by the map in a complete way, or to apply the Open Archival Information System (OAIS) standard to preserve it for the future, we need to face the multipart problem. This paper proposes a package allowing the distribution of real (comprehensive although diverse and complex) GIS data over the Internet and for data preservation. This proposal, complemented with the right tools, hides but keeps the multipart structure, thus providing a simpler but professional user experience. Several packaging strategies are reviewed in the paper, and a solution based on the ISO 29500-2 standard is chosen. The solution also considers the adoption of the recent Open Geospatial Consortium Web Services (OGC OWS) Context document as the map part, and as a way of combining data files with geospatial services. Finally, by using adequate strategies, different GIS implementations can use several parts of the package and ignore the rest: a philosophy that has proven useful (e.g., in TIFF). [ABSTRACT FROM AUTHOR]
- Published
- 2016
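The ISO 29500-2 packaging chosen in the preceding record (the Open Packaging Conventions) is, at bottom, a ZIP container with a content-types part. The sketch below assembles a minimal, illustrative package; the part names, media types, and the .gmz extension are assumptions, not the paper's specification.

```python
# Sketch of an Open Packaging Conventions style container for multipart
# geospatial data: a ZIP holding [Content_Types].xml, the "map project"
# part, a data part, and a metadata part.
import zipfile

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="tif" ContentType="image/tiff"/>
  <Override PartName="/map/project.xml"
            ContentType="application/vnd.example.map-project+xml"/>
</Types>
"""

with zipfile.ZipFile("dataset.gmz", "w", zipfile.ZIP_DEFLATED) as pkg:
    pkg.writestr("[Content_Types].xml", CONTENT_TYPES)         # required by OPC
    pkg.writestr("map/project.xml", "<map>...</map>")          # map project part
    pkg.writestr("data/elevation.tif", b"...raster bytes...")  # data part
    pkg.writestr("metadata/iso19139.xml", "<gmd:MD_Metadata/>")
```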
80. Somatic cancer variant curation and harmonization through consensus minimum variant level data.
- Author
- Ritter, Deborah I., Roychowdhury, Sameek, Roy, Angshumoy, Rao, Shruti, Landrum, Melissa J., Sonkin, Dmitriy, Shekar, Mamatha, Davis, Caleb F., Hart, Reece K., Micheel, Christine, Weaver, Meredith, Van Allen, Eliezer M., Parsons, Donald W., McLeod, Howard L., Watson, Michael S., Plon, Sharon E., Kulkarni, Shashikant, and Madhavan, Subha
- Subjects
- *GENOMICS, *SOMATIC cells, *CANCER cells, *CANCER genetics, *GENOMES, *ONCOLOGY
- Abstract
Background: To truly achieve personalized medicine in oncology, it is critical to catalog and curate cancer sequence variants for their clinical relevance. The Somatic Working Group (WG) of the Clinical Genome Resource (ClinGen), in cooperation with ClinVar and multiple cancer variant curation stakeholders, has developed a consensus set of minimal variant level data (MVLD). MVLD is a framework of standardized data elements to curate cancer variants for clinical utility. With implementation of MVLD standards, and in a working partnership with ClinVar, we aim to streamline the somatic variant curation efforts in the community and reduce redundancy and time burden for the interpretation of cancer variants in clinical practice. Methods: We developed MVLD through a consensus approach by i) reviewing clinical actionability interpretations from institutions participating in the WG, ii) conducting an extensive literature search of clinical somatic interpretation schemas, and iii) surveying cancer variant web portals. A forthcoming guideline on cancer variant interpretation, from the Association for Molecular Pathology (AMP), can be incorporated into MVLD. Results: Along with harmonizing standardized terminology for allele interpretive and descriptive fields that are collected by many databases, the MVLD includes unique fields for cancer variants such as Biomarker Class, Therapeutic Context and Effect. In addition, MVLD includes recommendations for controlled semantics and ontologies. The Somatic WG is collaborating with ClinVar to evaluate MVLD use for somatic variant submissions. ClinVar is an open and centralized repository where sequencing laboratories can report summary-level variant data with clinical significance, and ClinVar accepts cancer variant data. Conclusions: We expect the use of the MVLD to streamline clinical interpretation of cancer variants, enhance interoperability among multiple redundant curation efforts, and increase submission of somatic variants to ClinVar, all of which will enhance translation to clinical oncology practice. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
81. Challenges when creating a cohesive digital twin ship: a data modelling perspective
- Author
-
Icaro Aragao Fonseca and Henrique M. Gaspar
- Subjects
Computer science, Perspective (graphical), Ocean Engineering, Data science, Data modeling, Data Standard, Open source, Digital asset, Internet of Things
A digital twin is a digital asset that simulates the behaviour of a physical counterpart. The digital twin ship literature shows that the concept is already being applied to specialised problems, but no clear guide exists for creating broader, interdisciplinary digital twins. Relevant dimensions of product data modelling and previous attempts at standardizing ship data elucidate the requirements for effective data modelling in a digital twin context. These requirements are placed in a broader perspective for digital twin implementation that encompasses challenges and directions for the future development of services, networks, and software. Finally, an open standard for digital twin data is proposed based on lessons extracted from this panorama, and its application to a research vessel is outlined.
- Published
- 2020
82. Leveraging the UMLS As a Data Standard for Rare Disease Data Normalization and Harmonization
- Author
-
Anne Pariser, Dac-Trung Nguyen, Qian Zhu, and Eric Sid
- Subjects
Computer science, Knowledge Bases, MEDLINE, Health Informatics, Harmonization, Database normalization, Set (abstract data type), Rare Diseases, Health Information Management, Disease Ontology, Humans, Advanced and Specialized Nursing, Unified Medical Language System, Semantics, Data Standard, Artificial intelligence, Natural language processing, Rare disease
Objective In this study, we aimed to evaluate the capability of the Unified Medical Language System (UMLS) to serve as a data standard supporting the normalization and harmonization of datasets developed for rare diseases. Through analysis of data mappings between multiple rare disease resources and the UMLS, we propose extensions of the UMLS that would enable its adoption as a global standard in rare disease. Methods We analyzed data mappings between the UMLS and existing datasets on over 7,000 rare diseases retrieved from four publicly accessible resources: the Genetic And Rare Diseases Information Center (GARD), Orphanet, Online Mendelian Inheritance in Man (OMIM), and the Monarch Disease Ontology (MONDO). Two types of disease mappings were assessed: (1) curated mappings extracted from the four resources; and (2) established mappings generated by querying the rare-disease-based integrative knowledge graph developed in our previous study. Results We found that 100% of OMIM concepts, and over 50% of concepts from GARD, MONDO, and Orphanet, were normalized by the UMLS and accurately categorized into the appropriate UMLS semantic groups. We analyzed 58,636 UMLS mappings, which resolved to 3,876 UMLS concepts across these resources. Manual evaluation of a random set of 500 UMLS mappings demonstrated a high level of accuracy (99%); these consisted of 414 synonym mappings (82.8%), 76 subtype mappings (15.2%), and five sibling mappings (1%). Conclusion The mapping results illustrate that the UMLS can accurately represent rare disease concepts and their associated information, such as genes and phenotypes, and can effectively support data harmonization across the existing resources that collect rare disease data. We recommend the adoption of the UMLS as a data standard for rare disease to enable existing rare disease datasets to support future applications in clinical and community settings.
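A toy harmonization pass in the spirit of the study, assuming a local table of source-ID-to-CUI mappings: group source records that normalize to the same concept and tally the mapping relations (synonym, subtype, sibling). The IDs and CUIs below are placeholders, not verified UMLS content.

```python
from collections import Counter, defaultdict

# Toy mapping table: source rare-disease IDs to UMLS CUIs.
# All identifiers and relations below are illustrative placeholders.
mappings = [
    {"source": "OMIM:100000",   "cui": "C0000001", "relation": "synonym"},
    {"source": "MONDO:0000001", "cui": "C0000001", "relation": "synonym"},
    {"source": "GARD:0000001",  "cui": "C0000002", "relation": "subtype"},
]

def harmonize(mappings):
    """Group source records that map to the same CUI and
    tally the kinds of mapping relations observed."""
    by_cui = defaultdict(list)
    for m in mappings:
        by_cui[m["cui"]].append(m["source"])
    return dict(by_cui), Counter(m["relation"] for m in mappings)

groups, relations = harmonize(mappings)
print(groups)     # {'C0000001': ['OMIM:100000', 'MONDO:0000001'], ...}
print(relations)  # Counter({'synonym': 2, 'subtype': 1})
```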
- Published
- 2020
83. Standards for Passive UHF RFID
- Author
-
Jian Zhang, Shiwen Mao, Senthilkumar C. G. Periaswamy, and Justin Patton
- Subjects
Computer science, Interface (computing), Electronic Product Code, Data Standard, Data sharing, Identification (information), Ultra high frequency, Embedded system, General Earth and Planetary Sciences, Radio-frequency identification, Protocol (object-oriented programming), General Environmental Science
Passive ultra-high frequency (UHF) radio frequency identification (RFID) technology has been widely adopted by retail and other industries for serialized item-level identification and data sharing. This article introduces the standards that support and define the procedures used in various passive UHF RFID applications. The Electronic Product Code (EPC) Radio-Frequency Identity Protocols Generation-2 UHF RFID Standard, or C1G2, is the foundational standard that defines the format, encoding, and procedures of the air interface of RFID systems. The Low Level Reader Protocol provides a standard, portable interface between applications and RFID readers from different vendors. The format and encoding of the EPC information are defined by the EPC Tag Data Standard, which enables each tag to be uniquely identified (e.g., item-level identification of goods in a warehouse). Additional protocols, such as Discovery, Configuration and Initialization, and Reader Management, specify further processes (e.g., the management of many readers) that can be deployed in various business applications. We provide a general introduction to passive UHF RFID technology standards and review each protocol's features and procedures.
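To make the EPC Tag Data Standard concrete, the sketch below decodes an SGTIN-96 EPC from its published 96-bit layout (8-bit header, 3-bit filter, 3-bit partition, company prefix, item reference, 38-bit serial). It is written from the TDS bit layout rather than any vendor SDK; the example EPC is the worked example commonly used in TDS documentation.

```python
# SGTIN-96 partition table from the EPC Tag Data Standard:
# partition -> (company-prefix bits, digits, item-reference bits, digits)
PARTITIONS = {
    0: (40, 12, 4, 1), 1: (37, 11, 7, 2), 2: (34, 10, 10, 3),
    3: (30, 9, 14, 4), 4: (27, 8, 17, 5), 5: (24, 7, 20, 6),
    6: (20, 6, 24, 7),
}

def decode_sgtin96(epc_hex: str) -> dict:
    """Split a 96-bit SGTIN EPC into its TDS-defined fields."""
    bits = int(epc_hex, 16)
    if bits >> 88 != 0x30:                    # 8-bit SGTIN-96 header
        raise ValueError("not an SGTIN-96 EPC")
    partition = (bits >> 82) & 0b111
    cp_bits, cp_digits, ir_bits, ir_digits = PARTITIONS[partition]
    company = (bits >> (38 + ir_bits)) & ((1 << cp_bits) - 1)
    item_ref = (bits >> 38) & ((1 << ir_bits) - 1)
    return {
        "filter": (bits >> 85) & 0b111,
        "company_prefix": str(company).zfill(cp_digits),
        "item_reference": str(item_ref).zfill(ir_digits),
        "serial": bits & ((1 << 38) - 1),     # 38-bit serial number
    }

print(decode_sgtin96("3074257BF7194E4000001A85"))
# {'filter': 3, 'company_prefix': '0614141',
#  'item_reference': '812345', 'serial': 6789}
```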
- Published
- 2020
84. Data format standards in analytical chemistry
- Author
-
David Rauh, Claudia Blankenburg, Tillmann G. Fischer, Nicole Jung, Stefan Kuhn, Ulrich Schatzschneider, Tobias Schulze, and Steffen Neumann
- Subjects
Life sciences, biology, file format, cheminformatics: data and standards, General Chemical Engineering, ddc:570, data standard, cheminformatics, General Chemistry, Analytical chemistry, NMR, mass spectrometry
Research data is an essential part of research and of almost every publication in chemistry. The data itself can be valuable for reuse if sustainably deposited, annotated and archived. It is therefore important to publish data following the FAIR principles, making it findable, accessible, interoperable and reusable, not only for humans but also in machine-readable form. This improves the transparency and reproducibility of research findings and fosters analytical work with scientific data to generate new insights, which are only accessible with manifold and diverse datasets. Interoperable data requires complete and informative metadata and the use of open data formats. Generic data formats like AnIML and JCAMP-DX have been used for many applications. Dedicated formats for some analytical methods are already accepted, like mzML for mass spectrometry or nmrML and NMReDATA for NMR spectroscopy data. Other methods still lack common data standards. Only a joint effort of chemists, instrument and software vendors, publishers and infrastructure maintainers can ensure that analytical data remain valuable in the future. In this review, we describe existing data formats in analytical chemistry and introduce guidelines for the development and use of standardized, open data formats.
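As an example of how lightweight some of these text-based formats are, here is a sketch that collects the ##KEY=value labelled-data records from the header of a JCAMP-DX file. It handles only the simple single-block case and ignores compound files and the numeric data tables.

```python
import re

def read_jcampdx_header(path: str) -> dict:
    """Collect ##KEY=value labelled-data records from a JCAMP-DX file.
    A sketch for the simple single-block case: compound files and the
    numeric data table that follows ##XYDATA are ignored."""
    records = {}
    with open(path, encoding="ascii", errors="replace") as fh:
        for line in fh:
            m = re.match(r"##([^=]*)=\s*(.*)", line.strip())
            if m:
                records[m.group(1).strip().upper()] = m.group(2)
            if line.upper().startswith("##XYDATA"):
                break                       # stop before the data table
    return records

hdr = read_jcampdx_header("spectrum.jdx")
print(hdr.get("TITLE"), hdr.get("DATA TYPE"), hdr.get("JCAMP-DX"))
```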
- Published
- 2022
85. Calculating the Digitization Level of Specimens with the Minimum Information about a Digital Specimen (MIDS) Standard
- Author
-
Mathias Dillen, Pieter Huybrechts, Quentin Groom, and Lynn Delgat
- Subjects
JSON schema ,data standard ,interoperability ,General Medicine ,Rshiny ,FAIR - Abstract
Natural history specimens constitute physical evidence of past observations of nature. They hold further value as the backbone of taxonomy and as historical samples that can be subjected to further analysis. Yet, as physical objects scattered across collections around the world, their scientific use is limited by an overall lack of FAIRness: they are not easily Findable, Accessible, Interoperable or Reusable. Digitization of these specimens, through imaging and the capture of categorical metadata, can improve this FAIRness, and has been under way to some extent for decades; only recently, however, have technical developments in imaging and information technology made it possible for the fruits of these digitization efforts to be widely distributed and utilized. Digitization can be done in many different ways, and while protocols may be well formulated during a project or within the responsible digitization team, they are often not communicated to users, get lost over time, and are not available to analysts who want to assess the state of digitization or request material for which further information may be available. Hence, while digitization is ongoing, it is difficult to estimate how much has been digitized, and to what extent, at the collection level or on a larger scale. The Minimum Information about a Digital Specimen (MIDS) standard, currently under development by a Biodiversity Information Standards (TDWG) Working Group, aims to address this problem by defining hierarchical levels of digitization, each associated with a set of criteria that an individual specimen must meet to achieve that level. MIDS has been in development since work in the ICEDIG project in 2019, and its earlier drafts have been used in surveys that attempt to determine digitization status, often through coarse estimates based on the experience of curators. As a result, these scores cannot be considered reproducible or particularly reliable. Ideally, MIDS scores are calculated automatically from a mapping between the data model of the source and the MIDS criteria. These mappings should also take into account any data value that is known not to reflect digitization status. While in an ideal world there would be only one accepted mapping for any data model, differing practices that cause interoperability conflicts, as well as different kinds of specimens, will likely continue to require slight modifications. To make this concrete, we constructed several JSON schemas that specify such mappings, based on the current specification of the MIDS standard and the state of biodiversity data in a few sources, including Darwin Core archives for occurrence data as produced by the Global Biodiversity Information Facility (GBIF). These schemas could be incorporated into existing data publication workflows to automatically calculate MIDS levels. We have also developed an R Shiny app with a user interface for making calculations and simple adjustments to the schemas. We welcome anyone interested in further developing the syntax and philosophy behind the schemas and their integration into other systems.
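A sketch of the level-calculation idea, using hypothetical MIDS-style criteria expressed as required Darwin Core terms; the actual MIDS specification defines its own elements, levels, and mappings.

```python
# Hypothetical MIDS-style criteria expressed as required Darwin Core
# terms per level; the real specification defines its own elements.
MIDS_CRITERIA = {
    1: ["catalogNumber", "scientificName"],
    2: ["catalogNumber", "scientificName", "country", "eventDate"],
    3: ["catalogNumber", "scientificName", "country", "eventDate",
        "decimalLatitude", "decimalLongitude", "associatedMedia"],
}

def mids_level(record: dict) -> int:
    """Return the highest level whose required terms are all populated;
    0 means even the level-1 criteria are not met."""
    level = 0
    for lvl, terms in sorted(MIDS_CRITERIA.items()):
        if all(record.get(t) not in (None, "") for t in terms):
            level = lvl
        else:
            break          # levels are cumulative in this sketch
    return level

occurrence = {"catalogNumber": "BR0000123", "scientificName": "Quercus robur",
              "country": "Belgium", "eventDate": "1902-06-11"}
print(mids_level(occurrence))  # 2
```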
- Published
- 2022
86. PDBx/mmCIF Ecosystem: Foundational Semantic Tools for Structural Biology
- Author
-
John D. Westbrook, Jasmine Y. Young, Chenghua Shao, Zukang Feng, Vladimir Guranovic, Catherine L. Lawson, Brinda Vallat, Paul D. Adams, John M Berrisford, Gerard Bricogne, Kay Diederichs, Robbie P. Joosten, Peter Keller, Nigel W. Moriarty, Oleg V. Sobolev, Sameer Velankar, Clemens Vonrhein, David G. Waterman, Genji Kurisu, Helen M. Berman, Stephen K. Burley, and Ezra Peisach
- Subjects
Biochemistry & Molecular Biology ,Macromolecular Substances ,Protein Conformation ,Bioengineering ,Microbiology ,Databases ,Medicinal and Biomolecular Chemistry ,Structural Biology ,Underpinning research ,ddc:570 ,protein data bank ,Databases, Protein ,Molecular Biology ,Crystallography ,Protein ,Computational Biology ,macromolecular structure ,1.5 Resources and infrastructure (underpinning) ,Semantics ,Networking and Information Technology R&D (NITRD) ,data standard ,data management ,Generic health relevance ,Biochemistry and Cell Biology ,biological data ,Software - Abstract
PDBx/mmCIF, the Protein Data Bank Exchange (PDBx) macromolecular Crystallographic Information Framework (mmCIF), has become the data standard for structural biology. With its early roots in the domain of small-molecule crystallography, PDBx/mmCIF provides an extensible data representation that is used for deposition, archiving, remediation, and public dissemination of experimentally determined three-dimensional (3D) structures of biological macromolecules by the Worldwide Protein Data Bank (wwPDB, wwpdb.org). Extensions of PDBx/mmCIF are similarly used for computed structure models by ModelArchive (modelarchive.org), for integrative/hybrid structures by PDB-Dev (pdb-dev.wwpdb.org), for small-angle scattering data by the Small Angle Scattering Biological Data Bank (SASBDB, sasbdb.org), and for models generated with the AlphaFold 2.0 deep learning software suite (alphafold.ebi.ac.uk). Community-driven development of PDBx/mmCIF spans three decades, involving contributions from researchers, software and methods developers in the structural sciences, data repository providers, scientific publishers, and professional societies. Having a semantically rich and extensible data framework for representing a wide range of structural biology experimental and computational results, combined with expertly curated 3D biostructure data sets in public repositories, accelerates the pace of scientific discovery. Herein, we describe the architecture of the PDBx/mmCIF data standard; the tools used to maintain representations of the standard; governance and the processes by which data content standards are extended; and the community tools and software libraries available for processing and checking the integrity of PDBx/mmCIF data. Use cases exemplify how members of the Worldwide Protein Data Bank have used PDBx/mmCIF as the foundation of a pipeline that delivers Findable, Accessible, Interoperable, and Reusable (FAIR) data to many millions of users worldwide.
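For a feel of the format, mmCIF files are line-oriented records of the form "_category.item value" (plus loop_ tables). The toy reader below collects the simple key-value items only; production code should use one of the community libraries mentioned above rather than a parser like this.

```python
def read_mmcif_items(path: str) -> dict:
    """Collect simple '_category.item value' pairs from an mmCIF file.
    A toy reader: loop_ tables, multi-line values and the full quoting
    rules are out of scope; use a community library in practice."""
    items = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("_"):
                parts = line.split(None, 1)
                if len(parts) == 2:         # loop_ headers have no value
                    items[parts[0]] = parts[1].strip("'\"")
    return items

cif = read_mmcif_items("4hhb.cif")          # e.g. an archived PDB entry
print(cif.get("_struct.title"), cif.get("_cell.length_a"))
```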
- Published
- 2021
87. European DDI Conference 2021: Training Fair. Track 2B: DDI Lifecycle tutorial
- Author
-
Thomas, Wendy, Mills, Hayley, Kulla, Kaia, and Beuster, Benjamin
- Subjects
DDI ,metadata ,data standard ,data lifecycle ,research data management - Abstract
The DDI Training Group organized a series of tutorials around the 2021 European DDI Users Conference. This included a general overview of the DDI standards on Friday 26 November, followed by a series of more detailed tutorials and a session describing the software tools and services available for implementing DDI, all on Monday 29 November 2021. A short description of each session is available here, along with recordings and presentations. DDI Lifecycle addresses the needs of metadata management not only for archives and data disseminators, but also throughout the data lifecycle, from data collection and survey design through the end stages. It is also a useful tool for understanding how related data sets fit together, whether as a longitudinal or repeat cross-sectional effort, or through ad-hoc comparison and harmonization. A range of topics will cover a broad description of the specification, as well as providing focus on specific aspects of the standard for metadata and data management. Presenters include Wendy Thomas, Hayley Mills, Kaia Kulla, and Benjamin Beuster.
- Published
- 2021
- Full Text
- View/download PDF
88. European DDI Conference 2021: Training Fair. Track 2A: DDI Codebook tutorial
- Author
-
Fry, Jane and Dusa, Adrian
- Subjects
qualitative data ,DDI Codebook ,DDI ,metadata ,data standard ,research data - Abstract
The DDI Training Group organized a series of tutorials around the 2021 European DDI Users Conference. This included a general overview of the DDI standards on Friday 26 November, followed by a series of more detailed tutorials and a session describing the software tools and services available for implementing DDI, all on Monday 29 November. A short description of each session is available here, along with recordings and presentations. DDI Codebook is a metadata specification for providing detailed, machine-actionable and human-readable metadata about individual studies and their data files in an XML format. Instructors are Jane Fry of Carleton University and Adrian Dusa of the University of Bucharest, who will introduce you to this popular metadata standard. Please note that the date on the title slide is incorrect: the presentation actually took place on Mon 29 November 2021.
- Published
- 2021
- Full Text
- View/download PDF
89. A Community-Developed Extension to Darwin Core for Reporting the Chronometric Age of Specimens
- Author
-
Kitty F. Emery, Michelle J. LeFebvre, Robert P. Guralnick, Edward Byrd Davis, Neill J. Wallis, John Wieczorek, and Laura Brenskelle
- Subjects
Vocabulary, Multidisciplinary, Process (engineering), Computer science, Data Collection, Level of detail (writing), Flexibility (personality), Biodiversity, Biodiversity informatics, Data science, Data Standard, Blueprint, Darwin Core
Darwin Core, the data standard used for sharing modern biodiversity and paleodiversity occurrence records, has previously lacked proper mechanisms for reporting what is known about the estimated age range of specimens from deep time. This has led data providers to put these data in fields where they cannot easily be found by users, which impedes the reuse and improvement of these data by other researchers. Here we describe the development of the Chronometric Age Extension to Darwin Core, a ratified, community-developed extension that enables the reporting of the ages of specimens from deep time and the evidence supporting these estimates. The extension standardizes reporting about the methods or assays used to determine an age and other critical information such as uncertainty. It gives data providers flexibility in the level of detail reported, focusing on the minimum information needed for reuse while still allowing for significant detail if providers have it. Providing a standardized format for reporting these data will make them easier to find and search, enable researchers to pinpoint specimens of interest for data improvement, and allow more data to be accumulated for broad temporal studies. The Chronometric Age Extension was also the first community-managed vocabulary to undergo the new Biodiversity Information Standards (TDWG) review and ratification process, thus providing a blueprint for future Darwin Core extension development.
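As an illustration, a Darwin Core Archive could carry chronometric ages in an extension file along the lines sketched below. The term spellings are reproduced from memory and should be verified against the ratified vocabulary at rs.tdwg.org before use.

```python
import csv

# One illustrative extension record. Term spellings follow the
# ChronometricAge vocabulary as remembered; verify at rs.tdwg.org.
row = {
    "chronometricAgeProtocol": "AMS radiocarbon dating of charcoal",
    "earliestChronometricAge": 1260,
    "earliestChronometricAgeReferenceSystem": "BP",
    "latestChronometricAge": 1180,
    "latestChronometricAgeReferenceSystem": "BP",
    "chronometricAgeUncertaintyInYears": 40,
    "materialDated": "charcoal",
}

with open("chronometricage.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=list(row))
    writer.writeheader()                    # header row of term names
    writer.writerow(row)
```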
- Published
- 2021
90. Lessons Learned from the Development and Demonstration of a PPE Inventory Monitoring System for US Hospitals
- Author
-
Alexa Furek, Megan Casey, Emily J. Haas, Spencer Crosswy, Susan M. Moore, Tommy Ragsdale, and Kelly Aldrich
- Subjects
Face shield, Health (social science), Computer science, Health, Toxicology and Mutagenesis, Management, Monitoring, Policy and Law, Article, Health care, Humans, Respirator, Personal protective equipment, Pandemics, Personal Protective Equipment, Emergency management, Public Health, Environmental and Occupational Health, Masks, COVID-19, Monitoring system, Hospitals, Data Standard, Software deployment, Emergency Medicine, Medical emergency, Safety Research
An international system should be established to support personal protective equipment (PPE) inventory monitoring, particularly within the healthcare industry. In this article, the authors discuss the development and 15-week deployment of a proof-of-concept prototype that included the use of a Healthcare Trust Data Platform to secure and transmit PPE-related data. Seventy-eight hospitals participated, including 66 large hospital systems, 11 medium-sized hospital systems, and a single hospital. Hospitals reported near-daily inventory information for N95 respirators, surgical masks, and face shields, ultimately providing 159 different PPE model numbers. Researchers cross-checked the data to ensure the PPE could be accurately identified; in cases where a model number was inaccurately reported, researchers corrected it whenever possible. Of the PPE model numbers reported, 74.2% were verified: 60.5% of N95 respirators, 40.0% of face shields, and 84.0% of surgical masks. The authors discuss the need to standardize how PPE is reported, possible aspects of a PPE data standard, and standards groups that may assist with this effort. Such PPE data standards would enable better communication across hospital systems and assist in emergency preparedness efforts during pandemics or natural disasters.
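A toy version of the cross-checking step described above: reported PPE model numbers are verified against a reference catalog, with trivial formatting corrected where possible. Catalog entries and model numbers are hypothetical.

```python
# Reference catalog of known model numbers (hypothetical values).
CATALOG = {"N95-8210", "N95-1860", "FS-0012", "SM-4200"}

def verify_models(reported):
    """Split reported model numbers into verified, corrected and unknown,
    fixing only trivial formatting (case and separators)."""
    verified, corrected, unknown = [], [], []
    for model in reported:
        cleaned = model.strip().upper().replace(" ", "-")
        if model in CATALOG:
            verified.append(model)
        elif cleaned in CATALOG:
            corrected.append((model, cleaned))
        else:
            unknown.append(model)
    return verified, corrected, unknown

v, c, u = verify_models(["N95-8210", "n95 1860", "FS0012"])
print(len(v), c, u)   # 1 [('n95 1860', 'N95-1860')] ['FS0012']
```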
- Published
- 2021
91. Towards data standards for enterprise and farm-level analysis
- Author
-
Hansen, J. W., Thornton, P. K., Jones, J. W., Jacobson, B. M., Penning de Vries, F. W. T., editor, Teng, P. S., editor, Kropff, M. J., editor, ten Berge, H. F. M., editor, Dent, J. B., editor, Lansigan, F. P., editor, and van Laar, H. H., editor
- Published
- 1997
- Full Text
- View/download PDF
92. CitSci.org & PPSR Core: Sharing biodiversity observations across platforms
- Author
-
Gregory Newman and Brandon Budnicki
- Subjects
crowd-sourced science, crowd science, Computer science, civic science, Biodiversity, metadata model, General Medicine, Metadata modeling, Data science, community science, Data quality, citizen science, data standard
CitSci.org is a global citizen science software platform and support organization housed at Colorado State University. The mission of CitSci is to help people do high-quality citizen science by amplifying impacts and outcomes. The platform hosts over one thousand projects and a diverse volunteer base that has amassed over one million observations of the natural world, focused on biodiversity and ecosystem sustainability. It is a custom platform built from open source components including PostgreSQL, Symfony, and Vue.js, with React Native for the mobile apps. CitSci sets itself apart from other citizen science platforms through the flexibility in the types of projects it supports rather than having a singular focus; this flexibility allows projects to define their own datasheets and methodologies. The diversity of programs we host motivated us to take a founding role in the design of PPSR Core, a set of global, transdisciplinary data and metadata standards for use in Public Participation in Scientific Research (citizen science) projects. Through an international partnership between the Citizen Science Association, the European Citizen Science Association, and the Australian Citizen Science Association, the PPSR team and the associated standards enable interoperability of citizen science projects, datasets, and observations. Here we share our experience over the past 10+ years of supporting biodiversity research, both as developers of the CitSci.org platform and as stewards of, and contributors to, the PPSR Core standard. Specifically, we share details about: the origin, development, and informatics infrastructure of CitSci; our support for biodiversity projects such as population and community surveys; our experiences in platform interoperability through PPSR Core, working with the Zooniverse, SciStarter, and CyberTracker; data quality; and data sharing goals and use cases. We conclude by sharing overall successes, limitations, and recommendations as they pertain to trust and rigor in citizen science data sharing and interoperability. As the scientific community moves forward, we show that citizen science is a key tool for enabling a systems-based approach to ecosystem problems.
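As a sketch of the interoperability idea, an observation recorded on one platform could be exported to a PPSR-Core-style interchange record along these lines; the field names here are hypothetical placeholders, since the actual PPSR Core models define their own elements.

```python
def to_ppsr_observation(obs: dict, project_id: str) -> dict:
    """Export a platform observation to a PPSR-Core-style record.
    All field names here are hypothetical placeholders."""
    return {
        "projectId": project_id,
        "observedOn": obs["timestamp"],
        "decimalLatitude": obs["lat"],
        "decimalLongitude": obs["lon"],
        "scientificName": obs.get("taxon"),
        "measurements": obs.get("answers", {}),   # datasheet responses
    }

record = to_ppsr_observation(
    {"timestamp": "2021-06-01T10:00:00Z", "lat": 40.57, "lon": -105.08,
     "taxon": "Bombus huntii", "answers": {"count": 3}},
    project_id="pollinator-watch",
)
print(record["scientificName"])   # Bombus huntii
```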
- Published
- 2021
93. A Data Standard for Dynamic Collection Descriptions
- Author
-
Kate Webbink, Maarten Trekels, Sarah Vincent, Matt Woodburn, Sharon Grant, Janeen Jones, Quentin Groom, and Gabriele Droege
- Subjects
Computer science, Environmental resource management, Biodiversity, collection descriptions, General Medicine, TDWG, Data Standard, Geodiversity, DiSSCo, natural sciences, data standards
The utopian vision is of a future where a digital representation of each object in our collections is accessible through the internet and sustainably linked to other digital resources. This is a long-term goal, however, and in the meantime there is an urgent need to share data about our collections at a higher level with a range of stakeholders (Woodburn et al. 2020). To achieve this sustainably, and to aggregate this information across all natural science collections, the data need to be standardised (Johnston and Robinson 2002). To this end, the Biodiversity Information Standards (TDWG) Collection Descriptions (CD) Interest Group has developed a data standard for describing collections, which is approaching formal review for ratification as a new TDWG standard. It proposes 20 classes (Suppl. material 1) and over 100 properties that can be used to describe, categorise, quantify, link and track digital representations of natural science collections, from high-level approximations to detailed breakdowns, depending on the purpose of a particular implementation. The wide range of use cases identified for representing collection description data means that a flexible approach to the standard and the underlying modelling concepts is essential. These are centered around the 'ObjectGroup' (Fig. 1), a class that may represent any group (of any size) of physical collection objects which have one or more common characteristics. This generic definition of the 'collection' in 'collection descriptions' is an important factor in making the standard flexible enough to support the breadth of use cases. For any use case or implementation, only a subset of classes and properties within the standard is likely to be relevant, and in some cases this subset may have little overlap with those selected for other use cases. This additional need for flexibility means that very few classes and properties, representing the core concepts, are proposed to be mandatory. Metrics, facts and narratives are represented in a normalised structure using an extended MeasurementOrFact class, so that these can be user-defined rather than constrained to a set identified by the standard. Finally, rather than a rigid underlying data model being part of the normative standard, documentation will be developed to provide guidance on how the classes in the standard may be related and quantified according to relational, dimensional and graph-like models. In summary, the standard has, by design, been made flexible enough to be used in a number of different ways. The corresponding risk is that it could be used in ways that do not deliver what is needed in terms of outputs, manageability and interoperability with other resources of collection-level or object-level data. To mitigate this, it is key for any new implementer of the standard to establish how it should be used in that particular instance, and to define any necessary constraints within the wider scope of the standard and model. This is the concept of the 'collection description scheme': a profile that defines elements such as which classes and properties should be included, which should be mandatory, and which should be repeatable; which controlled vocabularies and hierarchies should be used to make the data interoperable; how the collections should be broken down into individual ObjectGroups and interlinked; and how the various classes should be related to each other. Various factors might influence these decisions, including the types of information that are relevant to the use case, whether quantitative metrics need to be captured and aggregated across collection descriptions, and how many resources can be dedicated to amassing and maintaining the data. This process has particular relevance to the Distributed System of Scientific Collections (DiSSCo) consortium, the design of which incorporates use cases for storing, interlinking and reporting on the collections of its member institutions. These include helping users of the European Loans and Visits System (ELViS) (Islam 2020) to discover specimens for physical and digital loans by providing descriptions and breakdowns of the collections of holding institutions, and monitoring digitisation progress across European collections through a dynamic Collections Digitisation Dashboard. In addition, DiSSCo will be part of a global collections data ecosystem requiring interoperation with other infrastructures such as the GBIF (Global Biodiversity Information Facility) Registry of Scientific Collections, the CETAF (Consortium of European Taxonomic Facilities) Registry of Collections, and Index Herbariorum. In this presentation, we will introduce the draft standard and discuss the process of defining new collection description schemes using the standard and data model, focusing on DiSSCo requirements as examples of real-world collection description use cases.
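To illustrate the modelling concepts (not the normative standard), the sketch below represents an ObjectGroup carrying user-defined MeasurementOrFact metrics; the property names are illustrative choices for one possible collection description scheme.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MeasurementOrFact:
    """A user-defined metric or fact attached to an ObjectGroup."""
    measurement_type: str
    measurement_value: str

@dataclass
class ObjectGroup:
    """A group of physical collection objects sharing one or more
    characteristics; property names here are illustrative, not the
    normative terms of the standard."""
    name: str
    discipline: str
    preservation_method: str
    measurements: List[MeasurementOrFact] = field(default_factory=list)

herbarium = ObjectGroup(
    name="European vascular plants",
    discipline="Botany",
    preservation_method="pressed and mounted",
    measurements=[MeasurementOrFact("objectCount", "120000"),
                  MeasurementOrFact("percentDigitised", "35")],
)
```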
- Published
- 2021
94. Towards efficient use of data, models and tools in food microbiology
- Author
-
Matthias Filter, Maarten Nauta, Sara M. Pires, Laurent Guillier, and Tasja Buschhardt
- Subjects
Data standard ,Information exchange ,Food safety knowledge exchange format ,Data interoperability ,Applied Microbiology and Biotechnology ,Modelling ,Food Science - Abstract
Food microbiology researchers, risk assessment agencies and food business operators rely heavily on the reuse of knowledge that is available as data, models and tools. Unfortunately, such knowledge reuse remains challenging, as food safety datasets, models and tools are usually only available in platform-dependent or software-dependent formats that rarely comply with the Findability, Accessibility, Interoperability, and Reusability (FAIR) data principles. In recent years, the Risk Assessment Modelling and Knowledge Integration Platform (RAKIP) Initiative developed the Food Safety Knowledge Exchange (FSKX) format. This development was accompanied by the creation of open-source software that facilitates the adoption of FSKX. Future work within RAKIP will focus on creating semantic interoperability in FSKX-related solutions and on extending the FSKX format towards other food microbiology knowledge.
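Assuming an FSKX archive is, as is typical of such exchange formats, a ZIP container holding a JSON metadata part and model scripts, a consumer might inspect it as sketched below. The entry names ("metaData.json", "generalInformation") are assumptions rather than checked details of the specification.

```python
import json
import zipfile

def inspect_fskx(path: str) -> dict:
    """List the parts of an .fskx archive and read its JSON metadata.
    Entry names ('metaData.json') are assumptions, not spec citations."""
    with zipfile.ZipFile(path) as fskx:
        print(fskx.namelist())              # e.g. model script, metadata
        with fskx.open("metaData.json") as meta:
            return json.load(meta)

meta = inspect_fskx("predictive_model.fskx")
print(meta.get("generalInformation", {}).get("name"))
```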
- Published
- 2022
95. The HD(CP)2 Data Archive for Atmospheric Measurement Data.
- Author
-
Stamnas, Erasmia, Lammert, Andrea, Winkelmann, Volker, and Lang, Ulrich
- Subjects
- *
DATA analysis , *READABILITY formulas - Abstract
The archiving of scientific data is a demanding task in nearly all research projects. In this paper, we introduce a new online archive of atmospheric measurement data from the "High definition clouds and precipitation for advancing climate prediction" (HD(CP)2) research initiative. The project data archive is quality managed, easy to use, and is now open to other atmospheric research data. The archive's creation was already taken into account during the HD(CP)2 project planning phase, and the necessary resources were granted. This funding enabled the HD(CP)2 project to build a sound archive structure, which guarantees that the collected data are accessible to all researchers in the project and beyond. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
96. A reporting format for leaf-level gas exchange data and metadata
- Author
-
Ely, Kim S., Rogers, Alistair, Agarwal, Deborah A., Ainsworth, Elizabeth A., Albert, Loren P., Ali, Ashehad, Anderson, Jeremiah, Aspinwall, Michael J., Bellasio, Chandra, Bernacchi, Carl, Bonnage, Steve, Buckley, Thomas N., Bunce, James, Burnett, Angela C., Busch, Florian A., Cavanagh, Amanda, Cernusak, Lucas A., Crystal-Ornelas, Robert, Damerow, Joan, Davidson, Kenneth J., De Kauwe, Martin G., Dietze, Michael C., Domingues, Tomas F., Dusenge, Mirindi Eric, Ellsworth, David S., Evans, John R., Gauthier, Paul P.G., Gimenez, Bruno O., Gordon, Elizabeth P., Gough, Christopher M., Halbritter, Aud H., Hanson, David T., Heskel, Mary, Hogan, J. Aaron, Hupp, Jason R., Jardine, Kolby, Kattge, Jens, Keenan, Trevor, Kromdijk, Johannes, and Kumarathunge, Dushan P.
- Abstract
Leaf-level gas exchange data support the mechanistic understanding of plant fluxes of carbon and water. These fluxes inform our understanding of ecosystem function, provide an important constraint on the parameterization of terrestrial biosphere models, are necessary to understand the response of plants to global environmental change, and are integral to efforts to improve crop production. Collection of these data using gas analyzers can be both technically challenging and time consuming, and individual studies generally focus on a small range of species, restricted time periods, or limited geographic regions. The high value of these data is exemplified by the many publications that reuse and synthesize gas exchange data; however, the lack of metadata and data reporting conventions makes full and efficient use of these data difficult. Here we propose a reporting format for leaf-level gas exchange data and metadata to provide guidance to data contributors on how to store data in repositories so as to maximize their discoverability, facilitate their efficient reuse, and add value to individual datasets. For data users, the reporting format will better allow data repositories to optimize data search and extraction, and to more readily integrate similar data into harmonized synthesis products. The reporting format specifies data table variable naming and unit conventions, as well as metadata characterizing experimental conditions and protocols. For the common data types that were the focus of this initial version of the reporting format, i.e., survey measurements, dark respiration, carbon dioxide and light response curves, and parameters derived from those measurements, we took the further step of defining required additional data and metadata that would maximize the potential reuse of those data types. To aid data contributors and the development of data ingest tools by data repositories, we provided a translation table comparing the outputs of common gas exchange instruments. Extensive consult
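A sketch of the harmonization step such a reporting format enables: renaming instrument output columns to a single naming convention before deposit. The target names below are plausible stand-ins, not the format's published variable list.

```python
import pandas as pd

# Map instrument output columns to one target naming convention.
# The LI-6400-style source names are typical; the target names are
# plausible stand-ins for the format's published variable list.
RENAME = {
    "Photo": "A",       # net CO2 assimilation, umol m-2 s-1
    "Cond":  "gsw",     # stomatal conductance to water, mol m-2 s-1
    "Ci":    "Ci",      # intercellular CO2, umol mol-1
    "Tleaf": "Tleaf",   # leaf temperature, degrees C
}

df = pd.read_csv("licor_survey.csv").rename(columns=RENAME)
df["instrument"] = "LI-6400XT"              # protocol metadata column
df.to_csv("survey_standardized.csv", index=False)
```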
- Published
- 2021
97. Research on Suggestions of Improving Chinese Open Government Data in Innovation of Public Governance
- Author
-
Hongqin Li and Jun Zhai
- Subjects
Open government, Corporate governance, Accounting, Data sharing, Data Standard, Data quality, Business, China
This paper collects a large number of cases and makes a comparative analysis of typical applications of Chinese and American open government data in public governance. Through this comparison, the paper identifies the gap between China's open government data and that of the United States, and then analyzes the reasons for it. On this basis, drawing on advanced experience, the paper puts forward suggestions for using open government data to innovate public governance, including data catalogue compilation, data standard formulation, data quality assessment, and cooperation on open government data sharing, in order to raise the level of public governance innovation supported by Chinese open government data.
- Published
- 2021
98. Response to Request for Information (RFI): Use of Common Data Elements (CDEs) in NIH-funded research: NOT-LM-21-005
- Author
-
Haendel, Melissa, Eddy, James, Walden, Anita, and Volchenboum, Sam
- Subjects
NIH ,Data standard ,Common Data Elements ,Data harmonization ,Clinical research - Abstract
NIH requested public comment on the use of CDEs, particularly in the context of COVID-19 research, including opportunities for advancing research with CDEs, challenges to adopting CDEs, and guidance or tools that could facilitate their use. Here, members of the NCI-funded Center for Cancer Data Harmonization, the Pediatric Data Commons, and the NCATS-funded Center for Data to Health describe our extensive use of CDEs for prospective studies and data harmonization efforts. We participate in many standards development organizations (SDOs), such as HL7, ISO, CDISC, GA4GH, and WHO, and work to promote the computational encoding and use of CDEs in these contexts.
- Published
- 2021
- Full Text
- View/download PDF
99. Muon: multimodal omics analysis framework
- Author
-
Oliver Stegle, Ilia Kats, and Danila Bredikhin
- Subjects
Data Standard, Muon, Multimodal data, Data management, Interoperability, Data pre-processing, Data structure, Data science
Advances in multi-omics technologies have led to an explosion of multimodal datasets addressing questions ranging from basic biology to translation. While these rich data provide major opportunities for discovery, they also come with data management and analysis challenges, motivating the development of tailored computational solutions for multi-omics data. Here, we present a data standard and an analysis framework for multi-omics, MUON, designed to organise, analyse, visualise, and exchange multimodal data. MUON stores multimodal data in an efficient yet flexible data structure that supports an arbitrary number of omics layers. The MUON data structure is interoperable with existing community standards for single omics, and it provides easy access both to data from individual omics and to multimodal data views. Building on this data infrastructure, MUON enables a versatile range of analyses, from data preprocessing and the construction of multi-omics containers to flexible multi-omics alignment.
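MUON is distributed as a Python package; assuming the current muon and anndata APIs, a minimal sketch of building and saving a multimodal container looks like this.

```python
import numpy as np
import anndata as ad
import muon as mu

# Two modalities measured on the same 100 cells (toy count matrices).
rna = ad.AnnData(np.random.poisson(1.0, (100, 50)).astype(np.float32))
atac = ad.AnnData(np.random.poisson(1.0, (100, 200)).astype(np.float32))

# MuData is the multimodal container: named omics layers over shared cells.
mdata = mu.MuData({"rna": rna, "atac": atac})
print(mdata)                    # summarises both modalities
mdata.write("multiome.h5mu")    # one file holding the whole container
```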
- Published
- 2021
100. ODM Clinical Data Generator: Syntactically Correct Clinical Data Based on Metadata Definition
- Author
-
Maximilian Fechner, Ludger Becker, Johannes Oehm, Martin Dugas, Timm Harbich, Michael Storck, and Tobias Brix
- Subjects
Metadata, Data Standard, Source code, Information retrieval, Computer science, Test data generation, Data Protection Act 1998, Data type, Test data, Data modeling
The Operational Data Model (ODM) is a data standard for interchanging clinical trial data. ODM contains the metadata definition of a study, i.e., its case report forms, as well as the clinical data, i.e., the answers of the participants. The Portal of Medical Data Models is an infrastructure for the creation, exchange, and analysis of medical metadata models; over 23,000 metadata definitions can be downloaded there in ODM format. Due to data protection law and privacy concerns, clinical data is not contained in these files. Access to exemplary clinical test data matching a given metadata definition is necessary in order to evaluate systems claiming to support ODM, or to check whether a planned statistical analysis can be performed with the defined data types. In this work, we present a web application that generates syntactically correct clinical data in ODM format based on an uploaded ODM metadata definition. Data types and range constraints are taken into account, and data for up to one million participants can be generated in a reasonable amount of time. Thus, in combination with the Portal of Medical Data Models, a large number of ODM files including metadata definitions and clinical data can be provided for testing any ODM-supporting system. The current version of the application can be tested at https://cdgen.uni-muenster.de and the source code is available, under the MIT license, at https://imigitlab.uni-muenster.de/published/odm-clinical-data-generator.
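For orientation, ODM clinical data is plain XML; a generator along the lines described might emit one subject's answers as sketched below. The OIDs are example values, and required ODM root attributes such as ODMVersion and CreationDateTime are omitted for brevity.

```python
import xml.etree.ElementTree as ET

def make_clinical_data(study_oid, metadata_oid, subject_key, item_oid, value):
    """Emit ODM ClinicalData for one subject and one item. Required root
    attributes (ODMVersion, CreationDateTime, ...) are omitted here."""
    odm = ET.Element("ODM", FileType="Snapshot", FileOID="example-1")
    cd = ET.SubElement(odm, "ClinicalData",
                       StudyOID=study_oid, MetaDataVersionOID=metadata_oid)
    subj = ET.SubElement(cd, "SubjectData", SubjectKey=subject_key)
    event = ET.SubElement(subj, "StudyEventData", StudyEventOID="SE.BASELINE")
    form = ET.SubElement(event, "FormData", FormOID="F.DEMOGRAPHICS")
    group = ET.SubElement(form, "ItemGroupData", ItemGroupOID="IG.DEMO")
    ET.SubElement(group, "ItemData", ItemOID=item_oid, Value=str(value))
    return ET.tostring(odm, encoding="unicode")

print(make_clinical_data("S.001", "MDV.1.0", "SUBJ-0001", "I.AGE", 42))
```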
- Published
- 2021