Standardized metadata for human pathogen/vector genomic sequences.

Authors :
Vivien G Dugan
Scott J Emrich
Gloria I Giraldo-Calderón
Omar S Harb
Ruchi M Newman
Brett E Pickett
Lynn M Schriml
Timothy B Stockwell
Christian J Stoeckert
Dan E Sullivan
Indresh Singh
Doyle V Ward
Alison Yao
Jie Zheng
Tanya Barrett
Bruce Birren
Lauren Brinkac
Vincent M Bruno
Elizabet Caler
Sinéad Chapman
Frank H Collins
Christina A Cuomo
Valentina Di Francesco
Scott Durkin
Mark Eppinger
Michael Feldgarden
Claire Fraser
W Florian Fricke
Maria Giovanni
Matthew R Henn
Erin Hine
Julie Dunning Hotopp
Ilene Karsch-Mizrachi
Jessica C Kissinger
Eun Mi Lee
Punam Mathur
Emmanuel F Mongodin
Cheryl I Murphy
Garry Myers
Daniel E Neafsey
Karen E Nelson
William C Nierman
Julia Puzak
David Rasko
David S Roos
Lisa Sadzewicz
Joana C Silva
Bruno Sobral
R Burke Squires
Rick L Stevens
Luke Tallon
Herve Tettelin
David Wentworth
Owen White
Rebecca Will
Jennifer Wortman
Yun Zhang
Richard H Scheuermann
Source :
PLoS ONE, Vol 9, Iss 6, p e99979 (2014)
Publication Year :
2014
Publisher :
Public Library of Science (PLoS), 2014.

Abstract

High-throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
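
As a rough illustration of the four field groups named in the abstract (specimen source, isolation event, phenotype, project leadership), a sample record might be modeled as below. This is only a sketch: the class name, every field name, the choice of required fields, and the example values are hypothetical and are not taken from the GSCID/BRC standard itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SampleMetadata:
    """Hypothetical record; field names are illustrative, not from the standard."""
    specimen_source: str            # organism or environmental source of the specimen
    collection_date: str            # when the isolate was collected (e.g. "2013-05-01")
    collection_location: str        # where the isolate was collected
    phenotype: Optional[str] = None     # phenotypic characteristic of the isolate
    project_lead: Optional[str] = None  # project leadership and support
    extra: dict = field(default_factory=dict)  # project-specific extension fields


# Assumption: these three fields are treated as mandatory for this sketch only.
REQUIRED = ("specimen_source", "collection_date", "collection_location")


def missing_required(record: SampleMetadata) -> list:
    """Return the names of required fields that are empty in the record."""
    return [name for name in REQUIRED if not getattr(record, name)]


complete = SampleMetadata("Homo sapiens", "2013-05-01", "USA")
incomplete = SampleMetadata("Homo sapiens", "", "USA", extra={"ward": "ICU"})
print(missing_required(complete))
print(missing_required(incomplete))
```

The `extra` dictionary stands in for the extensibility the abstract describes: project-specific fields can be attached without changing the shared core of the record.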

Subjects

Subjects :
Medicine
Science

Details

Language :
English
ISSN :
19326203
Volume :
9
Issue :
6
Database :
Directory of Open Access Journals
Journal :
PLoS ONE
Publication Type :
Academic Journal
Accession number :
edsdoj.7ea0c224c6f4484e9585bedb61a17966
Document Type :
article
Full Text :
https://doi.org/10.1371/journal.pone.0099979