1. GenoPheno: cataloging large-scale phenotypic and next-generation sequencing data within human datasets
- Author
- Cartik R. Kothari, Carlos De Niz, Sek Won Kong, Kenneth D. Mandl, Paul Avillach, and Alba Gutiérrez-Sacristán
- Subjects
catalog, AcademicSubjects/SCI01060, Computer science, precision medicine, Cataloging, Dynamic web page, 03 medical and health sciences, 0302 clinical medicine, Database Review, Databases, Genetic, Humans, Genetic Predisposition to Disease, Molecular Biology, Exome sequencing, 030304 developmental biology, Whole genome sequencing, 0303 health sciences, Whole Genome Sequencing, phenotypic data, High-Throughput Nucleotide Sequencing, Subject (documents), Precision medicine, Data science, Biobank, Phenotype, next-generation sequencing data, Genomic Profile, biobanks, Large-scale datasets, 030217 neurology & neurosurgery, Information Systems
- Abstract
Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all to those more tailored to the patient’s individual genomic profile, lifestyle and environmental exposures. Yet, to advance precision medicine’s main objective (ensuring the optimum diagnosis, treatment and prognosis for each individual), investigators need access to large-scale clinical and genomic data repositories. Despite the vast proliferation of these datasets, locating and obtaining access to many remains a challenge. We sought to provide an overview of available patient-level datasets that contain both genotypic data, obtained by next-generation sequencing, and phenotypic data, and to create a dynamic, online catalog for consultation, contribution and revision by the research community. Datasets included in this review meet six inclusion criteria: they (i) contain data from more than 500 human subjects; (ii) contain both genotypic and phenotypic data from the same subjects; (iii) include whole genome sequencing or whole exome sequencing data; (iv) include at least 100 recorded phenotypic variables per subject; (v) are accessible through a website or collaboration with investigators; and (vi) make access information available in English. Using these criteria, we identified 30 datasets, reviewed them and provided results in the release version of a catalog, which is publicly available through a dynamic Web application and on GitHub. Users can review existing datasets and contribute new ones for inclusion (Web: https://avillachlab.shinyapps.io/genophenocatalog/; GitHub: https://github.com/hms-dbmi/GenoPheno-CatalogShiny).
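The six inclusion criteria listed in the abstract can be read as a simple filter over candidate datasets. The following is a minimal Python sketch of that reading, not the authors' implementation: the `DatasetRecord` class, its field names, the `"WGS"`/`"WES"` labels, the access-route strings and the two example records are all assumptions made up for illustration and do not come from the catalog itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical summary of one candidate dataset (all field names are assumptions)."""
    name: str
    n_subjects: int
    genotype_and_phenotype_same_subjects: bool
    sequencing_types: frozenset      # assumed labels: "WGS", "WES", "array", ...
    n_phenotypic_variables: int
    access_route: str                # assumed values: "website", "collaboration", "none"
    access_info_in_english: bool


def meets_inclusion_criteria(d: DatasetRecord) -> bool:
    """Apply criteria (i) to (vi) as stated in the abstract."""
    return (
        d.n_subjects > 500                                   # (i) more than 500 subjects
        and d.genotype_and_phenotype_same_subjects           # (ii) both data types, same subjects
        and bool(d.sequencing_types & {"WGS", "WES"})        # (iii) whole genome or whole exome
        and d.n_phenotypic_variables >= 100                  # (iv) at least 100 variables
        and d.access_route in {"website", "collaboration"}   # (v) accessible
        and d.access_info_in_english                         # (vi) access info in English
    )


# Made-up records, for illustration only; they are not entries from the catalog.
candidates = [
    DatasetRecord("Example cohort A", 12000, True, frozenset({"WGS"}), 350, "website", True),
    DatasetRecord("Example cohort B", 500, True, frozenset({"WES"}), 350, "website", True),
]

included = [d.name for d in candidates if meets_inclusion_criteria(d)]
print(included)
```

Note the two boundary conditions: criterion (i) is a strict inequality ("more than 500"), so the second example record with exactly 500 subjects is excluded, whereas criterion (iv) is inclusive ("at least 100").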
- Published
- 2020