9 results on '"Nicole A. Deflaux"'
Search Results
2. The All of Us Research Program: data quality, utility, and diversity
- Author
-
Dan M. Roden, Cheryl R. Clark, Hoda Anton-Culver, Melissa A. Basford, Parinda Khatri, Mona N. Fouad, Stephen O. Sodeke, Paul A. Harris, Joshua C. Denny, Consuelo H. Wilkins, Robert M. Cronin, Mine S. Cicek, Scott J. Hebbring, Jun Qian, John Wilbanks, Karthik N. Muthuraman, David Goldstein, Lucila Ohno-Machado, Kelsey R. Mayo, Lina Sulieman, Roxana Loperena, Nicole A. Deflaux, Kelly Gebo, Abel N. Kho, Philip Greenland, Eric Boerwinkle, Anthony Philippakis, George Hripcsak, Brian K. Ahmedani, Stephanie A. Devaney, Elizabeth W. Karlson, David J. Schlueter, Maria Argos, Justin Hentges, Elizabeth Cohn, Karthik Natarajan, Alese E. Halvorson, Christopher J. O'Donnell, Bruce R. Korf, David Glazer, Jordan W. Smoller, Francis Ratsimbazafy, Hua Xu, Robert J. Carroll, Christopher Lunt, Sheri D. Schully, and Andrea H. Ramirez
- Subjects
Gerontology ,Research program ,education.field_of_study ,business.industry ,Population ,Cloud computing ,Precision medicine ,Data quality ,Cohort ,Survey data collection ,Relevance (information retrieval) ,business ,Psychology ,education
- Abstract
Importance: The All of Us Research Program hypothesizes that accruing one million or more diverse participants engaged in a longitudinal research cohort will advance precision medicine and ultimately improve human health. Launched nationally in 2018, to date All of Us has recruited more than 345,000 participants. All of Us plans to open beta access to researchers in May 2020. Objective: To demonstrate the quality, utility, and diversity of the All of Us Research Program’s initial data release and beta launch of the cloud-based analysis platform, the cloud-based Researcher Workbench. Evidence: We analyzed the initial All of Us data release, comprising surveys, physical measurements (PM), and electronic health record (EHR) data, to characterize All of Us participants including self-reported descriptors of diversity. Data depth, density, and quality were evaluated using medication sequencing analyses for depression and type 2 diabetes. Replication of known oncologic associations with smoking exposure ascertained by EHR and survey data and calculation of population-based atherosclerotic cardiovascular disease risk scores demonstrated the utility of data and platform capability. Findings: The beta launch of the All of Us Researcher Workbench contains data on 224,143 participants. Seventy-seven percent of this cohort were identified as Underrepresented in Biomedical Research (UBR) including over forty-eight percent self-reporting non-White race. Medication usage patterns in common diseases depression and type 2 diabetes replicated prior findings previously reported in the literature and showed differences based on race. Oncologic associations with smoking were replicated and effect sizes compared for EHR and survey exposures finding general agreement. A cardiovascular disease score was calculated utilizing multiple data elements curated across sources. The cloud-based architecture built in the Researcher Workbench provided secure access and powerful computational resources at a low cost. All analyses have been made available for replication and reuse by registered researchers. Conclusions and Relevance: The All of Us Research Program’s initial release of cohort data contains longitudinal and multidimensional data on diverse participants that replicate known associations. This dataset and the cloud-based Researcher Workbench advance the mission of All of Us to make data widely and securely available to researchers to improve human health and advance precision medicine.
- Published
- 2020
3. The ISB Cancer Genomics Cloud: A Flexible Cloud-Based Platform for Cancer Genomics Research
- Author
-
Matthew Bookman, Suzanne M. Paquette, Varsha Dhankani, Michael Miller, Kalle Leinonen, Nicole A. Deflaux, Mark Backus, Todd Pihl, Madelyn Reyes, Sheila Reynolds, Joseph Slagel, Jonathan Bingham, David L Gibbs, David Pot, William J.R. Longabaugh, Abigail Hahn, Zack Rodebaugh, Ilya Shmulevich, and Phyliss Lee
- Subjects
0301 basic medicine ,Cancer Research ,SQL ,Computer science ,Datasets as Topic ,Cloud computing ,Bioinformatics ,Article ,03 medical and health sciences ,0302 clinical medicine ,Documentation ,Neoplasms ,Humans ,Web application ,computer.programming_language ,Internet ,Genome, Human ,business.industry ,GENCODE ,Research ,Computational Biology ,Genomics ,Cloud Computing ,Data science ,National Cancer Institute (U.S.) ,United States ,ComputingMethodologies_PATTERNRECOGNITION ,030104 developmental biology ,Workflow ,Oncology ,030220 oncology & carcinogenesis ,Data as a service ,business ,computer ,Software
- Abstract
The ISB Cancer Genomics Cloud (ISB-CGC) is one of three pilot projects funded by the National Cancer Institute to explore new approaches to computing on large cancer datasets in a cloud environment. With a focus on Data as a Service, the ISB-CGC offers multiple avenues for accessing and analyzing The Cancer Genome Atlas, TARGET, and other important references such as GENCODE and COSMIC using the Google Cloud Platform. The open approach allows researchers to choose approaches best suited to the task at hand: from analyzing terabytes of data using complex workflows to developing new analysis methods in common languages such as Python, R, and SQL; to using an interactive web application to create synthetic patient cohorts and to explore the wealth of available genomic data. Links to resources and documentation can be found at www.isb-cgc.org. Cancer Res; 77(21); e7–10. ©2017 AACR.
- Published
- 2017
4. Cloud-based interactive analytics for terabytes of genomic variants data
- Author
-
Gregory McInnes, Jonathan Bingham, Cuiping Pan, Philip S. Tsao, Michael Snyder, Nicole A. Deflaux, and Somalee Datta
- Subjects
0301 basic medicine ,Statistics and Probability ,Genotype ,Computer science ,Big data ,Cloud computing ,Web Browser ,Terabyte ,Biochemistry ,03 medical and health sciences ,0302 clinical medicine ,Gene Frequency ,Humans ,Ecosystem ,Molecular Biology ,Genome, Human ,business.industry ,Genetic Variation ,Genomics ,Data Compression ,Original Papers ,Data science ,Computer Science Applications ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Analytics ,Scalability ,DECIPHER ,Databases, Nucleic Acid ,business ,Software ,030217 neurology & neurosurgery ,Data compression
- Abstract
Motivation: Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. Results: We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Availability and implementation: Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2017
5. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder
- Author
-
Christian R. Marshall, Annette Estes, John Wei, Janet A. Buchanan, Jennifer L. Howe, Christina Chrysler, Weili Li, Tara Paton, Fiona Tsoi, Zhuozhi Wang, Brendan J. Frey, Eric Deneault, Edwin H. Cook, William Van Etten, Stephen W. Scherer, Mohammed Uddin, Mayada Elsabbagh, Emily Kirby, Sylvia Lamoureux, Cheryl Cytrynbaum, Bhooma Thiruvahindrapuram, Mathew T. Pletcher, Lonnie Zwaigenbaum, Wilson W L Sung, Angie Fedele, Daniele Merico, Bartha Maria Knoppers, Ryan K. C. Yuen, Marc Woodbury-Smith, Worrawat Engchuan, Vicki Seifer, Isabel M. Smith, Barbara Kellam, Bonnie Mackinnon Modi, Stephanie Koyanagi, Bridget A. Fernandez, James T. Robinson, Karen Ho, Edward J Higginbotham, Joe Whitney, Krissy A.R. Doyle-Thomas, Beth A. Malow, Susan Walker, Jeremy R. Parr, Louise Gallagher, Rob Nicolson, Jonathan Bingham, Thomas Nalpathamkalam, Lia D’Abate, Sanne Jilderda, Matt Bookman, Jessica Brian, Sarah J. Spence, Ann Thompson, Jonathan Leef, Rosanna Weksberg, Jacob A. S. Vorstman, Tal Savion-Lemieux, Anne Marie Tassé, Peter Szatmari, Alana Iaboni, Xudong Liu, Evdokia Anagnostou, Jeffrey R. MacDonald, Ny Hoang, Mehdi Zarrei, Lizhen Xu, Simon N. Twigger, Robert H. Ring, Stephen R. Dager, Melissa T. Carter, Irene Drmic, Michael J. Szego, Wendy Roberts, Lili Senman, Giovanna Pellecchia, Rohan V. Patel, Sergio L. Pereira, Joachim Hallmayer, David Glazer, Lisa J. Strug, Ada J.S. Chan, and Nicole A. Deflaux
- Subjects
0301 basic medicine ,Candidate gene ,DNA Copy Number Variations ,Autism Spectrum Disorder ,Neuroscience(all) ,Biology ,behavioral disciplines and activities ,Polymorphism, Single Nucleotide ,DNA sequencing ,Article ,03 medical and health sciences ,Genetic variation ,mental disorders ,Databases, Genetic ,medicine ,Journal Article ,Humans ,Genetic Predisposition to Disease ,Copy-number variation ,Gene ,Sequence Deletion ,Whole genome sequencing ,Genetics ,Chromosome Aberrations ,General Neuroscience ,Autism spectrum disorders ,medicine.disease ,Phenotype ,Mutagenesis, Insertional ,030104 developmental biology ,Autism spectrum disorder ,Next-generation sequencing ,Genome-Wide Association Study
- Abstract
We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform, and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertion/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p = 6 × 10⁻⁴). In 294/2,620 (11.2%) of ASD cases, a molecular basis could be determined and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD.
- Published
- 2017
6. Analysis of protein-coding genetic variation in 60,706 humans
- Author
-
Jack A. Kosmicki, Mark A. DePristo, Mark I. McCarthy, Patrick F. Sullivan, Laramie E. Duncan, Ryan Poplin, David Neil Cooper, Mitja I. Kurki, Aarno Palotie, Hong-Hee Won, Dermot P.B. McGovern, John Danesh, Jose C. Florez, Grace Tiao, Anne H. O’Donnell-Luria, Timothy Fennell, Gad Getz, Douglas M. Ruderfer, Joanne Berghout, Mark J. Daly, Monkol Lek, Daniel P. Howrigan, Stacey Gabriel, Daniel P. Birnbaum, Ami Levy Moonshine, Michael Boehnke, Ben Weisburd, Ruth McPherson, Christine Stevens, Dongmei Yu, Sekar Kathiresan, Andrew J. Hill, James G. Wilson, James S. Ware, Hugh Watkins, Benjamin M. Neale, Khalid Shakir, David Altshuler, María Teresa Tusié-Luna, Lorena Orozco, James Zou, Samuel A. Rose, Menachem Fromer, Jeremiah M. Scharf, Daniel G. MacArthur, Namrata Gupta, Pamela Sklar, Eric Vallabh Minikel, Steven A. McCarroll, Jaakko Tuomilehto, Jackie Goldstein, Ming T. Tsuang, Stacey Donnelly, Konrad J. Karczewski, Fengmei Zhao, Stephen J. Glatt, Ron Do, Nicole A. Deflaux, Adam Kiezun, Emma Pierce-Hoffman, Markku Laakso, Beryl B. Cummings, Pradeep Natarajan, Danish Saleheen, Karol Estrada, Peter D. Stenson, Manuel A. Rivas, Diego Ardissino, Kaitlin E. Samocha, Gina M. Peloso, Laura D. Gauthier, Eric Banks, Brett Thomas, Shaun Purcell, Taru Tukiainen, Valentin Ruano-Rubio, Christina M. Hultman, Jason Flannick, Roberto Elosua, Complex Trait Genetics, Amsterdam Neuroscience - Complex Trait Genetics, Institute for Molecular Medicine Finland, Aarno Palotie / Principal Investigator, Jaakko Tuomilehto Research Group, Department of Public Health, Clinicum, Genomics of Neurological and Neuropsychiatric Disorders, Danesh, John [0000-0003-1158-6791], Apollo - University of Cambridge Repository, Wellcome Trust, and The Academy of Medical Sciences
- Subjects
0301 basic medicine ,Proteome ,DNA Mutational Analysis ,Datasets as Topic ,Human genetic variation ,GUIDELINES ,0302 clinical medicine ,Exome Aggregation Consortium ,SEQUENCE VARIANTS ,Coding region ,2.1 Biological and endogenous factors ,Exome ,Aetiology ,MUTATION ,Genetics ,0303 health sciences ,Multidisciplinary ,HUMAN-DISEASE ,NETWORKS ,Multidisciplinary Sciences ,Phenotype ,Mutation (genetic algorithm) ,Science & Technology - Other Topics ,Biotechnology ,General Science & Technology ,Genomics ,Computational biology ,Biology ,DNA sequencing ,03 medical and health sciences ,Rare Diseases ,Clinical Research ,Genetic variation ,Humans ,Genetic Testing ,Gene ,030304 developmental biology ,Science & Technology ,Human Genome ,HUMAN-POPULATION HISTORY ,Genetic Variation ,FRAMEWORK ,R1 ,EVOLUTION ,030104 developmental biology ,DISCOVERY ,Sample Size ,Generic health relevance ,3111 Biomedicine ,Genètica humana -- Variació ,030217 neurology & neurosurgery
- Abstract
Summary: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC). The resulting catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We show that this catalogue can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes.
- Published
- 2016
7. Interactive Analytics for Very Large Scale Genomic Data
- Author
-
Somalee Datta, Jonathan Bingham, Nicole A. Deflaux, Philip S. Tsao, Michael Snyder, Cuiping Pan, and Gregory McInnes
- Subjects
0303 health sciences ,Computer science ,business.industry ,Scale (chemistry) ,Shell (computing) ,Cloud computing ,Data science ,03 medical and health sciences ,0302 clinical medicine ,Analytics ,Scalability ,Ecosystem ,business ,030217 neurology & neurosurgery ,030304 developmental biology
- Abstract
Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. Here we present interactive analytics using public cloud infrastructure and distributed computing database Dremel and developed according to the standards of Global Alliance for Genomics and Health, to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate that such computing paradigms can provide orders of magnitude faster turnaround for common analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds.
- Published
- 2015
8. Simulating context-driven activity cascades in online social networks on the google exacycle platform
- Author
-
Miao Sui, Arun V. Sathanur, Nicole A. Deflaux, Michael D. Tyka, and Vikram Jandhyala
- Subjects
Computer science ,Human–computer interaction ,business.industry ,Context (language use) ,Artificial intelligence ,Machine learning ,computer.software_genre ,business ,computer
- Published
- 2014
9. Systematic Analysis of Challenge-Driven Improvements in Molecular Prognostic Models for Breast Cancer
- Author
-
Michael R. Kellen, Oscar M. Rueda, Christina Curtis, Brigham H. Mecham, Adam A. Margolin, Erhan Bilal, Hege G. Russnes, Craig Citro, Veronica O. Vang, Lara M. Mangravite, Joseph L. Hellerstein, Tyler Pirtle, Stephen H. Friend, Xavier Schildwachter, Gustavo Stolovitzky, Nicole A. Deflaux, Daehoon Park, Lars Ottestad, Anne Lise Børresen-Dale, Carlos Caldas, Lamia Youseff, Matthew D. Furia, Samuel Aparicio, Ben Sauerwine, Thea Norman, Hans Kristian Moen Vollan, Bruce Hoff, Justin Guinney, Erich Huang, and Vessela N. Kristensen
- Subjects
Oncology ,medicine.medical_specialty ,business.industry ,MEDLINE ,General Medicine ,medicine.disease ,Bioinformatics ,Data set ,Breast cancer ,Open source ,Internal medicine ,medicine ,Prognostics ,skin and connective tissue diseases ,business ,Clinical risk factor ,Survival analysis ,Prognostic models
- Abstract
Although molecular prognostics in breast cancer are among the most successful examples of translating genomic analysis to clinical applications, optimal approaches to breast cancer clinical risk prediction remain controversial. The Sage Bionetworks–DREAM Breast Cancer Prognosis Challenge (BCC) is a crowdsourced research study for breast cancer prognostic modeling using genome-scale data. The BCC provided a community of data analysts with a common platform for data access and blinded evaluation of model accuracy in predicting breast cancer survival on the basis of gene expression data, copy number data, and clinical covariates. This approach offered the opportunity to assess whether a crowdsourced community Challenge would generate models of breast cancer prognosis commensurate with or exceeding current best-in-class approaches. The BCC comprised multiple rounds of blinded evaluations on held-out portions of data on 1981 patients, resulting in more than 1400 models submitted as open source code. Participants then retrained their models on the full data set of 1981 samples and submitted up to five models for validation in a newly generated data set of 184 breast cancer patients. Analysis of the BCC results suggests that the best-performing modeling strategy outperformed previously reported methods in blinded evaluations; model performance was consistent across several independent evaluations; and aggregating community-developed models achieved performance on par with the best-performing individual models.
- Published
- 2013
Discovery Service for Jio Institute Digital Library