Canon, Shane, Christianson, Danielle, Duncan, William, Eloe-Fadrosh, Emiley, Fagnan, Kjiersten, Hays, David, Huntemann, Marcel, Lebedeva, Sofya, Miller, Kayd, Miller, Mark, Mouncey, Nigel, Mungall, Chris, Reddy, Tbk, Rudolph, Marisa, Sarrafan, Setareh, Sundaramurthi, Jagadish Chandrabose, Unni, Deepak, Vangay, Pajau, Wood-Charlson, Elisha, Ahmed, Faiza, Baumes, Jeffrey, Davis, Brandon, Anubhav, Fnu, Borkhum, Mark, Bramer, Lisa, Corilo, Yuri, Lipton, Mary, Mans, Douglas, McCue, Lee Ann, Millard, David, Piehowski, Paul, Prymolenna, Anastasiya, Purvine, Samuel, Richardson, Rachel, Smith, Montana, Stratton, Kelly, Babinski, Michal, Chain, Patrick, Davenport, Karen, Flynn, Mark, Hu, Bin, Kelliher, Julia, Li, Po-E, Lo, Chien-Chi, Jackson, Elais Player, Shakya, Migun, Xu, Yan, Drake, Meghan, Martin, Stanton, Wilson, Bruce, and Winston, Donny
Microbiome research cuts across environmental sciences, health, agriculture, energy, and natural and built environments, and the velocity at which microbiome data are generated has far outpaced current data infrastructure. Resources and solutions for collecting, processing, and distributing these data in an effective, uniform, accessible, and reproducible manner are lacking even at the largest data centers. The National Microbiome Data Collaborative (NMDC) is a pilot initiative launched to support microbiome data exploration and discovery through a collaborative, integrative science gateway. The NMDC is tackling infrastructure challenges in microbiome data science by making use of distributed computational resources available across four Department of Energy National Laboratories: Lawrence Berkeley National Laboratory (LBNL), Los Alamos National Laboratory (LANL), Pacific Northwest National Laboratory (PNNL), and Oak Ridge National Laboratory (ORNL). The NMDC team aims to deliver a set of unique microbiome data science capabilities, which include: leveraging existing ontology mapping software and curation resources to enable automated annotation of standardized metadata; developing workflows for metagenome, metatranscriptome, metaproteome, and metabolomics data processing that leverage HPC systems, and integrating the execution of these pipelines to produce NMDC-compliant data products; developing data registration, indexing, and access services to link data through a suite of publicly available APIs; and developing communication and sustainability strategies to assess current and future needs and capabilities, empower users, and promote the NMDC to the larger scientific community.
To ensure that the NMDC data ecosystem supports and evolves with the needs of the microbiome research community, the NMDC team uses a community-centered design approach, seeking feedback from the scientific research community throughout each phase of iterative development. Community feedback has informed the priorities of the NMDC data standards, bioinformatic workflows, and engagement activities, but the most visible contributions of the research community are perhaps the features and enhancements of the NMDC data portal. Here, we present an overview of the NMDC mission and vision, the distributed data infrastructure, and our community-centered design approach to developing the NMDC data portal.