Reproducibly sampling SARS-CoV-2 genomes across time, geography, and viral diversity [version 1; peer review: 1 approved, 1 approved with reservations]

Authors :
Evan Bolyen
Matthew R. Dillon
Nicholas A. Bokulich
Jason T. Ladner
Brendan B. Larsen
Crystal M. Hepp
Darrin Lemmer
Jason W. Sahl
Andrew Sanchez
Chris Holdgraf
Chris Sewell
Aakash G. Choudhury
John Stachurski
Matthew McKay
David M. Engelthaler
Michael Worobey
Paul Keim
J. Gregory Caporaso
Author Affiliations :
1. Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
2. School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA
3. Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition and Health, ETH Zurich, Switzerland
4. Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
5. Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
6. Pathogen and Microbiome Division, Translational Genomics Research Institute, Flagstaff, AZ, USA
7. Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA
8. Department of Statistics, University of California at Berkeley, Berkeley, CA, USA
9. Theory and Simulation of Materials, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
10. Research School of Economics, Australian National University, ACT, Australia
Source :
F1000Research. 9:657
Publication Year :
2020
Publisher :
London, UK: F1000 Research Limited, 2020.

Abstract

The COVID-19 pandemic has led to a rapid accumulation of SARS-CoV-2 genomes, enabling genomic epidemiology on local and global scales. Collections of genomes from resources such as GISAID must be subsampled to enable computationally feasible phylogenetic and other analyses. We present genome-sampler, a software package that supports sampling collections of viral genomes across multiple axes including time of genome isolation, location of genome isolation, and viral diversity. The software is modular in design so that these or future sampling approaches can be applied independently and combined (or replaced with a random sampling approach) to facilitate custom workflows and benchmarking. genome-sampler is written as a QIIME 2 plugin, ensuring that its application is fully reproducible through QIIME 2’s unique retrospective data provenance tracking system. genome-sampler can be installed in a conda environment on macOS or Linux systems. A complete default pipeline is available through a Snakemake workflow, so subsampling can be achieved using a single command. genome-sampler is open source, free for all to use, and available at https://caporasolab.us/genome-sampler. We hope that this will facilitate SARS-CoV-2 research and support evaluation of viral genome sampling approaches for genomic epidemiology.

Details

ISSN :
2046-1402
Volume :
9
Database :
F1000Research
Journal :
F1000Research
Notes :
[version 1; peer review: 1 approved, 1 approved with reservations]
Publication Type :
Academic Journal
Accession number :
edsfor.10.12688.f1000research.24751.1
Document Type :
software-tool
Full Text :
https://doi.org/10.12688/f1000research.24751.1