1. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments
- Author
Bryton Fett, Andrew Connell, Louis J. Taylor, Chunyu Zhao, Kyle Bittinger, Jung-Jin Lee, Erik L. Clarke, and Frederic D. Bushman
- Subjects
Data Analysis, Microbiology (medical), Computer science, Biology, Microbiology, Extensibility, Genome, lcsh:Microbial ecology, 03 medical and health sciences, Software, 0302 clinical medicine, Pipeline, Computer cluster, Data Mining, Humans, Preprocessor, Nucleotide, 030304 developmental biology, chemistry.chemical_classification, 0303 health sciences, 030306 microbiology, business.industry, Computational Biology, High-Throughput Nucleotide Sequencing, Sunbeam, Quality control, Sequence Analysis, DNA, Modular design, Shotgun metagenomic sequencing, Workflow, chemistry, Metagenomics, lcsh:QR100-130, Software engineering, business, Host (network), Algorithms, 030217 neurology & neurosurgery
- Abstract
Background: Analysis of mixed microbial communities using metagenomic sequencing experiments requires multiple preprocessing and analytical steps to interpret the microbial and genetic composition of samples. Analytical steps include quality control, adapter trimming, host decontamination, metagenomic classification, read assembly, and alignment to reference genomes.
Results: We present a modular and user-extensible pipeline called Sunbeam that performs these steps in a consistent and reproducible fashion. It can be installed in a single step, does not require administrative access to the host computer system, and can work with most cluster computing frameworks. We also introduce Komplexity, a software tool to eliminate potentially problematic, low-complexity nucleotide sequences from metagenomic data. Unique components of the Sunbeam pipeline include direct analysis of data from NCBI SRA and an easy-to-use extension framework that enables users to add custom processing or analysis steps directly to the workflow. The pipeline and its extension framework are well documented, in routine use, and regularly updated.
Conclusions: Sunbeam provides a foundation to build more in-depth analyses and to enable comparisons in metagenomic sequencing experiments by removing problematic low complexity reads and standardizing post-processing and analytical steps. Sunbeam is written in Python using the Snakemake workflow management software and is freely available at github.com/sunbeam-labs/sunbeam under the GPLv3.
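To make the idea of removing low-complexity reads concrete, the following is a minimal sketch of one possible filter. It scores each read by the fraction of distinct k-mers relative to read length and drops reads that fall below a cutoff. The function names, the value `k=4`, and the threshold `0.55` are illustrative assumptions chosen for this example; this is not Komplexity's implementation or its interface.

```python
def kmer_complexity(seq: str, k: int = 4) -> float:
    """Return the number of distinct k-mers divided by sequence length.

    Illustrative score only; k is an assumed parameter.
    """
    seq = seq.upper()
    if len(seq) < k:
        return 0.0
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(kmers) / len(seq)


def filter_low_complexity(reads, k=4, threshold=0.55):
    """Keep reads whose score is at or above the (assumed) threshold."""
    return [r for r in reads if kmer_complexity(r, k) >= threshold]


reads = [
    "ACGTTGCAAGCTTAGGCATCGATCGGATTACAGCTA",  # mixed sequence
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",  # homopolymer run
    "ATATATATATATATATATATATATATATATATATAT",  # dinucleotide repeat
]
kept = filter_low_complexity(reads)
```

In this sketch the homopolymer and the dinucleotide repeat contain only one or two distinct 4-mers, so they score far below the cutoff and are discarded, while the mixed sequence is retained.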
- Published
- 2018