AnVILWorkflow: A runnable workflow package for Cloud-implemented bioinformatics analysis pipelines [version 1; peer review: awaiting peer review]

Authors :
Sehyun Oh
Kai Gravel-Pucillo
Marcel Ramos
Michael C. Schatz
Sean Davis
Vincent Carey
Martin Morgan
Levi Waldron
Author Affiliations :
1. Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, New York, USA
2. Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, New York, USA
3. Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
4. Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
5. Departments of Biomedical Informatics and Medicine, University of Colorado Anschutz School of Medicine, Denver, Colorado, USA
6. Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
7. Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
Source :
F1000Research. 13:1257
Publication Year :
2024
Publisher :
London, UK: F1000 Research Limited, 2024.

Abstract

Advancements in sequencing technologies and the development of new data collection methods produce large volumes of biological data. The Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) provides a cloud-based platform for democratizing access to large-scale genomics data and analysis tools. However, utilizing the full capabilities of AnVIL can be challenging for researchers without extensive bioinformatics expertise, especially for executing complex workflows. We present the AnVILWorkflow R package, which enables the convenient execution of bioinformatics workflows hosted on AnVIL directly from an R environment. AnVILWorkflow simplifies the setup of the cloud computing environment, input data formatting, workflow submission, and retrieval of results through intuitive functions. We demonstrate the utility of AnVILWorkflow for three use cases: bulk RNA-seq analysis with Salmon, metagenomics analysis with bioBakery, and digital pathology image processing with PathML. The key features of AnVILWorkflow include user-friendly browsing of available data and workflows, seamless integration of R and non-R tools within a reproducible analysis pipeline, and accessibility to scalable computing resources without direct management overhead. AnVILWorkflow lowers the barrier to utilizing AnVIL’s resources, especially for exploratory analyses or bulk processing with established workflows. This empowers a broader community of researchers to leverage the latest genomics tools and datasets using familiar R syntax. This package is distributed through the Bioconductor project (https://bioconductor.org/packages/AnVILWorkflow), and the source code is available through GitHub (https://github.com/shbrief/AnVILWorkflow).

Details

ISSN :
2046-1402
Volume :
13
Database :
F1000Research
Journal :
F1000Research
Notes :
[version 1; peer review: awaiting peer review]
Publication Type :
Academic Journal
Accession number :
edsfor.10.12688.f1000research.155449.1
Document Type :
other
Full Text :
https://doi.org/10.12688/f1000research.155449.1