1. Large-Scale Uniform Analysis of Cancer Whole Genomes in Multiple Computing Environments
- Author
Tijanic N, Solomon Shorser, Rosenberg Mw, Mijalkovic S, Byrne Nj, André Kahles, Rolf Kabbe, Larsson Omberg, Jules Kerssemakers, Olivier Harismendy, Romina Royo, Michael Heinold, Francis Ouellette B, Jonas Demeulemeester, Zhao Chen, Satoru Miyano, Peter J. Campbell, Short C, Ocana D, Oliver Hofmann, Raine Km, April E. Williams, Hong Jh, Mihaiescu Gl, Adam Butler, Denis Yuen, Jongsun Jung, Sergei Yakneen, Kovacevic M, Carolyn M. Hutter, Miyoshi N, Vicente D, Hidewaki Nakagawa, Steven Newhouse, Manuel Prinz, Andy Cafferkey, Johannes Werner, Yong Ho Kim, Nicholson J, Esther Rheinbay, Matthias Schlesner, David Torrents, Ivkovic S, Sung-Hoon Cho, Joachim Weischenfeldt, Lazic Am, Jeon S, Thomas J. Hudson, Julian M. Hess, Fayzullaev N, Nahal Hk, Gad Getz, Peter Van Loo, Dimitri Livitz, Perry, Ivo Buchhalter, Rodriguez Jb, Nagarajan Paramasivam, Michelle Dow, Young-Choon Woo, Ignaty Leshchiner, Paul Flicek, Robert L. Grossman, Jonathan Spring, Jeremiah Wala, Roland Eils, Grace Tiao, Kyle Ellrott, Angela N. Brooks, Heidi J. Sofia, Josep Lluís Gelpí, Barbara Hutter, Francesco Favero, Brian O'Connor, Lucila Ohno-Machado, Peter Clapham, Nastic M, Choi W, De La Vega Fm, L. J. Dursi, Montserrat Puiggròs, Ohi K, Wei Jiao, Brandi N. Davis-Dusenbery, Qian Xiang, Adam J Struck, Gibson B, Ferretti, Claudiu Farcas, Koscher M, Koures A, Lincoln Stein, Keith A. Boroevich, Jan O. Korbel, Gordon Saksena, Radovic P, Christian Lawerenz, Alex Buchanan, Christina K. Yung, Adam Wright, Nuno A. Fonseca, Todd Pihl, Jae H. Kim, Zhining Wang, Boyce R, Miguel Vazquez, Jürgen Eils, Kim H, Junjun Zhang, Seiya Imoto, Kortine Kleinheinz, and Daniel Huebschmann
- Subjects
0303 health sciences, 03 medical and health sciences, 0302 clinical medicine, Computer science, 030220 oncology & carcinogenesis, Cancer genome, Genomics, Replicate, Data mining, computer.software_genre, Genome, computer, 030304 developmental biology
- Abstract
The Pan-Cancer Analysis of Whole Genomes (PCAWG) project of the International Cancer Genome Consortium (ICGC) aimed to categorize somatic and germline variations in both coding and non-coding regions in over 2,800 cancer patients. To provide this dataset to the research working groups for downstream analysis, the PCAWG Technical Working Group marshalled ~800 TB of sequencing data from geographically distributed locations; developed portable software for uniform alignment, variant calling, artifact filtering and variant merging; performed the analysis in a geographically and technologically disparate collection of compute environments; and disseminated high-quality validated consensus variants to the working groups. The PCAWG dataset has been mirrored to multiple repositories and can be located using the ICGC Data Portal. The PCAWG workflows are also available as Docker images through Dockstore, enabling researchers to replicate our analysis on their own data.
- Published
- 2017