Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing
- Author
Kenneth Idler, Andreas Scherer, Charles Lu, Timothy K. McDaniel, Penelope Duerken-Hughes, K. J. Langenbach, Seta Stanbouly, Charles Wang, Victoria Zismann, Keyur Talsania, Leming Shi, Margaret C. Cam, Shamoni Maheshwari, Zhipan Li, Luyao Ren, Petr Vojta, Mehdi Pirooznia, Jonathan J. Keats, Rasika Kalamegham, Howard Jacob, Bao Tran, Liz Kerrigan, Baitang Ning, Ene Reimann, Jiri Drabek, Eric F. Donaldson, Zhaowei Yang, Sayed Mohammad Ebrahim Sahraeian, Daoud Meerzaman, Marc Sultan, Jessica Nordlund, Tsai-wei Shen, Sulev Kõks, Christopher E. Mason, Yunfei Guo, Winnie S. Liang, Claudia Catalanotti, Jeffrey M. Trent, Ying Yu, Roderick V. Jensen, Huixiao Hong, Malcolm Moos, Wenming Xiao, Stephen T. Sherry, Jonathan Foox, Joe Shuga, Hugo Y. K. Lam, Chunlin Xiao, Lijing Yao, Li Tai Fang, Wanqiu Chen, Marghoob Mohiyuddin, Monika Mehta, Rebecca Kusko, Roberta Maestro, Yongmei Zhao, Jonathan Adkins, Gary P. Schroth, Daniel Butler, Yuliya Kriga, Ogan D. Abaan, Erich Jaeger, Yuanting Zheng, Daniela Gasparotto, Ulrika Liljedahl, Tiffany Hung, Eric Peters, Erica Tassone, Maryellen de Mars, Cu Nguyen, Lei Song, Bin Zhu, Weida Tong, Zivana Tezak, Justin B. Lack, Virginie Petitjean, Jyoti Shetty, Jing Li, and Zhong Chen
- Subjects
DNA Mutational Analysis; Biomedical Engineering; Datasets as Topic; Breast Neoplasms; Bioengineering; Genomics; Computational biology; Biology; Applied Microbiology and Biotechnology; Somatic evolution in cancer; Genome; Article; Germline; Cell Line, Tumor; medicine; Humans; Whole genome sequencing; Whole Genome Sequencing; High-Throughput Nucleotide Sequencing; Reproducibility of Results; Cancer; Benchmarking; Reference Standards; medicine.disease; genomic DNA; Germ Cells; Mutation; Molecular Medicine; Biotechnology
- Abstract
The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor–normal genomic DNA (gDNA) samples derived from a breast cancer cell line—which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations—and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking ‘tumor-only’ or ‘matched tumor–normal’ analyses.
- Published
- 2021
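The abstract presents these call sets as a resource for benchmarking "tumor-only" or "matched tumor–normal" analyses. As a minimal sketch of what such benchmarking involves (not the authors' method), the Python below scores a pipeline's variant calls against a reference call set, counting only variants inside high-confidence regions. Assumptions: the names `Variant`, `in_regions` and `benchmark` are hypothetical; regions are held in memory in BED-style 0-based, half-open coordinates; and a call matches the reference only when chromosome, 1-based position, REF and ALT are identical, with no normalization of equivalent representations (dedicated comparison tools such as RTG Tools vcfeval reconcile those).

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Hypothetical region map: chromosome -> list of (start, end) intervals,
# 0-based and half-open, as in BED files.
Regions = Dict[str, List[Tuple[int, int]]]


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def in_regions(v: Variant, regions: Regions) -> bool:
    """True if the variant's position lies inside any high-confidence interval."""
    # A 1-based position p is covered by the 0-based half-open interval
    # [start, end) exactly when start < p <= end.
    return any(start < v.pos <= end for start, end in regions.get(v.chrom, []))


def benchmark(calls: Iterable[Variant], truth: Iterable[Variant],
              regions: Regions) -> dict:
    """Compare a call set to a reference call set within high-confidence regions.

    Assumption: exact matching on (chrom, pos, ref, alt); no normalization.
    """
    calls_hc = {v for v in calls if in_regions(v, regions)}
    truth_hc = {v for v in truth if in_regions(v, regions)}
    tp = len(calls_hc & truth_hc)  # called and present in the reference set
    fp = len(calls_hc - truth_hc)  # called but absent from the reference set
    fn = len(truth_hc - calls_hc)  # in the reference set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the published call sets.
    truth = [Variant("chr1", 100, "A", "T"), Variant("chr1", 200, "G", "C"),
             Variant("chr2", 50, "C", "G")]
    calls = [Variant("chr1", 100, "A", "T"), Variant("chr1", 300, "T", "A"),
             Variant("chr2", 50, "C", "G"), Variant("chr3", 10, "A", "G")]
    regions = {"chr1": [(0, 1000)], "chr2": [(0, 100)]}
    print(benchmark(calls, truth, regions))
```

With the toy data, the chr3 call falls outside every high-confidence interval and is ignored, leaving 2 true positives, 1 false positive and 1 false negative. Restricting both sets to the same regions keeps calls in unvalidated parts of the genome from being counted as errors.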