precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions
- Authors
Calvin Hung, Jean-Rémi Trotta, Severine Catreux, LinQi Tang, Kübra Narcı, Alexey Kolesnikov, Qian Liu, Mian Umair Ahsan, Emily Boja, Samuel T. Westreich, Özem Kalay, Andrew Carroll, Sinem Demirkaya-Budak, Ivan E. Johnson, Richard J. C. Brown, Benedict Paten, Howard Yang, Amit Jain, Genís Parra, José M. Lorenzo-Salazar, Ezekiel J. Maier, Mathieu Bourgey, ChouXian Ma, Justin Wagner, Miten Jain, Alexey Dolgoborodov, Mike Ruehle, Konstantinos Kyriakidis, David Jáspez, Omar Serang, Vladimir Semenyuk, Fritz J. Sedlazeck, Jordi Morata, Marghoob Mohiyuddin, Christian Brueffer, Chirag Jain, Li Tai Fang, Robert Eveleigh, Gunjan Baid, Sidharth Goel, Raul Tonda, Gungor Budak, Sarah H. Stephens, Maria Nattestad, ShaoWei Zhang, Deniz Turgut, Hanying Feng, Elaine Johanson, Luoqi Chen, H. Serhat Tetikol, Rami Mehio, Adrián Muñoz-Barrera, Gen Li, Guillaume Bourque, Duygu Kabakci-Zorlu, YuanPing Du, Trevor Pesout, Zhipan Li, Varun Jain, Nathan D. Olson, Sayed Mohammad Ebrahim Sahraeian, Anish G. Prasanna, Kishwar Shafin, Jennifer McDaniel, Bryan R. Lajoie, Pi-Chuan Chang, Justin M. Zook, Kai Wang, Cooper Roddey, Andigoni Malousi, Luis A. Rubio-Rodríguez, Carlos Flores, and Elif Arslan
- Subjects
FASTQ format, Identification (information), chemistry, Computer science, medicine, Benchmark (computing), Medical genetics, Nanopore sequencing, Computational biology, Genome, DNA
- Abstract
Summary: The precisionFDA Truth Challenge V2 aimed to assess the state of the art in variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new Genome in a Bottle (GIAB) benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods outperformed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
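Small-variant benchmarking of this kind is commonly summarized with precision, recall, and their harmonic mean (F1), computed overall and within each genome stratification. The sketch below shows only that arithmetic. It assumes GA4GH-style definitions and uses a single true-positive count for simplicity; comparison tools such as hap.py report truth-side and query-side true positives separately. The `Counts` and `metrics` names and the numbers are hypothetical and illustrative, not challenge results. This record does not describe the challenge's exact scoring procedure.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical comparison counts for one callset within one stratification."""
    tp: int  # benchmark variants correctly called (simplified: one TP count)
    fp: int  # calls inside benchmark regions that are not in the benchmark
    fn: int  # benchmark variants missed by the callset


def metrics(c: Counts) -> dict:
    """Return precision, recall and F1, using 0.0 when a denominator is empty."""
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative, made-up counts for two stratifications (not challenge data).
strata = {
    "all_benchmark_regions": Counts(tp=990, fp=10, fn=10),
    "difficult_to_map": Counts(tp=90, fp=10, fn=10),
}

for name, counts in strata.items():
    m = metrics(counts)
    print(f"{name}: precision={m['precision']:.3f} "
          f"recall={m['recall']:.3f} F1={m['f1']:.3f}")
```

Reporting the same metrics per stratification lets a callset's accuracy in hard regions be separated from its genome-wide accuracy. A single overall F1 would hide this difference.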
- Published
- 2020