1. Benchmarking challenging small variants with linked and long reads.
- Author
-
Wagner, Justin, Olson, Nathan, Harris, Lindsay, Khan, Ziad, Farek, Jesse, Mahmoud, Medhat, Stankovic, Ana, Kovacevic, Vladimir, Yoo, Byunggil, Miller, Neil, Rosenfeld, Jeffrey, Ni, Bohan, Zarate, Samantha, Kirsche, Melanie, Aganezov, Sergey, Schatz, Michael, Narzisi, Giuseppe, Byrska-Bishop, Marta, Clarke, Wayne, Evani, Uday, Markello, Charles, Shafin, Kishwar, Zhou, Xin, Sidow, Arend, Bansal, Vikas, Ebert, Peter, Marschall, Tobias, Lansdorp, Peter, Hanlon, Vincent, Mattsson, Carl-Adam, Barrio, Alvaro, Fiddes, Ian, Xiao, Chunlin, Fungtammasan, Arkarachai, Chin, Chen-Shan, Wenger, Aaron, Rowell, William, Sedlazeck, Fritz, Carroll, Andrew, Salit, Marc, and Zook, Justin
- Abstract
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, the new benchmark includes 92% of the autosomal GRCh38 assembly, up from 85% in the previous version, while excluding regions that are problematic for benchmarking small variants, such as copy number variants, and that should not have been included previously. The new benchmark identifies eight times more false negatives in a short read variant call set than our previous benchmark did. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.
- Published
- 2022