NCBench: providing an open, reproducible, transparent, adaptable, and continuous benchmark approach for DNA-sequencing-based variant calling [version 2; peer review: 2 approved with reservations]

Authors :
Friederike Hanssen
Gisela Gabernet
Famke Bäuerle
Bianca Stöcker
Felix Wiegand
Nicholas H. Smith
Christian Mertes
Avirup Guha Neogi
Leon Brandhoff
Anna Ossowski
Janine Altmueller
Kerstin Becker
Andreas Petzold
Marc Sturm
Tyll Stöcker
Sugirthan Sivalingam
Fabian Brand
Axel Schmidt
Andreas Buness
Alexander J. Probst
Susanne Motameny
Johannes Köster
Author Affiliations :
1 Quantitative Biology Center, Eberhard Karls University Tübingen, Tübingen, Germany
2 M3 Research Center, University Hospital, Tübingen, Germany
3 Institute for Translational Bioinformatics, University Medical Center, Tübingen, Germany
4 Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, Germany
5 Bioinformatics and Computational Oncology, Institute for Artificial Intelligence in Medicine (IKIM), University Medicine Essen, University of Duisburg-Essen, Essen, Germany
6 TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
7 Munich Data Science Institute, Technical University of Munich, Munich, Germany
8 Institute of Human Genetics, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
9 Cologne Center for Genomics, University of Cologne, Cologne, Germany
10 West German Genome Center - Cologne, University of Cologne, Cologne, Germany
11 Core Facility Genomics, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
12 Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
13 DRESDEN-concept Genome Center, TUD Dresden University of Technology, Dresden, Germany
14 Institute of Medical Genetics and Applied Genomics, University Hospital Tuebingen, Tübingen, Germany
15 Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
16 Institute of Human Genetics, Medical Faculty and University Hospital Düsseldorf, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
17 Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University of Bonn, Bonn, Germany
18 Institute of Human Genetics, University Hospital of Bonn, Bonn, Germany
19 Core Unit for Bioinformatics Analysis, University Hospital Bonn, Bonn, Germany
20 Environmental Metagenomics, Research Center One Health Ruhr, University Alliance Ruhr, Faculty of Chemistry, University of Duisburg-Essen, Essen, Germany
21 German Cancer Consortium, Essen, Germany
Source :
F1000Research. 12:1125
Publication Year :
2024
Publisher :
London, UK: F1000 Research Limited, 2024.

Abstract

We present the results of the human genomic small variant calling benchmarking initiative of the German Research Foundation (DFG) funded Next Generation Sequencing Competence Network (NGS-CN) and the German Human Genome-Phenome Archive (GHGA). In this effort, we developed NCBench, a continuous benchmarking platform for the evaluation of small genomic variant callsets in terms of recall, precision, and false positive/negative error patterns. NCBench is implemented as a continuously re-evaluated open-source repository. We show that it is possible to rely entirely on free public infrastructure (GitHub, GitHub Actions, Zenodo) in combination with established open-source tools. NCBench is agnostic to the dataset used and can evaluate an arbitrary number of given callsets, while reporting the results in a visual and interactive way. We used NCBench to evaluate over 40 callsets generated by various variant calling pipelines available in the participating groups, which were run on three exome datasets from different enrichment kits and at different coverages. While all pipelines achieve high overall quality, subtle systematic differences between callers and datasets exist and are made apparent by NCBench. These insights are useful for improving existing pipelines and developing new workflows. NCBench is meant to be open for the contribution of any given callset. Most importantly, authors will no longer need to re-implement paper-specific variant calling benchmarks each time they publish a new tool or pipeline, while readers will benefit from being able to (continuously) observe the performance of tools and pipelines at the time of reading instead of at the time of writing.

Details

ISSN :
20461402
Volume :
12
Database :
F1000Research
Journal :
F1000Research
Notes :
Revised. Amendments from Version 1: We have added three new coauthors, Felix Wiegand, Famke Bäuerle and Bianca Stöcker, who have improved the NCBench workflow and helped with the revisions (author contributions and funding sections have been updated accordingly). We have added a section about the maintenance of NCBench to the end of the "Evaluation pipeline" section in the manuscript. We have extended and clarified our statement on variant atomization (see section "variant atomization"). We have extended the "Reporting" section in the manuscript to include the F* measure that has been added to the analysis. Apart from those changes in the manuscript, various improvements to the Datavzrd-based reports on https://ncbench.github.io have been made, which are listed in detail in the response to the reviewers.
Publication Type :
Academic Journal
Accession number :
edsfor.10.12688.f1000research.140344.2
Document Type :
research-article
Full Text :
https://doi.org/10.12688/f1000research.140344.2