Valection: Design Optimization for Validation and Verification Studies
- Publication Year :
- 2018
- Publisher :
- Cold Spring Harbor Laboratory, 2018.
Abstract
- Background: Platform-specific error profiles necessitate confirmatory studies where predictions made on data generated using one technology are additionally verified by processing the same samples on an orthogonal technology. In disciplines that rely heavily on high-throughput data generation, such as genomics, reducing the impact of false positive and false negative rates in results is a top priority. However, verifying all predictions can be costly and redundant, and testing a subset of findings is often used to estimate the true error profile. To determine how to create subsets of predictions for validation that maximize inference of global error profiles, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates.
- Results: To evaluate these selection strategies, we obtained 261 sets of somatic mutation calls from a single-nucleotide variant caller benchmarking challenge where 21 teams competed on whole-genome sequencing datasets of three computationally simulated tumours. By using synthetic data, we had complete ground truth of the tumours' mutations and, therefore, we were able to accurately determine how estimates from the selected subset of verification candidates compared to the complete prediction set. We found that selection strategy performance depends on several verification study characteristics. In particular, the verification budget of the experiment (i.e., how many candidates can be selected) influences estimates.
- Conclusions: The Valection framework is flexible, allowing for the implementation of additional selection algorithms in the future. Its applicability extends to any discipline that relies on experimental verification and will benefit from the optimization of verification candidate selection.
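The idea of estimating a global error profile from a verified subset can be illustrated with a minimal sketch. This is not Valection's code or API. The function names, the uniform random selection strategy, and the toy data are all assumptions made for illustration. The sketch draws a subset of predictions of a given budget, estimates precision from that subset alone, and compares the estimate with the precision of the full prediction set, which is only computable here because the toy ground truth is fully known (as with simulated tumours).

```python
import random


def select_candidates(predictions, budget, seed=0):
    """Uniformly sample up to `budget` predictions for verification.

    Hypothetical strategy for illustration; the abstract does not
    specify which selection strategies Valection implements.
    """
    rng = random.Random(seed)
    pool = sorted(predictions)  # fixed order so the seed is reproducible
    return rng.sample(pool, min(budget, len(pool)))


def precision(calls, truth):
    """Fraction of calls that appear in the ground-truth set."""
    calls = list(calls)
    if not calls:
        return float("nan")
    return sum(1 for c in calls if c in truth) / len(calls)


# Toy data (assumed): 100 predicted positions, 80 of them true mutations.
truth = set(range(0, 80))
predictions = set(range(0, 100))

full = precision(predictions, truth)  # precision of the complete set

# Compare subset-based estimates against the full-set value per budget.
for budget in (10, 50, 100):
    subset = select_candidates(predictions, budget)
    estimate = precision(subset, truth)
    print(budget, round(estimate, 3), round(abs(estimate - full), 3))
```

When the budget covers every prediction, the estimate equals the full-set precision. With smaller budgets the estimate depends on which candidates happen to be drawn, which is one simple way to see why the verification budget can influence estimates.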
- Subjects :
- 0303 health sciences
business.industry
Test data generation
Computer science
Inference
Benchmarking
computer.software_genre
Set (abstract data type)
03 medical and health sciences
0302 clinical medicine
Software
Data mining
business
computer
030217 neurology & neurosurgery
Selection (genetic algorithm)
030304 developmental biology
Verification and validation
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....aad630e2a5cc3a94360e28f05ac43525
- Full Text :
- https://doi.org/10.1101/254839