51. Don’t split your data
- Author
Maria Feychting, Henrik Källberg, Anders Ahlbom, and Lars Alfredsson
- Subjects
False positive finding, Epidemiology, business.industry, Bayesian probability, Bayes Theorem, Sample (statistics), Set (abstract data type), Bayes' theorem, Bias, Full data, Prior probability, Statistics, Humans, Medicine, Statistical analysis, business, Genome-Wide Association Study
- Abstract
False positive findings are a common problem in whole-genome association studies. In this commentary we show that nothing is gained by randomly splitting a data sample into two equal-sized subsets, where the first subset is used for exploratory purposes and the second is used to confirm the findings from the first. We compare the random splitting procedure with analysis of the full data sample from a Bayesian perspective, taking into account the prior probability of a false positive finding.
- Published
- 2010
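The comparison described in the abstract can be illustrated with a small numerical sketch. This is not the authors' calculation: it assumes a one-sided z-test on a standardized effect, a split design in which a finding counts only if both halves are significant at the same level, and a full-sample test run at the same overall type I error. All function names, the effect size, the sample size, and the prior are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_one_sided_z(effect, n, alpha):
    """Power of a one-sided z-test (assumed test, not taken from the paper).

    `effect` is a standardized effect size and `n` the number of observations.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - _STD_NORMAL.cdf(z_crit - effect * n ** 0.5)


def false_positive_probability(alpha, power, prior_null):
    """P(no true association | significant result), by Bayes' theorem."""
    false_pos = alpha * prior_null
    true_pos = power * (1 - prior_null)
    return false_pos / (false_pos + true_pos)


def compare_designs(effect=0.1, n=2000, alpha_stage=0.01, prior_null=0.999):
    """Compare a split-sample design with a full-sample analysis.

    All default values are illustrative assumptions.
    """
    # Split design: explore in one half, confirm in the other.
    # A finding is reported only if both independent halves are significant.
    alpha_split = alpha_stage ** 2
    power_split = power_one_sided_z(effect, n // 2, alpha_stage) ** 2

    # Full-sample design, held to the same overall type I error.
    alpha_full = alpha_split
    power_full = power_one_sided_z(effect, n, alpha_full)

    return {
        "power_split": power_split,
        "power_full": power_full,
        "fp_prob_split": false_positive_probability(alpha_split, power_split, prior_null),
        "fp_prob_full": false_positive_probability(alpha_full, power_full, prior_null),
    }


if __name__ == "__main__":
    for name, value in compare_designs().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, with the type I error held equal, the full-sample test has the higher power, so a significant result from it is less likely to be a false positive than one that passed both halves of the split.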