statSuma: automated selection and performance of statistical comparisons for microbiome studies
- Author
R. J. Leigh, Richard Murphy, and Fiona Walsh
- Subjects
Computer science, Variance, Python (programming language), Data structure, Machine learning, Plot (graphics), Software, Pairwise comparison, Artificial intelligence, Statistical hypothesis testing
- Abstract
There is a reproducibility crisis in scientific research. Part of it arises from applying statistical tests to data that violate the tests' assumptions: data that do not follow the assumed distribution, that have unequal variances between groups, or that have very small sample sizes. Because determining which test is most appropriate for all data in a multicategorical study (such as comparing taxa between sites in microbiome studies) is difficult, we present statSuma, an interactive Python notebook that can be run from any desktop computer using the Google Colaboratory web service and does not require the user to have any programming experience. The software assesses the underlying data structures in a given dataset to advise which pairwise or listwise statistical procedure is best suited to all of the data. As some users may want to mine specific trends further, statSuma performs five different two-tailed pairwise tests (Student’s t-test, Welch’s t-test, Mann-Whitney U-test, Brunner-Munzel test, and a pairwise Kruskal-Wallis H-test) and advises the best test for each comparison. It also advises whether ANOVA or a multicategorical Kruskal-Wallis H-test is more appropriate for a given dataset and performs both procedures. A plot of the data distribution against a Gaussian distribution is produced for each taxon at each site, and a variance plot is produced for every combination of two taxa at each site, so that the outcomes of the normality and equal-variance tests can be confirmed visually alongside the associated test statistics.
- Published
- 2021