Back to Search
Start Over
Improving Pearson's chi-squared test: hypothesis testing of distributions -- optimally
- Publication Year :
- 2023
-
Abstract
- Pearson's chi-squared test, from 1900, is the standard statistical tool for "hypothesis testing on distributions": namely, given samples from an unknown distribution $Q$ that may or may not equal a hypothesis distribution $P$, we want to return "yes" if $P=Q$ and "no" if $P$ is far from $Q$. While the chi-squared test is easy to use, it has been known for a while that it is not "data efficient", it does not make the best use of its data. Precisely, for accuracy $\epsilon$ and confidence $\delta$, and given $n$ samples from the unknown distribution $Q$, a tester should return "yes" with probability $>1-\delta$ when $P=Q$, and "no" with probability $>1-\delta$ when $|P-Q|>\epsilon$. The challenge is to find a tester with the \emph{best} tradeoff between $\epsilon$, $\delta$, and $n$. We introduce a new tester, efficiently computable and easy to use, which we hope will replace the chi-squared tester in practical use. Our tester is found via a new non-convex optimization framework that essentially seeks to "find the tester whose Chernoff bounds on its performance are as good as possible". This tester is $1+o(1)$ optimal, in that the number of samples $n$ needed by the tester is within $1+o(1)$ factor of the samples needed by \emph{any} tester, even non-linear testers (for the setting: accuracy $\epsilon$, confidence $\delta$, and hypothesis $P$). We complement this algorithmic framework with matching lower bounds saying, essentially, that "our tester is instance-optimal, even to $1+o(1)$ factors, to the degree that Chernoff bounds are tight". Our overall non-convex optimization framework extends well beyond the current problem and is of independent interest.
- Subjects :
- Mathematics - Statistics Theory
Subjects
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2310.09408
- Document Type :
- Working Paper