1. Knockoffs for exchangeable categorical covariates
- Author
Dreassi, Emanuela, Pratelli, Luca, and Rigo, Pietro
- Subjects
Mathematics - Statistics Theory, Statistics - Methodology, 62E10, 62H05, 60E05, 62J02
- Abstract
Let $X=(X_1,\ldots,X_p)$ be a $p$-variate random vector and $F$ a fixed finite set. In a number of applications, mainly in genetics, $X_i\in F$ for each $i=1,\ldots,p$. Despite this, to obtain a knockoff $\widetilde{X}$ (in the sense of \cite{CFJL18}), $X$ is usually modeled as an absolutely continuous random vector. While understandable from the point of view of applications, this approximate procedure does not make sense theoretically, since $X$ is supported by the finite set $F^p$. In this paper, explicit formulae for the joint distribution of $(X,\widetilde{X})$ are provided when $P(X\in F^p)=1$ and $X$ is exchangeable or partially exchangeable. In fact, when $X_i\in F$ for all $i$, there seem to be various reasons for assuming that $X$ is exchangeable or partially exchangeable. The robustness of $\widetilde{X}$ with respect to the de Finetti measure $\pi$ of $X$ is investigated as well. Let $\mathcal{L}_\pi(\widetilde{X}\mid X=x)$ denote the conditional distribution of $\widetilde{X}$, given $X=x$, when the de Finetti measure is $\pi$. It is shown that $$\lVert\mathcal{L}_{\pi_1}(\widetilde{X}\mid X=x)-\mathcal{L}_{\pi_2}(\widetilde{X}\mid X=x)\rVert\le c(x)\,\lVert\pi_1-\pi_2\rVert$$ where $\lVert\cdot\rVert$ is the total variation distance and $c(x)$ is a suitable constant. Finally, a numerical experiment is performed. Overall, the knockoffs of this paper outperform the alternatives (i.e., the knockoffs obtained by giving $X$ an absolutely continuous distribution) as regards the false discovery rate but are slightly weaker in terms of power.
- Comment
Submitted paper. 24 pages, 5 figures.
- Published
- 2024