Controlling the False Split Rate in Tree-Based Aggregation.

Authors :: Shao, Simeng
Bien, Jacob
Javanmard, Adel
Source :: Journal of the American Statistical Association. Jul2024, p1-22. 22p. 7 Illustrations.
Publication Year :: 2024
Abstract: In many domains, data measurements can naturally be associated with the leaves of a tree, expressing the relationships among these measurements. For example, companies belong to industries, which in turn belong to ever coarser divisions such as sectors; microbes are commonly arranged in a taxonomic hierarchy from species to kingdoms; street blocks belong to neighborhoods, which in turn belong to larger-scale regions. The problem of tree-based aggregation that we consider in this paper asks which of these tree-defined subgroups of leaves should really be treated as a single entity and which of these entities should be distinguished from each other. We introduce the false split rate, an error measure that describes the degree to which subgroups have been split when they should not have been. While expressible as the false discovery rate in a special case, we show that these measures can be quite different for the general tree structures common in our setting. We then propose a multiple hypothesis testing algorithm for tree-based aggregation, which we prove controls this error measure. We focus on two main examples of tree-based aggregation, one which involves aggregating means and the other which involves aggregating regression coefficients. [ABSTRACT FROM AUTHOR]