Back to Search Start Over

Approximating snowflake metrics by trees

Authors :
William Leeb
Source :
Applied and Computational Harmonic Analysis. 45:405-424
Publication Year :
2018
Publisher :
Elsevier BV, 2018.

Abstract

Tree metrics are encountered throughout pure and applied mathematics. Their simple structure makes them a convenient choice of metric in many applications from machine learning and computer science. At the same time, there is an elegant theory of harmonic analysis with respect to tree metrics that parallels the classical theory. A basic question in this field, which is of both theoretical and practical interest, is how to design efficient algorithms for building trees with good metric properties. In particular, given a finite metric space, we seek a random family of dominating tree metrics approximating the underlying metric in expectation. For general metrics, this problem has been solved: on the one hand, there are finite metric spaces that cannot be approximated by trees without incurring a distortion logarithmic in the size of the space, while the tree construction of Fakcharoenphol, Rao, and Talwar (FRT, 2003) shows how to achieve such a logarithmic error for arbitrary metrics. Since a distortion that grows even logarithmically with the size of the set may be too large for practical use in many settings, one naturally asks if there is a more restricted class of metrics where one can do better. The main result of this paper is that certain random family of trees already studied in the computer science literature, including the FRT trees, can be used to approximate snowflake metrics (metrics raised to a power less than 1) with expected distortion bounded by its doubling dimension and the degree of snowflaking. We also show that without snowflaking, the metric distortion can be bounded by a term logarithmic in the distance being approximated and linear in the dimension. We also present an optimal algorithm for building a single FRT tree, whose running time is bounded independently of all problem parameters other than the number of points. We conclude by demonstrating our theoretical results on a numerical example, and applying them to the approximation of the Earth Mover's Distance between probability distributions.

Details

ISSN :
10635203
Volume :
45
Database :
OpenAIRE
Journal :
Applied and Computational Harmonic Analysis
Accession number :
edsair.doi...........ab0f31a54fbc5a84d55b72f93ac41c86