A fast diagonal distance metric learning approach for large-scale datasets

Authors :
Yi Peng
Gang Kou
Tie Li
Philip S. Yu
Source :
Information Sciences. 571:225-245
Publication Year :
2021
Publisher :
Elsevier BV, 2021.

Abstract

Distance metric learning (DML) aims to learn distance metrics that reflect the interactions between features and labels. Due to the high computational complexity, existing DML models are unsuitable for large-scale datasets. This study proposes a DML approach for large-scale problems by reducing the number of variables, utilizing sparse structures of the optimization problems, and taking advantage of large-scale computation platforms. The proposed approach treats DML as a linear space transformation problem and suggests that a full DML matrix can be approximated by a diagonal matrix in many cases. We solve the diagonal DML problem along with its ℓ₁ and ℓ₂ regularizations via linear and quadratic programming. To facilitate large-scale learning problems, we design a MapReduce framework to build triplets, which are encapsulations of triple data points used for the optimization problem, and develop a weighting mechanism for triplets according to their contributions to the whole distance distortion. Experiments show that the proposed approach is fast in large-scale DML applications with comparable accuracy to much more time-consuming full matrix models. Since the approach is implemented with the Scala language based on the Spark platform, it can be used directly by productive Java applications, which makes it highly practical for large-scale datasets.
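
As a rough illustration of the diagonal-metric idea in the abstract, the following Python sketch computes a distance with one non-negative weight per feature and checks a single triplet against it. It is not the authors' Scala/Spark implementation. The function names, the margin value, and the hinge-style triplet constraint (an anchor should be closer to its similar point than to its dissimilar point by at least a margin) are assumptions chosen for illustration, since the abstract does not state the exact formulation.

```python
from typing import Sequence


def diagonal_distance(x: Sequence[float], y: Sequence[float],
                      w: Sequence[float]) -> float:
    """Squared distance under a diagonal metric: sum_k w_k * (x_k - y_k)^2."""
    return sum(wk * (xk - yk) ** 2 for xk, yk, wk in zip(x, y, w))


def triplet_violation(anchor: Sequence[float], similar: Sequence[float],
                      dissimilar: Sequence[float], w: Sequence[float],
                      margin: float = 1.0) -> float:
    """Hinge slack for one triplet (assumed form, not taken from the paper).

    Zero when the dissimilar point is farther from the anchor than the
    similar point by at least `margin`; positive otherwise.
    """
    d_sim = diagonal_distance(anchor, similar, w)
    d_dis = diagonal_distance(anchor, dissimilar, w)
    return max(0.0, margin + d_sim - d_dis)


# Toy triplet: the similar point differs only in feature 1,
# the dissimilar point differs only in feature 0.
anchor = [0.0, 0.0]
similar = [0.0, 2.0]
dissimilar = [1.0, 0.0]

uniform = [1.0, 1.0]      # plain squared Euclidean distance
reweighted = [4.0, 0.25]  # hypothetical learned diagonal weights

v_uniform = triplet_violation(anchor, similar, dissimilar, uniform)
v_reweighted = triplet_violation(anchor, similar, dissimilar, reweighted)
print(v_uniform, v_reweighted)
```

With uniform weights the triplet is violated, while the reweighted diagonal satisfies it. Because each violation term is linear in the weights, a problem of this shape can be posed as a linear program over the diagonal entries, which is consistent with the abstract's mention of linear and quadratic programming.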

Details

ISSN :
0020-0255
Volume :
571
Database :
OpenAIRE
Journal :
Information Sciences
Accession number :
edsair.doi...........067781d01dc23a68ceb9b20fd72e1297
Full Text :
https://doi.org/10.1016/j.ins.2021.04.077