A fast diagonal distance metric learning approach for large-scale datasets
- Source :
- Information Sciences. 571:225-245
- Publication Year :
- 2021
- Publisher :
- Elsevier BV, 2021.
Abstract
- Distance metric learning (DML) aims to learn distance metrics that reflect the interactions between features and labels. Due to the high computational complexity, existing DML models are unsuitable for large-scale datasets. This study proposes a DML approach for large-scale problems by reducing the number of variables, utilizing sparse structures of the optimization problems, and taking advantage of large-scale computation platforms. The proposed approach treats DML as a linear space transformation problem and suggests that a full DML matrix can be approximated by a diagonal matrix in many cases. We solve the diagonal DML problem along with its l1 and l2 regularizations via linear and quadratic programming. To facilitate large-scale learning problems, we design a MapReduce framework to build triplets, which are encapsulations of triple data points used for the optimization problem, and develop a weighting mechanism for triplets according to their contributions to the whole distance distortion. Experiments show that the proposed approach is fast in large-scale DML applications with comparable accuracy to much more time-consuming full matrix models. Since the approach is implemented with the Scala language based on the Spark platform, it can be used directly by productive Java applications, which makes it highly practical for large-scale datasets.
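The idea of a diagonal metric learned from triplets can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract describes linear and quadratic programming on Spark, whereas this sketch assumes a plain projected subgradient method on a hinge loss with an l1 penalty as a stand-in solver, builds triplets by brute force rather than with MapReduce, and omits the triplet weighting mechanism. All function names and parameters (`build_triplets`, `learn_diagonal_metric`, `l1`, `lr`, `epochs`) are hypothetical.

```python
import random

def build_triplets(X, y):
    """Brute-force (anchor, same-label, different-label) index triples."""
    n = len(X)
    return [(i, j, l)
            for i in range(n)
            for j in range(n) if j != i and y[j] == y[i]
            for l in range(n) if y[l] != y[i]]

def sq_diff(a, b):
    """Per-feature squared differences between two points."""
    return [(p - q) ** 2 for p, q in zip(a, b)]

def diag_distance(w, a, b):
    """Distance under a diagonal metric: sum_k w_k * (a_k - b_k)^2."""
    return sum(wk * d for wk, d in zip(w, sq_diff(a, b)))

def learn_diagonal_metric(X, y, l1=0.01, lr=0.05, epochs=200, seed=0):
    """Assumed stand-in solver: projected subgradient descent on
    sum of hinge(1 - (d(anchor, far) - d(anchor, near))) + l1 * sum(w),
    with w kept non-negative by clipping at zero after each step."""
    rng = random.Random(seed)
    dim = len(X[0])
    w = [1.0] * dim
    triplets = build_triplets(X, y)
    for _ in range(epochs):
        rng.shuffle(triplets)
        for i, j, l in triplets:
            near = sq_diff(X[i], X[j])   # anchor vs. same-label point
            far = sq_diff(X[i], X[l])    # anchor vs. different-label point
            margin = sum(wk * (f - m) for wk, f, m in zip(w, far, near))
            violated = margin < 1.0
            for k in range(dim):
                grad = l1                # subgradient of the l1 penalty (w >= 0)
                if violated:
                    grad += near[k] - far[k]
                w[k] = max(0.0, w[k] - lr * grad)
    return w

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.0], [0.1, 1.0], [0.0, 2.0],
     [1.0, 0.1], [0.9, 1.1], [1.0, 2.1]]
y = [0, 0, 0, 1, 1, 1]
w = learn_diagonal_metric(X, y)
```

Because only one weight per feature is learned, the number of variables grows linearly with the dimension instead of quadratically, which is the reduction the abstract refers to.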
- Subjects :
- Information Systems and Management
Optimization problem
Computational complexity theory
Computer science
Linear space
05 social sciences
Diagonal
050301 education
02 engineering and technology
Computer Science Applications
Theoretical Computer Science
Matrix (mathematics)
Artificial Intelligence
Control and Systems Engineering
Spark (mathematics)
Diagonal matrix
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Quadratic programming
0503 education
Algorithm
Computer Science::Databases
Software
Details
- ISSN :
- 00200255
- Volume :
- 571
- Database :
- OpenAIRE
- Journal :
- Information Sciences
- Accession number :
- edsair.doi...........067781d01dc23a68ceb9b20fd72e1297
- Full Text :
- https://doi.org/10.1016/j.ins.2021.04.077