AnalyticDB-V
- Source :
- Proceedings of the VLDB Endowment. 13:3152-3165
- Publication Year :
- 2020
- Publisher :
- Association for Computing Machinery (ACM), 2020.
- Abstract :
- With the explosive growth of unstructured data (such as images, video, and audio), unstructured data analytics has become widespread across a broad range of real-world applications. Many database systems have started to incorporate unstructured data analysis to meet this demand. However, most systems treat queries over unstructured and structured data as disjoint tasks, and hybrid queries (i.e., those involving both data types) are not yet fully supported. In this paper, we present a hybrid analytic engine developed at Alibaba, named AnalyticDB-V (ADBV), to meet these emerging demands. ADBV offers an interface that lets users express hybrid queries using SQL semantics by converting unstructured data to high-dimensional vectors. ADBV adopts the lambda framework and leverages approximate nearest neighbor search (ANNS) techniques to support hybrid data analytics. Moreover, a novel ANNS algorithm is proposed to improve accuracy on large-scale vectors representing massive unstructured data. All ANNS algorithms are implemented as physical operators in ADBV, and accuracy-aware cost-based optimization techniques are proposed to identify effective execution plans. Experimental results on both public and in-house datasets show the superior performance achieved by ADBV and its effectiveness. ADBV has been successfully deployed on Alibaba Cloud to provide hybrid query processing services for various real-world applications.
- Subjects :
- SQL
business.industry
Computer science
Nearest neighbor search
Interface (computing)
General Engineering
Cloud computing
Unstructured data
02 engineering and technology
Semantics
computer.software_genre
Data type
Analytics
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
business
computer
computer.programming_language
Details
- ISSN :
- 21508097
- Volume :
- 13
- Database :
- OpenAIRE
- Journal :
- Proceedings of the VLDB Endowment
- Accession number :
- edsair.doi...........ed5186fa7fc2783f0e9bea34b5aaa405