Back to Search Start Over

Efficient external sorting in DuckDB

Authors :
Kuiper, L.N. (Laurens)
Raasveldt, M. (Mark)
Mühleisen, H.F. (Hannes)
Kuiper, L.N. (Laurens)
Raasveldt, M. (Mark)
Mühleisen, H.F. (Hannes)
Publication Year :
2022

Abstract

Interactive data analysis is often conveniently done on personal computers that have limited memory. Current analytical data management systems rely almost exclusively on main memory for computation. When the data size exceeds the memory limit, many systems cannot complete queries or resort to an external execution strategy that assumes a high I/O cost. These strategies are often much slower than the in-memory strategy. However, I/O cost has gone down: Most modern laptops have fast NVMe storage. We believe that the difference between in-memory and external does not have to be this big. We implement a parallel external sorting operator in DuckDB that demonstrates this. Experimental results with our implementation show that even when the data size far exceeds the memory size, the performance loss is negligible. From this result, we conclude that it is possible to have a graceful degradation from in-memory to external sorting.

Details

Database :
OAIster
Notes :
English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1342652518
Document Type :
Electronic Resource