Parallel membership queries on very large scientific data sets using bitmap indexes

Authors :
Yildiz, B
Wu, K
Byna, S
Shoshani, A
Source :
Concurrency and Computation: Practice and Experience; vol 31, iss 15, e5157; ISSN 1532-0626
Publication Year :
2019

Abstract

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing such massive quantities of scientific data is challenging because the data are often stored in formatted files, such as HDF5 and NetCDF, that do not offer adequate search capabilities. In this research, we investigated a special class of search, called a membership query, which identifies whether the values of an attribute are members of a queried set of elements. Attributes with natural classification values appear frequently, both in scientific domains (e.g., category and object type) and in daily life (e.g., zip code and occupation). Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set is challenging. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Owing to compression algorithms such as Word-Aligned Hybrid (WAH), bitmap indexing provides high performance not only for low-cardinality attributes but also for high-cardinality attributes, such as floating-point variables, electric charge, or momentum in a particle physics data set. We conducted experiments in a highly parallelized environment on data obtained from a particle accelerator model and on a synthetic data set.
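The abstract combines bitmap indexing with parallelism to answer membership queries. Below is a minimal Python sketch of that combination, not the authors' implementation. It assumes a membership query means selecting the rows whose attribute value lies in a query set (an SQL-style `IN (...)` predicate). It builds an uncompressed equality-encoded bitmap index (one bitmap per distinct value, with Python integers used as bitsets) and answers the query by ORing the bitmaps of the queried values. Parallelism is modeled by splitting the column into contiguous partitions, each with its own index, and querying them concurrently with a thread pool. All function names, the partitioning scheme, and the toy column are hypothetical, and WAH compression is not implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical alias: maps each distinct attribute value to its bitmap.
BitmapIndex = Dict[int, int]


def build_bitmap_index(column: Sequence[int]) -> BitmapIndex:
    """Equality-encoded bitmap index: bit i of index[v] is set iff column[i] == v.

    Python ints act as uncompressed bitsets here; a real system would
    compress each bitmap (e.g., with WAH).
    """
    index: BitmapIndex = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def membership_query(index: BitmapIndex, query_set: Iterable[int]) -> int:
    """Rows whose value is in query_set: the bitwise OR of those values' bitmaps."""
    result = 0
    for value in query_set:
        result |= index.get(value, 0)  # values absent from the column add nothing
    return result


def bitmap_to_rows(bitmap: int) -> List[int]:
    """Decode a bitmap into the ascending list of set bit positions."""
    rows: List[int] = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


def build_partitioned_index(column: Sequence[int], num_parts: int) -> List[Tuple[int, BitmapIndex]]:
    """Split the column into contiguous partitions, each with its own index.

    Returns (row_offset, index) pairs; the offset maps local rows to global ones.
    """
    part_size = max(1, -(-len(column) // num_parts))  # ceiling division
    return [
        (start, build_bitmap_index(column[start:start + part_size]))
        for start in range(0, len(column), part_size)
    ]


def parallel_membership_query(
    partitions: List[Tuple[int, BitmapIndex]], query_set: Iterable[int], max_workers: int = 4
) -> List[int]:
    """Query every partition concurrently and merge the global row ids."""
    values = list(query_set)  # materialize once so every worker sees the same values

    def query_part(part: Tuple[int, BitmapIndex]) -> List[int]:
        offset, index = part
        return [offset + r for r in bitmap_to_rows(membership_query(index, values))]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so the merged list stays sorted.
        results = list(pool.map(query_part, partitions))
    return [row for part_rows in results for row in part_rows]


if __name__ == "__main__":
    column = [3, 7, 3, 1, 9, 7, 2, 3]
    print(bitmap_to_rows(membership_query(build_bitmap_index(column), {3, 9})))
    # prints [0, 2, 4, 7]
    print(parallel_membership_query(build_partitioned_index(column, 3), {3, 9}))
    # prints [0, 2, 4, 7]
```

Each partition answers the query using only its own bitmaps, so partitions need no coordination until their row lists are concatenated. A real deployment on large data would more likely place partitions in separate processes or compute nodes, since CPython threads do not execute CPU-bound code in parallel, and would keep the bitmaps compressed.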

Details

Database :
OAIster
Publication Type :
Electronic Resource
Accession number :
edsoai.on1287307790
Document Type :
Electronic Resource