Parallel membership queries on very large scientific data sets using bitmap indexes
- Source :
- Concurrency and Computation: Practice and Experience, vol. 31, iss. 15, e5157; ISSN 1532-0626
- Publication Year :
- 2019
Abstract
- Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging because the data are often stored in specialized file formats, such as HDF5 and NetCDF, that do not offer adequate search capabilities. In this research, we investigated a special class of search capability, called a membership query, which determines whether the elements of a queried set occur among the values of an attribute. Attributes that naturally take classification values appear frequently in scientific domains (for example, category and object type) as well as in daily life (for example, zip code and occupation). Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set is challenging. To overcome these challenges, we applied bitmap indexing and parallelization to membership queries. Owing to compression algorithms such as Word-Aligned Hybrid (WAH), bitmap indexing performs well not only for low-cardinality attributes but also for high-cardinality attributes, such as floating-point variables like the electric charge or momentum in a particle physics data set. We conducted experiments in a highly parallel environment on data obtained from a particle accelerator model and on a synthetic data set.
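- To make the idea of a bitmap-index membership query concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes an uncompressed bitmap per distinct attribute value (a Python int used as a bit set), answers a membership query by OR-ing the bitmaps of the queried values, and parallelizes by splitting rows into contiguous partitions that are indexed and queried independently. All function names and the partitioning scheme are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Sequence, Set


def build_bitmap_index(column: Sequence[int]) -> Dict[int, int]:
    """Map each distinct value to a bitmap (a Python int) whose bit i is set
    when row i holds that value. Uncompressed, for illustration only."""
    index: Dict[int, int] = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def membership_query(index: Dict[int, int], queried: Iterable[int]) -> int:
    """Return a bitmap of rows whose value is in `queried`: the bitwise OR
    of the bitmaps of every queried value present in the index."""
    result = 0
    for value in set(queried):
        result |= index.get(value, 0)  # values absent from the column add nothing
    return result


def bitmap_to_rows(bitmap: int, offset: int = 0) -> List[int]:
    """Decode the set bits of `bitmap` into row ids, shifted by `offset`."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row + offset)
        bitmap >>= 1
        row += 1
    return rows


def parallel_membership_query(column: Sequence[int], queried: Set[int],
                              num_partitions: int = 4) -> List[int]:
    """Hypothetical parallel scheme: split rows into contiguous partitions,
    index and query each partition independently, then concatenate hits."""
    size = max(1, -(-len(column) // num_partitions))  # ceiling division
    starts = range(0, len(column), size)

    def work(start: int) -> List[int]:
        part = column[start:start + size]
        bitmap = membership_query(build_bitmap_index(part), queried)
        return bitmap_to_rows(bitmap, offset=start)

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(work, starts)  # map preserves partition order
    return [row for rows in results for row in rows]


if __name__ == "__main__":
    object_type = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    print(parallel_membership_query(object_type, {1, 5, 7}))  # [1, 3, 4, 8]
```

Because each partition's row ids are shifted by its starting offset and `Executor.map` returns results in submission order, concatenating partition results yields globally ordered row ids. Python threads are used only to keep the sketch self-contained; a real deployment would more likely use processes or MPI ranks, and would store each bitmap compressed (for example with WAH, which the abstract credits for good performance on high-cardinality attributes) rather than as an uncompressed integer.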
Details
- Database :
- OAIster
- Journal :
- Concurrency and Computation: Practice and Experience, vol. 31, iss. 15, e5157; ISSN 1532-0626
- Publication Type :
- Electronic Resource
- Accession number :
- edsoai.on1287307790
- Document Type :
- Electronic Resource