A Theoretical Model for Estimating Entity Resolution Costs in Cloud Computing Environments
- Source :
- Discrete Mathematics and Theoretical Computer Science (DMTCS), In press, 1
- Publication Year :
- 2018
- Publisher :
- HAL CCSD, 2018.
Abstract
- Entity resolution (ER) is the task of identifying duplicate entities in one or more datasets. In the era of Big Data, this task has attracted considerable attention because the problem is intrinsically quadratic in the size of the dataset. In practice, the task can be outsourced to a cloud service, so a service customer may want to estimate the cost of an ER solution before executing it. Because the execution time of an ER solution depends on a combination of algorithms, their parameter values, and the cloud infrastructure employed, an a priori estimate of the infrastructure cost of an ER task is hard to obtain. Besides estimating customer costs, estimating ER costs is also important for evaluating whether a set of ER parameter values can be used to execute a task within predefined time and budget restrictions. To tackle these challenges, we formalize the problem of estimating ER costs, taking into account the main parameters that may influence the execution time of the ER task. We also propose an algorithm, called TBF, for evaluating the feasibility of ER parameter values given a set of predefined customer restrictions. Since the efficacy of the proposed algorithm is strongly tied to the accuracy of the theoretical estimates of ER costs, we also present a number of guidelines that can be explored to further improve the efficacy of the proposed model.
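As a rough illustration of the kind of feasibility question the abstract describes, the sketch below counts the pairwise comparisons of a naive ER pass, n(n-1)/2, and checks an estimated time and cost against customer restrictions. It is not the paper's TBF algorithm or cost model: the function names, the per-comparison time, the flat per-machine-hour price, and the assumption of perfectly parallel work are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Restrictions:
    """Hypothetical customer restrictions (not the paper's notation)."""
    max_hours: float  # time limit for the whole ER task
    max_cost: float   # budget limit in arbitrary currency units


def pairwise_comparisons(n: int) -> int:
    """Record pairs compared by a naive ER pass over n records: n(n-1)/2."""
    return n * (n - 1) // 2


def estimate(n, secs_per_comparison, machines, price_per_machine_hour):
    """Return (hours, cost), assuming work splits evenly across machines
    and every machine is billed for the full duration (both assumptions)."""
    total_secs = pairwise_comparisons(n) * secs_per_comparison
    hours = total_secs / machines / 3600
    cost = hours * machines * price_per_machine_hour
    return hours, cost


def is_feasible(n, secs_per_comparison, machines, price_per_machine_hour, r):
    """True if the estimated time and cost both meet the restrictions."""
    hours, cost = estimate(n, secs_per_comparison, machines,
                           price_per_machine_hour)
    return hours <= r.max_hours and cost <= r.max_cost
```

Under these assumptions, adding machines shortens the estimated time but leaves the estimated cost unchanged, so whether a configuration is feasible depends on which restriction is binding.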
Details
- Language :
- English
- ISSN :
- 1462-7264 and 1365-8050
- Database :
- OpenAIRE
- Journal :
- Discrete Mathematics and Theoretical Computer Science (DMTCS), In press, 1
- Accession number :
- edsair.dedup.wf.001..d6f3348ef2605a34d7ee82287b87827f