1. ERABQS: entity resolution based on active machine learning and balancing query strategy.
- Author
- Mourad, Jabrane; Hiba, Tabbaa; Yassir, Rochd; and Imad, Hafidi
- Abstract
Entity Resolution (ER) is a crucial process in data management and integration. Its primary goal is to identify different profiles (or records) that refer to the same real-world entity across databases. The main challenge is that labeling a large sample of profiles can be very expensive and time-consuming. Active Machine Learning (ActiveML) addresses this issue by selecting the most representative or informative profile pairs to be labeled. Informativeness is determined by a pair's capacity to reduce the model's uncertainty, whereas representativeness evaluates whether a selected instance effectively reflects the overall input patterns of the unlabeled data. Traditional ActiveML techniques typically rely on a single strategy, which may severely restrict the performance of the ActiveML process and lead to slow convergence, especially in ER problems that lack initial training data. In this paper, we overcome this issue by proposing an approach for balancing the two strategies. The implemented solution, named EBEES (Epsilon-based Balancing Exploration and Exploitation Strategy), has two variations: Adaptive-ϵ and ϵ-decreasing. We evaluated EBEES on twelve datasets. Compared with state-of-the-art methods, without any initial training data, EBEES showed enhanced performance in terms of F1-score, model stability, and rapid convergence.
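The balancing idea in the abstract can be illustrated with a minimal sketch of an ϵ-decreasing query selector. This is not the authors' EBEES implementation: the abstract does not give the scoring functions, the decay schedule, or which strategy counts as exploration, so everything below is an assumption made for illustration. The names `uncertainty`, `representativeness`, `epsilon_decreasing`, and `select_pair`, the geometric decay with `eps0`, `decay`, and `eps_min`, and the mapping of representativeness to exploration and informativeness to exploitation are all hypothetical. The Adaptive-ϵ variant is not sketched because the abstract does not describe how ϵ is adapted.

```python
import math
import random


def uncertainty(p):
    """Informativeness proxy (assumed): highest when the predicted
    match probability p is closest to 0.5."""
    return 1.0 - abs(2.0 * p - 1.0)


def representativeness(i, features):
    """Representativeness proxy (assumed): mean similarity of candidate i
    to every other unlabeled candidate, using 1 / (1 + Euclidean distance)."""
    others = [f for j, f in enumerate(features) if j != i]
    if not others:
        return 0.0
    return sum(1.0 / (1.0 + math.dist(features[i], f)) for f in others) / len(others)


def epsilon_decreasing(round_idx, eps0=1.0, decay=0.9, eps_min=0.05):
    """Hypothetical schedule: epsilon shrinks geometrically per labeling round
    and never drops below eps_min."""
    return max(eps_min, eps0 * decay ** round_idx)


def select_pair(probs, features, eps, rng):
    """Pick the index of the next profile pair to send for labeling.

    With probability eps, explore: take the most representative pair.
    Otherwise, exploit: take the pair the model is least certain about.
    """
    candidates = range(len(probs))
    if rng.random() < eps:
        return max(candidates, key=lambda i: representativeness(i, features))
    return max(candidates, key=lambda i: uncertainty(probs[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy unlabeled pool: predicted match probabilities and 1-D pair features.
    probs = [0.9, 0.1, 0.55, 0.99]
    features = [(0.0,), (1.0,), (2.0,), (10.0,)]
    for round_idx in range(5):
        eps = epsilon_decreasing(round_idx)
        print(round_idx, round(eps, 3), select_pair(probs, features, eps, rng))
```

Under this schedule, early rounds favor representative pairs, which suits the setting without initial training data that the abstract describes, and later rounds shift toward pairs near the decision boundary as the model's probabilities become more trustworthy.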
- Published
- 2024