AutoRepair: an automatic repairing approach over multi-source data
- Source :
- Knowledge and Information Systems. 61:227-257
- Publication Year :
- 2018
- Publisher :
- Springer Science and Business Media LLC, 2018.
- Abstract :
- Truth discovery methods and rule-based data repairing methods are two classic lines of approaches to improving data quality in the database field. Truth discovery methods resolve multi-source conflicts for the same entity by estimating the reliabilities of different sources, while rule-based data repairing methods resolve inconsistencies among different entities using integrity constraints. However, both lines of methods suffer from unsatisfactory performance due to a lack of sufficient evidence. In this paper, we propose AutoRepair, a novel automatic multi-source data repairing approach that enriches the evidence by combining the advantages of truth discovery and data repairing. We use functional dependencies, one of the most common types of constraints, to detect violations, and use source reliability as evidence to discover and repair the errors among these violations. At the same time, the repaired results are used to estimate source reliability. As source reliability is unknown in advance, we model the process as an iterative framework to ensure better performance. Extensive experiments are conducted on both simulated and real-world datasets. The results clearly demonstrate the advantages of our approach, which outperforms both recent truth discovery and rule-based data repairing methods.
- Subjects :
- Computer science
Process (engineering)
02 engineering and technology
Iterative framework
computer.software_genre
Field (computer science)
Human-Computer Interaction
Artificial Intelligence
Hardware and Architecture
020204 information systems
Data integrity
Data quality
Multi source data
0202 electrical engineering, electronic engineering, information engineering
Unsupervised learning
020201 artificial intelligence & image processing
Data mining
computer
Software
Reliability (statistics)
Information Systems
Details
- ISSN :
- 0219-3116 and 0219-1377
- Volume :
- 61
- Database :
- OpenAIRE
- Journal :
- Knowledge and Information Systems
- Accession number :
- edsair.doi...........54549493eb26e014554a48b50e348c13
- Full Text :
- https://doi.org/10.1007/s10115-018-1284-9
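
The iterative loop outlined in the abstract (weighted truth discovery, violation detection with a functional dependency, repair, then re-estimation of source reliability) can be illustrated with a small sketch. This is not the AutoRepair algorithm from the paper: the function names `weighted_vote` and `iterative_repair`, the smoothed accuracy update, the pooled-vote repair rule, and the toy `zip -> city` data are all assumptions made for illustration only.

```python
from collections import defaultdict


def weighted_vote(claims, weights):
    """Return the value with the largest total source weight.

    claims: list of (source, value) pairs.
    """
    score = defaultdict(float)
    for source, value in claims:
        score[value] += weights[source]
    # Iterate over sorted keys so that ties are broken deterministically.
    return max(sorted(score), key=lambda v: score[v])


def iterative_repair(observations, lhs, rhs, rounds=5):
    """Hypothetical sketch: alternate between repairing and re-weighting.

    observations: list of (source, entity, attribute, value) tuples.
    lhs, rhs: attribute names of one functional dependency lhs -> rhs.
    """
    sources = {s for s, _, _, _ in observations}
    weights = {s: 1.0 for s in sources}  # reliability is unknown at the start
    by_cell = defaultdict(list)
    for s, e, a, v in observations:
        by_cell[(e, a)].append((s, v))

    truth = {}
    for _ in range(rounds):
        # Step 1: resolve multi-source conflicts per (entity, attribute) cell.
        truth = {cell: weighted_vote(c, weights) for cell, c in by_cell.items()}

        # Step 2: group entities by their lhs value and look for violations.
        groups = defaultdict(list)
        for (e, a), v in truth.items():
            if a == lhs:
                groups[v].append(e)
        for entities in groups.values():
            cells = [(e, rhs) for e in entities if (e, rhs) in truth]
            if len({truth[c] for c in cells}) > 1:  # lhs -> rhs is violated
                # Assumed repair rule: pool all claims of the group and vote.
                pooled = [claim for c in cells for claim in by_cell[c]]
                chosen = weighted_vote(pooled, weights)
                for c in cells:
                    truth[c] = chosen

        # Step 3: re-estimate reliability as smoothed agreement with the repair.
        hits, total = defaultdict(int), defaultdict(int)
        for s, e, a, v in observations:
            total[s] += 1
            hits[s] += int(truth[(e, a)] == v)
        weights = {s: (hits[s] + 1) / (total[s] + 2) for s in sources}
    return truth, weights


# Toy data (invented): three sources describe two entities sharing a zip code.
observations = [
    ("A", "e1", "zip", "10001"), ("A", "e1", "city", "NYC"),
    ("A", "e2", "zip", "10001"), ("A", "e2", "city", "NYC"),
    ("B", "e1", "zip", "10001"), ("B", "e1", "city", "NYC"),
    ("B", "e2", "zip", "10001"), ("B", "e2", "city", "Boston"),
    ("C", "e1", "zip", "10001"), ("C", "e1", "city", "NYC"),
    ("C", "e2", "zip", "10001"), ("C", "e2", "city", "Boston"),
]
truth, weights = iterative_repair(observations, lhs="zip", rhs="city")
print(truth[("e2", "city")], weights)
```

In this toy input, a plain majority vote on the city of `e2` would contradict the city of `e1` under the same zip code; pooling the evidence of both entities is one simple way to let the constraint and the source weights inform each other.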