Back to Search Start Over

Encryption-based sub-string matching for privacy-preserving record linkage.

Authors :
Vaiwsri, Sirintra
Ranbaduge, Thilina
Christen, Peter
Source :
Journal of Information Security & Applications. Mar2024, Vol. 81, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

Accurate and secure string matching in record linkage is increasingly important in application domains such as bioinformatics, healthcare, and crime detection. Most existing privacy-preserving string matching techniques provide an overall similarity between a pair of strings. As a result, these techniques cannot identify the longest common sub-string between the strings in a pair leading to lower linkage quality, while existing techniques that can identify the longest common sub-string from a pair of strings have long runtimes. While blocking techniques that can be used in the record linkage pipeline improve the time complexity, each string is generally inserted into several blocks making it vulnerable to frequency based attacks. In this paper, we propose two encryption-based approaches to improve the effectiveness and efficiency of string matching in record linkage. Our approaches compare strings based on their lengths of sub-strings. In the first approach, we encrypt the sub-string lengths into individual ciphertexts and compare a pair of ciphertexts based on the corresponding sub-string. In the second approach, we encrypt multiple lengths of sub-strings into a single ciphertext that allows efficient comparison of ciphertexts. We evaluate our approaches on real-world datasets and validate the accuracy, complexity, and privacy compared to four baselines, showing that our approaches outperform all baselines in terms of complexity and privacy while providing higher linkage quality than a standard privacy-preserving record linkage technique. • Accurate and secure string matching based on longest common sub-string is increasingly important in application domains such as bioinformatics, healthcare, and crime detection. • We propose two privacy-preserving string matching approaches based on lengths of sub-strings that can identify common sub-strings between databases. • In our first approach we compare the lengths of sub-strings between databases using a list of hash values generated using a special 1- and 0-encoding, while in our second approach we use an efficient encryption technique to conduct the comparison between pairs of ciphertexts from two databases efficiently. • We empirically evaluate our approaches under the linkage quality, time complexity, and privacy guarantees against several state-of-the-art privacy-preserving string matching approaches using several real datasets. • Our evaluation shows that our approaches can achieve a high linkage quality and efficiency while providing stronger privacy compared to baselines. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
22142126
Volume :
81
Database :
Academic Search Index
Journal :
Journal of Information Security & Applications
Publication Type :
Academic Journal
Accession number :
175700536
Full Text :
https://doi.org/10.1016/j.jisa.2024.103712