
End-to-end pseudonymization of fine-tuned clinical BERT models: Privacy preservation with maintained data utility.

Authors:
Vakili, Thomas
Henriksson, Aron
Dalianis, Hercules
Source:
BMC Medical Informatics & Decision Making. 6/24/2024, Vol. 24 Issue 1, p1-15. 15p.
Publication Year:
2024

Abstract

Many state-of-the-art results in natural language processing (NLP) rely on large pre-trained language models (PLMs). These models contain large numbers of parameters that are tuned using vast amounts of training data. These factors cause the models to memorize parts of their training data, making them vulnerable to various privacy attacks. This is cause for concern, especially when these models are applied in the clinical domain, where data are very sensitive. Training data pseudonymization is a privacy-preserving technique that aims to mitigate these problems. This technique automatically identifies sensitive entities and replaces them with realistic but non-sensitive surrogates. Pseudonymization has yielded promising results in previous studies. However, no previous study has applied pseudonymization to both the pre-training data of PLMs and the fine-tuning data used to solve clinical NLP tasks. This study evaluates the effects on predictive performance of end-to-end pseudonymization of Swedish clinical BERT models fine-tuned for five clinical NLP tasks. A large number of statistical tests are performed, revealing minimal harm to performance when using pseudonymized fine-tuning data. The results also show no deterioration from end-to-end pseudonymization of pre-training and fine-tuning data. These results demonstrate that pseudonymizing training data to reduce privacy risks can be done without harming data utility for training PLMs. [ABSTRACT FROM AUTHOR]
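To illustrate the core operation the abstract describes, the following is a minimal sketch of entity replacement, not the authors' pipeline: it assumes sensitive spans have already been detected (e.g., by a clinical NER model) and swaps each span for a realistic surrogate of the same entity type. The surrogate pools, entity labels, and example note are hypothetical.

```python
import random

# Hypothetical pools of realistic but non-sensitive surrogate values,
# keyed by entity type. A real system would draw from much larger lists.
SURROGATES = {
    "PERSON": ["Anna Lindberg", "Erik Johansson"],
    "DATE": ["2019-03-14", "2020-11-02"],
}

def pseudonymize(text, entities, rng=None):
    """Replace each (start, end, label) character span with a surrogate.

    Spans must be non-overlapping; replacement proceeds right-to-left so
    that earlier offsets remain valid as the text changes length.
    """
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    for start, end, label in sorted(entities, key=lambda e: e[0], reverse=True):
        text = text[:start] + rng.choice(SURROGATES[label]) + text[end:]
    return text

# Toy clinical note with pre-detected sensitive spans (offsets are inclusive
# start, exclusive end), as a NER component might produce them.
note = "Patient Sven Svensson was admitted on 2018-05-01."
spans = [(8, 21, "PERSON"), (38, 48, "DATE")]
print(pseudonymize(note, spans))
```

In the study's end-to-end setting, this kind of replacement would be applied both to the corpus used for pre-training the BERT models and to the annotated data used for fine-tuning on downstream clinical tasks.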

Details

Language:
English
ISSN:
1472-6947
Volume:
24
Issue:
1
Database:
Academic Search Index
Journal:
BMC Medical Informatics & Decision Making
Publication Type:
Academic Journal
Accession number:
178064704
Full Text:
https://doi.org/10.1186/s12911-024-02546-8