End-to-end pseudonymization of fine-tuned clinical BERT models: Privacy preservation with maintained data utility.
Source :
- BMC Medical Informatics & Decision Making. 6/24/2024, Vol. 24 Issue 1, p1-15. 15p.
Publication Year :
- 2024
Abstract
- Many state-of-the-art results in natural language processing (NLP) rely on large pre-trained language models (PLMs). These models contain large numbers of parameters that are tuned using vast amounts of training data. These factors cause the models to memorize parts of their training data, making them vulnerable to various privacy attacks. This is cause for concern, especially when these models are applied in the clinical domain, where data are very sensitive. Training data pseudonymization is a privacy-preserving technique that aims to mitigate these problems. This technique automatically identifies sensitive entities and replaces them with realistic but non-sensitive surrogates. Pseudonymization has yielded promising results in previous studies. However, no previous study has applied pseudonymization to both the pre-training data of PLMs and the fine-tuning data used to solve clinical NLP tasks. This study evaluates the effects on predictive performance of end-to-end pseudonymization of Swedish clinical BERT models fine-tuned for five clinical NLP tasks. A large number of statistical tests are performed, revealing minimal harm to performance when using pseudonymized fine-tuning data. The results also show no deterioration from end-to-end pseudonymization of pre-training and fine-tuning data. These results demonstrate that pseudonymizing training data to reduce privacy risks can be done without harming data utility for training PLMs. [ABSTRACT FROM AUTHOR]
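The pseudonymization step summarized in the abstract (detect sensitive entities, then swap in realistic surrogates of the same type) can be illustrated with a minimal sketch. The toy lexicons and the detect_entities placeholder below are assumptions for illustration only; the study itself relies on trained clinical NER models for detection.

```python
import random
import re

# Toy lexicons standing in for a trained clinical NER / de-identification model.
# All names here are hypothetical and not taken from the paper or its data.
SENSITIVE = {
    "PERSON": ["Karl Svensson"],
    "LOCATION": ["Göteborg"],
}
SURROGATES = {
    "PERSON": ["Anna Lindqvist", "Erik Johansson"],
    "LOCATION": ["Uppsala", "Malmö"],
}

def detect_entities(text):
    """Placeholder detector: returns (start, end, label) spans of sensitive entities."""
    spans = []
    for label, terms in SENSITIVE.items():
        for term in terms:
            for m in re.finditer(re.escape(term), text):
                spans.append((m.start(), m.end(), label))
    return spans

def pseudonymize(text):
    """Replace each detected sensitive span with a realistic surrogate of the same type."""
    # Replace right-to-left so earlier span offsets stay valid after substitution.
    for start, end, label in sorted(detect_entities(text), key=lambda s: s[0], reverse=True):
        text = text[:start] + random.choice(SURROGATES[label]) + text[end:]
    return text

print(pseudonymize("Patient Karl Svensson visited Göteborg for a follow-up."))
```

In an end-to-end setting as described in the abstract, this kind of replacement would be applied to both the pre-training corpus and the task-specific fine-tuning data before any model training.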
Subjects :
- LANGUAGE models
- DATA privacy
- PRIVACY
- NATURAL language processing
Details
Language :
- English
ISSN :
- 1472-6947
Volume :
- 24
Issue :
- 1
Database :
- Academic Search Index
Journal :
- BMC Medical Informatics & Decision Making
Publication Type :
- Academic Journal
Accession Number :
- 178064704
Full Text :
- https://doi.org/10.1186/s12911-024-02546-8