Utility of Features in a Natural-Language-Processing-Based Clinical De-Identification Model Using Radiology Reports for Advanced NSCLC Patients.

Authors :
Paul, Tanmoy
Islam, Humayera
Singh, Nitesh
Jampani, Yaswitha
Kotapati, Teja Venkat Pavan
Tautam, Preethi Aishwarya
Rana, Md Kamruz Zaman
Mandhadi, Vasanthi
Sharma, Vishakha
Barnes, Michael
Hammer, Richard D.
Mosa, Abu Saleh Mohammad
Source :
Applied Sciences (2076-3417); Oct 2022, Vol. 12 Issue 19, p9976, 10p
Publication Year :
2022

Abstract

The de-identification of clinical reports is essential to protect patient confidentiality. Named entity recognition (NER) models based on natural language processing are a widely used technique for automatic clinical de-identification. The performance of such a machine learning model relies largely on the proper selection of features. The objective of this study was to investigate the utility of various features in a conditional-random-field (CRF)-based NER model. Natural language processing (NLP) toolkits were used to annotate the protected health information (PHI) in a total of 10,239 radiology reports that were divided into seven types. Multiple features were extracted by the toolkit, and NER models were built using these features and their combinations. A total of 10 features were extracted, and the performance of the models was evaluated based on their precision, recall, and F1-score. The best-performing features were n-gram, prefix-suffix, word embedding, and word shape. These features outperformed the others across all types of reports. The dataset used was large in volume and divided into multiple types of reports. Such a diverse dataset ensured that the results did not depend on a small number of structured texts from which a machine learning model can easily learn the features. The manual de-identification of large-scale clinical reports is impractical. This study helps to identify, from the wide array of features mentioned in the literature, the best-performing features for building an NER model for automatic de-identification. [ABSTRACT FROM AUTHOR]
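
To make three of the feature families named in the abstract concrete (n-gram, prefix-suffix, and word shape), the following is a minimal sketch of per-token feature extraction for a CRF-style tagger. It is not the authors' implementation: the abstract does not name the toolkit or its settings, so the function names (`word_shape`, `token_features`, `sentence_features`), the affix length of 3, the use of bigrams for the n-gram feature, the boundary markers, and the sample sentence are all assumptions made for illustration. Word embeddings are omitted because they require a pretrained model.

```python
import re


def word_shape(token: str) -> str:
    """Map a token to a coarse shape: uppercase -> X, lowercase -> x, digit -> d."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    # Digits are replaced last so the inserted "d" is not rewritten to "x".
    shape = re.sub(r"[0-9]", "d", shape)
    return shape


def token_features(tokens: list[str], i: int, affix_len: int = 3) -> dict[str, str]:
    """Build a feature dictionary for the token at position i (hypothetical layout)."""
    tok = tokens[i]
    lower = tok.lower()
    # Neighbouring tokens, with assumed boundary markers at the sentence edges.
    prev_tok = tokens[i - 1].lower() if i > 0 else "<BOS>"
    next_tok = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return {
        "lower": lower,
        "shape": word_shape(tok),             # word shape feature
        "prefix": tok[:affix_len],            # prefix feature
        "suffix": tok[-affix_len:],           # suffix feature
        "bigram_prev": f"{prev_tok} {lower}",  # n-gram (bigram) with previous token
        "bigram_next": f"{lower} {next_tok}",  # n-gram (bigram) with next token
    }


def sentence_features(tokens: list[str]) -> list[dict[str, str]]:
    """Extract one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up example sentence; not taken from the study's data.
sample = ["Seen", "by", "Dr.", "Smith", "on", "01/02/2020"]
features = sentence_features(sample)
```

A list of such dictionaries per sentence is the kind of input that CRF libraries commonly accept; which features to switch on or combine is the question the study evaluates.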

Details

Language :
English
ISSN :
2076-3417
Volume :
12
Issue :
19
Database :
Complementary Index
Journal :
Applied Sciences (2076-3417)
Publication Type :
Academic Journal
Accession number :
159675985
Full Text :
https://doi.org/10.3390/app12199976