1. Natural language processing pipeline to extract prostate cancer-related information from clinical notes.
- Author
- Nakai H, Suman G, Adamo DA, Navin PJ, Bookwalter CA, LeGout JD, Chen FK, Wellnitz CV, Silva AC, Thomas JV, Kawashima A, Fan JW, Froemming AT, Lomas DJ, Humphreys MR, Dora C, Korfiatis P, and Takahashi N
- Subjects
- Humans, Male, Retrospective Studies, Middle Aged, Aged, Risk Factors, Sensitivity and Specificity, Prostatic Neoplasms diagnostic imaging, Natural Language Processing, Magnetic Resonance Imaging methods
- Abstract
Objectives: To develop an automated pipeline for extracting prostate cancer-related information from clinical notes.
Materials and Methods: This retrospective study included 23,225 patients who underwent prostate MRI between 2017 and 2022. Cancer risk factors (family history of cancer and digital rectal exam findings), pre-MRI prostate pathology, and treatment history of prostate cancer were extracted from free-text clinical notes in English as binary or multi-class classification tasks. Any sentence containing pre-defined keywords was extracted from clinical notes within one year before the MRI. After manually creating sentence-level datasets with ground truth, Bidirectional Encoder Representations from Transformers (BERT)-based sentence-level models were fine-tuned using the extracted sentence as input and the category as output. The patient-level output was determined by compiling multiple sentence-level outputs using tree-based models. Sentence-level classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) on 15% of the sentence-level dataset (sentence-level test set). The patient-level classification performance was evaluated on the patient-level test set created by radiologists by reviewing the clinical notes of 603 patients. Accuracy and sensitivity were compared between the pipeline and radiologists.
Results: Sentence-level AUCs were ≥ 0.94. The pipeline showed higher patient-level sensitivity than radiologists for extracting cancer risk factors (e.g., family history of prostate cancer, 96.5% vs. 77.9%, p < 0.001), but lower accuracy in classifying pre-MRI prostate pathology (92.5% vs. 95.9%, p = 0.002) and treatment history of prostate cancer (95.5% vs. 97.7%, p = 0.03).
Conclusion: The proposed pipeline showed promising performance, especially for extracting cancer risk factors from patients' clinical notes.
Clinical Relevance Statement: The natural language processing pipeline showed a higher sensitivity for extracting prostate cancer risk factors than radiologists and may help efficiently gather relevant text information when interpreting prostate MRI.
Key Points: When interpreting prostate MRI, it is necessary to extract prostate cancer-related information from clinical notes. This pipeline extracted the presence of prostate cancer risk factors with higher sensitivity than radiologists. Natural language processing may help radiologists efficiently gather relevant prostate cancer-related text information.
Competing Interests: Compliance with ethical standards. Guarantor: The scientific guarantor of this publication is Naoki Takahashi. Conflict of interest: The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article. Statistics and biometry: No complex statistical methods were necessary for this paper. Informed consent: Written informed consent was waived by the Institutional Review Board. “The Reviewer approved waiver of the requirement to obtain informed consent in accordance with 45 CFR 46.116 as justified by the Investigator, and waiver of HIPAA authorization in accordance with applicable HIPAA regulations”. Ethical approval: Institutional Review Board approval was obtained (#23-008038). “IRB Application #: 23-008038. Title: Development of Machine Learning Model of Prostate Cancer Using Prostate MRI and Clinical Data. IRB Approval Date: 8/30/2023. IRB Expiration Date: The above referenced application was reviewed by expedited review procedures and is determined to be exempt from the requirement for IRB approval (45 CFR 46.104d, category 4).
Continued IRB review of this study is not required as it is currently written. However, requests for modifications to the study design or procedures must be submitted to the IRB to determine whether the study continues to be exempt. The Reviewer approved waiver of HIPAA authorization in accordance with applicable HIPAA regulations. As the principal investigator of this project, you are responsible for the following relating to this study. (1) When applicable, use only IRB approved materials which are located under the documents tab of the IRBe workspace. Materials include consent forms, HIPAA, questionnaires, contact letters, advertisements, etc. (2) Submission to the IRB of any modifications to approved research along with any supporting documents for review and approval prior to initiation of the changes. (3) Submission to the IRB of all Unanticipated Problems Involving Risks to Subjects or Others (UPIRTSO) and major protocol violations/deviations within five working days of becoming aware of the occurrence. (4) Compliance with applicable regulations for the protection of human subjects and with Mayo Clinic Institutional Policies. Mayo Clinic Institutional Reviewer”.
Study subjects or cohorts overlap: Thousands of patients included in this study overlapped with previously published works that evaluated cancer detection rates of prostate MRI in various populations and a study that developed deep learning models for detecting clinically significant prostate cancer.
1. Nagayama H, Nakai H, Takahashi H, et al. Cancer detection rate and abnormal interpretation rate of prostate MRI performed for clinical suspicion of prostate cancer. J Am Coll Radiol. 2023; https://doi.org/10.1016/j.jacr.2023.07.031
2. Nakai H, Nagayama H, Takahashi H, et al. Cancer detection rate and abnormal interpretation rate of prostate MRI in patients with low-grade cancer. J Am Coll Radiol. 2023; https://doi.org/10.1016/j.jacr.2023.07.030
3. Nakai H, Takahashi H, Adamo DA, et al. Decreased cancer detection rate of the prostate MRI in patients with moderate to severe susceptibility artifacts from hip prosthesis. Eur Radiol. 2023; https://doi.org/10.1007/s00330-023-10345-4
4. Cai JC, Nakai H, Kuanar S, et al. A fully automated deep learning model to detect clinically significant prostate cancer on multiparametric MRI. (Manuscript under review).
Methodology: Retrospective; diagnostic or prognostic study; performed at one institution.
(© 2024. The Author(s), under exclusive licence to European Society of Radiology.)
- Published
- 2024
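
Illustrative note: the abstract's first pipeline step (pulling every sentence that contains a pre-defined keyword from notes dated within one year before the MRI) might be sketched roughly as follows. This is a minimal sketch, not the authors' implementation. The keyword list, the naive sentence splitter, the 365-day inclusive window, and the names `KEYWORDS`, `split_sentences`, and `extract_sentences` are all assumptions made for illustration; the abstract does not specify any of them.

```python
import re
from datetime import date, timedelta

# Hypothetical keyword list; the study's actual keywords are not given in the abstract.
KEYWORDS = ("family history", "digital rectal", "biopsy", "prostatectomy")


def split_sentences(text):
    """Naive splitter (an assumption): break on whitespace that follows '.', '?' or '!'."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def extract_sentences(notes, mri_date, keywords=KEYWORDS, window_days=365):
    """Collect keyword-containing sentences from notes dated in the window before the MRI.

    notes: iterable of (note_date, text) pairs.
    The window is [mri_date - window_days, mri_date], inclusive at both ends
    (an assumed reading of "within one year before the MRI").
    """
    start = mri_date - timedelta(days=window_days)
    hits = []
    for note_date, text in notes:
        if not (start <= note_date <= mri_date):
            continue  # note falls outside the look-back window
        for sentence in split_sentences(text):
            lowered = sentence.lower()
            if any(keyword in lowered for keyword in keywords):
                hits.append(sentence)
    return hits


if __name__ == "__main__":
    example_notes = [
        (date(2021, 6, 1), "Family history of prostate cancer in father. No urinary symptoms."),
        (date(2019, 1, 1), "Prior biopsy was benign."),  # too old, skipped
    ]
    for sentence in extract_sentences(example_notes, date(2022, 1, 15)):
        print(sentence)
```

In the described pipeline, sentences selected this way would then be passed to the fine-tuned BERT-based sentence-level classifiers, whose outputs are combined into a patient-level result by tree-based models; neither of those stages is shown here.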