1. Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models
- Authors
Li, Zehan; Hu, Yan; Lane, Scott; Selek, Salih; Shahani, Lokesh; Machado-Vieira, Rodrigo; Soares, Jair; Xu, Hua; Liu, Hongfang; and Huang, Ming
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Information Retrieval
- Abstract
Accurate identification and categorization of suicidal events can yield better suicide precautions, reduce operational burden, and improve care quality in high-acuity psychiatric settings. Pre-trained language models offer promise for identifying suicidality from unstructured clinical narratives. We evaluated the performance of four BERT-based models using two fine-tuning strategies (multiple single-label and single multi-label) for detecting coexisting suicidal events in 500 annotated psychiatric evaluation notes. The notes were labeled for suicidal ideation (SI), suicide attempts (SA), exposure to suicide (ES), and non-suicidal self-injury (NSSI). RoBERTa outperformed the other models under the multiple single-label classification strategy (acc=0.86, F1=0.78). MentalBERT (acc=0.83, F1=0.74) also exceeded BioClinicalBERT (acc=0.82, F1=0.72), which in turn outperformed BERT (acc=0.80, F1=0.70). Fine-tuning RoBERTa with single multi-label classification further improved performance (acc=0.88, F1=0.81). The findings highlight that model optimization, pretraining on domain-relevant data, and the single multi-label classification strategy enhance model performance for suicide phenotyping.
- Keywords
EHR-based Phenotyping; Natural Language Processing; Secondary Use of EHR Data; Suicide Classification; BERT-based Model; Psychiatry; Mental Health
- Comment
Submitted to AMIA Informatics Summit 2025 as a conference paper
- Published
2024
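
To illustrate the single multi-label classification strategy described in the abstract, below is a minimal sketch assuming the HuggingFace transformers API. The checkpoint name, label order, decision threshold, and all hyperparameters are illustrative assumptions, not the authors' configuration; the classification head shown here is randomly initialized and would need to be fine-tuned on annotated notes before its predictions mean anything.

```python
# Minimal sketch of single multi-label fine-tuning for suicide phenotyping.
# Assumes HuggingFace transformers; label names follow the abstract (SI, SA, ES, NSSI).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["SI", "SA", "ES", "NSSI"]  # suicidal ideation, suicide attempt,
                                     # exposure to suicide, non-suicidal self-injury

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # per-label sigmoid + BCE loss
)

def predict(note_text: str, threshold: float = 0.5) -> dict:
    """Return per-label (probability, flag) pairs for one clinical note."""
    inputs = tokenizer(note_text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0)
    return {label: (p.item(), p.item() >= threshold) for label, p in zip(LABELS, probs)}

# A single note can be flagged for several coexisting events at once,
# which is the point of the single multi-label setup.
print(predict("Patient reports passive suicidal ideation and a prior attempt."))
```

In contrast, the multiple single-label strategy would fine-tune one binary classifier per label; the single multi-label head shares one encoder pass across all four events, which the abstract reports as the better-performing configuration.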