Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC) -- an end-to-end model for characterizing severity and diagnosis

Authors :
Santos, Thiago
Kamath, Harish
McAdams, Christopher R.
Newell, Mary S.
Mosunjac, Marina
Oprea-Ilies, Gabriela
Smith, Geoffrey
Lehman, Constance
Gichoya, Judy
Banerjee, Imon
Trivedi, Hari
Publication Year :
2023

Abstract

Automated classification of cancer pathology reports can extract information from unstructured reports and categorize each report into structured diagnosis and severity categories. Such a system can reduce the burden of populating tumor registries, help with registration for clinical trials, and support the development of large datasets for deep learning models using true pathologic ground truth. However, breast pathology reports can be difficult to categorize because of their high linguistic variability and the wide variety of potential diagnoses (more than 50). Existing NLP models primarily focus on classifying primary breast cancer types (e.g., IDC, DCIS, ILC) and tumor characteristics, and ignore rare diagnoses of cancer subtypes. We therefore developed a hierarchical hybrid transformer-based pipeline with 59 labels, the Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC), which uses context-preserving transformer-based NLP, and compared it with several state-of-the-art ML and DL models. We trained the model on the EUH data and evaluated its performance on two external datasets, MGH and Mayo Clinic. We publicly release the code and a live application in a Hugging Face Spaces repository.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2312.12442
Document Type :
Working Paper