Jennifer F Barcroft, Kristofer Linton-Reid, Chiara Landolfo, Maya Al Memar, Nina Parker, Chris Kyriacou, Maria Munaretto, Martina Fantauzzi, Nina Cooper, Joseph Yazbek, Nishat Bharwani, Sa ra Lee, Ju Hee Kim, Dirk Timmerman, Joram M. Posma, Luca Savelli, Srdjan Saso, Eric O. Aboagye, and Tom Bourne
Background
Ovarian cancer remains the deadliest of all gynaecological cancers. Ultrasound-based models exist to support the classification of adnexal masses, but they depend on human assessment of ultrasound features. We therefore aimed to develop an end-to-end machine learning (ML) model capable of automating the classification of adnexal masses.

Methods
In this retrospective study, transvaginal ultrasound scan images were extracted and segmented from Imperial College Healthcare, UK (ICH development dataset; n=577 masses; 1444 images) and Morgagni-Pierantoni Hospital, Italy (MPH external dataset; n=184 masses; 476 images). Clinical data, including age, CA-125 and diagnosis (by ultrasound subjective assessment [SA] or histology), were collected. A segmentation and classification model was developed by comparing several approaches based on convolutional neural networks and traditional radiomics features. The Dice surface coefficient was used to measure segmentation performance, and the area under the receiver operating characteristic curve (AUC), F1-score and recall were used to measure classification performance.

Findings
The ICH and MPH datasets had median ages of 45 (IQR 35-60) and 48 (IQR 38-57) years, and malignant cases made up 23·1% and 31·5% of each dataset, respectively. The best segmentation model achieved a Dice surface coefficient of 0·85 ± 0·01, 0·88 ± 0·01 and 0·85 ± 0·01 in the ICH training, ICH validation and MPH test sets, respectively. The best classification model achieved a recall of 1·00 and F1-scores of 0·88 (AUC 0·93), 0·94 (AUC 0·89) and 0·83 (AUC 0·90) in the ICH training, ICH validation and MPH test sets, respectively.

Interpretation
The ML model provides an end-to-end method for adnexal mass segmentation and classification, with predictive performance (AUC 0·90) comparable to the published performance of expert subjective assessment (SA, the gold standard) and of current risk models.
Further prospective evaluation of the classification performance of the ML model against existing methods is required.

Funding
Medical Research Council, Imperial STRATiGRAD PhD programme and Imperial Health Charity.

Research in Context

Evidence before this study
Adnexal masses are common, affecting up to 18% of postmenopausal women. Ultrasound is the primary imaging modality for their assessment, and accurate classification is fundamental to inform appropriate management. However, all existing classification methods are subjective and rely upon ultrasound expertise. Various models using ultrasound features and serological markers have been developed to support the classification of adnexal masses, such as the Risk of Malignancy Index (RMI), the International Ovarian Tumour Analysis (IOTA) Simple Rules (SR), the IOTA Assessment of Different NEoplasias in the AdneXa (ADNEX) model, and the American College of Radiology (ACR) Ovarian-Adnexal Reporting and Data System Ultrasound (O-RADS US). Despite these modelling efforts, expert subjective assessment remains the gold standard for classifying adnexal masses. The use of machine learning (ML) within clinical imaging is a rapidly evolving field because of its potential to overcome the subjectivity of image assessment and interpretation. Seventeen studies evaluating ML for the classification of adnexal masses on ultrasound were summarised in a recent meta-analysis by Xu et al, 2022. No studies used a radiomics-based approach to the classification of adnexal masses, and most have not been externally validated on an independent test set, which calls their generalisability into question.
The largest study to date (Gao et al, 2022) used a deep learning (DL)-based approach and was externally validated, yet its performance (F1-score 0·551) did not match that of existing classification approaches.

Added value of this study
We have developed an end-to-end ML model (ODS) using DL and radiomics-based approaches that is capable of identifying (by automated segmentation) and classifying adnexal masses, with a high detection rate for malignancy. The ODS model had performance comparable to the published performance of existing adnexal mass classification methods and does not rely upon ultrasound experience.

Implications of all the available evidence
ODS is a high-performing, end-to-end model capable of classifying adnexal masses that requires limited ultrasound operator experience. The ODS model is potentially generalisable, having shown consistent performance in both the validation (internal) and test (external) sets, which highlights the potential clinical value of a radiomics-based model for classifying adnexal masses on ultrasound. The ODS model could function as a scalable triage tool to identify high-risk adnexal masses requiring further ultrasound assessment by an expert.
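The evaluation metrics named in the Methods (a Dice overlap score for segmentation; recall, F1-score and AUC for classification) can be made concrete with a minimal sketch. It assumes binary masks supplied as flattened 0/1 pixel sequences, malignancy labels coded 1 = malignant and 0 = benign, and a hypothetical decision threshold of 0·5 on model scores. The function names are illustrative and are not the ODS implementation. The sketch computes the standard volumetric Dice coefficient 2|A∩B| / (|A| + |B|); the abstract does not define its "Dice surface coefficient", so the authors' metric may differ.

```python
from typing import List, Sequence, Tuple


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Volumetric Dice overlap 2|A∩B| / (|A| + |B|) for flattened binary masks.

    Assumption: this is the standard overlap Dice, which may not match the
    study's "Dice surface coefficient".
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention chosen here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def recall_precision_f1(labels: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Recall, precision and F1 with malignant (label 1) as the positive class."""
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random malignant case outscores a random
    benign case (Mann-Whitney formulation; ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Hypothetical cutoff turning malignancy scores into 0/1 predictions."""
    return [1 if s >= cutoff else 0 for s in scores]


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    mask_pred = [0, 1, 1, 1, 0, 0]
    mask_true = [0, 1, 1, 0, 0, 0]
    print(dice_coefficient(mask_pred, mask_true))  # 0.8, i.e. 2*2 / (3 + 2)

    labels = [1, 1, 0, 0, 0]
    scores = [0.9, 0.6, 0.7, 0.2, 0.1]
    print(recall_precision_f1(labels, threshold(scores)))  # recall 1, precision 2/3, F1 0.8
    print(roc_auc(labels, scores))  # 5/6: five of six malignant-benign pairs ranked correctly
```

The toy example also shows why a recall of 1·00 can coexist with an F1-score below 1: when recall is fixed at 1, F1 = 2P / (1 + P), so F1 depends only on precision P. On that relation, the reported F1-scores of 0·88, 0·94 and 0·83 correspond to precision of roughly 0·79, 0·89 and 0·71.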