Nathan Radakovich, Manja Meggendorfer, Luca Malcovati, Mikkael A. Sekeres, Jacob Shreve, Cameron Beau Hilton, Yazan Rouphail, Wencke Walter, Stephan Hutter, Sudipto Mukherjee, Cassandra M. Kerr, Babal K. Jha, Anna Gallì, Sarah Pozzi, Aaron T. Gerds, Claudia Haferlach, Jaroslaw P. Maciejewski, Torsten Haferlach, and Aziz Nazha
Background While histo- and cytomorphological examinations are central to the diagnosis of myelodysplastic syndromes (MDS), significant inter-observer variability exists. The diagnosis can be challenging in pancytopenic patients (pts) without evidence of dysplasia and is contingent on observer expertise. We developed and externally validated a geno-clinical model that uses mutational data and peripheral blood counts/clinical variables to distinguish MDS from other myeloid malignancies.
Methods Clinical and genomic data, including results from commercially available next-generation sequencing panels, were obtained for pts treated at the Cleveland Clinic (CC; 652 pts), Munich Leukemia Laboratory (MLL; 1509 pts), and the University of Pavia in Italy (UP; 536 pts). All pts carried a diagnosis of MDS, chronic myelomonocytic leukemia (CMML), MDS/myeloproliferative neoplasm overlap (MDS/MPN), myeloproliferative neoplasm (MPN; either polycythemia vera, essential thrombocythemia, or myelofibrosis), clonal cytopenia of undetermined significance (CCUS), or idiopathic cytopenia of undetermined significance (ICUS). All diagnoses were established by bone marrow aspiration and according to World Health Organization 2017 criteria. The training cohort combined data from CC and UP and was randomly divided into learner (80%) and test (20%) cohorts. The final model was independently validated in the MLL cohort. A machine learning algorithm was used to build the model; multiple extraction algorithms were used to extract genomic/clinical variables at both the cohort and individual levels. Performance was evaluated according to the area under the receiver operating characteristic curve (ROC-AUC) and accuracy matrices.
Results Among the 2697 pts included from all sites, the median age was 70 years [36 - 86].
Median hemoglobin (Hb) was 10.4 g/dL [6.9 - 15.7], median platelet count (PLT) was 132 k/µL [14 - 722], median white blood cell (WBC) count was 5.3 k/µL [1.4 - 49.9], median absolute neutrophil count (ANC) was 2.8 k/µL [0.3 - 27.7], median absolute monocyte count (AMC) was 0.3 k/µL [0 - 9.9], median absolute lymphocyte count (ALC) was 1.1 k/µL [0.1 - 5.4], and median peripheral blast percentage was 0% [0 - 8]. The most commonly mutated genes in all patients were (list top 5 genes); among pts with MDS, they were SF3B1 (27%), TET2 (25%), ASXL1 (19%), SRSF2 (16%), and DNMT3A (11%); among pts with MDS/MPN or CMML, TET2 (46%), ASXL1 (34%), SRSF2 (29%), RUNX1 (13%), and CBL (12%); among pts with MPNs, JAK2 (64%), ASXL1 (27%), TET2 (14%), DNMT3A (8%), and U2AF1 (7%); and among pts with CCUS, TET2 (41%), DNMT3A (27%), ASXL1 (19%), SRSF2 (17%), and ZRSR2 (10%). The most important features for model predictions (ranked from most to least important) were: number of mutations detected per sample, peripheral blast percentage, AMC, JAK2 status, Hb, basophil count, age, eosinophil count, ALC, WBC, EZH2 mutation status, ANC, mutation status of KRAS and SF3B1, platelets, and gender. The final model achieved an average ROC-AUC of 0.95 (95% CI 0.93 - 0.96) when applied to the test cohort and 0.93 (95% CI 0.91 - 0.94) when applied to the MLL cohort. The model also reports the top differential diagnoses for each patient, with individual-level explanations of how each feature influences a putative diagnosis (Figure 1b).
Conclusions We developed and externally validated a highly accurate and interpretable model that can distinguish MDS from other myeloid malignancies using clinical and mutational data from a large international cohort.
The model can provide personalized interpretations of its output and can aid physicians and hematopathologists in recognizing MDS with high accuracy when encountering pts with pancytopenia and a suspected diagnosis of MDS.
Disclosures Sekeres: Pfizer: Consultancy, Membership on an entity's Board of Directors or advisory committees; Takeda/Millennium: Consultancy, Membership on an entity's Board of Directors or advisory committees; BMS: Consultancy, Membership on an entity's Board of Directors or advisory committees. Mukherjee: Novartis: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Partnership for Health Analytic Research, LLC (PHAR, LLC): Honoraria; Bristol Myers Squibb: Honoraria; Celgene: Consultancy, Honoraria, Research Funding; Aplastic Anemia and MDS International Foundation: Honoraria; Celgene/Acceleron: Membership on an entity's Board of Directors or advisory committees; EUSA Pharma: Consultancy. Gerds: Sierra Oncology: Research Funding; Imago Biosciences: Research Funding; Apexx Oncology: Consultancy; Celgene: Consultancy, Research Funding; Incyte Corporation: Consultancy, Research Funding; Roche/Genentech: Research Funding; CTI Biopharma: Consultancy, Research Funding; AstraZeneca/MedImmune: Consultancy; Gilead Sciences: Research Funding; Pfizer: Research Funding. Maciejewski: Alexion, BMS: Speakers Bureau; Novartis, Roche: Consultancy, Honoraria. Nazha: Jazz: Research Funding; Incyte: Speakers Bureau; Novartis: Speakers Bureau; MEI: Other: Data Monitoring Committee.
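Illustrative note: the abstract does not name the machine learning algorithm or the software used, so the following is only a minimal sketch of two steps described in Methods, the random 80%/20% learner/test split and the ROC-AUC calculation. The function names `split_learner_test` and `roc_auc`, the fixed seed, and the toy labels and scores are all assumptions made for illustration; they are not study data or the authors' code. The sketch scores a single binary contrast (for example, MDS versus all other diagnoses), whereas the reported average ROC-AUC presumably aggregates over several diagnostic classes.

```python
import random


def split_learner_test(records, test_fraction=0.2, seed=0):
    """Randomly split records into (learner, test) subsets.

    Hypothetical helper: the 80/20 ratio comes from the abstract,
    the seed and shuffling strategy are assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """Binary ROC-AUC via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example only (not study data): 1 = MDS, 0 = other diagnosis.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)

learner, test = split_learner_test(range(10))
```

In this formulation a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes. A multi-class average could be obtained by repeating the calculation one class at a time against all others and averaging, although the abstract does not state which averaging scheme was used.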