Bryan J. Traynor, Ilaria Martinelli, Roy H. Campbell, Adriano Chiò, Anant Dadu, Rosario Vasta, Brunn F, Elisabetta Zucchi, Antonio Canosa, Faraz Faghri, Cristina Moglia, Letizia Mazzini, Mike A. Nalls, Jessica Mandrioli, and Andrea Calvo
SUMMARY

Background
The disease entity known as amyotrophic lateral sclerosis (ALS) is now known to represent a collection of overlapping syndromes. A better understanding of this heterogeneity and the ability to distinguish ALS subtypes would improve the clinical care of patients and enhance our understanding of the disease. Subtype profiles could be incorporated into the clinical trial design to improve our ability to detect a therapeutic effect. A variety of classification systems have been proposed over the years based on empirical observations, but it is unclear to what extent they genuinely reflect ALS population substructure.

Methods
We applied machine learning algorithms to a prospective, population-based cohort consisting of 2,858 Italian patients diagnosed with ALS for whom detailed clinical phenotype data were available. We replicated our findings in an independent population-based cohort of 1,097 Italian ALS patients.

Findings
We found that semi-supervised machine learning based on UMAP applied to the output of a multi-layered perceptron neural network produced the optimum clustering of the ALS patients in the discovery cohort. These clusters roughly corresponded to the six clinical subtypes defined by the Chiò classification system (bulbar ALS, respiratory ALS, flail arm ALS, classical ALS, pyramidal ALS, and flail leg ALS). The same clusters were identified in the replication cohort. A supervised learning approach based on ensemble learning identified twelve clinical parameters that predicted ALS clinical subtype with high accuracy (area under the curve = 0·94).

Interpretation
Our data-driven study provides insight into the ALS population’s substructure and demonstrates that the Chiò classification system robustly identifies ALS subtypes.
We provide an interactive website (https://share.streamlit.io/anant-dadu/machinelearningforals/main) so that clinical researchers can predict the clinical subtype of an ALS patient based on a small number of clinical parameters.

Funding
National Institute on Aging and the Italian Ministry of Health.

RESEARCH IN CONTEXT

Evidence before this study
We searched PubMed for articles published in English from database inception until January 5, 2021, about the use of machine learning and the identification of clinical subtypes within the amyotrophic lateral sclerosis (ALS) population, using the search terms “machine learning” AND “classification” AND “amyotrophic lateral sclerosis”. This inquiry identified twenty-nine studies. Most previous studies used machine learning to diagnose ALS (based on gait, imaging, electromyography, gene expression, proteomic, and metabolomic data) or to improve brain-computer interfaces. One study used machine learning algorithms to stratify ALS postmortem cortex samples into molecular subtypes based on transcriptome data. Kueffner and colleagues crowdsourced the development of machine learning algorithms to approximately thirty teams to obtain a consensus in an attempt to identify subpopulations of ALS patients. In addition to clinical trial information in the PRO-ACT database (www.ALSdatabase.org), this effort used data from the Piedmont and Valle d’Aosta Registry for ALS (PARALS). Four ALS patient categories were identified: slow progressing, fast progressing, early stage, and late stage. This approach’s clinical relevance was unclear, as all ALS patients will necessarily pass through an early and a late stage of the disease. Furthermore, no attempt was made to discern which of the existing clinical classification systems, such as the El Escorial criteria, the Chiò classification system, and the King’s clinical staging system, can identify ALS subtypes.
We concluded that there remained an unmet need to identify the ALS population’s substructure in a data-driven, non-empirical manner. Building on this, there was a need for a tool that reliably predicts the clinical subtype of an ALS patient. This knowledge would improve our understanding of the clinical heterogeneity associated with this fatal neurodegenerative disease.

Added value of this study
This study developed a machine learning algorithm to detect ALS patients’ clinical subtypes using clinical data collected from the 2,858 Italian ALS patients in PARALS. Ascertainment of these patients within the catchment area was near complete, meaning that the dataset truly represented the ALS population. We replicated our approach using clinical data obtained from an independent cohort of 1,097 Italian ALS patients that had also been collected in a population-based, longitudinal manner. Semi-supervised learning based on Uniform Manifold Approximation and Projection (UMAP) applied to the output of a multilayer perceptron neural network provided the optimum results based on visual inspection. The observed clusters equated to the six clinical subtypes previously defined by the Chiò classification system (bulbar ALS, respiratory ALS, flail arm ALS, classical ALS, pyramidal ALS, and flail leg ALS). Using a small number of clinical parameters, an ensemble learning approach could predict the ALS clinical subtype with high accuracy (area under the curve = 0·94).

Implications of all available evidence
Additional validation is required to determine these algorithms’ accuracy and clinical utility in assigning clinical subtypes. Nevertheless, our algorithms offer a broad insight into the clinical heterogeneity of ALS and help to determine the actual subtypes of disease that exist within this fatal neurodegenerative syndrome. The systematic identification of ALS subtypes will improve clinical care and clinical trial design.
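
For readers less familiar with the reported metric, the following is a minimal sketch of how a single area under the curve can be obtained for a multi-class subtype predictor. It assumes a macro-averaged one-vs-rest scheme, which is one common choice; the text above does not state which averaging scheme was used. The function names, the three-subtype subset, and the toy probabilities are hypothetical and serve only as illustration.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen positive case is
    scored higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(
    prob_rows: Sequence[Sequence[float]],
    true_labels: Sequence[str],
    classes: Sequence[str],
) -> float:
    """Macro-averaged one-vs-rest AUC (assumed scheme, see lead-in).

    prob_rows[i][k] is the predicted probability that patient i
    belongs to classes[k].
    """
    per_class: List[float] = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        labels = [1 if t == cls else 0 for t in true_labels]
        per_class.append(binary_auc(scores, labels))
    return sum(per_class) / len(per_class)


# Hypothetical toy data: three of the six Chiò subtypes, four patients.
SUBTYPES = ["bulbar", "classical", "flail arm"]
toy_probs = [
    [0.70, 0.20, 0.10],
    [0.20, 0.60, 0.20],
    [0.10, 0.30, 0.60],
    [0.15, 0.50, 0.35],
]
toy_truth = ["bulbar", "classical", "flail arm", "bulbar"]
toy_auc = macro_ovr_auc(toy_probs, toy_truth, SUBTYPES)
```

An AUC of 1 means every patient of a given subtype is ranked above every patient of the other subtypes for that subtype's score, whereas 0·5 corresponds to chance-level ranking.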