1. Validation of a machine learning–derived clinical metric to quantify outcomes after total shoulder arthroplasty
- Author
- Steven Overman, Thomas W. Wright, Ankur Teredesai, Joseph D. Zuckerman, Christopher P. Roche, Vikas Kumar, Howard D. Routman, Ryan Simovitch, and Pierre-Henri Flurin
- Subjects
Male, medicine.medical_treatment, Machine learning, computer.software_genre, Machine Learning, 03 medical and health sciences, 0302 clinical medicine, Patient satisfaction, Humans, Medicine, Orthopedics and Sports Medicine, Range of Motion, Articular, Retrospective Studies, Interpretability, 030222 orthopedics, Shoulder Joint, business.industry, Construct validity, 030229 sport sciences, General Medicine, Response bias, Arthroplasty, Test (assessment), Treatment Outcome, Arthroplasty, Replacement, Shoulder, Ceiling effect, Female, Surgery, Metric (unit), Artificial intelligence, business, computer
- Abstract
Background: We propose a new clinical assessment tool constructed using machine learning, called the Shoulder Arthroplasty Smart (SAS) score, to quantify outcomes following total shoulder arthroplasty (TSA).
Methods: Clinical data from 3667 TSA patients with 8104 postoperative follow-up reports were used to quantify the psychometric properties of validity, responsiveness, and clinical interpretability for the proposed SAS score and for each of the Simple Shoulder Test (SST), Constant, American Shoulder and Elbow Surgeons Standardized Shoulder Assessment Form (ASES), University of California Los Angeles (UCLA), and Shoulder Pain and Disability Index (SPADI) scores.
Results: Convergent construct validity was demonstrated, with all 6 outcome measures being moderately to highly correlated preoperatively and highly correlated postoperatively when quantifying TSA outcomes. The SAS score was most correlated with the UCLA score and least correlated with the SST. No clinical outcome score exhibited significant floor effects preoperatively or postoperatively or significant ceiling effects preoperatively; however, significant ceiling effects occurred postoperatively for each of the SST (44.3%), UCLA (13.9%), ASES (18.7%), and SPADI (19.3%) measures. Ceiling effects were more pronounced for anatomic than for reverse TSA, and, in general, TSA patients who were male, younger, or white were more likely to experience a ceiling effect than TSA patients who were female, older, or of non-white race or ethnicity. The SAS score had the fewest patients with floor and ceiling effects and exhibited no response bias in any patient characteristic analyzed in this study. Regarding clinical interpretability, patient satisfaction anchor-based thresholds for minimal clinically important difference and substantial clinical benefit were quantified for all 6 outcome measures; the SAS score thresholds were most similar in magnitude to those of the Constant score. Regarding responsiveness, all 6 outcome measures detected a large effect, with the UCLA exhibiting the most responsiveness and the SST the least. Finally, the SAS, ASES, Constant, and SPADI scores each showed similarly large responsiveness as measured by standardized response mean and effect size.
Discussion: The 6-question SAS score is an efficient TSA-specific outcome measure with validity, responsiveness, and clinical interpretability equivalent to or better than those of 5 other historical assessment tools. The SAS score has an appropriate response range without floor or ceiling effects and without bias in any target patient characteristic, unlike the age, gender, or race/ethnicity bias observed in the ceiling scores of the other outcome measures. Because of these substantial benefits, we recommend the use of the new SAS score for quantifying TSA outcomes.
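The abstract reports ceiling effects, standardized response mean, and effect size without giving formulas. As a rough illustration only, the sketch below uses common textbook definitions: the ceiling rate is the share of patients at the maximum score, the standardized response mean is the mean change divided by the standard deviation of the change, and the effect size is the mean change divided by the standard deviation of the preoperative scores. The function names and the small data set are hypothetical, and the authors' exact criteria (for example, what counts as a "significant" ceiling effect) may differ.

```python
from statistics import mean, stdev


def ceiling_rate(scores, max_score):
    """Fraction of patients whose score is at the scale maximum."""
    return sum(1 for s in scores if s >= max_score) / len(scores)


def floor_rate(scores, min_score):
    """Fraction of patients whose score is at the scale minimum."""
    return sum(1 for s in scores if s <= min_score) / len(scores)


def standardized_response_mean(pre, post):
    """Mean paired change divided by the sample SD of the change."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(changes)


def effect_size(pre, post):
    """Mean paired change divided by the sample SD of the baseline scores."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(pre)


# Hypothetical paired scores on a 0-12 scale (not data from the study).
pre_scores = [2, 4, 3, 5, 6]
post_scores = [10, 12, 12, 11, 12]

print("ceiling rate:", ceiling_rate(post_scores, 12))
print("floor rate:", floor_rate(post_scores, 0))
print("SRM:", round(standardized_response_mean(pre_scores, post_scores), 2))
print("effect size:", round(effect_size(pre_scores, post_scores), 2))
```

A measure with a high ceiling rate cannot distinguish between patients who all reach the maximum, which is why the abstract treats a low ceiling rate as an advantage of the SAS score.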
- Published
- 2021