Using Natural Language Processing to Automatically Assess Feedback Quality: Findings From 3 Surgical Residencies.

Authors :
Ötleş E
Kendrick DE
Solano QP
Schuller M
Ahle SL
Eskender MH
Carnes E
George BC
Source :
Academic medicine : journal of the Association of American Medical Colleges [Acad Med] 2021 Oct 01; Vol. 96 (10), pp. 1457-1460.
Publication Year :
2021

Abstract

Purpose: Learning is markedly improved with high-quality feedback, yet assuring the quality of feedback is difficult to achieve at scale. Natural language processing (NLP) algorithms may be useful in this context as they can automatically classify large volumes of narrative data. However, it is unknown if NLP models can accurately evaluate surgical trainee feedback. This study evaluated which NLP techniques best classify the quality of surgical trainee formative feedback recorded as part of a workplace assessment.

Method: During the 2016-2017 academic year, the SIMPL (Society for Improving Medical Professional Learning) app was used to record operative performance narrative feedback for residents at 3 university-based general surgery residency training programs. Feedback comments were collected for a sample of residents representing all 5 postgraduate year levels and coded for quality. In May 2019, the coded comments were then used to train NLP models to automatically classify the quality of feedback across 4 categories (effective, mediocre, ineffective, or other). Models included support vector machines (SVM), logistic regression, gradient boosted trees, naive Bayes, and random forests. The primary outcome was mean classification accuracy.

Results: The authors manually coded the quality of 600 recorded feedback comments. Those data were used to train NLP models to automatically classify the quality of feedback across 4 categories. The NLP model using an SVM algorithm yielded a maximum mean accuracy of 0.64 (standard deviation, 0.01). When the classification task was modified to distinguish only high-quality vs low-quality feedback, maximum mean accuracy was 0.83, again with SVM.

Conclusions: To the authors' knowledge, this is the first study to examine the use of NLP for classifying feedback quality. SVM NLP models demonstrated the ability to automatically classify the quality of surgical trainee evaluations. Larger training datasets would likely further increase accuracy.

(Copyright © 2021 by the Association of American Medical Colleges.)
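
Illustrative sketch (not part of the published abstract): the workflow described above, training a text classifier on coded comments and reporting mean accuracy with a standard deviation for a 4-category and a 2-category task, can be outlined in a few lines of standard-library Python. This is not the authors' pipeline. It uses naive Bayes, one of the model families the abstract lists, only because it fits in a short example; the study reports SVM as the best performer. The toy comments, the whitespace bag-of-words features, the k-fold evaluation scheme, the rule that maps only "effective" to high quality, and every name (TOY_COMMENTS, train_nb, predict_nb, collapse, cross_val_accuracies) are assumptions made for illustration.

```python
import math
import statistics
from collections import Counter, defaultdict

# Hypothetical toy comments invented for this sketch; NOT data from the study.
TOY_COMMENTS = [
    ("next time retract earlier to expose the plane before dissecting", "effective"),
    ("keep the camera centered and plan port placement before starting", "effective"),
    ("good job overall but work on efficiency", "mediocre"),
    ("nice case try to be faster", "mediocre"),
    ("good job", "ineffective"),
    ("great work", "ineffective"),
    ("attending performed most of this case", "other"),
    ("resident was called away during the case", "other"),
]

def tokenize(text):
    # Assumption: lowercase whitespace tokens as bag-of-words features.
    return text.lower().split()

def train_nb(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    correct = sum(predict_nb(model, text) == label for text, label in examples)
    return correct / len(examples)

def collapse(label):
    # Assumption: only "effective" counts as high quality; the abstract
    # does not state how the 4 categories were merged into 2.
    return "high" if label == "effective" else "low"

def cross_val_accuracies(examples, k=4):
    """Per-fold accuracies; item i goes to fold i % k (deterministic)."""
    scores = []
    for fold in range(k):
        test = [ex for i, ex in enumerate(examples) if i % k == fold]
        train = [ex for i, ex in enumerate(examples) if i % k != fold]
        scores.append(accuracy(train_nb(train), test))
    return scores

if __name__ == "__main__":
    four_class = cross_val_accuracies(TOY_COMMENTS, k=4)
    binary = cross_val_accuracies(
        [(text, collapse(label)) for text, label in TOY_COMMENTS], k=4
    )
    for name, scores in (("4-category", four_class), ("binary", binary)):
        print(name, "mean accuracy:", statistics.mean(scores),
              "SD:", statistics.stdev(scores))
```

With only 8 invented comments the printed numbers carry no meaning; the sketch shows the shape of the evaluation (per-fold accuracy, then mean and standard deviation) rather than any result comparable to the 0.64 and 0.83 reported above.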

Details

Language :
English
ISSN :
1938-808X
Volume :
96
Issue :
10
Database :
MEDLINE
Journal :
Academic medicine : journal of the Association of American Medical Colleges
Publication Type :
Academic Journal
Accession number :
33951682
Full Text :
https://doi.org/10.1097/ACM.0000000000004153