
Benefits of Alternative Evaluation Methods for Automated Essay Scoring

Authors :
Andersen, Øistein E.
Yuan, Zheng
Watson, Rebecca
Cheung, Kevin Yet Fong
Source :
International Educational Data Mining Society. 2021.
Publication Year :
2021

Abstract

Automated essay scoring (AES), where natural language processing is applied to score written text, can underpin educational resources in blended and distance learning. AES performance has typically been reported in terms of correlation coefficients or agreement statistics calculated between a system and an expert human examiner. We describe the benefits of alternative methods to evaluate AES systems and, more importantly, to facilitate comparison between AES systems and expert human examiners. We employ these methods, together with "multi-marked" test data labelled by 5 expert human examiners, to guide machine learning model development and selection, resulting in models that outperform expert human examiners. We extend previous work on a mature feature-based linear ranking perceptron model and also develop a new multitask learning neural network model built on top of a pretrained language model -- DistilBERT. Combining these two models' scores yields further improvements in performance over each single model. [For the full proceedings, see ED615472.]
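As an illustrative sketch (not taken from the paper), the two conventional evaluation measures the abstract mentions -- a correlation coefficient and an agreement statistic -- can be computed between a system's scores and an examiner's scores as follows. The example scores and the score range (1-5) are invented for illustration; the agreement statistic shown is quadratically weighted kappa, a common choice in AES evaluation.

```python
# Sketch: Pearson correlation and quadratically weighted kappa between
# an AES system's scores and a human examiner's scores.
# Score values and range below are hypothetical, for illustration only.

from statistics import mean

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length score lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def quadratic_weighted_kappa(rater1, rater2, min_score, max_score):
    """Agreement beyond chance, with disagreements penalised quadratically."""
    n = max_score - min_score + 1
    # Observed score-pair counts.
    obs = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater1, rater2):
        obs[a - min_score][b - min_score] += 1
    total = len(rater1)
    # Marginal score histograms for each rater.
    hist1 = [sum(row) for row in obs]
    hist2 = [sum(obs[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement weight
            num += w * obs[i][j] / total
            den += w * hist1[i] * hist2[j] / total ** 2
    return 1.0 - num / den

# Hypothetical scores on a 1-5 scale.
system = [3, 4, 2, 5, 4, 3, 1, 4]
examiner = [3, 4, 3, 5, 3, 3, 2, 4]
print(round(pearson(system, examiner), 3))                            # 0.882
print(round(quadratic_weighted_kappa(system, examiner, 1, 5), 3))     # 0.829
```

The paper's point is that statistics like these, computed against a single examiner, can be complemented by evaluation against multi-marked data from several examiners.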

Details

Language :
English
Database :
ERIC
Journal :
International Educational Data Mining Society
Publication Type :
Conference
Accession number :
ED615497
Document Type :
Speeches/Meeting Papers; Reports - Research