Iterative Feature Engineering through Text Replays of Model Errors

Authors :: Slater, Stefan
Baker, Ryan S.
Wang, Yeyu
Source :: International Educational Data Mining Society. 2020.
Publication Year :: 2020
Abstract: Feature engineering, the construction of contextual and relevant features from system log data, is a crucial component of developing robust and interpretable models in educational data mining contexts. The practice of feature engineering depends on domain experts and system developers working in tandem to creatively identify actions and behaviors of interest. In this paper we outline a method of iterative feature engineering using the misclassifications of earlier models. By selecting cases where earlier models and ground truth disagree, we can focus attention on specific behaviors, or patterns of behavior, that a model is not using in its predictions. We show that iterative feature engineering on cases of false positives and false negatives improved a model predicting quitting in an educational video game by 15%. We close by discussing applications of this method for addressing model performance gaps across different classes of learners, as well as precautions against model overfitting when using this method of feature engineering. [For the full proceedings, see ED607784.]
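
Illustrative note: the case-selection step the abstract describes (picking out cases where an earlier model and ground truth disagree) might be sketched as follows. This is a minimal sketch, not the authors' implementation; the function name `select_misclassified`, the binary 0/1 label encoding (1 = quit), and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def select_misclassified(
    labels: Sequence[int], predictions: Sequence[int]
) -> Dict[str, List[int]]:
    """Return the indices of false positives and false negatives.

    Assumes binary labels, where 1 means the behavior of interest
    (for example, quitting) occurred and 0 means it did not.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    errors: Dict[str, List[int]] = {"false_positive": [], "false_negative": []}
    for index, (truth, predicted) in enumerate(zip(labels, predictions)):
        if predicted == 1 and truth == 0:
            errors["false_positive"].append(index)
        elif predicted == 0 and truth == 1:
            errors["false_negative"].append(index)
    return errors


# Hypothetical toy data: ground truth versus an earlier model's predictions.
labels = [1, 0, 1, 0, 1]
predictions = [1, 1, 0, 0, 1]

errors = select_misclassified(labels, predictions)
print(errors)  # {'false_positive': [1], 'false_negative': [2]}
```

The returned indices identify which cases a reviewer would pull up as text replays when looking for behaviors the current feature set does not capture.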
