Population validity for educational data mining models: A case study in affect detection.

Authors :
Ocumpaugh, Jaclyn
Baker, Ryan
Gowda, Sujith
Heffernan, Neil
Heffernan, Cristina
Source :
British Journal of Educational Technology; May 2014, Vol. 45 Issue 3, p487-501, 15p, 7 Charts
Publication Year :
2014

Abstract

Information and communication technology (ICT)-enhanced research methods such as educational data mining (EDM) have allowed researchers to effectively model a broad range of constructs pertaining to the student, moving from traditional assessments of knowledge to assessment of engagement, meta-cognition, strategy and affect. The automated detection of these constructs allows EDM researchers to develop intervention strategies that can be implemented either by the software or the teacher. It also allows for secondary analyses of the construct, where the detectors are applied to a data set that is much larger than one that could be analyzed by more traditional methods. However, in many cases, the data used to develop EDM models are collected from students who may not be representative of the broader populations who are likely to use ICT. In order to use EDM models (automated detectors) with new populations, their generalizability must be verified. In this study, we examine whether detectors of affect remain valid when applied to new populations. Models of four educationally relevant affective states were constructed based on data from urban, suburban and rural students using ASSISTments software for middle school mathematics in the Northeastern United States. We found that affect detectors trained on a population drawn primarily from one demographic grouping do not generalize to populations drawn primarily from the other demographic groupings, even though those populations might be considered part of the same national or regional culture. Models constructed using data from all three subpopulations are more applicable to students in those populations than those trained on a single group, but still do not achieve ideal population validity (the ability to generalize across all subgroups). In particular, models generalize better across urban and suburban students than rural students. These findings have important implications for data collection efforts, validation techniques, and the design of interventions that are intended to be applied at scale. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0007-1013
Volume :
45
Issue :
3
Database :
Complementary Index
Journal :
British Journal of Educational Technology
Publication Type :
Academic Journal
Accession number :
95515264
Full Text :
https://doi.org/10.1111/bjet.12156