1. Rasch techniques for detecting bias in performance assessments: an example comparing the performance of native and non-native speakers on a test of academic English.
- Author
- Elder C, McNamara T, and Congdon P
- Subjects
- Australia, Humans, Multilingualism, Pilot Projects, Bias, Educational Measurement statistics & numerical data, Language Tests statistics & numerical data, Models, Statistical, Students psychology
- Abstract
The use of common tasks and rating procedures when assessing the communicative skills of students from highly diverse linguistic and cultural backgrounds poses particular measurement challenges, which have thus far received little research attention. If assessment tasks or criteria are found to function differentially for particular subpopulations within a test candidature with the same or a similar level of criterion ability, then the test is open to charges of bias in favour of one or other group. While there have been numerous studies involving dichotomous language test items (see, e.g., Chen and Henning, 1985, and more recently Elder, 1996), few studies have considered the issue of bias in relation to performance-based tasks which are assessed subjectively via analytic and holistic rating scales. The paper demonstrates how Rasch analytic procedures can be applied to the investigation of item bias or differential item functioning (DIF) in both dichotomous and scalar items on a test of English for academic purposes. The data were gathered from a pilot English language test administered to a representative sample of undergraduate students (N = 139) enrolled in their first year of study at an English-medium university. The sample included native speakers of English who had completed up to 12 years of secondary schooling in their first language (L1) and immigrant students, mainly from Asian language backgrounds, with varying degrees of prior English language instruction and exposure. The purpose of the test was to diagnose the academic English needs of incoming undergraduates so that additional support could be offered to those deemed at risk of failure in their university study. Some of the tasks included in the assessment procedure involved objectively scored items (measuring vocabulary knowledge, text-editing skills and reading and listening comprehension) whereas others (i.e. a report and an argumentative writing task) were subjectively scored.
The study models a methodology for estimating bias with both dichotomous and scalar items using the programs Quest (Adams and Khoo, 1993) for the former and ConQuest (Wu, Adams and Wilson, 1998) for the latter. It also offers answers to the practical questions of whether a common set of assessment criteria can, in an academic context such as this one, be meaningfully applied to all subgroups within the candidature and whether analytic criteria are more susceptible to biased ratings than holistic ones. Implications for test fairness and test validity are discussed.
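For readers unfamiliar with Rasch-based DIF analysis, the general idea for dichotomous items can be sketched as follows. This is a minimal illustration only, not the procedure implemented in Quest or ConQuest and not taken from the paper: it assumes that an item's difficulty has already been estimated separately for each group (in logits, with standard errors) and flags the item when the standardized difference is large. The function names and all numeric values are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits; the probability depends only on their difference.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def dif_contrast(difficulty_a, se_a, difficulty_b, se_b):
    """Standardized difference between two group-specific difficulty estimates.

    Assumes the two estimates are independent and on a common scale.
    """
    diff = difficulty_a - difficulty_b
    return diff / math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical estimates for one item (not data from the study).
native = {"difficulty": -0.20, "se": 0.20}
non_native = {"difficulty": 0.40, "se": 0.15}

z = dif_contrast(non_native["difficulty"], non_native["se"],
                 native["difficulty"], native["se"])

# A common rule of thumb (an assumption here) treats |z| > 2 as worth inspecting.
print(f"standardized difference: {z:.2f}, flagged: {abs(z) > 2}")
```

In this sketch a positive contrast means the item is estimated to be harder for the non-native group than for native speakers of the same ability. Polytomous (rating-scale) items require an extended model and are not covered here.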
- Published
- 2003