Evaluating Prototype Tasks and Alternative Rating Schemes for a New ESL Writing Test through G-Theory

Authors :: Lee, Yong-Won; Kantor, Robert
Source :: International Journal of Testing. Nov 2007 7(4):353-385.
Publication Year :: 2007
Abstract: Possible integrated and independent tasks were pilot tested for the writing section of a new generation of the TOEFL® (Test of English as a Foreign Language™). This study examines the impact of various rating designs and of the number of tasks and raters on the reliability of writing scores based on integrated and independent tasks from the perspective of generalizability theory (G-theory). Both univariate and multivariate G-theory analyses were conducted. It was found that (a) in terms of maximizing the score dependability, it would be more efficient to increase the number of tasks rather than the number of raters per essay; (b) two particular single-rating designs of "having different tasks for the same examinee rated by different raters" [p x (R:T), R:(p x T)] achieved relatively higher score dependability than other single-rating designs; and (c) a somewhat larger gain in composite score reliability was achieved when the number of listening-writing tasks was larger than that of reading-writing tasks. (Contains 4 figures, 6 tables and 8 footnotes.)
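
Illustrative note: finding (a) can be pictured with the standard G-theory dependability index for a fully crossed persons x tasks x raters design. The sketch below is a simplification, since the study itself also considers nested designs such as p x (R:T). The function name and every variance component are hypothetical values chosen for illustration only; none of them come from the article.

```python
def dependability(var, n_tasks, n_raters):
    """Dependability index (Phi) for a fully crossed p x T x R design.

    `var` maps each effect to its variance component. Phi is universe-score
    variance divided by itself plus absolute error variance.
    """
    abs_error = (
        var["t"] / n_tasks
        + var["r"] / n_raters
        + var["pt"] / n_tasks
        + var["pr"] / n_raters
        + var["tr"] / (n_tasks * n_raters)
        + var["ptr_e"] / (n_tasks * n_raters)
    )
    return var["p"] / (var["p"] + abs_error)


# Hypothetical variance components (NOT taken from the study). The
# person-by-task component is set larger than the person-by-rater one.
var = {"p": 0.50, "t": 0.02, "r": 0.01, "pt": 0.30,
       "pr": 0.02, "tr": 0.01, "ptr_e": 0.25}

for n_t, n_r in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print(f"tasks={n_t} raters={n_r} Phi={dependability(var, n_t, n_r):.3f}")
```

Under these assumed components, adding a second task raises Phi more than adding a second rater, because the task-related error terms are the larger ones and each is divided by the number of tasks. With a different set of components the ordering could differ.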
