The Effect of Rounding Aggregated Item Ratings for Constructed Response Items in Mixed-Item Format Tests.
- Publication Year :
- 1998
Abstract
- A common procedure for obtaining multiple readings (ratings) for a constructed response item, especially in high-stakes tests, is to have two readers read the papers independently, with a third reading if the results differ by more than one point. This necessitates a scoring rule that specifies how the ratings will be aggregated into a single item score. Two plausible scoring rules involve averaging the readings and rounding either to the nearest half point or to the nearest integer, but it is not known which results in greater precision of measurement. This study investigated the precision and accuracy of ability estimates obtained under the two scoring rules for mixed-format tests calibrated under an item response theory model. Eleventh-grade reading, mathematics, and science test results and a fifth-grade mathematics test result were analyzed, with more than 1,200 students available for each form. Above the floors of three of the four tests, there was little substantive difference in score information or in the standard errors of ability estimates due to the type of rounding (integer versus half point), but in the fourth test (11th-grade reading) there was less error in the integer-rounded ability estimates at the lower portion of the scale. Integer-rounded estimates generally produced slightly larger predicted percent of maximum (test) scores, though not throughout the entire ability range of all four tests studied. The expected larger positive differences, or rounding bias, for number-correct estimates were observed. Within-subject differences between scale score estimates derived using integer versus half-point scores were generally small for both pattern and number-correct ability estimates.
The lack of substantive improvement in measurement precision that could be attributed to half-point rounding, coupled with the documented instance of increased error induced by that type of rounding in a portion of the ability range of students taking one test, would seem to argue for rounding average ratings to the nearest integer. Rounding up gives the preponderance of students the benefit of the doubt concerning the acceptability of their responses. (Contains two tables, four figures, and eight references.) (SLD)
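The two scoring rules compared in the abstract can be illustrated with a minimal sketch. The function names are hypothetical, and two points are assumptions rather than details taken from the report: ties are rounded up (suggested by the abstract's "rounding up" remark), and all supplied ratings are simply averaged, since the abstract does not say how a third reading is combined with the first two.

```python
import math

def average_rating(ratings):
    """Mean of the readers' ratings (assumption: a plain average of all readings)."""
    return sum(ratings) / len(ratings)

def round_to_integer(avg):
    """Round to the nearest integer, with ties (x.5) rounded up (assumed tie rule)."""
    return math.floor(avg + 0.5)

def round_to_half_point(avg):
    """Round to the nearest half point, with ties rounded up (assumed tie rule)."""
    return math.floor(avg * 2 + 0.5) / 2

# Two readers who differ by one point: the rules diverge only in this case.
for pair in [(2, 2), (2, 3), (3, 4)]:
    avg = average_rating(pair)
    print(pair, "half-point:", round_to_half_point(avg), "integer:", round_to_integer(avg))
```

With two integer ratings, the average is either a whole number or ends in .5, so the half-point rule keeps the average as is, while the integer rule moves every .5 average up to the next whole score.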
Details
- Language :
- English
- Database :
- ERIC
- Publication Type :
- Report
- Accession number :
- ED423290
- Document Type :
- Reports - Research; Speeches/Meeting Papers