807 results for "van der Linden, Wim"
Search Results
102. Distributions of Sums of Nonidentical Random Variables
- Author: van der Linden, Wim J., primary
- Published: 2017
103. Item Response Theory: Brief History, Common Models, and Extensions
- Author: van der Linden, Wim J., Hambleton, Ronald K., van der Linden, Wim J., editor, and Hambleton, Ronald K., editor
- Published: 1997
104. Some Conceptual Issues in Observed-Score Equating
- Author: van der Linden, Wim J.
- Published: 2013
105. More Issues in Observed-Score Equating
- Author: van der Linden, Wim J.
- Published: 2013
106. eServices for Hospital Equipment
- Author: de Jonge, Merijn, van der Linden, Wim, Willems, Rik, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Krämer, Bernd J., editor, Lin, Kwei-Jay, editor, and Narasimhan, Priya, editor
- Published: 2007
107. A Statistical Test for the Detection of Item Compromise Combining Responses and Response Times.
- Author: van der Linden, Wim J. and Belov, Dmitry I.
- Subjects: BINOMIAL distribution
- Abstract
A test of item compromise is presented that combines the test takers' responses and response times (RTs) into a statistic defined as the number of correct responses on the item for test takers with RTs flagged as suspicious. The test has null and alternative distributions belonging to the well-known family of compound binomial distributions, is simple to calculate, and yields results that are easy to interpret. In a set of empirical examples, it demonstrated nearly perfect power for the detection of compromise with no more than 10 test takers with preknowledge of the more difficult and discriminating items. For the easier and less discriminating items, the presence of some 20 test takers with preknowledge still sufficed. A test based on the reverse statistic of the total time by test takers with responses flagged as suspicious may seem a natural alternative but lacks the property of a monotone likelihood ratio needed to decide whether the test should be left- or right-sided.
- Published: 2023
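The null distribution named in the abstract above belongs to the compound binomial (Poisson-binomial) family. As an illustrative sketch only, not the authors' implementation, the distribution of the number of correct responses among flagged test takers with unequal, model-based success probabilities can be computed by iterative convolution; the function names and probabilities below are hypothetical:

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes in independent Bernoulli trials
    with (possibly unequal) success probabilities, via convolution."""
    pmf = [1.0]  # distribution over 0 successes so far
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)   # this trial fails
            new[k + 1] += mass * p     # this trial succeeds
        pmf = new
    return pmf

def right_tail_p(probs, k):
    """Right-sided p-value: probability of at least k correct responses
    among the flagged test takers under the null probabilities."""
    return sum(poisson_binomial_pmf(probs)[k:])
```

With equal probabilities the distribution reduces to an ordinary binomial, which gives a quick sanity check on the convolution.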
108. Linking Item Response Model Parameters
- Author: van der Linden, Wim J. and Barrett, Michelle D.
- Published: 2016
109. Modeling Answer Changes on Test Items
- Author: van der Linden, Wim J. and Jeon, Minjeong
- Abstract
The probability of test takers changing answers upon review of their initial choices is modeled. The primary purpose of the model is to check erasures on answer sheets recorded by an optical scanner for numbers and patterns that may be indicative of irregular behavior, such as teachers or school administrators changing answer sheets after their students have finished the test or test takers communicating with each other about their initial responses. A statistical test based on the number of erasures is derived from the model. In addition, it is shown how to analyze the residuals under the model to check for suspicious patterns of erasures. The use of the two procedures is illustrated for an empirical data set from a large-scale assessment. The robustness of the model with respect to less than optimal opportunities for regular test takers to review their responses is investigated. (Contains 5 figures.)
- Published: 2012
110. On Compensation in Multidimensional Response Modeling
- Author: van der Linden, Wim J.
- Abstract
The issue of compensation in multidimensional response modeling is addressed. We show that multidimensional response models are compensatory in their ability parameters if and only if they are monotone. In addition, a minimal set of assumptions is presented under which the MLEs of the ability parameters are also compensatory. In a recent series of articles, beginning with Hooker, Finkelman, and Schwartzman (2009) in this journal, the second type of compensation was presented as a paradoxical result for certain multidimensional response models, leading to occasional unfairness in maximum-likelihood test scoring. First, it is indicated that the compensation is not unique and holds generally for any multiparameter likelihood with monotone score functions. Second, we analyze why, in spite of its generality, the compensation may give the impression of a paradox or unfairness.
- Published: 2012
111. Automated Test Assembly Using lp_Solve Version 5.5 in R
- Author: Diao, Qi and van der Linden, Wim J.
- Abstract
This article reviews the use of the software program lp_solve version 5.5 for solving mixed-integer automated test assembly (ATA) problems. The program is freely available under Lesser General Public License 2 (LGPL2). It can be called from the statistical language R using the lpSolveAPI interface. Three empirical problems are presented to demonstrate how to use the program and interface to (a) simultaneously assemble multiple test forms with absolute targets for their test information functions, (b) assemble shadow tests for computerized adaptive testing, and (c) assemble multistage tests using relative targets for their test information functions, all subject to various quantitative and categorical constraints. The results of this study indicate that it is now possible for researchers and testing organizations to implement ATA for small to moderately sized test assembly problems using free software. (Contains 3 tables and 4 figures.)
- Published: 2011
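lp_solve and the lpSolveAPI interface are the tools actually reviewed in the article above. The toy sketch below only illustrates the shape of a 0-1 test-assembly problem (select items to maximize total information subject to a constraint), solved by exhaustive search, which is feasible only for very small pools; real ATA problems need a mixed-integer solver. The item values are invented:

```python
from itertools import combinations

def assemble_test(pool, test_length, max_time):
    """Pick `test_length` items maximizing total information subject to
    a bound on total expected time: a toy 0-1 test-assembly problem.
    `pool` is a list of (information, expected_time) pairs; exhaustive
    search over item subsets stands in for a mixed-integer solver."""
    best, best_info = None, -1.0
    for combo in combinations(range(len(pool)), test_length):
        info = sum(pool[i][0] for i in combo)
        time = sum(pool[i][1] for i in combo)
        if time <= max_time and info > best_info:
            best, best_info = combo, info
    return best, best_info
```

In a real MIP formulation each item gets a 0-1 decision variable, the objective is the information sum, and the length and time bounds become linear constraints, exactly the structure passed to lp_solve.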
112. Setting Time Limits on Tests
- Author: van der Linden, Wim J.
- Abstract
It is shown how the time limit on a test can be set to control the probability of a test taker running out of time before completing it. The probability is derived from the item parameters in the lognormal model for response times. Examples of curves representing the probability of running out of time on a test with given parameters as a function of the time limit are presented. Unlike the traditional methods of dealing with test speededness, which assess the degree of speededness after the test has been administered, the curves enable us to set a desired degree of speededness in advance. The method is demonstrated using an empirical data set. (Contains 4 figures.)
- Published: 2011
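Under the lognormal response-time model referenced above, the log time on item j for a person working at speed tau is normal with mean beta_j - tau (beta_j the item's time intensity) and standard deviation 1/alpha_j. A minimal Monte Carlo sketch of the probability of running out of time, with made-up item parameters and hypothetical function names (the article derives the probability analytically from the item parameters):

```python
import math
import random

def prob_out_of_time(betas, alphas, tau, limit, n_sims=20000, seed=1):
    """Monte Carlo estimate of P(total test time > limit) under the
    lognormal response-time model: ln t_j ~ N(beta_j - tau, 1/alpha_j^2)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_sims):
        total = sum(math.exp(rng.gauss(b - tau, 1.0 / a))
                    for b, a in zip(betas, alphas))
        if total > limit:
            count += 1
    return count / n_sims
```

Plotting this estimate against a grid of candidate limits reproduces the kind of curve the abstract describes, from which a limit with a desired risk level can be read off in advance.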
113. Modeling Rule-Based Item Generation
- Author: Geerlings, Hanneke, Glas, Cees A. W., and van der Linden, Wim J.
- Abstract
An application of a hierarchical IRT model for items in families generated through the application of different combinations of design rules is discussed. Within the families, the items are assumed to differ only in surface features. The parameters of the model are estimated in a Bayesian framework, using a data-augmented Gibbs sampler. An obvious application of the model is computerized algorithmic item generation. Such algorithms have the potential to increase the cost-effectiveness of item generation as well as the flexibility of item administration. The model is applied to data from a non-verbal intelligence test created using design rules. In addition, results from a simulation study conducted to evaluate parameter recovery are presented.
- Published: 2011
114. A Paradox in the Study of the Benefits of Test-Item Review
- Author: van der Linden, Wim J., Jeon, Minjeong, and Ferrara, Steve
- Abstract
According to a popular belief, test takers should trust their initial instinct and retain their initial responses when they have the opportunity to review test items. More than 80 years of empirical research on item review, however, has contradicted this belief and shown minor but consistently positive score gains for test takers who changed answers they found to be incorrect during review. This study reanalyzed the problem of the benefits of answer changes using item response theory modeling of the probability of an answer change as a function of the test taker's ability level and the properties of items. Our empirical results support the popular belief and reveal substantial losses due to changing initial responses for all ability levels. Both the contradiction of the earlier research and support of the popular belief are explained as a manifestation of Simpson's paradox in statistics.
- Published: 2011
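The reversal described in the last sentence of the abstract can be reproduced with a toy contingency table. The counts below are invented purely to show the mechanism of Simpson's paradox, not taken from the study: at each ability level, changed answers end up correct less often than kept answers, yet pooled over levels the direction reverses:

```python
# Hypothetical (correct, total) counts of answers after review.
data = {
    "low ability":  {"changed": (1, 20),   "kept": (10, 100)},
    "high ability": {"changed": (90, 100), "kept": (19, 20)},
}

def rate(correct, total):
    return correct / total

# Within each ability level, changing hurts:
#   low: 5% (changed) vs 10% (kept); high: 90% vs 95%.
per_level = {lvl: (rate(*d["changed"]), rate(*d["kept"]))
             for lvl, d in data.items()}

# Pooled over levels, changing appears to help:
#   changed 91/120 vs kept 29/120.
pooled = {
    key: rate(sum(d[key][0] for d in data.values()),
              sum(d[key][1] for d in data.values()))
    for key in ("changed", "kept")
}
```

The reversal arises because high-ability test takers, who are correct more often regardless, dominate the "changed" pool in this construction; conditioning on ability removes the artifact.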
115. Local Linear Observed-Score Equating
- Author: Wiberg, Marie and van der Linden, Wim J.
- Abstract
Two methods of local linear observed-score equating for use with anchor-test and single-group designs are introduced. In an empirical study, the two methods were compared with the current traditional linear methods for observed-score equating. As a criterion, the bias in the equated scores relative to true equating based on Lord's (1980) definition of equity was used. The local method for the anchor-test design yielded minimum bias, even for considerable variation of the relative difficulties of the two test forms and the length of the anchor test. Among the traditional methods, the method of chain equating performed best. The local method for single-group designs yielded equated scores with bias comparable to the traditional methods. This method, however, appears to be of theoretical interest because it forces us to rethink the relationship between score equating and regression.
- Published: 2011
116. Test Design and Speededness
- Author: van der Linden, Wim J.
- Abstract
A critical component of test speededness is the distribution of the test taker's total time on the test. A simple set of constraints on the item parameters in the lognormal model for response times is derived that can be used to control the distribution when assembling a new test form. As the constraints are linear in the item parameters, they can easily be included in a mixed integer programming model for test assembly. The use of the constraints is demonstrated for the problems of assembling a new test form to be equally speeded as a reference form, test assembly in which the impact of a change in the content specifications on speededness is to be neutralized, and the assembly of test forms with a revised level of speededness.
- Published: 2011
117. Automated Test-Form Generation
- Author: van der Linden, Wim J. and Diao, Qi
- Abstract
In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different cases are discussed: (i) computerized test forms in which the items are presented on a screen one at a time and only their optimal order has to be determined; (ii) paper forms in which the items need to be ordered and paginated and the typical goal is to minimize paper use; and (iii) published test forms with the same requirements but a more sophisticated layout (e.g., double-column print). For each case, a menu of possible test-form specifications is identified, and it is shown how they can be modeled as linear constraints using 0-1 decision variables. The methodology is demonstrated using two empirical examples.
- Published: 2011
118. Local Observed-Score Equating with Anchor-Test Designs
- Author: van der Linden, Wim J. and Wiberg, Marie
- Abstract
For traditional methods of observed-score equating with anchor-test designs, such as chain and poststratification equating, it is difficult to satisfy the criteria of equity and population invariance. Their equatings are therefore likely to be biased. The bias in these methods was evaluated against a simple local equating method in which the anchor-test score was used as a proxy of the proficiency measured by the test and the equating was conditional on this score. The results showed substantial bias for the two traditional methods under a variety of conditions but much smaller bias for the local method. In addition, unlike the traditional methods, the local method appeared to be quite robust with respect to changes in the difficulty and accuracy of the two tests that were equated. But like these methods, it appeared to be sensitive to a decrease in the accuracy of the anchor test as a proxy of the ability measured by the tests. (Contains 1 table and 13 figures.)
- Published: 2010
119. Statistical Tests of Conditional Independence between Responses and/or Response Times on Test Items
- Author: van der Linden, Wim J. and Glas, Cees A. W.
- Abstract
Three plausible assumptions of conditional independence in a hierarchical model for responses and response times on test items are identified. For each of the assumptions, a Lagrange multiplier test of the null hypothesis of conditional independence against a parametric alternative is derived. The tests have closed-form statistics that are easy to calculate from the standard estimates of the person parameters in the model. In addition, simple closed-form estimators of the parameters under the alternatives of conditional dependence are presented, which can be used to explore model modification. The tests were applied to a data set from a large-scale computerized exam and showed excellent power to detect even minor violations of conditional independence. (Contains 6 figures.)
- Published: 2010
120. Linking Response-Time Parameters onto a Common Scale
- Author: van der Linden, Wim J.
- Abstract
Although response times on test items are recorded on a natural scale, the scale for some of the parameters in the lognormal response-time model (van der Linden, 2006) is not fixed. As a result, when the model is used to periodically calibrate new items in a testing program, the parameters are not automatically mapped onto a common scale. Several combinations of linking designs and procedures for the lognormal model are examined that do map parameter estimates onto a common scale. For each of the designs, the standard error of linking is derived. The results are illustrated using examples with simulated data.
- Published: 2010
121. IRT Parameter Estimation with Response Times as Collateral Information
- Author: van der Linden, Wim J., Klein Entink, Rinke H., and Fox, Jean-Paul
- Abstract
Hierarchical modeling of responses and response times on test items facilitates the use of response times as collateral information in the estimation of the response parameters. In addition to the regular information in the response data, two sources of collateral information are identified: (a) the joint information in the responses and the response times summarized in the estimates of the second-level parameters and (b) the information in the posterior distribution of the response parameters given the response times. The latter is shown to be a natural empirical prior distribution for the estimation of the response parameters. Unlike traditional hierarchical item response theory (IRT) modeling, where the gain in estimation accuracy is typically paid for by an increase in bias, use of this posterior predictive distribution improves both the accuracy and the bias of IRT parameter estimates. In an empirical study, the improvements are demonstrated for the estimation of the person and item parameters in a three-parameter response model. (Contains 7 figures.)
- Published: 2010
122. On Bias in Linear Observed-Score Equating
- Author: van der Linden, Wim J.
- Abstract
The traditional way of equating the scores on a new test form X to those on an old form Y is equipercentile equating for a population of examinees. Because the population is likely to change between the two administrations, a popular approach is to equate for a "synthetic population." The authors of the articles in this issue of the journal try to avoid the arbitrariness in the definition of a synthetic population by equating X to Y for the population G1 that takes the new form. The author is pleased to note the contributors' attention to the topic of bias in linear equating. The equating literature has been dominated by an interest in the standard error of equating, but bias is the primary criterion for evaluating the success of an equating. After all, equating is an attempt to remove the bias in the score on the new test form as an estimate of the score on the old form due to scale differences between them. A focus only on the standard error of equating prevents one from noticing any remaining bias in the equated scores, or even possible new bias added to them in the equating process. In this article, the author discusses a little further the issue of bias in linear equating. (Contains 1 figure.)
- Published: 2010
123. Multidimensional Adaptive Testing with Optimal Design Criteria for Item Selection
- Author: Mulder, Joris and van der Linden, Wim J.
- Abstract
Several criteria from the optimal design literature are examined for use with item selection in multidimensional adaptive testing. In particular, it is examined which criteria are appropriate for adaptive testing in which all abilities are intentional, some should be considered as a nuisance, or the interest is in the testing of a composite of the abilities. Both the theoretical analyses and the studies of simulated data in this paper suggest that the criteria of A-optimality and D-optimality lead to the most accurate estimates when all abilities are intentional, with the former slightly outperforming the latter. The criterion of E-optimality showed occasional erratic behavior for this case of adaptive testing, and its use is not recommended. If some of the abilities are nuisances, application of the criterion of A [subscript s] -optimality (or D [subscript s] -optimality), which focuses on the subset of intentional abilities, is recommended. For the measurement of a linear combination of abilities, the criterion of c-optimality yielded the best results. The preferences of each of these criteria for items with specific patterns of parameter values were also assessed. It was found that the criteria differed mainly in their preferences of items with different patterns of values for their discrimination parameters.
- Published: 2009
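The A- and D-optimality criteria compared in the abstract can be sketched directly from Fisher information matrices. The matrices below are hypothetical stand-ins for the information an item adds about a two-dimensional ability; this is an illustration of the criteria, not the paper's simulation design:

```python
import numpy as np

def a_opt(info, item_info):
    """A-optimality: trace of the inverse of the updated information
    matrix; smaller values mean smaller summed error variance."""
    return np.trace(np.linalg.inv(info + item_info))

def d_opt(info, item_info):
    """D-optimality: determinant of the updated information matrix;
    larger values mean a smaller confidence ellipsoid."""
    return np.linalg.det(info + item_info)

def select_item(info, candidates, criterion="D"):
    """Index of the candidate item whose information matrix optimizes
    the chosen criterion, given the information accumulated so far."""
    if criterion == "D":
        return max(range(len(candidates)),
                   key=lambda i: d_opt(info, candidates[i]))
    return min(range(len(candidates)),
               key=lambda i: a_opt(info, candidates[i]))
```

Even with two candidates, one highly informative about a single ability and one moderately informative about both, the two criteria can already disagree about which item to administer next.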
124. Bayesian Checks on Outlying Response Times in Computerized Adaptive Testing
- Author: van der Linden, Wim J., Yanai, H., editor, Okada, A., editor, Shigemasu, K., editor, Kano, Y., editor, and Meulman, J. J., editor
- Published: 2003
125. Predictive Control of Speededness in Adaptive Testing
- Author: van der Linden, Wim J.
- Abstract
An adaptive testing method is presented that controls the speededness of a test using predictions of the test takers' response times on the candidate items in the pool. Two different types of predictions are investigated: posterior predictions given the actual response times on the items already administered and posterior predictions that use the responses on these items as an additional source of information. In a simulation study with an adaptive test modeled after a test from the Armed Services Vocational Aptitude Battery, the effectiveness of the methods in removing differential speededness from the test was evaluated. (Contains 6 figures.)
- Published: 2009
126. Conceptual Issues in Response-Time Modeling
- Author: van der Linden, Wim J.
- Abstract
Two different traditions of response-time (RT) modeling are reviewed: the tradition of distinct models for RTs and responses, and the tradition of model integration in which RTs are incorporated in response models or the other way around. Several conceptual issues underlying both traditions are made explicit and analyzed for their consequences. We then propose a hierarchical modeling framework consistent with the first tradition but with the integration of their parameter structures as a second level of modeling. Two examples of the framework are presented. Also, a fundamental equation is derived which relates the RTs on test items to the speed of the test taker and the time intensity of the items. The equation serves as the core of the RT model in the framework. Finally, empirical applications of the framework demonstrating its practical value are reviewed.
- Published: 2009
127. A Bivariate Lognormal Response-Time Model for the Detection of Collusion between Test Takers
- Author: van der Linden, Wim J.
- Abstract
A bivariate lognormal model for the distribution of the response times on a test by a pair of test takers is presented. As the model has parameters for the item effects on the response times, its correlation parameter automatically corrects for the spuriousness in the observed correlation between the response times of different test takers because of variation in the time intensities of the items. This feature suggests using the model in a routine check of response-time patterns for possible collusion between test takers using an estimate of the correlation parameter or a statistical test of a hypothesis about it. Closed-form expressions for the maximum-likelihood estimators of the model parameters and a Lagrange multiplier test for the correlation parameter are presented. As in any type of statistical decision making, results from such procedures should be corroborated by evidence from other sources, for example, results from a response-based analysis or observations during the test session. The effectiveness of the model in removing the spuriousness from correlated response times is illustrated using empirical response-time data. (Contains 2 tables and 1 figure.)
- Published: 2009
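The article derives closed-form ML estimators and a Lagrange multiplier test; the sketch below only illustrates the central idea of removing item effects (the time intensities beta_j) before correlating two test takers' log response times. Function names and the plug-in intensities are hypothetical:

```python
import math

def residual_log_times(times, intensities):
    """Residual of each log response time after subtracting the
    item's time intensity (beta_j), removing the item effects."""
    return [math.log(t) - b for t, b in zip(times, intensities)]

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def collusion_check(times_1, times_2, intensities):
    """Correlation between two test takers' residual log times;
    values near zero are expected for independently working pairs."""
    r1 = residual_log_times(times_1, intensities)
    r2 = residual_log_times(times_2, intensities)
    return pearson(r1, r2)
```

Correlating raw times instead of residuals would pick up the shared item effects and flag innocent pairs; subtracting the intensities is precisely the correction the model builds in.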
128. Bayesian Procedures for Identifying Aberrant Response-Time Patterns in Adaptive Testing
- Author: van der Linden, Wim J. and Guo, Fanmin
- Abstract
In order to identify aberrant response-time patterns on educational and psychological tests, it is important to be able to separate the speed at which the test taker operates from the time the items require. A lognormal model for response times with this feature was used to derive a Bayesian procedure for detecting aberrant response times. In addition, a combination of the response-time model with a regular response model in a hierarchical framework was used in an alternative procedure for the detection of aberrant response times, in which collateral information on the test takers' speed is derived from their response vectors. The procedures are illustrated using a data set for the Graduate Management Admission Test[R] (GMAT[R]). In addition, a power study was conducted using simulated cheating behavior on an adaptive test.
- Published: 2008
129. Implementing Sympson-Hetter Item-Exposure Control in a Shadow-Test Approach to Constrained Adaptive Testing
- Author: Veldkamp, Bernard P. and van der Linden, Wim J.
- Abstract
In most operational computerized adaptive testing (CAT) programs, the Sympson-Hetter (SH) method is used to control the exposure of the items. Several modifications and improvements of the original method have been proposed. The Stocking and Lewis (1998) version of the method uses a multinomial experiment to select items. For severely constrained CAT, the list on which this experiment is conducted not only has to be of appropriate length but also needs to balance the composition of the test with respect to its specifications. In this article it is shown how the SH method of exposure control can be implemented in the shadow test approach. The method was applied to an adaptive test with 433 constraints on various attributes. Both a single and a multiple shadow-test approach were used to compare different list lengths for the SH method. (Contains 2 tables and 3 figures.)
- Published: 2008
130. Using Response Times for Item Selection in Adaptive Testing
- Author: van der Linden, Wim J.
- Abstract
Response times on items can be used to improve item selection in adaptive testing provided that a probabilistic model for their distribution is available. In this research, the author used a hierarchical modeling framework with separate first-level models for the responses and response times and a second-level model for the distribution of the ability and speed parameters in the population of test takers. The framework allows the author to retrofit an empirical prior distribution for the ability parameter on each occurrence of a new response time. In an example with an adaptive version of the Law School Admission Test (LSAT), the author shows how this additional update of the posterior distribution of the ability leads to a substantial improvement of the ability estimator. Two ways of applying the procedure in real-world adaptive testing are discussed. (Contains 6 figures.)
- Published: 2008
131. A Hierarchical Framework for Modeling Speed and Accuracy on Test Items
- Author: van der Linden, Wim J.
- Abstract
Current modeling of response times on test items has been strongly influenced by the paradigm of experimental reaction-time research in psychology. For instance, some of the models have a parameter structure that was chosen to represent a speed-accuracy tradeoff, while others equate speed directly with response time. Also, several response-time models seem to be unclear as to the level of parametrization they represent. A hierarchical framework for modeling speed and accuracy on test items is presented as an alternative to these models. The framework allows a "plug-and-play approach" with alternative choices of models for the response and response-time distributions as well as the distributions of their parameters. Bayesian treatment of the framework with Markov chain Monte Carlo (MCMC) computation facilitates the approach. Use of the framework is illustrated for the choice of a normal-ogive response model, a lognormal model for the response times, and multivariate normal models for their parameters with Gibbs sampling from the joint posterior distribution.
- Published: 2007
132. Conditional Item-Exposure Control in Adaptive Testing Using Item-Ineligibility Probabilities
- Author: van der Linden, Wim J. and Veldkamp, Bernard P.
- Abstract
Two conditional versions of the exposure-control method with item-ineligibility constraints for adaptive testing in van der Linden and Veldkamp (2004) are presented. The first version is for unconstrained item selection, the second for item selection with content constraints imposed by the shadow-test approach. In both versions, the exposure rates of the items are controlled using probabilities of item ineligibility given [theta] that adapt the exposure rates automatically to a goal value for the items in the pool. In an extensive empirical study with an adaptive version of the Law School Admission Test, the authors show how the method can be used to drive conditional exposure rates below goal values as low as 0.025. Obviously, the price to be paid for minimal exposure rates is a decrease in the accuracy of the ability estimates. This trend is illustrated with empirical data. (Contains 6 figures and 1 table.)
- Published: 2007
133. Detecting Differential Speededness in Multistage Testing
- Author: van der Linden, Wim J., Breithaupt, Krista, Chuah, Siang Chee, and Zhang, Yanwei
- Abstract
A potential undesirable effect of multistage testing is differential speededness, which happens if some of the test takers run out of time because they receive subtests with items that are more time intensive than others. This article shows how a probabilistic response-time model can be used for estimating differences in time intensities and speed between subtests and test takers and detecting differential speededness. An empirical data set for a multistage test in the computerized CPA Exam was used to demonstrate the procedures. Although the more difficult subtests appeared to have items that were more time intensive than the easier subtests, an analysis of the residual response times did not reveal any significant differential speededness because the time limit appeared to be appropriate. In a separate analysis, within each of the subtests, we found minor but consistent patterns of residual times that are believed to be due to a warm-up effect, that is, test takers spending more time on the initial items than they actually need.
- Published: 2007
134. Speededness and Adaptive Testing
- Author: van der Linden, Wim J. and Xiong, Xinhui
- Published: 2013
135. Cross-Validating Item Parameter Estimation in Adaptive Testing
- Author: van der Linden, Wim J., Glas, Cees A. W., Bickel, P., editor, Diggle, P., editor, Fienberg, S., editor, Krickeberg, K., editor, Olkin, I., editor, Wermuth, N., editor, Zeger, S., editor, Boomsma, Anne, editor, van Duijn, Marijtje A. J., editor, and Snijders, Tom A. B., editor
- Published: 2001
136. Detecting Answer Copying when the Regular Response Process Follows a Known Response Model
- Author: van der Linden, Wim J. and Sotaridona, Leonardo
- Abstract
A statistical test for detecting answer copying on multiple-choice items is presented. The test is based on the exact null distribution of the number of random matches between two test takers under the assumption that the response process follows a known response model. The null distribution can easily be generalized to the family of distributions of the number of random matches under the alternative hypothesis of answer copying. It is shown how this information can be used to calculate such features as the maximum, minimum, and expected values of the power function of the test. For the case of the nominal response model, the test is an alternative to the one based on statistic [omega]. The differences between the two tests are discussed and illustrated using empirical results.
- Published: 2006
137. A Lognormal Model for Response Times on Test Items
- Author: van der Linden, Wim J.
- Abstract
A lognormal model for the response times of a person on a set of test items is investigated. The model has a parameter structure analogous to the two-parameter logistic response models in item response theory, with a parameter for the speed of each person as well as parameters for the time intensity and discriminating power of each item. It is shown how these parameters can be estimated by a Markov chain Monte Carlo method (Gibbs sampler). The method was used to analyze response times for the adaptive version of a test from the Armed Services Vocational Aptitude Battery. The same data set was used to test the validity of the model against a normal model using posterior predictive checks on the response times. The lognormal model showed an excellent fit to the data, whereas the normal model seemed unable to allow for a characteristic skewness of the response time distributions. The addition of an equality constraint on the discrimination parameters led only to a slight loss of fit. The potential use of the model for improving the daily practice of testing is indicated.
- Published: 2006
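The article estimates all parameters jointly with a Gibbs sampler. As a smaller, derivation-level sketch (not the article's method), the ML estimate of a single person's speed tau given known item parameters follows directly from the model ln t_j ~ N(beta_j - tau, 1/alpha_j^2): setting the derivative of the log-likelihood to zero yields the precision-weighted mean of beta_j - ln t_j. All parameter values below are invented:

```python
import math

def estimate_speed(times, betas, alphas):
    """ML estimate of the speed parameter tau given known item
    parameters, under ln t_j ~ N(beta_j - tau, 1/alpha_j^2):
    tau_hat = sum_j alpha_j^2 (beta_j - ln t_j) / sum_j alpha_j^2,
    i.e., a precision-weighted average of beta_j - ln t_j."""
    weights = [a * a for a in alphas]
    num = sum(w * (b - math.log(t))
              for w, b, t in zip(weights, betas, times))
    return num / sum(weights)
```

Items with larger discrimination alpha_j carry more weight, mirroring the role of the discrimination parameter described in the abstract.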
138. A Strategy for Optimizing Item-Pool Management
- Author
-
Ariel, Adelaide, van der Linden, Wim J., and Veldkamp, Bernard P.
- Abstract
Item-pool management requires a balancing act between the input of new items into the pool and the output of tests assembled from it. A strategy for optimizing item-pool management is presented that is based on the idea of a periodic update of an optimal blueprint for the item pool to tune item production to test assembly. A simulation study with scenarios involving different levels of quality of the initial item pool, item writing, and management for a previous item pool from the Law School Admission Test (LSAT) showed that good item-pool management had about the same main effects on the item-writing costs and the number of feasible tests as good item writing, but the two factors showed strong interaction effects.
- Published
- 2006
- Full Text
- View/download PDF
139. Equating Scores from Adaptive to Linear Tests
- Author
-
van der Linden, Wim J.
- Abstract
Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test for a population of test takers. The two local methods were generally best. Surprisingly, the TCF method performed slightly worse than the equipercentile method. Both the TCF and equipercentile methods showed strong bias and uniformly large inaccuracy, but the TCF method suffered from extra error due to the lower asymptote of the test characteristic function. It is argued that the worse performances of these two methods are a consequence of the fact that they use a single equating transformation for an entire population of test takers and therefore have to compromise between the individual score distributions. (Contains 4 figures.)
- Published
- 2006
- Full Text
- View/download PDF
140. Equating Error in Observed-Score Equating
- Author
-
van der Linden, Wim J.
- Abstract
Traditionally, error in equating observed scores on two versions of a test is defined as the difference between the transformations that equate the quantiles of their distributions in the sample and population of test takers. But it is argued that if the goal of equating is to adjust the scores of test takers on one version of the test to make them indistinguishable from those on another, equating error should be defined as the degree to which the equated scores realize this goal. Two equivalent definitions of equating error based on this criterion are formulated. It is shown how these definitions allow one to evaluate such key quantities as the bias and mean squared error of any equating method if the tests fit a unidimensional response model. Several alternative applications of the ideas for the case in which the tests do not fit a unidimensional response model are discussed. (Contains 6 figures.)
- Published
- 2006
- Full Text
- View/download PDF
141. Detecting Answer Copying Using the Kappa Statistic
- Author
-
Sotaridona, Leonardo S., van der Linden, Wim J., and Meijer, Rob R.
- Abstract
A statistical test for detecting answer copying on multiple-choice tests based on Cohen's kappa is proposed. The test is free of any assumptions on the response processes of the examinees suspected of copying and having served as the source, except for the usual assumption that these processes are probabilistic. Because the asymptotic null and alternative distributions of the kappa statistic are derived under the assumption of common marginal probabilities for all items, a recoding of the item alternatives is proposed to approximate this case. The results from a simulation study in this article show that under this recoding, the test approximates its nominal Type I error rates and has promising power functions. (Contains 2 figures and 6 tables.)
- Published
- 2006
- Full Text
- View/download PDF
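The agreement statistic underlying the test in this entry, Cohen's kappa for the chosen alternatives of a suspected copier and the alleged source, can be sketched as follows. The answer vectors are hypothetical, and the sketch omits the recoding of item alternatives that the paper proposes to approximate common marginal probabilities.

```python
import numpy as np

def cohens_kappa(a, b, n_options):
    """Cohen's kappa for two answer vectors over items with n_options alternatives."""
    a, b = np.asarray(a), np.asarray(b)
    p_obs = np.mean(a == b)  # observed proportion of matching answers
    # Expected agreement under independence, from the marginal proportions
    # with which each examinee chooses each alternative.
    p_exp = sum(np.mean(a == k) * np.mean(b == k) for k in range(n_options))
    return (p_obs - p_exp) / (1.0 - p_exp)

# Hypothetical chosen alternatives (coded 0-3) on a 10-item test.
copier = [0, 2, 1, 3, 0, 2, 2, 1, 0, 3]
source = [0, 2, 1, 3, 1, 2, 2, 1, 0, 3]
print(round(cohens_kappa(copier, source, 4), 3))  # → 0.867
```

A kappa near zero indicates no more agreement than expected by chance; values well above the null distribution flag possible copying.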
142. Assembling a Computerized Adaptive Testing Item Pool as a Set of Linear Tests
- Author
-
van der Linden, Wim J., Ariel, Adelaide, and Veldkamp, Bernard P.
- Abstract
Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less-than-optimal information, items that violate the content constraints, and/or items with unfavorable exposure rates. Although at first sight somewhat counterintuitive, it is shown that if the CAT pool is assembled as a set of linear test forms, undesirable correlations can be broken down effectively. It is proposed to assemble such pools using a mixed integer programming model with constraints that guarantee that each test meets all content specifications and an objective function that requires the tests to have maximal information at a well-chosen set of ability values. An empirical example with a previous master pool from the Law School Admission Test (LSAT) yielded a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item-exposure rates that satisfied the target for all items in the pool. (Contains 6 figures.)
- Published
- 2006
143. A Comparison of Item-Selection Methods for Adaptive Tests with Content Constraints
- Author
-
van der Linden, Wim J.
- Abstract
In test assembly, a fundamental difference exists between algorithms that select a test sequentially or simultaneously. Sequential assembly allows us to optimize an objective function at the examinee's ability estimate, such as the test information function in computerized adaptive testing. But it leads to the non-trivial problem of how to realize a set of content constraints on the test -- a problem more naturally solved by a simultaneous item-selection method. Three main item-selection methods in adaptive testing offer solutions to this dilemma. The spiraling method moves item selection across categories of items in the pool proportionally to the numbers needed from them. Item selection by the weighted-deviations method (WDM) and the shadow test approach (STA) is based on projections of the future consequences of selecting an item. These two methods differ in that the former calculates a projection of a weighted sum of the attributes of the eventual test and the latter a projection of the test itself. The pros and cons of these methods are analyzed. An empirical comparison between the WDM and STA was conducted for an adaptive version of the Law School Admission Test (LSAT); it showed equally good item-exposure rates for both methods but, for the WDM, violations of some of the constraints and larger bias and inaccuracy of the ability estimator.
- Published
- 2005
- Full Text
- View/download PDF
144. A Statistical Test for Detecting Answer Copying on Multiple-Choice Tests
- Author
-
van der Linden, Wim J. and Sotaridona, Leonardo
- Abstract
A statistical test for the detection of answer copying on multiple-choice tests is presented. The test is based on the idea that the answers of examinees to test items may be the result of three possible processes: (1) knowing, (2) guessing, and (3) copying, but that examinees who do not have access to the answers of other examinees can arrive at their answers only through the first two processes. This assumption leads to a distribution for the number of matched incorrect alternatives between the examinee suspected of copying and the examinee believed to be the source that belongs to a family of "shifted binomials." Power functions for the tests for several sets of parameter values are analyzed. An extension of the test to include matched numbers of correct alternatives would lead to improper statistical hypotheses.
- Published
- 2004
- Full Text
- View/download PDF
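The "shifted binomial" family described in this abstract can be illustrated with a short sketch. Under the null hypothesis of no copying, the number of matching incorrect alternatives M on the m items the source answered incorrectly is Binomial(m, p); copying gamma of those answers shifts the distribution to gamma + Binomial(m − gamma, p). The match probability p = 0.3 and the counts below are made up for illustration; in the paper p follows from the assumed response model.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def shifted_binomial_pmf(k, m, p, gamma=0):
    """P(M = k) when gamma of the m incorrectly answered items were copied."""
    if k < gamma:
        return 0.0
    return binom_pmf(k - gamma, m - gamma, p)

def p_value(observed, m, p):
    """Right-sided tail probability under the null (gamma = 0) distribution."""
    return sum(binom_pmf(k, m, p) for k in range(observed, m + 1))

# Hypothetical case: 9 matching incorrect alternatives on the 12 items the
# source got wrong, with match probability 0.3 per item under the null.
print(round(p_value(9, 12, 0.3), 4))  # → 0.0017
```

A small tail probability such as this would lead to rejecting the null hypothesis of independent responding in favor of copying.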
145. Constructing Rotating Item Pools for Constrained Adaptive Testing
- Author
-
Ariel, Adelaide, Veldkamp, Bernard P., and van der Linden, Wim J.
- Abstract
Preventing items in adaptive testing from being over- or underexposed is one of the main problems in computerized adaptive testing. Though the problem of overexposed items can be solved using a probabilistic item-exposure control method, such methods are unable to deal with the problem of underexposed items. Using a system of rotating item pools, on the other hand, is a method that potentially solves both problems. In this method, a master pool is divided into (possibly overlapping) smaller item pools, which are required to have similar distributions of content and statistical attributes. These pools are rotated among the testing sites to realize desirable exposure rates for the items. A test assembly model, motivated by Gulliksen's matched random subtests method, was explored to help solve the problem of dividing a master pool into a set of smaller pools. Different methods to solve the model are proposed. An item pool from the Law School Admission Test was used to evaluate the performances of computerized adaptive tests from systems of rotating item pools constructed using these methods.
- Published
- 2004
- Full Text
- View/download PDF
146. Setting Standards and Detecting Intrajudge Inconsistency Using Interdependent Evaluation of Response Alternatives
- Author
-
Chang, Lei, van der Linden, Wim J., and Vos, Hans J.
- Abstract
This article introduces a new test-centered standard-setting method as well as a procedure to detect intrajudge inconsistency of the method. The standard-setting method, which is based on interdependent evaluations of alternative responses, has judges closely evaluate the process that examinees use to solve multiple-choice items. The new method is analyzed against existing methods, particularly the Nedelsky and Angoff methods. Empirical results from three different experiments confirm the hypothesis that standards set by the new method are higher than those of the Nedelsky method but lower than those of the Angoff method. The procedure for detecting intrajudge inconsistency is based on residual diagnosis of the judgments, which makes it possible to identify the sources of inconsistencies in the items, response alternatives, and/or judges. An empirical application of the procedure in an experiment with the new standard-setting method suggests that the method is internally consistent and also reveals an interesting difference between the residuals for the correct and incorrect alternatives.
- Published
- 2004
- Full Text
- View/download PDF
147. Constraining Item Exposure in Computerized Adaptive Testing with Shadow Tests
- Author
-
van der Linden, Wim J. and Veldkamp, Bernard P.
- Abstract
Item-exposure control in computerized adaptive testing is implemented by imposing item-ineligibility constraints on the assembly process of the shadow tests. The method resembles Sympson and Hetter's (1985) method of item-exposure control in that the decisions to impose the constraints are probabilistic. The method does not, however, require time-consuming simulation studies to set values for control parameters before the operational use of the test. Instead, it can set the probabilities of item ineligibility adaptively during the test using the actual item-exposure rates. An empirical study using an item pool from the Law School Admission Test showed that application of the method yielded perfect control of the item-exposure rates and had negligible impact on the bias and mean-squared error functions of the ability estimator.
- Published
- 2004
148. Optimizing Balanced Incomplete Block Designs for Educational Assessments
- Author
-
van der Linden, Wim J., Veldkamp, Bernard P., and Carlson, James E.
- Abstract
A popular design in large-scale educational assessments as well as any other type of survey is the balanced incomplete block design. The design is based on an item pool split into a set of blocks of items that are assigned to sets of "assessment booklets." This article shows how the problem of calculating an optimal balanced incomplete block design can be formulated as a problem in combinatorial optimization. Several examples of structural and practical requirements for balanced incomplete block designs are shown to be linear constraints on the optimization problem. In addition, a variety of possible objective functions to optimize the design is discussed. The technique is demonstrated using the 1996 Grade 8 Mathematics National Assessment of Educational Progress (NAEP) as a case study.
- Published
- 2004
- Full Text
- View/download PDF
149. Bayesian Checks on Cheating on Tests
- Author
-
van der Linden, Wim J. and Lewis, Charles
- Published
- 2015
- Full Text
- View/download PDF
150. Optimal Bayesian Adaptive Design for Test-Item Calibration
- Author
-
van der Linden, Wim J. and Ren, Hao
- Published
- 2015
- Full Text
- View/download PDF