60 results for "van der Linden, Wim"
Search Results
2. Constraining Item Exposure in Computerized Adaptive Testing with Shadow Tests. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Veldkamp, Bernard P.
- Abstract
Item-exposure control in computerized adaptive testing is implemented by imposing item-ineligibility constraints on the assembly process of the shadow tests. The method resembles J. Sympson and R. Hetter's (1985) method of item-exposure control in that the decisions to impose the constraints are probabilistic. However, the method does not require time-consuming simulation studies to set values for control parameters prior to the operational use of the test. Instead, the probabilities of item ineligibility can be set "on the fly" using an adaptive procedure based on the actual item-exposure rates. An empirical study using an item pool from the Law School Admission Test showed that application of the method yielded perfect control of the item exposure rates and had negligible impact on the bias and mean squared error functions of the ability estimates. (Contains 5 figures and 26 references.) (Author/SLD)
- Published
- 2002
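The "on the fly" adjustment described in the abstract above can be sketched as follows. The update rule, function name, and the 5% relaxation factor are illustrative assumptions, not the authors' exact procedure:

```python
def update_eligibility(exposure_counts, examinees_seen, target_rate, p_eligible):
    """Adapt item-eligibility probabilities from observed exposure rates.

    Illustrative rule (not the report's exact one): items exposed above
    the target rate become eligible less often; others slowly recover.
    """
    new_p = []
    for count, p in zip(exposure_counts, p_eligible):
        rate = count / max(examinees_seen, 1)
        if rate > target_rate:
            # scale eligibility down in proportion to the overexposure
            new_p.append(max(0.0, p * target_rate / rate))
        else:
            # relax the constraint slowly, never above certainty
            new_p.append(min(1.0, p * 1.05))
    return new_p
```

For example, with a target rate of 0.2, an item administered to 50 of the first 100 examinees would have its eligibility probability scaled from 1.0 down to 0.4, while an underexposed item stays fully eligible.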
3. A Statistical Test for Detecting Answer Copying on Multiple-Choice Tests. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Sotaridona, Leonardo S.
- Abstract
A statistical test for the detection of answer copying on multiple-choice tests is presented. The test is based on the idea that answers of examinees to test items may be the results of three possible processes: (1) knowing; (2) guessing; and (3) copying. Examinees who do not have access to the answers of other examinees can arrive at their answers only through the first two processes. This assumption leads to a distribution for the number of matched incorrect alternatives between the examinee suspected of copying and the examinee believed to be the source that belongs to a family of "shifted binomials." Power functions for the tests for several sets of parameter values are analyzed. It is shown that an extension of the test to include matched numbers of correct alternatives would lead to improper statistical hypotheses. (Author/SLD)
- Published
- 2002
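The null distribution sketched in the abstract above can be illustrated with a plain binomial for the number of matched incorrect alternatives; the paper's "shifted binomial" family generalizes this, and `match_prob` is an assumed matching probability, not a quantity from the report:

```python
from math import comb

def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def copying_pvalue(matched_incorrect, source_incorrect, match_prob):
    """One-sided p-value for the number of incorrect alternatives a
    suspected copier shares with the source, assuming an innocent
    (knowing-or-guessing) examinee matches each of the source's
    incorrect answers independently with probability match_prob.
    A small p-value flags possible copying."""
    return binomial_sf(matched_incorrect, source_incorrect, match_prob)
```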
4. Constructing Rotating Item Pools for Constrained Adaptive Testing. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., Ariel, Adelaide, Veldkamp, Bernard P., and van der Linden, Wim J.
- Abstract
Preventing items in adaptive testing from being over- or underexposed is one of the main problems in computerized adaptive testing. Though the problem of overexposed items can be solved using a probabilistic item-exposure control method, such methods are unable to deal with the problem of underexposed items. Using a system of rotating item pools, on the other hand, is a method that potentially solves both problems. In this method, a master pool is divided into (possibly overlapping) smaller item pools that are required to have similar distributions of content and statistical attributes. These pools are rotated among the testing sites to realize desirable exposure rates for the items. In this paper, a test assembly model for the problem of dividing a master pool into a set of smaller pools is presented. The model was motivated by Gulliksen's (1950) matched random subtests method. Different methods to solve the model are proposed. An item pool from the Law School Admission Test was used to evaluate the performances of computerized adaptive tests from systems of rotating item pools constructed using these methods. (Contains 6 figures and 14 references.) (Author/SLD)
- Published
- 2002
5. Estimating Equating Error in Observed-Score Equating. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
Traditionally, error in equating observed scores on two versions of a test is defined as the difference between the transformations that equate the quantiles of their distributions in the sample and in the population of examinees. This definition underlies, for example, the well-known approximation to the standard error of equating by Lord (1982). However, it is argued that if the goal of equating is to adjust the scores of examinees on one version of the test to make them indistinguishable from those on another, equating error should be defined as the degree to which the equated scores realize this goal. Two equivalent definitions of equating error based on this criterion are formulated. These definitions can be used to evaluate existing equating methods and derive new methods if the response data fit an item-response theory model. An evaluation of the traditional equipercentile equating method and two new conditional methods for tests from a previous item pool of the Law School Admission Test showed that, under a variety of conditions, the equipercentile method tends to result in a serious bias in the equated scores, while the new methods are practically free of any bias. (Contains 5 figures and 14 references.) (Author/SLD)
- Published
- 2002
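The traditional equipercentile method evaluated in the abstract above maps each score on one form to the score on the other form with the same percentile rank; a coarse discrete sketch, with the smoothing and interpolation of operational methods omitted:

```python
def cdf_from_counts(counts):
    """Cumulative proportions for a discrete score distribution."""
    total = sum(counts)
    cum, out = 0, []
    for c in counts:
        cum += c
        out.append(cum / total)
    return out

def equipercentile(x_counts, y_counts, x_score):
    """Map a form-X score to the form-Y score whose cumulative
    proportion first reaches the percentile of x_score. A simplified
    discrete sketch of equipercentile equating, not the report's
    conditional methods."""
    px = cdf_from_counts(x_counts)[x_score]
    for y, py in enumerate(cdf_from_counts(y_counts)):
        if py >= px:
            return y
    return len(y_counts) - 1
```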
6. Some Alternatives to Sympson-Hetter Item-Exposure Control in Computerized Adaptive Testing. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
The Sympson-Hetter (SH) method (J. Sympson and R. Hetter, 1985, 1997) is a probabilistic item-exposure control method for computerized adaptive testing. Setting its control parameters to admissible values requires an iterative process of computer simulations that has been found to be time-consuming, particularly if the parameters have to be set conditional on a realistic set of values for the examinees' ability parameter. Formal properties of the method are identified that help explain why this iterative process can be slow and does not guarantee admissibility. In addition, some alternatives to the SH method are introduced. The behavior of these alternatives was estimated for an adaptive test from an item pool from the Law School Admission Test. Two of the alternatives showed attractive behavior and converged smoothly to admissibility for all items in a relatively small number of iteration steps. (Contains 4 figures and 14 references.) (Author/SLD)
- Published
- 2002
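The iterative adjustment that makes SH calibration time-consuming can be sketched as a single update step. The formula below is the commonly cited textbook form of the adjustment, assumed here rather than quoted from the report:

```python
def sh_update(selection_rates, target_rate):
    """One Sympson-Hetter adjustment step: given each item's selection
    rate from a simulation pass, set its exposure-control parameter
    (the probability of actually administering the item once selected)
    so the expected exposure rate does not exceed the target. Repeated
    simulate-and-update passes drive the rates toward admissibility."""
    return [min(1.0, target_rate / r) if r > 0 else 1.0
            for r in selection_rates]
```

An item selected for 50% of simulated examinees against a target rate of 0.2 gets its control parameter cut to 0.4, while an item below the target keeps a control parameter of 1.0.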
7. Mathematical-Programming Approaches to Test Item Pool Design. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., Veldkamp, Bernard P., van der Linden, Wim J., and Ariel, Adelaide
- Abstract
This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and thus to increase both measurement precision and validity. The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item pools. These blueprints can be used to guide the item-writing process. Three different types of design problems are discussed: (1) item pools for linear tests; (2) item pools for computerized adaptive testing (CAT); and (3) systems of rotating pools for CAT. The paper concludes with an empirical example of the problem of designing a system of rotating item pools for CAT. (Contains 2 tables and 22 references.) (Author/SLD)
- Published
- 2002
8. A Model for Optimal Constrained Adaptive Testing. Law School Admission Council Computerized Testing Report. LSAC Research Report Series.
- Author
- Law School Admission Council, Princeton, NJ., van der Linden, Wim J., and Reese, Lynda M.
- Abstract
A model for constrained computerized adaptive testing is proposed in which the information in the test at the ability estimate is maximized subject to a large variety of possible constraints on the contents of the test. At each item-selection step, a full test is first assembled to have maximum information at the current ability estimate fixing the items previously administered. Then the item with maximum information is selected from the test. All test assembly is optimal due to the use of a linear programming model that is automatically updated to allow for the attributes of the items already administered as well as the new value of the ability estimator. A simulation study using a pool of 753 items from the Law School Admission Test (LSAT) showed that for adaptive tests of realistic lengths the ability estimator did not suffer any loss of efficiency from the presence of 433 constraints on the item selection process. (Contains 2 figures, 3 tables, and 35 references.) (Author/SLD)
- Published
- 2001
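The maximum-information item selection underlying the shadow-test model can be sketched for the three-parameter logistic model; the linear-programming constraint handling described in the abstract is omitted, and all names are illustrative:

```python
from math import exp

def p3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) response probability."""
    return c + (1 - c) / (1 + exp(-a * (theta - b)))

def info3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p3pl(theta, a, b, c)
    return a ** 2 * ((p - c) / (1 - c)) ** 2 * ((1 - p) / p)

def pick_max_info(theta, pool, administered):
    """Select the not-yet-administered item with maximum information at
    the current ability estimate: the core of maximum-information CAT.
    The shadow-test approach layers content constraints on top of this
    by assembling a full constrained test before each selection."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: info3pl(theta, *pool[i]))
```

With two items of equal discrimination and guessing, the one whose difficulty matches the current ability estimate is the more informative and is selected first.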
9. Computerized Adaptive Testing with Item Clones. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., Glas, Cees A. W., and van der Linden, Wim J.
- Abstract
To reduce the cost of item writing and to enhance the flexibility of item presentation, items can be generated by item-cloning techniques. An important consequence of cloning is that it may cause variability in the item parameters. Therefore, a multilevel item response model is presented in which it is assumed that the item parameters of a three-parameter logistic model describing response behavior are sampled from a multivariate normal distribution associated with a parent item. In this approach to item calibration, only distributions of item parameters are estimated. Therefore, the savings in item calibration costs for the item cloning model are potentially enormous. A marginal maximum likelihood and a Bayesian item calibration procedure are formulated. Further, a two-stage item selection procedure for computerized adaptive testing is presented. First, a set of items cloned from the same parent item is selected to be optimal at the ability estimate. Second, a random item from this set is administered. Simulation studies illustrate the accuracy of the item pool calibration and ability estimation procedures. An appendix describes Bayes model estimates for the item cloning model. (Contains 21 references.) (Author/SLD)
- Published
- 2001
10. Modeling Variability in Item Parameters in Item Response Models. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., Glas, Cees A. W., and van der Linden, Wim J.
- Abstract
In some areas of measurement, item parameters should not be modeled as fixed but as random. Examples of such areas are: item sampling, computerized item generation, measurement with substantial estimation error in the item parameter estimates, and grouping of items under a common stimulus or in a common context. A hierarchical version of the three-parameter normal ogive model is used to model parameter variability in multiple populations of items. Two Bayesian procedures for the estimation of the parameters are given. The first method produces an estimate of the posterior distribution using a Markov Chain Monte Carlo method (Gibbs sampler); the second procedure produces a Bayes modal estimate. It is shown that the procedure using the Gibbs sampler breaks down if for some of the random item parameters the sampling design yields only one response. However, in this case, marginalization over the item parameters does result in a feasible estimation procedure. Some numerical examples are given. (Contains 2 tables, 4 figures, and 36 references.) (Author/SLD)
- Published
- 2001
11. Computerized Test Construction. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
This report contains a review of procedures for computerized assembly of linear, sequential, and adaptive tests. The common approach to these test assembly problems is to view them as instances of constrained combinatorial optimization. For each testing format, several potentially useful objective functions and types of constraints are discussed. (Contains 14 references.) (Author/SLD)
- Published
- 2001
12. Implementing Content Constraints in Alpha-Stratified Adaptive Testing Using a Shadow Test Approach. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Chang, Hua-Hua
- Abstract
The methods of alpha-stratified adaptive testing and constrained adaptive testing with shadow tests are combined in this study. The advantages are twofold. First, application of the shadow test allows the researcher to implement any type of constraint on item selection in alpha-stratified adaptive testing. Second, the result yields a simple set of constraints that can be used in any application of the shadow test approach to reduce overexposure and underexposure of the items in the pool. An example from the Law School Admission Test is used to demonstrate the advantages. (Contains 20 references and 3 figures.) (Author/SLD)
- Published
- 2001
13. An Integer-Programming Approach to Item Pool Design. Law School Admission Council Computerized Testing Report. LSAC Research Report Series.
- Author
- Law School Admission Council, Princeton, NJ., van der Linden, Wim J., Veldkamp, Bernard P., and Reese, Lynda M.
- Abstract
Presented is an integer-programming approach to item pool design that can be used to calculate an optimal blueprint for an item pool to support an existing testing program. The results are optimal in the sense that they minimize the efforts involved in actually producing the items as revealed by current item writing patterns. Also presented is an adaptation of the models for use as a set of monitoring tools in item pool management. The approach is demonstrated empirically for an item pool designed for the Law School Admission Test. (Contains 2 tables and 30 references.) (Author/SLD)
- Published
- 2000
14. Optimal Stratification of Item Pools in a-Stratified Computerized Adaptive Testing. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
A method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in "alpha"-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network-flow structure, efficient solutions are possible. The method is applied to a previous item pool from the computerized adaptive testing (CAT) version of the Graduate Record Examinations Quantitative Test. The results indicate that the new method performs well in practical situations. It improves item exposure control, reduces the mean squared error in the theta estimates, and increases test reliability. (Contains 2 figures and 25 references.) (Author/SLD)
- Published
- 2000
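The baseline idea of a-stratification that the report's 0-1 LP method optimizes can be sketched as a plain sort-and-cut; the function name and equal-size rule here are illustrative assumptions:

```python
def stratify_by_a(item_as, num_strata):
    """Baseline a-stratification: sort items by discrimination (a) and
    cut the ordered pool into strata of (nearly) equal size. Early in
    the CAT, items are drawn from low-a strata, saving the highly
    discriminating items for when the ability estimate has stabilized.
    (The report replaces this naive sort-and-cut with an optimal 0-1
    linear programming stratification.)"""
    order = sorted(range(len(item_as)), key=lambda i: item_as[i])
    size = len(order) // num_strata
    strata = [order[k * size:(k + 1) * size] for k in range(num_strata - 1)]
    strata.append(order[(num_strata - 1) * size:])  # last stratum takes the remainder
    return strata
```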
15. Detecting Intrajudge Inconsistency in Standard Setting Using Test Items with a Selected-Response Format. Research Report.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., Vos, Hans J., and Chang, Lei
- Abstract
In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of performance. Methods to check standard setting data for intrajudge inconsistencies are thus of paramount importance to setting meaningful standards. This paper presents a method of consistency analysis for standard setting experiments in which judges specify probabilities for each response alternative of the items. The method is based on a residual analysis of the fit of the hypothesis of a consistent judge to the subjective probabilities. An empirical example shows how the method can be used to identify sources of inconsistency in response alternatives, items, or judges. (Contains 19 references.) (SLD)
- Published
- 2000
16. Designing Item Pools for Computerized Adaptive Testing. Research Report 99-03.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., Veldkamp, Bernard P., and van der Linden, Wim J.
- Abstract
A method of item pool design is proposed that uses an optimal blueprint for the item pool calculated from the test specifications. The blueprint is a document that specifies the attributes that the items in the computerized adaptive test (CAT) pool should have. The blueprint can be a starting point for the item writing process, and it can be used to assemble item pools in a system of rotating pools from a master pool. The blueprint is also useful for item pool maintenance. Designing the blueprint begins with analyzing the specifications for the CAT, a step amounting to the formation of a classification table involving categorization of quantitative item attributes. Using this table, an integer programming model for the assembly of the shadow tests in the CAT simulation is constructed. An estimate of the ability distribution of the identified population of examinees is obtained, and the CAT simulation is carried out using the integer programming model for the shadow tests and sampling simulees from the ability distribution. The blueprint is then calculated from the counts of the number of items from the cells in the classification table. The best way to implement the blueprint is sequentially, recalculating it after a certain portion of the items has actually been written and tested so that their attribute values are known. (Contains 16 references and a list of University of Twente research reports.) (SLD)
- Published
- 1999
17. Adaptive Testing with Equated Number-Correct Scoring. Research Report 99-02.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
A constrained computerized adaptive testing (CAT) algorithm is presented that automatically equates the number-correct scores on adaptive tests. The algorithm can be used to equate number-correct scores across different administrations of the same adaptive test as well as to an external reference test. The constraints are derived from a set of conditions on item response functions that guarantees the observed number-correct score distributions on two forms to be identical (W. van der Linden and R. Luecht, 1998). An item pool from the Law School Admission Test is used to compare the results of the algorithm with those for traditional observed-score equating of ability estimates to number-correct scores as well as the transformation to predicted number-correct scores through the test characteristic function. The effects of the constraints on the statistical properties of the ability estimator are examined. (Contains 18 references, 4 figures, and a list of University of Twente research reports.) (Author/SLD)
- Published
- 1999
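The transformation to predicted number-correct scores through the test characteristic function, mentioned at the end of the abstract above, is simply the sum of the item response probabilities at the ability estimate; a minimal 3PL sketch with illustrative names:

```python
from math import exp

def p3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) response probability."""
    return c + (1 - c) / (1 + exp(-a * (theta - b)))

def expected_number_correct(theta, items):
    """Test characteristic function: the predicted number-correct score
    is the sum of the item response probabilities at ability theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)
```

For two items with a = 1, b = 0, c = 0 and an examinee at theta = 0, each response probability is 0.5, so the predicted number-correct score is 1.0.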
18. Calculating Balanced Incomplete Block Design for Educational Assessments.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Carlson, James E.
- Abstract
A popular design in large-scale educational assessments is the balanced incomplete block design. The design assumes that the item pool is split into a set of blocks of items that are assigned to assessment booklets. This paper shows how the technique of 0-1 linear programming can be used to calculate a balanced incomplete block design. Several structural as well as practical constraints on this type of design are formulated as linear (in)equalities. In addition, possible objective functions to optimize the design are discussed. The technique is demonstrated using an item pool from the 1996 Grade 8 Mathematics National Assessment of Educational Progress Project. (Contains 2 tables and 16 references.) (Author/SLD)
- Published
- 1999
19. An Integer Programming Approach to Item Pool Design. Research Report 98-11.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., Veldkamp, Bernard P., and Reese, Lynda M.
- Abstract
An integer programming approach to item pool design is presented that can be used to calculate an optimal blueprint for an item pool to support an existing testing program. The results are optimal in the sense that they minimize the efforts involved in actually producing the items as revealed by current item writing patterns. Also, an adaptation of the models for use as a set of monitoring tools in item pool management is presented. The approach is demonstrated empirically for an item pool designed for the Law School Admission Test (LSAT). (Contains 2 tables and 30 references.) (Author)
- Published
- 1998
20. Optimal Assembly of Tests with Item Sets. Research Report 98-12.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
Six methods for assembling tests from a pool with an item-set structure are presented. All methods are computational and based on the technique of mixed integer programming. The methods are evaluated using such criteria as the feasibility of their linear programming problems and their expected solution times. The methods are illustrated for two item pools with a set structure from the Law School Admission Test (LSAT). The methods are: (1) simultaneous selection of items and sets; (2) simultaneous selection with pivot items; (3) all items per set selected; (4) decision variables for subsets (power set approach); (5) two-stage selection; and (6) two-stage selection (alternative version). (Contains 3 tables, 2 figures, and 12 references.) (Author/SLD)
- Published
- 1998
21. Using Response-Time Constraints in Item Selection To Control for Differential Speededness in Computerized Adaptive Testing. Research Report 98-06.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., Scrams, David J., and Schnipke, Deborah L.
- Abstract
An item-selection algorithm to neutralize the differential effects of time limits on scores on computerized adaptive tests is proposed. The method is based on a statistical model for the response-time distributions of the examinees on items in the pool that is updated each time a new item has been administered. Predictions from the model are used as constraints in a 0-1 linear programming (LP) model for constrained adaptive testing that maximizes the accuracy of the ability estimator. The method is demonstrated empirically using an item pool from the Armed Services Vocational Aptitude Battery and the responses of 38,357 examinees. The empirical example suggests that the algorithm is able to reduce the speededness of the test for the examinees who otherwise would have suffered from the time limit. Also, the algorithm did not seem to introduce any differential effects on the statistical properties of the theta estimator. (Contains 9 figures and 14 references.) (SLD)
- Published
- 1998
22. Capitalization on Item Calibration Error in Adaptive Testing. Research Report 98-07.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Glas, Cees A. W.
- Abstract
In adaptive testing, item selection is sequentially optimized during the test. Since the optimization takes place over a pool of items calibrated with estimation error, capitalization on these errors is likely to occur. How serious the consequences of this phenomenon are depends not only on the distribution of the estimation errors in the pool or the ratio of the test length to the pool size, but also on the structure of the item selection criterion used. A simulation study demonstrated the existence of the phenomenon empirically. It also showed that its effect on the errors in the ability estimates interacts strongly with the distribution of the items in the pool. (Contains 1 table, 7 figures, and 15 references.) (Author)
- Published
- 1998
23. Optimal Assembly of Educational and Psychological Tests, with a Bibliography. Research Report 98-05.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
The advent of computers in educational and psychological measurement has led to the need for algorithms for optimal assembly of tests from item banks. This paper reviews the literature on optimal test assembly and introduces the contributions to this report on the topic. Four different approaches to computerized test assembly are discussed: heuristic-based test assembly; 0-1 linear programming; network-flow programming; and an optimal design approach. In addition, applications of these methods to a large variety of problems are examined, including: (1) item response theory-based test assembly; (2) classical test assembly; (3) assembling multiple test forms; (4) item matching; (5) observed-score equating; (6) constrained adaptive testing; (7) assembling tests with item sets; (8) item pool design; and (9) assembling tests with multiple traits. This paper concludes with a 90-item bibliography on test assembly. (Contains three figures and seven references.) (Author/SLD)
- Published
- 1998
24. Observed-Score Equating as a Test Assembly Problem.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Luecht, Richard M.
- Abstract
A set of linear conditions on the item response functions is derived that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly that assembles a new test form to have an observed-score distribution optimally equated to the distribution of the old form. For a well-designed item pool, use of the model results in observed-score pre-equating and removes the need for post hoc equating by a conventional observed-score equating method. An empirical example illustrates the use of the model for an item pool from the Law School Admission Test (LSAT). (Contains 6 figures and 33 references.) (Author/SLD)
- Published
- 1997
25. A Model for Optimal Constrained Adaptive Testing.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Reese, Lynda M.
- Abstract
A model for constrained computerized adaptive testing is proposed in which the information in the test at the ability estimate is maximized subject to a large variety of possible constraints on the contents of the test. At each item-selection step, a full test is first assembled to have maximum information at the current ability estimate fixing the items previously administered. Then the item with maximum information is selected from the test. All test assembly is optimal due to the use of a linear programming model that is automatically updated to allow for the attributes of items already administered as well as the new value of the ability estimator. A simulation study using a pool of 753 items from the Law School Admission Test (LSAT) showed that for adaptive tests of realistic lengths the ability estimator did not suffer any loss of efficiency from the presence of 433 constraints on the item selection process. (Contains 3 tables, 2 figures, and 35 references.) (Author/SLD)
- Published
- 1997
26. Multidimensional Adaptive Testing with a Minimum Error-Variance Criterion.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
The case of adaptive testing under a multidimensional logistic response model is addressed. An adaptive algorithm is proposed that minimizes the (asymptotic) variance of the maximum-likelihood (ML) estimator of a linear combination of abilities of interest. The item selection criterion is a simple expression in closed form. In addition, it is shown how the algorithm can be adapted if the interest is in a test with a "simple information structure." The statistical properties of the adaptive ML estimator are demonstrated for a two-dimensional item pool with several linear combinations of the two abilities. (Contains 1 figure and 15 references.) (Author/SLD)
- Published
- 1997
27. Simultaneous Assembly of Multiple Test Forms.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Adema, Jos J.
- Abstract
An algorithm for the assembly of multiple test forms is proposed in which the multiple-form problem is reduced to a series of computationally less intensive two-form problems. At each step one form is assembled to its true specifications; the other form is a dummy assembled only to maintain a balance between the quality of the current form and the remaining forms. It is shown how the method can be implemented using the technique of 0-1 linear programming. Two empirical examples using a former item pool from the Law School Admission Test (LSAT) are given: one in which a set of parallel forms is assembled, and another in which the targets for the information functions of the forms are shifted systematically. (Contains 1 table, 3 figures, and 16 references.) (Author/SLD)
- Published
- 1997
28. A Procedure for Empirical Initialization of Adaptive Testing Algorithms.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
In constrained adaptive testing, the numbers of constraints needed to control the content of the tests can easily run into the hundreds. Proper initialization of the algorithm becomes a requirement because the presence of large numbers of constraints slows down the convergence of the ability estimator. In this paper, an empirical initialization of the algorithm is proposed based on the statistical relation between the ability variable and background variables known prior to the test. The relation is modeled using a two-parameter logistic version of an item response theory (IRT) model with manifest predictors discussed in A. H. Zwinderman (1991). An empirical example shows how an (incomplete) sample of response data and data on background variables can be used to derive an initial ability estimate or an empirical prior distribution for the ability parameter. An appendix gives the derivation of an equation for the estimator. (Contains 12 references.) (Author/SLD)
- Published
- 1997
29. Bayesian Item Selection Criteria for Adaptive Testing. Research Report 96-01.
- Author
- Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
R. J. Owen (1975) proposed an approximate empirical Bayes procedure for item selection in adaptive testing. The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational complexity involved in a fully Bayesian approach, but is no longer necessary given the computational power currently available in adaptive testing. This paper suggests several item selection criteria for adaptive testing that are all based on the use of the true posterior. Some of the statistical properties of the ability estimator produced by these criteria are discussed and empirically characterized. An empirical study with 300 test items showed that the maximum predicted posterior expected information criterion had excellent mean-squared error for more extreme values of theta, and is the criterion of choice for application in short adaptive tests. An appendix presents Owen's equations. (Contains 17 references.) (Author/SLD)
- Published
- 1996
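The fully Bayesian selection criteria in the report above can be illustrated on a discrete ability grid. The sketch below is a simplified stand-in, not Owen's procedure or the report's exact criteria: it tracks the true posterior under a two-parameter logistic model and selects the unused item with maximal posterior expected Fisher information. The grid, prior, and item parameters are invented for illustration.

```python
# Fully Bayesian item selection on a discrete ability grid: keep the exact
# posterior over theta (no normal approximation) and pick the item whose
# posterior expected Fisher information is largest.
import math

THETA = [i / 10 for i in range(-40, 41)]          # ability grid from -4 to 4

def p_correct(theta, a, b):
    """Two-parameter logistic response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def posterior(responses, items, prior):
    """Prior times likelihood at each grid point, normalized over the grid."""
    post = []
    for theta, pr in zip(THETA, prior):
        like = pr
        for (a, b), u in zip(items, responses):
            p = p_correct(theta, a, b)
            like *= p if u == 1 else 1.0 - p
        post.append(like)
    s = sum(post)
    return [w / s for w in post]

def expected_information(post, a, b):
    """Posterior expectation of the 2PL Fisher information a^2 * p * (1 - p)."""
    return sum(w * a * a * p_correct(t, a, b) * (1 - p_correct(t, a, b))
               for t, w in zip(THETA, post))

def select_item(post, pool, used):
    """Index of the unused item with maximal posterior expected information."""
    return max((i for i in range(len(pool)) if i not in used),
               key=lambda i: expected_information(post, *pool[i]))
```

Replacing `expected_information` with another posterior-based functional (e.g. expected posterior variance reduction) changes the criterion without touching the posterior bookkeeping, which is the flexibility the fully Bayesian approach buys.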
30. Assembling Tests for the Measurement of Multiple Abilities.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
It is proposed that the assembly of tests for the measurement of multiple abilities be based on targets for the (asymptotic) variance functions of the estimators in each of the abilities. A linear programming model is presented that can be used to computerize the assembly process. Several cases of test assembly dealing with multidimensional abilities are distinguished, and versions of the model applicable to each of these cases are discussed. An empirical example of a test assembly program from a two-dimensional mathematics item pool concludes the paper. (Contains 2 tables, 2 figures, and 27 references.) (Author/SLD)
- Published
- 1995
31. Stochastic Order in Dichotomous Item Response Models for Fixed Tests, Adaptive Tests, or Multiple Abilities. Research Report 95-02.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
Dichotomous item response theory (IRT) models can be viewed as families of stochastically ordered distributions of responses to test items. This paper explores several properties of such distributions. The focus is on the conditions under which stochastic order in families of conditional distributions is transferred to their inverse distributions, from two families of related distributions to a third family, or from multivariate conditional distributions to a marginal distribution. The main results are formulated as two theorems that apply immediately to dichotomous IRT models. One theorem holds for unidimensional models with fixed item parameters. The other theorem holds for models with multiple abilities or with random item parameters as used, for example, in adaptive testing. (Contains 2 tables and 36 references.) (Author/SLD)
- Published
- 1995
32. Some Decision Theory for Course Placement. Research Report No. 95-01.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and Van der Linden, Wim J.
- Abstract
This paper addresses the problem of how to place students in a sequence of hierarchically related courses from an (empirical) Bayesian point of view. Based on a minimal set of assumptions, it is shown that optimal mastery rules for the courses are always monotone and a nonincreasing function of the scores on the placement test. On the other hand, placement rules are not generally monotone but have a form depending on the specific shape of the probability distributions and utility functions in force. The results are further explored for a class of linear utility functions. Numerous illustrations and tables present data and statistical analysis. (Contains 20 references.) (Author/TS)
- Published
- 1995
33. Robustness of Judgments in Evaluation Research. Research Report 94-10.
- Author
-
Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Zwarts, Michel A.
- Abstract
It is argued that judgments in evaluative research are ultimately subjective, but that good criteria are available to assess their quality. One of these criteria is the robustness of the judgments against incompleteness or uncertainty in the data used to describe the educational system. The use of the robustness criterion is demonstrated through the case of a recent evaluation project in which the state of elementary education in The Netherlands was evaluated. To test robustness, four different procedures were simulated for item removal: (1) scaling; (2) removal of easy items; (3) removal of difficult items; and (4) removal of extreme items. The robustness study demonstrated that the qualifications used in the evaluation project were quite stable under the removal of items from the pool by these four methods. Nearly all the qualifications met the rigorous criterion of robustness. An appendix discusses the independence of the mean observed score of covariation between abilities. (Contains 3 tables, 8 figures, and 17 references.) (Author/SLD)
- Published
- 1994
34. A Conceptual Analysis of Standard Setting in Large-Scale Assessments. Research Report 94-3.
- Author
-
Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology. and van der Linden, Wim J.
- Abstract
Elements of arbitrariness in the standard setting process are explored, and an alternative to the use of cut scores is presented. The first part of the paper analyzes the use of cut scores in large-scale assessments, discussing three different functions: (1) cut scores define the qualifications used in assessments; (2) they simplify the reporting of achievement distributions; and (3) they allow for the setting of targets for such distributions. The second part of the paper gives a decision-theoretic alternative to the use of cut scores and shows how each of the three functions identified in the first part can be approached in a way that may reduce some of the arbitrary nature of standard setting processes. The third part of the paper formulates criteria for standard setting methods that can be used to evaluate their results. (Contains six figures and eight references.) (Author/SLD)
- Published
- 1994
35. A Compensatory Approach to Optimal Selection with Mastery Scores. Research Report 94-2.
- Author
-
Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Vos, Hans J.
- Abstract
This paper presents some Bayesian theories of simultaneous optimization of decision rules for test-based decisions. Simultaneous decision making arises when an institution has to make a series of selection, placement, or mastery decisions with respect to subjects from a population. An obvious example is the use of individualized instruction in education. Compared with separate optimization, a simultaneous approach has two advantages. First, test scores used in previous decisions can be used as "prior" data in later decisions, and the efficiency of the decisions can be increased. Second, more realistic utility structures can be obtained by defining utility functions for earlier decisions on later criteria. An important distinction is made between weak and strong decision rules. As opposed to strong rules, weak rules are allowed to be a function of prior test scores. Conditions for monotonicity of optimal weak and strong rules are presented. Also, it is shown that under mild conditions on the test score distributions and utility functions, weak rules are always compensatory by nature. To illustrate this approach, a common decision problem in education and psychology, consisting of a selection decision for treatment followed by a mastery decision, is analyzed. (Contains 1 figure, 2 tables, and 23 references.) (Author)
- Published
- 1994
36. An Optimization Model for Test Assembly To Match Observed-Score Distributions. Research Report 94-7.
- Author
-
Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Luecht, Richard M.
- Abstract
An optimization model is presented that allows test assemblers to control the shape of the observed-score distribution on a test for a population with a known ability distribution. An obvious application is for item response theory-based test assembly in programs where observed scores are reported and operational test forms are required to produce the same observed-score distributions as long as the population of examinees remains stable. The model belongs to the class of 0-1 linear programming models and constrains the characteristic function of the test. The model can be solved using the heuristic presented in Luecht and T. M. Hirsch (1992). An empirical example with item parameters from the ACT Assessment Program Mathematics Test illustrates the use of the model. (Contains 6 figures and 23 references.) (Author)
- Published
- 1994
37. A Comparison of Item-Selection Methods for Adaptive Tests with Content Constraints
- Author
-
van der Linden, Wim J.
- Abstract
In test assembly, a fundamental difference exists between algorithms that select a test sequentially or simultaneously. Sequential assembly allows us to optimize an objective function at the examinee's ability estimate, such as the test information function in computerized adaptive testing. But it leads to the non-trivial problem of how to realize a set of content constraints on the test -- a problem more naturally solved by a simultaneous item-selection method. Three main item-selection methods in adaptive testing offer solutions to this dilemma. The spiraling method moves item selection across categories of items in the pool proportionally to the numbers needed from them. Item selection by the weighted-deviations method (WDM) and the shadow test approach (STA) is based on projections of the future consequences of selecting an item. These two methods differ in that the former calculates a projection of a weighted sum of the attributes of the eventual test and the latter a projection of the test itself. The pros and cons of these methods are analyzed. An empirical comparison between the WDM and STA was conducted for an adaptive version of the Law School Admission Test (LSAT), which showed equally good item-exposure rates but violations of some of the constraints and larger bias and inaccuracy of the ability estimator for the WDM.
- Published
- 2005
38. Statistical Aspects of Optimal Treatment Assignment. Twente Educational Memorandum No. 18.
- Author
-
Twente Univ. of Technology, Enschede (Netherlands). and van der Linden, Wim J.
- Abstract
The issue of treatment assignment is ordinarily dealt with within the framework of testing the aptitude-treatment interaction (ATI) hypothesis. ATI research mostly uses linear regression techniques, and an ATI exists when the aptitude-treatment (AT) regression lines cross each other within the relevant interval of the aptitude variable. Consistent with this approach is the use of the points of intersection of AT regression lines as a treatment-assignment rule. The replacement of such rules by monotone, nonrandomized (Bayes) rules is proposed. Both continuous and dichotomous criteria for treatment success are considered. An example of the latter is evaluated using a mastery test. Solutions are given based on linear, normal ogive, and threshold utility functions. Some modifications of these functions are discussed which are believed to be more realistic in the context of individualized instruction, but for which no optimal monotone assignment rules are available yet. (Author/RL)
- Published
- 1980
39. Assessing Inconsistencies in Standard Setting with the Angoff or Nedelsky Technique.
- Author
-
van der Linden, Wim J.
- Abstract
A latent trait method is presented to investigate the possibility that Angoff or Nedelsky judges specify inconsistent probabilities in standard setting techniques for objectives-based instructional programs. It is suggested that judges frequently specify a low probability of success for an easy item but a large probability for a hard item. The responses of 156 pupils to a 25-item test from a tenth grade physics course were inspected by eight Angoff and nine Nedelsky judges. The latent trait analysis produced 18 items showing a satisfactory fit to the Rasch model. Serious errors of specification were found and errors were considerably larger for the Nedelsky technique. Special difficulties with the Nedelsky judges are discussed. Applications of the latent trait method are discussed. (Author/CM)
- Published
- 1982
40. Advances in the Application of Decision Theory to Test-Based Decision Making.
- Author
-
van der Linden, Wim J.
- Abstract
This paper reviews recent research in the Netherlands on the application of decision theory to test-based decision making about personnel selection and student placement. The review is based on an earlier model proposed for the classification of decision problems, and emphasizes an empirical Bayesian framework. Classification decisions with threshold utility are discussed to provide an example of the application of Bayesian theory to test-based decision making. Test results from the 1981 administration of the Eindtoets Basisonderwijs are analyzed with respect to the type of secondary education chosen by Dutch students at the end of primary education: lower vocational education, lower general education, or middle general education. A 55-item bibliography is attached. (GDC)
- Published
- 1985
41. IRT-Based Test Construction. Project Psychometric Aspects of Item Banking No. 15. Research Report 87-2.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
Four discussions of test construction based on item response theory (IRT) are presented. The first discussion, "Test Design as Model Building in Mathematical Programming" (T. J. J. M. Theunissen), presents test design as a decision process under certainty. A natural way of modeling this process leads to mathematical programming. General models of test construction are discussed, with information about algorithms and heuristics; ideas about the analysis and refinement of test constraints are also considered. The second paper, "Methods for Simultaneous Test Construction" (Ellen Boekkooi-Timminga), gives an overview of simultaneous test construction using zero-one programming. The item selection process is based on IRT. Some objective functions and practical constraints are presented, the construction of parallel tests is considered, and two tables are provided. The third paper, "Automated Test Construction Using Minimax Programming" (Wim J. van der Linden), proposes the use of the minimax principle in IRT test construction and indicates how this results in test information functions deviating less systematically from the target function than for the usual criterion of minimal test length. An alternative approach and some practical constraints are considered. The final paper, "A Procedure To Assess Target Information Functions" (Henk Kelderman), discusses the concept of an information function and its properties. An interpretable function of information is chosen: the probability of a wrong order of the ability estimates of two subjects. (SLD)
- Published
- 1987
42. A Maximin Model for Test Design with Practical Constraints. Project Psychometric Aspects of Item Banking No. 25. Research Report 87-10.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education., van der Linden, Wim J., and Boekkooi-Timminga, Ellen
- Abstract
A "maximin" model for item response theory based test design is proposed. In this model only the relative shape of the target test information function is specified. It serves as a constraint subject to which a linear programming algorithm maximizes the information in the test. In the practice of test construction there may be several demands with respect to the properties of the test. The way in which these can be formulated as linear constraints in the model is demonstrated. The constraints discussed include: (1) test composition; (2) administration time; (3) selection of item features; (4) group-dependent item parameters; (5) inclusion or exclusion of individual items; and (6) inter-item dependencies. An example of a test construction problem with practical constraints is presented. Using the three-parameter logistic model, an item bank of 1,000 items was drawn for the application of the test construction model, which was solved using the computer program LINPROG. Some alternative models of test construction are discussed. Three tables provide information about four solutions and list alternative objective functions in test construction. (SLD)
- Published
- 1987
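The maximin idea in the report above can be shown in miniature. The sketch below is a toy, not the linear programming setup solved with LINPROG in the report: only the relative shape r_k of the target information function is specified, and the test maximizing y such that the test information satisfies I(theta_k) >= r_k * y at every grid point is found by exhaustive search over a tiny invented item bank (a real application would use a 0-1 linear programming solver on a large bank).

```python
# Maximin test design in miniature: maximize min_k I_test(theta_k) / r_k
# over all fixed-length item subsets, where r_k is the relative target shape.
import itertools
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def maximin_test(bank, length, grid, shape):
    """Exhaustively search all tests of the given length; return the item
    index set maximizing the minimum of test information relative to shape."""
    best, best_y = None, -1.0
    for combo in itertools.combinations(range(len(bank)), length):
        y = min(sum(info_2pl(t, *bank[i]) for i in combo) / r
                for t, r in zip(grid, shape))
        if y > best_y:
            best, best_y = combo, y
    return best, best_y
```

Because only the shape of the target is constrained, the solver is free to scale the information function up as far as the bank allows, which is exactly what distinguishes the maximin model from fixing an absolute target.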
43. Applications of Decision Theory to Test-Based Decision Making. Project Psychometric Aspects of Item Banking No. 23. Research Report 87-9.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
The use of Bayesian decision theory to solve problems in test-based decision making is discussed. Four basic decision problems are distinguished: (1) selection; (2) mastery; (3) placement; and (4) classification, the situation where each treatment has its own criterion. Each type of decision can be identified as a specific configuration of one or more of the following elements: a test that provides the scores on which the decisions are based; one or more treatments with respect to which decisions are made; and one or more criteria by which the successes of treatments are measured. For each type of decision, further restrictions or generalizations may hold, such as multivariate test scores, sequential testing, multiple criteria, multiple populations, and quota restrictions. In some applications, combinations of the basic types of decisions may occur. Samples of decision problems illustrate the optimization of the Bayes utility for each possible decision. Examples are given for selection decisions with linear utility, mastery decisions with threshold utility, placement decisions with normal-ogive utility, classification decisions with threshold utility, and combinations of basic decisions. Nine figures illustrate decision systems, and one table gives data for an application. (SLD)
- Published
- 1987
44. Algorithmic Test Design Using Classical Item Parameters. Project Psychometric Aspects of Item Banking No. 29. Research Report 88-2.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education., van der Linden, Wim J., and Adema, Jos J.
- Abstract
Two optimization models for the construction of tests with a maximal value of coefficient alpha are given. Both models have a linear form and can be solved by using a branch-and-bound algorithm. The first model assumes an item bank calibrated under the Rasch model and can be used, for instance, when classical test theory has to serve as an interface between the item bank system and a user not familiar with modern test theory. Maximization of alpha was obtained by inserting a special constraint in a linear programming model. The second model has wider applicability and can be used with any item bank for which estimates of the classical item parameters are available. The models can be expanded to meet practical constraints with respect to test composition. An empirical study with simulated data using two item banks of 500 items was carried out to evaluate the model assumptions. For Item Bank 1 the underlying response model was the Rasch model, and for Item Bank 2 it was the three-parameter model. An appendix discusses the relation between item response theory and classical parameter values and adds the case of a multidimensional item bank. Three tables present the simulation study data. (SLD)
- Published
- 1988
45. Optimizing Incomplete Sample Designs for Item Response Model Parameters. Research Report 88-5.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
Several models for optimizing incomplete sample designs with respect to information on the item parameters are presented. The following cases are considered: (1) known ability parameters; (2) unknown ability parameters; (3) item sets with multiple ability scales; and (4) response models with multiple item parameters. The models are able to cope with hierarchical structures in the population of examinees as well as the domain of content, and allow for practical constraints with respect to such items as test content, curricular differences between groups, or time available for item administration. An example with test data from a national assessment study illustrates the use of the models. This methodology was applied to an imagined third study of the Dutch part of the Second Mathematics Study of the International Association for the Evaluation of Educational Achievement for the three subject areas of Geometry, Algebra, and Arithmetic for a sample of 400 seventh graders. The LANDO computer program was used to solve the models, illustrating their utility. (Author/SLD)
- Published
- 1988
46. On the Estimation of the Proportion of Masters in Criterion-Referenced Testing. Twente Educational Memorandum No. 27.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
J. A. Emrick's (1971) model is a latent class model of mastery testing that can be used to estimate the proportion of masters in a given population. A. Hamerle (1980), in a recent paper on this model, has proposed an estimator for the proportion of masters that is claimed to constitute a maximum likelihood approach. It is indicated that Hamerle is not quite correct in his presentation of Emrick's model and that his estimator is not maximum likelihood. An estimator is provided using the method of moments; this estimator appears to have the same shape as Hamerle's estimator, but should be interpreted differently since it is derived under the correct version of Emrick's model. An attractive property of the method of moments is that it also yields simple estimators for the present model if the two success parameters are unknown. It appears that these estimators can be used for tests consisting of three or more items. Results of extensive Monte Carlo studies indicate that the estimators possess excellent statistical properties. (Author/TJH)
- Published
- 1981
47. A Latent Trait Method for Determining Inconsistencies in the Use of the Angoff and Nedelsky Techniques of Standard Setting. Twente Educational Report Number 12.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
It has often been argued that all techniques of standard setting are arbitrary and likely to yield different results for different techniques or persons. This paper deals with a related but hitherto ignored aspect of standard setting, namely, the possibility that Angoff or Nedelsky judges misspecify the probabilities of the borderline student's success on the items because they do not use the psychometric properties of the items consistently. A latent trait method is proposed to estimate such misspecifications, and an index of consistency is defined that can be used for deciding whether standards are set consistently enough for use in practice. Results from an empirical study are presented to illustrate the use of the method in a typical educational situation. The results indicate that serious errors of specification can be expected and that, on the whole, these will be considerably larger for the Nedelsky than for the Angoff technique. (Four data tables are provided.) (Author)
- Published
- 1981
48. Simple Estimators for the Simple Latent Class Mastery Testing Model. Twente Educational Memorandum No. 19.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
Latent class models for mastery testing differ from continuum models in that they do not postulate a latent mastery continuum but conceive of mastery and non-mastery as two latent classes, each characterized by different probabilities of success. Several researchers use a simple latent class model that is basically a simultaneous application of the binomial error model to both mastery classes. W. A. Reulecke (1977) presents a version of this model that assumes that non-masters guess blindly, with a probability of success equal to the reciprocal of the number of alternatives. Assuming a loss ratio, these models enable the derivation of an optimal cutting score for separating masters from non-masters. To compute this cutting score, the model parameters must be estimated. J. A. Emrick and F. N. Adams (1969) suggest a method that is based on the average inter-item correlation but which, due to its assumptions, is only of restricted applicability. The same applies to the maximum likelihood method, inasmuch as it involves estimation equations that can only be solved iteratively. In this paper, the method of moments is used to obtain "quick and easy" estimates. An endpoint method, which assumes that the parameters can simply be estimated from the tails of the sample distribution, is also discussed. A Monte Carlo experiment demonstrates that the method of moments yields excellent estimators and beats the endpoint method uniformly. Five data tables are included. (Author/TJH)
- Published
- 1980
49. The Use of Moment Estimators for Mixtures of Two Binomials with One Known Success Parameter. Twente Educational Report Number 10.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
Occasionally, situations arise in which mixtures of two binomials with one known success parameter are encountered. An example in educational testing is the mastery-or-random-guessing model, in which an examinee is supposed either to master the items or not to master them and to guess blindly. This paper gives moment estimators for such mixtures and presents results from a Monte Carlo investigation into their statistical properties. The results suggest excellent estimators that can safely be used in most instances. The paper also indicates how the properties of these estimators relate to those of moment estimators for the case in which both success parameters are unknown. Finally, it is pointed out that in situations in which errors in specifying the true value of the known parameter may occur, it might be prudent to consider this parameter as unknown and to estimate it accordingly. (Four data tables are included.) (Author)
- Published
- 1980
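The moment estimators in the report above have a simple closed form. The sketch below follows the standard factorial-moment route, not necessarily the report's exact formulas: with scores X from the mixture p*Bin(n, pi) + (1-p)*Bin(n, c) and c known (e.g. blind guessing, c = 1/A for A alternatives), setting s1 = E[X]/n and s2 = E[X(X-1)]/(n(n-1)) and solving p*pi + (1-p)*c = s1 and p*pi^2 + (1-p)*c^2 = s2 gives pi and p directly. The symbol names are illustrative.

```python
# Method-of-moments estimation for a mixture of two binomials with one known
# success parameter c: solve the first two factorial-moment equations for
# p (proportion of masters) and pi (masters' success probability).
def moment_estimates(scores, n, c):
    """Return (p, pi) from number-correct scores on an n-item test, with c
    the known success probability of the guessing (non-master) class."""
    m = len(scores)
    s1 = sum(scores) / (m * n)                              # E[X] / n
    s2 = sum(x * (x - 1) for x in scores) / (m * n * (n - 1))  # E[X(X-1)] / (n(n-1))
    pi = (s2 - c * c) / (s1 - c) - c     # from (s1 - c)(pi + c) = s2 - c^2
    p = (s1 - c) / (pi - c)              # from p(pi - c) = s1 - c
    return p, pi
```

The algebra behind the two comments: subtracting c times the first equation from the second gives p(pi - c)(pi + c) = s2 - c^2, and the first equation gives p(pi - c) = s1 - c, so dividing the two eliminates p.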
50. Passing Score and Length of a Mastery Test: An Old Problem Approached Anew. Twente Educational Report Number 11.
- Author
-
Twente Univ., Enschede (Netherlands). Dept. of Education. and van der Linden, Wim J.
- Abstract
A classical problem in mastery testing is the choice of passing score and test length so that the mastery decisions are optimal. This problem has been addressed several times from a variety of viewpoints. In this paper, the usual indifference zone approach is adopted, with a new criterion for optimizing the passing score. Specifically, manipulation of probabilities of misclassification of masters versus non-masters is not incorporated into the scheme. Rather, explicit parameters are introduced to account for differences in loss between misclassifying a true master and a non-master. It appears that, under the assumption of the binomial error model, this approach yields a linear relationship between the optimal passing score and test length. The means by which different losses for both decision errors and a known base rate can be incorporated in the procedure are outlined, and the means by which a correction for guessing can be applied are described. Results are related to findings obtained in sequential probability ratio testing for binomial populations and in the latent class approach to mastery testing. (TJH)
- Published
- 1980
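The linear relation between optimal passing score and test length reported above can be worked out in a few lines. The sketch below uses the standard likelihood-ratio route under the binomial error model with an indifference zone (pi0, pi1), not necessarily the report's exact formulation: pass iff the posterior odds of mastery times the loss ratio favor passing, which, solved for the number-correct score x, yields a cutting score linear in n. The symbols pi0, pi1, l0, l1, and b are illustrative notation.

```python
# Optimal passing score under the binomial error model: pass iff
#   l0 * P(non-master | x) <= l1 * P(master | x),
# with binomial likelihoods at pi1 (lowest master) and pi0 (highest
# non-master success probability). The log-likelihood ratio is linear in x,
# so the cutting score is linear in test length n.
import math

def passing_score(n, pi0, pi1, l0, l1, b=0.5):
    """Smallest number-correct score at which 'pass' is the optimal decision.

    pi0/pi1: highest non-master / lowest master success probability,
    l0: loss for passing a non-master, l1: loss for failing a master,
    b: base rate (prior probability) of mastery.
    """
    slope = math.log((1 - pi0) / (1 - pi1))
    denom = math.log(pi1 / pi0) + slope
    intercept = math.log(l0 / l1) + math.log((1 - b) / b)
    x_star = (intercept + n * slope) / denom       # linear in n
    return math.ceil(x_star)
```

For instance, with pi0 = 0.5, pi1 = 0.8, equal losses, and base rate 0.5, the rule cuts at roughly 66 percent correct for every test length, and raising the loss for passing a non-master shifts the cutting score upward.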