Search Results
152. A Comparative Study of Observed Score Approaches and Purification Procedures for Detecting Differential Item Functioning.
- Author
- Kwak, Nohoon, Davenport, Ernest C., and Davison, Mark L.
- Abstract
The purposes of this study were to introduce the iterative purification procedure and compare it with the two-step purification procedure, to compare false positive error rates and the power of five observed score approaches, and to identify factors affecting power and false positive rates in each method. This study used 2,400 data sets that were divided into uniform, symmetric nonuniform, and nonsymmetric nonuniform differential item functioning (DIF) data sets. The sample size pairs were either (500, 500) or (1,000, 1,000) for the reference group and the focal group when the means of ability distributions for the 2 groups were the same, and either (1,000, 500) or (1,000, 250) for the reference and focal groups when the means of ability distributions for the 2 groups were different. Each data set included four items with uniform, symmetric nonuniform, or nonsymmetric nonuniform DIF, with each DIF item having either a 0.4 or 0.8 amount of DIF (that is, the area between two item characteristic curves). The purification procedures reduced false positive error rates and/or increased power. The Mantel-Haenszel method was superior to other methods with uniform DIF data sets, and the Absolute Mean Deviation method using the iterative purification procedure was superior to the others in nonuniform data sets when the means of ability distributions for the two groups were different. The ability estimation and the sample size affected detection rates and false positive error rates for all methods. The DIF effect size was also a strong influence on detection rates. (Contains 21 tables and 25 references.) (Author/SLD)
- Published
- 1998
153. Evaluation of Parameter Estimation under Modified IRT Models and Small Samples.
- Author
- Parshall, Cynthia G., Kromrey, Jeffrey D., Chason, Walter M., and Yi, Qing
- Abstract
Accuracy of item parameter estimates is a critical concern for any application of item response theory (IRT). However, the necessary sample sizes are often difficult to obtain in practice, particularly for the more complex models. A promising avenue of research concerns modified item response models. This study both replicates and improves on an earlier investigation into modified models (C. Parshall, J. Kromrey, and W. Chason, 1996), which found tentatively positive results. To obtain realistic data, empirical item parameters were generated by fitting a six-dimensional model to archival data, using NOHARM (Fraser and McDonald, 1988). These parameters were then used along with thetas generated from independent normal ability distributions to generate simulated item response data. One hundred datasets were generated for each of four sample sizes. Finally, BILOG (Mislevy and Bock, 1990) was used to obtain estimated item ability parameters for each of the six investigated models. Results were evaluated in terms of accuracy and stability across samples. Accuracy was assessed as the degree to which both the obtained item responses and the known response probabilities were reproduced from the generating parameters. Stability was assessed as empirical estimates of standard errors. Crossvalidation of fit and accuracy was accomplished by applying the sample item parameter estimates to additional samples generated from the same population. (Contains 5 tables, 17 figures, and 27 references.) (Author)
- Published
- 1997
154. Equating Multiple Tests via an IRT Linking Design: Utilizing a Single Set of Anchor Items with Fixed Common Item Parameters during the Calibration Process.
- Author
- Li, Yuan H., Griffith, William D., and Tam, Hak P.
- Abstract
This study explores the relative merits of a potentially useful item response theory (IRT) linking design: using a single set of anchor items with fixed common item parameters (FCIP) during the calibration process. An empirical study was conducted to investigate the appropriateness of this linking design using 6 groups of students taking 6 forms of a pilot test, for an accumulated sample size of 8,357 students. A parameter recovery study was performed to examine the robustness of FCIP under the situation of large standard errors in the item difficulty and guessing parameters. These results were compared with those produced by the characteristic curve method (CCM). Based on the empirical portion of this study, ability estimates calibrated from this linking design are very consistent, except for students with extreme (especially low) ability under the CCM equating method. Item parameter estimates calibrated from this linking design are also very consistent, except for guessing parameters under the CCM equating method. Based on the results from the simulation portion of the study, this linking design can produce very precise and stable parameter estimates. (Contains 7 tables, 19 figures, and 49 references.) (Author/SLD)
- Published
- 1997
155. Pretest Item Analyses Using Polynomial Logistic Regression: An Approach to Small Sample Calibration Problems Associated with Computerized Adaptive Testing.
- Author
- Patsula, Liane N. and Pashley, Peter J.
- Abstract
Many large-scale testing programs routinely pretest new items alongside operational (or scored) items to determine their empirical characteristics. If these pretest items pass certain statistical criteria, they are placed into an operational item pool; otherwise they are edited and re-pretested or simply discarded. In these situations, reliable ability estimates are usually available for each examinee based on operational items, and they may be treated as fixed. If so, polynomial (in ability, theta) logistic regression analyses can be conducted using a variety of statistical software packages. In this study, a cubic logistic model (theta, theta^2, theta^3) was found to fit standard three-parameter (i.e., discrimination, difficulty, and lower asymptote) logistic item response theory (IRT) model items very well. When employing a polynomial logistic model, well-known selection routines (such as stepwise elimination) can be utilized to reduce the number of required parameters for certain items, thus reducing the sample sizes needed for reliable estimation. With this model, simultaneous confidence bands are easily calculated. As an added benefit, given that a polynomial logistic function is not necessarily monotonically increasing with ability, poor quality items and incorrect alternative responses can also be fit using the same estimation procedures. (Contains 19 figures, 4 tables, and 22 references.) (Author/SLD)
- Published
- 1997
156. The Accuracy and Multidimensionality of First and Second Grade Students' Academic Self-Concepts.
- Author
- Cassady, Jerrell C.
- Abstract
The multidimensional nature of self-concept was studied in early elementary school children. In addition, the accuracy of children's self-concept ratings was determined through comparisons with the following external measures of ability: (1) parent ratings; (2) teacher ratings; and (3) academic achievement. Participants in this study were 100 first- and second-grade children and their families and teachers. The children were individually assessed with self-concept and achievement measures. Parents and teachers rated the children's ability in reading and mathematics. Factor analysis suggested that the children's academic self-concepts were differentiated into two factors: mathematics and language arts. The three external measures of ability were significantly intercorrelated. However, there was poor agreement between the child's self-concept and external measures of ability. Only two variables, reading achievement and the family's rating of mathematics achievement, were related to children's language arts and math self-concepts, respectively. Analyses comparing the ability judgments of high and low achievers suggested that high ability children rated their math competence significantly higher than low ability children. Overall, the findings support the inferences that the self-concepts of first and second graders are both multidimensional and somewhat inflated and that an accurate academic self-concept (e.g., one that is significantly related to achievement and external ratings) appears to develop in a domain-specific manner. (Contains 3 tables and 36 references.) (Author/SLD)
- Published
- 1997
157. What State Tests Test.
- Author
- McGee, Glenn W.
- Abstract
What the Illinois Goal Assessment Program (IGAP) test actually tests and the consequences of these tests for funding decisions were studied with a random sample of 100 school districts in the Cook County suburbs of Chicago. Eighth-grade IGAP scores for reading were obtained from the state report card, a document prepared by each school district under legislative mandate. Per-pupil expenditure, attendance rate, mobility rate, average teacher salary, percentage of low income students, and the ratio of the number of students in the district to the number of teachers in the district were studied for significant correlations. Partial correlations were then used to isolate particular relationships, and analysis of variance was used to provide information for explaining variations in scores. Results support the conclusion that the statewide test in Illinois, the IGAP, measures more than student achievement. The bell-shaped curve of eighth-grade reading scores and the high and highly significant intercorrelations among all IGAP test results strongly imply the IGAP is a test of ability. Multiple regression shows that nearly three-fourths of the variation in IGAP test scores is due to context factors and not academic achievement. As the IGAP test exists, it is in large measure a stronger indicator of poverty and mobility rate than of achievement. To a lesser extent, it is an indicator of the ratio of students to teachers, attendance rates, and cost variables. Implications for policy formation are discussed. An appendix presents two examples of test content--sixth and eighth grade reading tests--and associated questions. (Contains 12 tables and 20 references.) (SLD)
- Published
- 1997
158. An Investigation of the Likelihood Ratio Test for Detection of Differential Item Functioning under the Graded Response Model.
- Author
- Kim, Seock-Ho and Cohen, Allan S.
- Abstract
Type I error rates of the likelihood ratio test for the detection of differential item functioning (DIF) were investigated using Monte Carlo simulations. The graded response model with five ordered categories was used to generate data sets of a 30-item test for samples of 300 and 1,000 simulated examinees. All DIF comparisons were simulated by randomly pairing two groups of examinees. Three different sample sizes of reference and focal group comparisons were simulated under two different ability matching conditions. For each of the six combinations of sample sizes by ability matching conditions, 100 replications of DIF detection comparisons were simulated. Item parameter estimates and likelihood values were obtained by marginal maximum likelihood estimation using the computer program MULTILOG. Type I error rates of the likelihood ratio test statistics for all six combinations of the sample sizes and ability matching conditions were within theoretically expected values at each of the nominal alpha levels considered. (Contains 5 figures, 5 tables, and 24 references.) (Author)
- Published
- 1997
159. A Comparison of Procedures for Ability Estimation under the Graded Response Model.
- Author
- Seong, Tae-Je
- Abstract
This study was designed to compare the accuracy of three commonly used ability estimation procedures under the graded response model. The three methods, maximum likelihood (ML), expected a posteriori (EAP), and maximum a posteriori (MAP), were compared using a recovery study design for two sample sizes, two underlying ability distributions, and three test lengths. Recovery of ability was generally better for longer tests and for the conditions in which ability was matched to test difficulty. ML tended to recover less well than either EAP or MAP, particularly for the short test in the unmatched ability condition. For longer tests, all three methods recovered about equally well. (Contains 8 figures, 8 tables, and 26 references.) (Author)
- Published
- 1997
160. Achievement Goals, Motivation, and Performance: A Closer Look.
- Author
- Urdan, Tim, Pajares, Frank, and Lapin, Amy Z.
- Abstract
An achievement goal theory framework was used to examine the relations among goals and a number of other motivational constructs in a sample of middle school students. Participants were 189 eighth graders from a public school in the South. In one session students completed the attitude measures and in another session they completed a mathematics performance measure. The attitude instrument consisted of 15 items assessing task and ability goals. Results indicate that task and ability goals were moderately related. In this sample, task goals were moderately to strongly related to the performance and motivation variables in favorable ways. They were positively related to self-efficacy, self-concept, grade point average, persistence, importance, and self-efficacy for self-regulated learning. They were negatively related to anxiety. Ability goals did not have a negative pattern of relationship with other variables, but were unrelated or weakly positively correlated with the motivation and performance variables. When gender, grade point average, and task goals were controlled, ability goals had little or no effect on motivation or performance outcomes. Results suggest that for students strong in their pursuit of task goals, the simultaneous pursuit of ability goals is not helpful. This study does support previous results indicating a beneficial relationship between task goals and a variety of motivational and performance outcomes. (Contains 2 tables and 15 references.) (SLD)
- Published
- 1997
161. Examining Local Item Dependence Effects in a Large-Scale Science Assessment by a Rasch Partial Credit Model.
- Author
- Yan, Jean W.
- Abstract
Context-dependent items are traditionally analyzed independently, creating a situation in which the potential local item dependence effects among these items may cause a biased estimation of examinees' abilities. This study investigated the local item dependence effects on testlets in the tryout version of a statewide science assessment by a Rasch partial credit model. Cluster sampling combined with stratified sampling was used. Data were analyzed in five different configurations to study the relationships between context-dependent items at the individual item level and at the testlet level. It is shown that local dependence effects may be controlled and a better fit for testlet calibration can be obtained by employing the Rasch partial credit model for some, but not all, testlets. (Contains 2 figures, 11 tables, and 35 references.) (Author/SLD)
- Published
- 1997
162. The Role of Parental Expectation, Effort, and Self-Efficacy in the Achievement of High and Low Track High School Students in Taiwan.
- Author
- Huang, Denise and O'Neil, Harold F.
- Abstract
In this study, the effects of perceived parental expectation, trait effort, trait self-efficacy, trait ability, state self-efficacy, state effort, and state worry on the mathematics achievement of high and low track high school students in Taiwan were investigated. A hypothesized model of these constructs was also investigated using a structural equation model. A state scale and a trait scale were translated from English to Chinese and used in a pilot study and a main study. The pilot study involved 278 tenth graders in one public and one private school. Results supported the reliability of the measure, and it was administered to 173 high-track high school students at a public school and 210 regular-track students. Both perceived parental expectation and trait effort were important components of success for these students. Students who perceived that their parents had high expectations tended to have high trait effort and belief in effort. The more state effort students expended, the more likely they were to have high grades in mathematics. The only route to achievement without direct mediation through state effort was from perceived parental expectation to students' trait effort, leading to trait self-efficacy and reaching higher achievement. High-track students had higher trait self-efficacy and state efficacy than regular-track students, with higher mean trait effort and more state effort. In addition, students who had higher perceived parental expectations tended to worry more, expending more state effort and achieving more highly. Overall, results demonstrate the positive role of believing in effort. (Contains 3 tables, 4 figures, and 33 references.) (SLD)
- Published
- 1997
163. Descriptions of Motivation among African American High School Students for Their Favorite and Least Favorite Classes.
- Author
- Gladney, Lawana and Greene, Barbara
- Abstract
The motivation to learn of African American high school students was examined by asking them about their favorite and least favorite classes. Two hundred and seventy-five students attending three urban high schools were randomly selected from their history classes to respond to a questionnaire on their perceptions of ability, goals, and reasons for disliking their least favorite class. There was a positive motivational pattern reported for their favorite classes. The students scored high on three variables that have been found to be most important for engagement and achievement: learning goals, future consequences, and perceived ability. Their reason for disliking the least favorite class was usually that the teacher was boring, and not because of perception of ability. Analysis of interview data for these students showed that teacher attitudes and methods of instruction were the significant reasons for liking and disliking the favorite and least favorite classes. Students also reported that the race of the teacher affected motivation in the classroom. These results show positive motivational orientations among students when their favorite classes were an issue. (Author/SLD)
- Published
- 1997
164. Avoiding the Demonstration of Lack of Ability: An Under-Explored Aspect of Goal Theory.
- Author
- Middleton, Michael and Midgley, Carol
- Abstract
Theorists have traditionally described motivation in terms of approach and avoidance tendencies. In contrast, goal orientation research has focused primarily on two approach goals: demonstrating ability (performance-approach) and developing ability (task). A scale to assess the goal of avoiding the demonstration of lack of ability (performance-avoid) was included with scales assessing approach goals in a survey given to 703 sixth graders. Factor analysis supported the differentiation among the three scales. The performance scales were moderately positively correlated and exhibited low correlations with the task scale. With all three goals in regression equations, task goals predicted academic efficacy, self-regulated learning, and lower levels of avoiding seeking academic help in the classroom. Performance-avoid goals negatively predicted academic efficacy and positively predicted avoiding seeking help and test anxiety. Performance-approach goals did not emerge as the most significant predictor of any of these educationally relevant outcomes. An appendix presents the test items. (Contains 5 tables and 40 references.) (Author/SLD)
- Published
- 1997
165. Ability Explorer: A Review and Critique.
- Author
- Hoffman, Anne
- Abstract
The Ability Explorer (AE) is a newly developed self-report inventory of abilities that is appropriate for group or individual administration. There are machine-scorable and hand-scorable versions of the test, and there are two levels. Level 1 is for students from junior high to high school, and Level 2 is for high school students and adults. Separate scores are reported for 14 work-related abilities that are considered important for employers: (1) artistic; (2) clerical; (3) interpersonal; (4) language; (5) leadership; (6) manual; (7) musical/dramatic; (8) numerical/mathematical; (9) organizational; (10) persuasive; (11) scientific; (12) social; (13) spatial; and (14) technical/mechanical. Materials for the AE are attractive and easy to use, although the hand-scorable version is more difficult to understand than the machine-scorable version. Extensive information is provided in the manual about reliability, intercorrelations between ability scales, and frequency distributions, but the characteristics of the norm group used to develop these statistics are not generally addressed, except in table form. The AE appears to be a valuable assessment tool with a sound theoretical basis and useful practical applications. A potential problem is the accuracy of its self-report measure. Recommendations are made for clarifying information with the hand-scorable version. (Contains three references.) (SLD)
- Published
- 1997
166. What Role Does Ability Play in Classroom Learning?
- Author
- Nuthall, Graham
- Abstract
Four studies examined the relationship between students' ability and the learning processes the students engaged in when they acquired knowledge from their classroom experiences. The research was based on a model of learning processes during knowledge acquisition that identifies critical learning experiences and predicts what is learned and remembered. Each study involved detailed observation and audio and video recording of classroom experiences of selected upper primary or intermediate students during a science or social studies unit, as well as individual student interviews. Student learning measures were administered several weeks before and after the unit and again 12 months later. Findings indicated that the model predicted the learning of 86 percent of items whose content was learned and predicted failure to learn for 80 percent of items that were not learned. The lowest prediction rates were for students in the mid-range of ability, with no indication that the learning process was different for the most and least able. Patterns of correlations suggested that although student ability was related to prior knowledge levels, there was no relationship between prior knowledge level and amount learned during the unit. If the appropriate number of learning experiences occurred, without significant gaps between them, learning occurred regardless of students' ability level. Academically relevant discussions were more likely when there was a social climate of acceptance and valuing of each other's ideas, which was more likely with more able students. The major factors affecting whether students access learning opportunities appeared to be related to culture. (Contains 23 references.) (Author/KB)
- Published
- 1996
167. The Importance of Structure Coefficients in Structural Equation Modeling Confirmatory Factor Analysis.
- Author
- Thompson, Bruce
- Abstract
A general linear model (GLM) framework is used to suggest that structure coefficients ought to be interpreted in structural equation modeling confirmatory factor analysis (CFA) studies in which factors are correlated. The computation of structure coefficients in exploratory factor analysis and CFA is explained. Two heuristic data sets are used to make the discussion concrete, illustrating the calculation of pattern and structure coefficients in LISREL CFA studies investigating scores on ability batteries. The benefits from using CFA structure coefficients are illustrated using two additional studies. One involves nine ability variables from a previous CFA study, and the other involves a self-concept model tested in a study by B. M. Byrne (1989). (Contains 6 tables and 28 references.) (Author/SLD)
- Published
- 1996
168. Estimation of Item Response Models Using the EM Algorithm for Finite Mixtures.
- Author
- American Coll. Testing Program, Iowa City, IA., Woodruff, David J., and Hanson, Bradley A.
- Abstract
This paper presents a detailed description of maximum likelihood parameter estimation for item response models using the general EM algorithm. In this paper the models are specified using a univariate discrete latent ability variable. When the latent ability variable is discrete, the distribution of the observed item responses is a finite mixture, and the EM algorithm for finite mixtures can be used. Maximum likelihood estimates of the item parameters and of the discrete probabilities of the latent ability distribution are given using the EM algorithm for finite mixtures. Results are presented in general for both dichotomous and polytomous item response models. The relation between the EM estimates and the Bock-Aitkin marginal maximum likelihood estimates is discussed. Estimates for the item parameters will depend on the specific form of the item response functions, and will usually require iterative numerical procedures. The EM algorithm is the same as the Bock-Aitkin algorithm (R. D. Bock and M. Aitkin, 1981) for marginal maximum likelihood estimation of the item parameters. (Contains 28 references.) (Author/SLD)
- Published
- 1996
169. On Reporting IRT Ability Scores When the Test Is Not Unidimensional.
- Author
- Dirir, Mohamed A. and Sinclair, Norma
- Abstract
The purpose of this study was to examine the effect of test dimensionality on the stability of examinee ability estimates and item response theory (IRT) based score reports. A simulation procedure based on W. F. Stout's Essential Unidimensionality was used to generate test data with one dominant trait for the whole test and three minor traits specific to three subsets of items. The dimensionality of the data was controlled by varying the relative strengths of the specific traits. The errors in the ability estimation, which were examined both at test level and at subtest level, were compared among different degrees of test dimensionality. The correlation between the dominant trait and the minor traits was varied across three levels. When major and minor traits were not correlated, the standard errors in the ability estimates increased as the strength of the minor traits increased. When the major and minor traits were correlated, on the other hand, the errors in the ability estimates decreased slightly as the strength of the minor traits increased. (Contains 2 figures, 5 tables, and 12 references.) (Author/SLD)
- Published
- 1996
170. Examining the Invariance of Rater and Project Calibrations Using a Multi-facet Rasch Model.
- Author
- O'Neill, Thomas R. and Lunz, Mary E.
- Abstract
To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of probabilities that permits items and persons to be analyzed independently, yet still be compared using a common frame of reference. An extension of the Rasch model, the multi-facet Rasch model, can estimate examinee ability, item difficulty, and other facets for polytomous data. It was hypothesized that the multi-facet Rasch model would yield invariant, sample-free, slide and judge calibrations for a certification test for histology completed by 364 candidates. Eighteen qualified judges graded the test, which required examinees to prepare laboratory slides. Results of the study confirm that the slide and judge calibrations were essentially stable across diverse samples of examinees. This indicates that slide and judge calibrations can be used to anchor test administrations to a benchmark scale, making the equating of two administrations of the examination possible and supporting the hypothesis. (Contains 2 figures, 2 tables, and 15 references.) (SLD)
- Published
- 1996
171. A Comparison of the Traditional Maximum Information Method and the Global Information Method in CAT Item Selection.
- Author
- Tang, K. Linda
- Abstract
The average Kullback-Leibler (K-L) information index (H. Chang and Z. Ying, in press) is a newly proposed statistic in Computerized Adaptive Testing (CAT) item selection based on the global information function. The objectives of this study were to improve understanding of the K-L index with various parameters and to compare the performance of the K-L index with the traditional information method in CAT item selection. The results of this study, based on simulated and real data with 500 items each, provide evidence that Chang and Ying's global information method produced similar or better true ability theta estimates than the more traditional information approach in CAT item selection. In addition, results from the real item pool analyses indicate the parameter that provides the best theta estimates among the four K-L indices studied. (Contains one table, seven figures, and six references.) (Author/SLD)
- Published
- 1996
172. Many-Facet Rasch Model Selection Criteria: Examining Residuals and More.
- Author
- Schumacker, Randall E.
- Abstract
This research examined the significance of facet selection in a multi-facet Rasch model analysis. The residuals or remaining error in a multi-facet Rasch model were further studied in the context of a full and reduced data-to-model fit chi-square, given the specific design. In addition, main effect facet contributions to person measures and the interaction among elements of two facets were investigated. Seventy-four subjects participated, with the variables or facets studied being subjects, judges, sessions, topics, and tasks. Each subject was rated by a sample of 6 of the total of 31 judges on recall, interpretation, and application of history, geography, and earth science domains. Fixed chi-square values were significant for all facets included in the model, indicating that the elements for each facet differed significantly and had different effects on the subject's scores that needed to be accounted for through adjustment to scores or ability estimates. Examination of models in which one facet was excluded further indicated a facet's contribution to the overall data-model fit. The chi-square test can indicate how the facet elements differ, and calibrated measures indicate how much the subject ability estimates should be adjusted to account for the characteristics of the particular elements encountered by a subject. Appendix A shows entry of the original coded data, and Appendix B presents a sample measurement report. (Contains one figure, five tables, and five references.) (SLD)
- Published
- 1996
173. Polychotomous Responses and the Test Score.
- Author
- Samejima, Fumiko
- Abstract
Traditionally, the test score represented by the number of items answered correctly was taken as an indicator of the examinee's ability level. Researchers still tend to think that the number-correct score is a way of ordering individuals with respect to the latent trait. The objective of this study is to depict the benefits of using ability estimates obtained directly from individuals' response patterns instead of their test scores, especially when responses are graded polychotomously. The importance of substantive model validation is also discussed. Mathematical models are presented to show that the use of the test score instead of the response pattern itself in ability or attitude estimation will, in general, reduce the accuracy of estimation. The loss of accuracy can be especially important when items are scored polychotomously. It is suggested that ability or attitude estimation be made from the response pattern using basic functions developed by F. Samejima in conjunction with the graded response model. In doing so, substantive model validation is essential. (Contains 2 figures, 1 table, and 13 references.) (SLD)
- Published
- 1996
174. Are We Now, Where We Were Then--The Bell Curve and the Gingrich Revolution--Implications for Employment Testing Litigation.
- Author
-
Pyburn, Keith M.
- Abstract
Because of a congruity of developments in the knowledge base concerning tests, external political factors, and legal factors, responsible use of employment tests could be on the verge of widespread acceptability. However, the possibility also exists that tests will once again become the scapegoat for "bad news." This review of the legal challenges to the use of employment tests shows both the opportunities and risks that currently confront employment testing. Challenges to tests of mental ability actually predate the Civil Rights Act of 1964. In the "Griggs v. Duke Power Co." decision of 1971 the Supreme Court announced the "Disparate Impact" theory of discrimination and approved the standards of the Equal Employment Opportunity Commission that required an employer to show that a test is valid for the job or class for which it is used. After almost 20 years, the Supreme Court has altered the disparate impact standard and made it much easier for employers to justify tests. A series of court cases illustrates the recent decline in the number of cases involving significant issues in testing. In the 1980s, plaintiffs began to turn away from the courts, where they often lost, and turned toward the federal bureaucracy and Congress, as exemplified by some provisions of the 1991 Civil Rights Act. (Contains one table and one graph.) (SLD)
- Published
- 1995
175. Construct Validation of Minimum Competence in Standard Setting. Revised.
- Author
-
DeMauro, Gerald E.
- Abstract
Studies of the Angoff method of standard setting suggest that judges agree in their estimates of the relative difficulties of test questions for minimally competent examinees and that each judge's estimates correlate well with the observed item difficulties for examinees whose total test scores are near the judge's personal standard (G. E. DeMauro, 1991). This finding suggests that Angoff estimates contain additive item-related and judge-related components, varying both from judge to judge and from estimated to observed performance by constants. Since, in homogeneous tests, observed performance on items also varies by constants over ability levels, the observed convergence of each judge's estimates on item performance near an individual standard is really a special case of convergence of all judges on item performance near a common deliberated standard. Data from the New Jersey High School Proficiency Test (NJHSPT) standard setting study supported this hypothesis. The convergence of the judges on a construct of minimal competence was studied for the standard setting study of multiple-choice items for three tests of the NJHSPT for grade 11. In all, 78 judges were involved. (Contains 3 tables and 14 references.) (Author/SLD)
- Published
- 1995
176. Gender Differences in the ACT Mathematics Tests: A Cross Cultural Comparison.
- Author
-
Wang, Xiaoping and Maxey, James
- Abstract
This study focused on gender differences in scores on the American College Testing Program (ACT) assessment in mathematics for a large sample of U.S. high school graduates (over 1,700,000) and a sample of 321 Chinese twelfth graders. A smaller gender difference was found for the Chinese sample than for the U.S. sample. However, a similar pattern of difference was found for both cultures. Gender differences existed not only in score means but also in score distributions. Gender differences increasingly favored males as subject ability level increased, and the magnitude of gender differences varied across tasks for both cultures. The homogeneous culture for Chinese males and females might be one cause of the smaller gender difference in the Chinese sample. (Contains 5 tables, 9 figures, and 6 references.) (Author/SLD)
- Published
- 1995
177. Item Parameter Recovery for the Nominal Response Model.
- Author
-
De Ayala, R. J.
- Abstract
This study extended item parameter recovery studies in item response theory to the nominal response model (NRM). The NRM may be used with computerized adaptive testing, testlets, demographic items, and items whose alternatives provide educational diagnostic information. Moreover, with the increasing popularity of performance-based assessment, the use of polytomous item response theory models, in general, and the NRM in particular, will more than likely see increased application. Establishing guidelines for reasonable item parameter estimation was seen as fundamental to the use of the NRM. Factors studied through simulation were the sample size ratio, the latent ability distribution, and item information level. Results showed that as the latent ability distribution departs from a uniform distribution the accuracy of estimating the slope parameter decreased. This decrease in accuracy may be compensated for, in part, by increasing the sample size. Moreover, more informative items tended not to be as well estimated as less informative items. The results appear to indicate that if one is interested in estimating ability, a sample size ratio of 5:1 can produce reasonably accurate item parameter estimates for this purpose. (Contains 7 figures, 7 tables, and 26 references.) (Author/SLD)
- Published
- 1995
178. Critical Thinking Ability and Disposition as Factors of Performance on a Written Critical Thinking Test.
- Author
-
Taube, Kurt T.
- Abstract
Critical thinking has been conceptualized as a two-factor system in which critical thinking ability and critical thinking disposition combine to determine actual thinking performance. The present study used confirmatory factor analysis to investigate such a two-factor model empirically. One hundred ninety-eight Purdue University undergraduates completed the Watson-Glaser Critical Thinking Appraisal, the Ennis-Weir Critical Thinking Essay Test, the Need for Cognition Scale (NCS), the AT-20 ambiguity tolerance scale, and the Checklist of Educational Views (CLEV). The students' grade point averages (GPAs) and Scholastic Aptitude Test (SAT) Verbal and Mathematics scores were also collected. The NCS, AT-20, and CLEV served as measures of disposition; Watson-Glaser, SAT-Verbal, and SAT-Math served as measures of thinking ability; and Ennis-Weir and GPA were assumed to measure both ability and disposition. Confirmatory factor analysis indicated that the postulated two-factor model provided a more accurate fit with the data than did a model including only one latent factor and that Ennis-Weir, but not GPA, loads significantly on both factors. (Contains 6 tables and 80 references.) (Author)
- Published
- 1995
179. Differential Objective Function.
- Author
-
Kino, Mary M.
- Abstract
Item response theory (IRT) has been used extensively to study differential item functioning (dif) and to identify potentially biased items. The use of IRT for diagnostic purposes is less prevalent and has received comparatively less attention. This study addressed differential objective function (dof) to identify potentially biased content units. IRT was used to estimate person abilities and item difficulties, which were used to compute residual objective scores. Residual objective scores were analyzed with analysis of variance using the independent variables gender and ethnicity. Data were from mathematics subtests from the 1992 Connecticut Mastery Test census administration of eighth graders and its database of approximately 32,000 Connecticut eighth graders. The examples illustrate how dof outcomes can be used to identify potentially biased content units, to provide diagnostic information at the content level, and to construct profiles of content-based performance for different demographic subgroups. Ten figures and two tables present analysis results. Two appendixes present dif statistics by demographic subgroup and item-level statistics for dof objectives in four tables. (Contains 11 references.) (Author/SLD)
- Published
- 1995
180. The Effects of Cooperative Assessment on Goals, Perceived Ability, Self-Regulation and Achievement.
- Author
-
Griffin, Marlynn M.
- Abstract
Cooperative assessment was investigated in a classroom setting, examining achievement outcomes, effects on motivation, and student perceptions of the cooperative assessment process. Eighty-four undergraduate psychology students participated in this nonequivalent control group study design. It was hypothesized that students taking tests using a cooperative assessment procedure would perform significantly better on a posttest of educational psychology course concepts than would students completing tests in a traditional format. Effects of the treatment on goal orientation, perceived ability, self-regulation, and depth of processing were examined. Analysis of covariance indicated that there were no significant differences between the groups on the posttest, and that the hypothesis was not supported. There were also no differences between groups on measures of goal orientation, perceived ability, and depth of processing. Student reactions to the cooperative assessment procedure were overwhelmingly positive. Students enjoyed taking tests in groups, and felt that they learned more through this process as they discussed and debated the responses to the test items. Two tables illustrate the discussion, and two appendixes provide supplemental information. (Contains 24 references.) (Author/SLD)
- Published
- 1995
181. Assessing the Effect of Multidimensionality on IRT True-Score Equating for Subgroups of Examinees.
- Author
-
De Champlain, Andre F.
- Abstract
The dimensionality of one form of the Law School Admission Test (LSAT) was assessed with respect to three ethnic groups of test takers. Whether differences in the ability composite have any noticeable impact on item response theory (IRT) true score equating results for these subgroups (African Americans, Hispanic Americans, and Whites) was also studied. Results obtained with respect to the dimensionality of the LSAT showed that a two-dimensional model, specifying analytical reasoning and logical reasoning plus reading comprehension as two abilities, adequately accounted for the item responses of both African-American and Caucasian test takers, but a more complex model was required for the Hispanic subgroup. Results obtained in this study suggest that African-American and Hispanic-American conversion lines appear to be equivalent to the equating function of the majority Caucasian group as well as to the one derived from the total test-taker population. In other words, the current practice of applying a conversion function obtained from the total population to all test takers, without regard to ethnicity, does not penalize minority group test takers. Five tables and 15 figures present results of the analyses. (Contains 71 references.) (SLD)
- Published
- 1995
182. Hierarchical Analytic Methods That Yield Different Perspectives on Dynamics: Aids to Interpretation.
- Author
-
McClain, Andrew L.
- Abstract
First- and higher-order factor analyses are explained from a conceptual rather than a mathematical perspective. A case is made for performing higher-order factor analysis when factors are theoretically related. Actual scores of 301 children on 24 ability measures are used to demonstrate interpretation of second-order factors using the FORTRAN program SECONDOR. Higher-order factor analysis using interpretation aids such as the Schmid-Leiman (1957) solution allows the researcher to examine a complex world in a parsimonious manner. Seven tables illustrate the discussion. (Contains 11 references.) (Author/SLD)
- Published
- 1995
183. An Analytical Evaluation of Two Common-Odds Ratios as Population Indicators of DIF.
- Author
-
American Coll. Testing Program, Iowa City, IA. and Pommerich, Mary
- Abstract
The Mantel-Haenszel (MH) statistic for identifying differential item functioning (DIF) commonly conditions on the observed test score as a surrogate for conditioning on latent ability. When the comparison group distributions are not completely overlapping (i.e., are incongruent), the observed score represents different levels of latent ability across groups, and observed score conditioning may be ineffective. In this study, MH common-odds ratios conditioned on observed score and latent ability were evaluated as population indicators of DIF. The performances of the MH common-odds ratios were compared on moderate to high difficulty tests for combinations of degree of distributional incongruence, test length, occurrence of DIF, and ratio of examinees in the comparison groups. Under all conditions, the observed score and latent ability MH common-odds ratios performed similarly, even with fairly incongruent distributions. This provides reassurance in conditioning on observed score when the MH statistic is applied to large finite samples with incongruent group distributions. (Contains seven references, five tables, and two figures.) (Author)
- Published
- 1995
184. An Experiment on Effects of Redundant Audio in Computer Based Instruction on Achievement, Attitude, and Learning Time in 10th Grade Math.
- Author
-
Rehaag, Darlene M. and Szabo, Michael
- Abstract
The effects of the inclusion of matched redundant digital audio on achievement, time spent in learning, and attitude toward computer-based instruction (CBI) delivered mathematics were studied with 82 high school students. Differential effects on students of varying entry learning mathematics performance were also investigated. Subjects were assigned to CBI-audio or CBI-text conditions by stratified matched pairs within three existing classes. Both groups completed three lessons from the Alberta (Canada) CBI mathematics curriculum for grade 10. For the audio condition, lessons were modified by adding redundant audio through male voice instructions. Analysis of scores on a mathematics achievement test did not indicate any effects of CBI delivery mode on comprehension and mastery, but did indicate that redundant audio did reduce time required to complete practice questions, implying greater learning efficiency for the CBI-audio condition. No significant attitude differences were found overall, but lower ability students were more positive in the dual channel (redundant audio) condition. Seven tables illustrate study findings. (Contains 11 references.) (SLD)
- Published
- 1995
185. Digitized Speech as Feedback on Cognitive Aspects of Psychomotor Performance during Computer-Based Instruction.
- Author
-
Huang, James Chin-yun
- Abstract
Computer-based instruction opens new avenues for increasing the variety of possible feedback strategies that a teacher may employ to optimize learner performance. Feedback effectiveness is influenced by the nature of the learning task and student ability. This study investigates the effects of digitized feedback and ability on the achievement of college students during computer-based instruction; the achievement of high and low-prior knowledge students was compared among different feedback treatments. A sample of 68 university students from four sections of beginning tennis classes at Chung Cheng University (Taiwan) were categorized as having high or low prior knowledge and randomly assigned to a computer to complete one of three treatments (audio only, voice with text, and voice with text and animation). The computer-based instructional unit was an interactive video lesson on cognitive areas of tennis skill performance. Results indicate that the anticipated interaction between ability and feedback was not confirmed, and that elaborate feedback was most beneficial for cognitive areas of psychomotor skill learning; no matter which ability levels or treatment conditions the subjects were categorized in, they rated the instructional program more positively. (Contains 14 references.) (AEF)
- Published
- 1995
186. A Pattern of Transition to Adulthood Indicated in Plans for the Future of Males with Intellectual Disabilities: Secondary Qualitative Data Analysis
- Author
-
Bartnikowska, Urszula, Cwirynkalo, Katarzyna, and Borowska-Beszta, Beata
- Abstract
The paper examines the category of transition to adulthood of males with intellectual disabilities. It involved secondary analysis of qualitative data from three earlier research projects (Bsdurek 2010, Cwirynkalo 2010, Lysoniek 2014), whose participants were male students of special vocational schools for students with mild intellectual disabilities (MID) (IQ 70-55). The secondary analysis involves a total of forty-six in-depth qualitative interviews conducted in Poland. The main research question of the secondary analysis is as follows: What is the pattern of transition to adulthood of the 46 males aged 18-21 with intellectual disabilities? Within the research question, three sub-problems established as detailed research questions were also investigated: What are the components of transition patterns to adulthood? What are the factors facilitating and hindering the process of transition? The results indicate that the main aim of transition to adulthood in the pattern of transition, which was generated from 46 interviews of young Polish males with mild intellectual disabilities, was autonomy. There are several internal components of transition to adulthood: health conditions, self-awareness including skills and limitations, competences, awareness of future educational, work and accommodation tracks etc. The results also show external components of the transition pattern to adulthood. Among them are social ties and circles, the system of education, and access to the labor market. The research results also indicated a variety of factors that facilitated and hindered transition to adulthood of 46 Polish males with mild intellectual disabilities.
- Published
- 2017
187. Opinions of University Music Teachers on the Musical Competencies Necessary for Primary Education Teachers
- Author
-
Begic, Jasna Šulentic, Begic, Amir, and Škojo, Tihana
- Abstract
This paper describes the research conducted in the Republic of Croatia during the 2012/13 academic year. We have gathered opinions from experts, i.e. teaching methods teachers from seven faculties of teacher education, regarding the music teaching competencies necessary for primary education teachers teaching music in the first several grades of elementary school. We used the "Delphi method" in our research, i.e. in our sample survey among teaching methods teachers. The teachers also evaluated the competencies of their students and some elements of teacher education studies course syllabi and programmes. The sample survey among the teachers was implemented via email. The goal of the research was to determine if the programmes of the music courses at the teacher education studies are appropriate for the development of the competencies necessary for students of music education. Teaching methods teachers emphasized the need for more practical training, primarily regarding playing instruments and singing, and they pointed out that the course Teaching Methods in Music is the most important course for the training of future music teachers. Aside from that, they believe that more classes should be devoted to music courses, i.e. they propose to reorganise the contents of the courses by increasing the amount of practical classes and reducing the amount of theory classes. They also believe that it is necessary to introduce testing of musical ability at entrance exams for admission into the teacher education studies.
- Published
- 2017
188. Learner Agency and Its Effect on Spoken Interaction Time in the Target Language
- Author
-
Knight, Janine and Barberà, Elena
- Abstract
This paper presents the results of how four dyads in an online task-based synchronous computer-mediated (TB-SCMC) interaction event use their agency to carry out speaking tasks, and how their choices and actions affect time spent interacting in the target language. A case study approach was employed to analyse the language functions and cognitive and social processing that occurred in audio recordings of spoken interaction between four dyads, alongside other indicators of pre-task behaviour, triangulated with results from learner questionnaires. The study revealed that whilst all cases engaged in overt spoken interaction, some cases also avoided the designed task and engaged in covert pre-task planning. Learners' ability to reconfigure 1) the time mode of the task design; 2) the ways in which technological tools were used and 3) language choice, all impacted on their time spent interacting in the target language. The findings highlight tensions between learners' choices across the three dimensions that they had reconfigured, raising questions as to how to support time in synchronous interaction in the target language whilst supporting learners' agency. The implications are presented and discussed.
- Published
- 2017
189. Robustness of Judgments in Evaluation Research. Research Report 94-10.
- Author
-
Twente Univ., Enschede (Netherlands). Faculty of Educational Science and Technology., van der Linden, Wim J., and Zwarts, Michel A.
- Abstract
It is argued that judgments in evaluative research are ultimately subjective, but that good criteria are available to assess their quality. One of these criteria is the robustness of the judgments against incompleteness or uncertainty in the data used to describe the educational system. The use of the robustness criterion is demonstrated through the case of a recent evaluation project in which the state of elementary education in The Netherlands was evaluated. To test robustness, four different procedures were simulated for item removal: (1) scaling; (2) removal of easy items; (3) removal of difficult items; and (4) removal of extreme items. The robustness study demonstrated that the qualifications used in the evaluation project were quite stable under the removal of items from the pool by these four methods. Nearly all the qualifications met the rigorous criterion of robustness. An appendix discusses the independence of the mean observed score of covariation between abilities. (Contains 3 tables, 8 figures, and 17 references.) (Author/SLD)
- Published
- 1994
190. An Analysis of the Measurement of Study-Strategy.
- Author
-
Gordon, Wayne I.
- Abstract
Study strategies are the activities that an individual uses to facilitate learning. Although no consistent findings exist to show the factors that comprise the study-strategy concept, a three-factor conceptualization (cognitive, affective, and behavioral) is often suggested. These factors were studied with 128 undergraduates of high and low ability based on their scores on the ACT Assessment. The Survey of Study Habits and Attitudes and the Learning and Study Strategies Inventory were completed by each student. To assess construct validity, correlation coefficients were also computed between the various scale scores of the two instruments. The instruments were found to measure at least some of the same constructs or factors. Results indicated that the study-strategy concept is composed of: (1) a personality factor of personal values and feelings; (2) a cognitive skills factor; and (3) a behaviors and techniques factor concerned with the use of study skills. Results do support a three-factor structure of the concept. (Contains 6 tables and 29 references.) (SLD)
- Published
- 1994
191. Construction of a Computerized Adaptive Testing Version of the Quebec Adaptive Behavior Scale.
- Author
-
Tasse, Marc J.
- Abstract
Multilog (Thissen, 1991) was used to estimate parameters of 225 items from the Quebec Adaptive Behavior Scale (QABS). A database containing actual data from 2,439 subjects was used for the parameterization procedures. The two-parameter-logistic model was used in estimating item parameters and in the testing strategy. MicroCAT (Assessment Systems Corporation, 1989) was then used to manage the item banks and Computerized Adaptive Testing (CAT) environment during a simulation run using data from a randomly selected sample of 200 subjects taken from the larger data base. The simulation of the QABS-CAT testing indicates that levels of ability can be estimated for each of the seven skill domains by using only 30% of the items of the conventional version. The numerous advantages of item response theory and CAT as applied to the assessment of adaptive behavior with regard to the changing definition of mental retardation (Luckasson et al., 1992) are discussed. (Author/SLD)
- Published
- 1994
192. Diagnostic Value Resulting from the IRT Modeling of IGAP Reading Data: Using A Graded Response Model To Retrieve and Utilize More Information.
- Author
-
Evans, John Andrew and Ackerman, Terry
- Abstract
The strengths of item-response theory (IRT) are used to examine the degree of information individual test items provide, as well as to investigate how the individual item types contribute to the overall measurement accuracy of the Illinois Goal Assessment Program (IGAP) reading test. Using the graded-response model of Samejima (1969), the amount of information each subtest (narrative and expository) provides about the underlying latent ability is studied. Where an item type provides the most information along this ability scale, and how the different item formats (e.g., number of correct inferences) differ in terms of ability to discriminate between levels of reading proficiency are also studied. Data sets of 4,837, 4,840, and 5,011 randomly selected examinees were obtained for grades 3, 6, and 8, respectively. While the expository subtest is generally more informative than the narrative subtest across the three grade levels for low to moderate theta values, the difference does not appear to be substantial. The graded response model appears to be a promising tool that allows examination of the information from each subtest. Fourteen figures illustrate the findings. (Contains 10 references.) (SLD)
- Published
- 1994
193. Response Latency: An Investigation into Determinants of Item-Level Timing.
- Author
-
Parshall, Cynthia G.
- Abstract
Response latency information has been used in the past to provide information for consideration along with response accuracy when obtaining trait level estimates, and more recently, to flag unusual response patterns, to establish appropriate time-to-test limits (Reese, 1993), and to determine predictors of the amount of time needed to administer a given item (Gershon, Bergstrom, & Lunz, 1993). Data for this research were obtained from administration of an adaptive college mathematics placement test to 3,364 examinees. This study investigated item and examinee variables as potential influences on item-level response time. The item variables in this study were presentation order in the test, content classification, and cognitive classification. Examinee variables consisted of estimated ability, average rate of response, gender, ethnicity, and age. The examinee-item variables were conditional probability and correctness of response. Results were analyzed through a series of regression models to ascertain the variables that function as the strongest determinants of item latency. There are six tables and three figures. (Contains 14 references.) (Author/SLD)
- Published
- 1994
194. Development of a Valid Subtest for Assessment of DIF/Bias.
- Author
-
Nandakumar, Ratna
- Abstract
By definition, differential item functioning (DIF) refers to unequal probabilities of a correct response to a test item by examinees from two groups when controlled for their ability differences. Simulation results are presented for an attempt to purify a test by separating out multidimensional items under the assumption that the intent of the test constructors was to construct a unidimensional test for a given population. The procedure used to arrive at the purified and essentially unidimensional subtest used the multidimensional theory of DIF/bias proposed by Shealy and Stout (1993) and the statistical procedure DIMTEST for assessing essential unidimensionality. When applicable, the proposed methodology leads to a statistically validated construct valid subtest that can be used in the matching criterion for DIF/bias analysis. This methodology can be applied to an internal or external matching criterion. It is only applicable when the majority of test items are tapping the intended ability for a given population, while a few items are tapping other major abilities in addition to the intended ability. This method is not applicable when DIF/bias is pervasive. Ten tables summarize analysis results. (Contains 28 references.) (SLD)
- Published
- 1994
195. Controlling for Demographic Characteristics in Person Measures Using a Many-Faceted Rasch Model.
- Author
-
Bode, Rita K.
- Abstract
A benefit of using the multifaceted Rasch model is the ability to factor out or control for confounding factors in the estimation of person ability and item difficulty. This study experiments with a variation of the multifaceted Rasch analysis in calibrating the effects of demographic characteristics that are intended to overcome the problem of overparameterized person measures. The sample consisted of 1,319 U.S. eighth graders who participated in the Second International Mathematics Study (SIMS). The instrument was a seven-item measure of student effort taken from a questionnaire developed for the SIMS. In attempting to calibrate effects of the demographic characteristics, the FACETS program was not able to determine how much of person ability was due to the individual and how much was due to the gender or the racial/ethnic category. The proposed variation reverses the order in which the facets are calibrated, determines the gender and racial/ethnic effects and allocates the residual to person ability. This approach produces unambiguous person measures and, for these data, appears to make adjustments to the person measures for individuals based on their group membership. Two tables present study findings, and an appendix describes the variables. (Contains 5 references.) (SLD)
- Published
- 1994
196. Testing the Robustness of DIMTEST on Nonnormal Ability Distributions.
- Author
-
Nandakumar, Ratna and Yu, Feng
- Abstract
DIMTEST is a statistical test procedure for assessing essential unidimensionality of binary test item responses. The test statistic T used for testing the null hypothesis of essential unidimensionality is a nonparametric statistic. That is, there is no particular parametric distribution assumed for the underlying ability distribution or for the item characteristic curves generating item response in the mathematical derivation of probability distribution of the statistic T. The purpose of the present study is to empirically investigate the robustness of the statistic T with respect to ability distributions. Several nonnormal distributions, both symmetric and nonsymmetric, are considered in simulations involving six different types of ability distributions. In addition, test length and sample size are used as parameters in the present study. Simulation results indicate that the performance of Stout's statistics T subscript c and T subscript p are consistent with their theoretical developments, in that no particular shape is assumed for examinee abilities. That is, these statistics are robust against the shape of the ability distribution. Included are seven tables. (Contains 8 references.) (Author/SLD)
- Published
- 1994
197. Are Reading Comprehension Tasks Affected by Line References in Test Items?
- Author
-
Huntley, Renee M. and Miller, Sherri
- Abstract
Whether the shaping of test items can itself result in qualitative differences in examinees' comprehension of reading passages was studied using the Pearson-Johnson item classification system. The specific practice studied incorporated, within an item stem, line references that point the examinee to a specific location within a reading passage. Versions of 71 test items with and without line references were prepared and classified by the Pearson-Johnson system as textually explicit, textually implicit, or scriptally implicit. Each experimental unit was administered as part of the ACT Assessment to nearly 425 examinees. The practice of citing specific lines of text generally served to make items easier, although mainly for low-ability examinees, thus accounting for the consistently lower discrimination in the version with line references. The performance of males and females, or Blacks and Whites, however, was not differentially affected. That performance is affected by the shaping of test items has important implications for test construction. It remains to be studied whether adding a line reference gives an advantage to an examinee in a timed test. Four tables present study findings. (Contains 6 references.) (SLD)
- Published
- 1994
198. The Performance of the Mantel-Haenszel DIF Statistic When Comparison Group Distributions Are Incongruent.
- Author
-
Pommerich, Mary
- Abstract
The functioning of two population-based Mantel-Haenszel (MH) common-odds ratios was compared. One ratio is conditioned on the observed test score, while the other is conditioned on a latent trait or true ability score. When the comparison group distributions are incongruent or nonoverlapping to some degree, the observed score represents different levels of latent ability across the comparison groups, raising a question as to the effectiveness of observed score matching under conditions that could influence performance of the MH statistic in the identification of differential item functioning (DIF). The current study varies from typical simulation methodology in that the sample sizes are assumed to be infinite, and the observed score MH common-odds ratio is computed from the expected cell frequencies of the 2x2 contingency tables. A MH common-odds ratio based on latent ability is computed to define a measure of true DIF. Under all conditions examined, the observed score MH performed similarly to the latent ability MH. This provides reassurance in conditioning on the observed score when the MH statistic is applied to large finite samples with comparison groups that are not completely overlapping. Two figures and five tables are included. (Contains 6 references.) (SLD)
- Published
- 1994
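The abstract above says the observed-score MH common-odds ratio is computed from the cell frequencies of 2x2 contingency tables, one table per matched score level. A minimal sketch of the standard Mantel-Haenszel estimator follows, assuming each table is given as (reference correct, reference incorrect, focal correct, focal incorrect). The function name and the tuple layout are illustrative assumptions, not taken from the study.

```python
from typing import Iterable, Tuple

# One 2x2 table per matched score level (layout is an assumption):
# (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
Table = Tuple[float, float, float, float]


def mh_common_odds_ratio(tables: Iterable[Table]) -> float:
    """Standard Mantel-Haenszel common-odds ratio across strata.

    Frequencies may be non-integer, so expected cell frequencies
    can be passed in as well as observed counts.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("common-odds ratio is undefined for these tables")
    return numerator / denominator
```

A value near 1 indicates equal odds of a correct response for the two groups at matched score levels; values away from 1 suggest uniform DIF.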
199. Effects of Mathematics Test Content Specificity on Essential Dimensionality in U.S. and Japan Data.
- Author
-
Wang, Yu-Chung Lawrence and Hocevar, Dennis
- Abstract
The major goal of this study is to apply the essential unidimensionality statistic of W. Stout and the corresponding computer program (DIMTEST) to a hierarchical level mathematics achievement data set and to determine the extent to which the unidimensionality assumption can be accurately applied to mathematics achievement data. The study also ascertains if the unidimensionality assumption is more tenable when applied to specific subsets of items than to broader categories of items. A comparison of the essential unidimensionality structure across cultures is also performed. Results indicate that in the Japanese and U.S. data from the Second International Mathematics Study (SIMS), there are several subscales in SIMS mathematics tests, and that individual scores should be calibrated on each subscale rather than on a total score in the SIMS test. Essential unidimensionality estimates for the four tests were not the same in the two countries, calling into question the equivalence of dimensionality of the four tests. Either items on the test are more unidimensional in Japan, or the ability spaces among Japanese students are more homogeneous than for U.S. students. Eleven tables are included. (Contains 10 references.) (Author/SLD)
- Published
- 1994
200. A Simultaneous Approach to Multi-Factor DIF Analysis.
- Author
-
Tang, Huixing
- Abstract
A method is presented for the simultaneous analysis of differential item functioning (DIF) in multi-factor situations. The method is unique in that it combines item response theory (IRT) and analysis of variance (ANOVA), takes a simultaneous approach to multi-factor DIF analysis, and is capable of capturing interaction and controlling for possible confounding variables. It is referred to as the IRT-ANOVA method. The most salient feature is that the procedure uses IRT to control for group ability differences and familiar inferential procedures to test the DIF effect. Residuals can be construed as item scores free from the effects of both personal ability and item difficulty. The use of ANOVA provides not only a test statistic based on a familiar distribution, but also descriptive measures of DIF magnitude in terms of group means and variances. Simulations in the one-factor, two-group situation reveal the usefulness of the approach and indicate error rates. Seven figures illustrate the simulations. (Contains 8 references.) (SLD)
- Published
- 1994