178 results for "Item fit"
Search Results
2. Item Response Theory Assumptions: A Comprehensive Review of Studies with Document Analysis.
- Author
-
Yiğiter, Mahmut Sami and Boduroğlu, Erdem
- Subjects
ITEM response theory, RASCH models, CLINICAL health psychology, STIMULUS & response (Psychology), FACTOR analysis, SAMPLE size (Statistics)
- Abstract
Item Response Theory (IRT), over its nearly 100-year history, has become one of the most popular methodologies for modeling response patterns in measures in education, psychology, and health. Due to its advantages, IRT is particularly popular in large-scale assessments. A precondition for the validity of the estimates obtained from IRT is that the data meet the model assumptions. The purpose of this study is to examine the testing of model assumptions in studies using IRT models. For this purpose, 107 studies in the National Thesis Center of the Council of Higher Education that use an IRT model on real data were examined. The studies were analyzed according to sample size, unidimensionality, local independence, overall model fit, item fit, and non-speedness test criteria. According to the results, the unidimensionality assumption was tested at a high level (89%), predominantly with factor-analytic approaches. The local independence assumption was not tested in 36% of the studies; unidimensionality was cited as evidence for it in 40% of the studies, and it was tested directly in 24%. Overall model fit was tested at a moderate level (51%), using log-likelihood and information criteria. Item fit and non-speedness were tested at a low level (26% and 9%, respectively). IRT assumptions should be considered as a whole, and all assumptions should be tested from an evidence-based perspective. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
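The entry above surveys how often IRT assumptions such as local independence are actually tested. As an illustrative sketch (not taken from the study), local independence is commonly checked with Yen's Q3 statistic: correlate the per-item residuals after removing the modeled trait, and treat pairs with |Q3| far from 0 as candidates for local dependence.

```python
import numpy as np

def yen_q3(responses, p_model):
    """Yen's Q3 check for local independence (illustrative sketch).

    responses: (persons, items) 0/1 matrix.
    p_model:   model-implied correct-response probabilities, same shape.
    Returns the item-by-item correlation matrix of residuals; large
    off-diagonal absolute values suggest locally dependent item pairs.
    """
    resid = np.asarray(responses, dtype=float) - np.asarray(p_model, dtype=float)
    return np.corrcoef(resid, rowvar=False)  # columns = items
```

In practice `p_model` would come from a fitted IRT model; here it is a placeholder input.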
3. Validity and reliability of the situational judgment test instrument for assessing the ‘Pancasila’ student profile
- Author
-
Tri Wahyuni, Wardani Rahayu, and Dali Santun Naga
- Subjects
Pancasila student profile, situational judgement test, confirmatory factor analysis, item fit, construct reliability, Education - Social Sciences, Education (General), L7-991
- Abstract
Character measurement has become increasingly relevant and needed in recent years due to its significant impact on an individual’s success in various aspects of life. However, existing methods of character measurement, such as self-assessment, peer assessment, and observation, still have many limitations. The Situational Judgement Test (SJT) is a behavioral measurement tool with the ability to minimise measurement bias. Therefore, this study aims to demonstrate the content and construct validity of the SJT-based PSP instrument and estimate its reliability. The research findings include an Aiken’s V index of 0.864, factor loadings of all variables ≥ 0.5, and goodness-of-fit statistics of χ²/df = 122.299/48, CFI = 0.938, TLI = 0.915, RMSEA = 0.066, and SRMR = 0.046, all within the good-fit category. The empirical PSP data align with the GRM, with the MNSQ item fit method showing that 59 items fit the model. The PSP instrument has a Construct Reliability (CR) of 0.967 and an Average Variance Extracted (AVE) of 0.830. These findings contribute theoretically and practically to the field of measurement, indicating that the SJT-based instrument is valid and reliable for measuring the PSP.
- Published
- 2024
- Full Text
- View/download PDF
4. Development and Evaluation of the Abdominal Pain Knowledge Questionnaire (A-PKQ) for Children and Their Parents.
- Author
-
Neß, Verena, Humberg, Clarissa, Lucius, Franka, Eidt, Leandra, Berger, Thomas, Claßen, Martin, Syring, Nils Christian, Berrang, Jens, Vietor, Christine, Buderus, Stephan, Rau, Lisa-Marie, and Wager, Julia
- Subjects
HEALTH literacy, PARENTS, SELF-efficacy, CHRONIC pain, DATA analysis, DIFFERENTIAL item functioning (Research bias), RESEARCH funding, RESEARCH methodology evaluation, QUESTIONNAIRES, ABDOMINAL pain, STATISTICAL sampling, RESEARCH evaluation, MAXIMUM likelihood statistics, CHI-squared test, DESCRIPTIVE statistics, EXPERIMENTAL design, RESEARCH methodology, STATISTICS, DATA analysis software
- Abstract
Background: Abdominal pain is a common and often debilitating issue for children and adolescents. In many cases, it is not caused by a specific somatic condition but rather emerges from a complex interplay of bio-psycho-social factors, leading to functional abdominal pain (FAP). Given the complex nature of FAP, understanding its origins and how to effectively manage this condition is crucial. Until now, however, no questionnaire exists that targets knowledge in this specific domain. To address this, the Abdominal Pain Knowledge Questionnaire (A-PKQ) was developed. Methods: Two versions were created (one for children and one for parents) and tested in four gastroenterology clinics and one specialized pain clinic in Germany between November 2021 and February 2024. Children between 8 and 17 years of age (N = 128) and their accompanying parents (N = 131) participated in the study. Rasch analysis was used to test the performance of both versions of the questionnaire. Results: The original questionnaires exhibited good model and item fit. Subsequently, both questionnaires were refined to improve usability, resulting in final versions containing 10 items each. These final versions also demonstrated good model and item fit, with items assessing a variety of relevant domains. Conclusion: The A-PKQ is an important contribution to improving assessment in clinical trials focused on pediatric functional abdominal pain. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Research on Item Fit Evaluation in Cognitive Diagnostic Models [认知诊断模型中题目拟合评估的研究].
- Author
-
高旭亮, 王芳, 夏林坡, and 侯敏敏
- Abstract
Copyright of Psychological Science is the property of Psychological Science Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
6. Examining Local Item Dependence in a Cloze Test with the Rasch Model.
- Author
-
Abdullaev, Diyorjon, Shukhratovna, Djuraeva Laylo, Rasulovna, Jamoldinova Odinaxon, Umirzakovich, Jumanazarov Umid, and Staroverova, Olga V.
- Abstract
This article explores the occurrence and impact of local item dependence (LID) in a cloze test. LID refers to situations where responses to test items are influenced by other items. The researchers used the Rasch model to analyze a cloze test and identified three pairs of items that exhibited LID. Removing these items improved the test's fit and unidimensionality, but did not affect the overall ability of test takers. The study emphasizes the importance of detecting and addressing LID in language testing, while also suggesting the need for alternative approaches to handle LID without compromising the test's properties. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
7. The Effect of the Quality of Item Fit to the Three-Parameter Model on the Factorial Structure of a Scale Using Confirmatory Factor Analysis (A Simulation Study) [أثر جودة مطابقة الفقرات للنموذج ثلاثي المعلم على البنية العاملية للمقياس باستخدام التحليل العاملي التوكيدي (دراسة محاكاة)].
- Author
-
بشائر عبدالله ال and ربيع عبده أحمد رش
- Abstract
Copyright of Humanities & Educational Sciences Journal is the property of Humanities & Educational Sciences Journal and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
8. A Robust Method for Detecting Item Misfit in Large-Scale Assessments.
- Author
-
von Davier, Matthias and Bezirhan, Ummugul
- Subjects
STATISTICS, STRUCTURAL equation modeling, RESEARCH methodology evaluation, RESEARCH methodology, MENTAL health, SIMULATION methods in education, PSYCHOLOGICAL tests, PSYCHOSOCIAL factors, DIFFERENTIAL item functioning (Research bias), DATA analysis, MEDICAL coding, EVALUATION
- Abstract
Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions such as the monotonicity and population independence of item functions are present even in classical test theory but are more explicitly stated when using item response theory or other latent variable models for the assessment of item fit. The work presented here provides a robust approach for DIF detection that does not assume perfect model data fit, but rather uses Tukey's concept of contaminated distributions. The approach uses robust outlier detection to flag items for which adequate model data fit cannot be established. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
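The entry above describes flagging DIF/misfitting items by robust outlier detection under Tukey's contaminated-distribution view. As a minimal sketch of that idea (not the authors' exact procedure), one can compute a robust z-score from the median and MAD of per-item group differences and flag items beyond a cutoff; the input `group_diffs` is a hypothetical per-item statistic.

```python
import numpy as np

def flag_dif_outliers(group_diffs, c=3.5):
    """Flag items whose between-group difference is a robust outlier.

    group_diffs: per-item difference in an item statistic between two
    groups (illustrative input, not the paper's exact quantity).
    Uses a median/MAD robust z-score, treating DIF items as the
    contaminating component of a contaminated distribution.
    """
    d = np.asarray(group_diffs, dtype=float)
    med = np.median(d)
    mad = np.median(np.abs(d - med))
    robust_z = 0.6745 * (d - med) / mad  # 0.6745 makes MAD consistent under normality
    return np.abs(robust_z) > c
```

With most items showing near-zero differences, only grossly deviating items are flagged, which is what makes the method robust to a minority of misfitting items.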
9. Item-Response-Theorie (IRT) : Messung nicht direkt beobachtbarer Fähigkeiten anhand kategorialer Itemantworten
- Author
-
Wagner, Wolfgang, Weißeno, Georg, editor, and Ziegler, Béatrice, editor
- Published
- 2022
- Full Text
- View/download PDF
10. Practical significance of item misfit and its manifestations in constructs assessed in large-scale studies
- Author
-
Katharina Fährmann, Carmen Köhler, Johannes Hartig, and Jörg-Henrik Heine
- Subjects
Item fit, Practical significance of item fit, Item response theory, PISA 2018 field trial data, Large scale assessment, Educational measurement, Education (General), L7-991
- Abstract
When scaling psychological tests with methods of item response theory it is necessary to investigate to what extent the responses correspond to the model predictions. In addition to the statistical evaluation of item misfit, the question arises as to its practical significance. Although item removal is undesirable for several reasons, its practical consequences are rarely investigated and focus mostly on main survey data with pre-selected items. In this paper, we identify criteria to evaluate practical significance and discuss them with respect to various types of assessments and their particular purposes. We then demonstrate the practical consequences of item misfit using two data examples from the German PISA 2018 field trial study: one with cognitive data and one with non-cognitive/metacognitive data. For the former, we scale the data under the GPCM with and without the inclusion of misfitting items, and investigate how this influences the trait distribution and the allocation to reading competency levels. For non-cognitive/metacognitive data, we explore the effect of excluding misfitting items on estimated gender differences. Our results indicate minor practical consequences for person allocation and no changes in the estimated gender-difference effects.
- Published
- 2022
- Full Text
- View/download PDF
11. Psychometric properties of Leisure Satisfaction Scale (LSS)-short form: a Rasch rating model calibration approach
- Author
-
Sae-Hyung Kim and Dongwook Cho
- Subjects
Differential item functioning, Evaluation, Instrument development, Item fit, Leisure satisfaction, Person-item map, Psychology, BF1-990
- Abstract
Background: Leisure satisfaction has been one of the primary variables explaining an individual’s choice of leisure and recreational activities. The Leisure Satisfaction Scale (LSS)-short form has been widely used to measure leisure and recreation participants’ satisfaction levels. However, little research on the LSS-short form provides sufficient evidence for using it to measure individual leisure satisfaction levels. Thus, the purpose of the study was to determine whether the LSS-short form is appropriate for measuring individuals’ leisure satisfaction levels. Method: Convenience sampling was used in this study, drawing from the south-central United States. The LSS-short form questionnaire was administered to 436 individuals, after removing 20 surveys due to incomplete questions. The WINSTEPS computer program was used to analyze rating scale fit, item fit, Differential Item Functioning (DIF), and the person-item map under the Rasch rating scale model. Results: The results indicated that the five-point Likert-type LSS-short form was appropriate to use. Two of the 24 LSS-short form items showed overfit or misfit and were eliminated. DIF analysis indicated that all 22 remaining items were suitable for measuring leisure satisfaction levels, and these 22 items were selected for the reconstructed version of the LSS-short form. In addition, the person-item map showed that person ability and item difficulty were well matched. Conclusions: As the importance of leisure has increased, the newly reconstructed LSS-short form is recommended for evaluating individual leisure satisfaction levels in future studies. Furthermore, leisure and recreation professionals can develop effective leisure activities and programs by measuring individuals’ leisure satisfaction levels with the new version of the LSS-short form.
- Published
- 2022
- Full Text
- View/download PDF
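The entry above screens items with Rasch infit/outfit mean-square (MNSQ) statistics. The standard definitions can be sketched in a few lines for a dichotomous Rasch item: outfit is the average squared standardized residual, infit its information-weighted counterpart, with values near 1.0 indicating fit. The function below is an illustrative sketch, not the WINSTEPS implementation.

```python
import numpy as np

def rasch_item_mnsq(x, theta, b):
    """Infit/outfit mean-square for one dichotomous Rasch item.

    x: 0/1 responses; theta: person ability estimates; b: item difficulty.
    Values near 1.0 indicate fit; conventional screening windows
    (e.g., 0.7-1.3) vary by application.
    """
    x = np.asarray(x, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(np.asarray(theta, dtype=float) - b)))
    w = p * (1.0 - p)                          # response variance (information)
    outfit = np.mean((x - p) ** 2 / w)         # unweighted mean square
    infit = np.sum((x - p) ** 2) / np.sum(w)   # information-weighted mean square
    return infit, outfit
```

Outfit is more sensitive to unexpected responses from persons far from the item's difficulty; infit down-weights those persons, which is why both are usually reported.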
12. Modified Item-Fit Indices for Dichotomous IRT Models with Missing Data.
- Author
-
Zhang, Xue and Wang, Chun
- Subjects
MISSING data (Statistics), ERROR functions, DATA modeling, ACQUISITION of data
- Abstract
Item-level fit analysis not only serves as a complementary check to global fit analysis, it is also essential in scale development because the fit results will guide item revision and/or deletion (Liu & Maydeu-Olivares, 2014). During data collection, missing response data are likely to occur for various reasons. Chi-square-based item fit indices (e.g., Yen's Q1, McKinley and Mills's G², Orlando and Thissen's S-X² and S-G²) are the most widely used statistics to assess item-level fit. However, the role of total scores with complete data used in S-X² and S-G² is different from that with incomplete data. As a result, S-X² and S-G² cannot handle incomplete data directly. To this end, we propose several modified versions of S-X² and S-G² to evaluate item-level fit when response data are incomplete, named M_impute-X² and M_impute-G², where the subscript "impute" denotes the imputation method. Instead of using observed total scores for grouping, the new indices rely on imputed total scores obtained by either a single imputation method or three multiple imputation methods (i.e., two-way imputation with normally distributed errors, corrected item-mean substitution with normally distributed errors, and response function imputation). The new indices are equivalent to S-X² and S-G² when response data are complete. Their performances are evaluated and compared via simulation studies; the manipulated factors include test length, sources of misfit, misfit proportion, and missing proportion. The results from simulation studies are consistent with those of Orlando and Thissen (2000, 2003), and different indices are recommended under different conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
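The S-X²-family statistics discussed above group examinees by total score and compare observed with expected correct-response rates per group. The sketch below shows the grouping-and-comparison idea only; the real S-X² derives expected proportions from the model via the Lord-Wingersky recursion rather than averaging person-level probabilities, and pools sparse score groups instead of skipping them.

```python
import numpy as np

def simple_score_group_chi2(responses, p_model, item, min_group=30):
    """Simplified score-group fit check for one item (illustrative only).

    responses: (persons, items) 0/1 matrix; p_model: model-implied
    probabilities of the same shape; item: column index to check.
    Returns a chi-square-style sum over score groups and the number
    of groups used.
    """
    responses = np.asarray(responses)
    p_model = np.asarray(p_model, dtype=float)
    total = responses.sum(axis=1)
    chi2, n_groups = 0.0, 0
    for s in np.unique(total):
        mask = total == s
        n = int(mask.sum())
        if n < min_group:
            continue  # the real S-X² would pool sparse score groups
        obs = responses[mask, item].mean()   # observed proportion correct
        exp = p_model[mask, item].mean()     # crude expected proportion
        if 0.0 < exp < 1.0:
            chi2 += n * (obs - exp) ** 2 / (exp * (1.0 - exp))
            n_groups += 1
    return chi2, n_groups
```

This also makes the paper's point concrete: with incomplete data the observed total score no longer means the same thing across examinees, which is why the authors impute total scores before grouping.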
13. Re-examining the Utility of the Individualised Classroom Environment Questionnaire (ICEQ) Using the Rasch Model
- Author
-
Ben, Francisco and Khine, Myint Swe, editor
- Published
- 2020
- Full Text
- View/download PDF
14. Practical significance of item misfit and its manifestations in constructs assessed in large-scale studies.
- Author
-
Fährmann, Katharina, Köhler, Carmen, Hartig, Johannes, and Heine, Jörg-Henrik
- Subjects
PSYCHOMETRICS, PSYCHOLOGICAL tests, PSYCHOLOGICAL techniques, ITEM response theory, TEST methods, EDUCATIONAL tests & measurements, METACOGNITION
- Abstract
When scaling psychological tests with methods of item response theory it is necessary to investigate to what extent the responses correspond to the model predictions. In addition to the statistical evaluation of item misfit, the question arises as to its practical significance. Although item removal is undesirable for several reasons, its practical consequences are rarely investigated and focus mostly on main survey data with pre-selected items. In this paper, we identify criteria to evaluate practical significance and discuss them with respect to various types of assessments and their particular purposes. We then demonstrate the practical consequences of item misfit using two data examples from the German PISA 2018 field trial study: one with cognitive data and one with non-cognitive/metacognitive data. For the former, we scale the data under the GPCM with and without the inclusion of misfitting items, and investigate how this influences the trait distribution and the allocation to reading competency levels. For non-cognitive/metacognitive data, we explore the effect of excluding misfitting items on estimated gender differences. Our results indicate minor practical consequences for person allocation and no changes in the estimated gender-difference effects. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
15. Psychometric properties of Leisure Satisfaction Scale (LSS)-short form: a Rasch rating model calibration approach.
- Author
-
Kim, Sae-Hyung and Cho, Dongwook
- Subjects
RASCH models, PSYCHOMETRICS, CONVENIENCE sampling (Statistics), LEISURE, CLIENT satisfaction, SATISFACTION, PARTICIPATION
- Abstract
Background: Leisure satisfaction has been one of the primary variables explaining an individual's choice of leisure and recreational activities. The Leisure Satisfaction Scale (LSS)-short form has been widely used to measure leisure and recreation participants' satisfaction levels. However, little research on the LSS-short form provides sufficient evidence for using it to measure individual leisure satisfaction levels. Thus, the purpose of the study was to determine whether the LSS-short form is appropriate for measuring individuals' leisure satisfaction levels. Method: Convenience sampling was used in this study, drawing from the south-central United States. The LSS-short form questionnaire was administered to 436 individuals, after removing 20 surveys due to incomplete questions. The WINSTEPS computer program was used to analyze rating scale fit, item fit, Differential Item Functioning (DIF), and the person-item map under the Rasch rating scale model. Results: The results indicated that the five-point Likert-type LSS-short form was appropriate to use. Two of the 24 LSS-short form items showed overfit or misfit and were eliminated. DIF analysis indicated that all 22 remaining items were suitable for measuring leisure satisfaction levels, and these 22 items were selected for the reconstructed version of the LSS-short form. In addition, the person-item map showed that person ability and item difficulty were well matched. Conclusions: As the importance of leisure has increased, the newly reconstructed LSS-short form is recommended for evaluating individual leisure satisfaction levels in future studies. Furthermore, leisure and recreation professionals can develop effective leisure activities and programs by measuring individuals' leisure satisfaction levels with the new version of the LSS-short form. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
16. Statistical Properties of Estimators of the RMSD Item Fit Statistic.
- Author
-
Robitzsch, Alexander
- Subjects
ITEM response theory, MATHEMATICAL models, STATISTICS, NUMERICAL analysis, PARAMETER estimation
- Abstract
In this article, statistical properties of the root mean square deviation (RMSD) item fit statistic in item response models are studied. It is shown that RMSD estimates will indicate misfit even for items whose parametric assumption of the item response function is correct (i.e., fitting items) if some item response functions in the test are misspecified. Moreover, it is demonstrated that the RMSD values of misfitting and fitting items depend on the proportion of misfitting items. We propose three alternative bias-corrected RMSD estimators that reduce the bias for fitting items. However, these alternative estimators provide slightly negatively biased estimates for misfitting items compared to the originally proposed RMSD statistic. In the numerical experiments, we study the case of a misspecified one-parameter logistic item response model and the behavior of the RMSD statistic if differential item functioning occurs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
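The RMSD statistic studied above is, in essence, a density-weighted root mean squared gap between an "observed" item response function (e.g., built from posterior pseudo-counts) and the model-implied one, evaluated on a quadrature grid over the latent trait. A minimal sketch, assuming the two IRFs and grid weights are already available:

```python
import numpy as np

def rmsd_item_fit(p_obs, p_model, weights):
    """RMSD item fit (illustrative sketch).

    p_obs:    'observed' IRF values on a theta quadrature grid
              (e.g., from posterior pseudo-counts).
    p_model:  model-implied IRF values on the same grid.
    weights:  quadrature weights approximating the trait density.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so weights form a discrete density
    diff = np.asarray(p_obs, dtype=float) - np.asarray(p_model, dtype=float)
    return float(np.sqrt(np.sum(w * diff ** 2)))
```

Because the "observed" IRF is itself estimated under the fitted model, the bias issues the entry describes arise exactly here: misspecification of other items contaminates `p_obs` for fitting items.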
17. On the Generalized S−X²-Test of Item Fit: Some Variants, Residuals, and a Graphical Visualization.
- Author
-
Ranger, Jochen and Brauer, Kay
- Subjects
FALSE positive error, ITEM response theory, ERROR rates, VISUALIZATION, TEST scoring
- Abstract
The generalized S−X²-test is a test of item fit for items with a polytomous response format. The test is based on a comparison of the observed and expected number of responses in strata defined by the test score. In this article, we make four contributions. We demonstrate that the performance of the generalized S−X²-test depends on how sparse cells are pooled. We propose alternative implementations of the test within the framework of limited information testing. We derive the distribution of the S−X² residuals that can be used for post hoc analyses. We suggest a diagnostic plot that visualizes the form of the misfit. The performance of the alternative implementations is investigated in a simulation study. The simulation study suggests that the alternative implementations are capable of controlling the Type-I error rate well and have high power. An empirical application concludes this article. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
18. The Crit coefficient in Mokken scale analysis: a simulation study and an application in quality-of-life research.
- Author
-
Crișan, Daniela R., Tendeiro, Jorge N., and Meijer, Rob R.
- Abstract
Purpose: In Mokken scaling, the Crit index was proposed and is sometimes used as evidence (or lack thereof) of violations of some common model assumptions. The main goal of our study was twofold: to make the formulation of the Crit index explicit and accessible, and to investigate its distribution under various measurement conditions. Methods: We conducted two simulation studies in the context of dichotomously scored item responses. We manipulated the type of assumption violation, the proportion of violating items, sample size, and quality. False positive rates and power to detect assumption violations were our main outcome variables. Furthermore, we applied the Crit coefficient in a Mokken scale analysis to a set of responses to the General Health Questionnaire (GHQ-12), a self-administered questionnaire for assessing current mental health. Results: We found that the false positive rates of Crit were close to the nominal rate in most conditions, and that power to detect misfit depended on the sample size, type of violation, and number of assumption-violating items. Overall, in small samples Crit lacked the power to detect misfit, and in larger samples power differed considerably depending on the type of violation and proportion of misfitting items. Furthermore, we also found in our empirical example that even in large samples the Crit index may fail to detect assumption violations. Discussion: Even in large samples, the Crit coefficient showed limited usefulness for detecting moderate and severe violations of monotonicity. Our findings are relevant to researchers and practitioners who use Mokken scaling for scale and questionnaire construction and revision. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. A semiparametric approach for item response function estimation to detect item misfit.
- Author
-
Köhler, Carmen, Robitzsch, Alexander, Fährmann, Katharina, Davier, Matthias, and Hartig, Johannes
- Subjects
ITEM response theory, SAMPLE size (Statistics), DEPENDENCE (Statistics), REGULARIZATION parameter
- Abstract
When scaling data using item response theory, valid statements based on the measurement model are only permissible if the model fits the data. Most item fit statistics used to assess the fit between observed item responses and the item responses predicted by the measurement model show significant weaknesses, such as the dependence of fit statistics on sample size and number of items. In order to assess the size of misfit and to thus use the fit statistic as an effect size, dependencies on properties of the data set are undesirable. The present study describes a new approach and empirically tests it for consistency. We developed an estimator of the distance between the predicted item response functions (IRFs) and the true IRFs by semiparametric adaptation of IRFs. For the semiparametric adaptation, the approach of extended basis functions due to Ramsay and Silverman (2005) is used. The IRF is defined as the sum of a linear term and a more flexible term constructed via basis function expansions. The group lasso method is applied as a regularization of the flexible term, and determines whether all parameters of the basis functions are fixed at zero or freely estimated. Thus, the method serves as a selection criterion for items that should be adjusted semiparametrically. The distance between the predicted and semiparametrically adjusted IRF of misfitting items can then be determined by describing the fitting items by the parametric form of the IRF and the misfitting items by the semiparametric approach. In a simulation study, we demonstrated that the proposed method delivers satisfactory results in large samples (i.e., N ≥ 1,000). [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
20. Performance of the S−χ² Statistic for the Multidimensional Graded Response Model.
- Author
-
Su, Shiyang, Wang, Chun, and Weiss, David J.
- Subjects
STATISTICS, CONFIDENCE intervals, MATHEMATICAL models, THEORY, RESEARCH funding, DESCRIPTIVE statistics, DIAGNOSTIC errors, STATISTICAL models, RECEIVER operating characteristic curves
- Abstract
S−χ² is a popular item fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of S−χ² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was to evaluate the performance of S−χ² under two practical misfit scenarios: first, all items are misfitting due to model misspecification, and second, a small subset of items violate the underlying assumptions of the MGRM. Simulation studies showed that caution should be exercised when reporting item fit results of polytomous items using S−χ² within the context of the MGRM, because of its inflated false positive rates (FPRs), especially with a small sample size and a long test. S−χ² performed well when detecting overall model misfit as well as item misfit for a small subset of items when the ordinality assumption was violated. However, under a number of conditions of model misspecification or items violating the homogeneous discrimination assumption, even though true positive rates (TPRs) of S−χ² were high when a small sample size was coupled with a long test, the inflated FPRs were generally directly related to increasing TPRs. There was also a suggestion that performance of S−χ² was affected by the magnitude of misfit within an item. There was no evidence that FPRs for fitting items were exacerbated by the presence of a small percentage of misfitting items among them. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
21. A Bias-Corrected RMSD Item Fit Statistic: An Evaluation and Comparison to Alternatives.
- Author
-
Köhler, Carmen, Robitzsch, Alexander, and Hartig, Johannes
- Subjects
FALSE positive error, ITEM response theory, STATISTICAL hypothesis testing, ERROR rates, LIKERT scale, MODEL theory
- Abstract
Testing whether items fit the assumptions of an item response theory model is an important step in evaluating a test. In the literature, numerous item fit statistics exist, many of which show severe limitations. The current study investigates the root mean squared deviation (RMSD) item fit statistic, which is used for evaluating item fit in various large-scale assessment studies. The three research questions of this study are (1) whether the empirical RMSD is an unbiased estimator of the population RMSD; (2) if this is not the case, whether this bias can be corrected; and (3) whether the test statistic provides an adequate significance test to detect misfitting items. Using simulation studies, it was found that the empirical RMSD is not an unbiased estimator of the population RMSD, and nonparametric bootstrapping falls short of entirely eliminating this bias. Using parametric bootstrapping, however, the RMSD can be used as a test statistic that outperforms the other approaches (infit and outfit, S−X²) with respect to both Type I error rate and power. The empirical application showed that parametric bootstrapping of the RMSD results in rather conservative item fit decisions, which suggests more lenient cut-off criteria. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
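The parametric bootstrap used above to turn the RMSD into a significance test has a simple generic shape: simulate data from the fitted model, recompute the fit statistic on each draw, and report the proportion of simulated statistics at least as extreme as the observed one. A hedged sketch (the `simulate_stat` callback standing in for "refit the IRT model to simulated responses and recompute RMSD" is hypothetical):

```python
import numpy as np

def parametric_bootstrap_p(stat_obs, simulate_stat, n_boot=500):
    """Generic parametric-bootstrap significance test (sketch).

    stat_obs:      fit statistic computed on the observed data.
    simulate_stat: callable that draws data from the fitted model and
                   returns the statistic recomputed on that draw.
    Returns a p-value with the usual +1 continuity correction.
    """
    sims = np.array([simulate_stat() for _ in range(n_boot)])
    return (np.sum(sims >= stat_obs) + 1) / (n_boot + 1)
```

Because the null distribution is generated under the fitted model itself, this sidesteps the bias of comparing the empirical RMSD against a fixed cutoff.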
22. Analyzing the Fit of IRT Models With the Hausman Test
- Author
-
Jochen Ranger and Sören Much
- Subjects
item response theory, 2-PL model, model fit, item fit, Hausman test, Psychology, BF1-990
- Abstract
In this manuscript, the applicability of the Hausman test to the evaluation of item response models is investigated. The Hausman test is a general test of model fit. The test assesses whether, for the model in question, the parameter estimates of two different estimators coincide. The test can be implemented for item response models by comparing the parameter estimates of the marginal maximum likelihood estimator with the corresponding parameter estimates of a limited-information estimator. For a correctly specified item response model, the difference between the two estimates is normally distributed around zero. The Hausman test can be used for the evaluation of item fit and global model fit. The performance of the test is evaluated in a simulation study. The simulation study suggests that the implemented versions of the test adhere well to the nominal Type-I error rate in samples of 1000 test takers or more. The test is also capable of detecting misspecified item characteristic functions, but lacks the power to detect violations of the conditional independence assumption.
- Published
- 2020
- Full Text
- View/download PDF
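The Hausman comparison described above reduces to a quadratic form in the difference between two estimators: H = (b_cons − b_eff)' (V_cons − V_eff)⁻¹ (b_cons − b_eff), which is chi-square distributed under the model. A minimal sketch, assuming the two estimates and their covariance matrices are already in hand and that the covariance difference is positive definite:

```python
import numpy as np

def hausman_statistic(b_eff, V_eff, b_cons, V_cons):
    """Hausman test statistic (illustrative sketch).

    b_eff/V_eff:   estimates and covariance of the efficient estimator
                   (e.g., marginal maximum likelihood).
    b_cons/V_cons: estimates and covariance of the consistent alternative
                   (e.g., a limited-information estimator).
    Under the model, H ~ chi-square with df = len(b_eff).
    """
    d = np.asarray(b_cons, dtype=float) - np.asarray(b_eff, dtype=float)
    V = np.asarray(V_cons, dtype=float) - np.asarray(V_eff, dtype=float)
    H = float(d @ np.linalg.solve(V, d))  # quadratic form d' V^{-1} d
    return H, len(d)
```

Restricting the parameter vector to a single item's parameters yields an item-level fit test; using all parameters gives the global test mentioned in the abstract.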
23. Analyzing the Fit of IRT Models With the Hausman Test.
- Author
-
Ranger, Jochen and Much, Sören
- Subjects
CHARACTERISTIC functions, ITEM response theory, ERROR rates
- Abstract
In this manuscript, the applicability of the Hausman test to the evaluation of item response models is investigated. The Hausman test is a general test of model fit. The test assesses whether, for the model in question, the parameter estimates of two different estimators coincide. The test can be implemented for item response models by comparing the parameter estimates of the marginal maximum likelihood estimator with the corresponding parameter estimates of a limited-information estimator. For a correctly specified item response model, the difference between the two estimates is normally distributed around zero. The Hausman test can be used for the evaluation of item fit and global model fit. The performance of the test is evaluated in a simulation study. The simulation study suggests that the implemented versions of the test adhere well to the nominal Type-I error rate in samples of 1000 test takers or more. The test is also capable of detecting misspecified item characteristic functions, but lacks the power to detect violations of the conditional independence assumption. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
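The comparison described in the abstract above reduces to a quadratic form in the difference between the two estimate vectors. A minimal sketch, not the authors' implementation (the function name and toy numbers are illustrative; it assumes the MML estimator is the efficient one, so the covariance of the difference is the difference of the covariances):

```python
import numpy as np

def hausman_statistic(theta_eff, cov_eff, theta_rob, cov_rob):
    # theta_eff/cov_eff: estimates and covariance matrix from the
    # efficient estimator (here: marginal maximum likelihood);
    # theta_rob/cov_rob: from the limited-information estimator.
    d = np.asarray(theta_rob, float) - np.asarray(theta_eff, float)
    # Under correct specification, Cov(d) = cov_rob - cov_eff, and
    # d' Cov(d)^{-1} d is asymptotically chi-square distributed with
    # df equal to the number of compared parameters.
    v = np.asarray(cov_rob, float) - np.asarray(cov_eff, float)
    stat = float(d @ np.linalg.solve(v, d))
    return stat, d.size
```

Comparing the statistic against a chi-square critical value for `d.size` degrees of freedom then yields the item-level or global test.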
24. A Comparison of IRT Model Combinations for Assessing Fit in a Mixed Format Elementary School Science Test.
- Author
-
Yılmaz, Hacı Bayram
- Subjects
ELEMENTARY schools ,SCIENCE ,STATISTICS ,LOGISTICS ,ITEM response theory - Abstract
Open ended and multiple choice questions are commonly placed on the same tests; however, there is a discussion on the effects of using different item types on the test and item statistics. This study aims to compare model and item fit statistics in a mixed format test where multiple choice and constructed response items are used together. In this 25-item fourth grade science test administered to 2351 students in 35 schools in Turkey, items are calibrated separately and concurrently utilizing different IRT models. An important aspect of this study is that the effect of the calibration method on model and item fit is investigated on real data. Firstly, while the 1-, 2-, and 3-Parameter Logistic models are utilized to calibrate the binary coded items, the Graded Response Model and the Generalized Partial Credit Model are used to calibrate the open-ended ones. Then, combinations of dichotomous and polytomous models are employed concurrently. The results based on model comparisons revealed that the combination of the 3PL and the Graded Response Model produced the best fit statistics. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
25. Assessment of fit of item response theory models used in large-scale educational survey assessments
- Author
-
Peter W. van Rijn, Sandip Sinharay, Shelby J. Haberman, and Matthew S. Johnson
- Subjects
Generalized residual ,Item fit ,Residual analysis ,Two-parameter logistic model ,Education (General) ,L7-991 - Abstract
Abstract Latent regression models are used for score-reporting purposes in large-scale educational survey assessments such as the National Assessment of Educational Progress (NAEP) and Trends in International Mathematics and Science Study (TIMSS). One component of these models is based on item response theory. While there exists some research on assessment of fit of item response theory models in the context of large-scale assessments, there is a scope of further research on the topic. We suggest two types of residuals to assess the fit of item response theory models in the context of large-scale assessments. The Type I error rates and power of the residuals are computed from simulated data. The residuals are computed using data from four NAEP assessments. Misfit was found for all data sets for both types of residuals, but the practical significance of the misfit was minimal.
- Published
- 2016
- Full Text
- View/download PDF
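The residual idea in the abstract above (observed minus model-expected performance, standardized) can be illustrated for a single dichotomous item. This is a generic standardized residual under stated simplifications, not necessarily the specific generalized residuals the authors propose:

```python
import numpy as np

def standardized_item_residual(x, p):
    # x: 0/1 responses of all test takers to one item
    # p: model-implied success probabilities for the same test takers
    x = np.asarray(x, float)
    p = np.asarray(p, float)
    # Sum of raw residuals divided by its model-implied standard
    # deviation; approximately standard normal if the model fits.
    return float((x - p).sum() / np.sqrt((p * (1 - p)).sum()))
```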
26. An Evaluation of Overall Goodness-of-Fit Tests for the Rasch Model
- Author
-
Rudolf Debelak
- Subjects
item response theory ,Rasch model ,item fit ,type I error ,power ,Psychology ,BF1-990 - Abstract
For assessing the fit of item response theory models, it has been suggested to apply overall goodness-of-fit tests as well as tests for individual items and item pairs. Although numerous goodness-of-fit tests have been proposed in the literature for the Rasch model, their relative power against several model violations has not been investigated so far. This study compares four of these tests, which are all available in R software: T10, T11, M2, and the LR test. Results on the Type I error rate and the sensitivity to violations of different assumptions of the Rasch model (unidimensionality, local independence on the level of item pairs, equal item discrimination, zero as a lower asymptote for the item characteristic curves, invariance of the item parameters) are reported. The results indicate that the T11 test is comparatively most powerful against violations of the assumption of parallel item characteristic curves, which includes the presence of unequal item discriminations and a non-zero lower asymptote. Against the remaining model violations, which can be summarized as local dependence, M2 is found to be most powerful. T10 and LR are found to be sensitive against violations of the assumption of parallel item characteristic curves, but are insensitive against local dependence.
- Published
- 2019
- Full Text
- View/download PDF
27. Additional Evidence Based on the Internal Structure of the Instrument
- Author
-
McCoach, D. Betsy, Gable, Robert K., Madura, John P., McCoach, D. Betsy, Gable, Robert K., and Madura, John P.
- Published
- 2013
- Full Text
- View/download PDF
28. Assessing Item-Level Fit for Higher Order Item Response Theory Models.
- Author
-
Zhang, Xue, Wang, Chun, and Tao, Jian
- Subjects
- *
ITEM response theory , *MODEL theory , *FACTOR structure - Abstract
Testing item-level fit is important in scale development to guide item revision/deletion. Many item-level fit indices have been proposed in the literature, yet none of them is directly applicable to an important family of models, namely, the higher order item response theory (HO-IRT) models. In this study, chi-square-based fit indices (i.e., Yen's Q1, McKinley and Mills' G2, and Orlando and Thissen's S-X² and S-G²) were extended to HO-IRT models. Their performance is evaluated via simulation studies in terms of false positive rates and correct detection rates. The manipulated factors include test structure (i.e., test length and number of dimensions), sample size, level of correlations among dimensions, and the proportion of misfitting items. For misfitting items, the sources of misfit, including misfitting item response functions and misspecified factor structures, were also manipulated. The results from the simulation studies demonstrate that S-G² is promising for higher order items. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
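Yen's Q1, the starting point of the indices extended above, compares observed and expected proportions correct in ability-ordered groups. A sketch for the unidimensional dichotomous case, binning by the model probability as a stand-in for the ability estimate (function name and grouping rule are illustrative, not the HO-IRT extension itself):

```python
import numpy as np

def q1_statistic(x, p, n_groups=10):
    # x: 0/1 responses to one item; p: model probabilities per person.
    x = np.asarray(x, float)
    p = np.asarray(p, float)
    order = np.argsort(p, kind="stable")  # proxy for ordering by theta
    stat = 0.0
    for g in np.array_split(order, n_groups):
        o, e, n = x[g].mean(), p[g].mean(), g.size
        stat += n * (o - e) ** 2 / (e * (1 - e))  # Pearson-type term
    return float(stat)  # referred to a chi-square distribution
```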
29. The Effects of Q-Matrix Misspecification on Item and Model Fit.
- Author
-
Sünbül, Seçil Ömür and Aşiret, Semih
- Subjects
MATRICES (Mathematics) ,BAYESIAN analysis ,SAMPLE size (Statistics) ,EQUATIONS ,PROBABILITY theory - Abstract
In this study, the effects of various factors such as sample size, the percentage of misfitting items in the test, and item quality (item discrimination) on item and model fit were evaluated in the case of a misspecified Q-matrix. Data were generated in accordance with the DINA model. The Q-matrix was specified for 4 attributes and 15 items. While data were generated, sample sizes of 1000, 2000, and 4000 and s and g parameters reflecting low and high discrimination were manipulated. Three different misspecified Q-matrices (overspecified, underspecified, and mixed) were developed considering the percentage of misfitting items (20% and 40%). In the study, S-X² was used as the item fit statistic. Furthermore, absolute (abs(fcor), max(X²)) and relative (-2 log-likelihood, Akaike's information criterion (AIC), and Bayesian information criterion (BIC)) model fit statistics were used. Examining the results of this simulation study, it was concluded that S-X² was sufficient to detect misfitting items. When the percentage of misfitting items in the test was large or the Q-matrix was both underspecified and overspecified, the correct detection rate of both the abs(fcor) and max(X²) statistics was approximately 1. The correct detection rates of both statistics were also high under the other conditions. AIC and BIC were successful in detecting model misfit in cases where the Q-matrix was underspecified, whereas they failed to detect model misfit in the other cases. It can be said that the performance of BIC was mostly better than that of the other relative model fit statistics in detecting model misfit. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
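The DINA model used to generate the data above has a simple item response function: a respondent who has mastered every attribute the Q-matrix row requires answers correctly with probability 1 − s (slip), anyone else with probability g (guess). A minimal sketch (names illustrative):

```python
def dina_prob(alpha, q_row, g, s):
    # alpha: 0/1 attribute-mastery vector of one respondent
    # q_row: 0/1 Q-matrix row of one item (required attributes)
    # g, s: guessing and slipping parameters of the item
    has_all = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1 - s if has_all else g
```

A misspecified Q-matrix changes `q_row`, and hence which respondents fall in the 1 − s group, which is exactly what drives the item misfit studied above.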
30. A Rasch Analysis of the Junior Metacognitive Awareness Inventory With Singapore Students.
- Author
-
Ning, Hoi Kwan
- Subjects
- *
COGNITION , *ETHNIC groups , *PSYCHOMETRICS , *SEX distribution , *STATISTICS , *STUDENTS , *DATA analysis , *RESEARCH methodology evaluation , *DIFFERENTIAL item functioning (Research bias) - Abstract
The psychometric properties of the 2 versions of the Junior Metacognitive Awareness Inventory were examined with Singapore student samples. Other than 2 misfitting items and an underutilized response scale, Rasch analysis demonstrated that the instruments have good measurement precision, and no differential item functioning was detected across gender or ethnic groups. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
31. On the Generalized S−X2–Test of Item Fit: Some Variants, Residuals, and a Graphical Visualization
- Author
-
Jochen Ranger and Kay Brauer
- Subjects
Statistics ,Item response theory ,Polytomous Rasch model ,Expected value ,Item fit ,Social Sciences (miscellaneous) ,Education ,Visualization ,Test (assessment) ,Mathematics - Abstract
The generalized S-X² test is a test of item fit for items with polytomous response format. The test is based on a comparison of the observed and expected number of responses in strata defined by the test score. In this article, we make four contributions. We demonstrate that the performance of the generalized S-X² test depends on how sparse cells are pooled. We propose alternative implementations of the test within the framework of limited information testing. We derive the distribution of the S-X² residuals that can be used for post hoc analyses. We suggest a diagnostic plot that visualizes the form of the misfit. The performance of the alternative implementations is investigated in a simulation study. The simulation study suggests that the alternative implementations are capable of controlling the Type-I error rate well and have high power. An empirical application concludes this article.
- Published
- 2021
- Full Text
- View/download PDF
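The observed-versus-expected comparison in score strata, including the pooling of sparse cells that the article shows to matter, can be sketched for a dichotomous item as follows. This is a simplification: the expected proportion within a stratum is approximated here by the mean model probability, whereas the S-X² statistic proper derives it from the recursively computed score distribution; the function name and pooling rule are illustrative.

```python
import numpy as np

def s_x2_sketch(x, p, rest_scores, min_count=5):
    # x: 0/1 responses to one item; p: model probabilities;
    # rest_scores: test score on the remaining items (defines strata).
    x = np.asarray(x, float)
    p = np.asarray(p, float)
    rest = np.asarray(rest_scores)
    stat, df = 0.0, 0
    n_acc = obs_acc = exp_acc = 0.0  # accumulators for pooling strata
    for k in np.unique(rest):
        m = rest == k
        n_acc += m.sum(); obs_acc += x[m].sum(); exp_acc += p[m].sum()
        # keep pooling adjacent strata until expected counts are large
        if min(exp_acc, n_acc - exp_acc) < min_count:
            continue
        e = exp_acc / n_acc
        stat += n_acc * (obs_acc / n_acc - e) ** 2 / (e * (1 - e))
        df += 1
        n_acc = obs_acc = exp_acc = 0.0
    return float(stat), df  # any leftover sparse tail is dropped here
```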
32. Using Rasch Analysis to Validate the Michigan Hand Outcomes Questionnaire from the Wrist and Radius Injury Surgical Trial
- Author
-
B.S. Chang Wang, Lu Wang, Mayank Jayaram, Melissa J. Shauver, and Kevin C. Chung
- Subjects
medicine.medical_specialty ,Activities of daily living ,Rasch model ,business.industry ,Michigan hand outcomes questionnaire ,Wrist ,Item fit ,Test theory ,medicine.anatomical_structure ,Assessment data ,Physical therapy ,Medicine ,Surgery ,business ,Reliability (statistics) - Abstract
BACKGROUND The Michigan Hand Outcomes Questionnaire is a patient-reported outcome measure that has been validated in many upper extremity disorders using classic test theory. Rasch measurement analysis is a rigorous method of questionnaire validation that offers several advantages over classic test theory and was used to assess the psychometric properties of the Michigan Hand Outcomes Questionnaire. This study used Rasch analysis to evaluate the questionnaire for distal radius fractures in older adults. The incidence and costs of distal radius fractures are rising, and reliable assessment tools are needed to measure outcomes in this growing concern. METHODS Rasch analysis was performed using 6-month assessment data from the Wrist and Radius Injury Surgical Trial. Each domain in the Michigan Hand Outcomes Questionnaire was independently analyzed for threshold ordering, person-item targeting, item fit, differential-item functioning, response dependency, unidimensionality, and internal consistency. RESULTS After collapsing disordered thresholds and removing any misfitting items from the model, five domains (Function, Activities of Daily Living, Work, Pain, and Satisfaction) demonstrated excellent fit to the Rasch model. The Aesthetics domain demonstrated high reliability and internal consistency but had poor fit to the Rasch model. CONCLUSIONS Rasch analysis further supports the reliability and validity of using the Michigan Hand Outcomes Questionnaire to assess hand outcomes in older adults following treatment for distal radius fractures. Results from this study suggest that questionnaire scores should be interpreted in a condition-specific manner, with more emphasis placed on interpreting individual domain scores, rather than the summary Michigan Hand Outcomes Questionnaire score.
- Published
- 2021
- Full Text
- View/download PDF
33. Validation of a Learning Outcome Survey (WOLOS) using the Rasch Model: An Implication on Washback Study
- Author
-
Ainol Madziah Zubairi and Norhaslinda Hassan
- Subjects
Item objective congruence ,Rasch model ,Separation (statistics) ,Applied psychology ,Public university ,Item fit ,Outcome-based education ,Psychology ,Outcome (game theory) ,Reliability (statistics) - Abstract
The assessment reform which has enveloped every part of the world warrants an evaluation of teaching and learning practices through washback study, since washback is the phenomenon of how testing influences teaching and learning. Malaysia has adopted an Outcome Based Education policy; therefore, the efficacy of its assessment system, Outcome Based Assessment, is deemed pivotal to evaluate. Against this backdrop, the Washback on Learning Outcome Survey (WOLOS) was developed and validated by means of qualitative (semi-structured interviews) and quantitative (Item Objective Congruence and the Rasch Measurement Model) analyses. Responses to 150 items by 65 participants from one public university in Malaysia were subjected to Rasch analysis to ascertain the psychometric properties of the WOLOS. Five criteria covering reliability (person and item reliability), validity (separation index, item polarity, and item fit) and precision of measurement were evaluated to ensure the usefulness of measurement in the WOLOS. Some items were deleted. Subsequent reanalysis of the criteria provided evidence that the WOLOS can be considered a psychometrically reliable instrument for the evaluation of the impact of assessment practices on student learning outcomes.
- Published
- 2021
- Full Text
- View/download PDF
34. CURIOSITY TOWARDS STEM EDUCATION: A QUESTIONNAIRE FOR PRIMARY SCHOOL STUDENTS
- Author
-
Jamilah Ahmad and Nyet Moi Siew
- Subjects
Medical education ,Rasch model ,Alpha Value ,media_common.quotation_subject ,05 social sciences ,050301 education ,Validity ,050109 social psychology ,Item fit ,Education ,Cronbach's alpha ,Research studies ,Curiosity ,0501 psychology and cognitive sciences ,Psychology ,0503 education ,Reliability (statistics) ,media_common - Abstract
There are limited research studies on the development of questionnaires to assess the level of primary school students' curiosity towards STEM education. In this research, the Curiosity towards STEM Education Questionnaire (CQ-STEM) was developed based on Berlyne's Theory of Curiosity. The CQ-STEM consisted of 10 items measuring two constructs of curiosity towards STEM, namely exploration and acceptance. A total of 166 fifth graders aged 10 to 11 years enrolled in five urban schools in Sabah, Malaysia made up the research sample. The Rasch Measurement Model was applied to determine the validity and reliability of the CQ-STEM. The validity of the CQ-STEM instrument was well established for the constructs of exploration and acceptance through person fit, item fit, item polarity, unidimensionality, and the variable map. The CQ-STEM instrument was found to have high reliability, with a Cronbach's alpha (KR-20) value of .93, excellent item reliability of .96, and a moderately high item separation value of 4.83. In conclusion, the CQ-STEM has good validity and high reliability in measuring curiosity towards STEM education among primary school students. Keywords: curiosity towards STEM Education, primary school students, Rasch Measurement Model, validity and reliability, questionnaire development
- Published
- 2021
- Full Text
- View/download PDF
35. Investigation of Person Ability and Item Fit Instruments of Eco Critical Thinking Skills in Basic Science Concept Materials for Elementary Pre-Service Teachers
- Author
-
Y. Wahyu, Bambang Sumintono, Ashadi Ashadi, W. Purnami, Sarwanto Sarwanto, and Suranto Suranto
- Subjects
Rasch model ,Java ,Critical thinking ,Phenomenon ,Mathematics education ,Item fit ,Psychology ,computer ,Reliability (statistics) ,Teacher education ,Education ,computer.programming_language ,Test (assessment) - Abstract
The study aims to investigate the Person Ability and Item Fit of instruments for eco-critical thinking skills (critical thinking skills about the environment) in elementary pre-service teachers. The instrument investigation was carried out by describing item fit, item and person separation, unidimensionality, and item and person reliability. The research was carried out quantitatively by collecting data through an open-answer essay test and an eco-critical thinking skills test. Participants in this study were Elementary School Teacher Education (ESTE) students from 3 universities in Surabaya of East Java, Surakarta of Central Java, and Manggarai of East Nusa Tenggara. The number of participants was 110 ESTE students. Data were analyzed with the WINSTEPS Rasch model software. The results show that 36% of ESTE students have high Person Ability, 58% have average Person Ability, and the rest (6%) have low Person Ability. Students with higher abilities are dominated by students from the campus in Surabaya at 50%. Item fit statistics showed Mean Square (MnSq) values of 0.7-1.33, ZSTD values of -1.4-2.0, and Pt-measure correlations of 0.49-0.6. The research concludes that the eco-critical thinking skills instrument is a good-fitting instrument. The distribution of pre-service teacher qualifications from Surakarta and Manggarai is at a medium level, and students from Surabaya-East Java dominate the high ability level. Elementary pre-service teachers are most able to explain and analyze a phenomenon or fact related to environmental damage.
- Published
- 2021
- Full Text
- View/download PDF
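The MnSq values reported above are Rasch residual fit statistics. For a dichotomous item, infit is the information-weighted mean of squared standardized residuals and outfit the unweighted mean; both have expectation near 1 under model fit. A sketch (function name illustrative):

```python
import numpy as np

def infit_outfit(x, p):
    # x: 0/1 responses to one item; p: Rasch model probabilities.
    x = np.asarray(x, float)
    p = np.asarray(p, float)
    w = p * (1 - p)          # binomial variance = information weight
    z2 = (x - p) ** 2 / w    # squared standardized residuals
    infit = float(((x - p) ** 2).sum() / w.sum())  # weighted mean square
    outfit = float(z2.mean())                      # unweighted mean square
    return infit, outfit
```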
36. A Rasch model analysis of two interpretations of ‘not relevant’ responses on the Dermatology Life Quality Index (DLQI)
- Author
-
Ákos Szabó, Péter Holló, Ariel Zoltán Mitev, Miklós Sárdy, Éva Remenyik, Zsuzsanna Beretzky, Valentin Brodszky, Andrea Szegedi, A.K. Poór, Sarolta Kárpáti, Fanni Rencz, and Norbert Wikonkál
- Subjects
Male ,Psychometrics ,Separation (statistics) ,Dermatology ,Item fit ,behavioral disciplines and activities ,Article ,030207 dermatology & venereal diseases ,03 medical and health sciences ,0302 clinical medicine ,Surveys and Questionnaires ,Statistics ,Humans ,Psoriasis ,DLQI-R ,Reliability (statistics) ,Rasch model ,Public Health, Environmental and Occupational Health ,Reproducibility of Results ,Polytomous Rasch model ,Dermatology Life Quality Index ,Differential item functioning ,humanities ,Cross-Sectional Studies ,030220 oncology & carcinogenesis ,Quality of Life ,Female ,Psychology ,‘not relevant’ response - Abstract
Purpose Eight of the ten items of the Dermatology Life Quality Index (DLQI) have a ‘not relevant’ response (NRR) option. There are two possible ways to interpret NRRs: they may be considered ‘not at all’ or missing responses. We aim to compare the measurement performance of the DLQI in psoriasis patients when NRRs are scored as ‘0’ (hereafter zero-scoring) and ‘missing’ (hereafter missing-scoring) using Rasch model analysis. Methods Data of 425 patients with psoriasis from two earlier cross-sectional surveys were re-analysed. All patients completed the paper-based Hungarian version of the DLQI. A partial credit model was applied. The following model assumptions and measurement properties were tested: dimensionality, item fit, person reliability, order of response options and differential item functioning (DIF). Results Principal component analysis of the residuals of the Rasch model confirmed the unidimensional structure of the DLQI. Person separation reliability indices were similar with zero-scoring (0.910) and missing-scoring (0.914) NRRs. With zero-scoring, items 6 (sport), 7 (working/studying) and 9 (sexual difficulties) suffered from item misfit and item-level disordering. With missing-scoring, no misfit was observed and only item 7 was illogically ordered. Six and three items showed DIF for gender and age, respectively, that were reduced to four and three by missing-scoring. Conclusions Missing-scoring NRRs resulted in an improved measurement performance of the scale. DLQI scores of patients with at least one vs. no NRRs cannot be directly compared. Our findings provide further empirical support to the DLQI-R scoring modification that treats NRRs as missing and replaces them with the average score of the relevant items.
- Published
- 2021
- Full Text
- View/download PDF
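The two scorings compared above differ only in how a 'not relevant' response enters the total. A sketch of both, with the missing-scoring variant replacing each NRR by the average of the answered items, per the DLQI-R scoring modification the authors cite (item scores 0-3; the function name is illustrative and at least one answered item is assumed):

```python
def dlqi_totals(responses):
    # responses: 10 item scores in 0..3, with None marking a
    # 'not relevant' response (NRR).
    zero_scored = sum(0 if r is None else r for r in responses)
    answered = [r for r in responses if r is not None]
    mean = sum(answered) / len(answered)  # assumes >= 1 answered item
    dlqi_r = sum(mean if r is None else r for r in responses)
    return zero_scored, round(dlqi_r, 2)
```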
37. Practical Significance of Item Misfit in Educational Assessments.
- Author
-
Köhler, Carmen and Hartig, Johannes
- Subjects
- *
STATISTICAL hypothesis testing , *ITEM response theory , *STATISTICAL accuracy - Abstract
Testing item fit is an important step when calibrating and analyzing item response theory (IRT)-based tests, as model fit is a necessary prerequisite for drawing valid inferences from estimated parameters. In the literature, numerous item fit statistics exist, sometimes resulting in contradictory conclusions regarding which items should be excluded from the test. Recently, researchers argue to shift the focus from statistical item fit analyses to evaluating practical consequences of item misfit. This article introduces a method to quantify potential bias of relationship estimates (e.g., correlation coefficients) due to misfitting items. The potential deviation informs about whether item misfit is practically significant for outcomes of substantial analyses. The method is demonstrated using data from an educational test. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
38. Evaluating Item Fit Statistic Thresholds in PISA: Analysis of Cross‐Country Comparability of Cognitive Items
- Author
-
Frédéric Robin, Hyo Jeong Shin, Kentaro Yamamoto, Seang-Hwane Joo, and Lale Khorramdel
- Subjects
Cross country ,Computer science ,Comparability ,Statistics ,Cognition ,Measurement invariance ,Item fit ,Statistic ,Education - Published
- 2020
- Full Text
- View/download PDF
39. Application of Rasch Analysis for Development and Psychometric Properties of Adolescents’ Quality of Life Instruments: A Systematic Review
- Author
-
Fatemeh Esmaielzadeh, Camelia Rohani, and Sahar Dabaghi
- Subjects
Rasch model ,business.industry ,Mechanical Engineering ,Energy Engineering and Power Technology ,Guideline ,Management Science and Operations Research ,Cochrane Library ,Item fit ,behavioral disciplines and activities ,Differential item functioning ,humanities ,Test (assessment) ,03 medical and health sciences ,Critical appraisal ,0302 clinical medicine ,Quality of life ,030225 pediatrics ,Medicine ,business ,human activities ,Clinical psychology - Abstract
Background Due to the importance of assessing quality of life (QoL) in healthy and ill adolescents, the evaluation of the psychometric properties of these questionnaires is important. Objective To investigate the application of Rasch analysis in psychometric assessment studies of adolescents' QoL instruments, and to evaluate the quality of reporting of Rasch parameters in these studies. Methods This systematic review was conducted by searching for papers in the electronic databases PubMed, Web of Science, EMBASE, Cochrane Library and Scopus until December 2018. Results After screening 122 papers, 31 remained in the study. Around 68% of the studies used Rasch analysis for instrument testing and 32% for the development of new instruments. In 77.4% of the studies, classical and Rasch methods were used in parallel for data analysis. In 32.2% of the studies, healthy adolescents were the main target group. The most commonly used instrument in the Rasch studies was KIDSCREEN, administered in different countries. Six Rasch parameters were reported with a higher percentage in the studies. The major reported parameters of Rasch analysis were application of the software program (96.7%), test of item fit to the Rasch model (93.5%), unidimensionality (80.6%), type of the identified mathematical Rasch model (74.1%), threshold (58%) and differential item functioning (54.8%). Based on the psychometric evaluation of the QoL instruments, 71% of the studies showed acceptable results. Conclusion The application of the Rasch model for the psychometric assessment of adolescents' QoL questionnaires has increased in recent decades. However, there is still no strong and commonly used critical appraisal tool or guideline for the evaluation of these papers.
- Published
- 2020
- Full Text
- View/download PDF
40. An Investigation of Chi-Square and Entropy Based Methods of Item-Fit Using Item Level Contamination in Item Response Theory
- Author
-
Brandi A. Weiss and William Dardick
- Subjects
Statistics and Probability ,05 social sciences ,Monte Carlo method ,050401 social sciences methods ,Item fit ,01 natural sciences ,010104 statistics & probability ,0504 sociology ,Item response theory ,Statistics ,Chi-square test ,Entropy (information theory) ,0101 mathematics ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
New variants of entropy as measures of item-fit in item response theory are investigated. Monte Carlo simulation(s) examine aberrant conditions of item-level misfit to evaluate relative (compare EMRj, X2, G2, S-X2, and PV-Q1) and absolute (Type I error and empirical power) performance. EMRj has utility in discovering misfit.
- Published
- 2020
- Full Text
- View/download PDF
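The abstract above gives no formula for EMRj, so as a generic illustration of how an entropy-type quantity can register item misfit, the sketch below computes the Kullback-Leibler divergence of observed from model-expected correct-response proportions across score groups: zero under perfect agreement, growing with misfit. This is explicitly not the EMRj statistic itself, and the name is hypothetical.

```python
import math

def entropy_misfit(obs_props, exp_props):
    # obs_props/exp_props: observed and model-expected proportions
    # correct per score group; expected proportions must lie in (0, 1).
    kl = 0.0
    for o, e in zip(obs_props, exp_props):
        # sum over the binary outcome (correct, incorrect) per group
        for a, b in ((o, e), (1 - o, 1 - e)):
            if a > 0:
                kl += a * math.log(a / b)
    return kl
```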
41. Validation of exercise motivations inventory – 2 (EMI-2) scale for college students
- Author
-
Sae-Hyung Kim and Dongwook Cho
- Subjects
Motivation ,Rasch model ,Psychometrics ,Universities ,Scale (ratio) ,Applied psychology ,Public Health, Environmental and Occupational Health ,Reproducibility of Results ,Item difficulty ,Item fit ,Differential item functioning ,Rating scale ,Surveys and Questionnaires ,Rating scale model ,Psychometric software ,Humans ,Students ,Psychology - Abstract
Objective: The purpose of this study was to determine whether the Exercise Motivations Inventory - 2 (EMI-2) scale is appropriate for measuring college students' exercise motivation. Participants: The EMI-2 scale questionnaire was administered to 325 college students in the southwestern U.S. Method: The WINSTEPS program was used to analyze rating scale fit, Differential Item Functioning (DIF), and item fit by applying Rasch rating scale model calibration. Results: A 5-point Likert-type rating scale of the EMI-2 was more appropriate for investigating college students' exercise motivation. Seventeen of 51 items were flagged for DIF, and one item exceeded the standard item fit criteria. Overall, 33 items were finally selected for a new version of the EMI-2 scale for college students. Additionally, the Person-Item map showed that person ability and item difficulty were well matched. Conclusions: This reconstructed EMI-2 scale can be utilized to assess the exercise motivations of college students.
- Published
- 2020
- Full Text
- View/download PDF
42. Evaluation of the Multiple Sclerosis Walking Scale-12 (MSWS-12) in a Dutch sample: Application of item response theory.
- Author
-
Mokkink, Lidwine Brigitta, Galindo-Garre, Francisca, and Uitdehaag, Bernard M. J.
- Subjects
- *
MULTIPLE sclerosis , *WALKING , *ITEM response theory , *PSYCHOMETRICS , *PATIENTS ,MULTIPLE sclerosis research - Abstract
Background: The Multiple Sclerosis Walking Scale-12 (MSWS-12) measures walking ability from the patients' perspective. We examined the quality of the MSWS-12 using an item response theory model, the graded response model (GRM). Methods: A total of 625 unique Dutch multiple sclerosis (MS) patients were included. After testing for unidimensionality, monotonicity, and absence of local dependence, a GRM was fit and item characteristics were assessed. Differential item functioning (DIF) for the variables gender, age, duration of MS, type of MS and severity of MS, reliability, total test information, and standard error of the trait level (θ) were investigated. Results: Confirmatory factor analysis showed a unidimensional structure of the 12 items of the scale, explaining 88% of the variance. Item 2 did not fit the GRM. Reliability was 0.93. Items 8 and 9 (of the 11- and 12-item versions, respectively) showed DIF on the variable severity, based on the Expanded Disability Status Scale (EDSS). However, the EDSS is strongly related to the content of both items. Conclusion: Our results confirm the good quality of the MSWS-12. The trait level (θ) scores and item parameters of both the 12- and 11-item versions were highly comparable, although we do not suggest changing the content of the MSWS-12. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
43. Assessing item fit: A comparative study of frequentist and Bayesian frameworks.
- Author
-
Khalid, Muhammad Naveed and Glas, Cees A.W.
- Subjects
- *
ITEM response theory , *FREQUENTIST statistics , *BAYESIAN analysis , *LAGRANGE multiplier , *COMPARATIVE studies - Abstract
The goodness of fit of item response theory (IRT) models is evaluated in a frequentist and a Bayesian framework. The assumptions that are targeted are differential item functioning (DIF), local independence (LI), and the form of the item characteristic curve (ICC) in the one-, two-, and three-parameter logistic models. It is shown that a Lagrange multiplier (LM) test, which is a frequentist approach, can be defined in such a way that the statistics are based on the residuals, that is, differences between observations and their expectations under the model. In a Bayesian framework, identical residuals are used in posterior predictive checks, and it proves convenient to use the normal ogive representation of IRT models. For comparability of the two frameworks, the LM statistics are adapted from the usual logistic representation to the normal ogive representation. Power and Type I error rates are evaluated using a number of simulation studies. Results show that Type I error rates are conservative in the Bayesian framework and that there is more power for the fit indices in the frequentist framework. An empirical data example is presented to show how the frameworks compare in practice. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
44. Assessment of fit of item response theory models used in large-scale educational survey assessments.
- Author
-
Rijn, Peter, Sinharay, Sandip, Haberman, Shelby, and Johnson, Matthew
- Subjects
EDUCATIONAL surveys ,ITEM response theory ,REGRESSION analysis - Abstract
Latent regression models are used for score-reporting purposes in large-scale educational survey assessments such as the National Assessment of Educational Progress (NAEP) and Trends in International Mathematics and Science Study (TIMSS). One component of these models is based on item response theory. While there exists some research on assessment of fit of item response theory models in the context of large-scale assessments, there is a scope of further research on the topic. We suggest two types of residuals to assess the fit of item response theory models in the context of large-scale assessments. The Type I error rates and power of the residuals are computed from simulated data. The residuals are computed using data from four NAEP assessments. Misfit was found for all data sets for both types of residuals, but the practical significance of the misfit was minimal. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
45. A Bias-Corrected RMSD Item Fit Statistic: An Evaluation and Comparison to Alternatives
- Author
-
Carmen Köhler, Johannes Hartig, and Alexander Robitzsch
- Subjects
Educational measurement ,Goodness of fit ,Item response theory ,Statistics ,Statistical inference ,Sampling (statistics) ,Item fit ,Social Sciences (miscellaneous) ,Statistic ,Education ,Test (assessment) ,Mathematics - Abstract
Testing whether items fit the assumptions of an item response theory model is an important step in evaluating a test. In the literature, numerous item fit statistics exist, many of which show severe limitations. The current study investigates the root mean squared deviation (RMSD) item fit statistic, which is used for evaluating item fit in various large-scale assessment studies. The three research questions of this study are (1) whether the empirical RMSD is an unbiased estimator of the population RMSD; (2) if this is not the case, whether this bias can be corrected; and (3) whether the test statistic provides an adequate significance test to detect misfitting items. Using simulation studies, it was found that the empirical RMSD is not an unbiased estimator of the population RMSD, and nonparametric bootstrapping falls short of entirely eliminating this bias. Using parametric bootstrapping, however, the RMSD can be used as a test statistic that outperforms the other approaches (infit and outfit, and the S-X² statistic) with respect to both Type I error rate and power. The empirical application showed that parametric bootstrapping of the RMSD results in rather conservative item fit decisions, which suggests more lenient cut-off criteria.
- Published
- 2019
- Full Text
- View/download PDF
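The RMSD item fit statistic and the parametric bootstrap discussed above can be sketched roughly as follows. This is an illustration with simulated data and a fixed 2PL item; the grouping-by-ability approximation and all parameter values are assumptions of this sketch, not the exact operational definition used in large-scale assessments:

```python
import numpy as np

rng = np.random.default_rng(1)

def icc(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def rmsd_item(x_item, theta, a, b, n_groups=10):
    """RMSD between observed and model-implied proportions correct,
    computed over ability groups and weighted by group size."""
    edges = np.quantile(theta, np.linspace(0, 1, n_groups + 1))
    g = np.clip(np.searchsorted(edges, theta, side="right") - 1, 0, n_groups - 1)
    obs = np.array([x_item[g == k].mean() for k in range(n_groups)])
    exp = np.array([icc(theta[g == k], a, b).mean() for k in range(n_groups)])
    w = np.bincount(g, minlength=n_groups) / len(theta)
    return np.sqrt(np.sum(w * (obs - exp) ** 2))

# Simulated data: the item truly follows the model, so the RMSD should be
# small but still positive (the statistic is positively biased, as noted).
theta = rng.normal(size=2000)
a_true, b_true = 1.2, 0.3
x = (rng.uniform(size=theta.size) < icc(theta, a_true, b_true)).astype(int)
observed_rmsd = rmsd_item(x, theta, a_true, b_true)

# Parametric bootstrap: regenerate data under the fitted model to obtain the
# null distribution of the RMSD, then compare the observed value against it.
boot = np.array([
    rmsd_item((rng.uniform(size=theta.size) < icc(theta, a_true, b_true)).astype(int),
              theta, a_true, b_true)
    for _ in range(200)
])
p_value = (boot >= observed_rmsd).mean()
```

A small bootstrap p-value would indicate that the observed deviation is larger than sampling variability under the model can explain.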
46. A Comparison of IRT Model Combinations for Assessing Fit in a Mixed Format Elementary School Science Test
- Author
-
Haci Bayram Yilmaz, ALKÜ, and Yılmaz, H.B.
- Subjects
Computer science ,Model comparison ,Polytomous Rasch model ,Item fit ,Item response theory ,Education ,Test (assessment) ,Mixed format tests ,Goodness of fit ,Constructed response ,Statistics ,Calibration ,Multiple choice - Abstract
Open-ended and multiple-choice questions are commonly placed on the same tests; however, there is ongoing discussion of the effects of using different item types on test and item statistics. This study aims to compare model and item fit statistics in a mixed format test where multiple-choice and constructed response items are used together. In this 25-item fourth-grade science test administered to 2351 students in 35 schools in Turkey, items are calibrated separately and concurrently utilizing different IRT models. An important aspect of this study is that the effect of the calibration method on model and item fit is investigated with real data. First, the 1-, 2-, and 3-Parameter Logistic models are utilized to calibrate the binary-coded items, while the Graded Response Model and the Generalized Partial Credit Model are used to calibrate the open-ended ones. Then, combinations of dichotomous and polytomous models are employed concurrently. The model comparisons revealed that the combination of the 3PL and the Graded Response Model produced the best fit statistics. © IEJEE.
- Published
- 2019
- Full Text
- View/download PDF
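The abstract above compares model combinations via fit statistics; one common way to operationalize such comparisons is with information criteria. The sketch below uses hypothetical log-likelihoods and parameter counts (only the 2351-examinee count comes from the study):

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: smaller is better."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more as n grows."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical log-likelihoods for two model combinations fitted to the same
# responses (illustrative numbers only, not results from the study).
ll_3pl_grm = -15200.0   # 3PL for MC items + Graded Response for open-ended
ll_1pl_gpcm = -15450.0  # 1PL for MC items + Generalized Partial Credit
n_obs = 2351            # number of examinees, as in the study

aic_a = aic(ll_3pl_grm, n_params=95)
aic_b = aic(ll_1pl_gpcm, n_params=60)
# The combination with the smaller criterion value is preferred.
preferred = "3PL+GRM" if aic_a < aic_b else "1PL+GPCM"
```

BIC applies a heavier penalty than AIC once log(n) exceeds 2, so with 2351 examinees the two criteria can disagree when the fit improvement of the larger model is modest.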
47. Rasch Analysis of the 9-Item Shared Decision Making Questionnaire in Women With Breast Cancer
- Author
-
Ching-Lin Hsieh, Yi Jing Huang, Jung-Der Wang, Cheng-Te Chen, Wen Hsuan Hou, and Tzu Yi Wu
- Subjects
Adult ,Best practice ,Breast Neoplasms ,Item fit ,03 medical and health sciences ,0302 clinical medicine ,Breast cancer ,Surveys and Questionnaires ,Health care ,medicine ,Humans ,Reliability (statistics) ,Aged ,Aged, 80 and over ,Rasch model ,030504 nursing ,Oncology (nursing) ,business.industry ,Reproducibility of Results ,Construct validity ,Variance (accounting) ,Middle Aged ,medicine.disease ,Oncology ,030220 oncology & carcinogenesis ,Female ,0305 other medical science ,business ,Decision Making, Shared ,Clinical psychology - Abstract
Background Shared decision making (SDM) is a best practice that helps patients make optimal decisions throughout the healthcare process, especially women diagnosed with breast cancer, who carry a heavy burden from long-term treatment. To promote successful SDM, it is crucial to assess the level of perceived involvement in SDM among women with breast cancer. Objective The aims of this study were to apply Rasch analysis to examine the construct validity and person reliability of the 9-item Shared Decision Making Questionnaire (SDM-Q-9) in women with breast cancer. Methods The construct validity of the SDM-Q-9 was confirmed when the items fit the Rasch model's assumptions of unidimensionality: (1) infit and outfit mean square ranged from 0.6 to 1.4; (2) the unexplained variance of the first dimension of the principal component analysis was less than 20%. Person reliability was also calculated. Results A total of 212 participants were recruited in this study. Item 1 did not fit the model's assumptions and was deleted. The unidimensionality of the remaining 8 items (SDM-Q-8) was supported by good item fit (infit and outfit mean square ranging from 0.6 to 1.3) and very low unexplained variance of the first dimension (5.3%) of the principal component analysis. The person reliability of the SDM-Q-8 was 0.90. Conclusions The SDM-Q-8 was unidimensional and had good person reliability in women with breast cancer. Implications for practice The SDM-Q-8 has shown its potential for assessing the level of perceived involvement in SDM in women with breast cancer for both research and clinical purposes.
- Published
- 2019
- Full Text
- View/download PDF
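The infit and outfit mean-square criteria used in the study above (the 0.6-1.4 acceptance range) can be computed from standardized residuals under the Rasch model. Here is a minimal sketch with simulated data and illustrative item difficulties, not the actual SDM-Q-9 values:

```python
import numpy as np

rng = np.random.default_rng(2)

def rasch_p(theta, b):
    """Rasch model probability of endorsing each item."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))

# Simulated Rasch data (illustrative parameters only).
theta = rng.normal(size=500)
b = np.linspace(-1.5, 1.5, 9)           # 9 item difficulties
p = rasch_p(theta, b)                   # (persons, items)
x = (rng.uniform(size=p.shape) < p).astype(int)

var = p * (1 - p)                       # model variance of each response
z2 = (x - p) ** 2 / var                 # squared standardized residuals

# Outfit: unweighted mean square; infit: information-weighted mean square.
outfit = z2.mean(axis=0)
infit = ((x - p) ** 2).sum(axis=0) / var.sum(axis=0)

# Under the 0.6-1.4 rule used in the study, fitting items stay in range.
in_range = (infit > 0.6) & (infit < 1.4)
```

Because the data here are generated from the Rasch model itself, both mean squares fluctuate around 1.0; real misfitting items push outfit in particular well outside the acceptance band.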
48. Development and Validation of Psychometric Properties of the 10 IB Learner Profile Instrument (10IBLP-I): A Combination of the Rasch and Classical Measurement Model
- Author
-
Mohd Effendi Ewan Mohd Matore and Miftahuljanah Kamaruddin
- Subjects
confirmatory factor analysis ,Psychometrics ,instrument development ,Health, Toxicology and Mutagenesis ,psychometric ,Applied psychology ,Validity ,Item fit ,Article ,10 IB learner profile instrument (10IBLP-I) ,Surveys and Questionnaires ,0502 economics and business ,Humans ,Rasch measurement model ,Reliability (statistics) ,Measure (data warehouse) ,Rasch model ,05 social sciences ,Public Health, Environmental and Occupational Health ,Malaysia ,050301 education ,Reproducibility of Results ,Confirmatory factor analysis ,validity and reliability ,Medicine ,Psychology ,Factor Analysis, Statistical ,0503 education ,050203 business & management - Abstract
Background: The International Baccalaureate Middle Years Programme (IBMYP) aims to produce a holistic transformation with creative and critically minded students. However, very little attention has been paid to developing an instrument that measures the IB learner profile with good psychometric properties. Purpose: This study aims to develop an instrument with good psychometric properties, based on the Rasch measurement model and confirmatory factor analysis. Methods: The study consists of two phases, a pilot study and a field study, involving 597 year-four students from IBWS MOE. Results: The findings from the Rasch measurement model analysis showed that 54 items met the criteria for item fit, unidimensionality, and the reliability index. Meanwhile, confirmatory factor analysis found that 44 items showed a valid item fit index. Conclusions: The combination of both analyses demonstrates the strength of the 10IBLP-I's psychometric properties, covering both validity and reliability. The findings also have theoretical implications, providing empirical evidence that the IB learner profile consists of 10 constructs. Moreover, the validated 10IBLP-I has good psychometric properties and can be used to measure the level of the IB learner profile among IBWS MOE students, to assess the effectiveness of the implementation of the IBMYP in Malaysia.
- Published
- 2021
49. The Arm Function in Multiple Sclerosis Questionnaire (AMSQ): development and validation of a new tool using IRT methods.
- Author
-
Mokkink, Lidwine B., Knol, Dirk L., van der Linden, Femke H., Sonder, Judith M., D'hooghe, Marie, and Uitdehaag, Bernard M. J.
- Subjects
- *
ACADEMIC medical centers , *ARM , *EXPERIMENTAL design , *FACTOR analysis , *GOODNESS-of-fit tests , *HAND , *RESEARCH methodology , *MULTIPLE sclerosis , *PSYCHOMETRICS , *QUESTIONNAIRES , *RESEARCH funding , *RESEARCH methodology evaluation , *FUNCTIONAL assessment , *DESCRIPTIVE statistics - Abstract
Purpose: We developed the Arm Function in Multiple Sclerosis Questionnaire (AMSQ) to measure arm and hand function in MS, based on existing scales. We aimed at developing a unidimensional scale containing enough items to be used as an item bank. In this study, we investigated reliability and differential item functioning of the Dutch version. Method: Patients were recruited from two MS Centers and a Dutch website for MS patients. We performed item factor analysis on the polychoric correlation matrix, using multiple fit indices to investigate model fit. The graded response model, an item response theory model, was used to investigate item goodness-of-fit, reliability of the estimated trait levels (θ), differential item functioning, and total information. Differential item functioning was investigated for type of MS, gender, administration version, and test length. Results: Factor analysis results suggested one factor. All items showed p-values of the item goodness-of-fit statistic above 0.0016. The reliability was 0.95, and no items showed differential item functioning on any of the investigated variables. Conclusion: The AMSQ is a unidimensional 31-item questionnaire for measuring arm function in MS. Because it fits a graded response model well, it is suitable for further development as a computer adaptive test. Implications for Rehabilitation: A new questionnaire for arm and hand function is recommended for people with multiple sclerosis (AMSQ). Scale characteristics make the questionnaire suitable for use in clinical practice and research. It has good reliability. Further development as a computer adaptive test, to reduce the burden of (repetitive) testing on patients, is feasible. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
50. IDENTIFIKASI ITEM FIT DAN PERSON FIT DALAM PENGUKURAN HASIL BELAJAR KIMIA
- Author
-
Rizki Nor Amelia
- Subjects
Item fit ,Humanities ,Mathematics - Abstract
High-quality teacher-made chemistry tests are essential, given that decisions based on the test results affect students. To meet this need, the Rasch model offers test statistics that play an important role in test construction, in item evaluation and selection, and in decision-making about the resulting test scores. This study was therefore conducted to identify item fit and person fit in the measurement of chemistry learning outcomes using the Rasch model. Data in the form of multiple-choice responses to the 40 items of a teacher-made chemistry test were collected through documentation and analyzed using Winsteps Rasch Software version 3.73. The analysis concluded that all items of the teacher-made chemistry test fit the model. Meanwhile, of the 356 high school students in Yogyakarta City who served as respondents, 18 were identified as misfitting persons who should be examined further so they can receive teacher guidance.
- Published
- 2021
- Full Text
- View/download PDF