1
Tu N, Kumar LS, Joo S, Stark S. Linking Methods for Multidimensional Forced Choice Tests Using the Multi-Unidimensional Pairwise Preference Model. Appl Psychol Meas 2024; 48:104-124. PMID: 38585303; PMCID: PMC10993864; DOI: 10.1177/01466216241238741.
Abstract
Applications of multidimensional forced choice (MFC) testing have increased considerably over the last 20 years. Yet there has been little, if any, research on methods for linking the parameter estimates from different samples. This research addressed that important need by extending four widely used methods for unidimensional linking and comparing the efficacy of new estimation algorithms for MFC linking coefficients based on the Multi-Unidimensional Pairwise Preference model (MUPP). More specifically, we compared the efficacy of multidimensional test characteristic curve (TCC), item characteristic curve (ICC; Haebara, 1980), mean/mean (M/M), and mean/sigma (M/S) methods in a Monte Carlo study that also manipulated test length, test dimensionality, sample size, percentage of anchor items, and linking scenarios. Results indicated that the ICC method outperformed the M/M method, which was better than the M/S method, with the TCC method being the least effective. However, as the number of items "per dimension" and the percentage of anchor items increased, the differences between the ICC, M/M, and M/S methods decreased. Study implications and practical recommendations for MUPP linking, as well as limitations, are discussed.
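The unidimensional mean/sigma (M/S) and mean/mean (M/M) transformations that the article extends to the MFC setting can be sketched in a few lines. The anchor-item parameter values below are hypothetical, not taken from the study.

```python
import numpy as np

# Hypothetical anchor-item parameter estimates from two separate calibrations
# (illustrative values, not the article's data).
b_base = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])   # difficulties, base scale
b_new  = np.array([-0.9, -0.1, 0.4, 1.1, 1.9])   # difficulties, new scale
a_base = np.array([1.0, 1.2, 0.8, 1.1, 0.9])     # discriminations, base scale
a_new  = np.array([0.9, 1.1, 0.7, 1.0, 0.8])     # discriminations, new scale

# Mean/sigma: match the mean and SD of the anchor difficulties.
A_ms = b_base.std(ddof=1) / b_new.std(ddof=1)
B_ms = b_base.mean() - A_ms * b_new.mean()

# Mean/mean: slope from the mean discriminations (new over base).
A_mm = a_new.mean() / a_base.mean()
B_mm = b_base.mean() - A_mm * b_new.mean()

# Place new-form parameters on the base scale: b* = A*b + B, a* = a/A.
b_rescaled = A_ms * b_new + B_ms
a_rescaled = a_new / A_mm
print(np.round(b_rescaled, 3), np.round(a_rescaled, 3))
```

By construction, the rescaled anchor difficulties match the base-form mean and SD (M/S), and the rescaled discriminations match the base-form mean (M/M); the characteristic-curve methods (TCC, ICC) instead minimize differences between curves over the trait scale.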
Affiliation(s)
- Naidan Tu
- University of South Florida, FL, USA
2
Farmer C, Kaat AJ, Edwards MC, Lecavalier L. Measurement Invariance in Intellectual and Developmental Disability Research. Am J Intellect Dev Disabil 2024; 129:191-198. PMID: 38657963; DOI: 10.1352/1944-7558-129.3.191.
Abstract
Measurement invariance (MI) is a psychometric property of an instrument indicating the degree to which scores from an instrument are comparable across groups. In recent years, there has been a marked uptick in publications using MI in intellectual and developmental disability (IDD) samples. Our goal here is to provide an overview of why MI is important to IDD researchers and to describe some challenges to evaluating it, with an eye towards nudging our subfield into a more thoughtful and measured interpretation of studies using MI.
Affiliation(s)
- Michael C. Edwards
- Arizona State University and Vector Psychometric Group
3
Edwards KD, Soland J. How Scoring Approaches Impact Estimates of Growth in the Presence of Survey Item Ceiling Effects. Appl Psychol Meas 2024; 48:147-164. PMID: 38585305; PMCID: PMC10993863; DOI: 10.1177/01466216241238749.
Abstract
Survey scores are often the basis for understanding how individuals grow psychologically and socio-emotionally. A known problem with many surveys is that the items are all "easy": individuals tend to use only the top one or two response categories on the Likert scale. This issue can be especially problematic, and lead to ceiling effects, when the same survey is administered repeatedly over time. In this study, we conduct simulation and empirical studies to (a) quantify the impact of these ceiling effects on growth estimates when using typical scoring approaches like sum scores and unidimensional item response theory (IRT) models and (b) examine whether approaches to survey design and scoring, including various longitudinal multidimensional IRT (MIRT) models, can mitigate any bias in growth estimates. We show that bias is substantial under typical scoring approaches and that, while lengthening the survey helps somewhat, a longitudinal MIRT model with plausible-values scoring largely eliminates the issue. Results have implications for scoring surveys in growth studies going forward, as well as for understanding how Likert item ceiling effects may contribute to replication failures.
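A crude simulation illustrates how "easy" Likert items push repeated measurements into a ceiling. All settings are hypothetical, and the response process is a simplified threshold-crossing sketch rather than a full graded response model.

```python
import numpy as np

rng = np.random.default_rng(7)
n, items = 5000, 6
theta_t1 = rng.normal(0.0, 1.0, n)   # latent trait at time 1
theta_t2 = theta_t1 + 0.5            # true growth: +0.5 SD for everyone

# "Easy" Likert items: every category threshold sits below the trait mean,
# so respondents pile up in the top categories. Threshold values and the
# common discrimination are hypothetical.
thresholds = np.array([-2.5, -1.8, -1.1, -0.4])
disc = 2.0

def sum_scores(theta):
    # Simplified polytomous sketch (not a full GRM): an item's score is the
    # number of thresholds "crossed", each with logistic probability.
    p = 1 / (1 + np.exp(-disc * (theta[:, None, None] - thresholds)))
    crossed = rng.uniform(size=(len(theta), items, thresholds.size)) < p
    return crossed.sum(axis=(1, 2))

s1, s2 = sum_scores(theta_t1), sum_scores(theta_t2)
max_score = items * thresholds.size
print(f"at ceiling: t1 {np.mean(s1 == max_score):.1%}, t2 {np.mean(s2 == max_score):.1%}")
print(f"standardized sum-score growth: {(s2.mean() - s1.mean()) / s1.std():.2f} (true latent growth: 0.50)")
```

Because the second wave crowds the top of the scale, the sum-score growth estimate is compressed relative to the true latent change, which is the bias the article quantifies.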
Affiliation(s)
- James Soland
- University of Virginia, Charlottesville, VA, USA
4
Gorney K. Three new corrections for standardized person-fit statistics for tests with polytomous items. Br J Math Stat Psychol 2024. PMID: 38634149; DOI: 10.1111/bmsp.12342.
Abstract
Recent years have seen growing interest in the development of person-fit statistics for tests with polytomous items. Some of the most popular person-fit statistics for such tests belong to the class of standardized person-fit statistics, T, which are assumed to have a standard normal null distribution. However, this distribution holds only when (a) the true ability parameter is known and (b) an infinite number of items are available. In practice, both conditions are violated, and the quality of person-fit results is expected to deteriorate. In this paper, we propose three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The three new corrections are direct extensions of those developed by Gorney et al. (Psychometrika, 2024, https://doi.org/10.1007/s11336-024-09960-x) for tests with only dichotomous items. Our simulation study reveals that the three new corrections tend to outperform not only the original statistic T but also an existing correction for T proposed by Sinharay (Psychometrika, 2016, 81, 992). The new corrections therefore appear to be promising tools for assessing person fit in tests with polytomous items.
Affiliation(s)
- Kylie Gorney
- Michigan State University, East Lansing, Michigan, USA
5
Ulitzsch E, Zhang S, Pohl S. A Model-Based Approach to the Disentanglement and Differential Treatment of Engaged and Disengaged Item Omissions. Multivariate Behav Res 2024:1-21. PMID: 38594939; DOI: 10.1080/00273171.2024.2307518.
Abstract
Item omissions in large-scale assessments may occur for various reasons, ranging from disengagement to being unable to solve the item and giving up. Current response-time-based classification approaches allow researchers to implement different treatments of item omissions presumed to arise from different mechanisms. These approaches, however, are limited in that they require a clear-cut decision on the underlying missingness mechanism and do not allow researchers to take the uncertainty in classification into account. We present a response-time-based mixture modeling approach that overcomes this limitation. The approach (a) facilitates disentangling item omissions stemming from disengagement from those arising during solution behavior, (b) considers the uncertainty in omission classification, (c) allows omission mechanisms to vary at the item-by-examinee level, (d) supports investigating person and item characteristics associated with different types of omission behavior, and (e) gives researchers flexibility in deciding how to handle different types of omissions. The approach exhibits good parameter recovery under realistic research conditions. We illustrate the approach on data from the 2012 Programme for the International Assessment of Adult Competencies and compare it against previous classification approaches for item omissions.
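The core idea of separating omission types by response time while retaining classification uncertainty can be illustrated with a two-component normal mixture fitted by EM. This is a toy sketch with hypothetical class means and weights, not the article's model; the E-step responsibilities are the posterior class probabilities a model-based treatment can keep instead of forcing a hard decision.

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated log response times preceding omitted items: a fast "disengaged"
# class and a slower "solution behavior" class (all settings hypothetical).
rt = np.concatenate([rng.normal(0.5, 0.3, 300),   # disengaged omissions
                     rng.normal(2.0, 0.5, 700)])  # engaged omissions

# Two-component normal mixture fitted by EM.
w, mu, sd = np.array([0.5, 0.5]), np.array([0.0, 3.0]), np.array([1.0, 1.0])
for _ in range(200):
    dens = w * np.exp(-(rt[:, None] - mu) ** 2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))
    resp = dens / dens.sum(axis=1, keepdims=True)       # E-step: posterior class probabilities
    w = resp.mean(axis=0)                               # M-step: weights
    mu = (resp * rt[:, None]).sum(axis=0) / resp.sum(axis=0)
    sd = np.sqrt((resp * (rt[:, None] - mu) ** 2).sum(axis=0) / resp.sum(axis=0))

print(np.round(mu, 2), np.round(w, 2))
```

Each omission's row of `resp` quantifies how plausibly it belongs to either class, which is the uncertainty that hard response-time cutoffs discard.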
Affiliation(s)
- Esther Ulitzsch
- IPN - Leibniz Institute for Science and Mathematics Education
- Susu Zhang
- University of Illinois at Urbana-Champaign
6
Fu IN, Chen CT, Chen KL, Liu MR, Hsieh CL. Development and validation of the newly developed Preschool Theory of Mind Assessment (ToMA-P). Front Psychol 2024; 15:1274204. PMID: 38650906; PMCID: PMC11033484; DOI: 10.3389/fpsyg.2024.1274204.
Abstract
Introduction: Theory of mind (ToM) refers to the ability to understand and attribute mental states to oneself and others. A ToM measure is needed for preschool children to assess their ToM development from a multidimensional perspective (i.e., cognitive and affective dimensions). This study aimed to develop the Preschool Theory of Mind Assessment (ToMA-P) and to evaluate its construct validity and applicability.
Methods: The ToMA-P was developed based on a comprehensive literature review and revised with expert panel feedback. Its psychometric properties were evaluated in 205 typically developing preschoolers using Rasch analysis, examining dimensionality, item difficulties, and convergent validity.
Results: All ToMA-P items except one fit the hypothesized two-dimensional construct. The item difficulties in the cognitive and affective dimensions followed developmental sequences. The ToMA-P scores exhibited good convergent validity, as evidenced by significant correlations with age, verbal comprehension, adaptive functions, and daily ToM performance (p < 0.05). Children's responses and behaviors also showed that the ToMA-P has good applicability.
Discussion: This study provides empirical evidence that the ToMA-P measures cognitive and affective ToM following developmental sequences and that it has potential as a clinical tool for assessing ToM in preschool children.
Affiliation(s)
- I-Ning Fu
- Child Developmental Assessment and Intervention Center, Taipei City Hospital, Taipei, Taiwan
- School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei, Taiwan
- Department of Occupational Therapy, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Cheng-Te Chen
- Department of Educational Psychology and Counseling, National Tsing Hua University, Hsinchu, Taiwan
- Kuan-Lin Chen
- Department of Occupational Therapy, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Institute of Allied Health Sciences, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Department of Physical Medicine and Rehabilitation, College of Medicine, National Cheng Kung University Hospital, National Cheng Kung University, Tainan, Taiwan
- Meng-Ru Liu
- Child Developmental Assessment and Intervention Center, Taipei City Hospital, Taipei, Taiwan
- Ching-Lin Hsieh
- School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei, Taiwan
- Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Taipei, Taiwan
- Department of Occupational Therapy, College of Medical and Health Science, Asia University, Taichung, Taiwan
7
Nohrborg S, Nguyen-Thi T, Xuan HN, Lindahl J, Boqvist S, Järhult JD, Magnusson U. Understanding Vietnamese chicken farmers' knowledge and practices related to antimicrobial resistance using an item response theory approach. Front Vet Sci 2024; 11:1319933. PMID: 38645642; PMCID: PMC11027563; DOI: 10.3389/fvets.2024.1319933.
Abstract
Introduction: Antimicrobial resistance (AMR) poses a threat to animal and human health, as well as to food security and nutrition. The development of AMR is accelerated by the over- and misuse of antimicrobials, as seen in many livestock systems, including poultry production. In Vietnam, high AMR levels have previously been reported within poultry production, a sector dominated by small-scale farming even as it intensifies. This study focuses on understanding small- and medium-scale chicken farmers' knowledge and practices related to AMR by applying an item response theory (IRT) approach, which has several advantages over simpler statistical methods.
Methods: Farmers representing 305 farms in Thai Nguyen province were interviewed from November 2021 to January 2022 using a structured questionnaire. Results generated with IRT were used in regression models to find associations between farm characteristics and knowledge and practice levels.
Results: Descriptive results showed that almost all farmers could buy veterinary drugs without a prescription in the local community, that only one third of the farmers received veterinary professional advice or services, and that the majority of farmers gave antibiotics as a disease-preventive measure. Regression analysis showed that multiple farm characteristics were significantly associated with farmers' knowledge and practice scores.
Conclusion: The study highlights the complexity of tailoring interventions toward more medically rational antibiotic use on farms in a setting with high access to over-the-counter veterinary drugs and low access to veterinary services, since many context-specific on-farm factors need to be considered.
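As a sketch of how IRT-based scores can feed downstream regressions, the snippet below computes an expected a posteriori (EAP) trait score under a 2PL model. Item parameters and the response pattern are hypothetical, not the study's estimates.

```python
import numpy as np

# Hypothetical 2PL item parameters for a short knowledge scale.
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])      # discriminations
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.4])    # difficulties

def eap(resp, grid=np.linspace(-4, 4, 161)):
    # Posterior mean of theta on a grid, with a standard normal prior.
    p = 1 / (1 + np.exp(-a * (grid[:, None] - b)))
    likelihood = np.prod(np.where(resp == 1, p, 1 - p), axis=1)
    posterior = likelihood * np.exp(-grid**2 / 2)
    return (grid * posterior).sum() / posterior.sum()

print(round(eap(np.array([1, 1, 0, 1, 0])), 3))
```

The resulting theta estimates, unlike raw counts of correct answers, weight items by their discrimination and difficulty before entering the regression models.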
Affiliation(s)
- Sandra Nohrborg
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Thinh Nguyen-Thi
- International Livestock Research Institute, Regional Office for East and Southeast Asia, Hanoi, Vietnam
- Huyen Nguyen Xuan
- Department of Bacteriology, National Institute of Veterinary Research, Hanoi, Vietnam
- Johanna Lindahl
- Department of Animal Health and Antimicrobial Strategies, National Veterinary Institute, Uppsala, Sweden
- Sofia Boqvist
- Department of Biomedical Sciences and Veterinary Public Health, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Josef D. Järhult
- Department of Medical Sciences, Zoonosis Science Center, Uppsala University, Uppsala, Sweden
- Ulf Magnusson
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
8
Glas CAW, Jorgensen TD, Hove DT. Reducing Attenuation Bias in Regression Analyses Involving Rating Scale Data via Psychometric Modeling. Psychometrika 2024. PMID: 38573434; DOI: 10.1007/s11336-024-09967-4.
Abstract
Many studies in fields such as psychology and the educational sciences obtain information about attributes of subjects through observational studies, in which raters score subjects using multiple-item rating scales. Error variance due to measurement effects, such as items and raters, attenuates regression coefficients and lowers the power of (hierarchical) linear models. A modeling procedure is discussed to reduce this attenuation. The procedure consists of (1) an item response theory (IRT) model to map the discrete item responses to a continuous latent scale and (2) a generalizability theory (GT) model to separate the variance in the latent measurement into variance components of interest and nuisance variance components. It is shown how measurements obtained from this combination of IRT and GT models can be embedded in (hierarchical) linear models, as either predictor or criterion variables, such that error variance due to nuisance effects is partialled out. Using examples from the field of educational measurement, it is shown how general-purpose software can be used to implement the modeling procedure.
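The attenuation problem this procedure targets can be illustrated with the classical disattenuation correction. This is a simplified sketch (single predictor, reliability treated as known), not the article's IRT-GT procedure.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
true_x = rng.normal(0.0, 1.0, n)              # error-free latent predictor
y = 0.6 * true_x + rng.normal(0.0, 1.0, n)    # true regression slope = 0.6

# Add measurement error so the observed predictor has reliability 0.7,
# then recover the true slope with the classical correction.
rel = 0.7
obs_x = true_x + rng.normal(0.0, np.sqrt((1 - rel) / rel), n)

slope_obs = np.cov(obs_x, y)[0, 1] / np.var(obs_x, ddof=1)
slope_corrected = slope_obs / rel
print(f"observed: {slope_obs:.3f}, corrected: {slope_corrected:.3f}")
```

The observed slope shrinks toward the true slope times the reliability (here 0.6 × 0.7 = 0.42); the IRT-GT machinery in the article generalizes this idea by estimating and removing the nuisance variance components directly.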
Affiliation(s)
- Terrence D Jorgensen
- Research Institute of Child Development and Education, University of Amsterdam, Amsterdam, The Netherlands
- Debby Ten Hove
- Section of Educational Sciences, Faculty of Behavioural and Movement Sciences, LEARN! Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
9
Shen Y, Wang S, Xiao H. A two-step item bank calibration strategy based on 1-bit matrix completion for small-scale computerized adaptive testing. Br J Math Stat Psychol 2024. PMID: 38576260; DOI: 10.1111/bmsp.12340.
Abstract
Computerized adaptive testing (CAT) is a widely embraced approach for delivering personalized educational assessments, tailoring each test to the real-time performance of individual examinees. Despite its potential advantages, CAT's application in small-scale assessments has been limited due to the complexities associated with calibrating the item bank using sparse response data and small sample sizes. This study addresses these challenges by developing a two-step item bank calibration strategy that leverages the 1-bit matrix completion method in conjunction with two distinct incomplete pretesting designs. We introduce two novel 1-bit matrix completion-based imputation methods specifically designed to tackle the issues associated with item calibration in the presence of sparse response data and limited sample sizes. To demonstrate the effectiveness of these approaches, we conduct a comparative assessment against several established item parameter estimation methods capable of handling missing data. This evaluation is carried out through two sets of simulation studies, each featuring different pretesting designs, item bank structures, and sample sizes. Furthermore, we illustrate the practical application of the methods investigated, using empirical data collected from small-scale assessments.
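The 1-bit matrix completion idea can be sketched as a logistic low-rank factorization fitted to the observed entries only. This toy gradient-descent version uses illustrative sizes and settings and is not the article's two-step strategy; it only shows the masked-log-loss machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items, rank = 200, 30, 2

# Simulate a low-rank "true" logit matrix and sparse correct/incorrect (1-bit)
# observations, mimicking an incomplete pretesting design.
U = rng.normal(0.0, 1.0, (n_persons, rank))
V = rng.normal(0.0, 1.0, (n_items, rank))
Y = (rng.uniform(size=(n_persons, n_items)) < 1 / (1 + np.exp(-U @ V.T))).astype(float)
mask = rng.uniform(size=Y.shape) < 0.3        # only ~30% of responses observed

def masked_logloss(P):
    P = np.clip(P, 1e-9, 1 - 1e-9)
    return -(Y * np.log(P) + (1 - Y) * np.log(1 - P))[mask].mean()

# Fit a rank-2 logistic factorization to the observed entries by gradient
# descent, then the fitted probabilities impute the missing responses.
A = rng.normal(0.0, 0.1, (n_persons, rank))
B = rng.normal(0.0, 0.1, (n_items, rank))
loss_start = masked_logloss(1 / (1 + np.exp(-(A @ B.T))))
for _ in range(2000):
    P = 1 / (1 + np.exp(-(A @ B.T)))
    G = mask * (P - Y)                        # gradient of the masked log-loss
    A, B = A - 0.01 * (G @ B), B - 0.01 * (G.T @ A)
loss_end = masked_logloss(1 / (1 + np.exp(-(A @ B.T))))
print(f"masked log-loss: {loss_start:.3f} -> {loss_end:.3f}")
```

In a two-step calibration, completed response probabilities like these would then feed a conventional IRT item calibration.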
Affiliation(s)
- Yawei Shen
- Department of Educational Psychology, University of Georgia, Athens, Georgia, USA
| | - Shiyu Wang
- Department of Educational Psychology, University of Georgia, Athens, Georgia, USA
| | - Houping Xiao
- Institute for Insight, Georgia State University, Atlanta, Georgia, USA
10
Newlands AF, Kramer M, Roberts L, Maxwell K, Price JL, Finlay KA. Evaluating the quality of life impact of recurrent urinary tract infection: Validation and refinement of the Recurrent UTI Impact Questionnaire (RUTIIQ). Neurourol Urodyn 2024; 43:902-914. PMID: 38385648; DOI: 10.1002/nau.25426.
Abstract
BACKGROUND AND AIMS: Recurrent urinary tract infection (rUTI) has significant negative consequences for a wide variety of quality of life (QoL) domains. Without adequate validation and assessment of the unique insights of people living with rUTI, clinical results cannot be fully understood. The Recurrent UTI Impact Questionnaire (RUTIIQ), a novel patient-reported outcome measure of rUTI psychosocial impact, has been robustly developed with extensive patient and clinician input to facilitate enhanced rUTI management and research. This study aimed to confirm the structural validity of the RUTIIQ, assessing its strength and bifactor model fit.
METHODS: A sample of 389 adults experiencing rUTI (96.9% female, aged 18-87 years) completed an online cross-sectional survey comprising a demographic questionnaire and the RUTIIQ. A bifactor graded response model was fitted to the data, optimizing the questionnaire structure based on item fit, discrimination capability, local dependence, and differential item functioning.
RESULTS: The final RUTIIQ demonstrated excellent bifactor model fit (RMSEA = 0.054, CFI = 0.99, SRMSR = 0.052), and mean-square fit indices indicated that all included items were productive for measurement (MNSQ = 0.52-1.41). The final questionnaire comprised an 18-item general "rUTI QoL impact" factor and five subfactor domains measuring "personal wellbeing" (three items), "social wellbeing" (four items), "work and activity interference" (four items), "patient satisfaction" (four items), and "sexual wellbeing" (three items). Together, the general factor and five subfactors explained 81.6% of the common model variance. All factor loadings were greater than 0.30 and communalities greater than 0.60, indicating good model fit and structural validity.
CONCLUSIONS: The 18-item RUTIIQ is a robust, patient-tested questionnaire with excellent psychometric properties, which capably assesses the patient experience of rUTI-related impact on QoL and healthcare satisfaction. Facilitating standardized patient monitoring and improved shared decision-making, the RUTIIQ delivers a unique opportunity to improve patient-centered care.
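A "% of common model variance" figure like the one reported is an explained-common-variance (ECV) style summary; given a bifactor loading matrix it reduces to a one-line computation. The loadings below are illustrative only, not the RUTIIQ estimates.

```python
import numpy as np

# Hypothetical standardized loadings for a small bifactor structure: each item
# loads on the general factor and on exactly one specific factor.
general  = np.array([0.70, 0.60, 0.65, 0.55, 0.60, 0.70])
specific = np.array([0.40, 0.35, 0.30, 0.45, 0.50, 0.35])

# Explained common variance (ECV) of the general factor: squared general
# loadings over all squared loadings.
ecv = (general**2).sum() / ((general**2).sum() + (specific**2).sum())
print(f"ECV = {ecv:.3f}")
```

A high ECV for the general factor is one piece of evidence that a single total score is interpretable alongside the subfactor scores.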
Affiliation(s)
- Abigail F Newlands
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Lindsey Roberts
- School of Psychology, University of Buckingham, Buckingham, UK
- Kayleigh Maxwell
- Department of Psychology, Faculty of Natural Sciences, University of Stirling, Stirling, UK
- Katherine A Finlay
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
11
Gorney K, Sinharay S, Eckerly C. Efficient Corrections for Standardized Person-Fit Statistics. Psychometrika 2024. PMID: 38558053; DOI: 10.1007/s11336-024-09960-x.
Abstract
Many popular person-fit statistics belong to the class of standardized person-fit statistics, T, and are assumed to have a standard normal null distribution. However, in practice, this assumption is incorrect since T is computed using (a) an estimated ability parameter and (b) a finite number of items. Snijders (Psychometrika 66(3):331-342, 2001) developed mean and variance corrections for T to account for the use of an estimated ability parameter. Bedrick (Psychometrika 62(2):191-199, 1997) and Molenaar and Hoijtink (Psychometrika 55(1):75-106, 1990) developed skewness corrections for T to account for the use of a finite number of items. In this paper, we combine these two lines of research and propose three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The new corrections are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets. We conducted a detailed simulation study and found that the new corrections are able to control the Type I error rate while also maintaining reasonable levels of power. A real data example is also included.
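In the dichotomous Rasch case, the uncorrected statistic T being corrected here is the familiar standardized log-likelihood statistic l_z. A minimal sketch follows, with hypothetical item difficulties and the ability treated as known; none of the article's corrections are applied.

```python
import numpy as np

def lz(x, theta, b):
    # Standardized log-likelihood person-fit statistic for the Rasch model:
    # (observed loglik - expected loglik) / sqrt(variance of loglik).
    p = 1 / (1 + np.exp(-(theta - b)))
    loglik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (loglik - expected) / np.sqrt(variance)

b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])    # hypothetical difficulties
fitting = np.array([1, 1, 1, 0, 0])          # Guttman-like pattern
aberrant = np.array([0, 0, 0, 1, 1])         # reversed pattern
print(lz(fitting, 0.0, b), lz(aberrant, 0.0, b))
```

Large negative values flag aberrant responding; the corrections in the article adjust this statistic's null distribution for estimated abilities and short tests.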
Affiliation(s)
- Kylie Gorney
- Department of Counseling, Educational Psychology, and Special Education, Michigan State University, 460 Erickson Hall, 620 Farm Lane, East Lansing, MI, 48824, USA.
| | - Sandip Sinharay
- Educational Testing Service, 660 Rosedale Road, Princeton, NJ, 08541, USA
| | - Carol Eckerly
- Educational Testing Service, 660 Rosedale Road, Princeton, NJ, 08541, USA
12
Li X, Li R, Xiao F, Zhao K, Zhang X, Wang X, Li M, Guo K, Wang L, Wu Y, Van Spall H, Gao T, Fu Q, Xie F. Validation of China Health-Related Outcomes Measures-Cardiovascular Disease. Value Health 2024; 27:490-499. PMID: 38244982; DOI: 10.1016/j.jval.2024.01.003.
Abstract
OBJECTIVES: China Health-Related Outcomes Measures (CHROME) is an initiative aimed at developing a system of preference-based health-related quality of life instruments for China. CHROME-cardiovascular disease (CHROME-CVD) is a CVD-specific instrument with 14 items developed under this initiative. This study aimed to test the psychometric properties of the CHROME-CVD.
METHODS: This validation study was conducted using a cross-sectional questionnaire survey in China. Eligible patients with CVD were recruited and asked to complete the CHROME-CVD, the EQ-5D-5L, and a CVD-specific non-preference-based health-related quality of life instrument selected according to each patient's confirmed diagnosis. Item evaluation, internal consistency, measurement invariance, test-retest reliability, structural validity, and construct validity were tested using classical test theory. Item response theory was used to evaluate item-level performance.
RESULTS: A total of 444 patients with CVD (coronary artery disease, n = 276; heart failure, n = 104; angina, n = 33; and atrial fibrillation, n = 16) from 6 provinces in China were enrolled for the validation. Exploratory factor analysis identified 4 factors: chest pain, other symptoms, physical health, and mental and social health. Cronbach's alpha and the intraclass correlation coefficient were >0.8. A total of 20 of 26 (76.9%) and 90 of 95 (94.7%) predefined hypotheses were met for convergent and discriminant validity, respectively. No important differences were identified between gender and residency subgroups. The response options of 10 items were found to overlap based on category response curves, which led to their modification to 4-level response options. The wording of 3 items was modified by reference to the wording of comparable instruments.
CONCLUSION: The validation of the CHROME-CVD demonstrated generally good psychometric properties. Further validation of the modified CHROME-CVD is needed.
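Cronbach's alpha, one of the internal-consistency indices reported, can be computed directly from an items-in-columns score matrix. The simulated data below (14 items, matching the CHROME-CVD item count, with a common-trait structure and noise level chosen for illustration) are hypothetical.

```python
import numpy as np

def cronbach_alpha(X):
    # X: respondents in rows, items in columns.
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(1)
trait = rng.normal(0.0, 1.0, (500, 1))            # shared trait across items
X = trait + rng.normal(0.0, 0.7, (500, 14))       # correlated item scores
print(f"alpha = {cronbach_alpha(X):.2f}")
```

Because all 14 simulated items share the same trait, alpha comes out well above the 0.8 benchmark reported in the article.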
Affiliation(s)
- Xue Li
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Health Technology Assessment, China National Health Development Research Center, Beijing, China
- Rui Li
- Department of Health Technology Assessment, China National Health Development Research Center, Beijing, China; Evidence Based Social Science Research Center/Health Technology Assessment Center, Lanzhou University, Lanzhou, Gansu, China; Evidence-Based Medicine Center, Lanzhou University, Lanzhou, Gansu, China
- Feiyi Xiao
- Department of Health Technology Assessment, China National Health Development Research Center, Beijing, China
- Kun Zhao
- Department of Health Technology Assessment, China National Health Development Research Center, Beijing, China; Vanke School of Public Health, Tsinghua University, Beijing, China
- Xiaolu Zhang
- School of Business Administration, Shenyang Pharmaceutical University, Shenyang, Liaoning, China
- Xinyi Wang
- School of Business Administration, Shenyang Pharmaceutical University, Shenyang, Liaoning, China
- Meichen Li
- School of Business Administration, Shenyang Pharmaceutical University, Shenyang, Liaoning, China
- Ke Guo
- Evidence Based Social Science Research Center/Health Technology Assessment Center, Lanzhou University, Lanzhou, Gansu, China; Evidence-Based Medicine Center, Lanzhou University, Lanzhou, Gansu, China
- Li Wang
- School of International Pharmaceutical Business, China Pharmaceutical University, Nanjing, Jiangsu, China
- Yanan Wu
- Evidence Based Social Science Research Center/Health Technology Assessment Center, Lanzhou University, Lanzhou, Gansu, China; Evidence-Based Medicine Center, Lanzhou University, Lanzhou, Gansu, China
- Harriette Van Spall
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada; Department of Medicine, McMaster University, Hamilton, Ontario, Canada; Research Institute of St Joseph's and Population Health Research Institute, Hamilton, Ontario, Canada
- Tiantian Gao
- Shandong Provincial Hospital affiliated to Shandong First Medical University, Jinan, Shandong, China
- Qiang Fu
- Department of Health Technology Assessment, China National Health Development Research Center, Beijing, China
- Feng Xie
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada; Centre for Health Economics and Policy Analysis, McMaster University, Hamilton, Ontario, Canada
13
Bäcklund C, Sörman DE, Gavelin HM, Király O, Demetrovics Z, Ljungberg JK. Comparing psychopathological symptoms, life satisfaction, and personality traits between the WHO and APA frameworks of gaming disorder symptoms: A psychometric investigation. Scand J Psychol 2024. PMID: 38475668; DOI: 10.1111/sjop.13010.
Abstract
INTRODUCTION: The inclusion of Internet Gaming Disorder (IGD) in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) by the American Psychiatric Association and of Gaming Disorder in the 11th revision of the International Classification of Diseases (ICD-11) by the World Health Organization requires consistent psychological measures for reliable estimates. The current study aimed to investigate the psychometric properties of the Gaming Disorder Test (GDT), the Ten-Item Internet Gaming Disorder Test (IGDT-10), and the Five-Item Gaming Disorder Test (GDT-5), and to compare the WHO and APA frameworks of gaming disorder symptoms in terms of psychopathological symptoms, life satisfaction, and personality traits.
METHODS: A sample of 723 Swedish gamers was recruited (29.8% women, 68.3% men, 1.9% other; mean age = 29.50 years, SD = 8.91).
RESULTS: The results indicated notable differences between the two frameworks in the estimated possible risk groups. However, the associations between gaming disorder symptoms and personality traits, life satisfaction, and psychopathological symptoms appeared consistent across the two frameworks. The results showed excellent psychometric properties in support of the one-factor models of the GDT, IGDT-10, and GDT-5, including good reliability estimates (McDonald's omega) and evidence of construct validity. Additionally, the results demonstrated full gender and age measurement invariance of the GDT, IGDT-10, and GDT-5, indicating that gaming disorder symptoms are measured equally across these subgroups.
CONCLUSION: These findings indicate that the IGDT-10, GDT-5, and GDT are appropriate measures for assessing gaming disorder symptoms and for facilitating future research in Sweden.
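McDonald's omega, the reliability estimate used here, follows directly from standardized one-factor loadings. The four-item loadings below (a GDT-like scale length) are illustrative, not the article's estimates.

```python
import numpy as np

# Hypothetical standardized one-factor loadings for a four-item scale.
loadings = np.array([0.75, 0.70, 0.80, 0.65])
uniquenesses = 1 - loadings**2

# Omega: variance attributable to the common factor over total variance.
omega = loadings.sum() ** 2 / (loadings.sum() ** 2 + uniquenesses.sum())
print(f"omega = {omega:.3f}")
```

Unlike Cronbach's alpha, omega does not assume equal loadings, which is why it is often preferred once a one-factor model has been established.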
Affiliation(s)
- Christian Bäcklund
- Department of Health, Education and Technology, Luleå University of Technology, Luleå, Sweden
- Daniel Eriksson Sörman
- Department of Health, Education and Technology, Luleå University of Technology, Luleå, Sweden
- Orsolya Király
- Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary
- Zsolt Demetrovics
- Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary
- Centre of Excellence in Responsible Gaming, University of Gibraltar, Gibraltar
- Jessica K Ljungberg
- Department of Health, Education and Technology, Luleå University of Technology, Luleå, Sweden
14
Kerry MJ, Paignon A, Wiesner Conti J, Sy M, Huber M. German translation and validation of the Interprofessional Facilitation Scale. J Interprof Care 2024; 38:394-398. [PMID: 38140905 DOI: 10.1080/13561820.2023.2287024]
Abstract
We used item response theory (IRT) to examine a German translation of the Interprofessional Facilitation Scale (IPFS). The IPFS was administered to n = 130 mixed-health-profession participants in a questionnaire following an interprofessional education practicum. We used IRT analyses to examine three aspects of the IPFS: (a) general factor strength, (b) subscale usability, and (c) item bias. First, findings indicated a strong general factor underlying the IPFS that supports unidimensional interpretations. Second, findings supported the overall reliability of the IPFS but failed to support subscale reliabilities. Third, item bias assessment using a comparator French sample (n = 89) indicated insubstantial differences between the German and French samples. Taken together, we find sufficient evidence to support the application of the IPFS German translation in IPE contexts and its unidimensional interpretation. Subscores are not advisable for interpretation, and future researchers should aim to further inspect potential item bias.
Affiliation(s)
- Matthew J Kerry
- Institute of Health Sciences, Zurich University of Applied Sciences (ZHAW), Winterthur, Switzerland
- Adeline Paignon
- Geneva School of Health Sciences and Centre for Interprofessional Simulation, University of Applied Sciences and Arts of Western Switzerland HES-SO, Geneva, Switzerland
- Joanne Wiesner Conti
- Geneva School of Health Sciences and Centre for Interprofessional Simulation, University of Applied Sciences and Arts of Western Switzerland HES-SO, Geneva, Switzerland
- Michael Sy
- Institute of Health Sciences, Zurich University of Applied Sciences (ZHAW), Winterthur, Switzerland
- Marion Huber
- Institute of Health Sciences, Zurich University of Applied Sciences (ZHAW), Winterthur, Switzerland
15
Chung HKS, Louie K, Chan WS. Development and evaluation of a Chinese short-form of the Sleep-related Behaviors Questionnaire in Hong Kong Chinese adults using item response theory. J Health Psychol 2024; 29:255-265. [PMID: 37688382 DOI: 10.1177/13591053231195518]
Abstract
Insomnia-related safety behaviors are behaviors that aim to mitigate the negative consequences of insomnia but inadvertently perpetuate it. This study aimed to develop and evaluate a Chinese short-form of the Sleep-related Behaviors Questionnaire (SRBQ-SF), a self-report measure of insomnia-related safety behaviors, using item response theory. The Chinese version of the original SRBQ was completed by 536 Chinese-speaking adults with clinically significant insomnia. The automatic item selection procedure of the Mokken scaling analysis was used to develop and evaluate the SRBQ-SF. A 23-item SRBQ-SF consisting of a 14-item reduced engagement and avoidance subscale (SRBQ-REA) and a 9-item preoccupation with sleep subscale (SRBQ-PS) was derived. Classical test theory-based estimates showed that the SRBQ-REA and SRBQ-PS had good internal consistency and acceptable convergent and discriminant validity, and they were only weakly correlated with each other. We recommend using the SRBQ-REA and SRBQ-PS separately to assess these two dimensions of safety behaviors in the study and treatment of insomnia in Chinese-speaking adults.
16
Newlands AF, Kramer M, Roberts L, Maxwell K, Price JL, Finlay KA. Confirmatory structural validation and refinement of the Recurrent Urinary Tract Infection Symptom Scale. BJUI Compass 2024; 5:240-252. [PMID: 38371201 PMCID: PMC10869661 DOI: 10.1002/bco2.297]
Abstract
Objectives To confirm the structural validity of the Recurrent Urinary Tract Infection Symptom Scale (RUTISS), determining whether a bifactor model appropriately fits the questionnaire's structure and identifying areas for refinement. Used in conjunction with established clinical testing methods, this patient-reported outcome measure addresses the urgent need to validate the patient perspective. Patients and methods A clinically and demographically diverse sample of 389 people experiencing recurrent UTI across 37 countries (96.9% female biological sex, aged 18-87 years) completed the RUTISS online. A bifactor graded response model was fitted to the data, identifying potential items for deletion if they indicated significant differential item functioning (DIF) based on sociodemographic characteristics, contributed to local item dependence or demonstrated poor fit or discrimination capability. Results The final RUTISS comprised a 3-item symptom frequency section, a 1-item global rating of change scale and an 11-item general 'rUTI symptom and pain severity' subscale with four sub-factor domains measuring 'urinary symptoms', 'urinary presentation', 'UTI pain and discomfort' and 'bodily sensations'. The bifactor model fit indices were excellent (root mean square error of approximation [RMSEA] = 0.041, comparative fit index [CFI] = 0.995, standardised root mean square residual [SRMSR] = 0.047), and the mean-square fit statistics indicated that all items were productive for measurement (mean square fit indices [MNSQ] = 0.64 - 1.29). Eighty-one per cent of the common model variance was accounted for by the general factor and sub-factors collectively, and all factor loadings were greater than 0.30 and communalities greater than 0.60. Items indicated high discrimination capability (slope parameters > 1.35). 
Conclusion The 15-item RUTISS is a patient-generated, psychometrically robust questionnaire that dynamically assesses the patient experience of recurrent UTI symptoms and pain. This brief tool offers the unique opportunity to enhance patient-centred care by supporting shared decision-making and patient monitoring.
Affiliation(s)
- Abigail F. Newlands
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Kayleigh Maxwell
- Department of Psychology, Faculty of Natural Sciences, University of Stirling, Stirling, UK
- Katherine A. Finlay
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
17
Levine SZ, Goldberg Y, Rotstein A, Samara M, Yoshida K, Cipriani A, Iwatsubo T, Leucht S, Furukawa TA. Shortening the Alzheimer's disease assessment scale cognitive subscale. Eur Psychiatry 2024; 67:e19. [PMID: 38389390 DOI: 10.1192/j.eurpsy.2024.14]
Abstract
BACKGROUND A short yet reliable cognitive measure is needed that separates treatment and placebo in treatment trials for Alzheimer's disease. Hence, we aimed to shorten the Alzheimer's Disease Assessment Scale Cognitive Subscale (ADAS-Cog) and test its use as an efficacy measure. METHODS Secondary data analysis of participant-level data from five pivotal clinical trials of donepezil compared with placebo for Alzheimer's disease (N = 2,198). Across all five trials, cognition was appraised using the original 11-item ADAS-Cog. Statistical analysis consisted of sample characterization, item response theory (IRT) to identify an ADAS-Cog short version, and mixed models for repeated-measures analysis to examine the effect sizes of ADAS-Cog change on the original and short versions in the placebo versus donepezil groups. RESULTS Based on IRT, a short ADAS-Cog was developed with seven items and two response options. The original and short versions of the ADAS-Cog correlated at 0.7 at baseline and at weeks 12 and 24. Effect sizes based on mixed modeling showed that the short and original ADAS-Cog separated placebo and donepezil comparably (ADAS-Cog original ES = 0.33, 95% CI = 0.29, 0.40; ADAS-Cog short ES = 0.25, 95% CI = 0.23, 0.34). CONCLUSIONS IRT identified a short ADAS-Cog version that separated donepezil and placebo, suggesting its clinical potential for assessment and treatment monitoring.
Affiliation(s)
- Yair Goldberg
- The Faculty of Data and Decision Science, Technion Israel Institute of Technology, Haifa, Israel
- Anat Rotstein
- Department of Gerontology, University of Haifa, Haifa, Israel
- Myrto Samara
- Department of Psychiatry, Faculty of Medicine, University of Thessaly, Larissa, Greece
- Kazufumi Yoshida
- Department of Health Promotion and Human Behavior, Graduate School of Medicine/School of Public Health, Kyoto University, Kyoto, Japan
- Andrea Cipriani
- Department of Psychiatry, University of Oxford, Oxford, UK
- Oxford Health NHS Foundation Trust, Warneford Hospital, Oxford, UK
- Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Takeshi Iwatsubo
- Department of Neuropathology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Stefan Leucht
- Technical University of Munich, TUM School of Medicine and Health, Department of Psychiatry and Psychotherapy, München, Germany
- Toshiaki A Furukawa
- Department of Health Promotion and Human Behavior, Graduate School of Medicine/School of Public Health, Kyoto University, Kyoto, Japan
18
Davison ML, Chung S, Kohli N, Davenport EC. A Multidimensional Model to Facilitate Within Person Comparison of Attributes. Psychometrika 2024. [PMID: 38332224 DOI: 10.1007/s11336-023-09946-1]
Abstract
In psychological research and practice, a person's scores on two different traits or abilities are often compared. Such within-person comparisons require that measurements have equal units (EU) and/or equal origins: an assumption rarely validated. We describe a multidimensional SEM/IRT model from the literature and, using principles of conjoint measurement, show that its expected response variables satisfy the axioms of additive conjoint measurement for measurement on a common scale. In an application to Quality of Life data, the EU analysis is used as a pre-processing step to derive a simple structure Quality of Life model with three dimensions expressed in equal units. The results are used to address questions that can only be addressed by scores expressed in equal units. When the EU model fits the data, scores in the corresponding simple structure model will have added validity in that they can address questions that cannot otherwise be addressed. Limitations and the need for further research are discussed.
Affiliation(s)
- Mark L Davison
- Department of Educational Psychology, University of Minnesota, Minneapolis, 55455, USA.
19
Chiu C, Gao X, Wu R, Campbell J, Krause J, Driver S. Validation of an eight-item resilience scale for inpatients with spinal cord injuries in a rehabilitation hospital: exploratory factor analyses and item response theory. Disabil Rehabil 2024:1-7. [PMID: 38327137 DOI: 10.1080/09638288.2024.2308643]
Abstract
PURPOSE People with spinal cord injury (PwSCI) can experience life changes, including impacts on their physical and mental health. PwSCI often report less life satisfaction and lower subjective well-being than peers without SCI. These challenges and adversities increase the demand on them to be resilient. Healthcare providers need quick and valid instruments to assess adult patients' resilience in clinical settings. We aimed to evaluate the factor validity and discrimination ability of a resilience scale, the CD-RISC-10, for clinical usage in adults with SCI during hospitalization. MATERIALS AND METHODS 93 adults with SCI responded to the self-reported survey, including the CD-RISC-10, the Patient Health Questionnaire-9 Scale (PHQ-9), the Satisfaction with Life Scale (SWLS), and the Intrinsic Spirituality Scale. We conducted descriptive statistics, exploratory factor analysis (EFA), and item response theory (IRT) analyses. RESULTS Two items were deleted from the CD-RISC-10 after EFA, forming the CD-RISC-8. The item discriminations of the remaining eight items from the unconstrained IRT model ranged from a relatively low 1.433 to a high of 3.071. The CD-RISC-8 was significantly related to the PHQ-9 and SWLS. CONCLUSIONS The factor validity of the CD-RISC-8 was improved. Significantly, the CD-RISC-8 has excellent potential for clinical usage due to its ability to discriminate between low and intermediate resilience.
Affiliation(s)
- Chungyi Chiu
- Department of Kinesiology and Community Health, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Xiaotian Gao
- Department of Kinesiology and Community Health, University of Illinois Urbana-Champaign, Urbana, IL, USA
- Rongxiu Wu
- Center for Astrophysics, Harvard & Smithsonian, Cambridge, MA, USA
- Jeanna Campbell
- School of Social Work, University of Illinois Urbana-Champaign, Urbana, IL, USA
- James Krause
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Simon Driver
- Research Center, Baylor Scott & White Institute for Rehabilitation, Dallas, TX, USA
20
Welter VDE, Dawborn-Gundlach M, Großmann L, Krell M. Adapting a self-efficacy scale to the task of teaching scientific reasoning: collecting evidence for its psychometric quality using Rasch measurement. Front Psychol 2024; 15:1339615. [PMID: 38384352 PMCID: PMC10879573 DOI: 10.3389/fpsyg.2024.1339615]
Abstract
Besides teachers' professional knowledge, their self-efficacy is a crucial aspect in promoting students' scientific reasoning (SR). However, because no measurement instrument has yet been published that specifically refers to self-efficacy beliefs regarding the task of teaching SR, we adapted the Science Teaching Efficacy Belief Instrument (STEBI) accordingly, resulting in the Teaching Scientific Reasoning Efficacy Beliefs Instrument (TSR-EBI). While the conceptual framework of the TSR-EBI is comparable to that of the STEBI in general terms, it is more specific, acknowledging that teaching SR requires very specific knowledge and skills that are not necessarily needed to the same extent for promoting other competencies in science education. To evaluate the TSR-EBI's psychometric quality, we conducted two rounds of validation. Both samples (N1 = 114; N2 = 74) consisted of pre-service teachers enrolled in university master's programs in Germany. The collected data were analyzed by applying Rasch analysis and known-group comparisons. An analysis of the TSR-EBI's internal structure showed a 3-category response scale to be superior to a 5-category scale. The person and item reliability of the scale proved to be satisfactory. Furthermore, the second round of validation made clear that the results previously found for the 3-category scale were generally replicable in a new (but comparable) sample, which clearly supports the TSR-EBI's psychometric quality. Moreover, in terms of test-criterion relationships, the scale was also able to discriminate between groups that are assumed to have different levels of self-efficacy regarding teaching SR. Nonetheless, some findings also suggest that the scale might benefit from a reconsidered selection of individual items (despite acceptable item fit statistics). On balance, however, we believe that the TSR-EBI has the potential to provide valuable insights in future studies regarding factors that influence teachers' self-efficacy, such as their professional experiences, prior training, or perceived barriers to effective teaching.
Affiliation(s)
- Leroy Großmann
- Department of Biology Education, Freie Universität Berlin, Berlin, Germany
- Moritz Krell
- Department of Biology Education, IPN – Leibniz Institute for Science and Mathematics Education, Kiel, Germany
21
Liu DT, Mueller CA, Sedaghat AR. A scoping review of Rasch analysis and item response theory in otolaryngology: Implications and future possibilities. Laryngoscope Investig Otolaryngol 2024; 9:e1208. [PMID: 38362194 PMCID: PMC10866592 DOI: 10.1002/lio2.1208]
Abstract
Objective Item response theory (IRT) is a methodological approach to studying the psychometric performance of outcome measures. This study aims to determine and summarize the use of IRT in otolaryngological scientific literature. Methods A systematic search of the Medline, Embase, and the Cochrane Library databases was performed for original English-language published studies indexed up to January 28, 2023, per the following search strategy: ("item response theory" OR "irt" OR "rasch" OR "latent trait theory" OR "modern mental test theory") AND ("ent" OR "otorhinolaryngology" OR "ear" OR "nose" OR "throat" OR "otology" OR "audiology" OR "rhinology" OR "laryngology" OR "neurotology" OR "facial plastic surgery"). Results Fifty-five studies were included in this review. IRT was used across all subspecialties in otolaryngology, and most studies utilizing IRT methodology were published within the last decade. Most studies analyzed polytomous response data, and the most commonly used IRT models were the partial credit and the rating scale model. There was considerable heterogeneity in reporting the main assumptions and results of IRT. Conclusion IRT is increasingly being used in the otolaryngological scientific literature. In the otolaryngology literature, IRT is most frequently used in the study of patient-reported outcome measures and many different IRT-based methods have been used. Future IRT-based outcome studies, using standardized reporting guidelines, might improve otolaryngology-outcome research sustainably by improving response rates and reducing patient response burden. Level of evidence 2.
Affiliation(s)
- David T. Liu
- Department of Otorhinolaryngology, Head and Neck Surgery, Medical University of Vienna, Vienna, Austria
- Christian A. Mueller
- Department of Otorhinolaryngology, Head and Neck Surgery, Medical University of Vienna, Vienna, Austria
- Ahmad R. Sedaghat
- Department of Otolaryngology—Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
22
Lin X, Zhang S, Tang Y, Li X. A Gibbs-INLA algorithm for multidimensional graded response model analysis. Br J Math Stat Psychol 2024; 77:169-195. [PMID: 37772696 DOI: 10.1111/bmsp.12321]
Abstract
In this paper, we propose a novel Gibbs-INLA algorithm for the Bayesian inference of graded response models with ordinal responses based on multidimensional item response theory. By combining Gibbs sampling with the integrated nested Laplace approximation (INLA), the new framework avoids the cumbersome tuning that is inevitable in classical Markov chain Monte Carlo (MCMC) algorithms, has low memory requirements, and attains high computational efficiency with far fewer iterations while still achieving higher estimation accuracy. It can therefore handle large amounts of multidimensional response data with different item response types. Simulation studies are conducted to compare it with the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm, and an application to the study of the IPIP-NEO personality inventory data is given to assess the performance of the new algorithm. Extensions of the proposed algorithm to more complicated models and different data types are also discussed.
Affiliation(s)
- Xiaofan Lin
- KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
- Siliang Zhang
- KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
- Yincai Tang
- KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
- Xuan Li
- KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
23
García-Pérez MA. Are the Steps on Likert Scales Equidistant? Responses on Visual Analog Scales Allow Estimating Their Distances. Educ Psychol Meas 2024; 84:91-122. [PMID: 38250504 PMCID: PMC10795572 DOI: 10.1177/00131644231164316]
Abstract
A recurring question regarding Likert items is whether the discrete steps that this response format allows represent constant increments along the underlying continuum. This question appears unsolvable because Likert responses carry no direct information to this effect. Yet, any item administered in Likert format can identically be administered with a continuous response format such as a visual analog scale (VAS) in which respondents mark a position along a continuous line. Then, the operating characteristics of the item would manifest under both VAS and Likert formats, although perhaps differently as captured by the continuous response model (CRM) and the graded response model (GRM) in item response theory. This article shows that CRM and GRM item parameters hold a formal relation that is mediated by the form in which the continuous dimension is partitioned into intervals to render the discrete Likert responses. Then, CRM and GRM characterizations of the items in a test administered with VAS and Likert formats allow estimating the boundaries of the partition that renders Likert responses for each item and, thus, the distance between consecutive steps. The validity of this approach is first documented via simulation studies. Subsequently, the same approach is used on public data from three personality scales with 12, eight, and six items, respectively. The results indicate the expected correspondence between VAS and Likert responses and reveal unequal distances between successive pairs of Likert steps that also vary greatly across items. Implications for the scoring of Likert items are discussed.
24
Schoenmakers M, Tijmstra J, Vermunt J, Bolsinova M. Correcting for Extreme Response Style: Model Choice Matters. Educ Psychol Meas 2024; 84:145-170. [PMID: 38250509 PMCID: PMC10795569 DOI: 10.1177/00131644231155838]
Abstract
Extreme response style (ERS), the tendency of participants to select extreme item categories regardless of the item content, has frequently been found to decrease the validity of Likert-type questionnaire results. For this reason, various item response theory (IRT) models have been proposed to model ERS and correct for it. Comparisons of these models are, however, rare in the literature, especially in the context of cross-cultural comparisons, where ERS is even more relevant due to cultural differences between groups. To remedy this issue, the current article examines two frequently used IRT models that can be estimated using standard software: a multidimensional nominal response model (MNRM) and an IRTree model. Studying conceptual differences between these models reveals that they differ substantially in their conceptualization of ERS. These differences result in different category probabilities between the models. To evaluate the impact of these differences in a multigroup context, a simulation study is conducted. Our results show that when the groups differ in their average ERS, the IRTree model and MNRM can drastically differ in their conclusions about the size and presence of differences in the substantive trait between these groups. An empirical example is given, and implications for the future use of both models and the conceptualization of ERS are discussed.
25
Zhang X, Wen YJ, Han N, Jiang Y. The Effect of a Video-Assisted Health Education Program Followed by Peer Education on the Health Literacy of COVID-19 and Other Infectious Diseases Among School Children: Quasi-Randomized Controlled Trial. JMIR Hum Factors 2024; 11:e43943. [PMID: 38285496 PMCID: PMC10862245 DOI: 10.2196/43943]
Abstract
BACKGROUND To improve the engagement and effectiveness of traditional health programs, it is necessary to explore alternative models of health education, including video-assisted lectures and peer education. OBJECTIVE This study aimed to evaluate the effects of a combination of video-assisted lectures and peer education on health literacy related to infectious diseases among students. METHODS Third-grade classes from 11 pilot schools in Longgang District of Shenzhen, China, were randomized to the intervention and control groups. In the intervention group, a video-assisted interactive health education program was conducted twice over a span of 5 months. Each of the 2 sessions included a 40-minute lecture on COVID-19 and other common infectious diseases in schools and a 5-minute science video. In addition, at the end of the first session, 5 "little health supervisors" were elected in each class; they were responsible for helping class members learn health knowledge and develop good hygiene habits. Students answered the same quiz before the first session and after the second. Models based on item response theory (IRT) were constructed to score the students' knowledge of infectious diseases based on the quiz. RESULTS In total, 52 classes and 2526 students (intervention group: n=1311; control group: n=1215) were enrolled. Responses of the baseline survey were available for 2177 (86.2%; intervention group: n=1306; control group: n=871) students and those of the postintervention survey were available for 1862 (73.7%; intervention group: n=1187; control group: n=675). There were significant cross-group differences in the rates of correctly answering questions about influenza symptoms, transmission, and preventive measures; chicken pox symptoms; norovirus diarrhea symptoms; mumps symptoms; and COVID-19 symptoms.
Average IRT scores of questions related to infectious diseases in the intervention and control groups were, respectively, -0.0375 (SD 0.7784) and 0.0477 (SD 0.7481) before the intervention (P=.01), suggesting better baseline knowledge in the control group. After the intervention, the average scores of the intervention and control groups were 0.0543 (SD 0.7569) and -0.1115 (SD 0.7307), respectively (P<.001), suggesting not only significantly better scores but also greater improvement in the intervention group. CONCLUSIONS After the health education project, the correct answer rate of infectious disease questions in the intervention group was higher than that of the control group, which indicates significant effects of the combination of video-assisted lectures and peer education for the promotion of health literacy. In addition, the intervention effect of the first session persisted for at least 4 months up to the second session. As such, the proposed program was effective in improving the health literacy of school children in relation to infectious diseases and should be considered for massive health promotion campaigns during pandemics. TRIAL REGISTRATION ISRCTN ISRCTN49297995; https://www.isrctn.com/ISRCTN49297995.
Affiliation(s)
- Xiaojuan Zhang
- School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, China
- School of Public Health, Sun Yat-sen University, Guangzhou, China
- Yingkun Justin Wen
- School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, China
- Department of Learning, Informatics, Management and Ethics, Karolinska Institutet, Stockholm, Sweden
- Ning Han
- Institute of Public Health Supervision of Longgang District, Shenzhen, China
- Yawen Jiang
- School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, China
26
Adams ZW, Hulvershorn LA, Smoker MP, Marriott BR, Aalsma MC, Gibbons RD. Initial Validation of a Computerized Adaptive Test for Substance Use Disorder Identification in Adolescents. Subst Use Misuse 2024; 59:867-873. [PMID: 38270342 DOI: 10.1080/10826084.2024.2305801]
Abstract
PURPOSE Computerized adaptive tests (CATs) are highly efficient assessment tools that couple low patient and clinician time burden with high diagnostic accuracy. A CAT for substance use disorders (CAT-SUD-E) has been validated in adult populations but has yet to be tested in adolescents. The purpose of this study was to perform an initial evaluation of the K-CAT-SUD-E (i.e., Kiddy-CAT-SUD-E) in an adolescent sample against a gold-standard diagnostic interview. METHODS Adolescents (N = 156; aged 11-17) with diverse substance use histories completed the K-CAT-SUD-E electronically and the substance-related disorders portion of a clinician-conducted diagnostic interview (K-SADS) via a tele-videoconferencing platform. The K-CAT-SUD-E assessed both current and lifetime overall SUD and substance-specific diagnoses for nine substance classes. RESULTS Using the K-CAT-SUD-E continuous severity score and diagnoses to predict the presence of any K-SADS SUD diagnosis, the classification accuracy ranged from excellent for current SUD (AUC = 0.89, 95% CI = 0.81, 0.95) to outstanding (AUC = 0.93, 95% CI = 0.82, 0.97) for lifetime SUD. Regarding current substance-specific diagnoses, the classification accuracy was excellent for alcohol (AUC = 0.82), cannabis (AUC = 0.83), and nicotine/tobacco (AUC = 0.90). For lifetime substance-specific diagnoses, the classification accuracy ranged from excellent (e.g., opioids, AUC = 0.84) to outstanding (e.g., stimulants, AUC = 0.96). The K-CAT-SUD-E median completion time was 4 min 22 s, compared to 45 min for the K-SADS. CONCLUSIONS This study provides initial support for the K-CAT-SUD-E as a feasible, accurate diagnostic tool for assessing SUDs in adolescents. Future studies should further validate the K-CAT-SUD-E in a larger sample of adolescents and examine its acceptability, feasibility, and scalability in youth-serving settings.
Collapse
Affiliation(s)
- Zachary W Adams
- Department of Psychiatry, Indiana University, Indianapolis, IN, USA
| | | | - Michael P Smoker
- Department of Psychiatry, Indiana University, Indianapolis, IN, USA
| | | | - Matthew C Aalsma
- Department of Pediatrics, Indiana University, Indianapolis, IN, USA
| | - Robert D Gibbons
- Departments of Medicine and Public Health Sciences, The University of Chicago Biological Sciences, Chicago, IL, USA
| |
Collapse
|
27
|
Antoniou F, Alghamdi MH. Principal goals at school: evaluating construct validity and response scaling format. Front Psychol 2024; 14:1283686. [PMID: 38356991 PMCID: PMC10865888 DOI: 10.3389/fpsyg.2023.1283686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2023] [Accepted: 12/18/2023] [Indexed: 02/16/2024] Open
Abstract
The purpose of the present study was to test the efficacy and appropriateness of the 4-point response option of the Principal's Goals Scale of the SASS (1999-2000) survey. Competing dichotomous models with various conceptualizations were constructed and tested against the original polytomous conceptualization. Participants were 8,524 principals, of whom 64% were male and 36% female. Principals' goals were assessed using a 6-item scale anchored across points reflecting proximity to achieving a goal. The original polytomous conceptualization was contrasted with a dichotomous two-pole conceptualization using a model with freely estimated discriminations (two-parameter logistic model, 2PL) as well as the Rasch model, which assumes equal discrimination parameters. Results indicated that the 2PL dichotomous model provided the best model fit. Furthermore, item-related and person-related estimates pointed to enhanced accuracy and validity for the dichotomous conceptualization compared to the polytomous model. It is suggested that a dichotomous scaling system be considered in subsequent administrations of the scale as a means of enhancing the accuracy and validity of the measured trait.
Collapse
Affiliation(s)
- Faye Antoniou
- Department of Educational Studies, National and Kapodistrian University of Athens, Athens, Greece
| | - Mohammed H. Alghamdi
- Department of Self-Development Skills, King Saud University, Riyadh, Saudi Arabia
| |
Collapse
|
28
|
Ilardi CR, Sannino M, Federico G, Cirillo MA, Cavaliere C, Iavarone A, Garofalo E. The Starkstein Apathy Scale-Italian Version: An Update. J Geriatr Psychiatry Neurol 2024:8919887241227404. [PMID: 38233366 DOI: 10.1177/08919887241227404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/19/2024]
Abstract
Apathy can manifest in various neuropsychiatric conditions, as well as in individuals who experience significant stressful life events or suffer from underlying internal medical conditions. The Starkstein Apathy Scale (SAS) is recognized as a reliable screening tool and is endorsed by the International Parkinson and Movement Disorder Society for assessing apathy in patients with Parkinson's disease. Recently, the Italian version of this scale (SAS-I) has been introduced, and normative data have been provided for a large sample of Italian healthy individuals. Here we present the official Italian translation of the SAS, along with clarifications regarding its administration. We also supply details concerning the scale's factorial structure, inter-item conditional associations, and item performance using EFA, network analysis, and IRT modelling for polytomous items.
Collapse
Affiliation(s)
| | | | | | - Mara A Cirillo
- Department of Advanced Medical and Surgical Sciences, University of Campania "Luigi Vanvitelli", Naples, Italy
| | | | | | | |
Collapse
|
29
|
Myszkowski N, Storme M. Modeling Sequential Dependencies in Progressive Matrices: An Auto-Regressive Item Response Theory (AR-IRT) Approach. J Intell 2024; 12:7. [PMID: 38248905 PMCID: PMC10817306 DOI: 10.3390/jintelligence12010007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2023] [Revised: 12/30/2023] [Accepted: 01/12/2024] [Indexed: 01/23/2024] Open
Abstract
Measurement models traditionally make the assumption that item responses are independent of one another, conditional upon the common factor. Researchers typically probe for violations of this assumption using various methods, but rarely account for the possibility that one item predicts the next. Extending the development of auto-regressive models in the context of personality and judgment tests, we propose to extend binary item response models, here exemplified by the 2-parameter logistic (2PL) model, to include auto-regressive sequential dependencies. We motivate such models and illustrate them in the context of a publicly available progressive matrices dataset. We find an auto-regressive lag-1 2PL model to outperform a traditional 2PL model in fit, as well as to provide more conservative discrimination parameters and standard errors. We conclude that sequential effects are likely overlooked in the context of cognitive ability testing in general and progressive matrices tests in particular. We discuss extensions, notably models with multiple lag effects and variable lag effects.
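As a rough illustration of the model class this abstract describes, the sketch below augments the 2PL response function with a lag-1 term so that the previous item's response shifts the current item's logit. The name `lag_weight` and the additive parameterization are illustrative assumptions, not the authors' exact specification.

```python
import math

def p_correct(theta, a, b, prev_response=0, lag_weight=0.0):
    """Lag-1 2PL: the previous item's response shifts the current logit.

    With lag_weight = 0 this reduces to the ordinary 2PL model. The
    `lag_weight` parameter is a hypothetical name for the auto-regressive
    effect; the paper's parameterization may differ.
    """
    logit = a * (theta - b) + lag_weight * prev_response
    return 1.0 / (1.0 + math.exp(-logit))

def response_pattern_likelihood(theta, items, responses, lag_weight):
    """Likelihood of a full 0/1 response pattern under the lag-1 model."""
    likelihood = 1.0
    prev = 0  # the first item has no predecessor
    for (a, b), x in zip(items, responses):
        p = p_correct(theta, a, b, prev, lag_weight)
        likelihood *= p if x == 1 else (1.0 - p)
        prev = x
    return likelihood
```

A positive `lag_weight` makes success more likely after a success, which is one way sequential dependencies of the kind the authors study could arise.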
Collapse
Affiliation(s)
- Nils Myszkowski
- Department of Psychology, Pace University, New York, NY 10004, USA
| | - Martin Storme
- IESEG School of Management, Univ. Lille, CNRS, UMR 9221 - LEM - Lille Économie Management, 59000 Lille, France
| |
Collapse
|
30
|
Qiu X, Huang SY, Wang WC, Wang YG. An Iterative Scale Purification Procedure on lz for the Detection of Aberrant Responses. Multivariate Behav Res 2024; 59:62-77. [PMID: 37261427 DOI: 10.1080/00273171.2023.2211564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
Many person-fit statistics have been proposed to detect aberrant response behaviors (e.g., cheating, guessing). Among them, lz is one of the most widely used indices. The computation of lz assumes the item and person parameters are known; in reality, they often have to be estimated from data, and the better the estimation, the better lz will perform. When aberrant behaviors occur, the person and item parameter estimates are inaccurate, which in turn degrades the performance of lz. In this study, an iterative procedure was developed to attain more accurate person parameter estimates for improved performance of lz. A series of simulations were conducted to evaluate the iterative procedure under two conditions of item parameters, known and unknown, and three aberrant response styles: difficulty-sharing cheating, random-sharing cheating, and random guessing. The results demonstrated the superiority of the iterative procedure over the non-iterative one in maintaining control of Type-I error rates and improving the power of detecting aberrant responses. The proposed procedure was applied to a high-stakes intelligence test.
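The lz index at the center of this abstract is the standardized log-likelihood of a response pattern. A minimal sketch, assuming a dichotomous IRT model that supplies each item's success probability:

```python
import math

def lz(responses, probs):
    """Standardized log-likelihood person-fit statistic lz.

    `responses` are 0/1 item scores for one examinee; `probs` are the
    model-implied success probabilities. In practice the probabilities
    come from estimated item and person parameters, which is exactly
    what an iterative purification procedure would refine.
    """
    # Observed log-likelihood of the response pattern.
    l0 = sum(x * math.log(p) + (1 - x) * math.log(1 - p)
             for x, p in zip(responses, probs))
    # Expectation and variance of the log-likelihood under the model.
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p)
                   for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2
                   for p in probs)
    return (l0 - expected) / math.sqrt(variance)
```

Model-consistent patterns yield lz near or above zero, while aberrant patterns (e.g., failing easy items but passing hard ones) push lz strongly negative, which is what makes it usable as a flagging statistic.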
Collapse
Affiliation(s)
- Xuelan Qiu
- Institute for Learning Sciences and Teacher Education, Australian Catholic University (Brisbane Campus)
| | - Sheng-Yun Huang
- Assessment Research Centre, The Education University of Hong Kong
| | - Wen-Chung Wang
- Assessment Research Centre & Department of Psychology, The Education University of Hong Kong
| | - You-Gan Wang
- Institute for Learning Sciences and Teacher Education, Australian Catholic University (Brisbane Campus)
| |
Collapse
|
31
|
Vowles KE, Kruger ES, Bailey RW, Ashworth J, Hickman J, Sowden G, McCracken LM. The Pain Anxiety Symptom Scale: Initial Development and Evaluation of 4 and 8 Item Short Forms. J Pain 2024; 25:176-186. [PMID: 37574179 DOI: 10.1016/j.jpain.2023.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 06/30/2023] [Accepted: 08/04/2023] [Indexed: 08/15/2023]
Abstract
Elevated levels of anxiety in relation to chronic pain have been consistently associated with greater distress and disability. Thus, accurate measurement of pain-related anxiety is an important requirement in modern pain services. The Pain Anxiety Symptom Scale (PASS) was introduced over 30 years ago, with a shortened 20-item version introduced 10 years later. Both versions of the PASS were derived using Principal Components Analysis, an established method of measure development with roots in classical test theory. Item Response Theory (IRT) is a complementary approach to measure development that can reduce the number of items needed and maximize item utility with minimal loss of statistical and clinical information. The present study used IRT to shorten the 20-item PASS (PASS-20) in a large sample of people with chronic pain (N = 2,669). Two shortened versions were evaluated, 1 composed of the single best-performing item from each of its 4 subscales (PASS-4) and the other of the 2 best-performing items from each subscale (PASS-8). Several supplementary analyses were performed, including comparative item convergence evaluations based on sample characteristics (ie, female or male sex; clinical or online sample), factor invariance testing, and criterion validity evaluation of the 4, 8, and 20-item versions of the PASS in hierarchical regression models predicting pain-related distress and interference. Overall, both shortened PASS versions performed adequately across these supplemental tests, although the PASS-4 had more consistent item convergence between samples and stronger evidence for factor invariance, and accounted for 83% of the variance accounted for by the PASS-20 and 92% of the variance accounted for by the PASS-8 in criterion variables. Consequently, the PASS-4 is recommended for use in situations where a briefer evaluation of pain-related anxiety is appropriate.
PERSPECTIVE: The Pain Anxiety Symptom Scale (PASS) is an established measure of pain-related fear. This study derived 4 and 8-item versions of the PASS using IRT. Both versions showed strong psychometric properties, stability of factor structure, and relation to important aspects of pain-related functioning.
Collapse
Affiliation(s)
- Kevin E Vowles
- School of Psychology, Queen's University Belfast and Belfast Centre for Pain Rehabilitation, Belfast City Hospital, National Health Service (NHS), Belfast, Northern Ireland, UK
| | - Eric S Kruger
- Division of Physical Therapy, University of New Mexico, Albuquerque, New Mexico
| | - Robert W Bailey
- VA Puget Sound Health Care System, Seattle Division, Seattle, Washington
| | - Julie Ashworth
- Midlands Partnership NHS Foundation Trust, Staffordshire, UK; School of Medicine, Keele University, Keele, UK
| | - Jayne Hickman
- UK Pain Service, Sandwell and West Birmingham Hospitals NHS Trust, Birmingham, UK
| | - Gail Sowden
- School of Medicine, Keele University, Keele, UK; Connect Health, Newcastle upon Tyne, UK
| | | |
Collapse
|
32
|
Pintro K, Sanchez SE, Rondon MB, Gelaye B. Fourteen-item perceived stress scale assessment using item response theory among pregnant women. Scand J Psychol 2023. [PMID: 38123342 DOI: 10.1111/sjop.12993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2023] [Revised: 11/19/2023] [Accepted: 11/29/2023] [Indexed: 12/23/2023]
Abstract
The current study aimed to assess the psychometric properties of the Spanish-language version of the 14-item Perceived Stress Scale (PSS-S) in a population of Spanish-speaking pregnant women in Peru using item response theory (IRT). Our study consisted of 5,435 pregnant women who participated in the Pregnancy Outcomes Maternal and Infant Study (PrOMIS) cohort in Peru. Exploratory and confirmatory factor analyses were conducted to determine the dimensionality of the scale in this population, and an IRT analysis was conducted to determine the applicability of the PSS. The PSS formed a 2-factor questionnaire measuring perceived stress and coping capacity, accounting for 77% of the variability. The IRT analysis showed differences in item difficulty and discrimination. Item difficulty represents the level of the latent construct at which 50% of respondents endorse a particular response, and item discrimination determines the rate of change of the probability of endorsing an item across ability levels. For the first factor, perceived stress, item 12 was the least difficult and item 2 the most difficult. For the second factor, coping capacity, item 9 was the least difficult and item 6 the most difficult. The Spanish version of the 14-item PSS can be a useful assessment tool for perceived stress, but further IRT analyses should be conducted to delve into the psychometric properties of the questionnaire and inform clinicians and policy makers more appropriately.
Collapse
Affiliation(s)
- Kedie Pintro
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Sixto E Sanchez
- Universidad de San Martin de Porres, Facultad de Medicina Humana, Instituto de Investigacion, Lima, Peru
| | | | - Bizu Gelaye
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- The Chester M. Pierce, M.D. Division of Global Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| |
Collapse
|
33
|
Sideridis G, Ghamdi H, Zamil O. Contrasting multistage and computer-based testing: score accuracy and aberrant responding. Front Psychol 2023; 14:1288177. [PMID: 38115978 PMCID: PMC10728648 DOI: 10.3389/fpsyg.2023.1288177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2023] [Accepted: 11/20/2023] [Indexed: 12/21/2023] Open
Abstract
The goal of the present study was to compare and contrast the efficacy of a multistage testing (MST) design using three paths with a traditional computer-based testing (CBT) approach involving items across all ability levels. Participants were n = 627 individuals who took both a computer-based testing (CBT) instrument and a measure constructed using multistage testing to route individuals of low, middle, and high ability to content respective to their ability level. Comparisons between the two testing modes involved the accuracy of person ability estimates and the evaluation of aberrant responding. The results indicated that MST assessments deviated markedly from CBT assessments, especially for low- and high-ability individuals. Test score accuracy was higher overall in MST compared to CBT, although measurement error was larger for high-ability individuals under MST than under CBT. Person-fit indicators of aberrant responding revealed significantly more Guttman-related errors under CBT than under MST. It was concluded that MST is associated with significant benefits compared to CBT.
Collapse
Affiliation(s)
- Georgios Sideridis
- Boston Children’s Hospital, Harvard Medical School, Boston, MA, United States
- Department of Primary Education, National and Kapodistrian University of Athens, Athens, Greece
| | - Hanan Ghamdi
- Education and Training Evaluation Commission, Riyadh, Saudi Arabia
| | - Omar Zamil
- Education and Training Evaluation Commission, Riyadh, Saudi Arabia
| |
Collapse
|
34
|
Abstract
Random item effects item response theory (IRT) models, which treat both person and item effects as random, have received much attention for more than a decade. The random item effects approach has several advantages in many practical settings. The present study introduced an explanatory multidimensional random item effects rating scale model. The proposed model was formulated under a novel parameterization of the nominal response model (NRM), and allows for flexible inclusion of person-related and item-related covariates (e.g., person characteristics and item features) to study their impacts on the person and item latent variables. A new variant of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm designed for latent variable models with crossed random effects was applied to obtain parameter estimates for the proposed model. A preliminary simulation study was conducted to evaluate the performance of the MH-RM algorithm for estimating the proposed model. Results indicated that the model parameters were well recovered. An empirical data set was analyzed to further illustrate the usage of the proposed model.
Collapse
Affiliation(s)
| | | | - Li Cai
- University of California, Los Angeles, USA
| |
Collapse
|
35
|
Ahanotu A, DeVore EK, Carroll TL, Edelen M, Morcos M, Willard E, Zhao NW, Belafsky P, Shin JJ. Can EAT-10 Become EAT-5? Improving Measurement Efficiency of Dysphagia with Item Response Theory. Laryngoscope 2023; 133:3327-3333. [PMID: 37166087 DOI: 10.1002/lary.30732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2023] [Revised: 03/31/2023] [Accepted: 04/16/2023] [Indexed: 05/12/2023]
Abstract
OBJECTIVES To assess: (1) the Eating Assessment Tool (EAT-10) with item response theory (IRT) to determine which individual items provide the most information, (2) the extent to which dysphagia is measured with subsets of items while maintaining precise score estimates, and (3) whether 5-item scales have differing discriminatory ability compared to the parent 10-item instrument. METHODS Prospectively collected data from 2,339 patients who completed the EAT-10 questionnaire during evaluation at a tertiary care otolaryngology clinic were utilized. IRT analyses provided discrimination and location parameters associated with individual questions. Residual item correlations were also assessed for redundant information. Based on these results, three 5-item subsets were further evaluated using item information function curves. Areas under receiver-operator characteristic curves (ROC-AUC) were also calculated to evaluate the discriminatory ability for dysphagia-related clinical diagnoses. RESULTS Item discrimination parameter estimates ranged from 1.71 to 5.46, with higher values indicating more information. Residual item correlations were determined within item pairs, and location parameters were calculated. Based on these data, in combination with clinical utility, three 5-item subsets were proposed and assessed. ROC-AUC analyses demonstrated no significant difference between the EAT-5-Alpha subset and the original 10-item instrument for discriminating dysphagia as a primary diagnosis (0.88, 0.88). The EAT-5-Clinical subset outperformed the original 10-item instrument in ROC-AUC for aspiration. The EAT-5-Range subset was significantly associated with problems with thin liquids. CONCLUSIONS IRT analyses distinguished three proposed 5-item subsets of the EAT-10 instrument, supporting shorter survey options while still reflecting the impact of dysphagia without significant loss of discrimination.
LEVEL OF EVIDENCE 3 (Diagnostic testing with consistently applied reference standards, partial blinding). Laryngoscope, 133:3327-3333, 2023.
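The item-level "information" that drives short-form selection in this abstract can be made concrete with the standard 2PL Fisher information function. This is a generic sketch of the concept, not the authors' code, and the parameter values used below are hypothetical:

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a**2 * p * (1 - p),
    where p is the 2PL endorsement probability. Information peaks at
    theta = b, and more discriminating items (larger a) concentrate
    more information near their location.
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)
```

Summing `item_information` over a candidate subset gives a test information curve; comparing such curves is the usual IRT argument that a few highly discriminating items can retain most of a longer instrument's measurement precision.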
Collapse
Affiliation(s)
- Adaobi Ahanotu
- University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Elliana Kirsh DeVore
- Harvard Medical School, Boston, Massachusetts, USA
- Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Thomas L Carroll
- Harvard Medical School, Boston, Massachusetts, USA
- Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Maria Edelen
- Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Mary Morcos
- Harvard Medical School, Boston, Massachusetts, USA
| | | | - Nina W Zhao
- University of California, Davis, California, USA
- University Hospitals Cleveland Medical Center, Cleveland, Ohio, USA
| | | | - Jennifer J Shin
- Harvard Medical School, Boston, Massachusetts, USA
- Center for Surgery and Public Health, Brigham and Women's Hospital, Boston, Massachusetts, USA
| |
Collapse
|
36
|
Waddimba AC, DeSpain S, Bennett MM, Douglas ME, Warren AM. Longitudinal validation of the Fear of COVID-19 Scale in a nationwide United States sample: An item response theory model across three inflection points of the pandemic. Stress Health 2023; 39:1157-1170. [PMID: 37158412 DOI: 10.1002/smi.3259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Revised: 04/11/2023] [Accepted: 04/20/2023] [Indexed: 05/10/2023]
Abstract
The COVID-19 pandemic's global emergence and spread caused widespread fear. Measuring and tracking COVID-19 fear could facilitate remediation. Despite the Fear of COVID-19 Scale (FCV-19S)'s validation in multiple languages and countries, nationwide United States (U.S.) studies are scarce, and cross-sectional, classical test theory-based validation studies predominate. Our longitudinal study sampled respondents to a 3-wave, nationwide, online survey. We calibrated the FCV-19S using a unidimensional graded response model. Item/scale monotonicity, discrimination, informativeness, goodness-of-fit, criterion validity, internal consistency, and test-retest reliability were assessed. Items 7, 6, and 3 consistently displayed very high discrimination; the other items had moderate-to-high discrimination. Items 3, 6, and 7 were the most (and items 1 and 5 the least) informative. Item scalability was 0.62-0.69 and full-scale scalability 0.65-0.67. The ordinal reliability coefficient was 0.94 and the test-retest intraclass correlation coefficient 0.84. Positive correlations with posttraumatic stress/anxiety/depression and negative correlations with emotional stability/resilience supported convergent/divergent validity. The FCV-19S validly and reliably captures temporal variation in COVID-19 fear across the U.S.
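A unidimensional graded response model of the kind used to calibrate the FCV-19S can be sketched as cumulative 2PL curves whose differences give the category probabilities. The parameter values below are illustrative, not the paper's estimates:

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    `thresholds` are ordered location parameters b_1 < ... < b_{K-1}.
    P(X >= k) is a 2PL curve 1 / (1 + exp(-a * (theta - b_k))), and each
    category's probability is the difference of adjacent cumulative
    curves, with P(X >= lowest) = 1 and P(X >= K+1) = 0 as boundaries.
    """
    cum = ([1.0]
           + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
           + [0.0])
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]
```

Highly discriminating items (large `a`) have steep cumulative curves, which is what the abstract means when it reports that items 7, 6, and 3 were the most discriminating and informative.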
Collapse
Affiliation(s)
- Anthony C Waddimba
- Division of Surgical Research, Department of Surgery, Baylor University Medical Center, Dallas, Texas, USA
- Baylor Scott and White Research Institute, Dallas, Texas, USA
| | - Sydney DeSpain
- Arkansas College of Osteopathic Medicine, Arkansas Colleges of Health Education, Fort Smith, Arkansas, USA
| | | | - Megan E Douglas
- Baylor Scott and White Research Institute, Dallas, Texas, USA
| | - Ann Marie Warren
- Baylor Scott and White Research Institute, Dallas, Texas, USA
- Division of Trauma & Critical Care Surgery, Department of Surgery, Baylor University Medical Center, Dallas, Texas, USA
| |
Collapse
|
37
|
Zimmer F, Draxler C, Debelak R. Power Analysis for the Wald, LR, Score, and Gradient Tests in a Marginal Maximum Likelihood Framework: Applications in IRT. Psychometrika 2023; 88:1249-1298. [PMID: 36029390 PMCID: PMC10656348 DOI: 10.1007/s11336-022-09883-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/09/2021] [Revised: 01/11/2022] [Indexed: 06/15/2023]
Abstract
The Wald, likelihood ratio, score, and the recently proposed gradient statistics can be used to assess a broad range of hypotheses in item response theory models, for instance, to check the overall model fit or to detect differential item functioning. We introduce new methods for power analysis and sample size planning that can be applied when marginal maximum likelihood estimation is used. This allows the application to a variety of IRT models, which are commonly used in practice, e.g., in large-scale educational assessments. An analytical method utilizes the asymptotic distributions of the statistics under alternative hypotheses. We also provide a sampling-based approach for applications where the analytical approach is computationally infeasible. This can be the case with 20 or more items, since the computational load increases exponentially with the number of items. We performed extensive simulation studies in three practically relevant settings, i.e., testing a Rasch model against a 2PL model, testing for differential item functioning, and testing a partial credit model against a generalized partial credit model. The observed distributions of the test statistics and the power of the tests agreed well with the predictions by the proposed methods in sufficiently large samples. We provide an openly accessible R package that implements the methods for user-supplied hypotheses.
Collapse
Affiliation(s)
| | - Clemens Draxler
- The Health and Life Sciences University, Hall in Tirol, Austria
| | | |
Collapse
|
38
|
Li N, Hein S, Cavitt J, Chapman J, Foley Geib C, Grigorenko EL. Applying Item Response Theory Analysis to the SAVRY in Justice-Involved Youth. Assessment 2023; 30:2373-2386. [PMID: 36658778 DOI: 10.1177/10731911221146120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
This study investigated item- and test-level functioning of the Structured Assessment of Violence Risk in Youth (SAVRY) and differential item functioning (DIF) across gender and race/ethnicity in justice-involved youth (JIY) using item response theory analysis. Participants were 868 JIY (23.7% female; 26.9% White, 50.9% Black, and 22.2% Hispanic) in pre-trial detention centers in Connecticut. Results obtained from the application of the graded response model showed that the SAVRY items did not discriminate equally among JIY at varying levels of the latent trait, with "Poor compliance" as the most discriminating item and "History of self-harm or suicide attempts" as the least discriminating item. At the test level, the SAVRY provided precise (reliable) information about the latent trait for the majority of JIY, those whose latent trait fell between two standard deviations below and above the mean. Results of the DIF analysis revealed that six items operated inconsistently across White, Black, and Hispanic JIY, two of which also functioned differentially across gender.
Collapse
Affiliation(s)
- Nan Li
- University of Houston, Houston, TX, USA
| | | | | | | | | | - Elena L Grigorenko
- University of Houston, Houston, TX, USA
- Baylor College of Medicine, Houston, TX, USA
| |
Collapse
|
39
|
Chen Y, Li C, Ouyang J, Xu G. DIF Statistical Inference Without Knowing Anchoring Items. Psychometrika 2023; 88:1097-1122. [PMID: 37550561 PMCID: PMC10656337 DOI: 10.1007/s11336-023-09930-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/24/2022] [Accepted: 07/05/2023] [Indexed: 08/09/2023]
Abstract
Establishing the invariance property of an instrument (e.g., a questionnaire or test) is a key step in establishing its measurement validity. Measurement invariance is typically assessed by differential item functioning (DIF) analysis, i.e., detecting DIF items whose response distribution depends not only on the latent trait measured by the instrument but also on the group membership. DIF analysis is confounded by the group difference in the latent trait distributions. Many DIF analyses require knowing several anchor items that are DIF-free in order to draw inferences on whether each of the rest is a DIF item, where the anchor items are used to identify the latent trait distributions. When no prior information on anchor items is available, or some anchor items are misspecified, item purification methods and regularized estimation methods can be used. The former iteratively purifies the anchor set by a stepwise model selection procedure, and the latter selects the DIF-free items by a LASSO-type regularization approach. Unfortunately, unlike the methods based on a correctly specified anchor set, these methods are not guaranteed to provide valid statistical inference (e.g., confidence intervals and p-values). In this paper, we propose a new method for DIF analysis under a multiple indicators and multiple causes (MIMIC) model for DIF. This method adopts a minimal [Formula: see text] norm condition for identifying the latent trait distributions. Without requiring prior knowledge about an anchor set, it can accurately estimate the DIF effects of individual items and further draw valid statistical inferences for quantifying the uncertainty. Specifically, the inference results allow us to control the type-I error for DIF detection, which may not be possible with item purification and regularized estimation methods. We conduct simulation studies to evaluate the performance of the proposed method and compare it with the anchor-set-based likelihood ratio test approach and the LASSO approach. The proposed method is applied to analysing the three personality scales of the Eysenck personality questionnaire-revised (EPQ-R).
Collapse
Affiliation(s)
- Yunxiao Chen
- London School of Economics and Political Science, London, UK.
| | | | | | | |
Collapse
|
40
|
Merhof V, Meiser T. Dynamic Response Strategies: Accounting for Response Process Heterogeneity in IRTree Decision Nodes. Psychometrika 2023; 88:1354-1380. [PMID: 36746887 PMCID: PMC10656330 DOI: 10.1007/s11336-023-09901-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/15/2022] [Indexed: 06/18/2023]
Abstract
It is essential to control self-reported trait measurements for response style effects to ensure a valid interpretation of estimates. Traditional psychometric models facilitating such control consider item responses as the result of two kinds of response processes, based either on the substantive trait or on response styles, and they assume that both of these processes have a constant influence across the items of a questionnaire. However, this homogeneity over items is not always given, for instance, if the respondents' motivation declines throughout the questionnaire so that heuristic responding driven by response styles may gradually take over from cognitively effortful trait-based responding. The present study proposes two dynamic IRTree models, which account for systematic continuous changes and additional random fluctuations of response strategies, by defining item position-dependent trait and response style effects. Simulation analyses demonstrate that the proposed models accurately capture dynamic trajectories of response processes, as well as reliably detect the absence of dynamics, that is, identify constant response strategies. The continuous version of the dynamic model formalizes the underlying response strategies in a parsimonious way and is highly suitable as a cognitive model for investigating response strategy changes over items. The extended model with random fluctuations of strategies can adapt more closely to the item-specific effects of different response processes and thus is a well-fitting model with high flexibility. By using an empirical data set, the benefits of the proposed dynamic approaches over traditional IRTree models are illustrated under realistic conditions.
Collapse
Affiliation(s)
- Viola Merhof
- Department of Psychology, University of Mannheim, L 13 15, 68161, Mannheim, Germany.
| | - Thorsten Meiser
- Department of Psychology, University of Mannheim, L 13 15, 68161, Mannheim, Germany
| |
Collapse
|
41
|
Wu T, Kim SY, Westine C. Evaluating the Effects of Missing Data Handling Methods on Scale Linking Accuracy. Educ Psychol Meas 2023; 83:1202-1228. [PMID: 37974655 PMCID: PMC10638981 DOI: 10.1177/00131644221140941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/19/2023]
Abstract
For large-scale assessments, data are often collected with missing responses. Despite the wide use of item response theory (IRT) in many testing programs, however, the existing literature offers little insight into the effectiveness of various approaches to handling missing responses in the context of scale linking. Scale linking is commonly used in large-scale assessments to maintain scale comparability over multiple forms of a test. Under a common-item nonequivalent group design (CINEG), missing data that occur on common items potentially influence the linking coefficients and, consequently, may affect scale comparability, test validity, and reliability. The objective of this study was to evaluate the effect of six missing data handling approaches, including listwise deletion (LWD), treating missing data as incorrect responses (IN), corrected item mean imputation (CM), imputing with a response function (RF), multiple imputation (MI), and full information maximum likelihood (FIML), on IRT scale linking accuracy when missing data occur on common items. Under a set of simulation conditions, the relative performance of the six missing data treatment methods under two missing mechanisms was explored. Results showed that RF, MI, and FIML produced fewer errors for conducting scale linking, whereas LWD was associated with the most errors regardless of various testing conditions.
Collapse
Affiliation(s)
- Tong Wu
- University of North Carolina at Charlotte, USA
- Riverside Insights, Itasca, IL
| | | | | |
Collapse
|
42
|
Liu SH, Feuerstahler L, Chen Y, Braun JM, Buckley JP. Toward Advancing Precision Environmental Health: Developing a Customized Exposure Burden Score to PFAS Mixtures to Enable Equitable Comparisons Across Population Subgroups, Using Mixture Item Response Theory. Environ Sci Technol 2023; 57:18104-18115. [PMID: 37615359 DOI: 10.1021/acs.est.3c00343] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/25/2023]
Abstract
Quantifying a person's cumulative exposure burden to per- and polyfluoroalkyl substances (PFAS) mixtures is important for risk assessment, biomonitoring, and reporting of results to participants. However, different people may be exposed to different sets of PFASs due to heterogeneity in the exposure sources and patterns. Applying a single measurement model for the entire population (e.g., by summing concentrations of all PFAS analytes) assumes that each PFAS analyte is equally informative to PFAS exposure burden for all individuals. This assumption may not hold if PFAS exposure sources systematically differ within the population. However, the sociodemographic, dietary, and behavioral characteristics that underlie systematic exposure differences may not be known, or may be due to a combination of these factors. Therefore, we used mixture item response theory, an unsupervised psychometrics and data science method, to develop a customized PFAS exposure burden scoring algorithm. This scoring algorithm ensures that PFAS burden scores can be equitably compared across population subgroups. We applied our methods to PFAS biomonitoring data from the United States National Health and Nutrition Examination Survey (2013-2018). Using mixture item response theory, we found that participants with higher household incomes had higher PFAS burden scores. Asian Americans had significantly higher PFAS burden compared with non-Hispanic Whites and other race/ethnicity groups. However, some disparities were masked when using summed PFAS concentrations as the exposure metric. This work demonstrates that our summary PFAS burden metric, accounting for sources of exposure variation, may be a fairer and more informative estimate of PFAS exposure.
Collapse
Affiliation(s)
- Shelley H Liu
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
| | - Leah Feuerstahler
- Department of Psychology, Fordham University, Bronx, New York 10458, United States
| | - Yitong Chen
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States
| | - Joseph M Braun
- Department of Epidemiology, Brown University, Providence, Rhode Island 02912, United States
| | - Jessie P Buckley
- Department of Environmental Health and Engineering, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, United States
| |
Collapse
|
43
|
Mulrenin B, Pineda R, Dodds C, Velozo CA. Item-Level Psychometrics of the Neonatal Eating Outcome Assessment in Orally Feeding Infants. OTJR (Thorofare N J) 2023:15394492231212399. [PMID: 37981785 DOI: 10.1177/15394492231212399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2023]
Abstract
BACKGROUND The Neonatal Eating Outcome Assessment determines feeding performance based on the infant's postmenstrual age (PMA). OBJECTIVE To examine item-level measurement properties of this assessment's rating scale. METHODOLOGY In this retrospective study, Rasch analysis was completed on clinical data from the Neonatal Eating Outcome Assessment for 100 infants (52 preterm and 48 full-term) using Winsteps version 3.93.1. Instead of PMA-based scores, ordered letters converted to numerical scores were analyzed. RESULTS Analysis demonstrated that Section I (Pre-Feeding Skills) represents a separate construct from Sections II and III (Oral Feeding and End of Feeding, respectively). Sections II and III were adequately unidimensional to complete Rasch analysis. These sections fit the Rasch model overall, but rating scale category underuse was common, which may be attributed to sample characteristics. IMPLICATIONS This analysis supports using validated ordered letter scoring of Sections II and III to measure oral feeding performance in preterm and full-term newborns.
Collapse
Affiliation(s)
- Brooke Mulrenin
- Medical University of South Carolina (MUSC), Charleston, SC, USA
| | - Roberta Pineda
- University of Southern California (USC), Los Angeles, CA, USA
- Keck School of Medicine of USC, Los Angeles, CA, USA
- Washington University School of Medicine, St. Louis, MO, USA
| | - Cynthia Dodds
- Medical University of South Carolina (MUSC), Charleston, SC, USA
| | - Craig A Velozo
- Medical University of South Carolina (MUSC), Charleston, SC, USA
| |
Collapse
|
44
|
Quirk VL, Kern JL. Using IRTree Models to Promote Selection Validity in the Presence of Extreme Response Styles. J Intell 2023; 11:216. [PMID: 37998715 PMCID: PMC10672242 DOI: 10.3390/jintelligence11110216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Revised: 10/16/2023] [Accepted: 11/14/2023] [Indexed: 11/25/2023] Open
Abstract
The measurement of psychological constructs is frequently based on self-report tests, which often have Likert-type items rated from "Strongly Disagree" to "Strongly Agree". Recently, a family of item response theory (IRT) models called IRTree models have emerged that can parse out content traits (e.g., personality traits) from noise traits (e.g., response styles). In this study, we compare the selection validity and adverse impact consequences of noise traits on selection when scores are estimated using a generalized partial credit model (GPCM) or an IRTree model. First, we present a simulation which demonstrates that when noise traits do exist, selection decisions based on IRTree-estimated scores have higher accuracy rates and fewer instances of adverse impact with respect to extreme response style group membership than those based on the GPCM. Both models performed similarly when there was no influence of noise traits on the responses. Second, we present an application using data collected from the Open-Source Psychometrics Project Fisher Temperament Inventory dataset. We found that the IRTree model had a better fit, but a high agreement rate between the model decisions resulted in virtually identical impact ratios between the models. We offer considerations for applications of the IRTree model and future directions for research.
Collapse
|
45
|
Liu Y, Wang W. What Can We Learn from a Semiparametric Factor Analysis of Item Responses and Response Time? An Illustration with the PISA 2015 Data. Psychometrika 2023:10.1007/s11336-023-09936-3. [PMID: 37973773 DOI: 10.1007/s11336-023-09936-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/19/2023] [Indexed: 11/19/2023]
Abstract
It is widely believed that a joint factor analysis of item responses and response time (RT) may yield more precise ability scores than those conventionally predicted from responses only. For this purpose, a simple-structure factor model is often preferred as it only requires specifying an additional measurement model for item-level RT while leaving the original item response theory (IRT) model for responses intact. The added speed factor indicated by item-level RT correlates with the ability factor in the IRT model, allowing RT data to carry additional information about respondents' ability. However, parametric simple-structure factor models are often restrictive and fit poorly to empirical data, which undermines confidence in the suitability of a simple factor structure. In the present paper, we analyze the 2015 Programme for International Student Assessment mathematics data using a semiparametric simple-structure model. We conclude that a simple factor structure attains a decent fit after further parametric assumptions in the measurement model are sufficiently relaxed. Furthermore, our semiparametric model implies that the association between latent ability and speed/slowness is strong in the population, but the form of association is nonlinear. It follows that scoring based on the fitted model can substantially improve the precision of ability scores.
Collapse
Affiliation(s)
- Yang Liu
- Department of Human Development and Quantitative Methodology, University of Maryland, 3304R Benjamin Bldg, 3942 Campus Dr, College Park, MD, 20742, USA.
| | - Weimeng Wang
- Department of Human Development and Quantitative Methodology, University of Maryland, 3304R Benjamin Bldg, 3942 Campus Dr, College Park, MD, 20742, USA
| |
Collapse
|
46
|
van Rijn PW, Ali US, Shin HJ, Joo SH. Adjusted Residuals for Evaluating Conditional Independence in IRT Models for Multistage Adaptive Testing. Psychometrika 2023:10.1007/s11336-023-09935-4. [PMID: 37930558 DOI: 10.1007/s11336-023-09935-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/02/2021] [Indexed: 11/07/2023]
Abstract
The key assumption of conditional independence of item responses given latent ability in item response theory (IRT) models is addressed for multistage adaptive testing (MST) designs. Routing decisions in MST designs can cause patterns in the data that are not accounted for by the IRT model. This phenomenon relates to quasi-independence in log-linear models for incomplete contingency tables and impacts certain types of statistical inference based on assumptions on observed and missing data. We demonstrate that generalized residuals for item pair frequencies under IRT models as discussed by Haberman and Sinharay (J Am Stat Assoc 108:1435-1444, 2013, https://doi.org/10.1080/01621459.2013.835660) are inappropriate for MST data without adjustments. The adjustments are dependent on the MST design, and can quickly become nontrivial as the complexity of the routing increases. However, the adjusted residuals are found to have satisfactory Type I errors in a simulation and illustrated by an application to real MST data from the Programme for International Student Assessment (PISA). Implications and suggestions for statistical inference with MST designs are discussed.
Collapse
Affiliation(s)
| | - Usama S Ali
- Educational Testing Service, Princeton, USA
- South Valley University, Qena, Egypt
| | | | | |
Collapse
|
47
|
Garcia D, Kazemitabar M, Habibi Asgarabad M. Corrigendum: The 18-item Swedish version of Ryff's psychological wellbeing scale: psychometric properties based on classical test theory and item response theory. Front Psychol 2023; 14:1324006. [PMID: 38022981 PMCID: PMC10656605 DOI: 10.3389/fpsyg.2023.1324006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023] Open
Abstract
[This corrects the article DOI: 10.3389/fpsyg.2023.1208300.].
Collapse
Affiliation(s)
- Danilo Garcia
- Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Centre for Ethics, Law and Mental Health (CELAM), University of Gothenburg, Gothenburg, Sweden
- Promotion of Health and Innovation (PHI) Lab, International Network for Well-Being, Linköping, Sweden
- Department of Psychology, University of Gothenburg, Gothenburg, Sweden
- Department of Psychology, Lund University, Lund, Sweden
| | - Maryam Kazemitabar
- Yale School of Medicine, Yale University, New Haven, CT, United States
- VA Connecticut Healthcare System, West Haven, CT, United States
- Promotion of Health and Innovation (PHI) Lab, International Network for Well-Being, New Haven, CT, United States
| | - Mojtaba Habibi Asgarabad
- Health Promotion Research Center, Iran University of Medical Sciences, Tehran, Iran
- Department of Health Psychology, School of Behavioral Sciences and Mental Health (Tehran Institute of Psychiatry), Iran University of Medical Sciences, Tehran, Iran
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
- Positive Youth Development Lab, Human Development and Family Sciences, Texas Tech University, Lubbock, TX, United States
- Center of Excellence in Cognitive Neuropsychology, Institute for Cognitive and Brain Sciences, Shahid Beheshti University, Tehran, Iran
| |
Collapse
|
48
|
Qiu X, de la Torre J. A dual process item response theory model for polytomous multidimensional forced-choice items. Br J Math Stat Psychol 2023; 76:491-512. [PMID: 36967236 DOI: 10.1111/bmsp.12303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/04/2021] [Accepted: 07/03/2023] [Indexed: 06/18/2023]
Abstract
The use of multidimensional forced-choice (MFC) items to assess non-cognitive traits such as personality, interests and values in psychological tests has a long history, because MFC items show strengths in preventing response bias. Recently, there has been a surge of interest in developing item response theory (IRT) models for MFC items. However, nearly all of the existing IRT models have been developed for MFC items with binary scores. Real tests use MFC items with more than two categories; such items are more informative than their binary counterparts. This study developed a new IRT model for polytomous MFC items based on the cognitive model of choice, which describes the cognitive processes underlying humans' preferential choice behaviours. The new model is unique in its ability to account for the ipsative nature of polytomous MFC items, to assess individual psychological differentiation in interests, values and emotions, and to compare the differentiation levels of latent traits between individuals. Simulation studies were conducted to examine the parameter recovery of the new model with existing computer programs. The results showed that both statement parameters and person parameters were well recovered when the sample size was sufficient. The more complete the linking of the statements was, the more accurate the parameter estimation was. This paper provides an empirical example of a career interest test using four-category MFC items. Although some aspects of the model (e.g., the nature of the person parameters) require additional validation, our approach appears promising.
Collapse
Affiliation(s)
- Xuelan Qiu
- Institute for Learning Sciences & Teacher Education, Australian Catholic University, Brisbane, Queensland, Australia
| | | |
Collapse
|
49
|
Bauer DJ. Enhancing measurement validity in diverse populations: Modern approaches to evaluating differential item functioning. Br J Math Stat Psychol 2023; 76:435-461. [PMID: 37431154 DOI: 10.1111/bmsp.12316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Revised: 06/05/2023] [Accepted: 06/09/2023] [Indexed: 07/12/2023]
Abstract
When developing and evaluating psychometric measures, a key concern is to ensure that they accurately capture individual differences on the intended construct across the entire population of interest. Inaccurate assessments of individual differences can occur when responses to some items reflect not only the intended construct but also construct-irrelevant characteristics, like a person's race or sex. Unaccounted for, this item bias can lead to apparent differences on the scores that do not reflect true differences, invalidating comparisons between people with different backgrounds. Accordingly, empirically identifying which items manifest bias through the evaluation of differential item functioning (DIF) has been a longstanding focus of much psychometric research. The majority of this work has focused on evaluating DIF across two (or a few) groups. Modern conceptualizations of identity, however, emphasize its multi-determined and intersectional nature, with some aspects better represented as dimensional than categorical. Fortunately, many model-based approaches to modelling DIF now exist that allow for simultaneous evaluation of multiple background variables, including both continuous and categorical variables, and potential interactions among background variables. This paper provides a comparative, integrative review of these new approaches to modelling DIF and clarifies both the opportunities and challenges associated with their application in psychometric research.
Collapse
Affiliation(s)
- Daniel J Bauer
- Department of Psychology and Neuroscience, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| |
Collapse
|
50
|
Wallmark J, Josefsson M, Wiberg M. Efficiency Analysis of Item Response Theory Kernel Equating for Mixed-Format Tests. Appl Psychol Meas 2023; 47:496-512. [PMID: 38027462 PMCID: PMC10664743 DOI: 10.1177/01466216231209757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/01/2023]
Abstract
This study aims to evaluate the performance of item response theory (IRT) kernel equating in the context of mixed-format tests by comparing it to IRT observed score equating and kernel equating with log-linear presmoothing. Comparisons were made through both simulations and real data applications, under both equivalent groups (EG) and non-equivalent groups with anchor test (NEAT) sampling designs. To prevent bias towards IRT methods, data were simulated with and without the use of IRT models. The results suggest that the difference between IRT kernel equating and IRT observed score equating is minimal, both in terms of the equated scores and their standard errors. The application of IRT models for presmoothing yielded smaller standard errors of equating than the log-linear presmoothing approach. When test data were generated using IRT models, IRT-based methods proved less biased than log-linear kernel equating. However, when data were simulated without IRT models, log-linear kernel equating showed less bias. Overall, IRT kernel equating shows great promise when equating mixed-format tests.
Collapse
Affiliation(s)
| | | | - Marie Wiberg
- Department of Statistics, USBE, Umeå University, Sweden
| |
Collapse
|