201
Hallgren KA, Matson TE, Oliver M, Witkiewitz K, Bobb JF, Lee AK, Caldeiro RM, Kivlahan D, Bradley KA. Practical Assessment of Alcohol Use Disorder in Routine Primary Care: Performance of an Alcohol Symptom Checklist. J Gen Intern Med 2022; 37:1885-1893. PMID: 34398395; PMCID: PMC9198160; DOI: 10.1007/s11606-021-07038-3.
Abstract
BACKGROUND Alcohol use disorder (AUD) is highly prevalent but underrecognized and undertreated in primary care settings. Alcohol Symptom Checklists can engage patients and providers in discussions of AUD-related care. However, the performance of Alcohol Symptom Checklists when they are used in routine care and documented in electronic health records (EHRs) remains unevaluated. OBJECTIVE To evaluate the psychometric performance of an Alcohol Symptom Checklist in routine primary care. DESIGN Cross-sectional study using item response theory (IRT) and differential item functioning analyses of measurement consistency across age, sex, race, and ethnicity. PATIENTS Patients seen in primary care in the Kaiser Permanente Washington Healthcare System who reported high-risk drinking on the Alcohol Use Disorders Identification Test-Consumption screening measure (AUDIT-C ≥ 7) and subsequently completed an Alcohol Symptom Checklist between October 2015 and February 2020. MAIN MEASURE Alcohol Symptom Checklists with 11 items assessing AUD criteria defined in the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5), completed by patients during routine medical care and documented in EHRs. KEY RESULTS Among 11,464 patients who screened positive for high-risk drinking and completed an Alcohol Symptom Checklist (mean age 43.6 years, 30.5% female), 54.1% reported ≥ 2 DSM-5 AUD criteria (the threshold for AUD diagnosis). IRT analyses demonstrated that checklist items measured a unidimensional continuum of AUD severity. Differential item functioning was observed for some demographic subgroups but had minimal impact on accurate measurement of AUD severity: differences between demographic subgroups attributable to differential item functioning never exceeded 0.42 points of the total symptom count (on a possible range of 0-11).
CONCLUSIONS Alcohol Symptom Checklists used in routine care discriminated AUD severity consistently with current definitions of AUD and performed equitably across age, sex, race, and ethnicity. Integrating symptom checklists into routine care may help inform clinical decision-making around diagnosing and managing AUD.
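The clinical use of such a checklist reduces to a criterion count: ≥ 2 of the 11 DSM-5 criteria meets the diagnostic threshold noted in the abstract. A minimal sketch of that scoring logic (the mild/moderate/severe bands of 2-3, 4-5, and 6+ criteria are standard DSM-5 conventions, not figures taken from this article, and the function name is illustrative):

```python
def aud_severity(criteria_endorsed):
    """Classify AUD severity from the number of DSM-5 criteria endorsed
    on an 11-item symptom checklist (possible range 0-11)."""
    if not 0 <= criteria_endorsed <= 11:
        raise ValueError("symptom count must be between 0 and 11")
    if criteria_endorsed < 2:
        return "no AUD"  # below the >= 2 diagnostic threshold
    if criteria_endorsed <= 3:
        return "mild"
    if criteria_endorsed <= 5:
        return "moderate"
    return "severe"
```

The 54.1% of patients reported above would fall at "mild" or higher on this function.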
Affiliation(s)
- Kevin A Hallgren
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Theresa E Matson
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Malia Oliver
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Katie Witkiewitz
- Department of Psychology and Center on Alcohol, Substance Use, and Addictions, University of New Mexico, Albuquerque, NM, USA
- Jennifer F Bobb
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Amy K Lee
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Ryan M Caldeiro
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Daniel Kivlahan
- Center of Innovation for Veteran-Centered and Value-Driven Care, Health Services Research and Development, Veterans Affairs Puget Sound Health Care System, Seattle, WA, USA
- Katharine A Bradley
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Center of Innovation for Veteran-Centered and Value-Driven Care, Health Services Research and Development, Veterans Affairs Puget Sound Health Care System, Seattle, WA, USA
- Department of Medicine, University of Washington, Seattle, WA, USA
202
Marcq K, Andersson B. Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation. Appl Psychol Meas 2022; 46:200-218. PMID: 35528269; PMCID: PMC9073636; DOI: 10.1177/01466216211066601.
Abstract
In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. Estimating the bandwidth introduces additional variability that is currently not accounted for when calculating the standard errors of equating, which poses a threat to their accuracy. In this study, the asymptotic variance of the bandwidth parameter estimator is derived, and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths, as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard error of equating is minimal.
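The continuization step the abstract refers to can be sketched directly: a Gaussian kernel with bandwidth h turns each discrete score distribution into a smooth CDF, and equipercentile equating then maps a score x on form X to the score y on form Y with the same percentile. This is a simplified illustration (real kernel equating also applies a linear adjustment so the continuized distribution preserves the discrete mean and variance, and the paper's contribution concerns the standard errors, not this step); function names are mine:

```python
import math

def phi_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def continuize(scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution:
    returns the smooth CDF F_h(x) = sum_j p_j * Phi((x - x_j) / h)."""
    def F(x):
        return sum(p * phi_cdf((x - xj) / h) for xj, p in zip(scores, probs))
    return F

def equate(x, F, G, lo=-10.0, hi=60.0, tol=1e-8):
    """Equipercentile link: find y with G(y) = F(x) by bisection.
    Assumes the solution lies in [lo, hi]."""
    target = F(x)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For two identical score distributions shifted by one point, this link recovers y = x + 1, as expected of an equipercentile method.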
203
Andersson B, Luo H, Marcq K. Reliability coefficients for multiple group item response theory models. Br J Math Stat Psychol 2022; 75:395-410. PMID: 35229881; PMCID: PMC9313586; DOI: 10.1111/bmsp.12269.
Abstract
Reliability of scores from psychological or educational assessments provides important information regarding the precision of measurement. The reliability of scores is, however, population dependent and may vary across groups. In item response theory, this population dependence can be attributed to differential item functioning or to differences in the latent distributions between groups, and it needs to be accounted for when estimating the reliability of scores for different groups. Here, we introduce group-specific and overall reliability coefficients for sum scores and maximum likelihood ability estimates defined by a multiple group item response theory model. We derive confidence intervals using asymptotic theory and evaluate the empirical properties of the estimators and the confidence intervals in a simulation study. The results show that the estimators are largely unbiased and that the confidence intervals are accurate with moderately large sample sizes. We exemplify the approach with the Montreal Cognitive Assessment (MoCA) in two groups defined by education level and give recommendations for applied work.
Affiliation(s)
- Hao Luo
- University of Hong Kong, Hong Kong SAR, China
204
Allott K, Gao CX, Fisher C, Hetrick SE, Filia KM, Menssink JM, Herrman HE, Rickwood DJ, Parker AG, McGorry PD, Cotton SM. The Neuropsychological Symptoms Self-Report: psychometric properties in an adolescent and young adult mental health cohort. Child Adolesc Ment Health 2022; 27:111-121. PMID: 33913237; DOI: 10.1111/camh.12473.
Abstract
BACKGROUND Subjective cognitive symptoms are common in young people receiving mental health treatment and are associated with poorer outcomes. The aim of this study was to determine the psychometric properties of the Neuropsychological Symptoms Self-Report (NSSR), an eight-item measure recently developed to provide a snapshot of young people's perceived change in cognitive functioning in relation to mental health treatment. METHOD The sample included 633 youth aged 12-25 years (Mage = 18.2, 66.5% female, 88.6% Australian-born) who had sought mental health treatment in primary headspace services. At three-month follow-up, participants completed the NSSR and self-report measures of depression and anxiety. RESULTS Excellent internal consistency was found (Cronbach's alpha = 0.93). The NSSR had negative correlations with self-reported anxiety (r = -.33, p < .001) and depression (r = -.48, p < .001) symptoms, suggesting a link with affective symptoms but independence of the constructs. Exploratory and confirmatory factor analyses supported a single-factor model. Item response theory (IRT) analysis suggested good model fit (homogeneity, data integrity, scalability, local independence, and monotonicity) for all items. There was some evidence of measurement noninvariance (in item thresholds) by sex and age, but not diagnosis. IRT models also supported briefer six- and three-item versions of the NSSR. CONCLUSION In busy clinical practice, clinicians need a rapid and reliable method for determining whether cognitive symptoms are of concern and in need of further assessment and treatment. Study findings support the NSSR as a brief, psychometrically sound measure for assessing subjective cognitive functioning in adolescents and young adults receiving mental health treatment.
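The internal-consistency figure reported above (Cronbach's α = 0.93) is a simple function of item and total-score variances, α = k/(k−1) · (1 − Σ var(item)/var(total)); a minimal sketch:

```python
def variance(xs):
    # Sample variance (n - 1 denominator)
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    """Cronbach's alpha. rows: list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))          # transpose to item columns
    totals = [sum(r) for r in rows]   # total score per respondent
    return k / (k - 1) * (1.0 - sum(variance(i) for i in items) / variance(totals))
```

Perfectly parallel items yield α = 1; items whose covariances sum to zero yield α = 0.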
Affiliation(s)
- Kelly Allott
- Orygen, Parkville, Vic., Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, Vic., Australia
- Caroline X Gao
- Orygen, Parkville, Vic., Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, Vic., Australia; Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Caroline Fisher
- Department of Psychology, Royal Melbourne Hospital, Melbourne Health, Parkville, Vic., Australia; The Melbourne Clinic, Richmond, Vic., Australia
- Sarah E Hetrick
- Department of Psychological Medicine, University of Auckland, Auckland, New Zealand
- Kate M Filia
- Orygen, Parkville, Vic., Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, Vic., Australia
- Jana M Menssink
- Orygen, Parkville, Vic., Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, Vic., Australia
- Helen E Herrman
- Orygen, Parkville, Vic., Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, Vic., Australia
- Debra J Rickwood
- headspace National Youth Mental Health Foundation, Melbourne, Vic., Australia; Faculty of Health, University of Canberra, Canberra, ACT, Australia
- Alexandra G Parker
- Orygen, Parkville, Vic., Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, Vic., Australia; Institute for Health and Sport, Victoria University, Melbourne, Vic., Australia
- Patrick D McGorry
- Orygen, Parkville, Vic., Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, Vic., Australia
- Sue M Cotton
- Orygen, Parkville, Vic., Australia; Centre for Youth Mental Health, The University of Melbourne, Parkville, Vic., Australia
205
Bolsinova M, Deonovic B, Arieli-Attali M, Settles B, Hagiwara M, Maris G. Measurement of Ability in Adaptive Learning and Assessment Systems when Learners Use On-Demand Hints. Appl Psychol Meas 2022; 46:219-235. PMID: 35528271; PMCID: PMC9073638; DOI: 10.1177/01466216221084208.
Abstract
Adaptive learning and assessment systems support learners in acquiring knowledge and skills in a particular domain. Learners' progress is monitored as they solve items that match their level and target specific learning goals. Scaffolding and providing learners with hints are powerful tools for supporting the learning process. One way of introducing hints is to make hint use the choice of the student: when learners are certain of their response, they answer without hints, but if they are uncertain or do not know how to approach the item, they can request a hint. We develop measurement models for applications where such on-demand hints are available. Such models take into account that hint use may be informative of ability but at the same time may be influenced by other individual characteristics. Two modeling strategies are considered: (1) the measurement model is based on a scoring rule for ability that includes both response accuracy and hint use; (2) the choice to use hints and response accuracy conditional on this choice are modeled jointly using Item Response Tree models. The properties of the different models and their implications are discussed. An application to data from Duolingo, an adaptive language learning system, is presented. Here, the best model is the scoring-rule-based model with full credit for correct responses without hints, partial credit for correct responses with hints, and no credit for all incorrect responses. A second dimension in the model accounts for individual differences in the tendency to use hints.
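The winning scoring rule described at the end of the abstract is easy to state explicitly; the numeric value of the partial credit below is an illustrative assumption, since the abstract does not specify it:

```python
def hint_score(correct, used_hint, partial=0.5):
    """Scoring rule as described in the abstract's best-fitting model:
    full credit for a correct response without hints, partial credit for a
    correct response with a hint, no credit for any incorrect response.
    The partial-credit value 0.5 is an illustrative assumption."""
    if not correct:
        return 0.0
    return partial if used_hint else 1.0
```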
206
Rios JA. A Comparison of Robust Likelihood Estimators to Mitigate Bias From Rapid Guessing. Appl Psychol Meas 2022; 46:236-249. PMID: 35528268; PMCID: PMC9073634; DOI: 10.1177/01466216221084371.
Abstract
Rapid guessing (RG) behavior can undermine measurement properties and score-based inferences. To mitigate this potential bias, practitioners have relied on response time information to identify and filter RG responses. However, response times may be unavailable in many testing contexts, such as paper-and-pencil administrations. When this is the case, self-report measures of effort and person-fit statistics have been used. These methods are limited in that inferences concerning motivation and aberrant responding are made at the examinee level. As test takers can engage in a mixture of solution and RG behavior throughout a test administration, there is a need to limit the influence of potential aberrant responses at the item level. This can be done by employing robust estimation procedures. Since these estimators have received limited attention in the RG literature, the objective of this simulation study was to evaluate ability parameter estimation accuracy in the presence of RG by comparing maximum likelihood estimation (MLE) to two robust variants, the bisquare and Huber estimators. Two RG conditions were manipulated: RG percentage (10%, 20%, and 40%) and pattern (difficulty-based and changing state). Compared to the MLE procedure, results demonstrated that both the bisquare and Huber estimators reduced bias in ability parameter estimates by as much as 94%. Given that the Huber estimator showed smaller standard deviations of error and performed equally as well as the bisquare approach under most conditions, it is recommended as a promising approach to mitigating bias from RG when response time information is unavailable.
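The robust idea is to downweight item responses with large standardized residuals when solving the likelihood score equation, so that a handful of rapid guesses cannot drag the ability estimate. A simplified single-examinee sketch with Huber weights under a 2PL model (a grid search on the weighted score equation; this is an illustration in the spirit of the estimators compared, not the paper's exact implementation, and function names are mine):

```python
import math

def p_2pl(theta, a, b):
    # 2PL item response function
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def huber_weight(r, k=1.0):
    # Full weight for small residuals, downweight beyond the tuning constant k
    return 1.0 if abs(r) <= k else k / abs(r)

def robust_theta(responses, items, k=1.0, lo=-4.0, hi=4.0, steps=2000):
    """Find the theta that (approximately) zeroes the Huber-weighted
    likelihood score. responses: 0/1 list; items: list of (a, b) pairs."""
    def weighted_score(theta):
        s = 0.0
        for u, (a, b) in zip(responses, items):
            p = p_2pl(theta, a, b)
            r = (u - p) / math.sqrt(p * (1.0 - p))  # standardized residual
            s += huber_weight(r, k) * a * (u - p)
        return s
    best, best_val = lo, abs(weighted_score(lo))
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        v = abs(weighted_score(t))
        if v < best_val:
            best, best_val = t, v
    return best
```

With k = 1 and no extreme residuals, the weights are all 1 and the estimate coincides with ordinary MLE; aberrant responses receive weights below 1 and contribute less to the score equation.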
Affiliation(s)
- Joseph A. Rios
- Educational Psychology, University of Minnesota, Minneapolis, MN, USA
207
Edelen MO, Rodriguez A, Huang W, Gramling R, Ahluwalia SC. A novel Scale to Assess Palliative Care Patients' Experience of Feeling Heard and Understood. J Pain Symptom Manage 2022; 63:689-697.e1. PMID: 35017018; DOI: 10.1016/j.jpainsymman.2022.01.002.
Abstract
CONTEXT Patient experience of palliative care serves as an important indicator of quality and patient-centeredness. OBJECTIVES To develop a novel patient-reported scale measuring ambulatory palliative care patients' experience of feeling heard and understood by their providers. METHODS We used self-reported patient experience data collected via mixed-mode survey administration. We conducted an exploratory factor analysis (EFA) and an expert panel ranking exercise to reduce the 10-item set based on underlying dimensionality. We then used item response theory (IRT) to calibrate the remaining items based on psychometric properties, test information, and precision. We considered item-level fit and examined the standardized local dependence chi-square statistics. We evaluated candidate items for differential item functioning by survey mode. We evaluated the test-retest reliability and validity of the final scale. RESULTS The EFA yielded a single factor (9 of 10 items had loadings > 0.80 on the single factor). We removed the two items with the lowest factor loadings that the expert panel ranked as least reflective of the overall construct. IRT calibration of the remaining eight items showed high slopes (range 2.66-5.18); location parameters were all negative (range -0.90 to -0.36). We removed two more items based on local dependence indices and item-level fit. Combining psychometric information with the expert ratings, we established the final 4-item scale, which was reliable (Cronbach's alpha = 0.84; polychoric correlation coefficient = 0.72) and had good convergent validity. CONCLUSIONS This novel multi-item Feeling Heard and Understood scale can be used to measure and improve ambulatory palliative care patient experience.
Affiliation(s)
- Maria O Edelen
- Behavioral & Policy Sciences Department, RAND Corporation, Boston, Massachusetts, USA
- Anthony Rodriguez
- Behavioral & Policy Sciences Department, RAND Corporation, Boston, Massachusetts, USA
- Wenjing Huang
- Behavioral & Policy Sciences Department, RAND Corporation, Boston, Massachusetts, USA
- Robert Gramling
- Department of Family Medicine, University of Vermont, Burlington, Vermont, USA
- Sangeeta C Ahluwalia
- Behavioral & Policy Sciences Department, RAND Corporation, Santa Monica, California, USA
208
Lim ZX, Chua WL, Lim WS, Lim AQ, Chua KC, Chan EY. Psychometrics of the Pearlin Mastery Scale among Family Caregivers of Older Adults Who Require Assistance in Activities of Daily Living. Int J Environ Res Public Health 2022; 19:4639. PMID: 35457504; DOI: 10.3390/ijerph19084639.
Abstract
This study examined the psychometric properties of the seven-item mastery scale among 392 family caregivers of care-dependent older adults in a tertiary hospital in Singapore. Item response theory (IRT) analysis and confirmatory factor analysis (CFA) were used to assess the scale's psychometric properties. Construct validity was assessed based on correlations between mastery and caregiver burden, depression, and quality of life. Data from the seven-item mastery scale showed acceptable reliability and model fit, while IRT analysis showed that response categories were ordered but that the two positively worded items fit poorly. Without these two items, responses on the five-item version showed acceptable model fit, acceptable reliability, and high correlation with those on the seven-item version. Item responses on both the seven- and five-item versions showed the expected correlations with caregiver self-reports of burden, depression, and quality of life. Further psychometric studies of the seven-item mastery scale are warranted. For practical applications such as caregiver screening during hospital admissions, the five-item mastery scale is fit for purpose.
209
Peasgood T, Mukuria C, Brazier J, Marten O, Kreimeier S, Luo N, Mulhern B, Greiner W, Pickard AS, Augustovski F, Engel L, Gibbons L, Yang Z, Monteiro AL, Kuharic M, Belizan M, Bjørner J. Developing a New Generic Health and Wellbeing Measure: Psychometric Survey Results for the EQ-HWB. Value Health 2022; 25:525-533. PMID: 35365299; DOI: 10.1016/j.jval.2021.11.1361.
Abstract
OBJECTIVES The development of measures such as the EQ-HWB (EQ Health and Wellbeing) requires selection of items. This study explored the psychometric performance of candidate items, testing their validity in patients, social care users, and carers. METHODS Paper and online surveys that included the candidate items (N = 64) were conducted in Argentina, Australia, China, Germany, the United Kingdom, and the United States. Psychometric assessment of missing data, response distributions, and known-group differences was undertaken. Dimensionality was explored using exploratory and confirmatory factor analysis. Poorly fitting items were identified using information functions, and the function of each response category was assessed using category characteristic curves from item response theory (IRT) models. Differential item functioning was tested across key subgroups. RESULTS There were 4879 respondents (Argentina = 508, Australia = 514, China = 497, Germany = 502, United Kingdom = 1955, United States = 903). Where missing data were allowed, rates were low (UK paper survey 2.3%; US survey 0.6%). Most items had responses distributed across all levels. Most items could discriminate between groups with known health conditions with moderate to large effect sizes; items were less able to discriminate across carers. Factor analysis found positive and negative measurement factors alongside the constructs of interest. For most of the countries apart from China, the confirmatory factor analysis model had good fit with some minor modifications. IRT indicated that most items had well-functioning response categories, but there was some evidence of differential item functioning in many items. CONCLUSIONS Items performed well in classical psychometric testing and IRT. This large 6-country collaboration provided evidence to inform item selection for the EQ-HWB measure.
Affiliation(s)
- Tessa Peasgood
- Melbourne School of Population and Global Health, University of Melbourne, Victoria, Australia; School of Health and Related Research, University of Sheffield, Sheffield, England, UK
- Clara Mukuria
- School of Health and Related Research, University of Sheffield, Sheffield, England, UK
- John Brazier
- School of Health and Related Research, University of Sheffield, Sheffield, England, UK
- Ole Marten
- Department of Health Economics and Health Care Management, School of Public Health, Bielefeld University, Bielefeld, Germany
- Simone Kreimeier
- Department of Health Economics and Health Care Management, School of Public Health, Bielefeld University, Bielefeld, Germany
- Nan Luo
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
- Brendan Mulhern
- Centre for Health Economics Research and Evaluation, University of Technology Sydney, New South Wales, Australia
- Wolfgang Greiner
- Department of Health Economics and Health Care Management, School of Public Health, Bielefeld University, Bielefeld, Germany
- A Simon Pickard
- Department of Pharmacy Systems, Outcomes and Policy, College of Pharmacy, University of Illinois Chicago, Chicago, IL, USA
- Lidia Engel
- Deakin Health Economics, School of Health and Social Development, Deakin University, Geelong, Australia
- Luz Gibbons
- Institute for Clinical Effectiveness and Health Policy, Buenos Aires, Argentina
- Zhihao Yang
- Health Services Management Department, Guizhou Medical University, Guiyang, China
- Andrea L Monteiro
- Department of Pharmacy Systems, Outcomes and Policy, College of Pharmacy, University of Illinois Chicago, Chicago, IL, USA
- Maja Kuharic
- Department of Pharmacy Systems, Outcomes and Policy, College of Pharmacy, University of Illinois Chicago, Chicago, IL, USA
- Maria Belizan
- Institute for Clinical Effectiveness and Health Policy, Buenos Aires, Argentina
210
Abstract
Test fairness is critical to the validity of group comparisons involving gender, ethnicity, culture, or treatment conditions. Detection of differential item functioning (DIF) is one component of efforts to ensure test fairness. The current study compared four treatments for items that have been identified as showing DIF: deleting, ignoring, multiple-group modeling, and modeling DIF as a secondary dimension. Results of this study indicate which treatment could be applied to items showing DIF across a wide range of testing environments requiring reliable treatment of such items.
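Before any of the four treatments can be applied, items must first be flagged; a classic flagging statistic is the Mantel-Haenszel common odds ratio computed across score-matched strata, often reported on the ETS delta scale. A minimal sketch (function names are mine; the cutoffs in the comment are the usual ETS rules of thumb, not from this study):

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one studied item.
    strata: list of (ref_right, ref_wrong, foc_right, foc_wrong) counts,
    one tuple per matching stratum (e.g., total-score level)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def ets_delta(alpha):
    # ETS delta scale; common rule of thumb: |delta| < 1 negligible (A),
    # 1 to 1.5 moderate (B), > 1.5 large (C). Negative values flag DIF
    # against the focal group.
    return -2.35 * math.log(alpha)
```

When reference and focal groups answer at the same rate within every stratum, the odds ratio is 1 and delta is 0 (no DIF).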
211
Liu DT, Phillips KM, Houssein FA, Speth MM, Besser G, Mueller CA, Sedaghat AR. Dedicated Olfaction and Taste Items do not Improve Psychometric Performance of the SNOT-22. Laryngoscope 2022; 132:1644-1651. PMID: 35353381; PMCID: PMC9544569; DOI: 10.1002/lary.30120.
Abstract
Objective Previous work has shown that the chemosensory dysfunction item of the 22-item Sinonasal Outcome Test (SNOT-22), which assesses problems with "taste/smell," has poor psychometric performance compared with other items on the SNOT-22; we have hypothesized this is due to the simultaneous assessment of two different senses. Our aim was to determine whether distinct smell and taste items in the SNOT-22 would improve psychometric performance. Methods One hundred and eighty-one chronic rhinosinusitis (CRS) patients were recruited and completed the SNOT-22. Additional items querying problems with the senses of "smell" and "taste," using the same response scale and recall period, were given to study participants. Item response theory (IRT) was used to determine IRT parameters, including item discrimination, difficulty, and the information provided by each SNOT-22 item. Results Confirming previous studies, the chemosensory item of the SNOT-22 (reflecting "taste/smell") had poor psychometric performance. Use of a distinct smell or taste item instead of the combined "taste/smell" item did not improve psychometric performance. However, a dedicated smell question resulted in a left shift of threshold parameters, showing that the dedicated smell item better captures moderate CRS disease burden than the original taste/smell item of the SNOT-22, which by virtue of near-identical IRT parameters appears to more greatly reflect problems with taste. Conclusions A dedicated smell- or taste-specific item, rather than the combined "taste/smell" item currently in the SNOT-22, does not provide significantly greater psychometric performance. However, a dedicated smell item may better capture moderate CRS disease burden compared with the current chemosensory item on the SNOT-22.
Affiliation(s)
- David T Liu
- Department of Otorhinolaryngology, Head and Neck Surgery, Medical University of Vienna, Vienna, Austria
- Katie M Phillips
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Firas A Houssein
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Marlene M Speth
- Department of Otorhinolaryngology, Kantonsspital Aarau, Aarau, Switzerland
- Gerold Besser
- Department of Otorhinolaryngology, Head and Neck Surgery, Medical University of Vienna, Vienna, Austria
- Christian A Mueller
- Department of Otorhinolaryngology, Head and Neck Surgery, Medical University of Vienna, Vienna, Austria
- Ahmad R Sedaghat
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
212
Otto C, Kaman A, Barkmann C, Döpfner M, Görtz-Dorten A, Ginsberg C, Zaplana Labarga S, Treier AK, Roessner V, Hanisch C, Koelch M, Banaschewski T, Ravens-Sieberer U. The DADYS-Screen: Development and Evaluation of a Screening Tool for Affective Dysregulation in Children. Assessment 2022; 30:1080-1094. PMID: 35301874; PMCID: PMC10152573; DOI: 10.1177/10731911221082709.
Abstract
Affective dysregulation (AD) in children is characterized by persistent irritability and severe temper outbursts. This study developed and evaluated a screening questionnaire for AD in children. The development included the generation of an initial item pool from existing instruments, a Delphi rating by experts, focus groups with experts and parents, and psychometric analyses of clinical and population-based samples. Based on data from a large community-based study, the final screening questionnaire was developed (n = 771; 49.7% female; age M = 10.02 years, SD = 1.34) and evaluated (n = 8,974; 48.7% female; age M = 10.00 years, SD = 1.38) with methods from classical test theory and item response theory. The resulting DADYS-Screen (Diagnostic Tool for Affective Dysregulation in Children-Screening Questionnaire) includes 12 items with good psychometric properties and scale characteristics, including a good fit to a one-factorial model relative to the baseline model, although only a "mediocre" fit according to the root mean square error of approximation (RMSEA). Results were confirmed using a second and larger data set. Overall, the DADYS-Screen is able to identify children with AD, although it needs further investigation using clinical data.
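The "mediocre" label invoked above comes from conventional RMSEA cutoffs. The point estimate itself is a one-line computation from the model chi-square, its degrees of freedom, and the sample size (the cutoff labels in the comment follow common guidelines, e.g. MacCallum et al., and are not from this study):

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation for a fitted model.
    Common cutoffs: <= .05 close fit, .05-.08 fair, .08-.10 "mediocre",
    > .10 poor."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```

A model whose chi-square does not exceed its degrees of freedom gets an RMSEA of exactly 0.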
Affiliation(s)
- Anne Kaman
- University Medical Center Hamburg-Eppendorf, Germany
- Michael Koelch
- University of Ulm, Germany; Rostock University Medical Center, Germany
213
Brandão T, Brites R, Hipólito J, Nunes O. The Emotion Regulation Goals Scale: Advancing its psychometric properties using item response theory analysis. J Clin Psychol 2022; 78:1940-1957. PMID: 35294783; DOI: 10.1002/jclp.23343.
Abstract
OBJECTIVE Emotion goals are considered paramount in influencing the initiation, maintenance, and cessation of emotion regulation. Recently, an instrument was developed to assess emotion goals: the Emotion Regulation Goals Scale (ERGS). METHOD This work comprised two studies examining the psychometric properties of the ERGS in two Portuguese samples: 400 adults from the community (76% women; M age = 37.10) (Study 1) and 205 university students (80% women; M age = 21.72) (Study 2). RESULTS Confirmatory factor analysis (CFA) and item response theory (IRT) analysis were used to examine the psychometric properties of the ERGS in the two samples. The CFA confirmed the five-factor structure of the 18-item ERGS, but the analyses in both studies suggested eliminating two items given their low loadings/low discrimination. CONCLUSIONS A five-factor structure with 16 items was proposed, with good reliability and evidence of construct validity against relevant constructs.
Affiliation(s)
- Tânia Brandão
- CIP-UAL, Departamento de Psicologia, Universidade Autónoma de Lisboa Luís de Camões, Lisboa, Portugal; CPUP-Center for Psychology at University of Porto, Porto, Portugal
- Rute Brites
- CIP-UAL, Departamento de Psicologia, Universidade Autónoma de Lisboa Luís de Camões, Lisboa, Portugal
- João Hipólito
- CIP-UAL, Departamento de Psicologia, Universidade Autónoma de Lisboa Luís de Camões, Lisboa, Portugal
- Odete Nunes
- CIP-UAL, Departamento de Psicologia, Universidade Autónoma de Lisboa Luís de Camões, Lisboa, Portugal
214
Nam JH, Kim EJ, Cho EH. Sport Psychological Skill Factors and Scale Development for Taekwondo Athletes. Int J Environ Res Public Health 2022; 19:3433. PMID: 35329120; PMCID: PMC8955023; DOI: 10.3390/ijerph19063433.
Abstract
The purpose of this study was to identify the sport psychological skills of Taekwondo athletes and to develop a scale measuring such skills. We collected preliminary data using an open-ended online survey targeting Taekwondo athletes from nine countries (South Korea, China, Malaysia, United States, Spain, France, Brazil, United Kingdom, and Taiwan) who participated in international competitions between 2019 and 2020. We extracted participants’ sport psychological skills from 75 survey responses, guided by expert meetings and a thorough literature review. We verified our Taekwondo psychological skill scale’s construct validity using 840 survey responses. We utilized V coefficients, parallel analysis, an exploratory structural equation model, maximum likelihood, confirmatory factor analysis, and multi-group confirmatory factor analysis for data analysis. We identified six core sport psychological skills: “goal setting,” “confidence,” “imagery,” “self-talk,” “fighting spirit,” and “concentration.” Our final measure, which demonstrated evidence of reliability and validity, comprises 18 items spanning 6 factors, with each item rated on a 3-point Likert scale.
Affiliation(s)
- Jung-Hoon Nam
- Department of Sports Healthcare, Catholic Kwandong University, Gangneung 25601, Korea
- Eung-Joon Kim
- Department of Physical Education, Korea National Sport University, Seoul 05541, Korea
- Eun-Hyung Cho
- Department of Sport Science, Korea Institute of Sport Science, Seoul 01794, Korea
215
Ray JV. Differential Item Functioning of the Youth Psychopathic Traits Inventory Across Race/Ethnicity and Gender Among a Sample of Justice-Involved Youth: An Item Response Theory Analysis. Assessment 2022; 30:1009-1027. PMID: 35245976; DOI: 10.1177/10731911221077230.
Abstract
Research has yet to examine whether the items of the Youth Psychopathic Traits Inventory (YPI) function equally well across race/ethnicity and gender. The current study applies an item response theory analysis to detect differential item functioning (DIF) in the YPI subscales across White, Black, and Hispanic youth, and across males and females, in a sample of justice-involved youth. Significant DIF was detected for several items between Black and White youth and between Black and Hispanic youth. Few instances of DIF emerged between White and Hispanic youth or between males and females. The findings suggest that the subscales of the YPI provide more information for White and Hispanic youth than for Black youth. They also suggest that while there was significant DIF in item difficulty, the direction of DIF did not substantially favor one group or another. Thus, the findings suggest that the YPI produces comparable estimates of psychopathic traits for females and males and for White and Hispanic youth. However, the results raise concerns about comparing YPI subscale scores between White and Black youth and between Hispanic and Black youth. The findings have important implications for the use of the YPI subscales in diverse samples.
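The article's IRT machinery is not reproduced here, but the basic logic of a DIF screen, testing whether group membership predicts an item response beyond a matching score, can be sketched with a likelihood-ratio test on simulated data (an illustrative logistic-regression approximation, not the article's method; all settings below are hypothetical):

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, iters=2000):
    """Plain gradient-ascent logistic regression; returns the maximized log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-9, 1 - 1e-9)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def dif_screen(item, total, group):
    """Likelihood-ratio statistic: does group membership predict the item
    response beyond the (centred) rest score used for matching?"""
    rest = (total - item).astype(float)
    rest -= rest.mean()
    base = np.column_stack([np.ones_like(rest), rest])
    full = np.column_stack([base, group.astype(float)])
    return 2.0 * (fit_logistic(full, item) - fit_logistic(base, item))

# Simulated data: 10 Rasch-like items; uniform DIF injected into item 0 only.
rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n)
diffs = np.linspace(-1.0, 1.0, 10)
items = (rng.uniform(size=(n, 10)) < 1.0 / (1.0 + np.exp(-(theta[:, None] - diffs)))).astype(float)
# item 0 is 1.5 logits harder for group 1 at every trait level (uniform DIF)
items[:, 0] = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(theta + 0.5 - 1.5 * group)))).astype(float)
total = items.sum(axis=1)
stats = [dif_screen(items[:, j], total, group) for j in range(10)]
```

The contaminated item should yield a far larger statistic than the clean items, whose statistics behave roughly like chi-square(1) draws.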
216
Nelson LD, Magnus BE, Temkin NR, Dikmen S, Manley GT, Balsis S. How Do Scores on the Functional Status Examination (FSE) Correspond to Scores on the Glasgow Outcome Scale-Extended (GOSE)? Neurotrauma Rep 2022; 3:122-128. PMID: 35403101; PMCID: PMC8985527; DOI: 10.1089/neur.2021.0057.
Abstract
This study was designed to determine how raw scores correspond between two alternative measures of functional recovery from traumatic brain injury (TBI): the Functional Status Examination (FSE) and the Glasgow Outcome Scale-Extended (GOSE). Using data from 357 persons with moderate-severe TBI who participated in a large clinical trial, we performed item response theory analysis to characterize the relationship between functional ability measured by the FSE and by the GOSE at 6 months post-injury. Results revealed that raw scores on the FSE and GOSE can be linked, and a table is provided to translate scores from one instrument to the other. For example, an FSE score of 7 (on its 0-21 scale, where higher scores reflect more impairment) is equivalent to a GOSE score of 6 (on its 8-point scale, where higher scores reflect less impairment). These results allow clinicians or researchers who have a score for a person on one instrument to cross-reference it to a score on the other. Importantly, this enables researchers to combine data sets in which some persons completed only the GOSE and some only the FSE. In addition, an investigator could save participant time by eliminating one instrument from a battery of tests, yet still retain a score on that instrument for each participant. More broadly, the findings help anchor scores from these two instruments to the broader continuum of injury-related functional limitations.
Affiliation(s)
- Lindsay D. Nelson
- Department of Neurosurgery and Neurology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA. Address correspondence to: Lindsay D. Nelson, PhD, Departments of Neurosurgery and Neurology, Medical College of Wisconsin, 8701 West Watertown Plank Road, Milwaukee, WI 53226, USA
- Brooke E. Magnus
- Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts, USA
- Nancy R. Temkin
- Department of Neurological Surgery and Biostatistics, University of Washington, Seattle, Washington, USA
- Sureyya Dikmen
- Department of Rehabilitation Medicine, University of Washington, Seattle, Washington, USA
- Geoffrey T. Manley
- Department of Neurological Surgery, University of California San Francisco, San Francisco, California, USA
- Steve Balsis
- Department of Psychology, University of Massachusetts Lowell, Lowell, Massachusetts, USA
217
Bergner Y, Halpin P, Vie JJ. Multidimensional Item Response Theory in the Style of Collaborative Filtering. Psychometrika 2022; 87:266-288. PMID: 34698979; DOI: 10.1007/s11336-021-09788-9.
Abstract
This paper presents a machine learning approach to multidimensional item response theory (MIRT), a class of latent factor models that can be used to model and predict student performance from observed assessment data. Inspired by collaborative filtering, we define a general class of models that includes many MIRT models. We discuss the use of penalized joint maximum likelihood to estimate individual models and cross-validation to select the best performing model. This model evaluation process can be optimized using batching techniques, such that even sparse large-scale data can be analyzed efficiently. We illustrate our approach with simulated and real data, including an example from a massive open online course. The high-dimensional model fit to this large and sparse dataset does not lend itself well to traditional methods of factor interpretation. By analogy to recommender-system applications, we propose an alternative "validation" of the factor model, using auxiliary information about the popularity of items consulted during an open-book examination in the course.
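The paper's central idea, treating MIRT estimation as a regularized logistic matrix factorization in the style of collaborative filtering, can be sketched in a few lines of numpy (a toy illustration under assumed settings, not the authors' implementation; `lam`, `lr`, and the dimensions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
P, I, K = 200, 40, 2                       # persons, items, latent dimensions
theta_t = rng.normal(size=(P, K))          # "true" person factors
a_t = rng.normal(size=(I, K))              # "true" item loadings
b_t = rng.normal(size=I)                   # "true" item intercepts
Y = (rng.uniform(size=(P, I)) < 1.0 / (1.0 + np.exp(-(theta_t @ a_t.T + b_t)))).astype(float)

# Penalized joint maximum likelihood via gradient ascent with an L2 penalty:
# the collaborative-filtering-style recipe of factorizing the person-by-item matrix.
lam, lr = 0.01, 0.5
theta = rng.normal(scale=0.1, size=(P, K))
a = rng.normal(scale=0.1, size=(I, K))
b = np.zeros(I)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(theta @ a.T + b)))
    r = Y - p                              # residuals drive all three gradients
    theta += lr * (r @ a / I - lam * theta)
    a += lr * (r.T @ theta / P - lam * a)
    b += lr * r.mean(axis=0)
acc = np.mean((p > 0.5) == Y)              # in-sample predictive accuracy
```

In the paper's workflow, the number of dimensions K would then be chosen by cross-validated prediction rather than by in-sample fit.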
Affiliation(s)
- Yoav Bergner
- Steinhardt School of Culture, Education, and Human Development, New York University, 82 Washington Square East, New York, NY, 10003, USA
- Peter Halpin
- School of Education, Peabody Hall, University of North Carolina-Chapel Hill, Office 111, Chapel Hill, NC, 27599-3500, USA
- Jill-Jênn Vie
- Inria, UMR 9189 CRIStAL, 40 avenue Halley, 59650, Villeneuve-d'Ascq, France
218
Joo SH, Lee P, Stark S. Bayesian Approaches for Detecting Differential Item Functioning Using the Generalized Graded Unfolding Model. Appl Psychol Meas 2022; 46:98-115. PMID: 35281341; PMCID: PMC8908411; DOI: 10.1177/01466216211066606.
Abstract
Differential item functioning (DIF) analysis is one of the most important applications of item response theory (IRT) in psychological assessment. This study examined the performance of two Bayesian DIF methods, Bayes factor (BF) and deviance information criterion (DIC), with the generalized graded unfolding model (GGUM). Type I error and power were investigated in a Monte Carlo simulation that manipulated sample size, DIF source, DIF size, DIF location, subpopulation trait distribution, and type of baseline model. We also examined the performance of two likelihood-based methods, the likelihood ratio (LR) test and Akaike information criterion (AIC), using marginal maximum likelihood (MML) estimation for comparison with past DIF research. The results indicated that the proposed BF and DIC methods provided well-controlled Type I error rates and high power under a free-baseline model implementation; their performance was superior to LR and AIC in terms of Type I error rates when the reference and focal group trait distributions differed. Implications and recommendations for applied research are discussed.
219
Grittner U, Bloomfield K, Kuntsche S, Callinan S, Stanesby O, Gmel G. Improving measurement of harms from others' drinking: Using item-response theory to scale harms from others' heavy drinking in 10 countries. Drug Alcohol Rev 2022; 41:577-587. PMID: 34460976; PMCID: PMC8882707; DOI: 10.1111/dar.13377.
Abstract
INTRODUCTION The heavy drinking of others may negatively affect an individual on several dimensions of life. To date, research on how to judge the severity of various experiences of such harms is scarce. This study aims to empirically scale the severity of such harm items and to determine who is most at risk of these harms. METHODS We used population-based survey data from 10 countries in the GENAHTO project (Gender and Alcohol's Harms to Others; data collection 2011-2016). Questions about harms from others' drinking covered verbal and physical harm, damage to belongings, traffic accidents, harassment, threatening behaviour, and family and financial problems. We used item response theory (IRT) methods to scale the severity of these items. To acknowledge culturally based variations across countries, we assessed differential item functioning. RESULTS The items 'family problems', 'financial problems', 'clothes and property damage' and 'physical harm' were scaled as more severe in most countries compared with the other items. Substantial differential item functioning was present in more than half of the country pairings; the item 'financial problems' was most often differentially scaled. Younger people who drank more, as well as women (compared with men), reported more harm. DISCUSSION AND CONCLUSIONS Using IRT, we were able to evaluate grades of severity in harms from others' drinking. IRT scaling yielded item rankings similar to those reported in other studies. However, empirical scaling allows for more differentiated severity scaling than simple summary scores and is more sensitive to cultural differences.
Affiliation(s)
- Ulrike Grittner
- Institute of Biometry and Clinical Epidemiology, Charité – Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute of Health, Berlin, Germany
- Kim Bloomfield
- Institute of Biometry and Clinical Epidemiology, Charité – Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute of Health, Berlin, Germany; Centre for Alcohol and Drug Research, Aarhus University, Copenhagen, Denmark; Health Promotion, Department of Public Health, University of Southern Denmark, Esbjerg, Denmark; Alcohol Research Group, Public Health Institute, Emeryville, USA
- Sandra Kuntsche
- Centre for Alcohol Policy Research, School of Psychology and Public Health, La Trobe University, Melbourne, Australia
- Sarah Callinan
- Centre for Alcohol Policy Research, School of Psychology and Public Health, La Trobe University, Melbourne, Australia
- Oliver Stanesby
- Centre for Alcohol Policy Research, School of Psychology and Public Health, La Trobe University, Melbourne, Australia
- Gerhard Gmel
- Alcohol Treatment Centre, Lausanne University Hospital CHUV, Lausanne, Switzerland; Addiction Switzerland, Research Department, Lausanne, Switzerland; Centre for Addiction and Mental Health, Institute for Mental Health Policy Research, Toronto, Canada; University of the West of England, Faculty of Health and Applied Science, Bristol, United Kingdom
220
Chen X, Hu W, Hu Y, Xia X, Li X. Discrimination and structural validity evaluation of Zung self-rating depression scale for pregnant women in China. J Psychosom Obstet Gynaecol 2022; 43:26-34. PMID: 32498640; DOI: 10.1080/0167482x.2020.1770221.
Abstract
PURPOSE The applicability of the Zung self-rating depression scale (SDS) in pregnancy is unknown. We aimed to identify redundant items and evaluate the Zung SDS's structural validity. METHOD Two samples of pregnant women were recruited from two districts in Shanghai (Yangpu sample, n = 6468; Huangpu sample, n = 402). The Yangpu sample was randomly split into YGroup1/2/3. Item properties were evaluated via item response theory in YGroup1. Exploratory and confirmatory factor analyses were performed in YGroup2 and YGroup3, respectively. Items with a discrimination parameter (α) below 0.65 or a factor loading below 0.4 were deleted from the scale. The final structure was validated in the Huangpu sample. RESULTS Items 4 (sleep), 7 (weight loss), 8 (constipation) and 9 (tachyarrhythmia) exhibited low discrimination power. Items 2 (diurnal variation), 5 (appetite), 10 (fatigue) and 19 (suicidal ideation) made a low contribution to all factors. A three-factor model was eventually constructed: cognitive (Items 14, 16, 17, 18 and 20), psychomotor (Items 6, 11 and 12) and affective (Items 1, 3, 13 and 15). CONCLUSION The Zung SDS needs modification before being applied to pregnant women in China. Items describing symptoms that overlap between the physical changes of pregnancy and mood disorder should be deleted.
Affiliation(s)
- Xinning Chen
- Obstetrics and Gynecology Hospital of Fudan University, Shanghai, China
- Weihong Hu
- Obstetrics and Gynecology Hospital of Fudan University, Shanghai, China
- Yao Hu
- Shanghai Mental Health Center, Shanghai, China
- Xian Xia
- Obstetrics and Gynecology Hospital of Fudan University, Shanghai, China
- Xiaotian Li
- Obstetrics and Gynecology Hospital of Fudan University, Shanghai, China; Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Shanghai, China
221
Stenhaug BA, Domingue BW. Predictive Fit Metrics for Item Response Models. Appl Psychol Meas 2022; 46:136-155. PMID: 35281339; PMCID: PMC8908407; DOI: 10.1177/01466216211066603.
Abstract
The fit of an item response model is typically conceptualized as whether a given model could have generated the data. This study advocates an alternative view, "predictive fit," based on the model's ability to predict new data. The authors define two prediction tasks: "missing responses prediction," where the goal is to predict an in-sample person's response to an in-sample item, and "missing persons prediction," where the goal is to predict an out-of-sample person's string of responses. From these tasks, two predictive fit metrics are derived that assess how well an estimated item response model fits the data-generating model. The metrics are based on long-run out-of-sample predictive performance (i.e., if the data-generating model produced infinite amounts of data, what is the quality of the model's predictions on average?). Simulation studies identify the prediction-maximizing model across a variety of conditions: for example, with prediction defined in terms of missing responses, greater average person ability and greater item discrimination are both associated with relatively worse predictions from the 3PL model, and thus with greater minimum sample sizes for it. In each simulation, the prediction-maximizing model is compared to the model selected by Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and likelihood ratio tests; the performance of these methods depends on the prediction task of interest. In general, likelihood ratio tests often select overly flexible models, while BIC selects overly parsimonious models. The authors use Programme for International Student Assessment data to demonstrate how to directly estimate the predictive fit metrics in practice via cross-validation. Implications for item response model selection in operational settings are discussed.
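The "missing responses" prediction task can be sketched as follows: hold out a random subset of response-matrix cells, fit competing models on the remainder by joint maximum likelihood, and compare held-out log-loss (an illustrative simulation loosely in the spirit of the paper, not the authors' code; all settings are assumptions, and joint ML stands in for the paper's estimation details):

```python
import numpy as np

rng = np.random.default_rng(2)
P, I = 500, 20
theta_t = rng.normal(size=P)
a_t = np.exp(rng.normal(scale=0.4, size=I))        # 2PL data: discriminations vary
b_t = rng.normal(size=I)
Y = (rng.uniform(size=(P, I)) < 1.0 / (1.0 + np.exp(-a_t * (theta_t[:, None] - b_t)))).astype(float)
train = rng.uniform(size=Y.shape) < 0.8            # hold out ~20% of cells

def fit_predict(two_pl, lr=0.5, iters=3000):
    """Joint maximum likelihood on the training cells only; returns fitted probabilities."""
    th, a, b = np.zeros(P), np.ones(I), np.zeros(I)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-a * (th[:, None] - b)))
        r = train * (Y - p)                        # masked residuals
        th += lr * (r * a).sum(axis=1) / train.sum(axis=1)
        b -= lr * (r * a).sum(axis=0) / train.sum(axis=0)
        if two_pl:
            a = np.clip(a + lr * (r * (th[:, None] - b)).sum(axis=0) / train.sum(axis=0), 0.2, 5.0)
    return 1.0 / (1.0 + np.exp(-a * (th[:, None] - b)))

def holdout_logloss(p):
    p = np.clip(p, 1e-9, 1 - 1e-9)
    m = ~train
    return -np.mean(Y[m] * np.log(p[m]) + (1 - Y[m]) * np.log(1 - p[m]))

ll_1pl = holdout_logloss(fit_predict(False))       # Rasch: discriminations fixed at 1
ll_2pl = holdout_logloss(fit_predict(True))
item_means = (train * Y).sum(axis=0) / train.sum(axis=0)
ll_base = holdout_logloss(np.tile(item_means, (P, 1)))  # baseline ignoring persons
```

Both IRT models should predict held-out cells better than the person-blind baseline; comparing `ll_1pl` and `ll_2pl` is exactly the kind of prediction-based model selection the abstract describes.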
222
Kim S, Kolen MJ. Scale Linking for the Testlet Item Response Theory Model. Appl Psychol Meas 2022; 46:79-97. PMID: 35281343; PMCID: PMC8908412; DOI: 10.1177/01466216211063234.
Abstract
In their 2005 paper, Li and her colleagues proposed a test response function (TRF) linking method for a two-parameter testlet model and used a genetic algorithm to find minimization solutions for the linking coefficients. In the present paper the linking task for a three-parameter testlet model is formulated from the perspective of bi-factor modeling, and three linking methods for the model are presented: the TRF, mean/least squares (MLS), and item response function (IRF) methods. Simulations are conducted to compare the TRF method using a genetic algorithm with the TRF and IRF methods using a quasi-Newton algorithm and the MLS method. The results indicate that the IRF, MLS, and TRF methods perform very well, well, and poorly, respectively, in estimating the linking coefficients associated with testlet effects, that the use of genetic algorithms offers little improvement to the TRF method, and that the minimization function for the TRF method is not as well-structured as that for the IRF method.
223
Ho SYC, Chien TW, Shao Y, Hsieh JH. Visualizing the features of inflection point shown on a temporal bar graph using the data of COVID-19 pandemic. Medicine (Baltimore) 2022; 101:e28749. PMID: 35119031; PMCID: PMC8812627; DOI: 10.1097/md.0000000000028749.
Abstract
BACKGROUND Exponential-like infection growth leading to peaks (denoted by inflection points [IPs] or turning points) is usually the hallmark of infectious disease outbreaks, including coronaviruses. To determine the IPs of the novel coronavirus (COVID-19), we applied the item response theory model to detect phase transitions for each country/region and characterize the IP feature on the temporal bar graph (TBG). METHODS The IP (located using the item difficulty parameter) was verified by the differential equation in calculus and interpreted on the TBG with two data sets, one virtual and one empirical (i.e., from the Collatz conjecture and the COVID-19 pandemic in 2020). Comparisons of IPs, R2, and burst strength (BS, a logarithmic function of the infection number at the IP [Nip] and the item slope parameter [a] in item response theory) were made for countries/regions and continents on the choropleth map and the forest plot. RESULTS We found that the evolution of COVID-19 on the TBG makes the data clear and easy to understand; the shortest IP (=53.9) was in China and the longest (=247.3) in Europe; and the highest R2 (the variance explained by the model) was in the US, with a mean R2 of 0.98. We successfully estimated the IPs for countries/regions for COVID-19 in 2020 and presented them on the TBG. CONCLUSION Temporal visualization is recommended for researchers in future relevant studies (e.g., the evolution of keywords in a specific discipline) and is not merely limited to the IP search in COVID-19 pandemics as in this study.
Affiliation(s)
- Sam Yu-Chieh Ho
- Department of Emergency Medicine, Chi-Mei Medical Center, Tainan, Taiwan
- Tsair-Wei Chien
- Department of Medical Research, Chiali Chi-Mei Medical Center, Tainan, Taiwan
- Yang Shao
- School of Economics, Jiaxing University, Jiaxing, China
- Ju-Hao Hsieh
- Department of Emergency Medicine, Chi-Mei Medical Center, Tainan, Taiwan
224
Johansson S, Lövheim H, Olofsson B, Gustafson Y, Niklasson J. A clinically feasible short version of the 15-item geriatric depression scale extracted using item response theory in a sample of adults aged 85 years and older. Aging Ment Health 2022; 26:431-437. PMID: 33554652; DOI: 10.1080/13607863.2021.1881759.
Abstract
OBJECTIVES To extract the items most suitable for a short version of the 15-item Geriatric Depression Scale (GDS-15) in a sample of adults aged ≥ 85 years using item response theory (IRT). METHOD This population-based cross-sectional study included 651 individuals aged ≥ 85 years from the Umeå 85+/GErontological Regional DAtabase (GERDA) study. Participants were either community dwelling (approximately 70%) or resided in institutional care (approximately 30%) in northern Sweden and western Finland in 2000-2002 and 2005-2007. The psychometric properties of GDS-15 items were investigated using an IRT-based approach to find the items corresponding most closely to the GDS-15 cut-off value of ≥ 5 points. Receiver operating characteristic curves were used to compare the performance of the proposed short version with that of previously proposed short GDS versions. RESULTS GDS-15 items 3, 8, 12, and 13 best differentiated respondents' levels of depressive symptoms corresponding to the GDS-15 cut-off value of ≥ 5, regardless of age or sex, and thus comprise the proposed short version of the scale (GDS-4 GERDA). For identifying individuals with depression (total GDS-15 score ≥ 5), the GDS-4 GERDA with a cut-off score of ≥ 2 had 92.9% sensitivity and 85.0% specificity. CONCLUSION The GDS-4 GERDA could be used as an optimized short version of the GDS-15 to screen for depression among adults aged ≥ 85 years.
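The general short-form exercise, choosing a handful of items and checking how well a short-score cut-off recovers the full-scale cut-off, can be illustrated on simulated data (purely hypothetical: item selection here uses item-total correlation as a crude stand-in for the paper's IRT-based selection, and neither the data nor the items correspond to the GERDA study):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5000, 15
z = rng.normal(size=n)                         # latent depressive-symptom level
diffs = rng.normal(scale=0.8, size=k)          # hypothetical item difficulties
items = (rng.uniform(size=(n, k)) < 1.0 / (1.0 + np.exp(-(1.5 * z[:, None] - diffs)))).astype(int)
full = items.sum(axis=1)
depressed = full >= 5                          # full-scale case definition

# pick the 4 items whose responses correlate most with the full score
corrs = [np.corrcoef(items[:, j], full)[0, 1] for j in range(k)]
short = items[:, np.argsort(corrs)[-4:]].sum(axis=1)
flag = short >= 2                              # short-form cut-off

sens = (flag & depressed).sum() / depressed.sum()
spec = (~flag & ~depressed).sum() / (~depressed).sum()
```

With a strongly unidimensional item set, a 4-item short form screened against the full-scale cut-off should recover most cases while keeping false positives moderate, which is the trade-off the ROC analysis in the abstract quantifies.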
Affiliation(s)
- Sanna Johansson
- Department of Community Medicine and Rehabilitation, Geriatric Medicine, Sunderby Research Unit, Umeå University, Umeå, Sweden
- Hugo Lövheim
- Department of Community Medicine and Rehabilitation, Geriatric Medicine, Umeå University, Umeå, Sweden
- Yngve Gustafson
- Department of Community Medicine and Rehabilitation, Geriatric Medicine, Umeå University, Umeå, Sweden
- Johan Niklasson
- Department of Community Medicine and Rehabilitation, Geriatric Medicine, Sunderby Research Unit, Umeå University, Umeå, Sweden
225
Verkuilen J. The Fisher information function and scoring in binary ideal point item response models: a cautionary tale. Br J Math Stat Psychol 2022; 75:182-197. PMID: 34687451; DOI: 10.1111/bmsp.12254.
Abstract
This article examines the Fisher information function, I(θ), and explores implications for scoring in binary ideal point item response models. These models typically appear to have I(θ) that is bimodal and identically equal to 0 at the ideal point. The article shows that this is an inherent property of ideal point IRT models, which either have this property or are indeterminate and thus violate the likelihood regularity conditions. For some models, the indeterminacy can be resolved, generating an effectively unimodal I(θ), albeit with violated regularity conditions. In other cases, I(θ) diverges. All reasonable ideal point IRT models exhibit this behaviour. Users should exercise caution when relying on asymptotics, particularly for shorter assessments. Use of simulated plausible values or prediction from a fully Bayesian estimation is recommended for scoring.
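The bimodal-information claim follows directly from the standard information formula for a binary item (a generic derivation, not reproduced from the article):

```latex
I(\theta) = \frac{\left[P'(\theta)\right]^{2}}{P(\theta)\,\left[1 - P(\theta)\right]}
```

At an ideal point δ the response function is single-peaked, so P'(δ) = 0. Provided P(δ) < 1, the numerator vanishes while the denominator stays positive, forcing I(δ) = 0 with positive information on either side, which is the bimodal shape described above. If instead P(δ) = 1, the ratio is 0/0 and the information is indeterminate, matching the regularity-condition caveat in the abstract.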
Affiliation(s)
- Jay Verkuilen
- Ph.D. Program in Educational Psychology, The City University of New York Graduate Center, New York, USA
226
Lyu W, Bolt DM. A psychometric model for respondent-level anchoring on self-report rating scale instruments. Br J Math Stat Psychol 2022; 75:116-135. PMID: 34350978; DOI: 10.1111/bmsp.12251.
Abstract
Among the various forms of response bias that can emerge with self-report rating scale assessments are those related to anchoring, the tendency for respondents to select categories in close proximity to the rating category used for the immediately preceding item. In this study we propose a psychometric model based on a multidimensional nominal model for response style that also simultaneously accommodates a respondent-level anchoring tendency. The model is estimated using a fully Bayesian estimation procedure. By applying this model to a real test data set measuring extraversion, we explore a theory that both response styles and anchoring might be viewed as evidence of a lack of effortful responding. Empirical results show that there is a positive correlation between the strength of midpoint response style and the anchoring effect; further, responses indicative of either anchoring or response style both negatively correlate with response time, consistent with a theory that both phenomena reflect reduced respondent effort. The results support attending to both anchoring and midpoint response style as ways of assessing respondent engagement.
Affiliation(s)
- Weicong Lyu
- Department of Educational Psychology, University of Wisconsin-Madison, Wisconsin, USA
- Daniel M Bolt
- Department of Educational Psychology, University of Wisconsin-Madison, Wisconsin, USA
227
Poulton A, Rutherford K, Boothe S, Brygel M, Crole A, Dali G, Bruns LR, Sinnott RO, Hester R. Evaluating untimed and timed abridged versions of Raven's Advanced Progressive Matrices. J Clin Exp Neuropsychol 2022; 44:73-84. PMID: 35658791; DOI: 10.1080/13803395.2022.2080185.
Abstract
INTRODUCTION Raven's Advanced Progressive Matrices (APM) are frequently utilized in clinical and experimental settings to index intellectual capacity. As the APM is a relatively long assessment, abridged versions of the test have been proposed. The psychometric properties of an untimed 12-item APM have received some consideration in the literature, but validity explorations have been limited. Moreover, neither the reliability nor the validity of a timed 12-item APM has previously been examined. METHOD We considered the psychometric properties of untimed (Study 1; N = 608; Mage = 27.89, SD = 11.68) and timed (Study 2; N = 479; Mage = 20.93, SD = 3.12) versions of a brief online 12-item form of the APM. RESULTS Confirmatory factor analyses established that both versions of the test are unidimensional. Item response theory analyses revealed that, in each case, the 12 items are characterized by distinct differences in difficulty, discrimination, and guessing. Differential item functioning analyses showed few male/female or native English/non-native English performance differences. Test-retest reliability ranged from .65 (Study 1) to .69 (Study 2). Both tests had medium-to-large correlations with the Wechsler Abbreviated Scale of Intelligence (2nd ed.) Perceptual Reasoning Index (r = .50, Study 1; r = .56, Study 2) and Full-Scale IQ (r = .34, Study 1; r = .41, Study 2). CONCLUSION In sum, results suggest both untimed and timed online versions of the brief APM are psychometrically sound. As test duration was found to be highly variable for the untimed version, the timed form might be a more suitable choice when it is likely to form part of a longer battery of tests. Nonetheless, classical test and item response theory analyses, plus validity considerations, suggest the untimed version might be the superior abridged form.
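The difficulty, discrimination, and guessing parameters referred to above are those of the three-parameter logistic (3PL) model. A minimal sketch of the 3PL item response function, with illustrative parameter values rather than the paper's estimates:

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response: a guessing floor c plus a
    logistic curve with discrimination a and difficulty b."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative item: moderate discrimination, somewhat hard, and a
# 1-in-4 guessing floor (as for 4-option multiple choice).
probs = [p_3pl(t, a=1.2, b=0.5, c=0.25) for t in (-3.0, 0.5, 3.0)]
```

At theta = b the probability is exactly halfway between the guessing floor and 1, which is why b is read as the item's difficulty.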
Affiliation(s)
- Antoinette Poulton
- Melbourne School of Psychological Sciences, University of Melbourne, Parkville, VIC, Australia
- Kathleen Rutherford
- Melbourne School of Psychological Sciences, University of Melbourne, Parkville, VIC, Australia
- Sarah Boothe
- Melbourne School of Psychological Sciences, University of Melbourne, Parkville, VIC, Australia
- Madeleine Brygel
- Melbourne School of Psychological Sciences, University of Melbourne, Parkville, VIC, Australia
- Alice Crole
- Melbourne School of Psychological Sciences, University of Melbourne, Parkville, VIC, Australia
- Gezelle Dali
- Melbourne School of Psychological Sciences, University of Melbourne, Parkville, VIC, Australia
- Loren Richard Bruns
- Computing and Information Systems, University of Melbourne, Parkville, VIC, Australia
- Richard O Sinnott
- Computing and Information Systems, University of Melbourne, Parkville, VIC, Australia
- Robert Hester
- Melbourne School of Psychological Sciences, University of Melbourne, Parkville, VIC, Australia

228
Abstract
The presence of rapid guessing (RG) presents a challenge to practitioners in obtaining accurate estimates of measurement properties and examinee ability. In response to this concern, researchers have utilized response times as a proxy of RG and have attempted to improve parameter estimation accuracy by filtering RG responses using popular scoring approaches, such as the effort-moderated item response theory (EM-IRT) model. However, such an approach assumes that RG can be correctly identified based on an indirect proxy of examinee behavior. A failure to meet this assumption leads to the inclusion of distortive and psychometrically uninformative information in parameter estimates. To address this issue, a simulation study was conducted to examine how violations to the assumption of correct RG classification influences EM-IRT item and ability parameter estimation accuracy and compares these results with parameter estimates from the three-parameter logistic (3PL) model, which includes RG responses in scoring. Two RG misclassification factors were manipulated: type (underclassification vs. overclassification) and rate (10%, 30%, and 50%). Results indicated that the EM-IRT model provided improved item parameter estimation over the 3PL model regardless of misclassification type and rate. Furthermore, under most conditions, increased rates of RG underclassification were associated with the greatest bias in ability parameter estimates from the EM-IRT model. In spite of this, the EM-IRT model with RG misclassifications demonstrated more accurate ability parameter estimation than the 3PL model when the mean ability of RG subgroups did not differ. This suggests that in certain situations it may be better for practitioners to (a) imperfectly identify RG than to ignore the presence of such invalid responses and (b) select liberal over conservative response time thresholds to mitigate bias from underclassified RG.
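The EM-IRT approach described above scores only responses classified as solution behavior, using response time as the rapid-guessing proxy. A minimal sketch of that filtering step; the threshold and data are hypothetical:

```python
import math

def effort_moderated_loglik(responses, times, probs, threshold):
    """Bernoulli log-likelihood summed over solution-behavior responses only:
    responses faster than the RT threshold are treated as rapid guesses and
    excluded from scoring."""
    ll = 0.0
    for x, t, p in zip(responses, times, probs):
        if t >= threshold:  # keep only effortful (solution-behavior) responses
            ll += x * math.log(p) + (1 - x) * math.log(1 - p)
    return ll

# Hypothetical examinee: the 1-second response is flagged as a rapid guess.
ll = effort_moderated_loglik([1, 0, 1], [12.0, 1.0, 8.0],
                             [0.7, 0.4, 0.6], threshold=3.0)
```

A misclassification of the kind the paper studies amounts to choosing the wrong threshold, so that effortful responses are dropped (overclassification) or rapid guesses are retained (underclassification).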
Affiliation(s)
- Joseph A. Rios
- University of Minnesota, Twin Cities, Minneapolis, MN, USA

229
Sanchez-Garcia M, de la Rosa-Cáceres A, Díaz-Batanero C, Fernández-Calderón F, Lozano OM. Cocaine use disorder criteria in a clinical sample: an analysis using item response theory, factor and network analysis. Am J Drug Alcohol Abuse 2022; 48:284-292. [PMID: 35100067 DOI: 10.1080/00952990.2021.2012185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Abstract
BACKGROUND The conceptualization of substance use disorders (SUDs) has been modified across successive editions of the DSM. The dimensionality and the inclusion/exclusion of several criteria have been studied using various analytic approaches. OBJECTIVE The study aimed to deepen our knowledge of the interrelationships between the diagnostic criteria for cocaine use disorder (CUD) by applying three different analytical techniques: factor analysis, Item Response Theory (IRT) models, and network analysis. METHODS 425 outpatients (85.4% male) were evaluated for CUD using the Substance Dependence Severity Scale. Confirmatory factor analysis, a 2-parameter logistic model (IRT), and network analysis were applied to analyze the relationships between the diagnostic criteria. RESULTS The results show that the "legal problems" criterion is not congruent with the CUD measure across all three analyses. Network analysis also suggests the usefulness of the "craving" criterion. The "quit/control" criterion presents the best centrality and expected-influence indices, showing strong relationships with the "craving," "tolerance," "neglect roles," and "activities given up" criteria. CONCLUSIONS Network analysis appears to be a useful complement to factor analysis and IRT for understanding CUD. The "quit/control" criterion emerges as central to understanding CUD.
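Expected influence, the network centrality index highlighted above, can be sketched for a signed, weighted network: it is the sum of a node's signed edge weights, whereas strength sums their absolute values. The edge weights below are hypothetical, not the paper's estimated network:

```python
def expected_influence(weights):
    """One-step expected influence per node: row sums of the signed
    edge-weight matrix (diagonal ignored)."""
    n = len(weights)
    return [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]

def strength(weights):
    """Strength centrality: row sums of absolute edge weights."""
    n = len(weights)
    return [sum(abs(weights[i][j]) for j in range(n) if j != i) for i in range(n)]

# Hypothetical signed network over four criteria (not the paper's estimates).
W = [[0.0, 0.3, 0.2, 0.0],
     [0.3, 0.0, -0.1, 0.4],
     [0.2, -0.1, 0.0, 0.1],
     [0.0, 0.4, 0.1, 0.0]]
ei = expected_influence(W)
```

Negative edges make expected influence smaller than strength, which is why the two indices can rank nodes differently.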
Affiliation(s)
- M Sanchez-Garcia
- Department of Clinical and Experimental Psychology, University of Huelva, Huelva, Spain; Research Center for Natural Resources, Health and The Environment, University of Huelva, Huelva, Spain
- A de la Rosa-Cáceres
- Department of Clinical and Experimental Psychology, University of Huelva, Huelva, Spain
- C Díaz-Batanero
- Department of Clinical and Experimental Psychology, University of Huelva, Huelva, Spain; Research Center for Natural Resources, Health and The Environment, University of Huelva, Huelva, Spain
- F Fernández-Calderón
- Department of Clinical and Experimental Psychology, University of Huelva, Huelva, Spain; Research Center for Natural Resources, Health and The Environment, University of Huelva, Huelva, Spain
- O M Lozano
- Department of Clinical and Experimental Psychology, University of Huelva, Huelva, Spain; Research Center for Natural Resources, Health and The Environment, University of Huelva, Huelva, Spain

230
Steinberg L, Rogers A. Changing the Scale: The Effect of Modifying Response Scale Labels on the Measurement of Personality and Affect. Multivariate Behav Res 2022; 57:79-93. [PMID: 32876478 DOI: 10.1080/00273171.2020.1807305] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]
Abstract
Much research in psychology is based on self-report questionnaire data using items with Likert-type response scales. Often the same items are administered with different response scale labels in different studies. Using measures of personality and affect, the effect of type of label (bipolar or unipolar) on the categorical item responses was investigated with the methods of item response theory (IRT). In two studies, the effect of type of label was examined in the context of all options labeled and only endpoint options labeled. In Study 1, we found that when every number of a response scale is labeled, the responses to the same items differ between bipolar (agree-disagree) and unipolar (not at all - very much) labels. Study 2 showed that these differences are not observed when only the endpoints are labeled. The findings are discussed in terms of their implications for measurement and research reporting of personality, clinical, health, social, and other psychological constructs. IRT methods offer a way to increase our understanding of the psychological processes underlying answering questions.
231
Jiménez S, Moral de la Rubia J, Varela-Garay RM, Merino-Soto C, Toledano-Toledano F. Resilience measurement scale in family caregivers of children with cancer: Multidimensional item response theory modeling. Front Psychiatry 2022; 13:985456. [PMID: 36727086 PMCID: PMC9885114 DOI: 10.3389/fpsyt.2022.985456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Accepted: 12/23/2022] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Currently, information about the psychometric properties of the Resilience Measurement Scale (RESI-M) in family caregivers of children with cancer according to item response theory (IRT) is not available; this information could complement and confirm the findings available from classical test theory (CTT). The objective of this study was to test the five-factor structure of the RESI-M using a full-information confirmatory multidimensional IRT graded response model and to estimate the multidimensional item-level parameters of discrimination (MDISC) and difficulty (MDIFF) of the RESI-M to investigate its construct validity and level of measurement error. METHODS An observational study was carried out with a sample of 633 primary caregivers of children with cancer, recruited through nonprobabilistic sampling. The caregivers responded to a battery of tests that included a sociodemographic questionnaire, the RESI-M, and measures of depression, quality of life, anxiety, and caregiver burden to explore convergent and divergent validity. RESULTS The main findings confirmed the five-factor structure of the RESI-M, with RMSEA = 0.078 (95% CI: 0.075, 0.080), TLI = 0.90, and CFI = 0.91. The estimated MDISC and MDIFF parameters differed across items, showing that all the items contribute differentially to the measurement of the dimensions of resilience. CONCLUSION Regardless of the measurement approach (IRT or CTT), the five-factor model of the RESI-M is valid at the theoretical, empirical, and methodological levels.
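The MDISC and MDIFF parameters mentioned above have standard closed forms in multidimensional IRT: MDISC is the Euclidean norm of an item's slope vector, and MDIFF is the negative intercept divided by MDISC. A minimal sketch with hypothetical values:

```python
import math

def mdisc(a):
    """Multidimensional discrimination: the Euclidean norm of the item's
    slope (discrimination) vector."""
    return math.sqrt(sum(ak * ak for ak in a))

def mdiff(a, d):
    """Multidimensional difficulty: -d / MDISC, the signed distance from the
    origin to the point of steepest ascent along the item's direction."""
    return -d / mdisc(a)

# Hypothetical item loading on two of the five RESI-M dimensions.
slopes, intercept = [1.2, 0.5], 0.8
disc, diff = mdisc(slopes), mdiff(slopes, intercept)
```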
Affiliation(s)
- Said Jiménez
- Unidad de Investigación en Medicina Basada en Evidencias, Hospital Infantil de México Federico Gómez, National Institute of Health, Mexico City, Mexico
- Rosa María Varela-Garay
- Departamento de Trabajo Social y Servicios Sociales, Facultad de Ciencias Sociales, Universidad Pablo de Olavide, Seville, Spain
- Cesar Merino-Soto
- Instituto de Investigación en Psicología, Universidad de San Martin de Porres, Lima, Peru
- Filiberto Toledano-Toledano
- Unidad de Investigación en Medicina Basada en Evidencias, Hospital Infantil de México Federico Gómez, National Institute of Health, Mexico City, Mexico; Unidad de Investigación Sociomédica, Instituto Nacional de Rehabilitación Luis Guillermo Ibarra Ibarra, Mexico City, Mexico; Dirección de Investigación y Diseminación del Conocimiento, Instituto Nacional de Ciencias e Innovación para la Formación de Comunidad Científica, INDEHUS, Mexico City, Mexico

232
Abstract
Cumulative sum (CUSUM) and change-point analysis (CPA) are two well-established statistical process control methods for detecting changes in a sequence. Both have been used in psychometric research to detect aberrant responses in a response sequence, e.g., test speededness, inattentiveness, or cheating. However, the pros and cons of CUSUM and CPA in different testing settings remain unclear. In this paper, we conduct a comprehensive comparison of the performance of twelve CUSUM-based statistics and three CPA-based procedures in detecting test speededness. Two speededness mechanisms are considered, namely the gradual change model (GCM) and the hybrid model (HM), to test the robustness and flexibility of the two methods. Simulation studies show that the performance of the statistics is affected by the underlying data-generating model, the severity of speededness, and the test length. Generally, under the HM some CUSUM statistics perform much better than the CPA-based statistics, while under the GCM the performance of the CPA statistics improves dramatically. Taken together, because the mechanism of speededness is unknown in real applications, two CUSUM-based statistics are recommended when the test is long (e.g., 80 items), regardless of whether the underlying mechanism is the HM or the GCM. In a relatively short (e.g., 40 items) or medium-length (e.g., 60 items) test, no statistic always ends up in the top three under both the HM and the GCM; in those cases, either of the two CUSUM-based statistics mentioned above is a reasonable choice because of their good (though not necessarily the best) performance across a wide range of conditions.
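A generic one-sided CUSUM of the kind compared above can be sketched in a few lines. The reference value k and the residual sequence are illustrative, and the twelve statistics studied in the paper differ in their exact definitions:

```python
def cusum_upper(residuals, k=0.5):
    """One-sided upper CUSUM over standardized residuals:
    C_t = max(0, C_{t-1} + z_t - k). A large maximum signals a change,
    such as the onset of speededness late in a test."""
    c, path = 0.0, []
    for z in residuals:
        c = max(0.0, c + z - k)
        path.append(c)
    return path

# Hypothetical residual sequence: in control early, drifting late in the test.
path = cusum_upper([0.1, -0.2, 0.0, 1.5, 1.2, 1.8], k=0.5)
```

The statistic resets to zero while the sequence is in control and accumulates once residuals persistently exceed k, which is why longer tests give CUSUM more opportunity to detect the change.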
Affiliation(s)
- Xiaofeng Yu
- Department of Psychology, University of Notre Dame
- Department of Psychology, Jiangxi Normal University
- Ying Cheng
- Department of Psychology, University of Notre Dame

233
Abstract
Researchers in the social sciences often obtain ratings of a construct of interest provided by multiple raters. While using multiple raters provides a way to help avoid the subjectivity of any given person's responses, rater disagreement can be a problem. A variety of models exist to address rater disagreement in both structural equation modeling and item response theory frameworks. Recently, a model was developed by Bauer et al. (2013) and referred to as the "trifactor model" to provide applied researchers with a straightforward way of estimating scores that are purged of variance that is idiosyncratic by rater. Although the intent of the model is to be usable and interpretable, little is known about the circumstances under which it performs well, and those it does not. We conduct simulation studies to examine the performance of the trifactor model under a range of sample sizes and model specifications and then compare model fit, bias, and convergence rates.
Affiliation(s)
- James Soland
- University of Virginia, Charlottesville, VA, USA
- NWEA, Portland, OR, USA

234
Ghoshal A, O'Carroll RE, Ferguson E, Shepherd L, Doherty S, Mathew M, Morgan K, Doyle F. Assessing medical mistrust in organ donation across countries using item response theory. J Health Psychol 2021; 27:2806-2819. [PMID: 34963351 DOI: 10.1177/13591053211064985] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022] Open
Abstract
Although medical mistrust (MM) may be an impediment to public health interventions, no MM scale has been validated across countries, and the assessment of MM has not been explored using item response theory, which allows generalisation beyond the sampled data. We aimed to determine the dimensionality of a brief MM measure across four countries through Mokken analysis and graded response modelling. Analysis of 1468 participants from the UK (n = 1179), Ireland (n = 191), India (n = 49) and Malaysia (n = 49) demonstrated that the MM items form a hierarchical, unidimensional measure that is most informative at high levels of MM. Possible item reductions and scoring changes were also demonstrated. This study shows that this brief MM measure is suitable for international, cross-cultural studies, as it is unidimensional across countries, and that minor adjustments will not affect the assessment of MM when using these items.
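The Mokken analysis referred to above rests on scalability coefficients; for an item pair, Loevinger's H is one minus the ratio of observed to expected Guttman errors. A minimal sketch on hypothetical dichotomous responses:

```python
def h_pair(x, y):
    """Loevinger's H for a pair of dichotomous items (0/1 lists): 1 minus the
    ratio of observed Guttman errors to the count expected under marginal
    independence. A Guttman error is endorsing the harder (less popular)
    item while rejecting the easier one."""
    n = len(x)
    easy, hard = (x, y) if sum(x) >= sum(y) else (y, x)
    errors = sum(1 for e, h in zip(easy, hard) if e == 0 and h == 1)
    expected = n * (1 - sum(easy) / n) * (sum(hard) / n)
    return 1.0 - errors / expected

# Hypothetical responses: a perfect Guttman pattern yields H = 1.
x = [1, 1, 1, 1, 0, 0]  # easier item
y = [1, 1, 0, 0, 0, 0]  # harder item
h = h_pair(x, y)
```

In Mokken scaling, pairwise H values around 0.3 or higher are conventionally taken as evidence of a (weak) scale; a hierarchical, unidimensional item set like the one reported above shows uniformly positive H.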
Affiliation(s)
- Arunangshu Ghoshal
- Tata Memorial Centre, India; Homi Bhaba National Institute (HBNI), India
- Karen Morgan
- Perdana University-Royal College of Surgeons in Ireland School of Medicine, Malaysia
- Frank Doyle
- Royal College of Surgeons in Ireland, Ireland

235
Tsai KT, Chien TW, Lin JK, Yeh YT, Chou W. Comparison of prediction accuracies between mathematical models to make projections of confirmed cases during the COVID-19 pandemic by country/region. Medicine (Baltimore) 2021; 100:e28134. [PMID: 34918666 PMCID: PMC8677971 DOI: 10.1097/md.0000000000028134] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/25/2021] [Revised: 09/23/2021] [Accepted: 11/14/2021] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND The COVID-19 pandemic had caused more than 228 million infected cases as of September 18, 2021, implying exponential growth of infection worldwide. Many mathematical models have been proposed to predict the future cumulative number of infected cases (CNICs). Nevertheless, none has compared the models' prediction accuracies. In this work, we compared mathematical models recently published in scholarly journals and designed online dashboards that present actual information about COVID-19. METHODS All CNICs were downloaded from GitHub. Model R2 was compared across three models based on the quadratic equation (QE), a modified QE (QE-m), and item response theory (IRT) using the paired t-test and analysis of variance (ANOVA). The Kano diagram was applied to display the association and the difference in model R2 on a dashboard. RESULTS The correlation coefficient between the QE and IRT models based on R2 was 0.48 (t = 9.87, n = 265) when modeling CNICs over a short run (January 1 to February 16, 2021). A significant difference in R2 was found (P < .001, F = 53.32), with mean R2 of 0.98, 0.92, and 0.84 for IRT, QE-m, and QE, respectively. The IRT-based COVID-19 model was superior to its QE-m and QE counterparts in model R2, particularly over a longer period of infected days (i.e., the entire year of 2020). CONCLUSION An online dashboard was demonstrated to display the association and difference in prediction accuracy among predictive models. The IRT mathematical model is recommended for making projections about the evolution of CNICs for each country/region in future applications, not limited to the COVID-19 epidemic.
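The QE model's R2 referred to above can be sketched by fitting a quadratic to a CNIC series by least squares and computing the coefficient of determination. The series below is synthetic, not the GitHub data used in the paper:

```python
import numpy as np

def quadratic_r2(days, cases):
    """Fit cases ~ b2*t^2 + b1*t + b0 by least squares and return model R2."""
    coefs = np.polyfit(days, cases, deg=2)
    fitted = np.polyval(coefs, days)
    ss_res = float(np.sum((cases - fitted) ** 2))
    ss_tot = float(np.sum((cases - np.mean(cases)) ** 2))
    return 1.0 - ss_res / ss_tot

# Synthetic CNIC series that is exactly quadratic, so R2 should be ~1.
t = np.arange(10, dtype=float)
cnic = 5.0 + 2.0 * t + 0.8 * t ** 2
r2 = quadratic_r2(t, cnic)
```

Real CNIC curves deviate from a quadratic over long horizons, which is consistent with the lower mean R2 the paper reports for the QE model relative to IRT.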
Affiliation(s)
- Kang-Ting Tsai
- Center for Integrative Medicine, ChiMei Medical Center, Tainan, Taiwan
- Department of Geriatrics and Gerontology, ChiMei Medical Center, Tainan, Taiwan
- Department of Senior Welfare and Services, Southern Taiwan University of Science and Technology, Tainan, Taiwan
- Tsair-Wei Chien
- Department of Medical Research, Chiali Chi-Mei Hospital, Tainan, Taiwan
- Ju-Kuo Lin
- Department of Ophthalmology, Chi-Mei Medical Center, Tainan, Taiwan
- Department of Optometry, Chung Hwa University of Medical Technology, Tainan, Taiwan
- Yu-Tsen Yeh
- Department of Ophthalmology, Chi-Mei Medical Center, Tainan, Taiwan
- Medical School, St. George's University of London, London, United Kingdom
- Willy Chou
- Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, Tainan, Taiwan
- Department of Physical Medicine and Rehabilitation, Chung San Medical University Hospital, Taichung, Taiwan

236
Rezapour M, Veenstra C, Cuccolo K, Ferraro FR. Properties of a Transport Instrument for Measuring Psychological Impacts of Delay on Commuters, Mokken Scale Analysis. Front Psychol 2021; 12:748899. [PMID: 34970187 PMCID: PMC8712429 DOI: 10.3389/fpsyg.2021.748899] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/28/2021] [Accepted: 11/05/2021] [Indexed: 11/13/2022] Open
Abstract
This study assessed the validity of an instrument covering various negative psychological and physical reactions of commuters to public transport delay. Instruments have mostly been evaluated with the parametric methods of item response theory (IRT). However, IRT rests on some restrictive assumptions about the data and focuses on detailed model-fit evaluation. Mokken scale analysis (MSA), by contrast, is a non-parametric scaling procedure that does not require the data to follow any particular distribution. The results of the study show that, in most regards, our instrument meets the minimum requirements highlighted by the MSA. However, the instrument did not meet the minimum scalability requirements for two items, "stomach pain" and "increased heart rate", so modifications were proposed to address these violations. Although the MSA technique has been used frequently in other fields, this is one of the earliest studies to implement it in the context of transport psychology.
Affiliation(s)
- Cristopher Veenstra
- Department of Psychology, University of North Dakota, Grand Forks, ND, United States
- Kelly Cuccolo
- Department of Psychology, University of North Dakota, Grand Forks, ND, United States
- F. Richard Ferraro
- Department of Psychology, University of North Dakota, Grand Forks, ND, United States

237
Chinnarasri P, Wongpakaran N, Wongpakaran T. Developing and Validating the Narcissistic Personality Scale (NPS) among Older Thai Adults Using Rasch Analysis. Healthcare (Basel) 2021; 9(12):1717. [PMID: 34946443 PMCID: PMC8701268 DOI: 10.3390/healthcare9121717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 12/02/2021] [Accepted: 12/09/2021] [Indexed: 11/21/2022] Open
Abstract
Background: Growing older can be stressful, especially for people with narcissistic personality disorder; nevertheless, no tool to detect it has been available for older Thai individuals. This study aimed to develop a tool to detect symptoms of narcissistic personality and to validate its psychometric properties among older Thai adults. Methods: The Narcissistic Personality Scale (NPS) was developed based on the nine domain symptoms of narcissistic personality disorder in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), initially consisting of 80 items. The original scale was field-tested using Rasch analysis for item reduction, yielding 43 items. The NPS was then investigated among 296 seniors aged 60 years or older, with Rasch analysis used to assess its construct validity. Results: Of the 43 items, 17 more were removed because their infit or outfit mean squares exceeded 1.5. The final 26-item NPS met all necessary criteria of unidimensionality and local independence, showed no differential item functioning due to age or sex, and targeted the subjects well. Person and item reliability were 0.88 and 0.95, respectively. No disordered thresholds or categories were found. Conclusions: The NPS is a promising tool with construct validity established under the Rasch measurement model among Thai seniors. This new questionnaire can be used as an outcome measure in clinical practice.
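The infit and outfit mean-square criteria used above for item removal have simple forms for dichotomous Rasch data: outfit is the mean standardized squared residual, and infit is the information-weighted version. A minimal sketch with hypothetical responses and model probabilities:

```python
def rasch_fit(responses, probs):
    """Outfit and infit mean squares for dichotomous Rasch data: outfit is the
    mean standardized squared residual; infit weights residuals by item
    information p*(1-p), so it is less sensitive to off-target outliers."""
    z2 = [(x - p) ** 2 / (p * (1 - p)) for x, p in zip(responses, probs)]
    outfit = sum(z2) / len(z2)
    infit = (sum((x - p) ** 2 for x, p in zip(responses, probs))
             / sum(p * (1 - p) for p in probs))
    return outfit, infit

# Hypothetical item: one surprising response inflates outfit more than infit.
outfit, infit = rasch_fit([1, 1, 0, 1], [0.9, 0.8, 0.3, 0.05])
```

Values near 1.0 indicate model-consistent randomness; the 1.5 cutoff used above flags items with substantially more noise than the Rasch model expects.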
238
Schwartz CE, Stucky BD, Stark RB. Expanding the purview of wellness indicators: validating a new measure that includes attitudes, behaviors, and perspectives. Health Psychol Behav Med 2021; 9:1031-1052. [PMID: 34881116 PMCID: PMC8648008 DOI: 10.1080/21642850.2021.2008940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Open
Abstract
Objective The present study validated the DeltaQuest Wellness Measure (DQ Wellness), a new 15-item measure of wellness that spans relevant attitudes, behaviors, and perspectives. Design This cross-sectional web-based study recruited chronically ill patients and/or caregivers (n = 3,961) and a nationally representative comparison group (n = 855). Main Outcome Measures The DQ Wellness assesses: a way of being in the world that involves seeing and embracing the good and expressing kindness toward others; engagement in one's activities and self-care; downplaying negative thoughts that reduce one's energy; and an ability to feel joy. Six widely used measures of physical and mental health, cognition, and psychological well-being enabled construct-validity comparisons. Item response theory (IRT) methods evaluated reliability, factor structure, and differential item functioning (DIF) by gender. Results The DQ Wellness showed strong cross-sectional reliability (marginal reliability = 0.89) and fit a bifactor model (RMSEA = 0.063, CFI = 0.982, TLI = 0.983). The DQ Wellness general score demonstrated construct validity, convergent and divergent validity, unique variance, known-groups validity, and minimal gender DIF. The study is limited to addressing cross-sectional reliability and validity, and response rates are not known due to the recruitment source. Conclusion The DQ Wellness is a relatively brief measure that taps novel content and could be useful for observational or interventional studies.
Affiliation(s)
- Carolyn E Schwartz
- DeltaQuest Foundation, Inc., Concord, MA, USA; Departments of Medicine and Orthopaedic Surgery, Tufts University Medical School, Boston, MA, USA

239
Abstract
More than 40 questionnaires have been developed to assess functional somatic symptoms (FSS), but there are several methodological issues regarding the measurement of FSS. We aimed to identify which items of the somatization subscale of the Symptom Checklist-90 (SCL-90) are more informative and discriminative between persons at different levels of severity of FSS. To this end, item response theory was applied to the somatization scale of the SCL-90, collected from a sample of 82,740 adult participants without somatic conditions in the Lifelines Cohort Study. Sensitivity analyses were performed with all the participants who completed the somatization scale. Both analyses showed that Items 11 "feeling weak physically" and 12 "heavy feelings in arms or legs" were the most discriminative and informative to measure severity levels of FSS, regardless of somatic conditions. Clinicians and researchers may pay extra attention to these symptoms to augment the assessment of FSS.
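Which items are "most informative" can be made concrete with the item information function. For a dichotomous 2PL item (a simplification of the polytomous models typically fitted to SCL-90 items), information at theta is a^2 * P * (1 - P), which peaks at theta = b. A sketch with hypothetical parameters:

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a dichotomous 2PL item at theta:
    a^2 * P * (1 - P), where P is the item response probability."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Information peaks at theta = b, so a high-difficulty, high-discrimination
# item is most informative about severe symptom levels.
peak = info_2pl(1.5, a=2.0, b=1.5)
```

This is why the two items singled out above, being both discriminative and informative, dominate measurement precision over a range of FSS severity.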
Affiliation(s)
- Angélica Acevedo-Mesa
- University of Groningen, University Medical Center Groningen, Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), Groningen, the Netherlands
- Jorge Nunes Tendeiro
- University of Groningen, Department of Psychometrics and Statistics, Groningen, the Netherlands
- Annelieke Roest
- University of Groningen, Department of Developmental Psychology, Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), Groningen, the Netherlands
- Judith G M Rosmalen
- University of Groningen, University Medical Center Groningen, Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), Groningen, the Netherlands
- Rei Monden
- University of Groningen, University Medical Center Groningen, Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), Groningen, the Netherlands

240
Lozano JH, Revuelta J. A Bayesian Generalized Explanatory Item Response Model to Account for Learning During the Test. Psychometrika 2021; 86:994-1015. [PMID: 34460068 PMCID: PMC8636451 DOI: 10.1007/s11336-021-09786-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2019] [Accepted: 06/17/2021] [Indexed: 06/13/2023]
Abstract
The present paper introduces a new explanatory item response model to account for the learning that takes place during a psychometric test due to the repeated use of the operations involved in the items. The proposed model is an extension of the operation-specific learning model (Fischer and Formann in Appl Psychol Meas 6:397-416, 1982; Scheiblechner in Z für Exp Angew Psychol 19:476-506, 1972; Spada in Spada and Kempf (eds.) Structural models of thinking and learning, Huber, Bern, Germany, pp 227-262, 1977). The paper discusses special cases of the model, which, together with the general formulation, differ in the type of response in which the model states that learning occurs: (1) correct and incorrect responses equally (non-contingent learning); (2) correct responses only (contingent learning); and (3) correct and incorrect responses to a different extent (differential contingent learning). A Bayesian framework is adopted for model estimation and evaluation. A simulation study is conducted to examine the performance of the estimation and evaluation methods in recovering the true parameters and selecting the true model. Finally, an empirical study is presented to illustrate the applicability of the model to detect learning effects using real data.
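The operation-specific learning idea above can be sketched as a Rasch-type model in which each prior use of an operation shifts the success logit by a learning parameter delta. This is a non-contingent-learning simplification with hypothetical values, not the paper's full Bayesian specification:

```python
import math

def p_learning(theta, b, delta, prior_practice):
    """Rasch-type success probability with an operation-specific learning
    effect: each prior use of the operation shifts the logit by delta."""
    logit = theta - b + delta * prior_practice
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical: the same operation gets easier after two practiced uses.
p0 = p_learning(0.0, 0.5, 0.3, prior_practice=0)
p2 = p_learning(0.0, 0.5, 0.3, prior_practice=2)
```

The contingent variants described above would count only prior correct responses (or weight correct and incorrect responses differently) when computing prior_practice.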
241
Xu C, Christensen JM, Haykal T, Asaad M, Sidey-Gibbons C, Schaverien M. Measurement Properties of the Lymphedema Life Impact Scale. Lymphat Res Biol 2021; 20:425-434. [PMID: 34842442 DOI: 10.1089/lrb.2021.0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Background: The updated Lymphedema Life Impact Scale (LLIS, version 2) has been widely used to evaluate the effect of lymphedema from the patient's perspective. We sought to assess its ability to accurately and efficiently measure lymphedema-related impact using modern psychometric techniques. Methods and Results: We collected a total of 1054 patient-reported outcome measure scores from 285 patients with upper extremity lymphedema and 65 patients with lower extremity lymphedema between 2016 and 2020. We first evaluated the relationship between the LLIS score, L-Dex score, and limb volume difference (LVD), and used classical test and item response theories to assess its psychometric performance. The LLIS score was only very weakly associated with LVD (r = 0.17, p < 0.001) and L-Dex score (r = 0.22, p < 0.001). The LLIS had acceptable dimensionality. Items 7 (affects body image) and 16 (affects proper fit of clothing/shoes) were locally dependent (Yen's Q3 = 0.45). Eight of the 17 items were interpreted differently between upper and lower limb lymphedema patients (pseudo R2 ≥ 0.01). The scoring structure required correction for items 9 (affects intimate relations) and 12 (manages lymphedema). Removing items 18 (infection occurrence) and 7 resulted in substantially improved item response theory model fit (Tucker-Lewis index = 0.93, comparative fit index = 0.95, root mean square error of approximation = 0.07, and root mean square of the residual = 0.06). The relationships between the LLIS and objective measures of lymphedema remained weak following modification (LVD: r = 0.13, p = 0.01; L-Dex: r = 0.26, p < 0.001). Conclusion: We were able to slightly improve the psychometric properties of the LLIS. However, these improvements did not rectify apparent issues with construct validity, and both versions of the LLIS displayed a weak relationship with objective measures of lymphedema severity.
Affiliation(s)
- Cai Xu
- MD Anderson Center for INSPiRED Cancer Care (Integrated Systems for Patient-Reported Data), The University of Texas MD Anderson Cancer Center, Houston, Texas, USA; Department of Symptom Research, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Joani M Christensen
- Department of Plastic Surgery, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Tareck Haykal
- Department of Plastic Surgery, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Malke Asaad
- Department of Plastic Surgery, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Chris Sidey-Gibbons
- MD Anderson Center for INSPiRED Cancer Care (Integrated Systems for Patient-Reported Data), The University of Texas MD Anderson Cancer Center, Houston, Texas, USA; Department of Symptom Research, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Mark Schaverien
- Department of Plastic Surgery, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
|
242
|
Battauz M, Bellio R. Shrinkage estimation of the three-parameter logistic model. Br J Math Stat Psychol 2021; 74:591-609. [PMID: 33734439 DOI: 10.1111/bmsp.12241] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/04/2020] [Revised: 01/19/2020] [Indexed: 06/12/2023]
Abstract
The three-parameter logistic model is widely used to model the responses to a proficiency test when the examinees can guess the correct response, as is the case for multiple-choice items. However, the weak identifiability of the parameters of the model results in large variability of the estimates and in convergence difficulties in the numerical maximization of the likelihood function. To overcome these issues, this paper explores various shrinkage estimation methods, following two main approaches. First, a ridge-type penalty on the guessing parameters is introduced in the likelihood function; the tuning parameter is then selected through cross-validation, information criteria, or an empirical Bayes method. The second approach is based on the methodology developed to reduce the bias of the maximum likelihood estimator through an adjusted score equation. The performance of the methods is investigated through simulation studies and a real data example.
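The ridge-penalty idea can be illustrated with a minimal sketch: a 3PL likelihood plus a quadratic penalty that pulls the guessing parameters toward a common value. For simplicity the sketch treats the abilities as known (the paper works with the marginal likelihood over latent abilities), and all parameter values, the penalty target (0.2), and the tuning value are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic function

rng = np.random.default_rng(0)
n_persons, n_items = 500, 5

# Simulate dichotomous responses from a 3PL model
theta = rng.normal(size=n_persons)               # abilities (treated as known here)
a_true = rng.uniform(0.8, 2.0, n_items)          # discriminations
b_true = rng.normal(size=n_items)                # difficulties
c_true = rng.uniform(0.1, 0.3, n_items)          # guessing parameters
p_true = c_true + (1 - c_true) * expit(a_true * (theta[:, None] - b_true))
y = (rng.random((n_persons, n_items)) < p_true).astype(float)

def neg_penalized_loglik(x, lam):
    """Negative 3PL log-likelihood with a ridge-type penalty that
    shrinks the guessing parameters toward a common value (0.2)."""
    a = np.exp(x[:n_items])            # keep discriminations positive
    b = x[n_items:2 * n_items]
    c = expit(x[2 * n_items:])         # keep guessing in (0, 1)
    p = c + (1 - c) * expit(a * (theta[:, None] - b))
    p = np.clip(p, 1e-10, 1 - 1e-10)   # numerical safety
    ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return -ll + lam * np.sum((c - 0.2) ** 2)

# Start at a = 1, b = 0, c ≈ 0.2 and maximize the penalized likelihood
x0 = np.concatenate([np.zeros(n_items), np.zeros(n_items),
                     np.full(n_items, -1.4)])    # expit(-1.4) ≈ 0.2
fit = minimize(neg_penalized_loglik, x0, args=(50.0,), method="L-BFGS-B")
c_hat = expit(fit.x[2 * n_items:])               # shrunken guessing estimates
```

In practice the tuning parameter (here fixed at 50.0) would be chosen by cross-validation, an information criterion, or the empirical Bayes method the abstract mentions.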
Affiliation(s)
- Michela Battauz
- Department of Economics and Statistics, University of Udine, Italy
- Ruggero Bellio
- Department of Economics and Statistics, University of Udine, Italy
|
243
|
Paller AS, Lai JS, Jackson K, Rangel SM, Nowinski C, Silverberg JI, Ustsinovich V, Cella D. Generation and Validation of the PROMIS Itch Questionnaire - Child to Measure the Impact of Itch on Life Quality. J Invest Dermatol 2021; 142:1309-1317.e1. [PMID: 34757070 DOI: 10.1016/j.jid.2021.10.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2021] [Revised: 09/24/2021] [Accepted: 10/06/2021] [Indexed: 11/28/2022]
Abstract
Itch compromises quality of life, but most itch assessments focus only on itch intensity. We aimed to develop and validate a comprehensive PROMIS (Patient Reported Outcomes Measurement Information System®) pediatric measure for itch symptoms and itch impact, defined as the effect specifically of itch on physical, mental, and social health, all of which can affect life quality. After literature review, concept elicitation and cognitive interviews with parents and children with itch, and repeated content-expert review, an item pool was generated and refined. The pool was calibrated with data from 499 pruritic children using exploratory and confirmatory factor analyses, item response theory, and item fit analysis. The resultant 45-item bank, PROMIS Itch Questionnaire - Child (PIQ-C), showed good convergent and discriminant validity in 181 children 8-17 years of age, discriminating children with different levels of severity, and was responsive to change. Strong correlations (rho > 0.60) were observed with pain and sleep measures, and moderate correlations with other pediatric PROMIS measures. PIQ-C comprehensively measures itch intensity and burden, providing an itch-specific alternative for assessing life quality. The independent calibration of each item/question allows for flexibility in generating short forms or computerized adaptive testing for efficient use in research and office practice.
Affiliation(s)
- Amy S Paller
- Department of Dermatology, Northwestern University Feinberg School of Medicine, Chicago, IL
- Jin-Shei Lai
- Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL
- Kathryn Jackson
- Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL
- Stephanie M Rangel
- Department of Dermatology, Northwestern University Feinberg School of Medicine, Chicago, IL
- Cindy Nowinski
- Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL
- Jonathan I Silverberg
- Department of Dermatology, Northwestern University Feinberg School of Medicine, Chicago, IL
- Vitali Ustsinovich
- Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL
- David Cella
- Department of Medical Social Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL
|
244
|
Rosenthal MZ, Anand D, Cassiello-Robbins C, Williams ZJ, Guetta RE, Trumbull J, Kelley LD. Development and Initial Validation of the Duke Misophonia Questionnaire. Front Psychol 2021; 12:709928. [PMID: 34659024 PMCID: PMC8511674 DOI: 10.3389/fpsyg.2021.709928] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Accepted: 08/27/2021] [Indexed: 11/13/2022] Open
Abstract
Misophonia is characterized by decreased tolerance and accompanying defensive motivational system responding to certain aversive sounds and contextual cues associated with such stimuli, typically repetitive oral (e.g., eating sounds) or nasal (e.g., breathing sounds) stimuli. Responses elicit significant psychological distress and impairment in functioning, and include acute increases in (a) negative affect (e.g., anger, anxiety, and disgust), (b) physiological arousal (e.g., sympathetic nervous system activation), and (c) overt behavior (e.g., escape behavior and verbal aggression toward individuals generating triggers). A major barrier to research and treatment of misophonia is the lack of rigorously validated assessment measures. As such, the primary purpose of this study was to develop and psychometrically validate a self-report measure of misophonia, the Duke Misophonia Questionnaire (DMQ). There were two phases of measure development. In Phase 1, items were generated and iteratively refined from a combination of the scientific literature and qualitative feedback from misophonia sufferers, their family members, and professional experts. In Phase 2, a large community sample of adults (n = 424) completed DMQ candidate items and other measures needed for psychometric analyses. A series of iterative analytic procedures (e.g., factor analyses and IRT) were used to derive final DMQ items and scales. The final DMQ has 86 items and includes the subscales: (1) Trigger Frequency (16 items), (2) Affective Responses (5 items), (3) Physiological Responses (8 items), (4) Cognitive Responses (10 items), (5) Coping Before (6 items), (6) Coping During (10 items), (7) Coping After (5 items), (8) Impairment (12 items), and (9) Beliefs (14 items). Composite scales were derived for overall Symptom Severity (combining the Affective, Physiological, and Cognitive subscales) and Coping (combining the three Coping subscales). Depending on the needs of researchers or clinicians, the DMQ may be used in its full form, as individual subscales, or with the derived composite scales.
Affiliation(s)
- M Zachary Rosenthal
- Department of Psychiatry & Behavioral Sciences, Duke University Medical Center, Durham, NC, United States; Department of Psychology & Neuroscience, Duke University, Durham, NC, United States
- Deepika Anand
- Department of Psychiatry & Behavioral Sciences, Duke University Medical Center, Durham, NC, United States
- Clair Cassiello-Robbins
- Department of Psychiatry & Behavioral Sciences, Duke University Medical Center, Durham, NC, United States
- Zachary J Williams
- Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, TN, United States; Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, United States; Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, United States; Frist Center for Autism and Innovation, Vanderbilt University, Nashville, TN, United States
- Rachel E Guetta
- Department of Psychology & Neuroscience, Duke University, Durham, NC, United States
- Jacqueline Trumbull
- Department of Psychology & Neuroscience, Duke University, Durham, NC, United States
- Lisalynn D Kelley
- Department of Psychiatry & Behavioral Sciences, Duke University Medical Center, Durham, NC, United States
|
245
|
Vitoratou S, Uglik-Marucha N, Hayes C, Erfanian M, Pearson O, Gregory J. Item Response Theory Investigation of Misophonia Auditory Triggers. Audiol Res 2021; 11:567-581. [PMID: 34698077 PMCID: PMC8544191 DOI: 10.3390/audiolres11040051] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2021] [Revised: 10/06/2021] [Accepted: 10/08/2021] [Indexed: 11/16/2022] Open
Abstract
Misophonia is characterised by a low tolerance for day-to-day sounds, causing intense negative affect. This study conducts an in-depth investigation of 35 misophonia triggers. A sample of 613 individuals who identify as experiencing misophonia and 202 individuals from the general population completed self-report measures. Using contemporary psychometric methods, we studied the triggers in terms of internal consistency, stability over time, precision, severity, discrimination ability, and information. Three dimensions of sensitivity were identified: to eating sounds, to nose/throat sounds, and to general environmental sounds. The most informative and discriminative triggers belonged to the eating sounds. Participants who identified as having misophonia also had significantly higher odds than other participants of endorsing eating sounds as auditory triggers. This study highlights the central role of eating sounds in this phenomenon and finds that individuals with more severe sound sensitivities endorse different triggers than those with low sensitivity.
Affiliation(s)
- Silia Vitoratou
- Psychometrics and Measurement Lab, Biostatistics and Health Informatics Department, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AB, UK
- Nora Uglik-Marucha
- Psychometrics and Measurement Lab, Biostatistics and Health Informatics Department, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AB, UK
- Chloe Hayes
- Psychometrics and Measurement Lab, Biostatistics and Health Informatics Department, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AB, UK
- Mercede Erfanian
- UCL Institute for Environmental Design and Engineering, The Bartlett, University College London, London WC1H 0NN, UK
- Oliver Pearson
- Psychometrics and Measurement Lab, Biostatistics and Health Informatics Department, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AB, UK
- Jane Gregory
- Centre for Anxiety Disorders and Trauma, South London and Maudsley NHS Foundation Trust, London SE5 8AZ, UK
- Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, UK
|
246
|
Amtmann D, Bamer A, McMullen K, Ryan CM, Schneider JC, Carrougher GJ, Gibran N. Evaluation of the psychometric properties of the burn specific health scale-brief: A National Institute on Disability, Independent Living, and Rehabilitation Research Burn Model System Study. J Burn Care Res 2021; 43:602-612. [PMID: 34643699 DOI: 10.1093/jbcr/irab190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
OBJECTIVE The Burn Specific Health Scale-Brief (BSHS-B) is a commonly used burn-specific health outcome measure that includes 40 items across nine subscales. The objective of this study was to use both classical and modern psychometric methods to evaluate the psychometric properties of the BSHS-B. METHODS Data were collected post burn injury by a multisite federally funded project tracking long-term outcomes. We examined dimensionality, local dependence, item fit, functioning of response categories, homogeneity, and floor and ceiling effects. Items were fit to item response theory models for evaluation. RESULTS A total of 653 adults with burn injury completed the BSHS-B. Factor analyses supported unidimensionality for all subscales, but not for a total score based on all 40 items. All nine subscales had significant ceiling effects. Six item pairs displayed local dependence, suggesting redundancy, and 11 items did not fit the item response theory models. At least 15 items have too many response options. CONCLUSIONS Results identified numerous psychometric issues with the BSHS-B. A single summary score should never be used for any purpose. Psychometric properties of the scale need to be improved by removing redundant items, reducing response categories, and modifying or deleting problematic items. Additional conceptual work is needed to, at a minimum, revise the work subscale and, optimally, to revisit and clearly define the constructs measured by all the subscales. Additional items are needed to address ceiling effects.
Affiliation(s)
- Dagmar Amtmann
- Department of Rehabilitation Medicine, University of Washington, Seattle, WA
- Alyssa Bamer
- Department of Rehabilitation Medicine, University of Washington, Seattle, WA
- Kara McMullen
- Department of Rehabilitation Medicine, University of Washington, Seattle, WA
- Colleen M Ryan
- Shriners Hospitals for Children - Boston, Boston, MA; Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Jeffrey C Schneider
- Department of Physical Medicine and Rehabilitation, Spaulding Rehabilitation Hospital, Spaulding Research Institute, Harvard Medical School, Boston, MA
- Nicole Gibran
- Department of Surgery, University of Washington Harborview, Seattle, WA
|
247
|
Wang M, Reeve BB. Evaluations of the sum-score-based and item response theory-based tests of group mean differences under various simulation conditions. Stat Methods Med Res 2021; 30:2604-2618. [PMID: 34617840 PMCID: PMC8649417 DOI: 10.1177/09622802211043263] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022]
Abstract
The use of patient-reported outcomes measures is gaining popularity in clinical trials for comparing patient groups. Such comparisons typically focus on the differences in group means and are carried out using either a traditional sum-score-based approach or item response theory (IRT)-based approaches. Several simulation studies have evaluated different group mean comparison approaches in the past, but the performance of these approaches remained unknown under certain uninvestigated conditions (e.g., under the impact of differential item functioning (DIF)). By incorporating some of the uninvestigated simulation features, the current study examines Type I error, statistical power, and effect size estimation accuracy associated with group mean comparisons using simple sum scores, IRT model likelihood ratio tests, and IRT expected-a-posteriori scores. Manipulated features include sample size per group, number of items, number of response categories, strength of discrimination parameters, location of thresholds, impact of DIF, and presence of missing data. Results are summarized and visualized using decision trees.
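The sum-score-based comparison evaluated in studies like this one can be sketched in a few lines: simulate dichotomous 2PL responses for two groups with a known latent mean difference, then t-test the sum-score means. The group sizes, latent effect, and item parameter ranges below are illustrative assumptions, not the study's simulation design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_items = 1000, 10

# Latent traits for two groups whose means differ by 0.5 SD (the group "impact")
theta_a = rng.normal(0.0, 1.0, n)
theta_b = rng.normal(0.5, 1.0, n)

# Common 2PL item parameters shared by both groups (i.e., no DIF)
disc = rng.uniform(1.0, 2.0, n_items)   # discriminations
diff = rng.normal(0.0, 1.0, n_items)    # difficulties

def simulate(theta):
    """Dichotomous responses from a 2PL model."""
    p = 1.0 / (1.0 + np.exp(-disc * (theta[:, None] - diff)))
    return (rng.random((len(theta), n_items)) < p).astype(int)

sum_a = simulate(theta_a).sum(axis=1)   # simple sum scores, 0..n_items
sum_b = simulate(theta_b).sum(axis=1)

# Sum-score-based group mean comparison and its effect size
t_stat, p_value = stats.ttest_ind(sum_b, sum_a)
pooled_sd = np.sqrt((sum_a.var(ddof=1) + sum_b.var(ddof=1)) / 2)
cohens_d = (sum_b.mean() - sum_a.mean()) / pooled_sd
```

The IRT-based alternatives the abstract compares (likelihood ratio tests, expected-a-posteriori scores) would replace the sum scores with model-based trait estimates, which is where DIF and varying item parameters start to matter.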
Affiliation(s)
- Mian Wang
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Carrboro, NC, USA
- Bryce B Reeve
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA
|
248
|
Yildiz H. IRTGUI: An R Package for Unidimensional Item Response Theory Analysis With a Graphical User Interface. Appl Psychol Meas 2021; 45:551-552. [PMID: 34866712 PMCID: PMC8640354 DOI: 10.1177/01466216211040532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
In the last decade, many R packages were published for performing item response theory (IRT) analysis. Some researchers and practitioners have difficulty using these tools because of insufficient coding skills. The IRTGUI package provides these researchers with a user-friendly GUI in which they can perform unidimensional IRT analysis without coding. Using the IRTGUI package, person and item parameters and model and item fit indices can be obtained, and the dimensionality and local independence assumptions can be tested. With the IRTGUI package, users can also generate dichotomous data sets under customizable conditions, and Wright maps, item characteristic curves, and information curves can be displayed graphically. All outputs can be easily downloaded by users.
|
249
|
Tu N, Zhang B, Angrave L, Sun T. bmggum: An R Package for Bayesian Estimation of the Multidimensional Generalized Graded Unfolding Model With Covariates. Appl Psychol Meas 2021; 45:553-555. [PMID: 34866713 PMCID: PMC8640348 DOI: 10.1177/01466216211040488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Over the past couple of decades, there has been an increasing interest in adopting ideal point models to represent noncognitive constructs, as they have been demonstrated to better measure typical behaviors than traditional dominance models do. The generalized graded unfolding model (GGUM) has consistently been the most popular ideal point model among researchers and practitioners. However, the GGUM2004 software and the later developed GGUM package in R can only handle unidimensional models despite the fact that many noncognitive constructs are multidimensional in nature. In addition, GGUM2004 and the GGUM package often yield unreasonable estimates of item parameters and standard errors. To address these issues, we developed the new open-source bmggum R package that is capable of estimating both unidimensional and multidimensional GGUM using a fully Bayesian approach, with supporting capabilities of stabilizing parameterization, incorporating person covariates, estimating constrained models, providing fit diagnostics, producing convergence metrics, and effectively handling missing data.
Affiliation(s)
- Naidan Tu
- University of South Florida, Tampa, FL, USA
- Bo Zhang
- Texas A&M University, College Station, TX, USA
- Tianjun Sun
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Kansas State University, Manhattan, KS, USA
|
250
|
Nebl PJ, McCoy MG, Foster GC, Zickar MJ. Assessment of the Mate Retention Inventory-Short Form Using Item Response Theory. Evol Psychol 2021; 19:14747049211044150. [PMID: 34633890 PMCID: PMC10358423 DOI: 10.1177/14747049211044150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Revised: 08/02/2021] [Accepted: 08/18/2021] [Indexed: 11/15/2022] Open
Abstract
The Mate Retention Inventory (MRI) has been a valuable tool in the field of evolutionary psychology for the past 30 years. The goal of the current research is to subject the MRI to rigorous psychometric analysis using item response theory to answer three broad questions. Do the individual items of the MRI fit the scale well? Does the overall function of the MRI match what is predicted? Finally, do men and women respond similarly to the MRI? Using a graded response model, it was found that all but two of the items fit acceptable model patterns. Test information function analysis found that the scale acceptably captures individual differences among participants with a high degree of mate retention, but it captures little information from participants with a low degree of mate retention. Finally, differential item functioning analysis reveals that the MRI is better at assessing male than female participants, indicating that the scale may not be the best indicator of female behavior in a relationship. Overall, we conclude that the MRI is a good scale, especially for assessing male behavior, but it could be improved for assessing female behavior and individuals lower in overall mate retention behavior. We suggest that this paper be used as a framework for how the newest psychometric techniques can be applied to create more robust and valid measures in the field of evolutionary psychology.
Affiliation(s)
- Patrick J. Nebl
- Department of Psychology, Elmhurst University, Elmhurst, IL, USA
- Mark G. McCoy
- Department of Psychology, Manchester University, North Manchester, IN, USA
- Garett C. Foster
- Department of Psychology, Bowling Green State University, Bowling Green, OH, USA
- Michael J. Zickar
- Department of Psychology, Bowling Green State University, Bowling Green, OH, USA
|