1
Zimmer F, Draxler C, Debelak R. Power Analysis for the Wald, LR, Score, and Gradient Tests in a Marginal Maximum Likelihood Framework: Applications in IRT. Psychometrika 2023; 88:1249-1298. [PMID: 36029390 PMCID: PMC10656348 DOI: 10.1007/s11336-022-09883-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/09/2021] [Revised: 01/11/2022] [Indexed: 06/15/2023]
Abstract
The Wald, likelihood ratio, score, and the recently proposed gradient statistics can be used to assess a broad range of hypotheses in item response theory models, for instance, to check the overall model fit or to detect differential item functioning. We introduce new methods for power analysis and sample size planning that can be applied when marginal maximum likelihood estimation is used. This allows the application to a variety of IRT models, which are commonly used in practice, e.g., in large-scale educational assessments. An analytical method utilizes the asymptotic distributions of the statistics under alternative hypotheses. We also provide a sampling-based approach for applications where the analytical approach is computationally infeasible. This can be the case with 20 or more items, since the computational load increases exponentially with the number of items. We performed extensive simulation studies in three practically relevant settings, i.e., testing a Rasch model against a 2PL model, testing for differential item functioning, and testing a partial credit model against a generalized partial credit model. The observed distributions of the test statistics and the power of the tests agreed well with the predictions by the proposed methods in sufficiently large samples. We provide an openly accessible R package that implements the methods for user-supplied hypotheses.
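The analytical idea summarized in this abstract can be illustrated with a minimal sketch. It assumes that, under an alternative hypothesis, the test statistic approximately follows a noncentral chi-square distribution whose noncentrality parameter grows linearly with the sample size. This is the standard asymptotic result for these four tests, not code from the authors' R package. All function names and the per-person noncentrality value in the usage note are hypothetical.

```python
import math

def reg_lower_gamma(a, x, tol=1e-13, max_terms=100000):
    """Regularized lower incomplete gamma function P(a, x), power-series form."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_cdf(x, df, ncp=0.0, tol=1e-13):
    """CDF of a (non)central chi-square as a Poisson mixture of central CDFs."""
    half = ncp / 2.0
    weight = math.exp(-half)  # Poisson weight for j = 0
    total, j = 0.0, 0
    while True:
        total += weight * reg_lower_gamma(df / 2.0 + j, x / 2.0)
        j += 1
        weight *= half / j
        if j > half and weight < tol:
            break
    return total

def chi2_critical(df, alpha):
    """Critical value of the central chi-square, found by bisection.
    Assumes the critical value lies below df + 100."""
    lo, hi = 0.0, df + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power(ncp, df, alpha=0.05):
    """Probability that the statistic exceeds the critical value under H1."""
    return 1.0 - chi2_cdf(chi2_critical(df, alpha), df, ncp)

def required_n(ncp_per_person, df, target=0.80, alpha=0.05, n_max=100000):
    """Smallest n with power >= target, assuming ncp = n * ncp_per_person."""
    crit = chi2_critical(df, alpha)
    for n in range(1, n_max + 1):
        if 1.0 - chi2_cdf(crit, df, n * ncp_per_person) >= target:
            return n
    return None
```

For instance, `required_n(0.01, df=1)` searches for the sample size at which a test with one degree of freedom reaches 80% power, given a hypothetical noncentrality contribution of 0.01 per respondent. In practice that contribution would have to be derived from the model and the hypothesized parameter values.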
Affiliation(s)
- Clemens Draxler
- The Health and Life Sciences University, Hall in Tirol, Austria
2
Fellinghauer C, Debelak R, Strobl C. What Affects the Quality of Score Transformations? Potential Issues in True-Score Equating Using the Partial Credit Model. Educ Psychol Meas 2023; 83:1249-1290. [PMID: 37970488 PMCID: PMC10638984 DOI: 10.1177/00131644221143051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2023]
Abstract
This simulation study investigated to what extent departures from construct similarity as well as differences in the difficulty and targeting of scales impact the score transformation when scales are equated by means of concurrent calibration using the partial credit model with a common person design. Practical implications of the simulation results are discussed with a focus on scale equating in health-related research settings. The study simulated data for two scales, varying the number of items and the sample sizes. The factor correlation between scales was used to operationalize construct similarity. Targeting of the scales was operationalized through increasing departure from equal difficulty and by varying the dispersion of the item and person parameters in each scale. The results show that low similarity between scales goes along with lower transformation precision. In cases with equal levels of similarity, precision improves in settings where the range of the item parameters encompasses the range of the person parameters. With decreasing similarity, score transformation precision benefits more from good targeting. Difficulty shifts of up to two logits somewhat increased the estimation bias but did not affect the transformation precision. The observed robustness against difficulty shifts supports the advantage of applying true-score equating methods over identity equating, which was used as a naive baseline method for comparison. Finally, a larger sample size did not improve the transformation precision in this study, and longer scales improved the quality of the equating only marginally. The insights from the simulation study are used in a real-data example.
3
Zimmer F, Henninger M, Debelak R. Sample size planning for complex study designs: A tutorial for the mlpwr package. Behav Res Methods 2023:10.3758/s13428-023-02269-0. [PMID: 38030925 DOI: 10.3758/s13428-023-02269-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/09/2023] [Indexed: 12/01/2023]
Abstract
A common challenge in designing empirical studies is determining an appropriate sample size. When more complex models are used, estimates of power can only be obtained using Monte Carlo simulations. In this tutorial, we introduce the R package mlpwr to perform simulation-based power analysis based on surrogate modeling. Surrogate modeling is a powerful tool in guiding the search for study design parameters that imply a desired power or meet a cost threshold (e.g., in terms of monetary cost). mlpwr can be used to search for the optimal allocation when there are multiple design parameters, e.g., when balancing the number of participants and the number of groups in multilevel modeling. At the same time, the approach can take into account the cost of each design parameter, and aims to find a cost-efficient design. We introduce the basic functionality of the package, which can be applied to a wide range of statistical models and study designs. Additionally, we provide two examples based on empirical studies for illustration: one for sample size planning when using an item response theory model, and one for assigning the number of participants and the number of countries for a study using multilevel modeling.
Affiliation(s)
- Felix Zimmer
- Psychological Methods, Evaluation and Statistics, Department of Psychology, University of Zurich, Binzmuehlestrasse 14, Box 27, 8050, Zurich, Switzerland.
- Mirka Henninger
- Psychological Methods, Evaluation and Statistics, Department of Psychology, University of Zurich, Binzmuehlestrasse 14, Box 27, 8050, Zurich, Switzerland
- Rudolf Debelak
- Psychological Methods, Evaluation and Statistics, Department of Psychology, University of Zurich, Binzmuehlestrasse 14, Box 27, 8050, Zurich, Switzerland
4
Henninger M, Debelak R, Rothacher Y, Strobl C. Interpretable machine learning for psychological research: Opportunities and pitfalls. Psychol Methods 2023:2023-75978-001. [PMID: 37227894 DOI: 10.1037/met0000560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
In recent years, machine learning methods have become increasingly popular prediction methods in psychology. At the same time, psychological researchers are typically not only interested in making predictions about the dependent variable, but also in learning which predictor variables are relevant, how they influence the dependent variable, and which predictors interact with each other. However, most machine learning methods are not directly interpretable. Interpretation techniques that support researchers in describing how the machine learning technique came to its prediction may be a means to this end. We present a variety of interpretation techniques and illustrate the opportunities they provide for interpreting the results of two widely used black box machine learning methods that serve as our examples: random forests and neural networks. At the same time, we illustrate potential pitfalls and risks of misinterpretation that may occur in certain data settings. We show in which way correlated predictors impact interpretations with regard to the relevance or shape of predictor effects and in which situations interaction effects may or may not be detected. We use simulated didactic examples throughout the article, as well as an empirical data set for illustrating an approach to objectify the interpretation of visualizations. We conclude that, when critically reflected, interpretable machine learning techniques may provide useful tools when describing complex psychological relationships. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
5
Henninger M, Debelak R, Strobl C. A New Stopping Criterion for Rasch Trees Based on the Mantel-Haenszel Effect Size Measure for Differential Item Functioning. Educ Psychol Meas 2023; 83:181-212. [PMID: 36601252 PMCID: PMC9806517 DOI: 10.1177/00131644221077135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
To detect differential item functioning (DIF), Rasch trees search for optimal splitpoints in covariates and identify subgroups of respondents in a data-driven way. To determine whether and in which covariate a split should be performed, Rasch trees use statistical significance tests. Consequently, Rasch trees are more likely to label small DIF effects as significant in larger samples. This leads to larger trees, which split the sample into more subgroups. A more desirable approach would be driven by effect size rather than sample size. To achieve this, we suggest implementing an additional stopping criterion: the popular Educational Testing Service (ETS) classification scheme based on the Mantel-Haenszel odds ratio. This criterion helps us to evaluate whether a split in a Rasch tree is based on a substantial or an ignorable difference in item parameters, and it allows the Rasch tree to stop growing when DIF between the identified subgroups is small. Furthermore, it supports identifying DIF items and quantifying DIF effect sizes in each split. Based on simulation results, we conclude that the Mantel-Haenszel effect size further reduces unnecessary splits in Rasch trees under the null hypothesis, or when the sample size is large but DIF effects are negligible. To make the stopping criterion easy to use for applied researchers, we have implemented the procedure in the statistical software R. Finally, we discuss how DIF effects between different nodes in a Rasch tree can be interpreted and emphasize the importance of purification strategies for the Mantel-Haenszel procedure on tree stopping and DIF item classification.
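The effect-size measure named in this abstract can be sketched as follows. The example computes the Mantel-Haenszel common odds ratio from 2x2 tables stratified by score level, converts it to the ETS delta scale, and assigns a category from the magnitude alone. It is a simplified illustration, not the authors' R implementation: the full ETS scheme additionally uses significance tests, which are omitted here, and all function names are hypothetical.

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio.

    strata: iterable of (a, b, c, d) counts per score level, where
    a = reference group correct, b = reference group incorrect,
    c = focal group correct,     d = focal group incorrect.
    Raises ZeroDivisionError if the denominator sum is zero.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    return num / den

def ets_delta(alpha_mh):
    """MH D-DIF on the ETS delta scale: -2.35 * ln(alpha_MH)."""
    return -2.35 * math.log(alpha_mh)

def ets_category(delta):
    """Magnitude-only classification (significance tests omitted):
    A = negligible, B = moderate, C = large DIF."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"
```

A tree-growing procedure along the lines described above could, for example, refuse a split when every item falls into category A for the two candidate subgroups. The exact rule used by the authors is not specified in the abstract.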
6
Debelak R, Pawel S, Strobl C, Merkle EC. Score-based measurement invariance checks for Bayesian maximum-a-posteriori estimates in item response theory. Br J Math Stat Psychol 2022; 75:728-752. [PMID: 35670000 PMCID: PMC9796736 DOI: 10.1111/bmsp.12275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2020] [Revised: 04/05/2022] [Indexed: 06/15/2023]
Abstract
A family of score-based tests has been proposed in recent years for assessing the invariance of model parameters in several models of item response theory (IRT). These tests were originally developed in a maximum likelihood framework. This study discusses analogous tests for Bayesian maximum-a-posteriori estimates and multiple-group IRT models. We propose two families of statistical tests, which are based on an approximation using a pooled variance method, or on a simulation approach based on asymptotic results. The resulting tests were evaluated by a simulation study, which investigated their sensitivity against differential item functioning with respect to a categorical or continuous person covariate in the two- and three-parametric logistic models. Whereas the method based on pooled variance was found to be useful in practice with maximum likelihood as well as maximum-a-posteriori estimates, the simulation-based approach was found to require large sample sizes to lead to satisfactory results.
Affiliation(s)
- Samuel Pawel
- Epidemiology, Biostatistics and Prevention Institute (EBPI), University of Zurich, Switzerland
- Edgar C. Merkle
- Department of Psychological Sciences, University of Missouri, Columbia, MO, USA
7
Paz Castro R, Haug S, Debelak R, Jakob R, Kowatsch T, Schaub MP. Engagement With a Mobile Phone-Based Life Skills Intervention for Adolescents and Its Association With Participant Characteristics and Outcomes: Tree-Based Analysis. J Med Internet Res 2022; 24:e28638. [PMID: 35044309 PMCID: PMC8811696 DOI: 10.2196/28638] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/09/2021] [Revised: 09/13/2021] [Accepted: 10/29/2021] [Indexed: 01/16/2023]
Abstract
Background: Mobile phone–delivered life skills programs are an emerging and promising way to promote mental health and prevent substance use among adolescents, but little is known about how adolescents actually use them.
Objective: The aim of this study is to determine engagement with a mobile phone–based life skills program and its different components, as well as the associations of engagement with adolescent characteristics and intended substance use and mental health outcomes.
Methods: We performed secondary data analysis on data from the intervention group (n=750) from a study that compared a mobile phone–based life skills intervention for adolescents recruited in secondary and upper secondary school classes with an assessment-only control group. Throughout the 6-month intervention, participants received 1 SMS text message prompt per week that introduced a life skills topic or encouraged participation in a quiz or individual life skills training or stimulated sharing messages with other program participants through a friendly contest. Decision trees were used to identify predictors of engagement (use and subjective experience). The stability of these decision trees was assessed using a resampling method and by graphical representation. Finally, associations between engagement and intended substance use and mental health outcomes were examined using logistic and linear regression analyses.
Results: The adolescents took part in half of the 50 interactions (mean 23.6, SD 15.9) prompted by the program, with SMS text messages being the most used and contests being the least used components. Adolescents who did not drink in a problematic manner and attended an upper secondary school were the ones to use the program the most. Regarding associations between engagement and intended outcomes, adolescents who used the contests more frequently were more likely to be nonsmokers at follow-up than those who did not (odds ratio 0.86, 95% CI 0.76-0.98; P=.02). In addition, adolescents who read the SMS text messages more attentively were less likely to drink in a problematic manner at follow-up (odds ratio 0.43, 95% CI 1.29-3.41; P=.003). Finally, participants who used the program the most and least were more likely to increase their well-being from baseline to 6-month follow-up compared with those with average engagement (βs=.39; t(586)=2.66; P=.008; R²=0.24).
Conclusions: Most of the adolescents participating in a digital life skills program that aimed to prevent substance use and promote mental health engaged with the intervention. However, measures to increase engagement in problem drinkers should be considered. Furthermore, efforts must be made to ensure that interventions are engaging and powerful across different educational levels. First results indicate that higher engagement with digital life skills programs could be associated with intended outcomes. Future studies should apply further measures to improve the reach of lower-engaged participants at follow-up to establish such associations with certainty.
Affiliation(s)
- Raquel Paz Castro
- Swiss Research Institute for Public Health and Addiction, University of Zurich, Zurich, Switzerland
- Severin Haug
- Swiss Research Institute for Public Health and Addiction, University of Zurich, Zurich, Switzerland
- Rudolf Debelak
- Department of Psychology, Psychological Methods, Evaluation and Statistics, University of Zurich, Zurich, Switzerland; Wilhelm Wundt Institute for Psychology, University of Leipzig, Leipzig, Germany
- Robert Jakob
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland
- Tobias Kowatsch
- Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland; Centre for Digital Health Interventions, Institute of Technology Management, University of St.Gallen, St.Gallen, Switzerland
- Michael P Schaub
- Swiss Research Institute for Public Health and Addiction, University of Zurich, Zurich, Switzerland
8
Abstract
OBJECTIVES Functional psychologists are concerned with the performance of cognitive activities in the real world in relation to cognitive changes in older age. Conversational contexts may mitigate the influence of cognitive aging on the cognitive activity of language production. This study examined effects of familiarity with interlocutors, as a context, on language production in the real world. METHOD We collected speech samples using iPhones on which an audio recording app (i.e. Electronically Activated Recorder [EAR]) was installed. Over 31,300 brief audio files (30 seconds long) were randomly collected across four days from 61 young and 48 healthy older adults in Switzerland. We transcribed the audio files that included participants' speech and manually coded for familiar interlocutors (i.e. significant other, friends, family members) and strangers. We computed scores of vocabulary richness and grammatical complexity from the transcripts using computational linguistics techniques. RESULTS Bayesian multilevel analyses showed that participants used richer vocabulary and more complex grammar when talking with familiar interlocutors than with strangers. Young adults used more diverse vocabulary than older adults and the age effects remained stable across contexts. Furthermore, older adults produced equally complex grammar as young adults did with the significant other, but simpler grammar than young adults with friends and family members. CONCLUSION Familiarity with interlocutors is a promising contextual factor for research on aging and language complexity in the real world. Results were discussed in the context of cognitive aging.
Affiliation(s)
- Minxia Luo
- Department of Psychology, University of Zurich, Zurich, Switzerland; University Research Priority Program "Dynamics of Healthy Aging", University of Zurich, Zurich, Switzerland
- Rudolf Debelak
- Department of Psychology, University of Zurich, Zurich, Switzerland
- Gerold Schneider
- English Department, University of Zurich, Zurich, Switzerland; Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Mike Martin
- Department of Psychology, University of Zurich, Zurich, Switzerland; University Research Priority Program "Dynamics of Healthy Aging", University of Zurich, Zurich, Switzerland
- Burcu Demiray
- Department of Psychology, University of Zurich, Zurich, Switzerland; University Research Priority Program "Dynamics of Healthy Aging", University of Zurich, Zurich, Switzerland
9
Becker MO, Dobrota R, Garaiman A, Debelak R, Fligelstone K, Tyrrell Kennedy A, Roennow A, Allanore Y, Carreira PE, Czirják L, Denton CP, Hesselstrand R, Sandqvist G, Kowal-Bielecka O, Bruni C, Matucci-Cerinic M, Mihai C, Gheorghiu AM, Mueller-Ladner U, Sexton J, Kvien TK, Heiberg T, Distler O. Development and validation of a patient-reported outcome measure for systemic sclerosis: the EULAR Systemic Sclerosis Impact of Disease (ScleroID) questionnaire. Ann Rheum Dis 2021; 81:507-515. [PMID: 34824049 DOI: 10.1136/annrheumdis-2021-220702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/05/2021] [Accepted: 11/09/2021] [Indexed: 11/03/2022]
Abstract
OBJECTIVES Patient-reported outcome measures (PROMs) are important for clinical practice and research. Given the high unmet need, our aim was to develop a comprehensive PROM for systemic sclerosis (SSc), jointly with patient experts. METHODS This European Alliance of Associations for Rheumatology (EULAR)-endorsed project involved 11 European SSc centres. Relevant health dimensions were chosen and prioritised by patients. The resulting Systemic Sclerosis Impact of Disease (ScleroID) questionnaire was subsequently weighted and validated by Outcome Measures in Rheumatology criteria in an observational cohort study, cross-sectionally and longitudinally. As comparators, SSc-Health Assessment Questionnaire (HAQ), EuroQol Five Dimensional (EQ-5D), Short Form-36 (SF-36) were included. RESULTS Initially, 17 health dimensions were selected and prioritised. The top 10 health dimensions were selected for the ScleroID questionnaire. Importantly, Raynaud's phenomenon, impaired hand function, pain and fatigue had the highest patient-reported disease impact. The validation cohort study included 472 patients with a baseline visit, from which 109 had a test-retest reliability visit and 113 had a follow-up visit (85% female, 38% diffuse SSc, mean age 58 years, mean disease duration 9 years). The total ScleroID score showed strong Pearson correlation coefficients with comparators (SSc-HAQ, 0.73; Patient's global assessment, Visual Analogue Scale 0.77; HAQ-Disability Index, 0.62; SF-36 physical score, -0.62; each p<0.001). The internal consistency was strong: Cronbach's alpha was 0.87, similar to SSc-HAQ (0.88) and higher than EQ-5D (0.77). The ScleroID had excellent reliability and good sensitivity to change, superior to all comparators (intraclass correlation coefficient 0.84; standardised response mean 0.57). 
CONCLUSIONS We have developed and validated the EULAR ScleroID, which is a novel, brief, disease-specific, patient-derived, disease impact PROM, suitable for research and clinical use in SSc.
Affiliation(s)
- Mike O Becker
- Department of Rheumatology, University Hospital of Zurich, Zurich, Switzerland
- Rucsandra Dobrota
- Department of Rheumatology, University Hospital of Zurich, Zurich, Switzerland
- Alexandru Garaiman
- Department of Rheumatology, University Hospital of Zurich, Zurich, Switzerland
- Rudolf Debelak
- Department of Psychology, Psychological Methods, Evaluation and Statistics, University of Zurich, Zurich, Switzerland; Department of Psychology, Psychological Methodology, University of Leipzig, Leipzig, Germany
- Ann Tyrrell Kennedy
- Federation of the European Scleroderma Associations (FESCA) aisbl, Tournai, Belgium
- Annelise Roennow
- Federation of European Scleroderma Associations (FESCA), Saint Maur, Belgium
- Yannick Allanore
- Department of Rheumatology A, Descartes University, APHP, Cochin Hospital, Paris, France
- Patricia E Carreira
- Department of Rheumatology, Hospital Universitario 12 de Octubre, Madrid, Spain
- László Czirják
- Department of Rheumatology and Immunology, University of Pécs, Pécs, Hungary
- Christopher P Denton
- Centre for Rheumatology, University College London, Royal Free Campus, London, UK
- Otylia Kowal-Bielecka
- Department of Rheumatology and Internal Medicine, Medical University of Bialystok, Bialystok, Poland
- Cosimo Bruni
- Department of Experimental and Clinical Medicine, Division of Rheumatology AOUC, University of Florence, Florence, Italy
- Marco Matucci-Cerinic
- Department of Experimental and Clinical Medicine, Division of Rheumatology AOUC, University of Florence, Florence, Italy; IRCCS San Raffaele Hospital, Unit of Immunology, Rheumatology, Allergy and Rare diseases (UnIRAR), Milan, Italy
- Carina Mihai
- Department of Rheumatology, University Hospital of Zurich, Zurich, Switzerland; Department of Internal Medicine and Rheumatology, Cantacuzino Hospital, Carol Davila University of Medicine and Pharmacy, Bucharest, Romania
- Ana Maria Gheorghiu
- Department of Internal Medicine and Rheumatology, Cantacuzino Hospital, Carol Davila University of Medicine and Pharmacy, Bucharest, Romania
- Ulf Mueller-Ladner
- Department of Rheumatology and Clinical Immunology, Justus-Liebig University Giessen, Campus Kerckhoff, Bad Nauheim, Germany
- Joseph Sexton
- Division of Rheumatology and Research, Diakonhjemmet Hospital, Oslo, Norway
- Tore K Kvien
- Division of Rheumatology and Research, Diakonhjemmet Hospital, Oslo, Norway
- Turid Heiberg
- Regional Research Support, Oslo University Hospital, Oslo, Norway
- Oliver Distler
- Department of Rheumatology, University Hospital of Zurich, Zurich, Switzerland
10
Luo M, Debelak R, Schneider G, Martin M, Demiray B. Real-World Language Use With Familiar Versus Unfamiliar Interlocutors in Young and Older Adults. Innov Aging 2020. [PMCID: PMC7742281 DOI: 10.1093/geroni/igaa057.2121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Real-world contexts may compensate for age-related changes in language production. We compared age effects on vocabulary richness (i.e., entropy) and grammatical complexity (i.e., clause length) in conversations with familiar interlocutors (i.e., significant other, friends, family members) versus with strangers. We collected thousands of 30-second speech samples from 61 young and 48 healthy older adults across four days using a portable audio recording device, the Electronically Activated Recorder (EAR). Bayesian multilevel analyses showed that participants used richer vocabulary and more complex grammar with familiar interlocutors than with strangers. Young adults used richer vocabulary than older adults. Furthermore, older adults produced equally complex grammar with the significant other as young adults did, but simpler grammar with friends and family members. We found no age group differences in grammatical complexity with strangers (lacking statistical power). In sum, familiarity with the significant other may benefit older adults in producing complex grammar in real-world conversations.
Affiliation(s)
- Minxia Luo
- University of Zurich, Zurich, Zurich, Switzerland
- Mike Martin
- University of Zurich, Zurich, Zurich, Switzerland
11
Abstract
In this paper, we apply Vuong's general approach of model selection to the comparison of nested and non-nested unidimensional and multidimensional item response theory (IRT) models. Vuong's approach of model selection is useful because it allows for formal statistical tests of both nested and non-nested models. However, only the test of non-nested models has been applied in the context of IRT models to date. After summarizing the statistical theory underlying the tests, we investigate the performance of all three distinct Vuong tests in the context of IRT models using simulation studies and real data. In the non-nested case we observed that the tests can reliably distinguish between the graded response model and the generalized partial credit model. In the nested case, we observed that the tests typically perform as well as or sometimes better than the traditional likelihood ratio test. Based on these results, we argue that Vuong's approach provides a useful set of tools for researchers and practitioners to effectively compare competing nested and non-nested IRT models.
12
Debelak R, Koller I. Testing the Local Independence Assumption of the Rasch Model With Q3-Based Nonparametric Model Tests. Appl Psychol Meas 2020; 44:103-117. [PMID: 32076355 PMCID: PMC7003184 DOI: 10.1177/0146621619835501] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/10/2023]
Abstract
Local independence is a central assumption of commonly used item response theory models. Violations of this assumption are usually tested using test statistics based on item pairs. This study presents two quasi-exact tests based on the Q3 statistic for testing the hypothesis of local independence in the Rasch model. The proposed tests do not require the estimation of item parameters and can also be applied to small data sets. The authors evaluate the tests with three simulation studies. Their results indicate that the quasi-exact tests hold their alpha level under the Rasch model and have higher power against different forms of local dependence than several alternative parametric and nonparametric model tests for local independence.
13
Huelmann T, Debelak R, Strobl C. A Comparison of Aggregation Rules for Selecting Anchor Items in Multigroup DIF Analysis. Journal of Educational Measurement 2019. [DOI: 10.1111/jedm.12246] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
14
Abstract
M-fluctuation tests are a recently proposed method for detecting differential item functioning in Rasch models. This article discusses a generalization of this method to two additional item response theory models: the two-parametric logistic model and the three-parametric logistic model with a common guessing parameter. The Type I error rate and the power of this method were evaluated by a variety of simulation studies. The results suggest that the new method allows the detection of various forms of differential item functioning in these models, which also includes differential discrimination and differential guessing effects. It is also robust against moderate violations of several assumptions made in the item parameter estimation.
15
Abstract
For assessing the fit of item response theory models, it has been suggested to apply overall goodness-of-fit tests as well as tests for individual items and item pairs. Although numerous goodness-of-fit tests have been proposed in the literature for the Rasch model, their relative power against several model violations has not been investigated so far. This study compares four of these tests, which are all available in R software: T10, T11, M2, and the LR test. Results on the Type I error rate and the sensitivity to violations of different assumptions of the Rasch model (unidimensionality, local independence on the level of item pairs, equal item discrimination, zero as a lower asymptote for the item characteristic curves, invariance of the item parameters) are reported. The results indicate that the T11 test is comparatively most powerful against violations of the assumption of parallel item characteristic curves, which includes the presence of unequal item discriminations and a non-zero lower asymptote. Against the remaining model violations, which can be summarized as local dependence, M2 is found to be most powerful. T10 and LR are found to be sensitive against violations of the assumption of parallel item characteristic curves, but are insensitive against local dependence.
Affiliation(s)
- Rudolf Debelak
- Department of Psychology, University of Zurich, Zurich, Switzerland
16
Walther A, Mahler F, Debelak R, Ehlert U. Psychobiological Protective Factors Modifying the Association Between Age and Sexual Health in Men: Findings From the Men's Health 40+ Study. Am J Mens Health 2017; 11:737-747. [PMID: 28413941 PMCID: PMC5675228 DOI: 10.1177/1557988316689238]
Abstract
Sexual health severely decreases with age. For males older than 40 years, erectile dysfunction (ED) is the most common sexual disorder. Although physical and psychological risk factors for ED have been identified, protective factors are yet to be determined. To date, no study has examined endocrine and psychosocial factors in parallel with regard to their modifying effect on the age-related increase in ED. Two hundred and seventy-one self-reporting healthy men aged between 40 and 75 years provided both psychometric data on sexual function and a set of potential psychosocial protective factors, and saliva samples for the analysis of steroid hormones and proinflammatory cytokines. Around 35% of the participants reported at least a mild form of ED. Direct associations with ED were identified for perceived general health, emotional support, relationship quality, and intimacy motivation, but not for steroid hormones or proinflammatory markers. Moderation analyses for the association between age and ED revealed positive effects for testosterone (T), dehydroepiandrosterone (DHEA), perceived general health, emotional support, and intimacy motivation, and a negative effect for interleukin-6 (all p < .05; f² > .17). Group differences between older men with and without ED emerged for T, DHEA, and psychometric measures such as perceived general health, emotional support, satisfaction with life, and intimacy motivation (all p < .05; d > .3). Both psychosocial and endocrine parameters moderated the association between age and sexual health. Perceived general health, emotional support, intimacy motivation, and relationship quality emerged as psychosocial protective factors against ED. Higher T and DHEA and lower interleukin-6 levels also buffered against an age-related increase in ED.
17
Debelak R, Tran US. Comparing the Effects of Different Smoothing Algorithms on the Assessment of Dimensionality of Ordered Categorical Items with Parallel Analysis. PLoS One 2016; 11:e0148143. [PMID: 26845032 PMCID: PMC4742070 DOI: 10.1371/journal.pone.0148143] [Received: 08/28/2015] [Accepted: 01/13/2016]
Abstract
Principal component analysis and exploratory factor analysis of polychoric correlations are well-known approaches to determining the dimensionality of ordered categorical items. However, the application of these approaches has been considered problematic because the polychoric correlation matrix may be indefinite. A possible solution to this problem is the application of smoothing algorithms. This study compared the effects of three smoothing algorithms, based on the Frobenius norm, the adaption of the eigenvalues and eigenvectors, and minimum-trace factor analysis, on the accuracy of several variants of parallel analysis by means of a simulation study. We simulated different datasets which varied with respect to the size of the respondent sample, the size of the item set, the underlying factor model, the skewness of the response distributions, and the number of response categories in each item. We found that a parallel analysis and principal component analysis of smoothed polychoric and Pearson correlations led to the most accurate results in detecting the number of major factors in simulated datasets when compared to the other methods we investigated. Of the methods used for smoothing polychoric correlation matrices, we recommend the algorithm based on minimum-trace factor analysis.
Affiliation(s)
- Rudolf Debelak
- SCHUHFRIED GmbH, Mödling, Austria
- University of Zurich, Zurich, Switzerland
18
Kaller CP, Debelak R, Köstering L, Egle J, Rahm B, Wild PS, Blettner M, Beutel ME, Unterrainer JM. Assessing Planning Ability Across the Adult Life Span: Population-Representative and Age-Adjusted Reliability Estimates for the Tower of London (TOL-F). Arch Clin Neuropsychol 2015; 31:148-64. [PMID: 26715472 DOI: 10.1093/arclin/acv088] [Accepted: 11/20/2015]
Abstract
Planning ahead the consequences of future actions is a prototypical executive function. In clinical and experimental neuropsychology, disc-transfer tasks like the Tower of London (TOL) are commonly used for the assessment of planning ability. Previous psychometric evaluations have, however, yielded a poor reliability of measuring planning performance with the TOL. Based on theory-grounded task analyses and a systematic problem selection, the computerized TOL-Freiburg version (TOL-F) was developed to improve the task's psychometric properties for diagnostic applications. Here, we report reliability estimates for the TOL-F from two large samples collected in Mainz, Germany (n = 3,770; 40-80 years) and in Vienna, Austria (n = 830; 16-84 years). Results show that planning accuracy on the TOL-F possesses an adequate internal consistency and split-half reliability (>0.7) that are stable across the adult life span while the TOL-F covers a broad range of graded difficulty even in healthy adults, making it suitable for both research and clinical application.
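As an illustration of the split-half reliability mentioned in this abstract, the sketch below computes an odd-even split-half coefficient with the Spearman-Brown correction in plain Python. The abstract does not say how the halves were formed or which correction was applied, so the odd-even split, the function names `pearson` and `split_half_reliability`, and the toy 0/1 score matrix are all assumptions made for illustration, not the authors' procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def split_half_reliability(scores):
    """Odd-even split-half reliability with Spearman-Brown correction.

    scores: one list of item scores per person (hypothetical layout,
    e.g. 1 = problem solved, 0 = not solved).
    """
    # Half-test totals: items at positions 0, 2, 4, ... vs. 1, 3, 5, ...
    first_half = [sum(row[0::2]) for row in scores]
    second_half = [sum(row[1::2]) for row in scores]
    r = pearson(first_half, second_half)
    # Spearman-Brown: step the half-test correlation up to full test length.
    return 2 * r / (1 + r)


# Toy data (assumed for illustration): four persons, four items.
toy_scores = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
reliability = split_half_reliability(toy_scores)
print(round(reliability, 3))
```

In the toy data the two half-test totals correlate at 0.5, which the Spearman-Brown formula steps up to 2/3; a value above 0.7, as reported for the TOL-F, would require more consistent halves.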
Affiliation(s)
- Christoph P Kaller
- Department of Neurology, University Medical Center Freiburg, Freiburg, Germany
- Freiburg Brain Imaging Center, University of Freiburg, Freiburg, Germany
- BrainLinks-BrainTools Cluster of Excellence, University of Freiburg, Freiburg, Germany
- Lena Köstering
- Department of Neurology, University Medical Center Freiburg, Freiburg, Germany
- Freiburg Brain Imaging Center, University of Freiburg, Freiburg, Germany
- BrainLinks-BrainTools Cluster of Excellence, University of Freiburg, Freiburg, Germany
- Biological and Personality Psychology, Department of Psychology, University of Freiburg, Freiburg, Germany
- Benjamin Rahm
- Medical Psychology and Medical Sociology, University Medical Center Mainz, Mainz, Germany
- Philipp S Wild
- Preventive Cardiology and Preventive Medicine, Department of Medicine II, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- German Center for Cardiovascular Research (DZHK), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
- German Center for Cardiovascular Research (DZHK), partner site RhineMain, Mainz, Germany
- Maria Blettner
- Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Manfred E Beutel
- Department of Psychosomatic Medicine and Psychotherapy, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Josef M Unterrainer
- Medical Psychology and Medical Sociology, University Medical Center Mainz, Mainz, Germany
19
Gmehlin D, Fuermaier ABM, Walther S, Debelak R, Rentrop M, Westermann C, Sharma A, Tucha L, Koerts J, Tucha O, Weisbrod M, Aschenbrenner S. Intraindividual variability in inhibitory function in adults with ADHD--an ex-Gaussian approach. PLoS One 2014; 9:e112298. [PMID: 25479234 PMCID: PMC4257533 DOI: 10.1371/journal.pone.0112298] [Received: 02/20/2014] [Accepted: 10/08/2014]
Abstract
OBJECTIVE: Attention deficit hyperactivity disorder (ADHD) is commonly associated with inhibitory dysfunction contributing to typical behavioral symptoms like impulsivity or hyperactivity. However, some studies analyzing intraindividual variability (IIV) of reaction times in children with ADHD (cADHD) question a predominance of inhibitory deficits. IIV is a measure of the stability of information processing and provides evidence that longer reaction times (RT) in inhibitory tasks in cADHD are due to only a few prolonged responses, which may indicate deficits in sustained attention rather than inhibitory dysfunction. We wanted to find out whether a slowing in inhibitory functioning in adults with ADHD (aADHD) is due to isolated slow responses. METHODS: Computing classical RT measures (mean RT, SD), ex-Gaussian parameters of IIV (which allow a better separation of reaction time (mu), variability (sigma), and abnormally slow responses (tau) than classical measures), as well as errors of omission and commission, we examined response inhibition in a well-established GoNogo task in a sample of aADHD subjects without medication and healthy controls matched for age, gender, and education. RESULTS: We did not find higher numbers of commission errors in aADHD, while the number of omissions was significantly increased compared with controls. In contrast to increased mean RT, the distributional parameter mu did not document a significant slowing in aADHD. However, subjects with aADHD were characterized by increased IIV throughout the entire RT distribution, as indicated by the parameters sigma and tau as well as the SD of reaction time. Moreover, we found a significant correlation between tau and the number of omission errors. CONCLUSIONS: Our findings question a primacy of inhibitory deficits in aADHD and provide evidence for attentional dysfunction. The present findings may have theoretical implications for etiological models of ADHD as well as more practical implications for neuropsychological testing in aADHD.
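The ex-Gaussian parameters named in this abstract can be illustrated with a small simulation. An ex-Gaussian reaction time is the sum of a normal component (mean mu, standard deviation sigma) and an independent exponential component (mean tau), so the mean RT equals mu + tau; this is why mean RT can be elevated while mu is not. The sketch below is not the authors' fitting procedure, which the abstract does not specify. It uses simple method-of-moments estimates, and the function names and parameter values are hypothetical.

```python
import random
import statistics


def exgauss_sample(mu, sigma, tau, n, seed=0):
    """Draw n values as Normal(mu, sigma) plus Exponential with mean tau."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


def exgauss_moments(rts):
    """Method-of-moments estimates (mu, sigma, tau) from reaction times.

    Uses three identities of the ex-Gaussian distribution:
      mean = mu + tau
      variance = sigma**2 + tau**2
      third central moment = 2 * tau**3
    """
    n = len(rts)
    mean = statistics.fmean(rts)
    var = statistics.pvariance(rts, mean)
    m3 = sum((x - mean) ** 3 for x in rts) / n
    # Clamp at zero so a negatively skewed sample does not yield a complex root.
    tau = (max(m3, 0.0) / 2.0) ** (1.0 / 3.0)
    mu = mean - tau
    sigma = max(var - tau ** 2, 0.0) ** 0.5
    return mu, sigma, tau


# Hypothetical values in milliseconds, chosen only for illustration.
rts = exgauss_sample(mu=400.0, sigma=40.0, tau=100.0, n=200_000)
mu_hat, sigma_hat, tau_hat = exgauss_moments(rts)
print(f"mean RT = {statistics.fmean(rts):.1f}")
print(f"mu = {mu_hat:.1f}, sigma = {sigma_hat:.1f}, tau = {tau_hat:.1f}")
```

Moment estimates are sensitive to outliers and need large samples; published ex-Gaussian analyses typically rely on maximum likelihood fitting instead, so this sketch only shows how the three parameters relate to the mean, variance, and skew of the RT distribution.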
Affiliation(s)
- Dennis Gmehlin
- Department of Clinical Psychology and Neuropsychology, SRH Klinikum, Karlsbad-Langensteinbach, Germany
- Anselm B. M. Fuermaier
- Department of Clinical and Developmental Neuropsychology, University of Groningen, Groningen, The Netherlands
- Stephan Walther
- Section of Experimental Psychopathology and Neurophysiology, Department of child and adolescent Psychiatry, University of Heidelberg, Germany
- Mirjam Rentrop
- Department of Clinical Psychology and Neuropsychology, SRH Klinikum, Karlsbad-Langensteinbach, Germany
- Celina Westermann
- Department of Clinical Psychology and Neuropsychology, SRH Klinikum, Karlsbad-Langensteinbach, Germany
- Anuradha Sharma
- Section of Experimental Psychopathology and Neurophysiology, Department of child and adolescent Psychiatry, University of Heidelberg, Germany
- Lara Tucha
- Department of Clinical and Developmental Neuropsychology, University of Groningen, Groningen, The Netherlands
- Janneke Koerts
- Department of Clinical and Developmental Neuropsychology, University of Groningen, Groningen, The Netherlands
- Oliver Tucha
- Department of Clinical and Developmental Neuropsychology, University of Groningen, Groningen, The Netherlands
- Matthias Weisbrod
- Psychiatric Department, SRH Klinikum, Karlsbad-Langensteinbach, Germany
- SüdWestAkademie für Neuropsychologie (SWAN), Heidelberg, Germany
- Section of Experimental Psychopathology and Neurophysiology, Department of child and adolescent Psychiatry, University of Heidelberg, Germany
- Steffen Aschenbrenner
- Department of Clinical Psychology and Neuropsychology, SRH Klinikum, Karlsbad-Langensteinbach, Germany
- SüdWestAkademie für Neuropsychologie (SWAN), Heidelberg, Germany
20
Rodewald K, Bartolovic M, Debelak R, Aschenbrenner S, Weisbrod M, Roesch-Ely D. Eine Normierungsstudie eines modifizierten Trail Making Tests im deutschsprachigen Raum [A normative study of a modified Trail Making Test in German-speaking countries]. Zeitschrift für Neuropsychologie 2012. [DOI: 10.1024/1016-264x/a000060]
Abstract
The Trail Making Test (TMT) is an internationally widely used instrument that is applied, for example, in the assessment of patients with acquired brain damage. Performance on the TMT is associated with various neuropsychological domains, such as attention and executive functions. Despite its frequent use in everyday clinical practice, no German-language norms covering a broad age range have been available so far. The present study therefore recorded and analyzed the influence of age and education on completion time in the TMT in German-speaking adults aged between 18 and 85 years. Exclusion criteria were neurological or psychiatric disorders, impairment of the visual field or of vision, motor impairment of the arms and hands, and drug or alcohol abuse. The sample is divided into four age groups: 18 to 34 years (n = 148), 35 to 49 years (n = 111), 50 to 64 years (n = 93), and 65 to 84 years (n = 53). With respect to education, two groups were formed: low to medium educational level (≤ 12 years of formal education) and higher educational level (≥ 12 years of formal education). Significant correlations between the demographic variables and completion times in the TMT-A and TMT-B show that both age and education correlate with performance on the TMT (p < .01). Post hoc analyses show that all age groups differ from one another. The results for completion time are consistent with earlier normative studies, which had likewise identified age and education as the most important moderators of TMT performance.
Affiliation(s)
- Katlehn Rodewald
- Sektion für Experimentelle Psychopathologie und Neurophysiologie, Psychiatrische Abteilung, Zentrum für Psychosoziale Medizin, Universitätsklinikum Heidelberg
- Berufliches Bildungs- und Rehazentrum (BBRZ) Karlsbad-Langensteinbach
- Marina Bartolovic
- Sektion für Experimentelle Psychopathologie und Neurophysiologie, Psychiatrische Abteilung, Zentrum für Psychosoziale Medizin, Universitätsklinikum Heidelberg
- Matthias Weisbrod
- Sektion für Experimentelle Psychopathologie und Neurophysiologie, Psychiatrische Abteilung, Zentrum für Psychosoziale Medizin, Universitätsklinikum Heidelberg
- Abteilung für Psychiatrie und Psychotherapie, SRH Klinikum Karlsbad-Langensteinbach
- Daniela Roesch-Ely
- Sektion für Experimentelle Psychopathologie und Neurophysiologie, Psychiatrische Abteilung, Zentrum für Psychosoziale Medizin, Universitätsklinikum Heidelberg