1
Kupffer R, Frick S, Wetzel E. Detecting Careless Responding in Multidimensional Forced-Choice Questionnaires. Educational and Psychological Measurement 2024; 84:887-926. [PMID: 39318479] [PMCID: PMC11418602] [DOI: 10.1177/00131644231222420]
Abstract
The multidimensional forced-choice (MFC) format is an alternative to rating scales in which participants rank items according to how well the items describe them. Currently, little is known about how to detect careless responding in MFC data. The aim of this study was to adapt a number of indices used for rating scales to the MFC format and additionally develop several new indices that are unique to the MFC format. We applied these indices to a data set from an online survey (N = 1,169) that included a series of personality questionnaires in the MFC format. The correlations among the careless responding indices were somewhat lower than those published for rating scales. Results from a latent profile analysis suggested that the majority of the sample (about 76-84%) did not respond carelessly, although the ones who did were characterized by different levels of careless responding. In a simulation study, we simulated different careless responding patterns and varied the overall proportion of carelessness in the samples. With one exception, the indices worked as intended conceptually. Taken together, the results suggest that careless responding also plays an important role in the MFC format. Recommendations on how it can be addressed are discussed.
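As a concrete illustration of how a rating-scale index might transfer to the MFC format, consider a longstring-type count of how often a respondent keeps ranking the same within-block position first. The sketch below is illustrative only; the function name and data layout are assumptions, not the authors' implementation:

```python
import numpy as np

def longstring_mfc(first_choice_positions):
    """Longest run of identical first-choice positions across consecutive
    MFC blocks; unusually long runs may signal pattern-based careless
    responding. Input: for each block in presentation order, the
    within-block position (0, 1, 2, ...) ranked first (assumed layout)."""
    x = np.asarray(first_choice_positions)
    best = run = 1
    for prev, cur in zip(x[:-1], x[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

print(longstring_mfc([0, 0, 0, 2, 1, 1]))  # -> 3
```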
Affiliation(s)
- Susanne Frick
- University of Mannheim, Germany
- TU Dortmund University, Germany
2
Liu CW. Multidimensional item response theory models for testlet-based doubly bounded data. Behav Res Methods 2024; 56:5309-5353. [PMID: 37985636] [DOI: 10.3758/s13428-023-02272-5]
Abstract
A testlet-based visual analogue scale (VAS) is a doubly bounded scaling approach (e.g., from 0% to 100% or from 0 to 1) composed of multiple adjectives, nouns, or sentences (statements/items) within testlets for measuring individuals' attitudes, opinions, or career interests. While testlet-based VASs have many advantages over Likert scales, such as reducing response style effects, the development of proper statistical models for analyzing testlet-based VAS data lags behind. This paper proposes a novel beta copula model and a competing logit-normal model based on the item response theory framework, assessed by Bayesian parameter estimation, model comparison, and goodness-of-fit statistics. An empirical career interest dataset based on a testlet-based VAS design was analyzed using the proposed models. Simulation studies were conducted to assess the two models' parameter recovery. The results show that the beta copula model had superior fit in the empirical data analysis, and also exhibited good parameter recovery in the simulation studies, suggesting that it is a promising statistical approach to testlet-based doubly bounded responses.
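To fix ideas about what "doubly bounded" modeling involves, the density at the heart of the competing logit-normal model can be sketched as follows (the paper's full models add IRT structure and testlet effects not shown here; names are illustrative):

```python
import numpy as np
from scipy import stats

def logit_normal_pdf(y, mu, sigma):
    """Density of Y on (0, 1) when logit(Y) ~ Normal(mu, sigma^2):
    f(y) = phi((logit(y) - mu) / sigma) / (sigma * y * (1 - y)),
    obtained via the change of variables z = log(y / (1 - y))."""
    y = np.asarray(y, dtype=float)
    z = np.log(y / (1.0 - y))
    return stats.norm.pdf(z, loc=mu, scale=sigma) / (y * (1.0 - y))

print(logit_normal_pdf(0.7, mu=0.5, sigma=1.0))
```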
Affiliation(s)
- Chen-Wei Liu
- Department of Educational Psychology and Counseling, National Taiwan Normal University, Taipei, Taiwan.
3
Jansen MT, Schulze R. Linear Factor Analytic Thurstonian Forced-Choice Models: Current Status and Issues. Educational and Psychological Measurement 2024; 84:660-690. [PMID: 39055095] [PMCID: PMC11268391] [DOI: 10.1177/00131644231205011]
Abstract
Thurstonian forced-choice modeling is considered a powerful new tool for estimating item and person parameters while simultaneously testing model fit. This assessment approach aims to reduce faking and other response tendencies that plague traditional self-report trait assessments. As a result of major recent methodological developments, normative trait scores can now be estimated in addition to merely ipsative scores. This opened up the important possibility of comparisons between individuals with forced-choice assessment procedures. With item response theory (IRT) methods, a multidimensional forced-choice (MFC) format has also been proposed to estimate individual scores. In applications of the MFC, items assessing different traits are customarily presented in blocks, often triplets, which is an efficient form of item presentation but also a simplification of the original models. The present study provides a comprehensive review of the current status of Thurstonian forced-choice models and their variants. Critical features of the current models, especially the block models, are identified and discussed. It is concluded that MFC modeling with item blocks is highly problematic and yields biased results. In particular, the often-recommended presentation of blocks with items that are keyed in different directions of a trait proves to be counterproductive considering the goal of reducing response tendencies. The consequences and implications of the highlighted issues are further discussed.
4
Sun L, Qin Z, Wang S, Tian X, Luo F. Contributions to Constructing Forced-Choice Questionnaires Using the Thurstonian IRT Model. Multivariate Behavioral Research 2024; 59:229-250. [PMID: 37776890] [DOI: 10.1080/00273171.2023.2248979]
Abstract
Forced-choice questionnaires involve presenting items in blocks and asking respondents to provide a full or partial ranking of the items within each block. To prevent involuntary or voluntary response distortions, blocks are usually formed of items that possess similar levels of desirability. Assembling forced-choice blocks is not a trivial process, because in addition to desirability, both the direction and magnitude of relationships between items and the traits being measured (i.e., factor loadings) need to be carefully considered. Based on simulations and empirical studies using item pairs, we provide recommendations on how to construct item pairs matched by desirability. When all pairs contain items keyed in the same direction, score reliability is improved by maximizing within-block loading differences. Higher reliability is obtained when even a small number of pairs consist of unequally keyed items.
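The construction logic behind these recommendations can be sketched as a toy greedy heuristic: pair items whose desirability values are close while making the within-pair loading difference as large as possible. Everything below (names, tolerance, data) is an illustrative assumption, not the authors' algorithm:

```python
import numpy as np

def greedy_pairs(desirability, loadings, max_gap=0.5):
    """Pair items with similar desirability, preferring pairs whose
    factor loadings differ most (toy heuristic only)."""
    pool = list(np.argsort(desirability))
    pairs = []
    while len(pool) >= 2:
        i = pool.pop(0)
        candidates = [j for j in pool
                      if abs(desirability[j] - desirability[i]) <= max_gap]
        if not candidates:
            continue  # no desirability match; leave this item unpaired
        j = max(candidates, key=lambda k: abs(loadings[k] - loadings[i]))
        pool.remove(j)
        pairs.append((int(i), int(j)))
    return pairs

des = np.array([1.2, 1.3, 2.8, 2.9, 1.25, 3.0])   # made-up desirability values
lam = np.array([0.9, 0.3, 0.8, 0.2, 0.6, 0.7])    # made-up factor loadings
print(greedy_pairs(des, lam))  # -> [(0, 1), (2, 3)]
```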
Affiliation(s)
- Luning Sun
- The Psychometrics Centre, University of Cambridge
- Zijie Qin
- Faculty of Psychology, Beijing Normal University
- Shan Wang
- Faculty of Psychology, Beijing Normal University
- Xuetao Tian
- Faculty of Psychology, Beijing Normal University
- Fang Luo
- Faculty of Psychology, Beijing Normal University
5
Wang Q, Zheng Y, Liu K, Cai Y, Peng S, Tu D. Item selection methods in multidimensional computerized adaptive testing for forced-choice items using Thurstonian IRT model. Behav Res Methods 2024; 56:600-614. [PMID: 36750522] [DOI: 10.3758/s13428-022-02037-6]
Abstract
Multidimensional computerized adaptive testing for forced-choice items (MFC-CAT) combines the benefits of multidimensional forced-choice (MFC) items and computerized adaptive testing (CAT): it reduces response biases and shortens administration time. Previous studies that explored designs of MFC-CAT only discussed item selection methods based on Fisher information (FI), which is known to perform unstably at early stages of CAT. This study proposes a set of new item selection methods for MFC-CAT based on Kullback-Leibler (KL) information (namely MFC-KI, MFC-KB, and MFC-KLP) under the Thurstonian IRT (TIRT) model. Three simulation studies, including one based on real data, were conducted to compare the performance of the proposed KL-based item selection methods against the existing FI-based methods in three- and five-dimensional MFC-CAT scenarios with various test lengths and inter-trait correlations. Results demonstrate that the proposed KL-based item selection methods are feasible for MFC-CAT and generate acceptable trait estimation accuracy and uniformity of item pool usage. Among the three proposed methods, MFC-KB and MFC-KLP outperformed the existing FI-based item selection methods and resulted in the most accurate trait estimation and relatively even utilization of the item pool.
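To convey the flavor of a KL-based selection criterion in a deliberately simplified form (the paper's methods are multidimensional and more refined; all parameter names and values here are assumptions): the usefulness of a candidate pair can be indexed by the KL divergence between the Bernoulli response distributions it implies at the current trait estimate and at a nearby point.

```python
import numpy as np
from scipy.stats import norm

def pair_prob(theta_a, theta_b, lam_a, lam_b, gamma, psi2):
    """Thurstonian IRT probability of preferring item a over item b;
    psi2 is the summed uniqueness variance of the two items."""
    return norm.cdf((lam_a * theta_a - lam_b * theta_b - gamma) / np.sqrt(psi2))

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# Index one candidate pair: response distribution at the current trait
# estimates vs. at a shifted point; an adaptive test would administer
# the pair that maximizes such an index (toy numbers throughout).
p_here = pair_prob(0.2, -0.1, lam_a=0.8, lam_b=0.7, gamma=0.1, psi2=0.5)
p_near = pair_prob(0.7, 0.4, lam_a=0.8, lam_b=0.7, gamma=0.1, psi2=0.5)
print(kl_bernoulli(p_here, p_near))
```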
Affiliation(s)
- Qin Wang
- Jiangxi Normal University, Nanchang, China
- Yi Zheng
- Arizona State University, Tempe, AZ, USA
- Kai Liu
- Jiangxi Normal University, Nanchang, China
- Yan Cai
- Jiangxi Normal University, Nanchang, China
- Siwei Peng
- Jiangxi Normal University, Nanchang, China
- Dongbo Tu
- Jiangxi Normal University, Nanchang, China
6
Frick S. Estimating and Using Block Information in the Thurstonian IRT Model. Psychometrika 2023; 88:1556-1589. [PMID: 37640828] [PMCID: PMC10656335] [DOI: 10.1007/s11336-023-09931-8]
Abstract
Multidimensional forced-choice (MFC) tests are increasing in popularity but their construction is complex. The Thurstonian item response model (Thurstonian IRT model) is most often used to score MFC tests that contain dominance items. Currently, in a frequentist framework, information about the latent traits in the Thurstonian IRT model is computed for binary outcomes of pairwise comparisons, but this approach neglects stochastic dependencies. In this manuscript, it is shown how to estimate Fisher information on the block level. A simulation study showed that the observed and expected standard errors based on the block information were similarly accurate. When local dependencies for block sizes [Formula: see text] were neglected, the standard errors were underestimated, except with the maximum a posteriori estimator. It is shown how the multidimensional block information can be summarized for test construction. A simulation study and an empirical application showed small differences between the block information summaries depending on the outcome considered. Thus, block information can aid the construction of reliable MFC tests.
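For orientation, the pairwise-outcome baseline that the block-level approach refines is the standard result that one binary outcome carries Fisher information (dP/dθ)² / (P(1 − P)) about a trait. A minimal numerical sketch (function names and parameter values are assumptions):

```python
import numpy as np
from scipy.stats import norm

def fisher_info_binary(prob_fn, theta, eps=1e-5):
    """Fisher information of a single binary comparison outcome about
    theta, using a central-difference derivative of the response
    probability: I(theta) = (dP/dtheta)^2 / (P * (1 - P))."""
    p = prob_fn(theta)
    dp = (prob_fn(theta + eps) - prob_fn(theta - eps)) / (2.0 * eps)
    return dp**2 / (p * (1.0 - p))

# Toy probit response function: loading 0.8, threshold 0.1
prob = lambda t: norm.cdf(0.8 * t - 0.1)
print(fisher_info_binary(prob, theta=0.0))  # about 0.41
```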
Affiliation(s)
- Susanne Frick
- University of Mannheim, Mannheim, Germany.
- TU Dortmund University, Dortmund, Germany.
7
Huang HY. Diagnostic Classification Model for Forced-Choice Items and Noncognitive Tests. Educational and Psychological Measurement 2023; 83:146-180. [PMID: 36601255] [PMCID: PMC9806518] [DOI: 10.1177/00131644211069906]
Abstract
Forced-choice (FC) item formats used in noncognitive tests typically present a set of response options that measure different traits and instruct respondents to judge among these options in terms of their preference, in order to control the response biases commonly observed in normative tests. Diagnostic classification models (DCMs) can provide information regarding the mastery status of test takers on latent discrete variables and are more commonly used for cognitive tests employed in educational settings than for noncognitive tests. The purpose of this study is to develop a new class of DCM for FC items under the higher-order DCM framework to meet the practical demands of simultaneously controlling for response biases and providing diagnostic classification information. By conducting a series of simulations and calibrating the model parameters with Bayesian estimation, the study shows that, in general, the model parameters can be recovered satisfactorily with the use of long tests and large samples. More attributes improve the precision of the second-order latent trait estimation in a long test but decrease the classification accuracy and the estimation quality of the structural parameters. When statements are allowed to load on two distinct attributes in paired comparison items, the specific-attribute condition produces better parameter estimation than the overlap-attribute condition. Finally, an empirical analysis related to work-motivation measures is presented to demonstrate the applications and implications of the new model.
Affiliation(s)
- Hung-Yu Huang
- University of Taipei, Taiwan
- Hung-Yu Huang, Distinguished Professor, Department of Psychology and Counseling, University of Taipei, No.1, Ai-Guo West Road, Taipei, 10048, Taiwan.
8
Frick S, Brown A, Wetzel E. Investigating the Normativity of Trait Estimates from Multidimensional Forced-Choice Data. Multivariate Behavioral Research 2023; 58:1-29. [PMID: 34464217] [DOI: 10.1080/00273171.2021.1938960]
Abstract
The Thurstonian item response model (Thurstonian IRT model) allows deriving normative trait estimates from multidimensional forced-choice (MFC) data. In the MFC format, persons must rank-order items that measure different attributes according to how well the items describe them. This study evaluated the normativity of Thurstonian IRT trait estimates both in a simulation and empirically. The simulation investigated normativity and compared Thurstonian IRT trait estimates with estimates from classical partially ipsative scoring, from dichotomous true-false (TF) data, and from rating scale data. The results showed that, with blocks of opposite-keyed items, Thurstonian IRT trait estimates were normative, in contrast to classical partially ipsative estimates. Unbalanced numbers of items per trait, few opposite-keyed items, positively correlated traits, or assessing fewer traits did not markedly decrease measurement precision. Measurement precision was lower than that of rating scale data. The empirical study investigated whether relative MFC responses provide a better differentiation of behaviors within persons than absolute TF responses. However, criterion validity was equal, and construct validity (with constructs measured by rating scales) was lower, in MFC. Thus, Thurstonian IRT modeling of MFC data overcomes the drawbacks of classical scoring, but gains in validity may depend on eliminating common method biases from the comparison.
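The ipsativity problem that classical scoring creates, and that the Thurstonian IRT model is meant to overcome, is easy to demonstrate in a few lines. The toy design below (one item per trait per block, reversed ranks summed per trait) is an illustrative assumption:

```python
import numpy as np

# Classical rank-sum scoring is ipsative: every block hands out the same
# rank points, so each person's trait scores sum to the same constant and
# only within-person contrasts remain.
rng = np.random.default_rng(0)
n_persons, n_blocks, n_traits = 5, 10, 3

# ranks[p, b, t]: rank (0 = most like me) that person p assigns, within
# block b, to the item measuring trait t (toy design: trait = position).
ranks = np.stack([
    rng.permuted(np.tile(np.arange(n_traits), (n_blocks, 1)), axis=1)
    for _ in range(n_persons)
])

scores = (n_traits - 1 - ranks).sum(axis=1)  # reversed ranks summed per trait
print(scores)              # persons differ across traits...
print(scores.sum(axis=1))  # ...but every row total is 30: fully ipsative
```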
Affiliation(s)
- Susanne Frick
- Department of Psychology, School of Social Sciences, University of Mannheim
- Anna Brown
- Department of Psychology, University of Kent
- Eunike Wetzel
- Department of Psychology, Otto-von-Guericke University Magdeburg
- Department of Psychology, University of Koblenz-Landau
9
Frick S. Modeling Faking in the Multidimensional Forced-Choice Format: The Faking Mixture Model. Psychometrika 2022; 87:773-794. [PMID: 34927219] [PMCID: PMC9166892] [DOI: 10.1007/s11336-021-09818-6]
Abstract
The multidimensional forced-choice (MFC) format has been proposed to reduce faking because items within blocks can be matched on desirability. However, the desirability of individual items might not transfer to the item blocks. The aim of this paper is to propose a mixture item response theory model for faking in the MFC format that allows estimation of the fakability of MFC blocks, termed the Faking Mixture model. Given current computing capabilities, within-subject data from both high- and low-stakes contexts are needed to estimate the model. A simulation showed good parameter recovery under various conditions. An empirical validation showed that matching was necessary but not sufficient to create an MFC questionnaire that can reduce faking. The Faking Mixture model can be used to reduce fakability during test construction.
Affiliation(s)
- Susanne Frick
- Department of Psychology, School of Social Sciences, University of Mannheim, Mannheim, Germany.
10
Mollica C, Tardella L. Remarkable properties for diagnostics and inference of ranking data modelling. The British Journal of Mathematical and Statistical Psychology 2022; 75:334-362. [PMID: 35132613] [PMCID: PMC9305251] [DOI: 10.1111/bmsp.12260]
Abstract
The Plackett-Luce model (PL) for ranked data assumes the forward order of the ranking process. This hypothesis postulates that the ranking process is carried out by sequentially assigning positions from the top (most liked) to the bottom (least liked) alternative. This assumption has recently been relaxed with the Extended Plackett-Luce model (EPL) through the introduction of a discrete reference order parameter describing the rank attribution path. Starting from two formal properties of the EPL, the former related to the inverse ordering of the item probabilities at the first and last stage of the ranking process and the latter well known as independence of irrelevant alternatives (Luce's choice axiom), we derive novel diagnostic tools for testing the appropriateness of the EPL assumption as the actual sampling distribution of the observed rankings. These diagnostic tools can help uncover possible idiosyncratic paths in the sequential choice process. Besides contributing to filling the gap in goodness-of-fit methods for the family of multistage models, we also show how one of the two statistics can be conveniently exploited to construct a heuristic method that serves as a surrogate for the maximum likelihood approach to inferring the underlying reference order parameter. The relative performance of the proposals, compared with more conventional approaches, is illustrated by means of extensive simulation studies.
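For readers new to this model family, the forward Plackett-Luce assumption is easy to state in code: a ranking is built from the top down, each position being a choice among the still-unranked items with probability proportional to their worths. A minimal sketch of the log-likelihood of one observed ranking (the EPL's reference order parameter is not shown):

```python
import numpy as np

def plackett_luce_logprob(ranking, log_worth):
    """Log-probability of a complete ranking under the forward
    Plackett-Luce model: at each stage, the next-ranked item is chosen
    from the remaining items in proportion to its worth."""
    lp = 0.0
    remaining = list(range(len(log_worth)))
    for item in ranking:  # item indices, most preferred first
        stage = np.array([log_worth[j] for j in remaining])
        lp += log_worth[item] - np.logaddexp.reduce(stage)
        remaining.remove(item)
    return lp

print(plackett_luce_logprob([2, 0, 1], log_worth=np.array([0.1, -0.4, 0.8])))
```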
Affiliation(s)
- Cristina Mollica
- Dipartimento di Scienze Statistiche, Sapienza Università di Roma, Italy
- Luca Tardella
- Dipartimento di Scienze Statistiche, Sapienza Università di Roma, Italy
11
Genome-Enabled Prediction Methods Based on Machine Learning. Methods in Molecular Biology (Clifton, N.J.) 2022; 2467:189-218. [PMID: 35451777] [DOI: 10.1007/978-1-0716-2205-6_7]
Abstract
Growth of artificial intelligence and machine learning (ML) methodology has been explosive in recent years. In this class of procedures, computers learn from sets of experiences and provide forecasts or classifications. Many ML studies have been carried out in genome-wide based prediction (GWP). This chapter provides a description of the main semiparametric and nonparametric algorithms used in GWP in animals and plants. Thirty-four ML comparative studies conducted in the last decade were used to develop a meta-analysis, via a Thurstonian model, evaluating which algorithms have the best predictive qualities. It was found that some kernel, Bayesian, and ensemble methods displayed greater robustness and predictive ability. However, the type of study and data distribution must be considered in order to choose the most appropriate model for a given problem.
12
Martínez A, Salgado JF. A Meta-Analysis of the Faking Resistance of Forced-Choice Personality Inventories. Front Psychol 2021; 12:732241. [PMID: 34659043] [PMCID: PMC8511514] [DOI: 10.3389/fpsyg.2021.732241]
Abstract
This study presents a comprehensive meta-analysis on the faking resistance of forced-choice (FC) inventories. The results showed that (1) FC inventories show resistance to faking behavior; (2) the magnitude of faking is higher in experimental contexts than in real-life selection processes, suggesting that the effects of faking may be, in part, a laboratory phenomenon; and (3) quasi-ipsative FC inventories are more resistant to faking than the other FC formats. Smaller effect sizes were found for conscientiousness when the quasi-ipsative format was used (δ = 0.49 vs. δ = 1.27 for ipsative formats). Also, the effect sizes were smaller for the applicant samples than for the experimental samples. Finally, the contributions and practical implications of these findings are discussed.
Affiliation(s)
- Alexandra Martínez
- Department of Political Science and Sociology, Faculty of Labor Relations, University of Santiago de Compostela, Santiago de Compostela, Spain
13
Development of the Child- and Parent-Rated Scales of Food Allergy Anxiety (SOFAA). The Journal of Allergy and Clinical Immunology: In Practice 2021; 10:161-169.e6. [DOI: 10.1016/j.jaip.2021.06.039]
Abstract
BACKGROUND: Anxiety can be excessive and impairing in children with food allergy (FA). There is no accepted condition-specific measure of anxiety for this population. OBJECTIVE: To evaluate the validity and reliability of new child- and parent-rated measures of FA-related anxiety in youth. METHODS: Items for the Scale of Food Allergy Anxiety (SOFAA) were developed by a cognitive-behavioral therapist specializing in pediatric anxiety, in consultation with FA medical professionals and parents of children with FA. Dyads (n = 77) of children with FA (aged 8-18 years; 42.9% females) and their parents (95.5% females) completed full versions of the SOFAA (21 items; scored 0-4) via online survey. RESULTS: The child-rated SOFAA-C mean score was 29.1 ± 18.3; the parent-rated SOFAA-P mean score was 33.9 ± 16.1. Higher scores indicate higher reported anxiety. Coefficient alphas were 0.94 and 0.92. Factor analyses and item-response theory analyses supported the creation of the 14-item SOFAA-C-brief and the 7-item SOFAA-P-brief, accounting for 93% and 79% of total variance, respectively. Correlations revealed strong convergence between child- and parent-report for both the full (r = 0.85) and brief (r = 0.79) versions. Correlations with a generic measure of child anxiety (Screen for Child Anxiety Related Disorders) and the Food Allergy Quality of Life Questionnaire ranged from moderate to strong, whereas those with a generic measure of child eating problems (About Your Child's Eating) were weak to moderate, supporting convergent and divergent validity. Scores of 48 dyads who completed SOFAAs at time 2 (mean, 16.0 days) appeared stable over time, supporting test-retest reliability. CONCLUSIONS: The 21-item SOFAA-C and SOFAA-P are reliable and valid scales for measuring condition-specific anxiety in youth with FA. As shorter screening measures, the SOFAA-C-brief and the SOFAA-P-brief are also reliable and valid.
14
Calderón Carvajal C, Ximénez Gómez C, Lay-Lisboa S, Briceño M. Reviewing the Structure of Kolb’s Learning Style Inventory From Factor Analysis and Thurstonian Item Response Theory (IRT) Model Approaches. Journal of Psychoeducational Assessment 2021. [DOI: 10.1177/07342829211003739]
Abstract
Kolb’s Learning Style Inventory (LSI) continues to generate great debate among researchers, given the contradictory evidence regarding its psychometric properties. One primary criticism focuses on the artificiality of the results derived from its internal structure because of the ipsative nature of the forced-choice format. This study seeks to contribute to the resolution of this debate. A short version of Kolb’s LSI with a forced-choice format and an additional inventory scored on a Likert scale was completed by a sample of students at the Universidad Católica del Norte in Antofagasta, Chile. The data obtained from the two forms of the reduced version of the LSI were compared using principal component analysis, confirmatory factor analysis, and the Thurstonian Item Response Theory model. The results support the hypothesis of the existence of four learning mode dimensions. However, they do not support the existence of the learning styles proposed by Kolb, indicating that such reports are the product of the artificial structure generated by the ipsative forced-choice format.
Affiliation(s)
- Siu Lay-Lisboa
- School of Psychology, Universidad Católica del Norte, Antofagasta, Chile
- Mauricio Briceño
- School of Psychology, Universidad Católica del Norte, Antofagasta, Chile
15
Abstract
Forced-choice (FC) assessments of noncognitive psychological constructs (e.g., personality, behavioral tendencies) are popular in high-stakes organizational testing scenarios (e.g., informing hiring decisions) due to their enhanced resistance against response distortions (e.g., faking good, impression management). The measurement precision of FC assessment scores used to inform personnel decisions is of paramount importance in practice. Different types of reliability estimates are reported for FC assessment scores in current publications, and consensus on best practices appears to be lacking. In order to provide understanding and structure around the reporting of FC reliability, this study systematically examined different types of reliability estimation methods for Thurstonian IRT-based FC assessment scores: their theoretical differences were discussed, and their numerical differences were illustrated through a series of simulations and empirical studies. In doing so, this study provides a practical guide for appraising different reliability estimation methods for IRT-based FC assessment scores.
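As one concrete instance of the estimators being contrasted (only one of the several methods the study examines; the formula shown is a common textbook variant, not necessarily the paper's preferred one):

```python
import numpy as np

def empirical_reliability(theta_hat, se):
    """A common 'empirical reliability' estimator for IRT trait scores:
    estimated true-score variance over observed score variance, with the
    mean squared standard error taken as the error variance."""
    theta_hat, se = np.asarray(theta_hat), np.asarray(se)
    var_obs = np.var(theta_hat, ddof=1)
    return (var_obs - np.mean(se**2)) / var_obs

rng = np.random.default_rng(1)
theta_hat = rng.normal(size=500)          # hypothetical trait estimates
se = rng.uniform(0.3, 0.5, size=500)      # hypothetical standard errors
print(empirical_reliability(theta_hat, se))
```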
16
Lang JW, Tay L. The Science and Practice of Item Response Theory in Organizations. Annual Review of Organizational Psychology and Organizational Behavior 2021. [DOI: 10.1146/annurev-orgpsych-012420-061705]
Abstract
Item response theory (IRT) is a modeling approach that links responses to test items with underlying latent constructs through formalized statistical models. This article focuses on how IRT can be used to advance science and practice in organizations. We describe established applications of IRT as a scale development tool and new applications of IRT as a research and theory testing tool that enables organizational researchers to improve their understanding of workers and organizations. We focus on IRT models and their application in four key research and practice areas: testing, questionnaire responding, construct validation, and measurement equivalence of scores. In so doing, we highlight how novel developments in IRT such as explanatory IRT, multidimensional IRT, random item models, and more complex models of response processes such as ideal point models and tree models can potentially advance existing science and practice in these areas. As a starting point for readers interested in learning IRT and applying recent developments in IRT in their research, we provide concrete examples with data and R code.
Affiliation(s)
- Jonas W.B. Lang
- Department of Human Resource Management and Organizational Psychology, Ghent University, B-9000 Gent, Belgium
- Business School, University of Exeter, EX4 4PU Exeter, United Kingdom
- Louis Tay
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana 47907, USA
17
Lee H, Smith WZ. Fit Indices for Measurement Invariance Tests in the Thurstonian IRT Model. Applied Psychological Measurement 2020; 44:282-295. [PMID: 32536730] [PMCID: PMC7262996] [DOI: 10.1177/0146621619893785]
Abstract
This study examined whether cutoffs in fit indices suggested for traditional formats with maximum likelihood estimators can be utilized to assess model fit and to test measurement invariance when a multiple group confirmatory factor analysis was employed for the Thurstonian item response theory (IRT) model. Regarding the performance of the evaluation criteria, detection of measurement non-invariance and Type I error rates were examined. The impact of measurement non-invariance on estimated scores in the Thurstonian IRT model was also examined through accuracy and efficiency in score estimation. The fit indices used for the evaluation of model fit performed well. Among six cutoffs for changes in model fit indices, only ΔCFI > .01 and ΔNCI > .02 detected metric non-invariance when the medium magnitude of non-invariance occurred and none of the cutoffs performed well to detect scalar non-invariance. Based on the generated sampling distributions of fit index differences, this study suggested ΔCFI > .001 and ΔNCI > .004 for scalar non-invariance and ΔCFI > .007 for metric non-invariance. Considering Type I error rate control and detection rates of measurement non-invariance, ΔCFI was recommended for measurement non-invariance tests for forced-choice format data. Challenges in measurement non-invariance tests in the Thurstonian IRT model were discussed along with the direction for future research to enhance the utility of forced-choice formats in test development for cross-cultural and international settings.
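To fix notation for the ΔCFI criterion discussed above, the sketch below computes CFI from model and baseline chi-square statistics and flags non-invariance at the conventional .01 cutoff; all fit statistics are made-up numbers:

```python
def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index:
    1 - max(chi2 - df, 0) / max(chi2 - df, chi2_base - df_base, 0)."""
    num = max(chi2 - df, 0.0)
    den = max(chi2 - df, chi2_base - df_base, 0.0)
    return 1.0 - num / den if den > 0.0 else 1.0

# Hypothetical configural vs. metric (loadings-constrained) models
cfi_configural = cfi(chi2=312.4, df=220, chi2_base=5100.0, df_base=276)
cfi_metric = cfi(chi2=351.8, df=236, chi2_base=5100.0, df_base=276)
delta_cfi = cfi_configural - cfi_metric
print(round(delta_cfi, 4), delta_cfi > 0.01)  # flag metric non-invariance
```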
Affiliation(s)
- HyeSun Lee
- California State University Channel Islands, Camarillo, USA
18
Bürkner PC, Schulte N, Holling H. On the Statistical and Practical Limitations of Thurstonian IRT Models. Educational and Psychological Measurement 2019; 79:827-854. [PMID: 31488915] [PMCID: PMC6713979] [DOI: 10.1177/0013164419832063]
Abstract
Forced-choice questionnaires have been proposed to avoid common response biases typically associated with rating scale questionnaires. To overcome ipsativity issues of trait scores obtained from classical scoring approaches of forced-choice items, advanced methods from item response theory (IRT) such as the Thurstonian IRT model have been proposed. For convenient model specification, we introduce the thurstonianIRT R package, which uses Mplus, lavaan, and Stan for model estimation. Based on practical considerations, we establish that items within one block need to be equally keyed to achieve similar social desirability, which is essential for creating forced-choice questionnaires that have the potential to resist faking intentions. According to extensive simulations, measuring up to five traits using blocks of only equally keyed items does not yield sufficiently accurate trait scores and inter-trait correlation estimates, neither for frequentist nor for Bayesian estimation methods. As a result, persons' trait scores remain partially ipsative and, thus, do not allow for valid comparisons between persons. However, we demonstrate that trait scores based on only equally keyed blocks can be improved substantially by measuring a sizable number of traits. More specifically, in our simulations of 30 traits, scores based on only equally keyed blocks were non-ipsative and highly accurate. We conclude that in high-stakes situations where persons are motivated to give fake answers, Thurstonian IRT models should only be applied to tests measuring a sizable number of traits.
19
Walton KE, Cherkasova L, Roberts RD. On the Validity of Forced Choice Scores Derived From the Thurstonian Item Response Theory Model. Assessment 2019; 27:706-718. [PMID: 31007043] [DOI: 10.1177/1073191119843585]
Abstract
Forced choice (FC) measures may be a desirable alternative to single stimulus (SS) Likert items, which are easier to fake and can have associated response biases. However, classical methods of scoring FC measures lead to ipsative data, which have a number of psychometric problems. A Thurstonian item response theory (TIRT) model has been introduced as a way to overcome these issues, but few empirical validity studies have been conducted to ensure its effectiveness. This was the goal of the current three studies, which used FC measures of domains from popular personality frameworks including the Big Five and HEXACO, and both statement and adjective item stems. We computed TIRT and ipsative scores and compared their validity estimates. Convergent and discriminant validity of the scores were evaluated by correlating them with SS scores, and test-criterion validity evidence was evaluated by examining their relationships with meaningful outcomes. In all three studies, there was evidence for the convergent and test-criterion validity of the TIRT scores, though at times this was on par with the validity of the ipsative scores. The discriminant validity of the TIRT scores was problematic and was often worse than the ipsative scores.
Affiliation(s)
- Richard D Roberts
- Research and Assessment Design (RAD): Science Solution, Philadelphia, PA, USA
20
Nye CD, Joo SH, Zhang B, Stark S. Advancing and Evaluating IRT Model Data Fit Indices in Organizational Research. Organizational Research Methods 2019. [DOI: 10.1177/1094428119833158]
Abstract
Item response theory (IRT) models have a number of advantages for developing and evaluating scales in organizational research. However, these advantages can be obtained only when the IRT model used to estimate the parameters fits the data well. Therefore, examining IRT model fit is important before drawing conclusions from the data. To test model fit, a wide range of indices are available in the IRT literature and have demonstrated utility in past research. Nevertheless, the performance of many of these indices for detecting misfit has not been directly compared in simulations. The current study evaluates a number of these indices to determine their utility for detecting various types of misfit in both dominance and ideal point IRT models. Results indicate that some indices are more effective than others but that none of the indices accurately detected misfit due to multidimensionality in the data. The implications of these results for future organizational research are discussed.
Affiliation(s)
- Bo Zhang
- University of Illinois at Urbana-Champaign, Champaign, IL, USA
21
Dueber DM, Love AMA, Toland MD, Turner TA. Comparison of Single-Response Format and Forced-Choice Format Instruments Using Thurstonian Item Response Theory. Educational and Psychological Measurement 2019; 79:108-128. [PMID: 30636784] [PMCID: PMC6318742] [DOI: 10.1177/0013164417752782]
Abstract
One of the most frequently cited methodological issues with self-report instruments is the response format, which is traditionally a single-response Likert format. Our study therefore elucidates and illustrates an alternative response format, the forced-choice format, together with an analytic technique for such data, Thurstonian item response theory (IRT). Specifically, we strove to give a thorough introduction to Thurstonian IRT at a more elementary level than previous publications in order to widen the possible audience. This article presents analyses and a comparison of two versions of a self-report scale, one version using a single-response format and the other using a forced-choice format. Drawing on lessons learned from our study and the literature, we present a number of recommendations for conducting research using the forced-choice format and Thurstonian IRT, as well as suggested avenues for future research.
22
Sass R, Frick S, Reips UD, Wetzel E. Taking the Test Taker's Perspective: Response Process and Test Motivation in Multidimensional Forced-Choice Versus Rating Scale Instruments. Assessment 2018; 27:572-584. [PMID: 29560735] [DOI: 10.1177/1073191118762049]
Abstract
The multidimensional forced-choice (MFC) format has been proposed as an alternative to the rating scale (RS) response format. However, it is unclear how changing the response format may affect the response process and test motivation of participants. In Study 1, we investigated the MFC response process using the think-aloud technique. In Study 2, we compared test motivation between the RS format and different versions of the MFC format (presenting 2, 3, 4, and 5 items simultaneously). The response process to MFC item blocks was similar to the RS response process but involved an additional step of weighing the items within a block against each other. The RS and MFC response format groups did not differ in their test motivation. Thus, from the test taker's perspective, the MFC format is somewhat more demanding to respond to, but this does not appear to decrease test motivation.
Affiliation(s)
- Eunike Wetzel
- University of Konstanz, Konstanz, Germany
- Otto-von-Guericke University Magdeburg, Magdeburg, Germany
23
Pavlov G, Maydeu-Olivares A, Fairchild AJ. Effects of Applicant Faking on Forced-Choice and Likert Scores. Organizational Research Methods 2018. [DOI: 10.1177/1094428117753683]
Affiliation(s)
- Goran Pavlov
- Department of Psychology, University of South Carolina, Columbia, SC, USA
- Alberto Maydeu-Olivares
- Department of Psychology, University of South Carolina, Columbia, SC, USA
- Faculty of Psychology, University of Barcelona, Barcelona, Spain
24
Wang WC, Qiu XL, Chen CW, Ro S, Jin KY. Item Response Theory Models for Ipsative Tests With Multidimensional Pairwise Comparison Items. Applied Psychological Measurement 2017; 41:600-613. [PMID: 29881107] [PMCID: PMC5978479] [DOI: 10.1177/0146621617703183]
Abstract
There is re-emerging interest in adopting forced-choice items to address the issue of response bias in Likert-type items for noncognitive latent traits. Multidimensional pairwise comparison (MPC) items are commonly used forced-choice items. However, few studies have been aimed at developing item response theory models for MPC items owing to the challenges associated with ipsativity. Acknowledging that the absolute scales of latent traits are not identifiable in ipsative tests, this study developed a Rasch ipsative model for MPC items that has desirable measurement properties, yields a single utility value for each statement, and allows for comparing psychological differentiation between and within individuals. The simulation results showed a good parameter recovery for the new model with existing computer programs. This article provides an empirical example of an ipsative test on work style and behaviors.
Affiliation(s)
- Xue-Lan Qiu
- The Education University of Hong Kong, Hong Kong
- Kuan-Yu Jin
- The Education University of Hong Kong, Hong Kong
25
Merk J, Schlotz W, Falter T. The Motivational Value Systems Questionnaire (MVSQ): Psychometric Analysis Using a Forced Choice Thurstonian IRT Model. Front Psychol 2017; 8:1626. [PMID: 28979228] [PMCID: PMC5611709] [DOI: 10.3389/fpsyg.2017.01626]
Abstract
This study presents a new measure of value systems, the Motivational Value Systems Questionnaire (MVSQ), which is based on a theory of value systems by psychologist Clare W. Graves. The purpose of the instrument is to help people identify their personal hierarchies of value systems and thus become more aware of what motivates and demotivates them in work-related contexts. The MVSQ is a forced-choice (FC) measure, making it quicker to complete and more difficult to intentionally distort, but the ipsativity of FC data also makes its psychometric properties more difficult to assess than those of rating scales. To overcome the limitations of ipsative data, a Thurstonian IRT (TIRT) model was fitted to the questionnaire data, based on a broad sample of N = 1,217 professionals and students. Comparison of normative (IRT) scale scores and ipsative scores suggested that MVSQ IRT scores are largely freed from restrictions due to ipsativity and thus allow interindividual comparison of scale scores. Empirical reliability was estimated using a sample-based simulation approach, which showed acceptable to good estimates and, on average, slightly higher test-retest reliabilities. Further validation studies provided evidence of both construct validity and criterion-related validity. Scale score correlations and associations of scores with both age and gender were largely in line with theoretically and empirically based expectations, and the results of a multitrait-multimethod analysis support convergent and discriminant construct validity. Criterion validity was assessed by examining the relation of value system preferences to departmental affiliation, which revealed significant relations in line with prior hypothesizing. These findings demonstrate the good psychometric properties of the MVSQ and support its application in the assessment of value systems in work-related contexts.
Affiliation(s)
- Josef Merk
- Faculty of Business Studies, Regensburg Technical University of Applied Sciences, Regensburg, Germany
- Institute of Experimental Psychology, University of Regensburg, Regensburg, Germany
- Wolff Schlotz
- Institute of Experimental Psychology, University of Regensburg, Regensburg, Germany
- Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
- Thomas Falter
- Faculty of Business Studies, Regensburg Technical University of Applied Sciences, Regensburg, Germany
26
Moderator effects of job complexity on the validity of forced-choice personality inventories for predicting job performance. Journal of Work and Organizational Psychology 2017. [DOI: 10.1016/j.rpto.2017.07.001]
27
Brown A, Inceoglu I, Lin Y. Preventing Rater Biases in 360-Degree Feedback by Forcing Choice. Organizational Research Methods 2016. [DOI: 10.1177/1094428116668036]
Abstract
We examined the effects of response biases on 360-degree feedback using a large sample (N = 4,675) of organizational appraisal data. Sixteen competencies were assessed by peers, bosses, and subordinates of 922 managers, as well as self-assessed, using the Inventory of Management Competencies (IMC) administered in two formats: Likert scale and multidimensional forced choice. Likert ratings were subject to strong response biases, making even theoretically unrelated competencies correlate highly. Modeling a latent common method factor, which represented nonuniform distortions similar to those of the “ideal-employee” factor in both self- and other assessments, improved the validity of competency scores, as evidenced by meaningful second-order factor structures, better interrater agreement, and better convergent correlations with an external personality measure. Forced-choice rankings modeled with Thurstonian item response theory (IRT) yielded construct and convergent validities as good as those of the bias-controlled Likert ratings, and slightly better rater agreement. We suggest that the mechanism for these enhancements is finer differentiation between behaviors in comparative judgements, and we advocate the operational use of the multidimensional forced-choice response format as an effective bias prevention method.
Affiliation(s)
- Anna Brown
- School of Psychology, University of Kent, Canterbury, Kent, UK
- Ilke Inceoglu
- Surrey Business School, University of Surrey, Guildford, Surrey, UK
- Yin Lin
- School of Psychology, University of Kent, Canterbury, Kent, UK
- CEB SHL Talent Measurement Solutions, Thames Ditton, Surrey, UK
28
Wetzel E, Roberts BW, Fraley RC, Brown A. Equivalence of Narcissistic Personality Inventory constructs and correlates across scoring approaches and response formats. Journal of Research in Personality 2016. [DOI: 10.1016/j.jrp.2015.12.002]
29
Brown A. Item Response Models for Forced-Choice Questionnaires: A Common Framework. Psychometrika 2016; 81:135-160. [PMID: 25663304] [DOI: 10.1007/s11336-014-9434-9]
Abstract
In forced-choice questionnaires, respondents have to make choices between two or more items presented at the same time. Several IRT models have been developed to link respondent choices to underlying psychological attributes, including the recent MUPP (Stark et al. in Appl Psychol Meas 29:184-203, 2005) and Thurstonian IRT (Brown and Maydeu-Olivares in Educ Psychol Meas 71:460-502, 2011) models. In the present article, a common framework is proposed that describes forced-choice models along three axes: (1) the forced-choice format used; (2) the measurement model for the relationships between items and psychological attributes they measure; and (3) the decision model for choice behavior. Using the framework, fundamental properties of forced-choice measurement of individual differences are considered. It is shown that the scale origin for the attributes is generally identified in questionnaires using either unidimensional or multidimensional comparisons. Both dominance and ideal point models can be used to provide accurate forced-choice measurement; and the rules governing accurate person score estimation with these models are remarkably similar.
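For reference, the binary building block that the framework's Thurstonian branch uses for a comparison of items i and k, as commonly written in the Thurstonian IRT literature (γ is the pair threshold, λ and ψ² the item loadings and uniquenesses, η the latent traits measured by the two items):

```latex
P\{ i \succ k \mid \boldsymbol{\eta} \}
  = \Phi\!\left(
      \frac{-\gamma_{ik} + \lambda_i \eta_a - \lambda_k \eta_b}
           {\sqrt{\psi_i^2 + \psi_k^2}}
    \right)
```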
Affiliation(s)
- Anna Brown
- School of Psychology, University of Kent, Canterbury, Kent, CT2 7NP, UK.
30
Van Dam NT, Brown A, Mole TB, Davis JH, Britton WB, Brewer JA. Development and Validation of the Behavioral Tendencies Questionnaire. PLoS One 2015; 10:e0140867. [PMID: 26535904] [PMCID: PMC4633225] [DOI: 10.1371/journal.pone.0140867]
Abstract
At a fundamental level, a taxonomy of behavior and behavioral tendencies can be described in terms of approach, avoid, or equivocate (i.e., neither approach nor avoid). While there are numerous theories of personality, temperament, and character, few seem to take advantage of such a parsimonious taxonomy. The present study sought to implement this taxonomy by creating a questionnaire based on a categorization of behavioral temperaments/tendencies first identified in Buddhist accounts over fifteen hundred years ago. Items were developed using historical and contemporary texts describing the behavioral temperaments, labeled “Greedy/Faithful”, “Aversive/Discerning”, and “Deluded/Speculative”. To both maintain this categorical typology and benefit from the advantageous properties of the forced-choice response format (e.g., reduction of response biases), binary pairwise preferences for items were modeled using Latent Class Analysis (LCA). One sample (n1 = 394) was used to estimate the item parameters, and a second sample (n2 = 504) was used to classify the participants using the established parameters and cross-validate the classification against multiple other measures. The cross-validated measure exhibited good nomothetic span (construct-consistent relationships with related measures) that seemed to corroborate the ideas present in the original Buddhist source documents. The final 13-block questionnaire created from the best-performing items (the Behavioral Tendencies Questionnaire, or BTQ) is a psychometrically valid questionnaire that is historically consistent, grounded in behavioral tendencies, and promises practical and clinical utility, particularly in settings that teach and study meditation practices such as Mindfulness-Based Stress Reduction (MBSR).
Affiliation(s)
- Nicholas T. Van Dam
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Nathan S. Kline Institute for Psychiatric Research, Orangeburg, New York, United States of America
- Anna Brown
- School of Psychology, University of Kent, Canterbury, United Kingdom
- Tom B. Mole
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
- Jake H. Davis
- Graduate Center, City University of New York, New York, New York, United States of America
- Willoughby B. Britton
- Department of Behavioral and Social Sciences, Brown University Medical School, Providence, Rhode Island, United States of America
- Judson A. Brewer
- Departments of Medicine and Psychiatry, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, United States of America
31
Logit tree models for discrete choice data with application to advice-seeking preferences among Chinese Christians. Comput Stat 2015. [DOI: 10.1007/s00180-015-0588-4]
32
Wetzel E, Lüdtke O, Zettler I, Böhnke JR. The Stability of Extreme Response Style and Acquiescence Over 8 Years. Assessment 2015; 23:279-291. [PMID: 25986062] [DOI: 10.1177/1073191115583714]
Abstract
This study investigated the stability of extreme response style (ERS) and acquiescence response style (ARS) over a period of 8 years. ERS and ARS were measured with item sets drawn randomly from a large pool of items used in an ongoing German panel study. Latent trait-state-occasion and latent-state models were applied to test the relationship between time-specific (state) response style behaviors and time-invariant trait components of response styles. The results show that, across different random item samples, on average between 49% and 59% of the variance in the state response style factors was explained by the trait response style factors. This indicates that the systematic differences respondents show in their preferences for certain response categories are remarkably stable over a period of 8 years. The stability of ERS and ARS implies that it is important to consider response styles in the analysis of self-report data from polytomous rating scales, especially in longitudinal studies aimed at investigating stability in substantive traits. Furthermore, the stability of response styles raises the question of the extent to which they might be considered trait-like latent variables themselves that could be of substantive interest.
Affiliation(s)
- Eunike Wetzel
- University of Konstanz, Konstanz, Germany
- Eberhard Karls University Tübingen, Tübingen, Germany
- Oliver Lüdtke
- Leibniz Institute for Science and Mathematics Education, Kiel, Germany
- Center for International Student Assessment, Germany
- Ingo Zettler
- Eberhard Karls University Tübingen, Tübingen, Germany
- University of Copenhagen, Copenhagen, Denmark
- Jan R Böhnke
- Mental Health and Addiction Research Group (MHARG), Hull York Medical School and Department of Health Sciences, University of York, York, UK
33
Bilsky W, Gollan T, Roccas S, Grad H, Teixeira MLM, Rodriguez M, Schweiger Gallo I, Segal-Caspi L. On the Relative Importance of Personal Values. Journal of Individual Differences 2015. [DOI: 10.1027/1614-0001/a000162]
Abstract
The relative importance of values is a central feature of Schwartz’s value theory. However, the instruments used for validating his theory did not assess relative importance directly. Rather, values were independently rated and the scores then statistically centered, person by person. Whether these scores match those that result from explicitly comparing values has not been tested. We study this here using the Computerized Paired Comparison of Values (CPCV). This instrument was applied to samples from Germany, Brazil, Spain, and Israel, together with Schwartz’s Portrait Values Questionnaire (PVQ). CPCV and PVQ data were analyzed by separate and joint multidimensional scaling, generalized Procrustes analysis, and response time analyses. Results support the validity of Schwartz’s structural theory, independently of the assessment instrument used.
34
Sung YT, Cheng YW, Wu JS. Constructing a Situation-Based Career Interest Assessment for Junior High School Students and Examining Their Interest Structure. Journal of Career Assessment 2015. [DOI: 10.1177/1069072715580419]
Abstract
This study constructed a situation-based career interest assessment (SCIA) suitable for junior high school students. The assessment framework is based on Holland's theory of vocational interest. The pilot study and the formal test involved 1,072 and 1,136 junior high school students, respectively. Reliability analyses produced coefficients between .77 and .95, exploratory factor analysis produced factor loadings between .32 and .92 for six factors, and confirmatory factor analysis produced relative fit indices of .95 for both the comparative fit index and the nonnormed fit index, indicating satisfactory goodness of fit. Convergent and discriminant validity analyses indicated that the SCIA has acceptable construct validity. Multidimensional scaling analysis, internal correlations, and randomization tests showed that the interest structure of adolescents and the relationships among interest types are only partially consistent with Holland's theory.
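The abstract does not specify which reliability coefficient was used; internal consistency is the most common choice in this setting, so here is a minimal Python sketch of Cronbach's alpha on simulated data. The data-generating setup and the printed value are illustrative assumptions, not the SCIA results.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a persons x items score matrix:
    alpha = k / (k - 1) * (1 - sum(item variances) / var(total score))."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Simulated scale: 8 items sharing one common trait plus unit-variance noise
rng = np.random.default_rng(1)
trait = rng.normal(size=200)
X = trait[:, None] + rng.normal(size=(200, 8))
print(round(cronbach_alpha(X), 2))  # roughly .89 under this setup
```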
Collapse
Affiliation(s)
- Yao-Ting Sung
- Department of Educational Psychology and Counseling, National Taiwan Normal University, Taipei, Taiwan, Republic of China
| | - Yu-Wen Cheng
- Department of Educational Psychology and Counseling, National Taiwan Normal University, Taipei, Taiwan, Republic of China
| | - Jeng-Shin Wu
- Department of Educational Psychology and Counseling, National Taiwan Normal University, Taipei, Taiwan, Republic of China
| |
Collapse
|
35
|
Anguiano-Carrasco C, MacCann C, Geiger M, Seybert JM, Roberts RD. Development of a Forced-Choice Measure of Typical-Performance Emotional Intelligence. JOURNAL OF PSYCHOEDUCATIONAL ASSESSMENT 2014. [DOI: 10.1177/0734282914550387] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
Self-report ratings of emotional intelligence (EI) can be faked in high-stakes situations. Although forced-choice administration can prevent response distortion, it produces ipsative scores when scored conventionally. This study (n = 486) develops an 18-item EI measure assessing emotion perception, understanding, and management. We compare validity evidence for (a) a single-stimulus rating scale and (b) a forced-choice assessment scored with conventional methods versus item response theory (IRT) methods. The single-stimulus items showed acceptable fit to a three-factor solution, and the forced-choice items showed acceptable fit to the IRT solution. Correlations with criterion variables (ability and self-reported EI, Big Five personality, loneliness, life satisfaction, and GPA) were obtained for 283 participants. Correlations were in the expected direction for the single-stimulus and the IRT-based forced-choice scores. In contrast, the conventionally scored forced-choice test showed the expected correlations for emotion management, but not for emotion perception or understanding. Results suggest that IRT-based scoring of forced-choice assessments produces validity equivalent to that of single-stimulus rating scales. As such, IRT-based scores on forced-choice assessments may allow EI tests to be used for high-stakes applications, where faking is a concern.
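Why conventional scoring of forced-choice blocks is ipsative can be shown in a few lines of Python. The block structure and rank coding below are hypothetical: each block asks the respondent to rank three statements, one per scale, and conventional scoring simply sums the within-block ranks per scale, so every respondent's scores add up to the same constant.

```python
import numpy as np

# Hypothetical forced-choice design: in each block the respondent ranks
# three statements, one per scale (perception, understanding, management);
# rank 3 = "most like me", rank 1 = "least like me".
ranks = np.array([[3, 1, 2],   # one respondent, four blocks
                  [2, 3, 1],
                  [3, 2, 1],
                  [1, 3, 2]])

# Conventional scoring: sum the within-block ranks per scale
scores = ranks.sum(axis=0)
print(scores)        # [9 9 6]
print(scores.sum())  # 24 = 4 blocks * (1 + 2 + 3), identical for everyone
```

Because the total is fixed, conventional scores carry only within-person information, which distorts correlations with external criteria; the IRT scoring evaluated in this study is designed to recover normative trait scores from the same responses.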
Collapse
|
36
|
Calderón Carvajal C, Ximénez Gómez C. Análisis factorial de ítems de respuesta forzada: una revisión y un ejemplo [Factor analysis of forced-choice items: A review and an example]. REVISTA LATINOAMERICANA DE PSICOLOGIA 2014. [DOI: 10.1016/s0120-0534(14)70003-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
37
|
Salgado JF, Táuriz G. The Five-Factor Model, forced-choice personality inventories and performance: A comprehensive meta-analysis of academic and occupational validity studies. EUROPEAN JOURNAL OF WORK AND ORGANIZATIONAL PSYCHOLOGY 2012. [DOI: 10.1080/1359432x.2012.716198] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|
39
|
Stark S, Chernyshenko OS, Drasgow F, White LA. Adaptive Testing With Multidimensional Pairwise Preference Items. ORGANIZATIONAL RESEARCH METHODS 2012. [DOI: 10.1177/1094428112444611] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Affiliation(s)
- Stephen Stark
- Department of Psychology, University of South Florida, Tampa, FL, USA
| | | | - Fritz Drasgow
- Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
| | - Leonard A. White
- U.S. Army Research Institute for the Behavioral and Social Sciences, Arlington, VA, USA
| |
Collapse
|
40
|
Evers A, Muñiz J, Bartram D, Boben D, Egeland J, Fernández-Hermida JR, Frans Ö, Gintiliené G, Hagemeister C, Halama P, Iliescu D, Jaworowska A, Jiménez P, Manthouli M, Matesic K, Schittekatte M, Sümer HC, Urbánek T. Testing Practices in the 21st Century. EUROPEAN PSYCHOLOGIST 2012. [DOI: 10.1027/1016-9040/a000102] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
The main goal of the European Federation of Psychologists' Associations (EFPA) Standing Committee on Tests and Testing (SCTT) is the improvement of testing practices in European countries. To reach this goal, the SCTT carries out various actions and projects, some of which are described in this paper. To better inform its work, it decided to survey the opinions of professional psychologists on testing practices. A questionnaire of 33 items was administered to a sample of 12,606 professional psychologists from 17 European countries. The questionnaire was based on, but not identical to, one used in 2000. The new data show that the positive attitude toward test use observed in 2000 has increased in most countries, with a high percentage of the surveyed psychologists using tests regularly. Five main dimensions explained 43% of the total item variance: concern over incorrect test use, regulations on tests and testing, Internet testing, appreciation of tests, and knowledge and training relating to tests and test use. Important differences between countries were found on these five dimensions. Gender differences were found on four of the five dimensions, and differences by field of specialization on all five. The most commonly used tests are the classic psychometric tests of intelligence and personality: WISC, WAIS, MMPI, RAVEN, 16PF, NEO-PI-R, BDI, SCL-90. Finally, some future perspectives are discussed.
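A dimensional summary like "five dimensions explained 43% of the total item variance" typically comes from a principal component or factor analysis of the item responses. Here is a minimal Python sketch of that computation; the random data, sample size, and use of scikit-learn's PCA are illustrative assumptions, not the survey's actual extraction method.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the survey: 500 respondents x 33 items
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 33))

# Proportion of total item variance captured by the first five components,
# the kind of summary behind "five dimensions explained 43%"
pca = PCA(n_components=5).fit(X)
print(pca.explained_variance_ratio_.sum())  # near chance here: data are pure noise
```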
Collapse
Affiliation(s)
- Arne Evers
- University of Amsterdam, The Netherlands
| | | | | | - Dusica Boben
- Drustvo Psihologov Slovenije, Ljubljana, Slovenia
| | - Jens Egeland
- Vestfold Mental Health Care Trust, Tønsberg, Norway
| | | | | | | | | | | | | | - Aleksandra Jaworowska
- Psychological Test Laboratory of the Polish Psychological Association, Warsaw, Poland
| | | | | | | | | | | | - Tomáš Urbánek
- Institute of Psychology, Academy of Sciences, Brno, Czech Republic
| |
Collapse
|