1. Zheng C, Liu J, Li Y, Xu P, Zhang B, Wei R, Zhang W, Liu B, Huang J. A 2PLM-RANK multidimensional forced-choice model and its fast estimation algorithm. Behav Res Methods 2024;56:6363-6388. PMID: 38409459. DOI: 10.3758/s13428-023-02315-x.
Abstract
High-stakes non-cognitive tests frequently employ forced-choice (FC) scales to deter faking. To mitigate the resulting score ipsativity, many scoring models have been devised. Among them, the multi-unidimensional pairwise preference (MUPP) framework is highly flexible and widely used. However, the original MUPP model was developed for an unfolding response process and can only handle paired comparisons. The present study proposes the 2PLM-RANK as a generalization of the MUPP model that accommodates dominance responses in the RANK format. In addition, an improved stochastic EM (iStEM) algorithm is devised for more stable and efficient parameter estimation. Simulation results generally supported the efficiency and utility of the new algorithm in estimating the 2PLM-RANK when applied to both triplets and tetrads across various conditions. An empirical illustration with responses to a 24-dimensional personality test further supported the practicality of the proposed model. To further aid in the application of the new model, a user-friendly R package is also provided.
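For orientation, the binary preference kernel on which MUPP-type models are built can be written as follows; this is a minimal sketch in the spirit of Stark et al. (2005), with a 2PL dominance response function substituted for the original unfolding function, and the notation is illustrative rather than taken from the paper itself.

\[
P_i(1 \mid \theta_{d_i}) = \frac{1}{1 + \exp\left[-a_i\left(\theta_{d_i} - b_i\right)\right]},
\qquad
P(i \succ j \mid \boldsymbol{\theta}) = \frac{P_i(1)\,P_j(0)}{P_i(1)\,P_j(0) + P_i(0)\,P_j(1)},
\]

where statement i measures dimension d_i and P(0) = 1 - P(1). Broadly speaking, a RANK-format extension chains such binary preference probabilities over all statements in a triplet or tetrad to obtain the probability of a complete ranking.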
Affiliation(s)
- Chanjin Zheng: Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China
- Juan Liu: Beijing Insight Online Management Consulting Co., Ltd, Beijing, China
- Yaling Li: Beijing Insight Online Management Consulting Co., Ltd, Beijing, China
- Peiyi Xu: Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China; Beijing Insight Online Management Consulting Co., Ltd, Beijing, China
- Bo Zhang: School of Labor and Employment Relations and Department of Psychology, University of Illinois Urbana-Champaign, Champaign, USA
- Ran Wei: Beijing Insight Online Management Consulting Co., Ltd, Beijing, China
- Wenqing Zhang: Department of Educational Psychology, Faculty of Education, East China Normal University, Shanghai, China; Beijing Insight Online Management Consulting Co., Ltd, Beijing, China
- Boyang Liu: Beijing Insight Online Management Consulting Co., Ltd, Beijing, China
- Jing Huang: Educational Psychology and Research Methodology, Purdue University, West Lafayette, IN, USA
2. Zhang B, Luo J, Li J. Moving beyond Likert and Traditional Forced-Choice Scales: A Comprehensive Investigation of the Graded Forced-Choice Format. Multivariate Behavioral Research 2024;59:434-460. PMID: 37652572. DOI: 10.1080/00273171.2023.2235682.
Abstract
The graded forced-choice (FC) format has recently emerged as an alternative that may preserve the advantages and overcome the limitations of dichotomous FC measures. The current study presented the first large-scale evaluation of three types of FC measures (FC2, FC4, and FC5, with 2, 4, and 5 response options, respectively) and compared their performance to their Likert (LK) counterparts (LK2, LK4, and LK5) on (1) psychometric properties, (2) respondent reactions, and (3) susceptibility to response styles. Results showed that, compared to LK measures with the same number of response options, the three FC scales provided better support for the hypothesized factor structure, were perceived as more faking-resistant and more cognitively demanding, and were less susceptible to response styles. FC4/5 and LK4/5 demonstrated similarly good reliability, while LK2 provided more reliable scores than FC2. When compared across the three FC measures, FC4 and FC5 displayed comparable psychometric performance and respondent reactions. FC4 exhibited a moderate presence of extreme response style, while FC5 had a weak presence of both extreme and middle response styles. Based on these findings, the study recommends the use of graded FC over dichotomous FC and LK formats, particularly FC5 when extreme response style is a concern.
Affiliation(s)
- Bo Zhang: School of Labor and Employment Relations, University of Illinois Urbana-Champaign; Department of Psychology, University of Illinois Urbana-Champaign
- Jing Luo: Feinberg School of Medicine, Northwestern University
- Jian Li: Faculty of Psychology, Beijing Normal University
3. Tu N, Kumar LS, Joo S, Stark S. Linking Methods for Multidimensional Forced Choice Tests Using the Multi-Unidimensional Pairwise Preference Model. Applied Psychological Measurement 2024;48:104-124. PMID: 38585303. PMCID: PMC10993864. DOI: 10.1177/01466216241238741.
Abstract
Applications of multidimensional forced choice (MFC) testing have increased considerably over the last 20 years. Yet there has been little, if any, research on methods for linking the parameter estimates from different samples. This research addressed that important need by extending four widely used methods for unidimensional linking and comparing the efficacy of new estimation algorithms for MFC linking coefficients based on the Multi-Unidimensional Pairwise Preference model (MUPP). More specifically, we compared the efficacy of multidimensional test characteristic curve (TCC), item characteristic curve (ICC; Haebara, 1980), mean/mean (M/M), and mean/sigma (M/S) methods in a Monte Carlo study that also manipulated test length, test dimensionality, sample size, percentage of anchor items, and linking scenarios. Results indicated that the ICC method outperformed the M/M method, which was better than the M/S method, with the TCC method being the least effective. However, as the number of items "per dimension" and the percentage of anchor items increased, the differences between the ICC, M/M, and M/S methods decreased. Study implications and practical recommendations for MUPP linking, as well as limitations, are discussed.
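To make the moment-based linking methods concrete, here is a minimal Python sketch of the classical unidimensional mean/mean and mean/sigma linking coefficients that the study extends to the MUPP/MFC setting. The function names and the synthetic anchor parameters are illustrative assumptions, not the paper's implementation.

import numpy as np

def mean_sigma(b_src, b_tgt):
    # Mean/sigma linking: slope and intercept from anchor-item locations (b).
    # Returns (A, B) such that A * b_src + B lies on the target metric and
    # source discriminations rescale as a_src / A.
    A = np.std(b_tgt, ddof=1) / np.std(b_src, ddof=1)
    B = np.mean(b_tgt) - A * np.mean(b_src)
    return A, B

def mean_mean(a_src, a_tgt, b_src, b_tgt):
    # Mean/mean linking: slope from anchor-item discriminations (a).
    A = np.mean(a_src) / np.mean(a_tgt)
    B = np.mean(b_tgt) - A * np.mean(b_src)
    return A, B

# Hypothetical anchor-item parameters for two administrations of the same items.
rng = np.random.default_rng(7)
a_src = rng.uniform(0.8, 2.0, 10)
b_src = rng.normal(0.0, 1.0, 10)
A_true, B_true = 1.2, 0.3                      # true metric transformation
a_tgt = a_src / A_true
b_tgt = A_true * b_src + B_true

print(mean_sigma(b_src, b_tgt))                # recovers (1.2, 0.3) in this noise-free example
print(mean_mean(a_src, a_tgt, b_src, b_tgt))   # recovers (1.2, 0.3) as well

The TCC and ICC (Haebara) methods replace these moment matches with criteria that minimize differences between test or item characteristic curves over the trait space.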
Affiliation(s)
- Naidan Tu: University of South Florida, FL, USA
4. Nie L, Xu P, Hu D. Multidimensional IRT for forced choice tests: A literature review. Heliyon 2024;10:e26884. PMID: 38449643. PMCID: PMC10915382. DOI: 10.1016/j.heliyon.2024.e26884.
Abstract
The multidimensional forced choice (MFC) test is frequently utilized in non-cognitive assessment because of its effectiveness in reducing the response biases commonly associated with conventional Likert scales. Nonetheless, the MFC test generates ipsative data, which have been criticized for their limited applicability to comparisons between individuals. Multidimensional item response theory (MIRT) models have recently sparked renewed interest among academics and practitioners, largely because several models now make it possible to obtain normative information from forced-choice tests. The paper introduces a modeling framework made up of three key components: response format, measurement model, and decision theory. Under this framework, four IRT models were chosen as examples. A comprehensive comparison then characterizes the parameter estimation techniques used in MFC-IRT models. The work also examines empirical research in three distinct domains: parameter invariance testing, computerized adaptive testing (CAT), and validity investigation. Finally, it is recommended that future research follow four distinct paths: modeling, parameter invariance testing, forced-choice CAT, and validity studies.
Affiliation(s)
- Lei Nie: School of Public Administration, East China Normal University, China
- Peiyi Xu: Department of Educational Psychology, Faculty of Education, East China Normal University, China
- Di Hu: School of Education and Social Policy, Northwestern University, USA
5. Tu N, Joo S, Lee P, Stark S. Comparison of parameter estimation approaches for multi-unidimensional pairwise preference tests. Behav Res Methods 2023;55:2764-2786. PMID: 35931936. DOI: 10.3758/s13428-022-01927-z.
Abstract
Multidimensional forced-choice (MFC) testing has been proposed as a way of reducing response biases in noncognitive measurement. Although early item response theory (IRT) research focused on illustrating that person parameter estimates with normative properties could be obtained using various MFC models and formats, more recent attention has been devoted to exploring the processes involved in test construction and how that influences MFC scores. This research compared two approaches for estimating multi-unidimensional pairwise preference model (MUPP; Stark et al., 2005) parameters based on the generalized graded unfolding model (GGUM; Roberts et al., 2000). More specifically, we compared the efficacy of statement and person parameter estimation based on a "two-step" process, developed by Stark et al. (2005), with a more recently developed "direct" estimation approach (Lee et al., 2019) in a Monte Carlo study that also manipulated test length, test dimensionality, sample size, and the correlations between generating person parameters for each dimension. Results indicated that the two approaches had similar scoring accuracy, although the two-step approach had better statement parameter recovery than the direct approach. Limitations, implications for MFC test construction and scoring, and recommendations for future MFC research and practice are discussed.
Affiliation(s)
- Naidan Tu: Department of Psychology, University of South Florida, Tampa, FL, USA
- Sean Joo: Department of Educational Psychology, University of Kansas, Lawrence, KS, USA
- Philseok Lee: Department of Psychology, George Mason University, Fairfax, VA, USA
- Stephen Stark: Department of Psychology, University of South Florida, Tampa, FL, USA
6. Tu N, Zhang B, Angrave L, Sun T, Neuman M. Estimating the Multidimensional Generalized Graded Unfolding Model with Covariates Using a Bayesian Approach. J Intell 2023;11:163. PMID: 37623546. PMCID: PMC10455612. DOI: 10.3390/jintelligence11080163.
Abstract
Noncognitive constructs are commonly assessed in educational and organizational research. They are often measured by summing scores across items, which implicitly assumes a dominance item response process. However, research has shown that the unfolding response process may better characterize how people respond to noncognitive items. The Generalized Graded Unfolding Model (GGUM), which represents the unfolding response process, has therefore become increasingly popular. However, the current implementation of the GGUM is limited to unidimensional cases, while most noncognitive constructs are multidimensional. Fitting a unidimensional GGUM separately for each dimension and ignoring the multidimensional nature of noncognitive data may result in suboptimal parameter estimation. Recently, an R package, bmggum, was developed that enables the estimation of the Multidimensional Generalized Graded Unfolding Model (MGGUM) with covariates using a Bayesian algorithm. However, no simulation evidence is available to support the accuracy of the Bayesian algorithm implemented in bmggum. In this research, two simulation studies were conducted to examine the performance of bmggum. Results showed that bmggum can estimate MGGUM parameters accurately, and that multidimensional estimation and incorporating relevant covariates into the estimation process improved estimation accuracy. The effectiveness of two Bayesian model selection indices, WAIC and LOO, was also investigated and found to be satisfactory for model selection. Empirical data were used to demonstrate the use of bmggum, and its performance was compared with three other GGUM software programs: GGUM2004, GGUM, and mirt.
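As background, the unidimensional GGUM category response function that the MGGUM extends dimension by dimension can be written as follows (notation follows Roberts et al., 2000; shown here for reference only, not as the paper's own derivation):

\[
P(Z_i = z \mid \theta_j) =
\frac{\exp\{\alpha_i[z(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}]\}
      + \exp\{\alpha_i[(M - z)(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}]\}}
     {\sum_{w=0}^{C}\big(\exp\{\alpha_i[w(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}]\}
      + \exp\{\alpha_i[(M - w)(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}]\}\big)},
\]

where z = 0, ..., C, M = 2C + 1, \tau_{i0} = 0, \alpha_i is the discrimination, \delta_i the item location, and \tau_{ik} the threshold parameters; in the multidimensional case, \theta_j is the trait on the dimension that item i measures.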
Affiliation(s)
- Naidan Tu: Department of Psychology, University of South Florida, Tampa, FL 33620, USA
- Bo Zhang: School of Labor and Employment Relations and Department of Psychology, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA
- Lawrence Angrave: Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Tianjun Sun: Department of Psychological Sciences, Kansas State University, Manhattan, KS 66506, USA
- Mathew Neuman: Department of Psychological & Brain Sciences, Texas A&M University, College Station, TX 77840, USA
7. Joo SH, Lee P, Stark S. Modeling Multidimensional Forced Choice Measures with the Zinnes and Griggs Pairwise Preference Item Response Theory Model. Multivariate Behavioral Research 2023;58:241-261. PMID: 34370564. DOI: 10.1080/00273171.2021.1960142.
Abstract
This research developed a new ideal point-based item response theory (IRT) model for multidimensional forced choice (MFC) measures. We adapted the Zinnes and Griggs (ZG; 1974) IRT model and the multi-unidimensional pairwise preference (MUPP; Stark et al., 2005) model, henceforth referred to as ZG-MUPP. We derived the information function to evaluate the psychometric properties of MFC measures and developed a model parameter estimation algorithm using Markov chain Monte Carlo (MCMC). To evaluate the efficacy of the proposed model, we conducted a simulation study under various experimental conditions, including sample size, number of items, and the ranges of the discrimination and location parameters. The results showed that the model parameters were accurately estimated with sample sizes as low as 500. The empirical results also showed that the scores from the ZG-MUPP model were comparable to those from the MUPP model and the Thurstonian IRT (TIRT) model. Practical implications and limitations are further discussed.
8. Frick S. Modeling Faking in the Multidimensional Forced-Choice Format: The Faking Mixture Model. Psychometrika 2022;87:773-794. PMID: 34927219. PMCID: PMC9166892. DOI: 10.1007/s11336-021-09818-6.
Abstract
The multidimensional forced-choice (MFC) format has been proposed to reduce faking because items within blocks can be matched on desirability. However, the desirability of individual items might not transfer to the item blocks. The aim of this paper is to propose a mixture item response theory model for faking in the MFC format, termed the Faking Mixture model, that allows the fakability of MFC blocks to be estimated. Given current computing capabilities, within-subject data from both high- and low-stakes contexts are needed to estimate the model. A simulation showed good parameter recovery under various conditions. An empirical validation showed that matching was necessary but not sufficient to create an MFC questionnaire that can reduce faking. The Faking Mixture model can be used to reduce fakability during test construction.
Affiliation(s)
- Susanne Frick: Department of Psychology, School of Social Sciences, Mannheim, Germany
9. Lee P, Joo SH, Zhou S, Son M. Investigating the impact of negatively keyed statements on multidimensional forced-choice personality measures: A comparison of partially ipsative and IRT scoring methods. Personality and Individual Differences 2022. DOI: 10.1016/j.paid.2022.111555.
10. Pavlov G, Shi D, Maydeu-Olivares A, Fairchild A. Item desirability matching in forced-choice test construction. Personality and Individual Differences 2021. DOI: 10.1016/j.paid.2021.111114.
11. Joo SH, Lee P, Park JY, Stark S. Assessing Dimensionality of the Ideal Point Item Response Theory Model Using Posterior Predictive Model Checking. Organizational Research Methods 2021. DOI: 10.1177/10944281211050609.
Abstract
Although the use of ideal point item response theory (IRT) models for organizational research has increased over the last decade, the assessment of construct dimensionality of ideal point scales has been overlooked in previous research. In this study, we developed and evaluated dimensionality assessment methods for an ideal point IRT model under the Bayesian framework. We applied the posterior predictive model checking (PPMC) approach to the most widely used ideal point IRT model, the generalized graded unfolding model (GGUM). We conducted a Monte Carlo simulation to compare the performance of item pair discrepancy statistics and to evaluate the Type I error and power rates of the methods. The simulation results indicated that the Bayesian dimensionality detection method controlled Type I errors reasonably well across the conditions. In addition, the proposed method showed better performance than existing methods, yielding acceptable power when 20% of the items were generated from the secondary dimension. Organizational implications and limitations of the study are further discussed.
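The posterior predictive checking logic described above can be sketched generically as below. The simulate_data and discrepancy arguments are placeholders for the GGUM-specific machinery used in the study, so this illustrates the general procedure rather than the authors' implementation.

import numpy as np

def ppmc_pair_ppp(observed, posterior_draws, simulate_data, discrepancy):
    # Posterior predictive p-values for an item-pair discrepancy statistic.
    # observed        : (N, I) matrix of observed item responses
    # posterior_draws : iterable of parameter draws from the fitted model
    # simulate_data   : function(draw, n_persons) -> (N, I) replicated data
    # discrepancy     : function(data) -> (I, I) matrix of pairwise statistics
    obs_stat = discrepancy(observed)
    exceed = np.zeros_like(obs_stat, dtype=float)
    n_draws = 0
    for draw in posterior_draws:
        replicated = simulate_data(draw, observed.shape[0])
        exceed += (discrepancy(replicated) >= obs_stat)
        n_draws += 1
    return exceed / n_draws   # values near 0 or 1 flag potential misfit for a pair

def pairwise_correlation(data):
    # One simple choice of item-pair discrepancy statistic.
    return np.corrcoef(data, rowvar=False)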
12.
Abstract
Forced-choice (FC) assessments of noncognitive psychological constructs (e.g., personality, behavioral tendencies) are popular in high-stakes organizational testing scenarios (e.g., informing hiring decisions) due to their enhanced resistance against response distortions (e.g., faking good, impression management). The measurement precision of FC assessment scores used to inform personnel decisions is of paramount importance in practice. Different types of reliability estimates are reported for FC assessment scores in current publications, but consensus on best practices appears to be lacking. In order to provide understanding and structure around the reporting of FC reliability, this study systematically examined different types of reliability estimation methods for Thurstonian IRT-based FC assessment scores: their theoretical differences were discussed, and their numerical differences were illustrated through a series of simulations and empirical studies. In doing so, this study provides a practical guide for appraising different reliability estimation methods for IRT-based FC assessment scores.
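For orientation, two commonly reported IRT-based reliability estimates can be computed from trait estimates and their standard errors as sketched below; these are generic formulas, not necessarily the exact estimators compared in the study.

import numpy as np

def empirical_reliability(theta_hat, se):
    # Empirical reliability for IRT trait estimates (e.g., EAP scores):
    # variance of the point estimates relative to that variance plus the
    # average error variance.
    obs_var = np.var(theta_hat, ddof=1)
    err_var = np.mean(np.square(se))
    return obs_var / (obs_var + err_var)

def marginal_reliability(se, prior_var=1.0):
    # Marginal reliability assuming a known latent trait variance (default 1).
    return 1.0 - np.mean(np.square(se)) / prior_var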
13. Adaptive testing with the GGUM-RANK multidimensional forced choice model: Comparison of pair, triplet, and tetrad scoring. Behav Res Methods 2020;52:761-772. PMID: 31342469. DOI: 10.3758/s13428-019-01274-6.
Abstract
Likert-type measures have been criticized in psychological assessment because they are vulnerable to response biases, including central tendency, acquiescence, leniency, halo, and socially desirable responding. As an alternative, multidimensional forced choice (MFC) testing has been proposed to address these concerns. A number of researchers have developed item response theory (IRT) models for MFC data and have examined latent trait estimation with tests of different dimensionality and length. Research has also explored the advantages of computerized adaptive testing (CAT) with MFC pair tests having as many as 25 dimensions, but there have been no published studies on CAT with MFC triplets or tetrads. Thus, in this research we aimed to address that issue. We used recently developed item information functions for an MFC ranking model to compare the benefits of CAT with MFC pair, triplet, and tetrad tests. A simulation study showed that CAT substantially outperformed nonadaptive testing for latent trait estimation across MFC formats. More importantly, CAT with MFC pairs provided estimation accuracy similar to or better than that from tests of equivalent numbers of nonadaptive MFC triplets. On the basis of these findings, implications and recommendations are further discussed for constructing MFC measures to use in psychological contexts.
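The adaptive administration step discussed above follows the familiar maximum-information rule; a generic sketch is given below, where info_fn stands in for an MFC block information summary (for example, one based on the GGUM-RANK information functions) and all names and signatures are illustrative assumptions.

import numpy as np

def select_next_block(theta_hat, block_params, administered, info_fn):
    # Choose the unused block with maximum information at the current
    # multidimensional trait estimate.
    # theta_hat    : current trait estimate (array over dimensions)
    # block_params : list of parameter sets, one per candidate block
    # administered : set of indices of blocks already presented
    # info_fn      : function(params, theta) -> scalar information summary,
    #                e.g., determinant or trace of the block information matrix
    best_idx, best_info = None, -np.inf
    for idx, params in enumerate(block_params):
        if idx in administered:
            continue
        info = info_fn(params, theta_hat)
        if info > best_info:
            best_idx, best_info = idx, info
    return best_idx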
14. Lee P, Joo SH, Stark S. Detecting DIF in Multidimensional Forced Choice Measures Using the Thurstonian Item Response Theory Model. Organizational Research Methods 2020. DOI: 10.1177/1094428120959822.
Abstract
Although modern item response theory (IRT) methods of test construction and scoring have overcome the ipsativity problems historically associated with multidimensional forced choice (MFC) formats, there has been little research on MFC differential item functioning (DIF) detection, where item refers to a block, or group, of statements presented for an examinee's consideration. This research investigated DIF detection with three-alternative MFC items based on the Thurstonian IRT (TIRT) model, using omnibus Wald tests on loadings and thresholds. We examined constrained and free baseline model comparison strategies with different types and magnitudes of DIF, latent trait correlations, sample sizes, and levels of impact in an extensive Monte Carlo study. Results indicated that the free baseline strategy was highly effective in detecting DIF, with power approaching 1.0 in the large sample size and large magnitude of DIF conditions, and similar effectiveness in the impact and no-impact conditions. This research also included an empirical example to demonstrate the viability of the best-performing method with real examinees and showed how a DIF and a DTF effect size measure can be used to assess the practical significance of MFC DIF findings.
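The omnibus Wald tests referred to above are instances of the generic statistic below; treating the tested quantities as a block's loadings and thresholds is an illustrative reading of the abstract rather than a derivation from the paper.

\[
W = \hat{\mathbf{d}}^{\top}\,\widehat{\mathrm{Cov}}(\hat{\mathbf{d}})^{-1}\,\hat{\mathbf{d}}
\;\overset{H_0}{\sim}\; \chi^2_{q},
\qquad
\hat{\mathbf{d}} = \hat{\boldsymbol{\nu}}_{\mathrm{ref}} - \hat{\boldsymbol{\nu}}_{\mathrm{foc}},
\]

where \hat{\boldsymbol{\nu}}_g stacks the estimated parameters for a studied block in group g and q is the number of parameters tested; under the free-baseline strategy, a significant W for a studied block flags DIF.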
15. Investigating faking effects on the construct validity through the Monte Carlo simulation study. Personality and Individual Differences 2019. DOI: 10.1016/j.paid.2019.07.001.
16. Chen C, Wang W, Chiu MM, Ro S. Item Selection and Exposure Control Methods for Computerized Adaptive Testing with Multidimensional Ranking Items. Journal of Educational Measurement 2019. DOI: 10.1111/jedm.12252.
17. Morillo D, Abad FJ, Kreitchmann RS, Leenen I, Hontangas P, Ponsoda V. The Journey from Likert to Forced-Choice Questionnaires: Evidence of the Invariance of Item Parameters. Revista de Psicología del Trabajo y de las Organizaciones 2019. DOI: 10.5093/jwop2019a11.