1
Zimny L, Schroeders U, Wilhelm O. Ant colony optimization for parallel test assembly. Behav Res Methods 2024;56:5834-5848. [PMID: 38277085] [DOI: 10.3758/s13428-023-02319-7]
Abstract
Ant colony optimization (ACO) algorithms have previously been used to compile single short scales of psychological constructs. In the present article, we showcase the versatility of the ACO to construct multiple parallel short scales that adhere to several competing and interacting criteria simultaneously. Based on an initial pool of 120 knowledge items, we assembled three 12-item tests that (a) adequately cover the construct at the domain level, (b) follow a unidimensional measurement model, (c) allow reliable and (d) precise measurement of factual knowledge, and (e) are gender-fair. Moreover, we aligned the test characteristic and test information functions of the three tests to establish the equivalence of the tests. We cross-validated the assembled short scales and investigated their association with the full scale and covariates that were not included in the optimization procedure. Finally, we discuss potential extensions to metaheuristic test assembly and the equivalence of parallel knowledge tests in general.
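The core ACO loop the abstract describes can be illustrated with a toy sketch. This is not the authors' implementation: the item pool is simulated, and the multi-criteria objective (model fit, reliability, precision, gender fairness) is replaced by a single stand-in fitness, the mean item discrimination of the selected subset.

```python
# Hypothetical ACO sketch for assembling a 12-item scale from a 120-item pool.
# Ants sample items with probability proportional to pheromone; the iteration's
# best scale reinforces the pheromone of its items.
import random

random.seed(1)

POOL_SIZE, SCALE_LEN, N_ANTS, N_ITER = 120, 12, 20, 30
EVAPORATION, DEPOSIT = 0.1, 1.0

# Simulated item "quality" (e.g., discrimination parameters) - invented data.
quality = [random.uniform(0.3, 2.0) for _ in range(POOL_SIZE)]
pheromone = [1.0] * POOL_SIZE

def build_scale():
    """One ant samples SCALE_LEN distinct items, favouring high pheromone."""
    chosen = []
    candidates = list(range(POOL_SIZE))
    for _ in range(SCALE_LEN):
        weights = [pheromone[i] for i in candidates]
        item = random.choices(candidates, weights=weights, k=1)[0]
        candidates.remove(item)
        chosen.append(item)
    return chosen

def fitness(scale):
    """Toy objective: mean quality of the selected items."""
    return sum(quality[i] for i in scale) / len(scale)

best_scale, best_fit = None, -1.0
for _ in range(N_ITER):
    ants = [build_scale() for _ in range(N_ANTS)]
    # Evaporate, then let the iteration-best ant deposit pheromone.
    pheromone = [p * (1 - EVAPORATION) for p in pheromone]
    iter_best = max(ants, key=fitness)
    for i in iter_best:
        pheromone[i] += DEPOSIT * fitness(iter_best)
    if fitness(iter_best) > best_fit:
        best_scale, best_fit = iter_best, fitness(iter_best)
```

Assembling three parallel forms, as in the article, would additionally penalize item overlap between forms and misalignment of their test characteristic and information functions.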
Affiliation(s)
- Luc Zimny
- Institute of Psychology and Education, Ulm University, Albert-Einstein-Allee 47, 89081, Ulm, Germany.
- Oliver Wilhelm
- Institute of Psychology and Education, Ulm University, Albert-Einstein-Allee 47, 89081, Ulm, Germany
2
Bäckström M, Björklund F. Why Forced-Choice and Likert Items Provide the Same Information on Personality, Including Social Desirability. Educational and Psychological Measurement 2024;84:549-576. [PMID: 38756462] [PMCID: PMC11095325] [DOI: 10.1177/00131644231178721]
Abstract
The forced-choice response format is often considered superior to the standard Likert-type format for controlling social desirability in personality inventories. We performed simulations and found that the trait information based on the two formats converges when the number of items is high and forced-choice items are mixed with regard to positively and negatively keyed items. Given that forced-choice items extract the same personality information as Likert-type items do, including socially desirable responding, other means are needed to counteract social desirability. We propose using evaluatively neutralized items in personality measurement, as they can counteract social desirability regardless of response format.
3
Nie L, Xu P, Hu D. Multidimensional IRT for forced choice tests: A literature review. Heliyon 2024;10:e26884. [PMID: 38449643] [PMCID: PMC10915382] [DOI: 10.1016/j.heliyon.2024.e26884]
Abstract
The Multidimensional Forced Choice (MFC) test is frequently utilized in non-cognitive evaluations because of its effectiveness in reducing the response bias commonly associated with conventional Likert scales. Nonetheless, the MFC test generates ipsative data, a type of measurement that has been criticized for its limited applicability to comparisons between individuals. Multidimensional item response theory (MIRT) models have recently sparked renewed interest among academics and professionals, largely due to the development of several models that make it easier to collect normative data from forced-choice tests. The paper introduces a modeling framework made up of three key components: response format, measurement model, and decision theory. Within this framework, four IRT models are described as examples, and the parameter estimation techniques used in MFC-IRT models are compared and characterized. The paper then reviews empirical research in three domains: parameter invariance testing, computerized adaptive testing (CAT), and validity investigation. Finally, four directions for future research are recommended: modeling, parameter invariance testing, forced-choice CAT, and validity studies.
Affiliation(s)
- Lei Nie
- School of Public Administration, East China Normal University, China
- Peiyi Xu
- Department of Educational Psychology, Faculty of Education, East China Normal University, China
- Di Hu
- School of Education and Social Policy, Northwestern University, USA
4
Lin Y, Brown A, Williams P. Multidimensional Forced-Choice CAT With Dominance Items: An Empirical Comparison With Optimal Static Testing Under Different Desirability Matching. Educational and Psychological Measurement 2023;83:322-350. [PMID: 36866068] [PMCID: PMC9972128] [DOI: 10.1177/00131644221077637]
Abstract
Several forced-choice (FC) computerized adaptive tests (CATs) have emerged in the field of organizational psychology, all of them employing ideal-point items. However, although most items developed historically follow dominance response models, research on FC CAT with dominance items is limited, dominated by simulation studies, and lacking in empirical deployment. This empirical study trialed an FC CAT with dominance items, described by the Thurstonian item response theory model, with research participants. It investigated important practical issues such as the implications of adaptive item selection and social desirability balancing criteria for score distributions, measurement accuracy, and participant perceptions. Moreover, nonadaptive but optimal tests of similar design were trialed alongside the CATs to provide a baseline for comparison, helping to quantify the return on investment when converting an otherwise-optimized static assessment into an adaptive one. Although the benefit of adaptive item selection in improving measurement precision was confirmed, results also indicated that at shorter test lengths CAT had no notable advantage over optimal static tests. Taking a holistic view that incorporates both psychometric and operational considerations, implications for the design and deployment of FC assessments in research and practice are discussed.
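The adaptive item selection at the heart of any such CAT can be sketched generically. This illustration uses simple 2PL dichotomous items rather than the Thurstonian FC blocks of the study, and the item bank is invented; it shows only the shared principle of administering, at each step, the unused item with maximal Fisher information at the current trait estimate.

```python
# Hypothetical maximum-information item selection for a CAT (2PL items).
import math

def info_2pl(a, b, theta):
    """Fisher information of a 2PL item (discrimination a, difficulty b)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# A toy bank of (discrimination, difficulty) pairs - made-up values.
bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.4), (1.0, 1.2), (1.7, -0.3)]
administered = set()
theta_hat = 0.0  # provisional trait estimate

def next_item(theta):
    """Pick the most informative item not yet administered."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(*bank[i], theta))

first = next_item(theta_hat)  # index of the most informative item at theta=0
```

In the FC setting, "information" is computed for a whole block of statements under the Thurstonian model, and selection is further constrained by the desirability-matching criteria the study examines.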
Affiliation(s)
- Yin Lin
- University of Kent, Canterbury, UK
- SHL, Thames Ditton, Surrey, UK
5
Joo SH, Lee P, Stark S. Modeling Multidimensional Forced Choice Measures with the Zinnes and Griggs Pairwise Preference Item Response Theory Model. Multivariate Behavioral Research 2023;58:241-261. [PMID: 34370564] [DOI: 10.1080/00273171.2021.1960142]
Abstract
This research developed a new ideal point-based item response theory (IRT) model for multidimensional forced choice (MFC) measures. We adapted the Zinnes and Griggs (ZG; 1974) IRT model and the multi-unidimensional pairwise preference (MUPP; Stark et al., 2005) model, henceforth referred to as ZG-MUPP. We derived the information function to evaluate the psychometric properties of MFC measures and developed a model parameter estimation algorithm using Markov chain Monte Carlo (MCMC). To evaluate the efficacy of the proposed model, we conducted a simulation study under various experimental conditions such as sample sizes, number of items, and ranges of discrimination and location parameters. The results showed that the model parameters were accurately estimated when the sample size was as low as 500. The empirical results also showed that the scores from the ZG-MUPP model were comparable to those from the MUPP model and the Thurstonian IRT (TIRT) model. Practical implications and limitations are further discussed.
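The ideal-point intuition behind such models can be illustrated with a simplified pairwise-preference function. This is a stand-in, not the actual Zinnes-Griggs formula: the respondent tends to prefer the statement whose location is closer to their own trait level, with a logistic link on the difference of squared distances; all parameter values are invented.

```python
# Simplified ideal-point pairwise-preference probability (illustrative only).
import math

def pref_prob(theta_s, theta_t, loc_s, loc_t, scale=1.0):
    """P(prefer statement s over statement t). In an MFC pair the two
    statements may measure different traits, so each distance uses the
    respondent's level on its own trait."""
    d_s = (theta_s - loc_s) ** 2  # squared distance to statement s
    d_t = (theta_t - loc_t) ** 2  # squared distance to statement t
    return 1.0 / (1.0 + math.exp(-(d_t - d_s) / scale))

# A respondent high on trait 1 and low on trait 2, comparing a statement
# located at 1.5 on trait 1 with one located at 1.5 on trait 2:
p = pref_prob(theta_s=1.2, theta_t=-0.8, loc_s=1.5, loc_t=1.5)
```

When both statements are equally distant from the respondent's ideal points, the function returns 0.5, which is the indifference property ideal-point models are built around.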
6
Cardona Cordero NR, Lafarga Previdi I, Torres HR, Ayala I, Boronow KE, Santos Rivera A, Meeker JD, Alshawabkeh A, Cordero JF, Brody JG, Brown P, Vélez Vega CM. Mi PROTECT: A personalized smartphone platform to report back results to participants of a maternal-child and environmental health research cohort program in Puerto Rico. PLOS Digital Health 2023;2:e0000172. [PMID: 36812649] [PMCID: PMC9931308] [DOI: 10.1371/journal.pdig.0000172]
Abstract
BACKGROUND: The PROTECT Center is a multi-project initiative that studies the relationship between exposure to environmental contaminants and preterm birth during the prenatal and postnatal period among women living in Puerto Rico. PROTECT's Community Engagement Core and Research Translation Coordinator (CEC/RTC) play a key role in building trust and capacity by approaching the cohort as an engaged community that provides feedback about processes, including how personalized results of chemical exposure should be reported back. The goal of the Mi PROTECT platform was to create a mobile-based application of DERBI (Digital Exposure Report-Back Interface) for the cohort that provides tailored, culturally appropriate information about individual contaminant exposures, along with education on chemical substances and approaches to exposure reduction.
METHODS: Participants (N = 61) were introduced to terms commonly used in environmental health research related to the collected samples and biomarkers, followed by a guided training on accessing and exploring the Mi PROTECT platform. Participants evaluated the guided training and the Mi PROTECT platform using Likert-scale items in two separate surveys of 13 and 8 questions, respectively.
RESULTS: Participants provided overwhelmingly positive feedback on the clarity and fluency of the presenters in the report-back training. Most participants reported that the mobile phone platform was accessible (83%) and easy to navigate (80%) and that the images included in the platform facilitated comprehension of the information. Most participants (83%) also reported that the language, images, and examples in Mi PROTECT strongly represented them as Puerto Ricans.
CONCLUSIONS: Findings from the Mi PROTECT pilot test informed investigators, community partners, and stakeholders by demonstrating a new way to promote stakeholder participation and foster the "research right-to-know."
Affiliation(s)
- Nancy R. Cardona Cordero
- Department of Social Sciences, School of Public Health, Medical Sciences Campus, University of Puerto Rico, San Juan, Puerto Rico
- Irene Lafarga Previdi
- Center for Collaborative Research in Health Disparities, Medical Sciences Campus, University of Puerto Rico, San Juan, Puerto Rico
- Héctor R. Torres
- College of Engineering, Northeastern University, Boston, Massachusetts, United States of America
- Ishwara Ayala
- College of Engineering, Northeastern University, Boston, Massachusetts, United States of America
- Amailie Santos Rivera
- College of Engineering, Northeastern University, Boston, Massachusetts, United States of America
- John D. Meeker
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Akram Alshawabkeh
- College of Engineering, Northeastern University, Boston, Massachusetts, United States of America
- José F. Cordero
- Department of Epidemiology and Biostatistics at the University of Georgia’s College of Public Health, Athens, Georgia, United States of America
- Julia Green Brody
- Silent Spring Institute, Newton, Massachusetts, United States of America
- Phil Brown
- Social Science Environmental Health Research Institute, Northeastern University, Boston, Massachusetts, United States of America
- Carmen M. Vélez Vega
- Department of Social Sciences, School of Public Health, Medical Sciences Campus, University of Puerto Rico, San Juan, Puerto Rico
7
Watrin L, Weihrauch L, Wilhelm O. The criterion-related validity of conscientiousness in personnel selection: A meta-analytic reality check. International Journal of Selection and Assessment 2022. [DOI: 10.1111/ijsa.12413]
Affiliation(s)
- Luc Watrin
- Department of Individual Differences and Psychological Assessment, Institute of Psychology and Education, Ulm University, Ulm, Germany
- Lucas Weihrauch
- Department of Individual Differences and Psychological Assessment, Institute of Psychology and Education, Ulm University, Ulm, Germany
- Oliver Wilhelm
- Department of Individual Differences and Psychological Assessment, Institute of Psychology and Education, Ulm University, Ulm, Germany
8
Bürkner PC. On the Information Obtainable from Comparative Judgments. Psychometrika 2022;87:1439-1472. [PMID: 35133553] [PMCID: PMC9636126] [DOI: 10.1007/s11336-022-09843-z]
Abstract
Personality tests employing comparative judgments have been proposed as an alternative to Likert-type rating scales. One of the main advantages of a comparative format is that it can reduce faking of responses in high-stakes situations. However, previous research has shown that it is highly difficult to obtain trait score estimates that are both faking resistant and sufficiently accurate for individual-level diagnostic decisions. With the goal of contributing to a solution, I study the information obtainable from comparative judgments analyzed by means of Thurstonian IRT models. First, I extend the mathematical theory of ordinal comparative judgments and corresponding models. Second, I provide optimal test designs for Thurstonian IRT models that maximize the accuracy of people's trait score estimates from both frequentist and Bayesian statistical perspectives. Third, I derive analytic upper bounds for the accuracy of these trait estimates achievable through ordinal Thurstonian IRT models. Fourth, I perform numerical experiments that complement results obtained in earlier simulation studies. The combined analytical and numerical results suggest that it is indeed possible to design personality tests using comparative judgments that yield trait score estimates sufficiently accurate for individual-level diagnostic decisions, while reducing faking in high-stakes situations. Recommendations for the practical application of comparative judgments for the measurement of personality, specifically in high-stakes situations, are given.
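The Thurstonian IRT response function that underlies this line of work can be sketched for a single binary comparison. Following the standard parameterization (latent utility t_i = lambda_i * theta + e_i; statement i preferred over k with probability Phi((-gamma + lambda_i*theta_a - lambda_k*theta_b) / sqrt(psi2_i + psi2_k))), the parameter values below are invented for illustration.

```python
# Minimal sketch of the Thurstonian IRT model for one pairwise comparison.
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tirt_prob(theta_a, theta_b, lam_i, lam_k, gamma, psi2_i, psi2_k):
    """P(statement i, loading lam_i on trait a, is preferred over statement k,
    loading lam_k on trait b); gamma is the pair threshold and psi2_* are the
    uniquenesses of the two latent utilities."""
    z = (-gamma + lam_i * theta_a - lam_k * theta_b) / math.sqrt(psi2_i + psi2_k)
    return norm_cdf(z)

# Respondent at +1 SD on trait a and average on trait b:
p = tirt_prob(theta_a=1.0, theta_b=0.0, lam_i=0.8, lam_k=0.7,
              gamma=0.0, psi2_i=0.5, psi2_k=0.5)
```

The information a comparison carries about (theta_a, theta_b), which the article analyzes and maximizes, is derived from the curvature of this response probability with respect to the traits.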
9
Lee P, Joo SH, Zhou S, Son M. Investigating the impact of negatively keyed statements on multidimensional forced-choice personality measures: A comparison of partially ipsative and IRT scoring methods. Personality and Individual Differences 2022. [DOI: 10.1016/j.paid.2022.111555]
10
Pavlov G, Shi D, Maydeu-Olivares A, Fairchild A. Item desirability matching in forced-choice test construction. Personality and Individual Differences 2021. [DOI: 10.1016/j.paid.2021.111114]
11
Abstract
Forced-choice (FC) assessments of noncognitive psychological constructs (e.g., personality, behavioral tendencies) are popular in high-stakes organizational testing scenarios (e.g., informing hiring decisions) due to their enhanced resistance to response distortions (e.g., faking good, impression management). The measurement precision of FC assessment scores used to inform personnel decisions is of paramount importance in practice. Different types of reliability estimates are reported for FC assessment scores in current publications, but consensus on best practices appears to be lacking. To provide understanding and structure around the reporting of FC reliability, this study systematically examined different types of reliability estimation methods for Thurstonian IRT-based FC assessment scores: their theoretical differences were discussed, and their numerical differences were illustrated through a series of simulation and empirical studies. In doing so, this study provides a practical guide for appraising different reliability estimation methods for IRT-based FC assessment scores.
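One commonly reported estimate of this kind is the so-called empirical reliability of IRT scores: the estimated true-score variance as a proportion of observed score variance, computed from each examinee's score and its standard error. The sketch below uses simulated data, not results from the study, and stands in for only one of the several methods the study compares.

```python
# Empirical reliability of IRT-based scores from simulated estimates.
import random

random.seed(7)

# Simulated trait estimates and their standard errors for 1000 examinees.
theta_hat = [random.gauss(0.0, 1.0) for _ in range(1000)]
se = [random.uniform(0.3, 0.5) for _ in range(1000)]

n = len(theta_hat)
mean_theta = sum(theta_hat) / n
# Observed variance of the score estimates.
var_theta = sum((t - mean_theta) ** 2 for t in theta_hat) / (n - 1)
# Average error variance implied by the standard errors.
mean_err_var = sum(s ** 2 for s in se) / n

# Reliability = true-score variance / (true-score + error variance), here
# approximated as var(theta_hat) / (var(theta_hat) + mean(SE^2)).
empirical_reliability = var_theta / (var_theta + mean_err_var)
```

Other estimates the study discusses (e.g., simulation-based or test-retest-style coefficients) can differ numerically from this one, which is precisely the reporting ambiguity the abstract addresses.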
12
Lee P, Joo SH, Stark S. Detecting DIF in Multidimensional Forced Choice Measures Using the Thurstonian Item Response Theory Model. Organizational Research Methods 2020. [DOI: 10.1177/1094428120959822]
Abstract
Although modern item response theory (IRT) methods of test construction and scoring have overcome the ipsativity problems historically associated with multidimensional forced choice (MFC) formats, there has been little research on MFC differential item functioning (DIF) detection, where "item" refers to a block, or group, of statements presented for an examinee's consideration. This research investigated DIF detection with three-alternative MFC items based on the Thurstonian IRT (TIRT) model, using omnibus Wald tests on loadings and thresholds. We examined constrained- and free-baseline model comparison strategies with different types and magnitudes of DIF, latent trait correlations, sample sizes, and levels of impact in an extensive Monte Carlo study. Results indicated that the free-baseline strategy was highly effective in detecting DIF, with power approaching 1.0 in the large-sample, large-DIF conditions and similar effectiveness in the impact and no-impact conditions. This research also included an empirical example demonstrating the viability of the best-performing method with real examinees and showed how DIF and DTF effect size measures can be used to assess the practical significance of MFC DIF findings.
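The logic of a Wald-type DIF test can be shown in miniature for a single parameter. The study itself uses omnibus Wald tests over a block's loadings and thresholds jointly; this simplified sketch tests one factor loading estimated freely in a reference and a focal group, with invented numbers.

```python
# Simplified single-parameter Wald DIF test: W = diff^2 / Var(diff),
# compared against a chi-square(1) critical value under the null of no DIF.
def wald_dif(est_ref, se_ref, est_focal, se_focal):
    """Wald statistic for the reference-vs-focal difference in one parameter,
    assuming independent group-wise estimates."""
    diff = est_ref - est_focal
    return diff ** 2 / (se_ref ** 2 + se_focal ** 2)

# Hypothetical loading of one statement estimated in each group:
w = wald_dif(est_ref=0.85, se_ref=0.06, est_focal=0.60, se_focal=0.07)
CRIT_CHI2_DF1_05 = 3.841  # 5% critical value for chi-square with df = 1
flag = w > CRIT_CHI2_DF1_05  # True -> statement flagged for DIF
```

In the free-baseline strategy the abstract finds most effective, all blocks except a designated anchor are estimated freely in both groups and each is tested against the anchor-linked baseline in this fashion.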