1. Reisenzein R, Junge M. Measuring the intensity of emotions. Front Psychol 2024; 15:1437843. PMID: 39286570; PMCID: PMC11402726; DOI: 10.3389/fpsyg.2024.1437843.
Abstract
We describe a theoretical framework for the measurement of the intensity of emotional experiences and summarize findings from a series of studies that implemented this framework. Our approach is based on a realist view of quantities and combines the modern psychometric (i.e., latent-variable) view of measurement with a deductive order of inquiry for testing measurement axioms. At the core of the method are nonmetric probabilistic difference scaling methods, a class of indirect scaling methods based on ordinal judgments of intensity differences. Originally developed to scale sensations and preferences, these scaling methods are also well suited for measuring emotion intensity, particularly in basic research. They are easy to perform and provide scale values of emotion intensity that are much more precise than the typically used quality-intensity emotion rating scales. Furthermore, the scale values appear to fulfill central measurement-theoretical axioms necessary for interval-level measurement. Because of these properties, difference scaling methods allow precise tests of emotion theories at the level of the individual subject.
Affiliation(s)
- Rainer Reisenzein, Institute of Psychology, University of Greifswald, Greifswald, Germany
- Martin Junge, Institute of Psychology, University of Greifswald, Greifswald, Germany
2. Driggs J, Vangsness L. Judgments of Difficulty (JODs) While Observing an Automated System Support the Media Equation and Unique Agent Hypotheses. Human Factors 2024; 187208241273379. PMID: 39155398; DOI: 10.1177/00187208241273379.
Abstract
OBJECTIVE We investigated how people use cues to make Judgments of Difficulty (JODs) while observing automation perform a task and when performing the task themselves. BACKGROUND Task difficulty is a factor affecting trust in automation; however, no research has explored how individuals make JODs when watching automation, or whether these judgments are similar to or different from those made while watching humans. It is also unclear how cue use when observing automation differs as a function of experience. METHOD The study involved a visual search task. Some participants performed the task first and then watched automation complete it; others watched and then performed; and a third group alternated between performing and watching. After each trial, participants made a JOD by indicating whether the task was easier or harder than before. Task difficulty changed randomly every five trials. RESULTS A Bayesian regression suggested that cue use while observing automation is in some respects similar to, and in others different from, cue use while observing humans. For central cues, support for the Unique Agent Hypothesis (UAH) was bounded by experience: those who performed the task first underweighted central cues when making JODs, relative to their counterparts in a previous study involving humans. For peripheral cues, support for the Media Equation Hypothesis (MEH) was unequivocal, and participants weighted cues similarly across observation sources. CONCLUSION People watching automation perform a task weighted cues in ways both similar to and different from when they watched humans, supporting the Media Equation and Unique Agent Hypotheses. APPLICATION This study adds to a growing understanding of judgments in human-human and human-automation interactions.
3. Cheng C, Lay KL, Hsu YF, Tsai YM. Can Likert scales predict choices? Testing the congruence between using Likert scale and comparative judgment on measuring attribution. Methods in Psychology 2021. DOI: 10.1016/j.metip.2021.100081.
4. Chen CW, Wang WC, Mok MMC, Scherer R. A Lognormal Ipsative Model for Multidimensional Compositional Items. Front Psychol 2021; 12:573252. PMID: 34712161; PMCID: PMC8545823; DOI: 10.3389/fpsyg.2021.573252.
Abstract
Compositional items (a form of forced-choice items) require respondents to allocate a fixed total number of points to a set of statements. To describe the responses to these items, the Thurstonian item response theory (IRT) model was developed. Despite its prominence, the model requires that the items, each composed of several statements, yield a factor loading matrix of full rank. Without this requirement, the model cannot be identified and the latent trait estimates would be seriously biased. Moreover, the estimation of the Thurstonian IRT model often runs into convergence problems. To address these issues, this study developed a new version of the Thurstonian IRT model for analyzing compositional items, the lognormal ipsative model (LIM), which is sufficient for tests in which all statements are positively phrased and factor loadings are equal. We developed an online value test following Schwartz's values theory using compositional items and collected responses from N = 512 participants aged 13 to 51 years. The results showed that the LIM had an acceptable fit to the data and that the reliabilities exceeded 0.85. A simulation study showed good parameter recovery, a high convergence rate, and sufficient estimation precision across various conditions of trait covariance matrices, test lengths, and sample sizes. Overall, our results indicate that the proposed model can overcome the problems of the Thurstonian IRT model when all statements are positively phrased and factor loadings are similar.
Affiliation(s)
- Chia-Wen Chen, Centre for Educational Measurement, University of Oslo, Oslo, Norway
- Wen-Chung Wang, Assessment Research Centre, The Education University of Hong Kong, Tai Po, Hong Kong SAR, China
- Magdalena Mo Ching Mok, Assessment Research Centre, The Education University of Hong Kong, Tai Po, Hong Kong SAR, China; Graduate Institute of Educational Information and Measurement, National Taichung University of Education, Taichung, Taiwan
- Ronny Scherer, Centre for Educational Measurement, University of Oslo, Oslo, Norway
5. Wiedermann W, Frick U, Merkle EC. Detecting Heterogeneity of Intervention Effects in Comparative Judgments. Prevention Science 2021; 24:444-454. PMID: 33687608; DOI: 10.1007/s11121-021-01212-z.
Abstract
Comparative measures such as paired comparisons and rankings are frequently used to evaluate health states and quality of life. The present article introduces log-linear Bradley-Terry (LLBT) models to evaluate intervention effectiveness when outcomes are measured as paired comparisons or rankings, and presents a combination of the LLBT model and model-based recursive partitioning (MOB) to detect treatment effect heterogeneity. The MOB LLBT approach enables researchers to identify subgroups that differ in preference order and in the effect an intervention has on choice behavior. The applicability of MOB LLBT models is demonstrated using an artificial data example with a known data-generating mechanism and a real-world data example focusing on drug-harm perception among music festival visitors. In the artificial data example, the MOB LLBT model adequately recovers the "true" (population) model. In the real-world data example, the standard LLBT model confirms a situational willingness among festival visitors to trivialize drug harm when peer consumption behavior is made cognitively accessible. In addition, the MOB LLBT results suggest that this trivialization effect is highly context-dependent and most pronounced for participants with low-to-moderate alcohol intoxication who also proactively contacted a substance counselor at the festival venue. Both data examples suggest that MOB LLBT models allow for more nuanced statements about the effectiveness of interventions. We provide R code examples to implement MOB LLBT models for paired comparison, ranking, and rating (Likert-type) data.
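The LLBT family builds on per-item "worth" parameters. A minimal, self-contained sketch of plain Bradley-Terry estimation (the classic minorization-maximization algorithm on an invented toy win matrix; this is an illustration, not the authors' LLBT/MOB R code) looks like this:

```python
# Minimal Bradley-Terry sketch: estimate item "worth" parameters from a
# paired-comparison win matrix via the classic MM algorithm.
import numpy as np

def bradley_terry(wins, n_iter=200):
    """wins[i, j] = number of times item i was preferred over item j."""
    n = wins.shape[0]
    p = np.ones(n)
    for _ in range(n_iter):
        for i in range(n):
            num = wins[i].sum()                      # total wins of item i
            den = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                      for j in range(n) if j != i)   # comparisons weighted by current worths
            p[i] = num / den
        p /= p.sum()                                 # fix the scale: worths sum to 1
    return p

# Toy data (invented): three items, item 0 clearly preferred overall.
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]])
worth = bradley_terry(wins)
print(worth)  # worths in decreasing order: item 0 > item 1 > item 2
```

The estimated worths give the preference order directly; MOB-style partitioning would refit such a model within data-driven subgroups.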
Affiliation(s)
- Ulrich Frick, HS Doepfer University of Applied Sciences, Cologne, Germany
6. Brown D, Van den Bergh I, de Bruin S, Machida L, van Etten J. Data synthesis for crop variety evaluation. A review. Agronomy for Sustainable Development 2020; 40:25. PMID: 32863892; PMCID: PMC7440334; DOI: 10.1007/s13593-020-00630-7.
Abstract
Crop varieties should fulfill multiple requirements, including agronomic performance and product quality. Variety evaluations depend on data generated from field trials and sensory analyses, performed with different levels of participation from farmers and consumers. Such multi-faceted variety evaluation is expensive and time-consuming; hence, any use of these data should be optimized. Data synthesis can help to take advantage of existing and new data, combining data from different sources and with expert knowledge to produce new information and understanding that supports decision-making. Data synthesis for crop variety evaluation can partly build on existing experiences and methods, but it also requires methodological innovation. We review the elements required to achieve data synthesis for crop variety evaluation, including (1) the data types required for crop variety evaluation, (2) the main challenges in data management and integration, (3) the main global initiatives aiming to solve those challenges, (4) current statistical approaches to combining data for crop variety evaluation, and (5) existing data synthesis methods used in variety evaluation to combine datasets from multiple sources. We conclude that currently available methods have the potential to overcome existing barriers to data synthesis and could set in motion a virtuous cycle that encourages researchers to share data and collaborate on data-driven research.
Affiliation(s)
- David Brown, Laboratory of Geo-Information Science and Remote Sensing, Wageningen University & Research, Droevendaalsesteeg 3, 6708 PB Wageningen, The Netherlands; Bioversity International, Turrialba, 30501, Costa Rica
- Inge Van den Bergh, Bioversity International, c/o KU Leuven, W. De Croylaan 42, P.O. Box 2455, 3001 Leuven, Belgium
- Sytze de Bruin, Laboratory of Geo-Information Science and Remote Sensing, Wageningen University & Research, Droevendaalsesteeg 3, 6708 PB Wageningen, The Netherlands
- Lewis Machida, Bioversity International, c/o International Institute of Tropical Agriculture (IITA), Nelson Mandela African Institute of Science and Technology, P.O. Box 447, Arusha, Tanzania
7.
Abstract
OBJECTIVE We used this experiment to determine the degree to which cues to difficulty are used to make judgments of difficulty (JODs). BACKGROUND Traditional approaches seek to standardize the information people use to evaluate subjective workload; however, it is likely that both conscious and unconscious cues underlie people's JODs. METHOD We designed a video game task that tested the degree to which time-on-task, performance-based feedback, and central cues to difficulty informed JODs. These relationships were modeled along five continuous dimensions of difficulty. RESULTS Central cues contributed most strongly to JODs; judgments were supplemented by peripheral cues (performance-based feedback and time-on-task) even though these cues were not always valid. In addition, participants became more likely to rate the task as "easier" over time. CONCLUSION Although central cues are strong predictors of task difficulty, people conflate task difficulty (central cues), effort allocation and skill (performance-based feedback), and proxy cues to difficulty (time) when making JODs. APPLICATION Identifying the functional relationships between cues to difficulty and JODs will provide valuable insight into the information people use to evaluate tasks and make decisions.
8. Wang WC, Qiu XL, Chen CW, Ro S, Jin KY. Item Response Theory Models for Ipsative Tests With Multidimensional Pairwise Comparison Items. Applied Psychological Measurement 2017; 41:600-613. PMID: 29881107; PMCID: PMC5978479; DOI: 10.1177/0146621617703183.
Abstract
There is re-emerging interest in adopting forced-choice items to address the issue of response bias in Likert-type items for noncognitive latent traits. Multidimensional pairwise comparison (MPC) items are commonly used forced-choice items. However, few studies have been aimed at developing item response theory models for MPC items owing to the challenges associated with ipsativity. Acknowledging that the absolute scales of latent traits are not identifiable in ipsative tests, this study developed a Rasch ipsative model for MPC items that has desirable measurement properties, yields a single utility value for each statement, and allows for comparing psychological differentiation between and within individuals. The simulation results showed a good parameter recovery for the new model with existing computer programs. This article provides an empirical example of an ipsative test on work style and behaviors.
Affiliation(s)
- Xue-Lan Qiu, The Education University of Hong Kong, Hong Kong
- Kuan-Yu Jin, The Education University of Hong Kong, Hong Kong
9. Dittrich R, Francis B, Hatzinger R, Katzenbeisser W. A paired comparison approach for the analysis of sets of Likert-scale responses. Stat Model 2016. DOI: 10.1177/1471082x0600700102.
Abstract
This paper provides an alternative methodology for the analysis of a set of Likert responses measured on a common attitudinal scale when the primary focus of interest is on the relative importance of items in the set. The method makes fewer assumptions about the distribution of the responses than the more usual approaches such as comparisons of means, MANOVA or ordinal data methods. The approach transforms the Likert responses into paired comparison responses between the items. The complete multivariate pattern of responses thus produced can be analyzed by an appropriately reformulated paired comparison model. The dependency structure between item responses can also be modelled flexibly. The advantage of this approach is that sets of Likert responses can be analyzed simultaneously within the Generalized Linear Model framework, providing standard likelihood-based inference for model selection. This method is applied to a recent international survey on the importance of environmental problems.
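The core data step described above, deriving paired-comparison outcomes from Likert responses given on a common scale, can be sketched as follows (an illustration of the transformation only; the reformulated paired-comparison GLM is not shown, and the item names are invented):

```python
# Turn one respondent's Likert ratings into paired-comparison responses:
# for every item pair, record which item was rated higher.
from itertools import combinations

def likert_to_paired(ratings):
    """ratings: dict item -> Likert score. Returns dict (i, j) -> outcome,
    where 1 means i was rated higher than j, -1 lower, 0 a tie."""
    out = {}
    for i, j in combinations(sorted(ratings), 2):
        diff = ratings[i] - ratings[j]
        out[(i, j)] = (diff > 0) - (diff < 0)  # sign of the rating difference
    return out

# One respondent rates three environmental problems on a 1-5 scale.
resp = {"air": 5, "water": 4, "noise": 2}
print(likert_to_paired(resp))
```

Applied to every respondent, this yields the multivariate pattern of pairwise responses that the paper models within the Generalized Linear Model framework.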
Affiliation(s)
- Regina Dittrich, Department of Statistics and Mathematics, Vienna University of Economics, Augasse 2-6, A-1090 Vienna, Austria
10.
Abstract
To prevent response bias, personality questionnaires may use comparative response formats. These include forced choice, where respondents choose among a number of items, and quantitative comparisons, where respondents indicate the extent to which items are preferred to each other. The present article extends Thurstonian modeling of binary choice data to "proportion-of-total" (compositional) formats. Following the seminal work of Aitchison, compositional item data are transformed into log ratios, conceptualized as differences of latent item utilities. The mean and covariance structure of the log ratios is modeled using confirmatory factor analysis (CFA), where the item utilities are first-order factors, and personal attributes measured by a questionnaire are second-order factors. A simulation study with two sample sizes, N = 300 and N = 1,000, shows that the method provides very good recovery of true parameters and near-nominal rejection rates. The approach is illustrated with empirical data from N = 317 students, comparing model parameters obtained with compositional and Likert-scale versions of a Big Five measure. The results show that the proposed model successfully captures the latent structures and person scores on the measured traits.
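The Aitchison-style log-ratio step described above can be sketched as follows (an illustration of the transformation only, assuming an additive log-ratio with the first item as base; the confirmatory factor analysis stage is omitted):

```python
# Transform a compositional "proportion-of-total" item response into log
# ratios, interpretable as differences of latent item utilities.
import math

def log_ratios(points, base_index=0):
    """points: point allocations summing to a fixed total. Returns
    log(p_k / p_base) for every item other than the base item."""
    base = points[base_index]
    return [math.log(p / base)
            for k, p in enumerate(points) if k != base_index]

# A respondent splits 100 points across four statements.
print(log_ratios([40, 30, 20, 10]))
```

Each log ratio is then treated as an indicator of utility differences in the second-order factor model.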
Affiliation(s)
- Anna Brown, School of Psychology, University of Kent
11. Brown A. Item Response Models for Forced-Choice Questionnaires: A Common Framework. Psychometrika 2016; 81:135-160. PMID: 25663304; DOI: 10.1007/s11336-014-9434-9.
Abstract
In forced-choice questionnaires, respondents have to make choices between two or more items presented at the same time. Several IRT models have been developed to link respondent choices to underlying psychological attributes, including the recent MUPP (Stark et al. in Appl Psychol Meas 29:184-203, 2005) and Thurstonian IRT (Brown and Maydeu-Olivares in Educ Psychol Meas 71:460-502, 2011) models. In the present article, a common framework is proposed that describes forced-choice models along three axes: (1) the forced-choice format used; (2) the measurement model for the relationships between items and psychological attributes they measure; and (3) the decision model for choice behavior. Using the framework, fundamental properties of forced-choice measurement of individual differences are considered. It is shown that the scale origin for the attributes is generally identified in questionnaires using either unidimensional or multidimensional comparisons. Both dominance and ideal point models can be used to provide accurate forced-choice measurement; and the rules governing accurate person score estimation with these models are remarkably similar.
Affiliation(s)
- Anna Brown, School of Psychology, University of Kent, Canterbury, Kent CT2 7NP, UK
12. Culpepper SA, Balamuta JJ. A Hierarchical Model for Accuracy and Choice on Standardized Tests. Psychometrika 2015. PMID: 26608961; DOI: 10.1007/s11336-015-9484-7.
Abstract
This paper assesses the psychometric value of allowing test-takers choice in standardized testing. New theoretical results examine the conditions where allowing choice improves score precision. A hierarchical framework is presented for jointly modeling the accuracy of cognitive responses and item choices. The statistical methodology is disseminated in the 'cIRT' R package. An 'answer two, choose one' (A2C1) test administration design is introduced to avoid challenges associated with nonignorable missing data. Experimental results suggest that the A2C1 design and payout structure encouraged subjects to choose items consistent with their cognitive trait levels. Substantively, the experimental data suggest that item choices yielded comparable information and discrimination ability as cognitive items. Given there are no clear guidelines for writing more or less discriminating items, one practical implication is that choice can serve as a mechanism to improve score precision.
Affiliation(s)
- Steven Andrew Culpepper, Department of Statistics, University of Illinois at Urbana-Champaign, 725 South Wright Street, Champaign, IL 61820, USA
- James Joseph Balamuta, Department of Statistics, University of Illinois at Urbana-Champaign, 725 South Wright Street, Champaign, IL 61820, USA
13. Crocker C, Thomson DM. Anchored scaling in best–worst experiments: A process for facilitating comparison of conceptual profiles. Food Quality and Preference 2014. DOI: 10.1016/j.foodqual.2013.11.005.
14.
Abstract
BACKGROUND The original SF-6D valuation study collected 3503 standard gamble responses from 611 UK respondents to predict quality-adjusted life year (QALY) values. METHODS Using 19,980 paired comparison responses from 666 US respondents and a stacked probit model, the 25 coefficients of the original SF-6D multiattribute utility (MAU) regression were estimated, such that each coefficient represents a QALY decrement. The US QALY predictions were compared with UK predictions using 8428 SF-6D states in the US Medicare Health Outcomes Survey (MHOS), 1998 to 2003. RESULTS Twenty-two of the 25 decrements in the SF-6D MAU regression are statistically significant; the remaining decrements are insignificant based on both US and UK results. The US and UK QALY predictions for the MHOS SF-6D states are remarkably similar given differences in experimental design, format, and sampling (Lin's coefficient of agreement, 0.941; absolute mean difference, 0.043). LIMITATIONS The underlying theoretical framework for the study design and econometric analysis builds from the episodic random utility model and the concept of QALYs, and inherits their limitations. CONCLUSIONS This study enhances the potential for US comparative effectiveness research by translating SF-6D states into US QALYs, and improves upon discrete choice experiment design and econometric methods for health valuation.
Affiliation(s)
- Benjamin M Craig, Health Outcomes & Behavior Program, Moffitt Cancer Center, Tampa, Florida; Department of Economics, University of South Florida, Tampa, Florida
- A Simon Pickard, Center for Pharmacoeconomic Research and Department of Pharmacy Practice, University of Illinois at Chicago, Chicago, Illinois
- Elly Stolk, Department of Health Policy and Management, Erasmus University Rotterdam, Rotterdam, The Netherlands
- John E Brazier, Health Economics and Decision Science, School of Health and Related Research, University of Sheffield, Sheffield, UK
15.
Abstract
Two studies investigated the utility of indirect scaling methods, based on graded pair comparisons, for the testing of quantitative emotion theories. In Study 1, we measured the intensity of relief and disappointment caused by lottery outcomes, and in Study 2, the intensity of disgust evoked by pictures, using both direct intensity ratings and graded pair comparisons. The stimuli were systematically constructed to reflect variables expected to influence the intensity of the emotions according to theoretical models of relief/disappointment and disgust, respectively. Two probabilistic scaling methods were used to estimate scale values from the pair comparison judgements: Additive functional measurement (AFM) and maximum likelihood difference scaling (MLDS). The emotion models were fitted to the direct and indirect intensity measurements using nonlinear regression (Study 1) and analysis of variance (Study 2). Both studies found substantially improved fits of the emotion models for the indirectly determined emotion intensities, with their advantage being evident particularly at the level of individual participants. The results suggest that indirect scaling methods yield more precise measurements of emotion intensity than rating scales and thereby provide stronger tests of emotion theories in general and quantitative emotion theories in particular.
Affiliation(s)
- Martin Junge, Institute of Psychology, University of Greifswald, Greifswald, Germany
16. Cattelan M. Models for Paired Comparison Data: A Review with Emphasis on Dependent Data. Stat Sci 2012. DOI: 10.1214/12-sts396.
17. Stark S, Chernyshenko OS, Drasgow F, White LA. Adaptive Testing With Multidimensional Pairwise Preference Items. Organizational Research Methods 2012. DOI: 10.1177/1094428112444611.
Affiliation(s)
- Stephen Stark, Department of Psychology, University of South Florida, Tampa, FL, USA
- Fritz Drasgow, Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Leonard A. White, U.S. Army Research Institute for the Behavioral and Social Sciences, Arlington, VA, USA
18. Automated assembly of optimally spaced and balanced paired comparisons: controlling order effects. Behav Res Methods 2011; 44:753-764. PMID: 22090261; DOI: 10.3758/s13428-011-0170-0.
Abstract
To control order effects in questionnaires containing paired comparisons, Ross (1934) described an optimal ordering of the pairings. The pairs can also be balanced so that every stimulus appears equal numbers of times as the first and the second member of a pair. First, we describe and illustrate the optimally spaced, balanced ordering of pairings. Then we show how the optimally spaced, balanced order can be used to implement a matrix-sampling design or a fully incomplete design when the number of stimuli n is so large that respondents cannot reasonably be expected to judge all n(n - 1)/2 pairs. The algorithm for balancing and optimally spacing the list of pairs is described.
19. Maydeu-Olivares A, Brown A. Item Response Modeling of Paired Comparison and Ranking Data. Multivariate Behavioral Research 2010; 45:935-974. PMID: 26760724; DOI: 10.1080/00273171.2010.531231.
Abstract
The comparative format used in ranking and paired comparisons tasks can significantly reduce the impact of uniform response biases typically associated with rating scales. Thurstone's (1927, 1931) model provides a powerful framework for modeling comparative data such as paired comparisons and rankings. Although Thurstonian models are generally presented as scaling models, that is, stimuli-centered models, they can also be used as person-centered models. In this article, we discuss how Thurstone's model for comparative data can be formulated as item response theory models so that respondents' scores on underlying dimensions can be estimated. Item parameters and latent trait scores can be readily estimated using a widely used statistical modeling program. Simulation studies show that item characteristic curves can be accurately estimated with as few as 200 observations and that latent trait scores can be recovered to a high precision. Empirical examples are given to illustrate how the model may be applied in practice and to recommend guidelines for designing ranking and paired comparisons tasks in the future.
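For intuition about the Thurstonian scaling that underlies these models, a minimal Case V computation (a classical textbook illustration on an invented proportion matrix, not the article's IRT formulation) is:

```python
# Thurstone Case V sketch: scale values are column means of z-transformed
# preference proportions, using the inverse normal CDF.
from statistics import NormalDist

def case_v(prop):
    """prop[i][j] = proportion preferring item j over item i.
    Returns scale values shifted so the lowest is 0."""
    n = len(prop)
    z = [[NormalDist().inv_cdf(prop[i][j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    means = [sum(z[i][j] for i in range(n)) / n for j in range(n)]
    lo = min(means)
    return [m - lo for m in means]

# Toy proportion matrix for three stimuli (rows: compared-against item).
prop = [[0.5, 0.7, 0.9],
        [0.3, 0.5, 0.8],
        [0.1, 0.2, 0.5]]
print(case_v(prop))  # increasing scale values: item 2 scaled highest
```

The IRT reformulation discussed in the article replaces these aggregate proportions with a person-level model, so that individual trait scores, not just stimulus scale values, can be estimated.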
20. Kuhn KM. Compensation as a signal of organizational culture: the effects of advertising individual or collective incentives. International Journal of Human Resource Management 2009. DOI: 10.1080/09585190902985293.
Affiliation(s)
- Kristine M. Kuhn
- a Department of Management , Washington State University , Pullman, Washington , USA
| |
21. Chernyshenko OS, Stark S, Prewett MS, Gray AA, Stilson FR, Tuttle MD. Normative Scoring of Multidimensional Pairwise Preference Personality Scales Using IRT: Empirical Comparisons With Other Formats. Human Performance 2009. DOI: 10.1080/08959280902743303.
22. Chuang Y, Chen LL, Chuang MC. Computer-based rating method for evaluating multiple visual stimuli on multiple scales. Computers in Human Behavior 2008. DOI: 10.1016/j.chb.2007.08.003.
23. Twin Analysis on Paired Comparison Data. Behav Genet 2007; 38:212-222. DOI: 10.1007/s10519-007-9183-y.
24. Assessing vocational interests in the Basque Country using paired comparison design. Journal of Vocational Behavior 2007. DOI: 10.1016/j.jvb.2007.04.001.
25. Maydeu-Olivares A, Hernández A. Identification and Small Sample Estimation of Thurstone's Unrestricted Model for Paired Comparisons Data. Multivariate Behavioral Research 2007; 42:323-347. PMID: 26765490; DOI: 10.1080/00273170701360555.
Abstract
The interpretation of a Thurstonian model for paired comparisons where the utilities' covariance matrix is unrestricted proved to be difficult due to the comparative nature of the data. We show that under a suitable constraint the utilities' correlation matrix can be estimated, yielding a readily interpretable solution. This set of identification constraints can recover any true utilities' covariance matrix, but it is not unique. Indeed, we show how to transform the estimated correlation matrix into alternative correlation matrices that are equally consistent with the data but may be more consistent with substantive theory. Also, we show how researchers can investigate the sample size needed to estimate a particular model by exploiting the simulation capabilities of a popular structural equation modeling statistical package.
26. Böckenholt U. Thurstonian-Based Analyses: Past, Present, and Future Utilities. Psychometrika 2006; 71:615-629. PMID: 20046841; PMCID: PMC2798976; DOI: 10.1007/s11336-006-1598-5.
Abstract
Current psychometric models of choice behavior are strongly influenced by Thurstone's (1927, 1931) experimental and statistical work on measuring and scaling preferences. Aided by advances in computational techniques, choice models can now accommodate a wide range of different data types and sources of preference variability among respondents induced by such diverse factors as person-specific choice sets or different functional forms for the underlying utility representations. At the same time, these models are increasingly challenged by behavioral work demonstrating the prevalence of choice behavior that is not consistent with the underlying assumptions of these models. I discuss new modeling avenues that can account for such seemingly inconsistent choice behavior and conclude by emphasizing the interdisciplinary frontiers in the study of choice behavior and the resulting challenges for psychometricians.
27.

28. Random-Effects Models for Preference Data. 2006. DOI: 10.1016/s0169-7161(06)26014-0.
29. Maydeu-Olivares A, Böckenholt U. Structural Equation Modeling of Paired-Comparison and Ranking Data. Psychol Methods 2005; 10:285-304. PMID: 16221029; DOI: 10.1037/1082-989x.10.3.285.
Abstract
L. L. Thurstone's (1927) model provides a powerful framework for modeling individual differences in choice behavior. An overview of Thurstonian models for comparative data is provided, including the classical Case V and Case III models as well as more general choice models with unrestricted and factor-analytic covariance structures. A flow chart summarizes the model selection process. The authors show how to embed these models within a more familiar structural equation modeling (SEM) framework. The different special cases of Thurstone's model can be estimated with a popular SEM statistical package, including factor analysis models for paired comparisons and rankings. Only minor modifications are needed to accommodate both types of data. As a result, complex models for comparative judgments can be both estimated and tested efficiently.