1
Molander O, Wennberg P, Dowling NA, Berman AH. Assessing gambling disorder using frequency- and time-based response options: A Rasch analysis of the Gambling Disorder Identification Test. Int J Methods Psychiatr Res 2024; 33:e2018. [PMID: 38475935] [DOI: 10.1002/mpr.2018]
Abstract
OBJECTIVES The Gambling Disorder Identification Test (GDIT) is a recently developed self-report measure. The GDIT includes items with multiple response options based on either frequency or time, and item response theory evaluations of these options could yield vital knowledge about the measure's performance. METHODS The GDIT was evaluated using Rasch analysis in a study involving 597 Swedish gamblers. RESULTS In a three-dimensional Rasch model, item response difficulty ranged from -1.88 to 4.06 and increased with higher time- and frequency-based responses. Differential item functioning showed that some GDIT items displayed age- and gender-related differences. Additionally, person-separation reliability indicated that the GDIT can reliably be divided into three to four diagnostic levels. CONCLUSIONS The frequency- and time-based item response options of the GDIT offer excellent measurement, allowing for detailed assessment across both lower and higher gambling severity. The GDIT can be used to detect DSM-5 Gambling Disorder and is therefore significant from both epidemiological and clinical standpoints. Notably, the 3-item GDIT Gambling Behavior subscale also shows potential as a brief screening tool for identifying at-risk gambling behavior.
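For orientation, here is a standard statement of the dichotomous Rasch model, together with the conventional strata formula behind statements like "three to four diagnostic levels." The GDIT's polytomous response options would use a partial-credit extension; this notation is textbook convention, not taken from the article itself.

```latex
% Dichotomous Rasch model: probability that person v endorses item i,
% given person ability \theta_v and item difficulty b_i.
P(X_{vi} = 1 \mid \theta_v) = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)}

% Conventional number of distinct person strata supported by the
% person-separation index G (Wright's formula).
H = \frac{4G + 1}{3}
```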
Affiliation(s)
- Olof Molander
- Department of Clinical Neuroscience, Centre for Psychiatry Research, Karolinska Institutet, Solna, Sweden
- Peter Wennberg
- Department of Public Health Sciences, Stockholm University, Stockholm, Sweden
- Department of Global Public Health, Karolinska Institutet, Solna, Sweden
- Department of Psychology, Inland Norway University of Applied Sciences, Lillehammer, Norway
- Nicki A Dowling
- School of Psychology, Deakin University, Geelong, Victoria, Australia
- Anne H Berman
- Department of Clinical Neuroscience, Centre for Psychiatry Research, Karolinska Institutet, Solna, Sweden
- Department of Psychology, Uppsala University, Uppsala, Sweden
2
Holm E, Hansen PB, Romøren ASH, Garmann NG. The Norwegian CDI-III as an assessment tool for lexical and grammatical development in preschoolers. Front Psychol 2023; 14:1175658. [PMID: 37560104] [PMCID: PMC10408306] [DOI: 10.3389/fpsyg.2023.1175658]
Abstract
Parental report instruments are a non-invasive way to assess children's language development and have proved to yield valid and reliable results when used with children under the age of 2;6 (and in some cases up to 3;0). In this study, we examine the newly developed Norwegian edition of a language assessment tool for older preschoolers, the MacArthur-Bates Communicative Development Inventory III (CDI-III), investigating whether this parental report tool can be used to assess the language of monolingual Norwegian-speaking children between 2;6 and 4;0. NCDI-III results for 100 children between 2;6 and 4;0 are presented. All sections were significantly intercorrelated, and all sections except Pronunciation showed growth with age. Internal consistency was measured both in terms of Cronbach's alpha and corrected item-scale correlation, and the results are discussed in light of the distribution of item difficulties. Methodological considerations are discussed, as well as implications relevant both to possible later revisions and to CDI-III adaptations to new languages.
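For reference, the two internal-consistency statistics named above have the following standard definitions (textbook formulas, not specific to this study), for a k-item section with item scores X_i, item variances σ_i², and total-score variance σ_X²:

```latex
\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right),
\qquad
r_{i,\mathrm{corrected}} = \mathrm{corr}\Big(X_i,\; \sum_{j \ne i} X_j\Big)
```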
Affiliation(s)
- Elisabeth Holm
- Department of Early Childhood Education, Oslo Metropolitan University, Oslo, Norway
- Pernille Bonnevie Hansen
- Department of Scandinavian Languages and Literature, Inland Norway University of Applied Sciences, Hamar, Norway
- Anna Sara H. Romøren
- Department of Early Childhood Education, Oslo Metropolitan University, Oslo, Norway
- Nina Gram Garmann
- Department of Early Childhood Education, Oslo Metropolitan University, Oslo, Norway
3
Zhou Y, Jia N. The Impact of Item Difficulty on Judgment of Confidence: A Cross-Level Moderated Mediation Model. J Intell 2023; 11:113. [PMID: 37367515] [DOI: 10.3390/jintelligence11060113]
Abstract
The factors that influence metacognitive judgments often appear in combination rather than in isolation. The multi-cue utilization model proposes that individuals make use of multiple cues when forming judgments. Previous studies have focused on the integration of intrinsic and extrinsic cues, while the current investigation examines the integration and influence of intrinsic cues and mnemonic cues. Judgment of confidence is a common form of metacognitive judgment. In this study, 37 college students completed Raven's Progressive Matrices and made judgments of confidence. We used a cross-level moderated mediation model to explore the impact of item difficulty on confidence judgments. Our results indicated that item difficulty negatively predicts the level of confidence, and that it does so by altering processing fluency, which acts as a mediating variable: confidence judgments are shaped jointly by an intrinsic cue (item difficulty) and a mnemonic cue (processing fluency). Additionally, we found that intelligence moderates the effect of difficulty on processing fluency across levels. Specifically, individuals with higher intelligence exhibited lower fluency on difficult tasks and higher fluency on simple tasks than individuals with lower intelligence. These findings extend the multi-cue utilization model and integrate the mechanisms by which intrinsic and mnemonic cues influence confidence judgments. Finally, we propose and verify a cross-level moderated mediation model that explains how item difficulty affects confidence judgments.
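The following is a schematic of the 1-1-1 cross-level moderated mediation design the abstract describes, with item difficulty X_ij and processing fluency M_ij at the item (within-person) level, intelligence W_j at the person level, and confidence Y_ij as the outcome. The coefficient labels are illustrative, not the authors' own notation:

```latex
% Mediator model: intelligence moderates the difficulty -> fluency path
M_{ij} = a_0 + a_1 X_{ij} + a_2 W_j + a_3 (X_{ij} W_j) + u_j + e_{ij}
% Outcome model: fluency carries the indirect effect on confidence
Y_{ij} = b_0 + c' X_{ij} + b_1 M_{ij} + v_j + \epsilon_{ij}
% Conditional indirect effect of difficulty at intelligence level W
\omega(W) = (a_1 + a_3 W)\, b_1
```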
Affiliation(s)
- Yuke Zhou
- College of Education, Hebei Normal University, Shijiazhuang 050024, China
- Ning Jia
- College of Education, Hebei Normal University, Shijiazhuang 050024, China
4
Lee Meeuw Kjoe PR, Vermeulen IE, Agelink van Rentergem JA, van der Wall E, Schagen S. Standardized item selection for alternate computerized versions of Rey Auditory Verbal Learning Test(-based) word lists. J Clin Exp Neuropsychol 2022; 44:681-701. [PMID: 36660813] [DOI: 10.1080/13803395.2023.2166904]
Abstract
INTRODUCTION Despite an increasing need for new Rey Auditory Verbal Learning Test (RAVLT)-based word lists in computerized testing, no criteria or standardized procedures exist for their development. To lay a foundation for the future development of new and alternate computerized RAVLT(-based) word lists, we present cross-lingual word selection criteria, develop new lists using the criteria, and evaluate performance on the lists using online assessment. METHOD Based on the psycholinguistic literature, we identified relevant word selection criteria. To validate the criteria, we developed two new American-English word lists and one new Dutch list, and administered the RAVLT using visual presentation of the new or original list in online American (n = 248) and Dutch (n = 246) samples of healthy people. We compared performance on the new and original word lists in terms of trial scores and serial position effects using Bayesian correlations and analyses of variance. Additionally, we compared proportions of correct responses per item, corrected for serial position. RESULTS We identified 13 relevant word selection criteria. The criteria led to two new, highly comparable American-English word lists with lower trial scores than the original American-English list, indicating that the criteria helped to develop parallel lists with fewer associations between items. The new Dutch word list showed trial scores, serial position effects, and proportions of correct responses per item (corrected for serial position) similar to the original Dutch version. CONCLUSIONS The systematic use of word selection criteria can facilitate the development of new parallel word lists, including in new language areas. Future studies should evaluate the use of the word criteria for the other sections of the RAVLT (such as delayed recall and recognition), performance using the original test modalities (auditory presentation and spoken recall), as well as performance in clinical samples.
Affiliation(s)
- Philippe R Lee Meeuw Kjoe
- Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Ivar E Vermeulen
- Department of Communication Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Joost A Agelink van Rentergem
- Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Elsken van der Wall
- Department of Medical Oncology, University Medical Center Utrecht, Utrecht, The Netherlands
- Sanne Schagen
- Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
5
Gaudet J, Thibeault C, Betts L, Mastrilli P, Saeed D, Ilyin N. Supporting Canadian Nursing Students to Write the NCLEX-RN Exam: A Three-Phased Mixed Methods Descriptive Design. Can J Nurs Res 2022; 54:331-344. [PMID: 35658610] [DOI: 10.1177/08445621221103933]
Abstract
BACKGROUND In 2015, the College of Nurses of Ontario replaced the Canadian Registered Nurse Examination with the NCLEX-RN exam as the entry-to-practice examination. Faculty in a college-university partnership searched for products to provide nursing students with focused practice in writing exams modelled on the Canadian NCLEX-RN test plan. PURPOSE The aim of this three-phased evaluation study was to test and validate NCLEX-RN exam preparation materials newly developed for the Canadian context. METHODS A mixed methods descriptive design was used to capture subjective perspectives and objective measures. After ethical approval was obtained, 13 students assessed the e-learning platform's usability. Eight faculty/clinical experts assessed the content validity of the materials using a content validity index (CVI) at both the item (I-CVI) and scale (S-CVI) levels. Lastly, 72 completed tests served as the basis for assessing the psychometric properties of selected test items. RESULTS Materials were assessed as useful and easy to use and navigate. I-CVIs ranged from 0.5 to 1.0, while S-CVIs exceeded the 0.8 standard for acceptability, with none falling below 0.9. Overall test reliability, measured by the Kuder-Richardson formula, was 0.73. Most items assessed for difficulty (64%) showed a proportion of correct responses within desired ranges, and most point-biserial indices ranged from fair to very good. CONCLUSION Strong evidence supported the usability and content validity of the materials assessed. Item difficulty and discrimination analyses were within acceptable ranges. Suggestions for improvements were offered. Predictive analysis should form the basis of future research in this area.
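The indices reported above have conventional definitions, included here for reference (these are the standard formulas, not reproduced from the article); p_i is the proportion of correct responses to item i and σ_X² the total-score variance:

```latex
\text{I-CVI} = \frac{n_{\text{experts rating the item relevant}}}{n_{\text{experts}}},
\qquad
\text{S-CVI/Ave} = \frac{1}{k} \sum_{i=1}^{k} \text{I-CVI}_i,
\qquad
\text{KR-20} = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} p_i (1 - p_i)}{\sigma_X^2}\right)
```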
Affiliation(s)
- Julie Gaudet
- Sally Horsfall Eaton School of Nursing, George Brown College, Toronto, Canada
- Catherine Thibeault
- Trent/Fleming School of Nursing, Trent University, Peterborough, Canada
- Lorraine Betts
- Sally Horsfall Eaton School of Nursing, George Brown College, Toronto, Canada
- Paula Mastrilli
- Sally Horsfall Eaton School of Nursing, George Brown College, Toronto, Canada
- Dalia Saeed
- Trent/Fleming School of Nursing, Trent University, Peterborough, Canada
- Nicole Ilyin
- Sally Horsfall Eaton School of Nursing, George Brown College, Toronto, Canada
6
Abstract
A model for multiple-choice exams is developed from a signal-detection perspective. A correct alternative in a multiple-choice exam can be viewed as a signal embedded in noise (the incorrect alternatives). Examinees are assumed to have perceptions of the plausibility of each alternative, and the decision process is to choose the most plausible alternative. It is also assumed that each examinee either knows or does not know each item. Together, these assumptions lead to a signal detection choice model for multiple-choice exams. Statistically, the model can be viewed as a mixture extension, with random mixing, of the traditional choice model, or similarly as a grade-of-membership extension. A version of the model with extreme value distributions is developed, in which case the model simplifies to a mixture multinomial logit model with random mixing. The approach is shown to offer measures of item discrimination and difficulty, along with information about the relative plausibility of each of the alternatives. The model, its parameters, and measures derived from the parameters are compared to those obtained with several commonly used item response theory models. An application of the model to an educational data set is presented.
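A sketch of the mixture structure the abstract describes, in generic notation (the article's own parameterization may differ): with probability k the examinee knows item i and selects the keyed alternative; otherwise the choice follows a multinomial logit over the perceived plausibilities β of the m alternatives, which is what the extreme-value assumption yields.

```latex
P(\text{choose } j \mid \text{item } i)
  = k_i \,\mathbf{1}\{\, j = \mathrm{key}_i \,\}
  + (1 - k_i)\,\frac{\exp(\beta_{ij})}{\sum_{l=1}^{m} \exp(\beta_{il})}
```

"Random mixing" in the abstract corresponds to letting the knowledge probability vary across examinees rather than treating it as fixed.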
Affiliation(s)
- Lawrence T. DeCarlo
- Teachers College, Columbia University, New York, NY, USA
- Correspondence: Lawrence T. DeCarlo, Department of Human Development, Teachers College, Columbia University, 525 West 120th Street, Box 118, New York, NY 10027-6696, USA.
7
Abstract
A true-false exam can be viewed as a signal detection task: the task is to detect whether an item is true (signal) or false (noise). In terms of signal detection theory (SDT), examinees can be viewed as performing the task by comparing the perceived plausibility of an item (a perceptual component) to a threshold that delineates true from false (a decision component). The resulting model is distinct from, yet related to, item response theory (IRT) models and grade-of-membership models, with the difference that SDT explicitly recognizes the role of examinees' perceptions in determining their response to an item. SDT also views IRT concepts such as "difficulty" and "guessing" in a different light, in that both are viewed as reflecting the same aspect: item bias. An application to a true-false algebra exam is presented and the various models are compared.
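In generic SDT notation (again, not necessarily the article's exact parameterization), one common convention puts the plausibility of false items at N(0, 1) and of true items at N(d, 1), with a "true" response whenever perceived plausibility exceeds a threshold c:

```latex
P(\text{``true''} \mid \text{true item}) = \Phi(d - c),
\qquad
P(\text{``true''} \mid \text{false item}) = \Phi(-c)
```

Here Φ is the standard normal CDF, d indexes how separable true and false items are, and c captures the response bias that, on this view, underlies IRT "difficulty" and "guessing."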
Affiliation(s)
- Lawrence T. DeCarlo
- Columbia University, New York, NY, USA
- Correspondence: Lawrence T. DeCarlo, Department of Human Development, Teachers College, Columbia University, Box 118, 525 West 120th Street, New York, NY 10027-6696, USA.
8
Applegate GM, Sutherland KA, Becker KA, Luo X. The Effect of Option Homogeneity in Multiple-Choice Items. Appl Psychol Meas 2019; 43:113-124. [PMID: 30792559] [PMCID: PMC6376538] [DOI: 10.1177/0146621618770803]
Abstract
Previous research has found that option homogeneity in multiple-choice items affects item difficulty when items with homogeneous options are compared to the same items with heterogeneous options. This study conducted an empirical test of the effect of option homogeneity in multiple-choice items on a professional licensure examination to determine the predictability and magnitude of the change. Similarity of options to the key was determined using subject matter experts and a natural language processing algorithm. Contrary to the earlier research, data analysis revealed no consistent effect of the absence or presence of option homogeneity on item difficulty, discrimination, fit to the measurement model, or response time. While the results are negative, they call into question established guidelines in item development. A hypothesis is proposed to explain why this effect is found in some studies but not others.
Affiliation(s)
- Xiao Luo
- National Council of State Boards of Nursing, Chicago, IL, USA
9
Abstract
In confirmatory factor analysis, quite similar measurement models serve to detect the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among responses to successively presented test items, whereas the difficulty factor is ascribed to a wide range of item difficulties. The similarity of the measurement models hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of item difficulties, statistical ex post manipulation of the difficulties should enable the two types of factors to be discriminated. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were examined. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the range of item difficulties was reduced to zero to lower the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor, now identified as the item-position factor, was observed in virtually all simulated datasets.
10
Kirschstein T, Wolters A, Lenz JH, Fröhlich S, Hakenberg O, Kundt G, Darmüntzel M, Hecker M, Altiner A, Müller-Hilke B. An algorithm for calculating exam quality as a basis for performance-based allocation of funds at medical schools. GMS J Med Educ 2016; 33:Doc44. [PMID: 27275509] [PMCID: PMC4894354] [DOI: 10.3205/zma001043]
Abstract
OBJECTIVE The amendment of the Medical Licensing Act (ÄAppO) in Germany in 2002 led to the introduction of graded assessments in the clinical part of medical studies. This, in turn, lent new weight to written tests, even though the minimum requirements for exam quality are sometimes difficult to meet. Introducing exam quality as a criterion for the performance-based allocation of funds is expected to steer the attention of faculty members towards quality and to sustain higher standards. At present, however, there is a lack of suitable algorithms for calculating exam quality. METHODS In the spring of 2014, the students' dean commissioned the "core group" for curricular improvement at the University Medical Center in Rostock to revise the criteria for the allocation of performance-based funds for teaching. In a first approach, we developed an algorithm based on the results of the most common type of exam in medical education, the multiple-choice test. It included item difficulty and discrimination, reliability, and the distribution of grades achieved. RESULTS The algorithm quantitatively describes the quality of multiple-choice exams. It can also be applied to exams involving short-essay questions and the OSCE. It thus allows the quality of exams in the various subjects to be quantified and - in analogy to impact factors and third-party grants - a ranking among faculty to be established. CONCLUSION Our algorithm can be applied to all test formats in which item difficulty, the discriminatory power of the individual items, the reliability of the exam, and the distribution of grades are measured. Even though the content validity of an exam is not considered here, we believe that our algorithm is suitable as a general basis for the performance-based allocation of funds.
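A minimal sketch of the kind of ingredients such an algorithm aggregates: item difficulty, point-biserial discrimination, and KR-20 reliability for a dichotomously scored multiple-choice exam. The composite weighting at the end is purely hypothetical (the abstract does not publish the authors' exact formula), and the grade-distribution component is omitted.

```python
import numpy as np

def exam_quality(scores: np.ndarray) -> dict:
    """Illustrative quality metrics for a 0/1-scored multiple-choice exam.

    scores: (examinees x items) matrix of 0/1 responses.
    """
    n_examinees, k = scores.shape
    totals = scores.sum(axis=1)

    # Item difficulty: proportion of correct responses per item.
    difficulty = scores.mean(axis=0)

    # Discrimination: point-biserial correlation of each item with the
    # rest-score (total score excluding the item itself).
    discrimination = np.array([
        np.corrcoef(scores[:, i], totals - scores[:, i])[0, 1]
        for i in range(k)
    ])

    # KR-20 reliability for dichotomous items.
    item_var = (difficulty * (1.0 - difficulty)).sum()
    kr20 = (k / (k - 1.0)) * (1.0 - item_var / totals.var(ddof=1))

    # Hypothetical composite: fraction of items in a desirable difficulty
    # band, mean discrimination, and reliability, equally weighted.
    # NOT the weighting of Kirschstein et al.
    in_band = ((difficulty >= 0.4) & (difficulty <= 0.85)).mean()
    composite = (in_band + np.nanmean(discrimination) + kr20) / 3.0

    return {
        "difficulty": difficulty,
        "discrimination": discrimination,
        "kr20": kr20,
        "composite": composite,
    }
```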
Affiliation(s)
- Timo Kirschstein
- Universitätsmedizin Rostock, "core group" for the improvement of teaching, Rostock, Germany
- Alexander Wolters
- Universitätsmedizin Rostock, "core group" for the improvement of teaching, Rostock, Germany
- Jan-Hendrik Lenz
- Universitätsmedizin Rostock, "core group" for the improvement of teaching, Rostock, Germany
- Susanne Fröhlich
- Universitätsmedizin Rostock, "core group" for the improvement of teaching, Rostock, Germany
- Oliver Hakenberg
- Universitätsmedizin Rostock, "core group" for the improvement of teaching, Rostock, Germany
- Günther Kundt
- Universitätsmedizin Rostock, Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock, Germany
- Michael Hecker
- Universitätsmedizin Rostock, Department of Neurology (Klinik und Poliklinik für Neurologie), Zentrum für Nervenheilkunde, Rostock, Germany
- Attila Altiner
- Universitätsmedizin Rostock, Dean of Studies Office (Studiendekanat), Rostock, Germany
11
Matlock KL, Turner R. Unidimensional IRT Item Parameter Estimates Across Equivalent Test Forms With Confounding Specifications Within Dimensions. Educ Psychol Meas 2016; 76:258-279. [PMID: 29795865] [PMCID: PMC5965585] [DOI: 10.1177/0013164415589756]
Abstract
When constructing multiple test forms, the number of items and the total test difficulty are often held equivalent across forms. However, not all test developers match the number of items and/or the average item difficulty within subcontent areas. In this simulation study, six test forms were constructed with an equal number of items and equal average item difficulty overall. The manipulated variables were the number of items and the average item difficulty within subsets of items primarily measuring one of two dimensions. Data sets were simulated at four levels of correlation between the dimensions (0, .3, .6, and .9). Item parameters were estimated using the Rasch and two-parameter logistic (2PL) unidimensional item response theory models. Estimated discrimination and difficulty were compared across forms and to the true item parameters. The average unidimensional estimated discrimination was consistent across forms having the same correlation. Forms having a larger set of easy items measuring one dimension were estimated as more difficult than forms having a larger set of hard items. Estimates were also investigated within subsets of items, and measures of bias were reported. This study encourages test developers to maintain consistent test specifications not only across forms as a whole but also within subcontent areas.
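For reference, the two unidimensional models compared in the study, in standard notation with person ability θ_j, item discrimination a_i, and item difficulty b_i; the Rasch model is the special case with all discriminations fixed at 1:

```latex
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\!\big(-a_i(\theta_j - b_i)\big)},
\qquad \text{Rasch: } a_i = 1 \ \forall i
```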