1. Akhtar H, Kovacs K. Measuring Process Factors of Fluid Reasoning Using Multidimensional Computerized Adaptive Testing. Assessment 2024:10731911241236351. PMID: 38491853. DOI: 10.1177/10731911241236351.
Abstract
Although many fluid reasoning (Gf) tests have been developed, figural tests that measure its lower-order process factors simultaneously are lacking. The present article introduces the development of the Multidimensional Induction-Deduction Computerized Adaptive Test (MID-CAT), which measures two process factors of Gf. The MID-CAT is designed to provide an instrument that is flexible, efficient, and entirely free for non-commercial use. We created 530 items and administered them to a sample of N = 2,247. Items were fitted and calibrated using the Rasch model. The results indicate that the final item pool has a wide range of difficulties and can precisely measure a wide range of test-takers' abilities. A simulation study also indicates that the MID-CAT provides greater measurement efficiency than separate unidimensional CATs or a fixed-item test. In the discussion, we provide perspectives on how the MID-CAT can be used in future research.
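The Rasch calibration and adaptive administration described above can be sketched as follows. This is an illustrative toy example of generic CAT machinery (the item difficulties and the maximum-information selection rule are assumptions, not the MID-CAT's actual implementation):

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) under the Rasch model: 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item: I = P * (1 - P)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def select_item(theta_hat, difficulties, administered):
    """Maximum-information selection: pick the unadministered item
    that is most informative at the current ability estimate."""
    info = item_information(theta_hat, difficulties)
    info[list(administered)] = -np.inf  # exclude already-seen items
    return int(np.argmax(info))

# Hypothetical five-item bank; with item 2 already administered,
# the next pick is the remaining item closest in difficulty to theta.
difficulties = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
next_item = select_item(0.1, difficulties, {2})  # -> 3
```

For a Rasch item, information peaks where difficulty matches ability, which is why the selection rule reduces to picking the nearest remaining difficulty.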
Affiliation(s)
- Hanif Akhtar
- ELTE Eötvös Loránd University, Budapest, Hungary
- University of Muhammadiyah Malang, Indonesia
2. Heltne A, Braeken J, Hummelen B, Germans Selvik S, Buer Christensen T, Paap MCS. Do Flexible Administration Procedures Promote Individualized Clinical Assessments? An Explorative Analysis of How Clinicians Utilize the Funnel Structure of the SCID-5-AMPD Module I: LPFS. J Pers Assess 2023; 105:636-646. PMID: 36511879. DOI: 10.1080/00223891.2022.2152344.
Abstract
The current study examined clinicians' utilization of the SCID-5-AMPD-I funnel structure. Across 237 interviews conducted as part of the NorAMP study, we found that clinicians administered on average 2-3 adjacent levels under each subdomain, effectively administering only about 50% of available items. No two interviews contained exactly the same set of administered items, and when pairs of interviews were compared, on average only about half of the items administered in one interview were also administered in the other. Cross-classified mixed effects models were estimated to examine the factors affecting item administration. Results indicated that the interplay between patients' preliminary scores and item level had a substantial impact on item administration, suggesting that clinicians tend to administer items corresponding to expected patient severity. Overall, our findings suggest clinicians utilize the SCID-5-AMPD-I funnel structure to conduct efficient and individually tailored assessments informed by relevant patient characteristics. Adopting similar non-fixed administration procedures for other interviews could provide similar benefits over traditional fixed-form administration. The current study can serve as a template for verifying and evaluating future adoptions of non-fixed administration procedures in other interviews.
Affiliation(s)
- Aleksander Heltne
- Department of Research and Innovation, Clinic for Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Johan Braeken
- Department of Research and Innovation, Clinic for Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Centre for Educational Measurement, University of Oslo (CEMO), Oslo, Norway
- Benjamin Hummelen
- Department of Research and Innovation, Clinic for Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Sara Germans Selvik
- Department of Psychiatry, Helse Nord-Trøndelag, Namsos Hospital, Namsos, Norway
- Department of Mental Health, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Muirne C S Paap
- Department of Research and Innovation, Clinic for Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Department of Child and Family Welfare, Faculty of Behavioural and Social Sciences, University of Groningen, Groningen, The Netherlands
3. Giordano A, Testa S, Bassi M, Cilia S, Bertolotto A, Quartuccio ME, Pietrolongo E, Falautano M, Grobberio M, Niccolai C, Allegri B, Viterbo RG, Confalonieri P, Giovannetti AM, Cocco E, Grasso MG, Lugaresi A, Ferriani E, Nocentini U, Zaffaroni M, De Livera A, Jelinek G, Solari A, Rosato R. Applying multidimensional computerized adaptive testing to the MSQOL-54: a simulation study. Health Qual Life Outcomes 2023; 21:61. PMID: 37357308. DOI: 10.1186/s12955-023-02152-8.
Abstract
BACKGROUND The Multiple Sclerosis Quality of Life-54 (MSQOL-54) is one of the most commonly used MS-specific health-related quality of life (HRQOL) measures. It is a multidimensional, MS-specific HRQOL inventory, which includes the generic SF-36 core items, supplemented with 18 MS-targeted items. Availability of an adaptive short version providing immediate item scoring may improve instrument usability and validity. However, multidimensional computerized adaptive testing (MCAT) has not previously been applied to MSQOL-54 items. We thus aimed to apply MCAT to the MSQOL-54 and assess its performance. METHODS Responses from a large international sample of 3669 MS patients were assessed. We calibrated 52 of the 54 items using a bifactor graded response model (10 group factors and one general HRQOL factor). Then, eight simulations were run with different termination criteria: standard errors (SE) for the general factor and group factors set to different values, and change in factor estimates from one item to the next set at < 0.01 for both the general and the group factors. Performance of the MCAT was assessed by the number of administered items, root mean square difference (RMSD), and correlation. RESULTS Eight items were removed due to local dependency. The simulation with SE set to 0.32 for the general factor and no SE thresholds for the group factors provided satisfactory performance: the median number of administered items was 24, RMSD was 0.32, and correlation was 0.94. CONCLUSIONS Compared to the full-length MSQOL-54, the simulated MCAT required fewer items without losing precision for the general HRQOL factor. Further work is needed to add, integrate, or revise MSQOL-54 items so that calibration and MCAT performance are also efficient for the group factors, allowing the MCAT version to be used in clinical practice and research.
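The two performance metrics reported here, RMSD and correlation between the true (simulated) and estimated factor scores, are straightforward to compute. A minimal sketch, not tied to the authors' code:

```python
import numpy as np

def mcat_performance(theta_true, theta_hat):
    """Simulation performance metrics for a (M)CAT: root mean square
    difference (RMSD) and Pearson correlation between true (simulated)
    and CAT-estimated factor scores."""
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    rmsd = np.sqrt(np.mean((theta_hat - theta_true) ** 2))
    corr = np.corrcoef(theta_true, theta_hat)[0, 1]
    return rmsd, corr
```

Note that RMSD and correlation capture different failures: a constant bias inflates RMSD while leaving the correlation at 1, so both are worth reporting, as the study does.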
Affiliation(s)
- Andrea Giordano
- Unit of Neuroepidemiology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Via Celoria 11, Milan, 20133, Italy
- Department of Psychology, University of Turin, Turin, Italy
- Silvia Testa
- Department of Human and Social Sciences, University of Aosta Valley, Aosta, Italy
- Marta Bassi
- Department of Biomedical and Clinical Sciences, Università di Milano, Milan, Italy
- Sabina Cilia
- Department of Territorial Activities, Azienda Sanitaria Provinciale, Health District, Catania, Italy
- Antonio Bertolotto
- Neurology Unit & Regional Referral Multiple Sclerosis Centre (CReSM), University Hospital San Luigi Gonzaga, Orbassano, Italy
- Erika Pietrolongo
- Department of Neurosciences, Imaging and Clinical Sciences, University G. d'Annunzio, Chieti, Italy
- Monica Falautano
- Psychological Service - Neurological and Neurological Rehabilitation Units, IRCCS San Raffaele, Milan, Italy
- Monica Grobberio
- Laboratory of Clinical Neuropsychology, Psychology Unit, ASST Lariana, Como, Italy
- Beatrice Allegri
- Multiple Sclerosis Center, Neurology Unit, Hospital of Vaio, Fidenza, Italy
- Paolo Confalonieri
- Multiple Sclerosis Center, Unit of Neuroimmunology and Neuromuscular Diseases, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Ambra Mara Giovannetti
- Unit of Neuroepidemiology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Via Celoria 11, Milan, 20133, Italy
- Multiple Sclerosis Center, Unit of Neuroimmunology and Neuromuscular Diseases, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Eleonora Cocco
- Department of Medical Science and Public Health, University of Cagliari, Cagliari, Italy
- Multiple Sclerosis Center, ASL Cagliari, ATS Sardegna, Cagliari, Italy
- Alessandra Lugaresi
- Dipartimento di Scienze Biomediche e Neuromotorie, Università di Bologna, Bologna, Italy
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy
- Elisa Ferriani
- UOC Psicologia Ospedaliera, AUSL di Bologna, Bologna, Italy
- Ugo Nocentini
- Department of Clinical Sciences and Translational Medicine, University of Rome "Tor Vergata", Rome, Italy
- Behavioral Neuropsychology Laboratory, IRCCS S. Lucia Foundation, Rome, Italy
- Mauro Zaffaroni
- Neurologia ad indirizzo Neuroimmunologico - Centro Sclerosi Multipla, Ospedale di Gallarate - ASST della Valle Olona, Gallarate, Italy
- Alysha De Livera
- Mathematics and Statistics, La Trobe University, Melbourne, Australia
- Neuroepidemiology Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia
- George Jelinek
- Neuroepidemiology Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia
- Alessandra Solari
- Unit of Neuroepidemiology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Via Celoria 11, Milan, 20133, Italy
- Rosalba Rosato
- Department of Psychology, University of Turin, Turin, Italy
4. Frans N, Braeken J, Veldkamp BP, Paap MCS. Empirical Priors in Polytomous Computerized Adaptive Tests: Risks and Rewards in Clinical Settings. Appl Psychol Meas 2023; 47:48-63. PMID: 36425285. PMCID: PMC9679926. DOI: 10.1177/01466216221124091.
Abstract
The use of empirical prior information about participants has been shown to substantially improve the efficiency of computerized adaptive tests (CATs) in educational settings. However, it is unclear how these results translate to clinical settings, where small item banks with highly informative polytomous items often lead to very short CATs. We explored the risks and rewards of using prior information in CAT in two simulation studies, rooted in applied clinical examples. In the first simulation, prior precision and bias in the prior location were manipulated independently. Our results show that a precise personalized prior can meaningfully increase CAT efficiency. However, this reward comes with the potential risk of overconfidence in wrong empirical information (i.e., using a precise, severely biased prior), which can lead to unnecessarily long tests or severely biased estimates. The latter risk can be mitigated by setting a minimum number of items to be administered during the CAT, or by setting a less precise prior, albeit at the expense of canceling out any efficiency gains. The second simulation, with more realistic bias and precision combinations in the empirical prior, places the prevalence of the potential risks in context. With similar estimation bias, an empirical prior reduced CAT test length, compared to a standard normal prior, in 68% of cases, by a median of 20%; test length increased in only 3% of cases. The use of prior information in CAT seems to be a feasible and simple method to reduce test burden for patients and clinical practitioners alike.
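The mechanism under study, where a more precise prior buys estimation precision but a biased prior pulls the estimate, can be illustrated with a grid-based EAP estimator for dichotomous Rasch items. This is a deliberate simplification of the polytomous clinical setting in the paper, and all parameter values are illustrative:

```python
import numpy as np

def eap_estimate(responses, difficulties, prior_mean=0.0, prior_sd=1.0):
    """Grid-based EAP ability estimate for dichotomous Rasch responses
    with a normal prior. A tighter prior_sd encodes stronger empirical
    prior information and yields a smaller posterior SE; a wrong
    prior_mean pulls the estimate toward it."""
    grid = np.linspace(-4.0, 4.0, 161)
    # Unnormalized normal prior density over the grid.
    prior = np.exp(-0.5 * ((grid - prior_mean) / prior_sd) ** 2)
    like = np.ones_like(grid)
    for u, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + np.exp(-(grid - b)))
        like = like * (p if u == 1 else 1.0 - p)
    post = prior * like
    post = post / post.sum()
    eap = float(np.sum(grid * post))
    se = float(np.sqrt(np.sum((grid - eap) ** 2 * post)))
    return eap, se

# Same three responses, different prior precision:
_, se_vague = eap_estimate([1, 0, 1], [-1.0, 0.0, 1.0], prior_sd=1.0)
_, se_tight = eap_estimate([1, 0, 1], [-1.0, 0.0, 1.0], prior_sd=0.3)
```

With only a handful of informative items, the prior dominates: `se_tight` comes out well below `se_vague`, which is exactly the efficiency gain, and the risk, that the simulations quantify.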
Affiliation(s)
- Niek Frans
- Department of Research and Innovation, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- The Nieuwenhuis Institute for Educational Research, Faculty of Behavioural and Social Sciences, University of Groningen, Groningen, The Netherlands
- Johan Braeken
- Department of Research and Innovation, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Centre for Educational Measurement at the University of Oslo (CEMO), Faculty of Educational Sciences, University of Oslo, Oslo, Norway
- Bernard P. Veldkamp
- Department of Research Methodology, Measurement and Data Analysis, Faculty of Behavioural, Management and Social Sciences, University of Twente, Enschede, The Netherlands
- Muirne C. S. Paap
- Department of Research and Innovation, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- The Nieuwenhuis Institute for Educational Research, Faculty of Behavioural and Social Sciences, University of Groningen, Groningen, The Netherlands
5. Şimşek AS, Tavşancıl E. Applicability and Efficiency of a Polytomous IRT-Based Computerized Adaptive Test for Measuring Psychological Traits. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi 2022. DOI: 10.21031/epod.1148313.
Abstract
Currently, research on computerized adaptive testing (CAT) focuses mainly on dichotomous items and cognitive traits (achievement, aptitude, etc.). However, polytomous IRT-based CAT is a promising research area for measuring psychological traits and has attracted much attention. The main purpose of this study is to test the practicality of polytomous IRT-based CAT and its equivalence with the paper-pencil version. Data were collected from 1449 high school students (45% female) using the paper-pencil version, for IRT parameter estimation and CAT simulation studies. For the equivalence study, the research group consisted of 81 students (47% female) who participated in both the paper-pencil and live CAT applications. The paper-pencil version of the vocational interest inventory consists of 17 factors and 164 items. The simulation study showed that the EAP estimation method and the SE < .500 test termination strategy were superior to the other CAT designs. Item selection did not help to reduce test duration or increase measurement accuracy. As a result, it was found that an area of interest can be assessed with about four items. The results of the live CAT application showed that the CAT estimates were strongly positively correlated with the paper-pencil version. In addition, the live CAT application increased usability compared to the fixed-length test version by reducing test length by 50% and time by 77%. This study shows that polytomous IRT-based CAT is applicable and efficient for measuring psychological traits.
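The polytomous IRT model typically underlying such CATs is Samejima's graded response model, in which cumulative boundary curves are differenced to give category probabilities. A minimal sketch (the discrimination and threshold values are illustrative, not the inventory's calibrated parameters):

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Category response probabilities under Samejima's graded response
    model: P*(k) = logistic(a * (theta - b_k)) for ordered thresholds
    b_1 < ... < b_K, and P(category k) = P*(k) - P*(k+1)."""
    b = np.asarray(thresholds, dtype=float)
    pstar = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with the trivial boundaries P*(0) = 1 and P*(K+1) = 0,
    # then difference adjacent boundaries.
    pstar = np.concatenate(([1.0], pstar, [0.0]))
    return pstar[:-1] - pstar[1:]

# A 4-category item (3 thresholds) evaluated at theta = 0.5:
probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

Because each category curve is a difference of two logistics, the probabilities are nonnegative and sum to one for any ordered thresholds, which is what makes the model convenient for CAT item information calculations.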
Affiliation(s)
- Ezel Tavşancıl
- Ankara University, Faculty of Educational Sciences, Department of Measurement and Evaluation, Division of Measurement and Evaluation in Education
6. Wang C, Weiss DJ, Su S, Suen KY, Basford J, Cheville AL. Multidimensional Computerized Adaptive Testing: A Potential Path Toward the Efficient and Precise Assessment of Applied Cognition, Daily Activity, and Mobility for Hospitalized Patients. Arch Phys Med Rehabil 2022; 103:S3-S14. PMID: 35090886. PMCID: PMC9064883. DOI: 10.1016/j.apmr.2022.01.002.
Abstract
OBJECTIVE To develop and evaluate an efficient and precise variable-length functional assessment of applied cognition, daily activity, and mobility to inform mobility preservation and rehabilitation service delivery among hospitalized patients. DESIGN A multidimensional item bank tapping into these dimensions was developed, with all items calibrated using a multidimensional graded response model. The items were adaptively selected from the item banks to maximize the test information, and the test ended when a joint stopping rule was satisfied. A simulation study was conducted based on the completed instrument, the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT), to compare its measurement precision and efficiency capabilities relative to conventional unidimensional computerized adaptive testing. Precision was measured by the bias and root mean squared error between the estimated and true (ie, simulated) θ estimates, whereas efficiency was measured by average test length. Data were collected by an interviewer reading questions from a tablet computer and entering patients' responses. SETTING A large Midwestern hospital. PARTICIPANTS A total of 4143 patients hospitalized with medical diagnosis and/or surgical complications, with 2060 in the calibration sample and 2083 in the validation cohort. INTERVENTION Not applicable. RESULTS Among the 2083 patients in the validation sample, FAMCAT administration required an average of 6 (SD=3.11) minutes. Ninety-six percent had their tests terminated by the standard error rule after responding to an average of 22.05 (SD=7.98) items, whereas 15 were terminated by the change in θ rule, with an average test length of 45.27 (SD=11.49). The remaining 76 responded until reaching the maximum test length of 60 items. 
CONCLUSIONS The FAMCAT has the potential to satisfy the need for structured, frequent, and precise assessment of functional domains among hospitalized patients with medical diagnosis and/or surgical complications. The results are promising and may be informative for others who wish to develop similar instruments when concurrent assessment of correlated domains is required.
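The variable-length termination logic described above, a standard-error rule, a change-in-θ rule, and a maximum test length, amounts to a simple disjunctive check after each item. The cutoff values below are illustrative defaults, not the FAMCAT's exact settings:

```python
def should_stop(se, theta_change, n_items,
                se_cut=0.32, change_cut=0.01, max_items=60):
    """Joint stopping rule for a variable-length CAT: stop once the
    standard error is small enough, the trait estimate has stabilized
    from one item to the next, or the maximum test length is reached."""
    return (se <= se_cut
            or abs(theta_change) <= change_cut
            or n_items >= max_items)
```

The reported breakdown (most tests ending on the SE rule, a few on the change-in-θ rule, and the rest at the item cap) corresponds to which of the three disjuncts fires first.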
Affiliation(s)
- Chun Wang
- College of Education, University of Washington, Seattle, WA
- David J Weiss
- Department of Psychology, University of Minnesota, Minneapolis, MN
- Shiyang Su
- Department of Psychology, University of Central Florida, Orlando, FL
- King Yiu Suen
- Department of Psychology, University of Minnesota, Minneapolis, MN
- Jeffrey Basford
- Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, MN
- Andrea L Cheville
- Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, MN
7. Two-Year Postoperative Validation of Patient-Reported Outcomes Measurement Information System Physical Function After Lumbar Decompression. J Am Acad Orthop Surg 2021; 29:748-757. PMID: 33999869. DOI: 10.5435/jaaos-d-20-01194.
Abstract
INTRODUCTION Physical function evaluated by the Patient-Reported Outcomes Measurement Information System (PROMIS PF) instrument has been validated through the short-term postsurgical period in spine surgery patients. Evidence for long-term efficacy of PROMIS PF is lacking in lumbar decompression (LD) patients. The objective of this study was to evaluate correlations between PROMIS PF and legacy patient-reported outcome measures for patients undergoing LD. METHODS Consecutive primary or revision, single-level or multilevel LD surgeries were retrospectively reviewed from May 2015 to September 2017. Patients lacking preoperative or 2-year PROMIS PF scores were excluded. Demographics, baseline pathology, and perioperative characteristics were collected, and descriptive statistics were performed. Visual Analogue Scale (VAS) back and leg, Oswestry Disability Index (ODI), 12-Item Short Form (SF-12) Physical Composite Score (PCS), and PROMIS PF were collected at preoperative and postoperative timepoints and evaluated for improvement from baseline values. Correlations between PROMIS PF and VAS back, VAS leg, SF-12 PCS, and ODI were calculated and categorized according to the strength of relationship. RESULTS Ninety-two patients were included in this study, with 58.7% of LDs performed at a single level. All patient-reported outcome measures demonstrated significant improvement from baseline values at all assessment timepoints (all P < 0.001). Apart from preoperative VAS back and VAS leg, PROMIS PF demonstrated a significant and strong correlation with VAS back, VAS leg, ODI, and SF-12 PCS at all timepoints (all P < 0.001). CONCLUSION PROMIS PF demonstrated a strong correlation with pain, disability, and PF outcome measures throughout the postoperative period out to 2 years after LD. Our study provides longitudinal evidence that the PROMIS PF instrument is a valid measure of PF for patients undergoing LD.
8. Moore TM, Butler ER, Scott JC, Port AM, Ruparel K, Njokweni LJ, Gur RE, Gur RC. When CAT is not an option: complementary methods of test abbreviation for neurocognitive batteries. Cogn Neuropsychiatry 2021; 26:35-54. PMID: 33308027. PMCID: PMC7855518. DOI: 10.1080/13546805.2020.1859360.
Abstract
INTRODUCTION There is an obvious need for efficient measurement of neuropsychiatric phenomena. A proven method, computerized adaptive testing (CAT), is not feasible for all tests, necessitating alternatives for increasing test efficiency. METHODS We combined and compared two methods for abbreviating rapid tests, using two tests unamenable to CAT (a Continuous Performance Test [CPT] and an n-back test [NBACK]). A sample of N = 9,498 participants (mean age 14.2 years; 52% female) was administered the tests, and abbreviation was accomplished using methods answering two questions: what happens to measurement error as items are removed, and what happens to correlations with validity criteria as items are removed. The first was investigated using quasi-CAT simulation; the second was investigated using bootstrapped confidence intervals around full-form/short-form comparisons. RESULTS Results for the two methods overlapped, suggesting that the CPT could be abbreviated to 57% of its original length and the NBACK to 87% of its original length, with the maximum acceptable loss of precision and minimum acceptable relationships with validity criteria. CONCLUSIONS This method combination shows promise for use in other test types, and the divergent results for the CPT and NBACK demonstrate the methods' ability to detect when a test should not be shortened. The methods should be used in combination because they emphasize complementary measurement qualities: precision and validity.
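The second abbreviation criterion, bootstrapped confidence intervals around full-form/short-form comparisons, can be sketched as a percentile bootstrap of the full-score/short-score correlation. This is an illustrative sketch, not the authors' code, and the acceptance threshold would be chosen per application:

```python
import numpy as np

def bootstrap_corr_ci(full, short, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the correlation between full-form
    and short-form scores; a short form is acceptable while the lower
    bound stays above a pre-set minimum correlation."""
    rng = np.random.default_rng(seed)
    full = np.asarray(full, dtype=float)
    short = np.asarray(short, dtype=float)
    n = len(full)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        stats[i] = np.corrcoef(full[idx], short[idx])[0, 1]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

As items are removed, the point estimate of the correlation typically drops and the interval widens; tracking the lower bound against a minimum acceptable value gives the stopping point for abbreviation.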
Affiliation(s)
- Tyler M. Moore
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA. Correspondence concerning this article should be addressed to Tyler M. Moore, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Office B502, Philadelphia, PA 19104.
- Ellyn R. Butler
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- J. Cobb Scott
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- VISN4 Mental Illness Research, Education, and Clinical Center at the Philadelphia VA Medical Center, Philadelphia, PA, 19104, USA
- Allison M. Port
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Kosha Ruparel
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Lucky J. Njokweni
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Raquel E. Gur
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Ruben C. Gur
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- VISN4 Mental Illness Research, Education, and Clinical Center at the Philadelphia VA Medical Center, Philadelphia, PA, 19104, USA
9. Braeken J, Paap MCS. Making Fixed-Precision Between-Item Multidimensional Computerized Adaptive Tests Even Shorter by Reducing the Asymmetry Between Selection and Stopping Rules. Appl Psychol Meas 2020; 44:531-547. PMID: 34393302. PMCID: PMC7495795. DOI: 10.1177/0146621620932666.
Abstract
Fixed-precision between-item multidimensional computerized adaptive tests (MCATs) are becoming increasingly popular. The current generation of item-selection rules used in these types of MCATs typically optimizes a single-valued objective criterion for multivariate precision (e.g., Fisher information volume). In contrast, when all dimensions are of interest, the stopping rule is typically defined in terms of a required fixed marginal precision per dimension. This asymmetry between multivariate precision for selection and marginal precision for stopping, which is not present in unidimensional computerized adaptive tests, has received little attention thus far. In this article, we discuss this selection-stopping asymmetry and its consequences, and introduce and evaluate three alternative item-selection approaches. These alternatives are computationally inexpensive, easy to communicate and implement, and result in effective fixed-marginal-precision MCATs with shorter test lengths than the current generation of item-selection approaches.
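The asymmetry can be made concrete: a typical selection rule optimizes a single-valued summary of the test information matrix (e.g., its determinant), while the stopping rule checks each dimension's marginal standard error separately. A minimal sketch with an illustrative 2-dimensional information matrix (the numbers are assumptions for demonstration):

```python
import numpy as np

def d_optimality(info_matrix):
    """Single-valued selection criterion: Fisher information volume
    (determinant), as optimized by typical MCAT item-selection rules."""
    return np.linalg.det(np.asarray(info_matrix, dtype=float))

def marginal_ses(info_matrix):
    """Marginal standard errors per dimension, as checked by a
    fixed-marginal-precision stopping rule: square roots of the
    diagonal of the inverse information matrix."""
    inv = np.linalg.inv(np.asarray(info_matrix, dtype=float))
    return np.sqrt(np.diag(inv))

# Illustrative 2-D test information after some items:
I_test = np.array([[4.0, 1.0],
                   [1.0, 9.0]])
vol = d_optimality(I_test)   # scalar criterion used for selection
ses = marginal_ses(I_test)   # per-dimension SEs used for stopping
```

The mismatch is visible here: an item can raise the determinant substantially while barely improving the one marginal SE that is still above threshold, which is the inefficiency the proposed alternatives target.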
Affiliation(s)
- Muirne C. S. Paap
- University of Groningen, The Netherlands
- Oslo University Hospital, Norway
10. Flens G, Smits N, Terwee CB, Pijck L, Spinhoven P, de Beurs E. Practical Significance of Longitudinal Measurement Invariance Violations in the Dutch-Flemish PROMIS Item Banks for Depression and Anxiety: An Illustration With Ordered-Categorical Data. Assessment 2019; 28:277-294. PMID: 31625411. DOI: 10.1177/1073191119880967.
Abstract
We investigated longitudinal measurement invariance in the Dutch-Flemish PROMIS adult v1.0 item banks for Depression and Anxiety using two clinical samples with mood and anxiety disorders (n = 640 and n = 528, respectively). Factor analysis was used to evaluate whether the item banks were sufficiently unidimensional at two test occasions and whether the measured constructs remained the same over time. The results indicated that the item banks were sufficiently unidimensional, but the thresholds and residual variances of the constructs changed over time. However, using tentative rules of thumb, these invariance violations did not substantially affect the endorsement of a specific response category of a specific item at a specific test occasion. Furthermore, the impact on the mean latent change scores of the item banks remained below the proposed cutoff value for substantial bias. These findings suggest that the invariance violations lacked practical significance for test users, meaning that the item banks provide sufficiently invariant latent factor scores for use in clinical practice.
Affiliation(s)
- Gerard Flens
- Alliance for Quality in Mental Health Care, Utrecht, Netherlands
- Niels Smits
- University of Amsterdam, Amsterdam, Netherlands
- Liv Pijck
- Parnassia Psychiatric Institute, The Hague, Netherlands
11. Segawa E, Schalet B, Cella D. A comparison of computer adaptive tests (CATs) and short forms in terms of accuracy and number of items administrated using PROMIS profile. Qual Life Res 2019; 29:213-221. DOI: 10.1007/s11136-019-02312-8.
12. Introduction to special section: test construction. Qual Life Res 2018; 27:1671-1672. PMID: 29802512. DOI: 10.1007/s11136-018-1886-4.
13. Smits N, Paap MCS, Böhnke JR. Some recommendations for developing multidimensional computerized adaptive tests for patient-reported outcomes. Qual Life Res 2018; 27:1055-1063. PMID: 29476312. PMCID: PMC5874279. DOI: 10.1007/s11136-018-1821-8.
Abstract
PURPOSE Multidimensional item response theory and computerized adaptive testing (CAT) are increasingly used in mental health, quality of life (QoL), and patient-reported outcome measurement. Although multidimensional assessment techniques hold promise, they are more challenging to apply than unidimensional ones. The authors comment on minimal standards when developing multidimensional CATs. METHODS Prompted by pioneering papers published in QLR, the authors reflect on existing guidance and discussions from different psychometric communities, including guidelines developed for unidimensional CATs in the PROMIS project. RESULTS The commentary focuses on two key topics: (1) the design, evaluation, and calibration of multidimensional item banks and (2) how to study the efficiency and precision of a multidimensional item bank. The authors suggest that the development of a carefully designed and calibrated item bank encompasses a construction phase and a psychometric phase. With respect to efficiency and precision, item banks should be large enough to provide adequate precision over the full range of the latent constructs. Therefore, CAT performance should be studied as a function of the latent constructs and with reference to relevant benchmarks. Solutions are also suggested for simulation studies using real data, which often result in overly optimistic evaluations of an item bank's efficiency and precision. DISCUSSION Multidimensional CAT applications are promising but complex statistical assessment tools, which necessitate detailed theoretical frameworks and methodological scrutiny when testing their appropriateness for practical applications. The authors advise researchers to evaluate item banks with a broad set of methods, describe their choices in detail, and substantiate their approach for validation.
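The recommendation to study CAT performance as a function of the latent constructs, rather than as a single marginal summary, can be sketched as conditional bias and RMSE computed within bins of the true trait level. This is an illustrative sketch of the evaluation idea, not code from the commentary:

```python
import numpy as np

def conditional_performance(theta_true, theta_hat, bin_edges):
    """CAT precision as a function of the latent construct: (bias, RMSE)
    computed within each bin of the true (simulated) trait level,
    instead of one marginal summary over all simulees."""
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    results = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (theta_true >= lo) & (theta_true < hi)
        err = theta_hat[mask] - theta_true[mask]
        results.append((np.mean(err), np.sqrt(np.mean(err ** 2))))
    return results
```

Plotting these per-bin values against the bin midpoints exposes regions of the trait continuum where the item bank is too thin, which a single pooled RMSE would hide.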
Affiliation(s)
- Niels Smits
- Research Institute of Child Development and Education, University of Amsterdam, Nieuwe Achtergracht 127, 1018 WS, Amsterdam, The Netherlands
- Muirne C S Paap
- Department of Special Needs, Education, and Youth Care, Faculty of Behavioural and Social Sciences, University of Groningen, Groningen, The Netherlands
- Jan R Böhnke
- Dundee Centre for Health and Related Research, School of Nursing and Health Sciences, University of Dundee, Dundee, UK