1
Van Wijk EV, Donkers J, De Laat PCJ, Meiboom AA, Jacobs B, Ravesloot JH, Tio RA, Van Der Vleuten CPM, Langers AMJ, Bremers AJA. Computer Adaptive vs. Non-adaptive Medical Progress Testing: Feasibility, Test Performance, and Student Experiences. Perspect Med Educ 2024; 13:406-416. [PMID: 39071727; PMCID: PMC11276406; DOI: 10.5334/pme.1345]
Abstract
Background: Computerized adaptive testing tailors test items to students' abilities by adapting the difficulty level. This more efficient and reliable form of assessment may offer advantages over a conventional medical progress test (PT). Prior to our study, a direct comparison of students' performance on a computer adaptive progress test (CA-PT) and a conventional PT, crucial for nationwide implementation of the CA-PT, was missing. We therefore assessed the correlation between CA-PT and conventional PT performance and explored the feasibility of the CA-PT, and student experiences with it, in a large medical cohort.
Methods: In this cross-over study, medical students (n = 1432) from three Dutch medical schools took both a conventional PT and a CA-PT. They were stratified to start with either the conventional PT or the CA-PT to determine test performance. Student motivation, engagement, and experiences were assessed by questionnaires administered to students from seven Dutch medical schools. Parallel-forms reliability was assessed using the Pearson correlation coefficient.
Results: A strong correlation (0.834) was found between conventional PT and CA-PT performance. The CA-PT was administered without system performance issues and was completed in a median time of 83 minutes (67-102 minutes). The questionnaire response rate was 31.7% (526/1658). Despite experiencing the test as more difficult, most students reported persistence, adequate task management, and good focus during the CA-PT.
Conclusions: The CA-PT provides a reliable estimate of students' ability in less time than a conventional non-adaptive PT and is feasible for students throughout the entire medical curriculum. Despite the strong correlation between PT scores, students found the CA-PT more challenging.
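Parallel-forms reliability here is simply the Pearson correlation between students' scores on the two test forms. A minimal sketch of that computation (the paired scores below are made up for illustration, not study data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative paired scores (conventional PT vs. CA-PT), not study data.
pt = [52, 61, 47, 70, 58, 66]
capt = [50, 64, 45, 72, 55, 69]
print(round(pearson_r(pt, capt), 3))
```

The study's reported coefficient of 0.834 was computed over the full cross-over cohort rather than a toy sample like this one.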
Affiliation(s)
- Elise V. Van Wijk: Center for Innovation in Medical Education, Leiden University Medical Center, the Netherlands
- Jeroen Donkers: School of Health Professions Education, Faculty of Health, Medicine and Life Sciences, Maastricht University, the Netherlands
- Peter C. J. De Laat: Department of Pediatrics, Erasmus Medical Center, Rotterdam, the Netherlands
- Ariadne A. Meiboom: Department of General Practice and Elderly Care Medicine, Amsterdam University Medical Center, Amsterdam, the Netherlands
- Bram Jacobs: Department of Neurology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Jan Hindrik Ravesloot: Department of Physiology, Amsterdam University Medical Center, Amsterdam, the Netherlands
- René A. Tio: Department of Cardiology, Catharina Hospital Eindhoven, Eindhoven, the Netherlands
- Cees P. M. Van Der Vleuten: Department of Educational Development and Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, the Netherlands
- Alexandra M. J. Langers: Department of Gastroenterology and Hepatology, Leiden University Medical Center, Leiden, the Netherlands
- Andre J. A. Bremers: Department of Surgery, Radboud University Medical Center, Nijmegen, the Netherlands
2
Cao X, Lin Y, Liu D, Zheng F, Duh HBL. Novel item selection strategies for cognitive diagnostic computerized adaptive testing: A heuristic search framework. Behav Res Methods 2024; 56:2859-2885. [PMID: 37749422; DOI: 10.3758/s13428-023-02228-9]
Abstract
The computerized adaptive form of cognitive diagnostic testing, CD-CAT, has gained increasing attention in personalized measurement for its ability to categorize an individual's mastery status on fine-grained attributes more accurately and efficiently by progressively administering items tailored to the examinee's ability. How to select the next item based on previous responses is crucial to the success of CD-CAT. Previous item selection strategies for CD-CAT have often followed a greedy or semi-greedy approach, which makes it difficult to strike a balance between diagnostic performance and item bank utilization. To address this issue, this study takes a graph perspective and transforms the item selection problem in CD-CAT into a path-searching problem, in which paths correspond to possible test constructions and nodes to individual items. A heuristic function is defined to predict the prospect of a path, indicating how well the corresponding test can diagnose the current examinee. Two search mechanisms with different biases toward item exposure control are proposed to approximate the optimal path with the best prospect. The first unused item on the resulting path is selected as the next item. Together, these components compose a novel CD-CAT item selection framework based on heuristic search. Simulation studies conducted under a variety of bank designs, bank-quality conditions, and testing scenarios, and compared against several classic CD-CAT item selection strategies, show that the proposed framework can enhance bank utilization at a small cost in diagnostic performance.
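The abstract's path-search framing can be made concrete with a generic best-first search: partial tests are paths, the heuristic scores a path's accumulated item information plus an optimistic bound for the remaining slots. This sketch only illustrates the framing; it is not the authors' algorithm, and the item information values are invented:

```python
import heapq

def best_first_test_path(item_info, test_len):
    """Best-first search over item sequences (paths); nodes are items.
    The heuristic scores a partial test by its accumulated information
    plus an optimistic per-slot bound for the slots still to fill."""
    bound = max(item_info)  # optimistic: no item can beat this
    frontier = [(-(bound * test_len), (0.0, ()))]  # max-heap via negation
    while frontier:
        _, (info, path) = heapq.heappop(frontier)
        if len(path) == test_len:
            # With an optimistic bound, the first completed path popped
            # has the best achievable prospect (A*-style argument).
            return list(path)
        remaining = test_len - len(path) - 1
        for i, v in enumerate(item_info):
            if i in path:
                continue
            h = info + v + bound * remaining
            heapq.heappush(frontier, (-h, (info + v, path + (i,))))
    return []

# Toy item bank: information of each item for the current examinee.
bank = [0.2, 0.9, 0.5, 0.7, 0.1]
print(best_first_test_path(bank, test_len=3))  # the three most informative items
```

In a real CD-CAT loop, only the first unused item on the returned path would be administered before the search is rerun with the updated examinee estimate.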
Affiliation(s)
- Xi Cao: Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Ying Lin: Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Dong Liu: College of Computer and Information Engineering, Henan Normal University, Xinxiang, China; Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, China
- Fudan Zheng: School of Computer Engineering, Guangzhou City University of Technology, Guangzhou, China
- Henry Been-Lirn Duh: Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia; PolyU-NVIDIA Joint Research Centre, Hong Kong Polytechnic University, Hong Kong, China
3
Kang HA, Arbet G, Betts J, Muntean W. Location-Matching Adaptive Testing for Polytomous Technology-Enhanced Items. Appl Psychol Meas 2024; 48:57-76. [PMID: 38327610; PMCID: PMC10846469; DOI: 10.1177/01466216241227548]
Abstract
The article presents adaptive testing strategies for polytomously scored technology-enhanced innovative items. We investigate item selection methods that match examinees' ability levels in location and explore ways to leverage test-taking speed during item selection. Existing approaches to selecting polytomous items are mostly based on information measures and tend to suffer from an item pool usage problem. In this study, we introduce location indices for polytomous items and show that location-matched item selection significantly alleviates this problem and achieves more diverse item sampling. We also consider matching items' time intensities so that testing times can be regulated across examinees. Monte Carlo simulations suggest that location-matched item selection achieves significantly better and more balanced item pool usage. Leveraging working speed in item selection distinctly reduced both the average testing time and its variation across examinees. Both procedures incurred only a marginal measurement cost (in precision and efficiency) yet showed significant improvement in administrative outcomes. Experiments in two test settings also suggest that the procedures can yield different administrative gains depending on the test design.
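A location-matched rule replaces "take the most informative item" with "take the item whose location is nearest the current ability estimate". A minimal sketch, where the location index (the mean of an item's category threshold parameters) is one common choice and not necessarily the authors' definition:

```python
def item_location(thresholds):
    """Summarize a polytomous item's difficulty as the mean of its
    category threshold parameters (one common location index)."""
    return sum(thresholds) / len(thresholds)

def select_location_matched(theta, bank, administered):
    """Pick the unadministered item whose location is closest to the
    current ability estimate theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return min(candidates, key=lambda i: abs(item_location(bank[i]) - theta))

# Toy bank: each item is a list of threshold parameters (e.g. GPC-style).
bank = [[-1.5, -0.5], [-0.2, 0.6], [0.8, 1.6], [1.9, 2.5]]
print(select_location_matched(theta=0.3, bank=bank, administered=set()))  # → 1
```

Because the nearest-location item is not always the most informative one, a rule like this naturally spreads selections across the pool, which is the usage benefit the abstract reports.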
Affiliation(s)
- Joe Betts: National Council of State Boards of Nursing, IL, USA
4
Di Sandro A, Moore TM, Zoupou E, Kennedy KP, Lopez KC, Ruparel K, Njokweni LJ, Rush S, Daryoush T, Franco O, Gorgone A, Savino A, Didier P, Wolf DH, Calkins ME, Cobb Scott J, Gur RE, Gur RC. Validation of the cognitive section of the Penn computerized adaptive test for neurocognitive and clinical psychopathology assessment (CAT-CCNB). Brain Cogn 2024; 174:106117. [PMID: 38128447; PMCID: PMC10799332; DOI: 10.1016/j.bandc.2023.106117]
Abstract
Background: The Penn Computerized Neurocognitive Battery is an efficient tool for assessing brain-behavior domains, and its efficiency was augmented via computerized adaptive testing (CAT). This battery requires validation in a separate sample to establish its psychometric properties.
Methods: In a mixed community/clinical sample of N = 307 18-to-35-year-olds, we tested the relationships of the CAT tests with the full-form tests. We compared discriminability among recruitment groups (psychosis, mood, control) and examined how scores relate to demographics. CAT-full-form relationships were evaluated against a minimum inter-test correlation of 0.70, or an inter-test correlation within 0.10 of the full form's correlation with a previous administration of the full battery. Differences in criterion relationships were tested via mixed models.
Results: Most tests (15/17) met the minimum criteria for replacing the full form with the updated CAT version (mean r = 0.67; range = 0.53-0.80), compared with the relationships of the full forms with previous full-form administrations (mean r = 0.68; range = 0.50-0.85). Most (16/17) CAT-based relationships with diagnostic and other validity criteria were indistinguishable (interaction p > 0.05) from their full-form counterparts.
Conclusions: The updated CNB shows psychometric properties acceptable for research. The full forms of some tests should be retained because the time savings are insufficient to justify the loss in precision.
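The acceptance criterion in the Methods is a two-clause rule: a CAT test passes if its correlation with the full form reaches 0.70, or comes within 0.10 of the full form's own retest correlation. Sketched below (the function and parameter names are ours):

```python
def cat_form_acceptable(r_cat_full, r_full_retest, min_r=0.70, margin=0.10):
    """Accept the CAT form if its correlation with the full form reaches
    min_r, OR falls within `margin` of the full form's correlation with
    a prior full-form administration (its retest baseline)."""
    return r_cat_full >= min_r or r_cat_full >= r_full_retest - margin

print(cat_form_acceptable(0.67, 0.68))  # True: within 0.10 of the baseline
print(cat_form_acceptable(0.53, 0.85))  # False: fails both clauses
```

The second clause matters because a CAT form cannot be expected to correlate with the full form more strongly than the full form correlates with itself on readministration.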
Affiliation(s)
- Akira Di Sandro: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Tyler M Moore: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Eirini Zoupou: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Kelly P Kennedy: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Katherine C Lopez: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kosha Ruparel: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Lucky J Njokweni: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sage Rush: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Tarlan Daryoush: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Olivia Franco: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Alesandra Gorgone: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Andrew Savino: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Paige Didier: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Daniel H Wolf: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Monica E Calkins: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- J Cobb Scott: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; VISN4 Mental Illness Research, Education, and Clinical Center at the Philadelphia VA Medical Center, 19104, USA
- Raquel E Gur: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Ruben C Gur: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Lifespan Brain Institute (LiBI), Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
5
Wang D, Ma W, Cai Y, Tu D. A general nonparametric classification method for multiple strategies in cognitive diagnostic assessment. Behav Res Methods 2024; 56:723-735. [PMID: 36814008; DOI: 10.3758/s13428-023-02075-8]
Abstract
Cognitive diagnosis models (CDMs) have been used as psychometric tools in educational assessments to estimate students' strengths and weaknesses in terms of cognitive skills learned and skills that need study. In practice, it is not uncommon that questions can often be solved using more than one strategy, which requires CDMs capable of accommodating multiple strategies. However, existing parametric multi-strategy CDMs need a large sample size to produce a reliable estimation of item parameters and examinees' proficiency class memberships, which obstructs their practical applications. This article proposes a general nonparametric multi-strategy classification method with promising classification accuracy in small samples for dichotomous response data. The method can accommodate different strategy selection approaches and different condensation rules. Simulation studies showed that the proposed method outperformed the parametric CDMs when sample sizes were small. A set of real data was analyzed as well to illustrate the application of the proposed method in practice.
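The classic single-strategy nonparametric classifier, which this paper generalizes to multiple strategies and condensation rules, assigns each examinee the attribute pattern whose ideal response vector is closest in Hamming distance to the observed responses. A sketch of that baseline under a conjunctive (DINA-style) condensation rule:

```python
from itertools import product

def ideal_response(alpha, q_row):
    """Conjunctive (DINA-style) ideal response: 1 iff the examinee
    masters every attribute the item requires."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))

def npc_classify(responses, q_matrix):
    """Nonparametric classification: return the attribute pattern whose
    ideal response vector has the smallest Hamming distance to the
    observed dichotomous responses."""
    n_attr = len(q_matrix[0])
    best, best_d = None, None
    for alpha in product([0, 1], repeat=n_attr):
        ideal = [ideal_response(alpha, row) for row in q_matrix]
        d = sum(r != i for r, i in zip(responses, ideal))
        if best_d is None or d < best_d:
            best, best_d = alpha, d
    return best

# Toy Q-matrix: 4 items x 2 attributes.
Q = [[1, 0], [0, 1], [1, 1], [1, 0]]
print(npc_classify([1, 0, 0, 1], Q))  # → (1, 0)
```

Because this rule needs no item parameter estimation, it remains usable at the small sample sizes where parametric multi-strategy CDMs break down, which is the gap the paper's multi-strategy generalization targets.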
Affiliation(s)
- Daxun Wang: School of Psychology, Jiangxi Normal University, 99 Ziyang Ave, Nanchang, Jiangxi, 330022, China
- Wenchao Ma: Department of Educational Studies in Psychology, Research Methodology and Counseling, The University of Alabama, Tuscaloosa, AL, USA
- Yan Cai: School of Psychology, Jiangxi Normal University, 99 Ziyang Ave, Nanchang, Jiangxi, 330022, China
- Dongbo Tu: School of Psychology, Jiangxi Normal University, 99 Ziyang Ave, Nanchang, Jiangxi, 330022, China
6
Heltne A, Braeken J, Hummelen B, Germans Selvik S, Buer Christensen T, Paap MCS. Do Flexible Administration Procedures Promote Individualized Clinical Assessments? An Explorative Analysis of How Clinicians Utilize the Funnel Structure of the SCID-5-AMPD Module I: LPFS. J Pers Assess 2023; 105:636-646. [PMID: 36511879; DOI: 10.1080/00223891.2022.2152344]
Abstract
The current study examined clinicians' utilization of the SCID-5-AMPD-I funnel structure. Across 237 interviews conducted as part of the NorAMP study, we found that clinicians administered on average 2-3 adjacent levels under each subdomain, effectively administering only about 50% of the available items. No two interviews contained exactly the same set of administered items, and when comparing pairs of interviews, on average only about half of the items administered in one were also administered in the other. Cross-classified mixed effects models were estimated to examine the factors affecting item administration. Results indicated that the interplay between patients' preliminary scores and item level had a substantial impact on item administration, suggesting that clinicians tend to administer items corresponding to expected patient severity. Overall, our findings suggest that clinicians use the SCID-5-AMPD-I funnel structure to conduct efficient, individually tailored assessments informed by relevant patient characteristics. Adopting similar non-fixed administration procedures for other interviews could provide comparable benefits over traditional fixed-form administration, and the current study can serve as a template for verifying and evaluating such adoptions.
Affiliation(s)
- Aleksander Heltne: Department of Research and Innovation, Clinic for Mental Health and Addiction, Oslo University Hospital, Oslo, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Johan Braeken: Department of Research and Innovation, Clinic for Mental Health and Addiction, Oslo University Hospital, Oslo, Norway; Centre for Educational Measurement, University of Oslo (CEMO), Oslo, Norway
- Benjamin Hummelen: Department of Research and Innovation, Clinic for Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- Sara Germans Selvik: Department of Psychiatry, Helse Nord-Trønderlag, Namsos Hospital, Namsos, Norway; Department of Mental Health, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Muirne C S Paap: Department of Research and Innovation, Clinic for Mental Health and Addiction, Oslo University Hospital, Oslo, Norway; Department of Child and Family Welfare, Faculty of Behavioural and Social Sciences, University of Groningen, Groningen, The Netherlands
7
Wang H, Li L, Zhang P. Gender Differences in Mental Rotational Training Based on Computer Adaptive Tests. Behav Sci (Basel) 2023; 13:719. [PMID: 37753997; PMCID: PMC10525974; DOI: 10.3390/bs13090719]
Abstract
Mental rotation tasks have been widely used to assess individuals' spatial cognition and their ability to mentally manipulate objects. This study employed a computerized adaptive training method to investigate the behavioral performance of participants of different genders in mental rotation tasks with different rotation angles before and after training. A total of 44 Chinese university students participated in the experiment, with the experimental group undergoing a five-day mental rotation training program. During the training phase, a three-down/one-up staircase procedure adjusted the stimulus level (response time) based on participants' responses. The results showed that training facilitated the mental rotation ability of both male and female participants and eliminated the gender difference in mental rotation performance. Regarding angles, improvement at the trained angles was significantly greater than at untrained angles, although no significant differences in improvement were found among the three trained angles. In summary, these findings demonstrate the effectiveness of computerized adaptive training in improving mental rotation ability and highlight the influence of gender and rotation angle on learning outcomes.
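A three-down/one-up staircase tightens the task after three consecutive correct responses and relaxes it after any error. A minimal sketch of the update rule applied to a response-time limit (the starting level, step size, and millisecond units are illustrative assumptions, not the study's values):

```python
def staircase(responses, start=1000, step=50, floor=200):
    """Three-down/one-up staircase on a response-time limit (ms):
    three consecutive correct answers tighten the limit by one step;
    any error relaxes it by one step and resets the streak."""
    level, streak = start, 0
    for correct in responses:
        if correct:
            streak += 1
            if streak == 3:
                level = max(floor, level - step)
                streak = 0
        else:
            level += step
            streak = 0
    return level

# Three correct answers tighten the limit; an error relaxes it again.
print(staircase([1, 1, 1, 0, 1]))  # → 1000
```

A three-down/one-up rule converges on the stimulus level at which accuracy is roughly 79%, which keeps the adaptive training challenging without overwhelming the participant.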
Affiliation(s)
- Pan Zhang: Department of Psychology, Hebei Normal University, Shijiazhuang 050024, China (H.W.; L.L.)
8
Wen A, Wolitzky-Taylor K, Gibbons RD, Craske M. A randomized controlled trial on using predictive algorithm to adapt level of psychological care for community college students: STAND triaging and adapting to level of care study protocol. Trials 2023; 24:508. [PMID: 37553688; PMCID: PMC10410881; DOI: 10.1186/s13063-023-07441-7]
Abstract
Background: There is growing interest in using personalized mental health care to treat disorders like depression and anxiety, with the goal of improving treatment engagement and efficacy. This randomized controlled trial will compare a traditional symptom-severity decision-making algorithm with a novel multivariate decision-making algorithm for triage to, and adaptation of, mental health care. The stratified levels of care include a self-guided online wellness program, coach-guided online cognitive behavioral therapy, and clinician-delivered psychotherapy with or without pharmacotherapy. The multivariate algorithm will comprise baseline variables (for triage and adaptation) and time-varying variables (for adaptation) in four areas: social determinants of mental health; early adversity and life stressors; predisposing, enabling, and need influences on health service use; and comprehensive mental health status. The overarching goal is to evaluate whether the multivariate algorithm improves adherence to treatment, symptoms, and functioning above and beyond the symptom-based algorithm.
Methods/design: This trial will recruit a total of 1000 participants over 5 years in the greater Los Angeles metropolitan area, drawn from a highly diverse sample of community college students. Under the symptom-severity approach, initial triage to level of care will be based on symptom severity, whereas under the multivariate approach it will be based on a comprehensive set of baseline measures. After initial triage, level of care will be adapted throughout treatment using either the symptom-severity or the multivariate statistical approach. Participants will complete computerized assessments and self-report questionnaires at baseline and for up to 40 weeks. The multivariate decision-making algorithm will be updated annually to improve predictive outcomes.
Discussion: Results will compare traditional symptom-severity decision-making with the novel multivariate decision-making with respect to treatment adherence, symptom improvement, and functional recovery. The multivariate decision-making algorithms may also serve as a template for other community college settings. Ultimately, the findings will inform the practice of level-of-care triage and adaptation in psychological treatment, as well as the use of personalized mental health care broadly.
Trial registration: ClinicalTrials.gov NCT05591937, submitted August 2022, published October 2022.
Affiliation(s)
- Alainna Wen: Department of Psychiatry and Biobehavioral Sciences, University of California - Los Angeles, 760 Westwood Plaza, Suite 28-216, Los Angeles, CA 90024, USA
- Kate Wolitzky-Taylor: Department of Psychiatry and Biobehavioral Sciences, University of California - Los Angeles, 760 Westwood Plaza, Suite 28-216, Los Angeles, CA 90024, USA
- Robert D Gibbons: Center for Health Statistics, University of Chicago, 5841 S. Maryland Avenue MC 2007, Office W260, Chicago, IL 60637, USA
- Michelle Craske: Department of Psychiatry and Biobehavioral Sciences, University of California - Los Angeles, 760 Westwood Plaza, Suite 28-216, Los Angeles, CA 90024, USA; Department of Psychology, University of California - Los Angeles, 1285 Franz Hall, Box 951563, Los Angeles, CA, USA
9
Tian C, Choi J. The Impact of Item Model Parameter Variations on Person Parameter Estimation in Computerized Adaptive Testing With Automatically Generated Items. Appl Psychol Meas 2023; 47:275-290. [PMID: 37283592; PMCID: PMC10240571; DOI: 10.1177/01466216231165313]
Abstract
Sibling items developed through automatic item generation share similar but not identical psychometric properties. However, modeling sibling item variation may introduce substantial computational difficulty while yielding little improvement in scoring. Assuming identical characteristics among siblings, this study explores the impact of item model parameter variations (i.e., within-family variation between siblings) on person parameter estimation in linear tests and computerized adaptive testing (CAT). Specifically, we explore (1) what happens if small/medium/large within-family variance is ignored, (2) whether the effect of larger within-model variance can be compensated by greater test length, (3) whether item model pool properties affect the impact of within-family variance on scoring, and (4) whether the issues in (1) and (2) differ between linear and adaptive testing. A related-siblings model is used for data generation, while an identical-siblings model is assumed for scoring. Manipulated factors include test length, the size of within-model variation, and item model pool characteristics. Results show that as within-family variance increases, the standard error of scores remains at similar levels. For the correlation between true and estimated scores and for RMSE, the effect of larger within-model variance was compensated by test length. Scores were biased toward the center, and this bias was not compensated by test length. Although within-family variation is random in the current simulations, to yield less biased ability estimates the item model pool should provide balanced opportunities for "fake-easy" and "fake-difficult" item instances to cancel each other's effects. Results for CAT were similar to those for linear tests, except for higher efficiency.
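Under a related-siblings model, each generated instance's parameters are drawn around the item model's ("parent") parameters with some within-family variance. A sketch of that data-generation step for the difficulty parameter only (normal noise, the function name, and all values are our illustrative assumptions):

```python
import random

def generate_sibling(parent_b, within_sd, rng):
    """Draw one sibling item's difficulty from the item model's parent
    difficulty plus within-family noise (related-siblings model)."""
    return parent_b + rng.gauss(0.0, within_sd)

rng = random.Random(7)
parent_difficulties = [-1.0, 0.0, 1.2]
siblings = [generate_sibling(b, within_sd=0.3, rng=rng) for b in parent_difficulties]
print([round(s, 2) for s in siblings])
```

Scoring with an identical-siblings model then simply uses `parent_b` for every instance; the study's question is how much the ignored noise term distorts the resulting person estimates.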
Affiliation(s)
- Chen Tian: Department of Human Development and Quantitative Methodology, University of Maryland, College Park, MD, USA
- Jaehwa Choi: Department of Educational Leadership, The George Washington University, Washington, DC, USA
10
Dong F, Moore TM, Westfall M, Kohler C, Calkins ME. Development of empirically derived brief program evaluation measures in Pennsylvania first-episode psychosis coordinated specialty care programs. Early Interv Psychiatry 2023; 17:96-106. [PMID: 35343055; DOI: 10.1111/eip.13298]
Abstract
Aim: The Pennsylvania first-episode psychosis program evaluation (PA-FEP-PE) core assessment battery was developed as a standard, comprehensive clinical assessment and data collection tool for Pennsylvania coordinated specialty care (CSC) programs. To reduce administration time and maximize clinical utility while maintaining acceptable precision, we aimed to generate a short form using item response theory (IRT)-based computerized adaptive testing (CAT) simulation, and to analyse the implementation and acceptability of the short form among providers from PA CSC programs.
Methods: FEP participants (n = 759; age 14-36) from nine coordinated specialty care programs completed 156 items drawn from the PA-FEP-PE battery. Multidimensional IRT-based CAT simulations were used to select the best PA-FEP-PE items for abbreviated forms.
Results: A 67-item PA-FEP-PE short form was developed to capture six factors: (1) positive affect and surgency (with negative loadings on anxious-misery items); (2) psychiatric services satisfaction; (3) antipsychotic side effect severity; (4) family turmoil and associated traumas; (5) trauma load; and (6) psychosis. The total number of items was reduced by more than 50%. The short form demonstrated good psychometric properties and was well accepted by providers during implementation.
Conclusions: The empirical derivation and implementation of abbreviated measures of key domains and constructs in FEP care have streamlined and facilitated PA-FEP program evaluation. Our work supports the application of IRT-based methods to empirically reduce core assessment batteries in large-scale data collection efforts such as the Early Psychosis Intervention Network.
Affiliation(s)
- Fanghong Dong: School of Nursing, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Tyler M Moore: Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Lifespan Brain Institute, Penn Medicine and Children's Hospital of Philadelphia (CHOP), Philadelphia, Pennsylvania, USA
- Megan Westfall: Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Christian Kohler: Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Monica E Calkins: Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Lifespan Brain Institute, Penn Medicine and Children's Hospital of Philadelphia (CHOP), Philadelphia, Pennsylvania, USA
11
Sun X, Gao Y, Xin T, Song N. Binary Restrictive Threshold Method for Item Exposure Control in Cognitive Diagnostic Computerized Adaptive Testing. Front Psychol 2021; 12:517155. [PMID: 34421694] [PMCID: PMC8374050] [DOI: 10.3389/fpsyg.2021.517155]
Abstract
Although classification accuracy is a critical issue in cognitive diagnostic computerized adaptive testing, attention has increasingly shifted to item exposure control to ensure test security. In this study, we developed the binary restrictive threshold (BRT) method to balance measurement accuracy and item exposure. In addition, a simulation study was conducted to evaluate its performance. The results indicated that the BRT method performed better than the restrictive progressive (RP) and stratified dynamic binary searching (SDBS) approaches but worse than the restrictive threshold (RT) method in terms of classification accuracy. With respect to item exposure control, the BRT method exhibited noticeably stronger performance compared with the RT method, even though its performance was not as high as that of the RP and SDBS methods.
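The accuracy-versus-exposure trade-off described above can be made concrete with a simplified sketch. This is not the BRT, RT, RP, or SDBS algorithm itself: it swaps in a plain 2PL Fisher-information criterion for the cognitive-diagnostic discrimination index, and the item parameters, exposure threshold `r_max`, and function names are illustrative assumptions.

```python
import math

def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta (stand-in for a
    cognitive-diagnostic discrimination index)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(theta, pool, exposure_counts, n_tests, r_max=0.2):
    """Restrictive-threshold-style selection sketch: among items whose
    observed exposure rate is still below r_max, pick the most informative
    one; if every item is over the threshold, fall back to the least-exposed
    item so the test can continue."""
    eligible = [i for i in range(len(pool))
                if exposure_counts[i] / max(n_tests, 1) < r_max]
    if not eligible:
        eligible = [min(range(len(pool)), key=lambda i: exposure_counts[i])]
    return max(eligible, key=lambda i: fisher_info_2pl(theta, *pool[i]))
```

The exposure cap deliberately overrides pure information maximization: a highly discriminating item stops being selectable once its usage rate reaches `r_max`, which is the basic mechanism that restrictive methods refine.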
Affiliation(s)
- Xiaojian Sun: School of Mathematics and Statistics, Southwest University, Chongqing, China; Southwest University Branch, Collaborative Innovation Center of Assessment for Basic Education Quality, Chongqing, China
- Yizhu Gao: Faculty of Education, University of Alberta, Edmonton, AB, Canada
- Tao Xin: Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, China
- Naiqing Song: School of Mathematics and Statistics, Southwest University, Chongqing, China; Southwest University Branch, Collaborative Innovation Center of Assessment for Basic Education Quality, Chongqing, China
12
Item Parameter Estimation in Multistage Designs: A Comparison of Different Estimation Approaches for the Rasch Model. PSYCH 2021. [DOI: 10.3390/psych3030022]
Abstract
There is some debate in the psychometric literature about item parameter estimation in multistage designs. It is occasionally argued that the conditional maximum likelihood (CML) method is superior to the marginal maximum likelihood (MML) method because no assumptions have to be made about the trait distribution. However, CML estimation in its original formulation leads to biased item parameter estimates. Zwitser and Maris (2015, Psychometrika) proposed a modified conditional maximum likelihood estimation method for multistage designs that provides practically unbiased item parameter estimates. In this article, the differences between estimation approaches for multistage designs were investigated in a simulation study. Four estimation conditions (CML, CML estimation with consideration of the respective MST design, MML assuming a normal distribution, and MML with log-linear smoothing) were examined, considering different multistage designs, numbers of items, sample sizes, and trait distributions. The results showed that in the case of substantial violation of the normal distribution, the CML method seemed preferable to MML estimation employing a misspecified normal trait distribution, especially as the number of items and sample size increased. However, MML estimation using log-linear smoothing led to results that were very similar to the CML method with consideration of the respective MST design.
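The reason CML needs no trait-distribution assumption is that it conditions on each examinee's raw score, which makes the person parameter cancel out of the pattern probability. A minimal sketch of the two computational building blocks, the elementary symmetric functions and the conditional pattern probability under the Rasch model, is shown below; this illustrates plain CML, not the design-aware Zwitser-Maris estimator, and the item difficulties used are invented.

```python
import math

def esf(eps):
    """Elementary symmetric functions gamma_0..gamma_n of eps, computed with
    the standard summation recursion."""
    g = [1.0]
    for e in eps:
        new = g + [0.0]
        for r in range(len(g), 0, -1):
            new[r] += e * g[r - 1]
        g = new
    return g

def cond_pattern_logprob(x, b):
    """Log-probability of response pattern x given its raw score under the
    Rasch model; the person parameter cancels, leaving only difficulties b."""
    eps = [math.exp(-bi) for bi in b]
    r = sum(x)
    return -sum(xi * bi for xi, bi in zip(x, b)) - math.log(esf(eps)[r])
```

Full CML estimation would maximize the sum of these conditional log-probabilities over the difficulties (with a location constraint such as sum(b) = 0); the elementary symmetric functions are the computational core of that likelihood.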
13
He Y, Chen P, Li Y. Maximum information per time unit designs for continuous online item calibration. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2021; 74 Suppl 1:24-51. [PMID: 33047302] [DOI: 10.1111/bmsp.12221]
Abstract
Previous designs for online calibration have only considered examinees' responses to items. However, the use of response time, a useful metric that can easily be collected by a computer, has not yet been embedded in calibration designs. In this article we utilize response time to optimize the assignment of new items online, and accordingly propose two new adaptive designs. These are the D-optimal per expectation time unit design (D-ET) and the D-optimal per time unit design (D-T). The former method uses the conditional maximum likelihood estimation (CMLE) method to estimate the expected response times, while the latter employs the nonparametric k-nearest-neighbour method to predict the response times. Simulations were conducted to compare the two new designs with the D-optimal online calibration design (D design) in the context of continuous online calibration. In addition, a preliminary study was carried out to evaluate the performance of CMLE prior to its application in D-ET. The results showed that, compared to the D design, the D-ET and D-T designs saved response time and accrued larger calibration information per time unit, without sacrificing item calibration precision.
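The underlying idea is to optimize information gained per unit of testing time rather than information alone. The toy sketch below applies that criterion on the ability-estimation side, which is analogous to, but not the same as, the D-ET and D-T designs (those target item-parameter calibration); the 2PL parameters and expected response times are invented for illustration.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_per_time_unit(theta, items):
    """items: list of (a, b, expected_seconds). Select the item maximizing
    Fisher information divided by expected response time, i.e. the
    information-per-time-unit criterion, instead of raw information."""
    return max(range(len(items)),
               key=lambda i: info_2pl(theta, items[i][0], items[i][1]) / items[i][2])
```

With this criterion a moderately informative but fast item can beat a highly informative but slow one, which is exactly the trade-off that time-aware designs exploit.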
Affiliation(s)
- Yinhong He: Nanjing University of Information Science and Technology, China
- Yong Li: Beijing Normal University, China
14
Li J, Ma L, Zeng P, Kang C. New Item Selection Method Accommodating Practical Constraints in Cognitive Diagnostic Computerized Adaptive Testing: Maximum Deviation and Maximum Limitation Global Discrimination Indexes. Front Psychol 2021; 12:619771. [PMID: 34079491] [PMCID: PMC8165177] [DOI: 10.3389/fpsyg.2021.619771]
Abstract
The maximum deviation global discrimination index (MDGDI) is a new item selection method for cognitive diagnostic computerized adaptive testing that allows for attribute coverage balance. We developed the maximum limitation global discrimination index (MLGDI) from MDGDI, which allows for both attribute coverage balance and item exposure control. Our simulation study aimed to evaluate the performance of the new method against the maximum global discrimination index (GDI), modified maximum GDI (MMGDI), standardized weighted deviation GDI (SWDGDI), and constraint progressive with SWDGDI (CP_SWDGDI). The results indicated that (1a) under the condition of realizing attribute coverage balance, MDGDI had the highest attribute classification accuracy; (1b) when the selection strategy accommodated the practical constraints of attribute coverage balance and item exposure control, MLGDI had the highest attribute classification accuracy; (2) adding an item exposure control mechanism to an item selection method reduces its attribute classification accuracy; and (3) compared with GDI, MMGDI, SWDGDI, CP_SWDGDI, and MDGDI, MLGDI better satisfied the attribute-coverage requirement while controlling the item exposure rate and maintaining the attribute correct classification rate.
Affiliation(s)
- Junjie Li: Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Zhejiang Normal University, Jinhua, China
- Lihua Ma: Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Zhejiang Normal University, Jinhua, China
- Pingfei Zeng: Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Zhejiang Normal University, Jinhua, China
- Chunhua Kang: Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Zhejiang Normal University, Jinhua, China
15
Lin Z, Chen P, Xin T. The Block Item Pocket Method for Reviewable Multidimensional Computerized Adaptive Testing. APPLIED PSYCHOLOGICAL MEASUREMENT 2021; 45:22-36. [PMID: 33304019] [PMCID: PMC7711249] [DOI: 10.1177/0146621620947177]
Abstract
Most computerized adaptive testing (CAT) programs do not allow item review due to decreased estimation precision and aberrant manipulation strategies. In this article, a block item pocket (BIP) method that combines the item pocket (IP) method with the successive block method to realize reviewable CAT was proposed. A worst-case but still reasonable answering strategy and the Wainer-like manipulation strategy were simulated to evaluate the estimation precision of reviewable unidimensional computerized adaptive testing (UCAT) and multidimensional computerized adaptive testing (MCAT) under a series of BIP settings. For both UCAT and MCAT, the estimation precision of the BIP method improved as the number of blocks increased or the item pocket size decreased under the reasonable strategy. The BIP method was more effective in handling the Wainer-like strategy. With the help of the block design, the BIP method maintained acceptable estimation precision even under somewhat large total IP sizes. These results suggest that the BIP method is a reliable solution for both reviewable UCAT and MCAT.
Affiliation(s)
- Zhe Lin: Beijing Normal University, China
- Tao Xin: Beijing Normal University, China
16
Moore TM, Butler ER, Scott JC, Port AM, Ruparel K, Njokweni LJ, Gur RE, Gur RC. When CAT is not an option: complementary methods of test abbreviation for neurocognitive batteries. Cogn Neuropsychiatry 2021; 26:35-54. [PMID: 33308027] [PMCID: PMC7855518] [DOI: 10.1080/13546805.2020.1859360]
Abstract
INTRODUCTION There is an obvious need for efficient measurement of neuropsychiatric phenomena. A proven method, computerized adaptive testing (CAT), is not feasible for all tests, necessitating alternatives for increasing test efficiency. METHODS We combined and compared two methods for abbreviating rapid tests, using two tests unamenable to CAT (a Continuous Performance Test [CPT] and an n-back test [NBACK]). A sample of N = 9,498 participants (mean age 14.2 years; 52% female) was administered the tests, and abbreviation was accomplished using methods answering two questions: what happens to measurement error as items are removed, and what happens to correlations with validity criteria as items are removed. The first was investigated using quasi-CAT simulation, while the second was investigated using bootstrapped confidence intervals around full-form versus short-form comparisons. RESULTS Results for the two methods overlapped, suggesting that the CPT could be abbreviated to 57% of its original length and NBACK to 87% of its original length with the maximum acceptable loss of precision and minimum acceptable relationships with validity criteria. CONCLUSIONS This method combination shows promise for other test types, and the divergent CPT/NBACK results demonstrate the methods' ability to detect when a test should not be shortened. The methods should be used in combination because they emphasize complementary measurement qualities: precision and validity.
Affiliation(s)
- Tyler M. Moore: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA (corresponding author)
- Ellyn R. Butler: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- J. Cobb Scott: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; VISN4 Mental Illness Research, Education, and Clinical Center at the Philadelphia VA Medical Center, Philadelphia, PA 19104, USA
- Allison M. Port: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kosha Ruparel: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lucky J. Njokweni: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raquel E. Gur: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ruben C. Gur: Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; VISN4 Mental Illness Research, Education, and Clinical Center at the Philadelphia VA Medical Center, Philadelphia, PA 19104, USA
17
Sun X, Liu Y, Xin T, Song N. The Impact of Item Calibration Error on Variable-Length Cognitive Diagnostic Computerized Adaptive Testing. Front Psychol 2020; 11:575141. [PMID: 33343450] [PMCID: PMC7738350] [DOI: 10.3389/fpsyg.2020.575141]
Abstract
Calibration errors are inevitable and should not be ignored during the estimation of item parameters. Items with calibration error can affect the measurement results of tests. One purpose of the current study is to investigate the impact of calibration errors in item parameter estimates on measurement accuracy, average test length, and test efficiency for variable-length cognitive diagnostic computerized adaptive testing (CD-CAT). The other purpose is to examine methods for reducing the adverse effects of calibration errors. Simulation results show that (1) calibration error has a negative effect on measurement accuracy for the deterministic input, noisy "and" gate (DINA) model and the reduced reparameterized unified model; (2) the average test length is shorter, and test efficiency is overestimated, for items with calibration errors; (3) the compensatory reparameterized unified model (CRUM) is less affected by calibration errors, with classification accuracy, average test length, and test efficiency remaining relatively stable in the CRUM framework; and (4) measures such as improving item quality, using a large calibration sample to calibrate item parameters, and using cross-validation can reduce the adverse effects of calibration errors on CD-CAT.
Affiliation(s)
- Xiaojian Sun: School of Mathematics and Statistics, Southwest University, Chongqing, China; Southwest University Branch, Collaborative Innovation Center of Assessment for Basic Education Quality, Chongqing, China
- Yanlou Liu: China Academy of Big Data for Education, Qufu Normal University, Qufu, China
- Tao Xin: Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, China
- Naiqing Song: School of Mathematics and Statistics, Southwest University, Chongqing, China; Southwest University Branch, Collaborative Innovation Center of Assessment for Basic Education Quality, Chongqing, China
18
Escala de Desesperanza de Beck (BHS): ventajas de una administración adaptativa [Beck Hopelessness Scale (BHS): advantages of an adaptive administration]. REVISTA IBEROAMERICANA DE PSICOLOGÍA 2020. [DOI: 10.33881/2027-1786.rip.14106]
Abstract
The Beck Hopelessness Scale (BHS) is frequently used as a screening tool for detecting suicide risk and/or depression, even though its 20-item length makes it inefficient. This study examined whether an adaptive administration strategy could shorten BHS administration time. Participants were 783 individuals from the general population (50.9% women). A random 70% of cases was selected to calibrate the items with the two-parameter logistic model of Item Response Theory. Two items showing inadequate functioning were removed. The remaining 30% of the sample was used to simulate an adaptive administration of the 18 calibrated items. Two stopping rules were compared: (a) stopping after administering 9 items, and (b) stopping upon reaching an estimation error ≤ 0.35 or after administering 9 items (mixed criterion). Under both conditions, correlations of .95 were observed with the Hopelessness level estimated from all 18 items. However, stopping under the mixed criterion showed no additional gain in measurement efficiency. As with the 18-item version, the adaptive administrations estimated high trait levels more precisely. Adaptive measurement did not affect validity evidence when examining the trait's association with facets of Neuroticism and with symptom dimensions. It is concluded that a 9-item adaptive administration can considerably shorten the BHS without compromising the validity and reliability of the measure.
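The adaptive procedure simulated in this study can be sketched end to end: maximum-information item selection, EAP scoring on a grid, and the mixed stopping rule (standard error ≤ 0.35 or 9 items). This is a generic illustration, not the authors' code; the 2PL item bank, function names, and the standard-normal prior are assumptions.

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a keyed response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def adaptive_admin(bank, answer, max_items=9, se_stop=0.35):
    """EAP scoring on a grid + maximum-information selection + mixed
    stopping rule (stop when SE <= se_stop or after max_items items).
    bank: list of (a, b); answer(item_index) returns 1 or 0."""
    grid = [g / 10.0 for g in range(-40, 41)]        # theta grid on [-4, 4]
    post = [math.exp(-g * g / 2.0) for g in grid]    # N(0, 1) prior weights
    administered, theta, se = [], 0.0, float("inf")

    def info(i):                                     # Fisher info at current theta
        a, b = bank[i]
        p = p2pl(theta, a, b)
        return a * a * p * (1.0 - p)

    while len(administered) < max_items and se > se_stop:
        item = max((i for i in range(len(bank)) if i not in administered), key=info)
        x = answer(item)
        a, b = bank[item]
        post = [w * (p2pl(g, a, b) if x else 1.0 - p2pl(g, a, b))
                for w, g in zip(post, grid)]
        total = sum(post)
        theta = sum(w * g for w, g in zip(post, grid)) / total
        se = math.sqrt(sum(w * (g - theta) ** 2 for w, g in zip(post, grid)) / total)
        administered.append(item)
    return theta, se, administered
```

Because the loop exits on whichever condition triggers first, the test length is bounded by 9 items while precision can terminate it earlier, mirroring the mixed criterion evaluated in the abstract.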
19
Adaptive testing with the GGUM-RANK multidimensional forced choice model: Comparison of pair, triplet, and tetrad scoring. Behav Res Methods 2020; 52:761-772. [PMID: 31342469] [DOI: 10.3758/s13428-019-01274-6]
Abstract
Likert-type measures have been criticized in psychological assessment because they are vulnerable to response biases, including central tendency, acquiescence, leniency, halo, and socially desirable responding. As an alternative, multidimensional forced choice (MFC) testing has been proposed to address these concerns. A number of researchers have developed item response theory (IRT) models for MFC data and have examined latent trait estimation with tests of different dimensionality and length. Research has also explored the advantages of computerized adaptive testing (CAT) with MFC pair tests having as many as 25 dimensions, but there have been no published studies on CAT with MFC triplets or tetrads. Thus, in this research we aimed to address that issue. We used recently developed item information functions for an MFC ranking model to compare the benefits of CAT with MFC pair, triplet, and tetrad tests. A simulation study showed that CAT substantially outperformed nonadaptive testing for latent trait estimation across MFC formats. More importantly, CAT with MFC pairs provided estimation accuracy similar to or better than that from tests of equivalent numbers of nonadaptive MFC triplets. On the basis of these findings, implications and recommendations are further discussed for constructing MFC measures to use in psychological contexts.
20
Raudenbush SW, Hernandez M, Goldin-Meadow S, Carrazza C, Foley A, Leslie D, Sorkin JE, Levine SC. Longitudinally adaptive assessment and instruction increase numerical skills of preschool children. Proc Natl Acad Sci U S A 2020; 117:27945-27953. [PMID: 33106414] [PMCID: PMC7668039] [DOI: 10.1073/pnas.2002883117]
Abstract
Social inequality in mathematical skill is apparent at kindergarten entry and persists during elementary school. To level the playing field, we trained teachers to assess children's numerical and spatial skills every 10 wk. Each assessment provided teachers with information about a child's growth trajectory on each skill, information designed to help them evaluate their students' progress, reflect on past instruction, and strategize for the next phase of instruction. A key constraint is that teachers have limited time to assess individual students. To maximize the information provided by an assessment, we adapted the difficulty of each assessment based on each child's age and accumulated evidence about the child's skills. Children in classrooms of 24 trained teachers scored 0.29 SD higher on numerical skills at posttest than children in 25 randomly assigned control classrooms (P = 0.005). We observed no effect on spatial skills. The intervention also positively influenced children's verbal comprehension skills (0.28 SD higher at posttest, P < 0.001), but did not affect their print-literacy skills. We consider the potential contribution of this approach, in combination with similar regimes of assessment and instruction in elementary schools, to the reduction of social inequality in numerical skill and discuss possible explanations for the absence of an effect on spatial skills.
Affiliation(s)
- Stephen W Raudenbush: Department of Sociology, University of Chicago, Chicago, IL 60637; Harris School of Public Policy, University of Chicago, Chicago, IL 60637; Committee on Education, University of Chicago, Chicago, IL 60637
- Marc Hernandez: Department of Education and Child Development, National Opinion Research Center, University of Chicago, Chicago, IL 60637
- Susan Goldin-Meadow: Committee on Education, University of Chicago, Chicago, IL 60637; Department of Psychology, University of Chicago, Chicago, IL 60637
- Cristina Carrazza: Committee on Education, University of Chicago, Chicago, IL 60637; Department of Psychology, University of Chicago, Chicago, IL 60637
- Alana Foley: Committee on Education, University of Chicago, Chicago, IL 60637; Department of Psychology, University of Chicago, Chicago, IL 60637
- Debbie Leslie: UChicago STEM Education, University of Chicago, Chicago, IL 60637
- Janet E Sorkin: Committee on Education, University of Chicago, Chicago, IL 60637; Department of Psychology, University of Chicago, Chicago, IL 60637
- Susan C Levine: Committee on Education, University of Chicago, Chicago, IL 60637; Department of Psychology, University of Chicago, Chicago, IL 60637
21
Braeken J, Paap MCS. Making Fixed-Precision Between-Item Multidimensional Computerized Adaptive Tests Even Shorter by Reducing the Asymmetry Between Selection and Stopping Rules. APPLIED PSYCHOLOGICAL MEASUREMENT 2020; 44:531-547. [PMID: 34393302] [PMCID: PMC7495795] [DOI: 10.1177/0146621620932666]
Abstract
Fixed-precision between-item multidimensional computerized adaptive tests (MCATs) are becoming increasingly popular. The current generation of item-selection rules used in these types of MCATs typically optimize a single-valued objective criterion for multivariate precision (e.g., Fisher information volume). In contrast, when all dimensions are of interest, the stopping rule is typically defined in terms of a required fixed marginal precision per dimension. This asymmetry between multivariate precision for selection and marginal precision for stopping, which is not present in unidimensional computerized adaptive tests, has received little attention thus far. In this article, we will discuss this selection-stopping asymmetry and its consequences, and introduce and evaluate three alternative item-selection approaches. These alternatives are computationally inexpensive, easy to communicate and implement, and result in effective fixed-marginal-precision MCATs that are shorter in test length than with the current generation of item-selection approaches.
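The asymmetry discussed above can be illustrated with a minimal sketch: the stopping side checks a marginal standard error per dimension, and one simple way to align selection with it is to work on whichever dimension is furthest from its precision target. This is an illustration of the general idea, not the article's three specific item-selection approaches; it assumes a between-item structure in which each item informs a single dimension and ignores cross-dimension correlations.

```python
import math

def all_dims_precise(info_by_dim, se_target):
    """Fixed-marginal-precision stopping rule: stop only when the marginal
    standard error 1/sqrt(I_d) on every dimension is at or below target."""
    return all(i > 0 and 1.0 / math.sqrt(i) <= se_target for i in info_by_dim)

def next_dimension(info_by_dim, se_target):
    """Selection heuristic aligned with the stopping rule: pick the dimension
    currently furthest from its precision target (zero information counts as
    infinitely imprecise)."""
    def deficit(d):
        se = 1.0 / math.sqrt(info_by_dim[d]) if info_by_dim[d] > 0 else float("inf")
        return se - se_target
    return max(range(len(info_by_dim)), key=deficit)
```

Selecting toward the same marginal criterion that stops the test is the kind of symmetry that, per the abstract, can shorten fixed-precision MCATs.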
Affiliation(s)
- Muirne C. S. Paap: University of Groningen, The Netherlands; Oslo University Hospital, Norway
22
Zheng Y, Cheon H, Katz CM. Using Machine Learning Methods to Develop a Short Tree-Based Adaptive Classification Test: Case Study With a High-Dimensional Item Pool and Imbalanced Data. APPLIED PSYCHOLOGICAL MEASUREMENT 2020; 44:499-514. [PMID: 34565931] [PMCID: PMC7495791] [DOI: 10.1177/0146621620931198]
Abstract
This study explores advanced machine learning techniques to develop a short tree-based adaptive classification test from an existing lengthy instrument. A case study was carried out for an assessment of risk for juvenile delinquency. Two distinctive features of this case are that (a) the items in the original instrument measure a large number of distinct constructs, and (b) the target outcomes are of low prevalence, which renders the training data imbalanced. Due to the high dimensionality of the items, traditional item response theory (IRT)-based adaptive testing approaches may not work well, whereas decision trees, developed in the machine learning discipline, present a promising alternative for adaptive tests. A cross-validation study compared eight tree-based adaptive test constructions with five benchmark methods using data from a sample of 3,975 subjects. The findings reveal that the best-performing tree-based adaptive tests yielded better classification accuracy than the benchmark method of IRT scoring with optimal cutpoints, and comparable or better classification accuracy than the best benchmark method, random forest with balanced sampling. The competitive classification accuracy of the tree-based adaptive tests also comes with a more than 30-fold reduction in instrument length, administering only 3 to 6 items to any individual. This study suggests that tree-based adaptive tests have enormous potential for shortening instruments that measure a large variety of constructs.
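The mechanics of a tree-based adaptive test are simple: each internal node administers one item, the answer routes to a child node, and a leaf returns the classification, so test length is bounded by tree depth. The sketch below uses an invented, hand-written screener (item ids `q3`, `q5`, `q7`, `q12` and the risk labels are hypothetical) standing in for a tree that would in practice be learned from data.

```python
def administer(tree, respond):
    """Walk a decision-tree-style adaptive test. Internal nodes are dicts
    {"item": id, "yes": subtree, "no": subtree}; leaves are class labels.
    respond(item_id) returns True/False. Returns (label, items_asked)."""
    node, asked = tree, []
    while isinstance(node, dict):
        asked.append(node["item"])
        node = node["yes"] if respond(node["item"]) else node["no"]
    return node, asked

# Hypothetical depth-3 screener: at most 3 of the pool's items are ever asked.
tree = {"item": "q12",
        "yes": {"item": "q3",
                "yes": "high-risk",
                "no": {"item": "q7", "yes": "high-risk", "no": "low-risk"}},
        "no": {"item": "q5", "yes": "moderate-risk", "no": "low-risk"}}
```

Different respondents see different, short item sequences, which is how the study's trees administer only 3 to 6 items despite a high-dimensional pool.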
Affiliation(s)
- Yi Zheng: Arizona State University, Tempe, USA
23
Wang W, Song L, Ding S, Wang T, Gao P, Xiong J. A Semi-supervised Learning Method for Q-Matrix Specification Under the DINA and DINO Model With Independent Structure. Front Psychol 2020; 11:2120. [PMID: 33013538] [PMCID: PMC7511573] [DOI: 10.3389/fpsyg.2020.02120]
Abstract
Cognitive diagnosis assessment (CDA) can be regarded as a kind of formative assessment because it is intended to promote assessment for learning and to modify instruction and learning in classrooms by providing formative diagnostic information about students' cognitive strengths and weaknesses. Like statistical pattern recognition, CDA has two phases: feature generation followed by classification. A Q-matrix, which describes the relationship between items and latent skills, corresponds to the feature generation phase, and feature generation is of paramount importance in any pattern recognition task. In practice, the Q-matrix is difficult to specify correctly in cognitive diagnosis, and misspecification of the Q-matrix can seriously affect the accuracy of examinee classification. Based on the fact that any column of a reduced Q-matrix can be expressed by the columns of a reachability matrix R under the logical OR operation, a semi-supervised learning approach and an optimal design for examinee sampling were proposed for Q-matrix specification under the conjunctive and disjunctive models with independent structure. This method only requires subject matter experts to specify an R matrix corresponding to a small part of the test items; for the independent structure, the R matrix is an identity matrix. Simulation and real data analysis showed that the new method with the optimal design is promising in terms of correct recovery rates of q-entries.
Affiliation(s)
- Wenyi Wang: School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
- Lihong Song: Elementary Education College, Jiangxi Normal University, Nanchang, China (corresponding author)
- Shuliang Ding: School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
- Teng Wang: School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
- Peng Gao: School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
- Jian Xiong: School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
24
Wang D, Cai Y, Tu D. Q-Matrix Estimation Methods for Cognitive Diagnosis Models: Based on Partial Known Q-Matrix. MULTIVARIATE BEHAVIORAL RESEARCH 2020:1-13. [PMID: 32308032] [DOI: 10.1080/00273171.2020.1746901]
Abstract
Different from item response models that postulate a single underlying proficiency, cognitive diagnostic assessments (CDAs) can provide fine-grained diagnostic information about students' knowledge states to aid classroom instruction. In CDAs, a Q-matrix that associates each item in a test with the cognitive skills is required to infer students' knowledge states. In practice, Q-matrix specification is typically performed by domain experts, is affected by experts' subjective tendencies, and may to a large extent contain misspecifications. In addition, as the number of items increases, expert-based Q-matrix specification becomes time-consuming and costly. To address this concern, this paper proposed several approaches based on the likelihood ratio test to estimate the Q-matrix from a partially known Q-matrix and the response data; these approaches can be used with a wide class of cognitive diagnosis models (CDMs). The feasibility and effectiveness of the proposed methods were evaluated with simulated data generated under various conditions and with an application to real data. Results show that the new methods can estimate the Q-matrix correctly and outperform the existing method in most conditions.
Affiliation(s)
- Daxun Wang: School of Psychology, Jiangxi Normal University
- Yan Cai: School of Psychology, Jiangxi Normal University
- Dongbo Tu: School of Psychology, Jiangxi Normal University
|
25
|
Wang Y, Sun X, Chong W, Xin T. Attribute Discrimination Index-Based Method to Balance Attribute Coverage for Short-Length Cognitive Diagnostic Computerized Adaptive Testing. Front Psychol 2020; 11:224. [PMID: 32180747 PMCID: PMC7059599 DOI: 10.3389/fpsyg.2020.00224] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2019] [Accepted: 01/31/2020] [Indexed: 11/13/2022] Open
Abstract
We propose a new method that balances attribute coverage for short-length cognitive diagnostic computerized adaptive testing (CD-CAT). The new method uses the attribute discrimination index (ADI-based method) instead of the number of items that measure each attribute [the modified global discrimination index (MGDI)-based method] to balance attribute coverage, so the information that each attribute provides can be captured. A simulation study evaluated the performance of the new method, with the following results: (a) Compared with the uncontrolled attribute-coverage method, the new method produced a higher mastery pattern correct classification rate (PCCR) and attribute correct classification rate (ACCR) with both the posterior-weighted Kullback–Leibler (PWKL) and the modified PWKL (MPWKL) item selection methods. (b) For the equalization of ACCR (E-ACCR), the ADI-based method led to the best results, followed by the MGDI-based method; the uncontrolled method led to the worst results regardless of item selection method. (c) Both the ADI-based and MGDI-based methods produced acceptable examinee qualification rates regardless of item selection method, whereas rates were relatively low for the uncontrolled condition.
Affiliation(s)
- Yutong Wang
- Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, China
- Xiaojian Sun
- School of Mathematics and Statistics, Southwest University, Chongqing, China
- Weifeng Chong
- Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, China
- Tao Xin
- Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, China

26
Wyse AE, McBride JR. A Framework for Measuring the Amount of Adaptation of Rasch-based Computerized Adaptive Tests. JOURNAL OF EDUCATIONAL MEASUREMENT 2020. [DOI: 10.1111/jedm.12267] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
27
Xiong J, Ding S, Luo F, Luo Z. Online Calibration of Polytomous Items Under the Graded Response Model. Front Psychol 2020; 10:3085. [PMID: 32038427 PMCID: PMC6989429 DOI: 10.3389/fpsyg.2019.03085] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/03/2019] [Accepted: 12/30/2019] [Indexed: 12/01/2022] Open
Abstract
Computerized adaptive testing (CAT) is an efficient testing mode that allows each examinee to answer items appropriate to his or her latent trait level. The implementation of CAT requires a large-scale item pool, and the pool needs to be frequently replenished with new items to ensure test validity and security. Online calibration is a technique for calibrating the parameters of new items in CAT: new items are seeded among the operational items, and their parameters are estimated from examinees' responses to them. The most popular estimation methods include the one EM cycle method (OEM) and the multiple EM cycle method (MEM) under dichotomous item response theory models. This paper extends OEM and MEM to the graded response model (GRM), a popular model for polytomous data with ordered categories. Two simulation studies were carried out to explore online calibration under a variety of conditions, including calibration design, initial item parameter calculation methods, calibration methods, calibration sample size, and the number of categories. Results show that the calibration accuracy of the new items was acceptable but was affected by the interaction of several factors; conclusions are drawn accordingly.
Affiliation(s)
- Jianhua Xiong
- School of Psychology, Jiangxi Normal University, Nanchang, China; School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
- Shuliang Ding
- School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
- Fen Luo
- School of Psychology, Jiangxi Normal University, Nanchang, China; School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
- Zhaosheng Luo
- School of Psychology, Jiangxi Normal University, Nanchang, China

28
A method of Q-matrix validation for polytomous response cognitive diagnosis model based on relative fit statistics. ACTA PSYCHOLOGICA SINICA 2020. [DOI: 10.3724/sp.j.1041.2020.00093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
29
Choe EM, Chang HH. The Asymptotic Distribution of Average Test Overlap Rate in Computerized Adaptive Testing. PSYCHOMETRIKA 2019; 84:1129-1151. [PMID: 31264029 DOI: 10.1007/s11336-019-09674-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2018] [Revised: 06/10/2019] [Indexed: 06/09/2023]
Abstract
The average test overlap rate is often computed and reported as a measure of test security risk or item pool utilization of a computerized adaptive test (CAT). Despite the prevalent use of this sample statistic in both literature and operations, its sampling distribution has never been known nor studied in earnest. In response, a proof is presented for the asymptotic distribution of a linear transformation of the average test overlap rate in fixed-length CAT. The theoretical results enable the estimation of standard error and construction of confidence intervals. Moreover, a practical simulation study demonstrates the statistical comparison of average test overlap rates between two CAT designs with different exposure control methods.
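The statistic discussed above can be illustrated with a small sketch (the administrations are made up; this is not the paper's asymptotic derivation): the average test overlap rate is the mean, over all examinee pairs, of the proportion of items their two fixed-length tests share.

```python
from itertools import combinations

def average_test_overlap(tests, test_length):
    """Mean proportion of common items across all pairs of administered tests."""
    pairs = list(combinations(tests, 2))
    shared = [len(set(a) & set(b)) / test_length for a, b in pairs]
    return sum(shared) / len(pairs)

# Hypothetical fixed-length (4-item) administrations for three examinees.
tests = [
    [1, 2, 3, 4],
    [1, 2, 5, 6],
    [3, 4, 5, 6],
]
print(average_test_overlap(tests, 4))  # (2/4 + 2/4 + 2/4) / 3 = 0.5
```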
Affiliation(s)
- Edison M Choe
- Graduate Management Admission Council™ (GMAC™), 11921 Freedom Drive, Suite 300, Reston, VA 20190, USA
- Hua-Hua Chang
- Purdue University, 100 N. University Street, West Lafayette, IN 47907, USA

30
Wang W, Song L, Chen P, Ding S. An Item-Level Expected Classification Accuracy and Its Applications in Cognitive Diagnostic Assessment. JOURNAL OF EDUCATIONAL MEASUREMENT 2019. [DOI: 10.1111/jedm.12200] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
31
Collares CF, Cecilio-Fernandes D. When I say … computerised adaptive testing. MEDICAL EDUCATION 2019; 53:115-116. [PMID: 30125393 DOI: 10.1111/medu.13648] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/13/2018] [Revised: 04/26/2018] [Accepted: 05/30/2018] [Indexed: 06/08/2023]
Affiliation(s)
- Carlos Fernando Collares
- Department of Educational Development and Research, School of Health Professions Education, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, the Netherlands
- European Board of Medical Assessors, Maastricht, the Netherlands
- Dario Cecilio-Fernandes
- Centre for Education Development and Research in Health Professions (CEDAR), University Medical Centre Groningen, University of Groningen, Groningen, the Netherlands
- Werkgroep Interuniversitaire Voortgangstoets Geneeskunde, the Netherlands

32
Wang W, Kingston N. Adaptive Testing With a Hierarchical Item Response Theory Model. APPLIED PSYCHOLOGICAL MEASUREMENT 2019; 43:51-67. [PMID: 30573934 PMCID: PMC6297916 DOI: 10.1177/0146621618765714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The hierarchical item response theory (H-IRT) model is very flexible and allows a general factor and subfactors within an overall structure of two or more levels. When an H-IRT model with a large number of dimensions is used for an adaptive test, the computational burden associated with interim scoring and selection of subsequent items is heavy. An alternative approach for any high-dimension adaptive test is to reduce dimensionality for interim scoring and item selection and then revert to full dimensionality for final score reporting, thereby significantly reducing the computational burden. This study compared the accuracy and efficiency of final scoring for multidimensional, local multidimensional, and unidimensional item selection and interim scoring methods, using both simulated and real item pools. The simulation study was conducted under 10 conditions (i.e., five test lengths and two H-IRT models) with a simulated sample of 10,000 students. The study with the real item pool was conducted using item parameters from an actual 45-item adaptive test with a simulated sample of 10,000 students. Results indicate that the theta estimates provided by the local multidimensional and unidimensional item selection and interim scoring methods were nearly as accurate as those provided by the multidimensional item selection and interim scoring method, especially in the real item pool study. In addition, the multidimensional method required the longest computation time and the unidimensional method required the shortest computation time.
33
Wang W, Song L, Ding S, Meng Y, Cao C, Jie Y. An EM-Based Method for Q-Matrix Validation. APPLIED PSYCHOLOGICAL MEASUREMENT 2018; 42:446-459. [PMID: 30787487 PMCID: PMC6373855 DOI: 10.1177/0146621617752991] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 05/31/2023]
Abstract
To assist subject matter experts in specifying their Q-matrices, the authors used an expectation-maximization (EM)-based algorithm to investigate three alternative Q-matrix validation methods, namely, the maximum likelihood estimation (MLE), the marginal maximum likelihood estimation (MMLE), and the intersection and difference (ID) method. Their efficiency was compared, respectively, with that of the sequential EM-based δ method and its extension (ς2), the γ method, and the nonparametric method in terms of correct recovery rate, true negative rate, and true positive rate under the deterministic-inputs, noisy "and" gate (DINA) model and the reduced reparameterized unified model (rRUM). Simulation results showed that for the rRUM, the MLE performed better for low-quality tests, whereas the MMLE worked better for high-quality tests. For the DINA model, the ID method tended to produce better quality Q-matrix estimates than other methods for large sample sizes (i.e., 500 or 1,000). In addition, the Q-matrix was more precisely estimated under the discrete uniform distribution than under the multivariate normal threshold model for all the above methods. On average, the ς2 and ID methods, with higher true negative rates, are better for correcting misspecified Q-entries, whereas the MLE, with higher true positive rates, is better for retaining correct Q-entries. Results on a real data set confirmed the effectiveness of the MLE.
Affiliation(s)
- Wenyi Wang
- Jiangxi Normal University, Jiangxi, China
- Yaru Meng
- Xi’an Jiaotong University, Shaanxi, China
- Canxi Cao
- University of Illinois at Urbana–Champaign, IL, USA
- Yongjing Jie
- University of Illinois at Urbana–Champaign, IL, USA

34
Choe EM, Zhang J, Chang HH. Sequential Detection of Compromised Items Using Response Times in Computerized Adaptive Testing. PSYCHOMETRIKA 2018; 83:650-673. [PMID: 29168039 DOI: 10.1007/s11336-017-9596-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/11/2017] [Indexed: 05/28/2023]
Abstract
Item compromise persists in undermining the integrity of testing, even secure administrations of computerized adaptive testing (CAT) with sophisticated item exposure controls. In ongoing efforts to tackle this perennial security issue in CAT, a couple of recent studies investigated sequential procedures for detecting compromised items, in which a significant increase in the proportion of correct responses for each item in the pool is monitored in real time using moving averages. In addition to actual responses, response times are valuable information with tremendous potential to reveal items that may have been leaked. Specifically, examinees that have preknowledge of an item would likely respond more quickly to it than those who do not. Therefore, the current study proposes several augmented methods for the detection of compromised items, all involving simultaneous monitoring of changes in both the proportion correct and average response time for every item using various moving average strategies. Simulation results with an operational item pool indicate that, compared to the analysis of responses alone, utilizing response times can afford marked improvements in detection power with fewer false positives.
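The monitoring idea described above, tracking moving averages of both proportion correct and response time per item, can be sketched roughly as follows. This is an illustrative simplification, not the paper's sequential procedure; the window size and flagging thresholds are invented.

```python
from collections import deque

def make_item_monitor(window=30, p_jump=0.15, rt_drop=0.25):
    """Track moving averages of correctness and response time for one item.

    Flags the item when, relative to a baseline (the first full window),
    the moving proportion correct rises by more than `p_jump` AND the
    moving mean response time falls by more than the fraction `rt_drop`.
    Both thresholds are hypothetical, chosen only for illustration.
    """
    correct = deque(maxlen=window)
    times = deque(maxlen=window)
    baseline = {}

    def observe(is_correct, response_time):
        correct.append(int(is_correct))
        times.append(response_time)
        if len(correct) < window:
            return False                      # not enough data yet
        p = sum(correct) / window
        rt = sum(times) / window
        if not baseline:                      # freeze the first full window
            baseline.update(p=p, rt=rt)
            return False
        return (p - baseline["p"] > p_jump and
                rt < baseline["rt"] * (1 - rt_drop))

    return observe
```

Feeding a stream of ordinary responses establishes the baseline; a later run of fast, correct responses (the signature of preknowledge) trips both conditions and flags the item.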
Affiliation(s)
- Edison M Choe
- Graduate Management Admission Council® (GMAC®), 11921 Freedom Drive, Suite 300, Reston, VA 20190, USA
- Jinming Zhang
- University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
- Hua-Hua Chang
- University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA

35
Bolt DM, Kim JS. Parameter Invariance and Skill Attribute Continuity in the DINA Model. JOURNAL OF EDUCATIONAL MEASUREMENT 2018. [DOI: 10.1111/jedm.12175] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
36
Moore TM, Calkins ME, Reise SP, Gur RC, Gur RE. Development and public release of a computerized adaptive (CAT) version of the Schizotypal Personality Questionnaire. Psychiatry Res 2018; 263:250-256. [PMID: 29625786 PMCID: PMC5911247 DOI: 10.1016/j.psychres.2018.02.022] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/14/2017] [Revised: 02/02/2018] [Accepted: 02/02/2018] [Indexed: 12/15/2022]
Abstract
One of the most widely used measures of psychosis-related symptoms and characteristics is the 74-item Schizotypal Personality Questionnaire (SPQ). Using multidimensional Item Response Theory (bifactor 2-parameter model), we calibrated SPQ items in a sample of 375 youths aged 9-24 years and constructed a fully functional computerized adaptive form of the SPQ on an open-source platform for public use. To assess validity, we used the above parameters to simulate CAT sessions in a separate validation sample (N = 100) using three test-length-based stopping rules: 8 items, 16 items, and 32 items. Those scores were then compared to full-form and SPQ-Brief scores in their ability to predict psychosis or clinical risk status. Areas under the receiver operating characteristic curves indicated mediocre predictive ability but did not differ among any of the forms, even when only eight adaptive items were administered. The Youden index for the 16-item adaptive version was higher than that for the 22-item SPQ-Brief. Classification accuracy for the full SPQ was 73%, compared with 66% for both the SPQ-Brief and the adaptive versions (averaged over the three stopping rules). The SPQ-CAT shows promise as a much shorter but valid assessment of schizotypy that can save time with minimal loss of information.
Affiliation(s)
- Tyler M. Moore
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA. Correspondence to: Tyler M. Moore, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce St. – 10 Floor Gates Pavilion, Philadelphia, PA 19104.
- Monica E. Calkins
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Steven P. Reise
- Department of Psychology, University of California, Los Angeles, CA 90095, USA
- Ruben C. Gur
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- VISN4 Mental Illness Research, Education, and Clinical Center at the Philadelphia VA Medical Center, Philadelphia, PA 19104, USA
- Raquel E. Gur
- Department of Psychiatry, Brain Behavior Laboratory, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

37
Zheng C, Wang C. Application of Binary Searching for Item Exposure Control in Cognitive Diagnostic Computerized Adaptive Testing. APPLIED PSYCHOLOGICAL MEASUREMENT 2017; 41:561-576. [PMID: 29881106 PMCID: PMC5978473 DOI: 10.1177/0146621617707509] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/08/2023]
Abstract
Cognitive diagnosis has emerged as a new generation of testing theory for educational assessment after item response theory (IRT). One distinct feature of cognitive diagnostic models (CDMs) is that they assume the latent trait to be discrete rather than continuous as in IRT. From this perspective, cognitive diagnosis bears a close resemblance to searching problems in computer science and, similarly, the item selection problem in cognitive diagnostic computerized adaptive testing (CD-CAT) can be considered a dynamic searching problem. Previously, item selection algorithms in CD-CAT were developed from information indices in information science and attempted to achieve a balance among several objectives by assigning different weights. As a result, they suffered from low efficiency due to a tug-of-war competition among multiple goals in item selection and, at the same time, placed on users the undue burden of assigning weights to these goals by trial and error. Based on the searching-problem perspective on CD-CAT, this article adapts the binary searching algorithm, one of the best-known searching algorithms, to item selection in CD-CAT. The two new methods, the stratified dynamic binary searching (SDBS) algorithm for fixed-length CD-CAT and the dynamic binary searching (DBS) algorithm for variable-length CD-CAT, achieve multiple goals without any of the aforementioned issues. Simulation studies indicate that their performance is comparable or superior to that of previous methods.
Affiliation(s)
- Chun Wang
- University of Minnesota, Minneapolis, MN, USA

38
He Y, Chen P, Li Y, Zhang S. A New Online Calibration Method Based on Lord's Bias-Correction. APPLIED PSYCHOLOGICAL MEASUREMENT 2017; 41:456-471. [PMID: 29882532 PMCID: PMC5978521 DOI: 10.1177/0146621617697958] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
The online calibration technique has been widely employed to calibrate new items due to its advantages. Method A is the simplest online calibration method and has recently attracted much attention from researchers. However, a key assumption of Method A is that it treats person-parameter estimates θ̂ (obtained by maximum likelihood estimation [MLE]) as their true values θ; thus, the deviation of the estimated θ̂ from the true values might yield inaccurate item calibration when the deviation is nonignorable. To improve the performance of Method A, a new method, MLE-LBCI-Method A, is proposed. This new method combines a modified Lord's bias-correction method (named maximum likelihood estimation-Lord's bias-correction with iteration [MLE-LBCI]) with the original Method A in an effort to correct the deviation of θ̂, which may adversely affect item calibration precision. Two simulation studies were carried out to explore the performance of both MLE-LBCI and MLE-LBCI-Method A under several scenarios. Simulation results showed that MLE-LBCI could significantly improve the ML ability estimates, and MLE-LBCI-Method A did outperform Method A in almost all experimental conditions.
Affiliation(s)
- Yong Li
- Beijing Normal University, China

39
Park R, Kim J, Chung H, Dodd BG. The Development of MST Test Information for the Prediction of Test Performances. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2017; 77:570-586. [PMID: 30034020 PMCID: PMC5991792 DOI: 10.1177/0013164416662960] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/08/2023]
Abstract
The current study proposes novel methods to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to the simulation results to demonstrate the validity of the proposed method in both measurement precision and classification accuracy. The results indicate that the MST test information effectively predicted the performance of MST. In addition, the results of the current study highlighted the relationship among the test construction, MST design factors, and MST performance.
Affiliation(s)
- Jiseon Kim
- University of Washington, Seattle, WA, USA
- Hyewon Chung
- Chungnam National University, Daejeon, South Korea

40
El-Alfy ESM. Evaluation of sequential adaptive testing with real-data simulation: A case study. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2017. [DOI: 10.3233/jifs-169241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
41
Wang C, Song T, Wang Z, Wolfe E. Essay Selection Methods for Adaptive Rater Monitoring. APPLIED PSYCHOLOGICAL MEASUREMENT 2017; 41:60-79. [PMID: 29881078 PMCID: PMC5978486 DOI: 10.1177/0146621616672855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Constructed-response items are commonly used in educational and psychological testing, and the answers to those items are typically scored by human raters. In current rater monitoring processes, validity scoring is used to ensure that the scores assigned by raters do not deviate severely from the standards of rating quality. In this article, an adaptive rater monitoring approach that may potentially improve the efficiency of current rater monitoring practice is proposed. Based on the Rasch partial credit model and known developments in multidimensional computerized adaptive testing, two essay selection methods, namely the D-optimal method and the Single Fisher information method, are proposed. These two methods select the most appropriate essays based on what is already known about a rater's performance. Simulation studies, using a simulated essay bank and a cloned real essay bank, show that the proposed adaptive rater monitoring methods can recover rater parameters with far fewer essay questions. Future challenges and potential solutions are discussed at the end.
Affiliation(s)
- Chun Wang
- University of Minnesota, Minneapolis, MN, USA
- Edward Wolfe
- Educational Testing Service (ETS), Princeton, NJ, USA

42
Mahalingam V, Palkovics M, Kosinski M, Cek I, Stillwell D. A Computer Adaptive Measure of Delay Discounting. Assessment 2016; 25:1036-1055. [PMID: 27886981 DOI: 10.1177/1073191116680448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Delay discounting has been linked to important behavioral, health, and social outcomes, including academic achievement, social functioning, and substance use, but thoroughly measuring delay discounting is tedious and time consuming. We develop and validate an efficient and psychometrically sound computer adaptive measure of discounting. First, we develop a binary search-type algorithm to measure discounting using a large international data set of 4,190 participants. Using six independent samples (N = 1,550), we then present evidence of concurrent validity with two standard measures of discounting and a measure of discounting real rewards; convergent validity with addictive behavior, impulsivity, personality, and survival probability; and divergent validity with time perspective, life satisfaction, age, and gender. The new measure is considerably shorter than standard questionnaires, includes a range of time delays, can be applied to multiple reward magnitudes, and shows excellent concurrent, convergent, divergent, and discriminant validity, for example by being more sensitive to the effects of smoking behavior on discounting.
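A binary search-type measurement of discounting, in the generic spirit of the abstract above (not the authors' exact algorithm), can be sketched as follows: bisect the immediate amount until the respondent is indifferent, then recover the discount rate from the hyperbolic model V = A/(1 + kD). The amounts, delay, and simulated respondent below are all hypothetical.

```python
def indifference_point(choose_immediate, delayed_amount, lo=0.0, hi=None, iters=10):
    """Bisect the immediate amount until the chooser is indifferent.

    `choose_immediate(amount)` returns True if the (hypothetical) respondent
    prefers `amount` now over `delayed_amount` later. Each answer halves the
    search interval, so ~10 questions pin down the indifference point.
    """
    hi = delayed_amount if hi is None else hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if choose_immediate(mid):
            hi = mid        # immediate offer too attractive: offer less
        else:
            lo = mid        # delayed reward preferred: offer more
    return (lo + hi) / 2

def hyperbolic_k(indiff, delayed_amount, delay_days):
    """Discount rate k from the hyperbolic model V = A / (1 + k * D)."""
    return (delayed_amount / indiff - 1) / delay_days

# Simulated respondent who discounts $100 at k = 0.01 over a 30-day delay:
# subjective value = 100 / (1 + 0.01 * 30) ≈ 76.9
subject = lambda amount: amount > 100 / (1 + 0.01 * 30)
ip = indifference_point(subject, delayed_amount=100, iters=20)
print(round(hyperbolic_k(ip, 100, 30), 3))  # recovers k ≈ 0.01
```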
Affiliation(s)
- Iva Cek
- University of Cambridge, Cambridge, UK

43
Kang HA, Chang HH. Parameter Drift Detection in Multidimensional Computerized Adaptive Testing Based on Informational Distance/Divergence Measures. APPLIED PSYCHOLOGICAL MEASUREMENT 2016; 40:534-550. [PMID: 29881068 PMCID: PMC5978631 DOI: 10.1177/0146621616663676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
An informational distance/divergence-based approach is proposed to detect the presence of parameter drift in multidimensional computerized adaptive testing (MCAT). The study presents significance testing procedures for identifying changes in multidimensional item response functions (MIRFs) over time based on informational distance/divergence measures that capture the discrepancy between two probability functions. To approximate the MIRFs from the observed response data, the k-nearest neighbors algorithm is used with the random search method. A simulation study suggests that the distance/divergence-based drift measures perform effectively in identifying the instances of parameter drift in MCAT. They showed moderate power with small samples of 500 examinees and excellent power when the sample size was as large as 1,000. The proposed drift measures also adequately controlled for Type I error at the nominal level under the null hypothesis.
44
Zheng Y. Online Calibration of Polytomous Items Under the Generalized Partial Credit Model. APPLIED PSYCHOLOGICAL MEASUREMENT 2016; 40:434-450. [PMID: 29881063 PMCID: PMC5978499 DOI: 10.1177/0146621616650406] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
Online calibration is a technology-enhanced architecture for item calibration in computerized adaptive tests (CATs). Many CATs are administered continuously over a long term and rely on large item banks. To ensure test validity, these item banks need to be frequently replenished with new items, and these new items need to be pretested before being used operationally. Online calibration dynamically embeds pretest items in operational tests and calibrates their parameters as response data are gradually obtained through continuous test administration. This study extends existing formulas, procedures, and algorithms for dichotomous item response theory models to the generalized partial credit model, a popular model for items scored in more than two categories. A simulation study investigated the developed algorithms and procedures under a variety of conditions, including two estimation algorithms, three pretest item selection methods, three seeding locations, two numbers of score categories, and three calibration sample sizes. Results demonstrated acceptable estimation accuracy of the two estimation algorithms under some of the simulated conditions. A variety of findings were also revealed regarding the interaction effects of the included factors, and corresponding recommendations were made.
Affiliation(s)
- Yi Zheng
- Arizona State University, Tempe, USA

45
Su YH. A Comparison of Constrained Item Selection Methods in Multidimensional Computerized Adaptive Testing. APPLIED PSYCHOLOGICAL MEASUREMENT 2016; 40:346-360. [PMID: 29881058 PMCID: PMC5978576 DOI: 10.1177/0146621616639305] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/14/2023]
Abstract
The construction of assessments in computerized adaptive testing (CAT) usually involves fulfilling a large number of statistical and non-statistical constraints to meet test specifications. To improve measurement precision and test validity, the multidimensional priority index (MPI) and the modified MPI (MMPI) can be used to monitor many constraints simultaneously under a between-item and a within-item multidimensional framework, respectively. As both item selection methods can be implemented easily and computed efficiently, they are important and useful for operational CATs; however, no thorough simulation study has compared the performance of these two item selection methods under two different item bank structures. The purpose of this study was to investigate the efficiency of the MMPI and the MPI item selection methods under the between-item and within-item multidimensional CAT through simulations. The MMPI and the MPI item selection methods yielded similar performance in measurement precision for both multidimensional pools and yielded similar performance in exposure control and constraint management for the between-item multidimensional pool. For the within-item multidimensional pool, the MPI method yielded slightly better performance in exposure control but yielded slightly worse performance in constraint management than the MMPI method.
Affiliation(s)
- Ya-Hui Su
- National Chung Cheng University, Chiayi, Taiwan

46
Wang S, Zheng Y, Zheng C, Su YH, Li P. An Automated Test Assembly Design for a Large-Scale Chinese Proficiency Test. APPLIED PSYCHOLOGICAL MEASUREMENT 2016; 40:233-237. [PMID: 29881050 PMCID: PMC5978481 DOI: 10.1177/0146621616628503] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/08/2023]
Affiliation(s)
- Shiyu Wang
- University of Illinois at Urbana–Champaign, USA
- Yi Zheng
- Arizona State University, Tempe, USA
- Ya-Hui Su
- National Chung Cheng University, Chia-yi, Taiwan
- Peize Li
- Chinese Testing International Co., Ltd., Beijing, China
- Tsinghua University, Beijing, China
47
Wang S, Lin H, Chang HH, Douglas J. Hybrid Computerized Adaptive Testing: From Group Sequential Design to Fully Sequential Design. JOURNAL OF EDUCATIONAL MEASUREMENT 2016. [DOI: 10.1111/jedm.12100] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Shiyu Wang
- University of Illinois at Urbana-Champaign
48
Wang W, Song L, Chen P, Meng Y, Ding S. Attribute-Level and Pattern-Level Classification Consistency and Accuracy Indices for Cognitive Diagnostic Assessment. JOURNAL OF EDUCATIONAL MEASUREMENT 2015. [DOI: 10.1111/jedm.12096] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
49
Zheng Y, Chang HH. On-the-Fly Assembled Multistage Adaptive Testing. APPLIED PSYCHOLOGICAL MEASUREMENT 2015; 39:104-118. [PMID: 29880996 PMCID: PMC5978506 DOI: 10.1177/0146621614544519] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Recently, multistage testing (MST) has been adopted by several important large-scale testing programs and has become popular among practitioners and researchers. Building on decades of computerized adaptive testing (CAT) research, the rapidly growing MST alleviates several major problems of earlier CAT applications; nevertheless, it is only one of many possible solutions to those problems. This article presents a new adaptive testing design, "on-the-fly assembled multistage adaptive testing" (OMST), which combines the benefits of CAT and MST, offsets their limitations, and offers some unique advantages over both. A simulation study comparing OMST with MST and CAT demonstrated the promising features of OMST. Finally, the "Discussion" section suggests possible future adaptive testing designs based on the OMST framework, which could provide great flexibility for adaptive tests in the digital future and open an avenue for hybrid designs based on the different needs of specific tests.
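The stage-by-stage logic OMST is built on — assemble a short module at the current ability estimate, administer it, re-estimate, repeat — can be sketched as follows. This is a minimal illustration under assumed simplifications (2PL items, information-only module assembly, EAP scoring on a grid with a standard-normal prior); the function names and details are hypothetical, not the authors' implementation:

```python
import numpy as np

def p_2pl(a, b, theta):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def assemble_stage(theta, a, b, used, stage_len):
    """Assemble one module on the fly: the stage_len most
    informative unused items at the current ability estimate."""
    p = p_2pl(a, b, theta)
    info = a**2 * p * (1.0 - p)
    if used:
        info[list(used)] = -np.inf  # exclude already-administered items
    return [int(i) for i in np.argsort(info)[::-1][:stage_len]]

def eap_estimate(a, b, item_ids, responses, grid=None):
    """Posterior-mean (EAP) ability estimate under a N(0,1) prior."""
    if grid is None:
        grid = np.linspace(-4.0, 4.0, 81)
    post = np.exp(-0.5 * grid**2)  # unnormalized standard-normal prior
    for i, u in zip(item_ids, responses):
        p = p_2pl(a[i], b[i], grid)
        post *= p if u else (1.0 - p)
    return float(np.sum(grid * post) / np.sum(post))
```

A full OMST would also impose content constraints and exposure control when assembling each module; only the stage-wise adaptation that distinguishes it from item-level CAT and preassembled MST is shown here.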