1
Giordano A, Testa S, Bassi M, Cilia S, Bertolotto A, Quartuccio ME, Pietrolongo E, Falautano M, Grobberio M, Niccolai C, Allegri B, Viterbo RG, Confalonieri P, Giovannetti AM, Cocco E, Grasso MG, Lugaresi A, Ferriani E, Nocentini U, Zaffaroni M, De Livera A, Jelinek G, Solari A, Rosato R. Applying multidimensional computerized adaptive testing to the MSQOL-54: a simulation study. Health Qual Life Outcomes 2023; 21:61. [PMID: 37357308 DOI: 10.1186/s12955-023-02152-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Accepted: 06/15/2023] [Indexed: 06/27/2023]
Abstract
BACKGROUND The Multiple Sclerosis Quality of Life-54 (MSQOL-54) is one of the most commonly used MS-specific health-related quality of life (HRQOL) measures. It is a multidimensional, MS-specific HRQOL inventory that includes the generic SF-36 core items, supplemented with 18 MS-targeted items. An adaptive short version providing immediate item scoring may improve the instrument's usability and validity. However, multidimensional computerized adaptive testing (MCAT) has not previously been applied to MSQOL-54 items. We therefore aimed to apply MCAT to the MSQOL-54 and assess its performance. METHODS Responses from a large international sample of 3669 MS patients were assessed. We calibrated 52 (of the 54) items using a bifactor graded response model (10 group factors and one general HRQOL factor). Then, eight simulations were run with different termination criteria: standard errors (SE) for the general and group factors set to different values, and change in factor estimates from one item to the next set at < 0.01 for both the general and the group factors. Performance of the MCAT was assessed by the number of administered items, root mean square difference (RMSD), and correlation. RESULTS Eight items were removed due to local dependency. The simulation with SE set to 0.32 (general factor) and no SE thresholds (group factors) provided satisfactory performance: the median number of administered items was 24, RMSD was 0.32, and correlation was 0.94. CONCLUSIONS Compared to the full-length MSQOL-54, the simulated MCAT required fewer items without losing precision for the general HRQOL factor. Further work is needed to add, integrate, or revise MSQOL-54 items so that calibration and MCAT performance are also efficient for the group factors, allowing the MCAT version to be used in clinical practice and research.
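The MCAT above is judged partly by RMSD and correlation. Below is a minimal sketch of those two metrics. It assumes that each simulated respondent's MCAT estimate of the general factor is compared with a reference estimate, for example the score from the full calibrated item set; the abstract does not spell out the reference. The function names and toy numbers are hypothetical and illustrative only.

```python
import math
from statistics import median

def rmsd(estimates, reference):
    """Root mean square difference between paired factor-score estimates."""
    if not estimates or len(estimates) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    sq = [(e - r) ** 2 for e, r in zip(estimates, reference)]
    return math.sqrt(sum(sq) / len(sq))

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: general-factor estimates for five simulees.
full_length = [-1.2, -0.4, 0.0, 0.7, 1.5]
mcat = [-1.0, -0.6, 0.2, 0.5, 1.3]
items_used = [22, 25, 24, 30, 19]  # items administered per simulee

print(rmsd(mcat, full_length), pearson(mcat, full_length), median(items_used))
```

The median is reported alongside RMSD and correlation because the abstract summarizes test length as a median number of administered items.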
Affiliation(s)
- Andrea Giordano
- Unit of Neuroepidemiology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Via Celoria 11, Milan, 20133, Italy
- Department of Psychology, University of Turin, Turin, Italy
- Silvia Testa
- Department of Human and Social Sciences, University of Aosta Valley, Aosta, Italy
- Marta Bassi
- Department of Biomedical and Clinical Sciences, Università di Milano, Milan, Italy
- Sabina Cilia
- Department of Territorial Activities, Azienda Sanitaria Provinciale, Health District, Catania, Italy
- Antonio Bertolotto
- Neurology Unit & Regional Referral Multiple Sclerosis Centre (CReSM), University Hospital San Luigi Gonzaga, Orbassano, Italy
- Erika Pietrolongo
- Department of Neurosciences, Imaging and Clinical Sciences, University G. d'Annunzio, Chieti, Italy
- Monica Falautano
- Psychological Service - Neurological and Neurological Rehabilitation Units, IRCCS San Raffaele, Milan, Italy
- Monica Grobberio
- Laboratory of Clinical Neuropsychology, Psychology Unit, ASST Lariana, Como, Italy
- Beatrice Allegri
- Multiple Sclerosis Center, Neurology Unit, Hospital of Vaio, Fidenza, Italy
- Paolo Confalonieri
- Multiple Sclerosis Center, Unit of Neuroimmunology and Neuromuscular Diseases, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Ambra Mara Giovannetti
- Unit of Neuroepidemiology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Via Celoria 11, Milan, 20133, Italy
- Multiple Sclerosis Center, Unit of Neuroimmunology and Neuromuscular Diseases, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy
- Eleonora Cocco
- Department of Medical Science and Public Health, University of Cagliari, Cagliari, Italy
- Multiple Sclerosis Center, ASL Cagliari, ATS Sardegna, Cagliari, Italy
- Alessandra Lugaresi
- Dipartimento di Scienze Biomediche e Neuromotorie, Università di Bologna, Bologna, Italy
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy
- Elisa Ferriani
- UOC Psicologia Ospedaliera, AUSL di Bologna, Bologna, Italy
- Ugo Nocentini
- Department of Clinical Sciences and Translational Medicine, University of Rome "Tor Vergata", Rome, Italy
- Behavioral Neuropsychology Laboratory, IRCCS S. Lucia Foundation, Rome, Italy
- Mauro Zaffaroni
- Neurologia ad indirizzo Neuroimmunologico - Centro Sclerosi Multipla, Ospedale di Gallarate - ASST della Valle Olona, Gallarate, Italy
- Alysha De Livera
- Mathematics and Statistics, La Trobe University, Melbourne, Australia
- Neuroepidemiology Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia
- George Jelinek
- Neuroepidemiology Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Australia
- Alessandra Solari
- Unit of Neuroepidemiology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Via Celoria 11, Milan, 20133, Italy
- Rosalba Rosato
- Department of Psychology, University of Turin, Turin, Italy
2
Schurr T, Loth F, Lidington E, Piccinin C, Arraras JI, Groenvold M, Holzner B, van Leeuwen M, Petersen MA, Schmidt H, Young T, Giesinger JM. Patient-reported outcome measures for physical function in cancer patients: content comparison of the EORTC CAT Core, EORTC QLQ-C30, SF-36, FACT-G, and PROMIS measures using the International Classification of Functioning, Disability and Health. BMC Med Res Methodol 2023; 23:21. [PMID: 36681808 PMCID: PMC9862545 DOI: 10.1186/s12874-022-01826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Accepted: 12/20/2022] [Indexed: 01/22/2023]
Abstract
BACKGROUND Patient-reported physical function (PF) is a key endpoint in cancer clinical trials. Using complex statistical methods, common metrics have been developed to compare scores from different patient-reported outcome (PRO) measures, but such methods do not account for possible differences in questionnaire content. Therefore, the aim of our study was a content comparison of frequently used PRO measures for PF in cancer patients. METHODS Relying on the framework of the International Classification of Functioning, Disability and Health (ICF), we categorized the item content of the physical domains of the following measures: EORTC CAT Core, EORTC QLQ-C30, SF-36, PROMIS Cancer Item Bank for Physical Function, PROMIS Short Form for Physical Function 20a, and the FACT-G. Item content was linked to ICF categories by two independent reviewers. RESULTS The 118 items investigated were assigned to 3 components ('d - Activities and Participation', 'b - Body Functions', and 'e - Environmental Factors') and 11 first-level ICF categories. All but one of the PF items of the EORTC measures were assigned to the first-level ICF categories 'd4 - Mobility' and 'd5 - Self-care', both within the component 'd - Activities and Participation'. The SF-36 additionally included item content related to 'd9 - Community, social and civic life', and the PROMIS Short Form for Physical Function 20a also included content related to 'd6 - Domestic life'. The PROMIS Cancer Item Bank (v1.1) additionally covered two first-level categories within the component 'b - Body Functions'. The FACT-G Physical Well-being scale was the most diverse scale, with some item content not covered by the ICF framework. DISCUSSION Our results provide information about conceptual differences between common PRO measures for the assessment of PF in cancer patients. 
Our results complement quantitative information on psychometric characteristics of these measures and provide a better understanding of the possibilities of establishing common metrics.
Affiliation(s)
- T Schurr
- Department of Psychiatry, Psychotherapy, Psychosomatics, and Medical Psychology, University Hospital of Psychiatry I, Innsbruck Medical University, Anichstraße 35, A-6020 Innsbruck, Austria
- F Loth
- Professorship for Psychological Diagnostics and Intervention Psychology, Faculty of Philosophy and Education, Catholic University of Eichstätt-Ingolstadt, Ostenstraße 25, 85072 Eichstätt, Germany
- E Lidington
- Cancer Behavioural Science Unit, King’s College London, Guy’s Hospital, St Thomas Street, London, SE1 9RT UK
- C Piccinin
- Quality of Life Department, EORTC, Avenue E. Mounier, 83/11, 1200 Brussels, Belgium
- JI Arraras
- Medical Oncology Department, Hospital Universitario de Navarra, C/Irunlarrea 3, S31008 Pamplona, Spain
- M Groenvold
- Palliative Care Research Unit, Department of Geriatrics and Palliative Medicine GP, Bispebjerg & Frederiksberg Hospital, University of Copenhagen, Copenhagen, Denmark
- B Holzner
- Department of Psychiatry, Psychotherapy, Psychosomatics, and Medical Psychology, University Hospital of Psychiatry II, Innsbruck Medical University, Anichstraße 35, A-6020 Innsbruck, Austria
- M van Leeuwen
- Division of Psychosocial Research & Epidemiology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
- MA Petersen
- Palliative Care Research Unit, Department of Geriatrics and Palliative Medicine GP, Bispebjerg & Frederiksberg Hospital, University of Copenhagen, Copenhagen, Denmark
- H Schmidt
- University Clinic and Outpatient Clinic for Radiotherapy and Institute of Health and Nursing Science, Medical Faculty of Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- T Young
- Lynda Jackson Macmillan Centre, Mount Vernon Cancer Centre, Rickmansworth Rd, Northwood, HA6 2RN, UK
- JM Giesinger
- Department of Psychiatry, Psychotherapy, Psychosomatics, and Medical Psychology, University Hospital of Psychiatry II, Innsbruck Medical University, Anichstraße 35, A-6020 Innsbruck, Austria
3
Key considerations to reduce or address respondent burden in patient-reported outcome (PRO) data collection. Nat Commun 2022; 13:6026. [PMID: 36224187 PMCID: PMC9556436 DOI: 10.1038/s41467-022-33826-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 05/24/2022] [Accepted: 10/05/2022] [Indexed: 11/30/2022]
Abstract
Patient-reported outcomes (PROs) are used in clinical trials to provide evidence of the benefits and risks of interventions from a patient perspective and to inform regulatory decisions and health policy. The collection of PROs in routine practice can facilitate monitoring of patient symptoms, identification of unmet needs, and prioritisation and/or tailoring of treatment to the needs of individual patients, and can inform value-based healthcare initiatives. However, respondent burden needs to be carefully considered and addressed to avoid high rates of missing data and poor reporting of PRO results, which may lead to poor-quality data for regulatory decision making and/or clinical care. The collection of PROs may capture patients’ assessments of their health status. Here, the authors highlight PRO-specific issues that should be considered to minimise respondent burden in clinical trials and routine care.
4
The patient-reported outcomes measurement information systems (PROMIS®) physical function and its derivative measures in adults: a systematic review of content validity. Qual Life Res 2022; 31:3317-3330. [PMID: 35622294 DOI: 10.1007/s11136-022-03151-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2022] [Indexed: 10/18/2022]
Abstract
PURPOSE This study aims to systematically review and critically appraise the content validity of the adult versions of the Patient-Reported Outcomes Measurement Information System Physical Function (PROMIS-PF) item bank and its derivative measures in any adult population. METHODS MEDLINE and EMBASE were searched in October 2021 for studies on measurement properties of PROMIS-PF measures in an adult population. Studies were included if the study described the development of a PROMIS-PF measure or investigated its relevance, comprehensiveness, or comprehensibility. Assessment of the methodological quality of eligible studies, rating of results, and summarizing evidence was performed following the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) methodology for assessing content validity. A modified GRADE approach was used to determine the level of evidence. RESULTS Three development studies and eight studies on the content validity of one or more of the PROMIS-PF measures were identified. The methodological quality of most studies was rated doubtful. There was low to high level evidence for sufficient relevance, comprehensiveness, and comprehensibility of most PROMIS-PF measures for healthy seniors and various disease populations. We found low to moderate level evidence for insufficient relevance of PROMIS-PF measures for patients with conditions that affected only one body part, and insufficient comprehensibility of the PROMIS-PF measures for minority elderly. CONCLUSION Most PROMIS-PF measures demonstrate sufficient content validity in healthy seniors and various disease populations. However, the quality of this evidence is generally low to moderate, due to limitations in the methodological quality of the studies.
5
Wang C, Weiss DJ, Su S, Suen KY, Basford J, Cheville AL. Multidimensional Computerized Adaptive Testing: A Potential Path Toward the Efficient and Precise Assessment of Applied Cognition, Daily Activity, and Mobility for Hospitalized Patients. Arch Phys Med Rehabil 2022; 103:S3-S14. [PMID: 35090886 PMCID: PMC9064883 DOI: 10.1016/j.apmr.2022.01.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/08/2021] [Revised: 12/20/2021] [Accepted: 01/12/2022] [Indexed: 11/17/2022]
Abstract
OBJECTIVE To develop and evaluate an efficient and precise variable-length functional assessment of applied cognition, daily activity, and mobility to inform mobility preservation and rehabilitation service delivery among hospitalized patients. DESIGN A multidimensional item bank tapping into these dimensions was developed, with all items calibrated using a multidimensional graded response model. Items were adaptively selected from the item bank to maximize test information, and the test ended when a joint stopping rule was satisfied. A simulation study was conducted based on the completed instrument, the Functional Assessment in Acute Care Multidimensional Computerized Adaptive Test (FAMCAT), to compare its measurement precision and efficiency with those of conventional unidimensional computerized adaptive testing. Precision was measured by the bias and root mean squared error between the estimated and true (ie, simulated) θ values, whereas efficiency was measured by average test length. Data were collected by an interviewer reading questions from a tablet computer and entering patients' responses. SETTING A large Midwestern hospital. PARTICIPANTS A total of 4143 patients hospitalized with medical diagnoses and/or surgical complications: 2060 in the calibration sample and 2083 in the validation cohort. INTERVENTION Not applicable. RESULTS Among the 2083 patients in the validation sample, FAMCAT administration required an average of 6 (SD=3.11) minutes. Ninety-six percent had their tests terminated by the standard error rule after responding to an average of 22.05 (SD=7.98) items, whereas 15 were terminated by the change in θ rule, with an average test length of 45.27 (SD=11.49) items. The remaining 76 responded until reaching the maximum test length of 60 items. 
CONCLUSIONS The FAMCAT has the potential to satisfy the need for structured, frequent, and precise assessment of functional domains among hospitalized patients with medical diagnosis and/or surgical complications. The results are promising and may be informative for others who wish to develop similar instruments when concurrent assessment of correlated domains is required.
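The FAMCAT abstract describes a joint stopping rule with three ways to end a test: a standard error rule, a change-in-θ rule, and a 60-item maximum. The sketch below shows one plausible way to combine them. The SE threshold (0.3), the change threshold (0.01), and the order in which the rules are checked are hypothetical; the abstract does not report them. The final lines check the reported counts: with 15 tests ended by the change-in-θ rule and 76 by the length cap, the remaining 1992 of 2083 (about 96%) ended by the SE rule, which is consistent with the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class StopRule:
    # Hypothetical thresholds; only the 60-item cap comes from the abstract.
    max_se: float = 0.3
    min_delta_theta: float = 0.01
    max_items: int = 60

def stop_reason(rule: StopRule, se_per_dim: Sequence[float],
                theta_prev: Optional[Sequence[float]],
                theta_curr: Sequence[float], n_items: int) -> Optional[str]:
    """Return which rule ends the test, or None to keep administering items."""
    if all(se <= rule.max_se for se in se_per_dim):
        return "standard_error"
    if theta_prev is not None:
        change = max(abs(c - p) for c, p in zip(theta_curr, theta_prev))
        if change < rule.min_delta_theta:
            return "delta_theta"
    if n_items >= rule.max_items:
        return "max_length"
    return None

# Consistency check of the reported termination counts.
n_total, n_delta, n_max = 2083, 15, 76
n_se = n_total - n_delta - n_max
print(n_se, round(100 * n_se / n_total))
```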
Affiliation(s)
- Chun Wang
- College of Education, University of Washington, Seattle, WA
- David J Weiss
- Department of Psychology, University of Minnesota, Minneapolis, MN
- Shiyang Su
- Department of Psychology, University of Central Florida, Orlando, FL
- King Yiu Suen
- Department of Psychology, University of Minnesota, Minneapolis, MN
- Jeffrey Basford
- Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, MN
- Andrea L Cheville
- Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, MN
6
Zheng Y, Cheon H, Katz CM. Using Machine Learning Methods to Develop a Short Tree-Based Adaptive Classification Test: Case Study With a High-Dimensional Item Pool and Imbalanced Data. Applied Psychological Measurement 2020; 44:499-514. [PMID: 34565931 PMCID: PMC7495791 DOI: 10.1177/0146621620931198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
This study explores advanced machine learning techniques to develop a short tree-based adaptive classification test from an existing lengthy instrument. A case study was carried out for an assessment of risk for juvenile delinquency. Two distinctive features of this case are that (a) the items in the original instrument measure a large number of distinct constructs, and (b) the target outcomes have low prevalence, which yields imbalanced training data. Because of the high dimensionality of the items, traditional item response theory (IRT)-based adaptive testing approaches may not work well, whereas decision trees, developed in the machine learning discipline, present a promising alternative for adaptive tests. A cross-validation study was carried out to compare eight tree-based adaptive test constructions with five benchmark methods using data from a sample of 3,975 subjects. The findings reveal that the best-performing tree-based adaptive tests yielded better classification accuracy than the benchmark method of IRT scoring with optimal cutpoints, and comparable or better classification accuracy than the best benchmark method, random forest with balanced sampling. The competitive classification accuracy of the tree-based adaptive tests also comes with an over 30-fold reduction in instrument length, administering only 3 to 6 items to any individual. This study suggests that tree-based adaptive tests have enormous potential for shortening instruments that measure a large variety of constructs.
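A tree-based adaptive test asks one item at each internal node. It follows the branch that matches the response until it reaches a leaf, which assigns a class. The sketch below hand-builds a tiny tree to show only that administration logic. In the study the trees are learned from data, which is not shown here. The item identifiers, the 0/1 response coding, and the class labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

@dataclass
class Leaf:
    label: str  # class assigned when the walk stops here

@dataclass
class Node:
    item: str                    # hypothetical item identifier asked at this node
    branches: Dict[int, "Tree"]  # response category -> next subtree

Tree = Union[Leaf, Node]

def administer(tree: Tree, answer: Callable[[str], int]) -> Tuple[str, List[str]]:
    """Ask one item per node until a leaf is reached; return (label, items asked)."""
    asked: List[str] = []
    while isinstance(tree, Node):
        response = answer(tree.item)
        asked.append(tree.item)
        tree = tree.branches[response]
    return tree.label, asked

# Hypothetical three-level tree: no respondent sees more than 3 items.
example = Node("q7", {
    0: Leaf("low risk"),
    1: Node("q12", {
        0: Leaf("low risk"),
        1: Node("q3", {0: Leaf("low risk"), 1: Leaf("high risk")}),
    }),
})

answers = {"q7": 1, "q12": 1, "q3": 0}
print(administer(example, answers.__getitem__))
```

Test length varies by respondent: someone answering 0 to the first item stops after a single question.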
Affiliation(s)
- Yi Zheng
- Arizona State University, Tempe, USA
7
Liegl G, Rose M, Knebel F, Stengel A, Buttgereit F, Obbarius A, Fischer HF, Nolte S. Using subdomain-specific item sets affected PROMIS physical function scores differently in cardiology and rheumatology patients. J Clin Epidemiol 2020; 127:151-160. [PMID: 32781113 DOI: 10.1016/j.jclinepi.2020.08.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/19/2020] [Revised: 07/22/2020] [Accepted: 08/05/2020] [Indexed: 12/21/2022]
Abstract
OBJECTIVES The Patient-Reported Outcomes Measurement Information System (PROMIS) physical function (PF) item bank has been developed to standardize patient-reported PF across medical fields. However, evidence of scoring equivalence across cardiology and rheumatology patients is still missing. Therefore, this study aims to investigate (1) the extent of disease-related differential item functioning (DIF) and (2) the impact of disease group on using subdomain-specific item sets to generate PROMIS PF scores in cardiology and rheumatology patients. STUDY DESIGN AND SETTING Ordinal regression was used to evaluate DIF between cardiology (n = 201) and rheumatology (n = 200) inpatients. To explore the disease-specific impact of PF subdomains on scoring, we compared scores derived from the full item bank with scores derived from subdomain-specific item sets for each disease group. RESULTS DIF was detected in 18 items, predominantly from the upper extremity subdomain. When upper extremity items were used, cardiology patients reached systematically higher scores than when the full item bank was used. Rheumatology patients scored substantially higher when mobility items were used. CONCLUSION Applying the PROMIS PF metric to disease-specific item sets including items from differing subdomains may lead to biased comparisons of PF levels across disease groups. Disease-specific item parameters should be provided for items showing DIF, and subdomain-related content balancing is recommended for scoring the generic PROMIS PF construct.
Affiliation(s)
- Gregor Liegl
- Department of Psychosomatic Medicine, Center for Internal Medicine and Dermatology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Matthias Rose
- Department of Psychosomatic Medicine, Center for Internal Medicine and Dermatology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Fabian Knebel
- Clinic for Cardiology and Angiology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Andreas Stengel
- Department of Psychosomatic Medicine, Center for Internal Medicine and Dermatology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany; Clinic for Rheumatology and Clinical Immunology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany; Department of Psychosomatic Medicine and Psychotherapy, Medical University Hospital Tübingen, Tübingen, Germany
- Frank Buttgereit
- Clinic for Rheumatology and Clinical Immunology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Alexander Obbarius
- Department of Psychosomatic Medicine, Center for Internal Medicine and Dermatology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- H Felix Fischer
- Department of Psychosomatic Medicine, Center for Internal Medicine and Dermatology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany
- Sandra Nolte
- Department of Psychosomatic Medicine, Center for Internal Medicine and Dermatology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, Berlin Institute of Health, Berlin, Germany; Population Health Strategic Research Centre, School of Health and Social Development, Deakin University, Burwood, Australia
8
Mao X, Zhang J, Xin T. Application of Dimension Reduction to CAT Item Selection Under the Bifactor Model. Applied Psychological Measurement 2019; 43:419-434. [PMID: 31452552 PMCID: PMC6696870 DOI: 10.1177/0146621618813086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Multidimensional computerized adaptive testing (MCAT) based on the bifactor model is suitable for tests with a bifactor measurement structure. Several item selection methods that have proved more advantageous than the maximum Fisher information method are impractical for bifactor MCAT because high dimensionality makes their computations time-consuming. To make them applicable in bifactor MCAT, dimension reduction is applied to four item selection methods: the posterior-weighted Fisher D-optimality (PDO) method and three methods not based on Fisher information, namely posterior expected Kullback-Leibler information (PKL), continuous entropy (CE), and mutual information (MI). They were compared with the Bayesian D-optimality (BDO) method in terms of estimation precision. When both the general and group factors are the measurement objectives, BDO, PDO, CE, and MI perform equally well and better than PKL. When the group factors represent nuisance dimensions, MI and CE perform best in estimating the general factor, followed by BDO, PDO, and PKL. The effects of the bifactor pattern and test length on estimation accuracy are also discussed.
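Below is a rough illustration of the Bayesian D-optimality (BDO) criterion, the method the other four were compared with. The next item is the one that maximizes the determinant of the accumulated information matrix (prior precision plus information from items already given) after adding the candidate's Fisher information. The sketch rests on several assumptions:

- a compensatory multidimensional two-parameter logistic (2PL) model;
- a bifactor pattern with one general and two group factors;
- independent standard normal priors, so the prior precision is the identity matrix;
- a hypothetical three-item bank.

It does not implement the paper's dimension-reduction step, which is what makes these criteria tractable with many group factors.

```python
import math

def p_m2pl(a, d, theta):
    """Endorsement probability of a dichotomous item under a multidimensional 2PL."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def item_info(a, d, theta):
    """Fisher information matrix P(1 - P) a a^T of one item at theta."""
    p = p_m2pl(a, d, theta)
    w = p * (1.0 - p)
    return [[w * ai * aj for aj in a] for ai in a]

def mat_add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def select_bdo(bank, administered, theta_hat, accumulated):
    """Index of the unused item maximizing det(accumulated + item information)."""
    best, best_det = None, -math.inf
    for idx, (a, d) in enumerate(bank):
        if idx in administered:
            continue
        value = det3(mat_add(accumulated, item_info(a, d, theta_hat)))
        if value > best_det:
            best, best_det = idx, value
    return best

# Hypothetical bank: (discriminations on [general, group1, group2], intercept).
# In a bifactor pattern each item loads on the general factor and one group factor.
bank = [((1.5, 1.0, 0.0), 0.0), ((1.5, 0.0, 1.0), 0.0), ((0.5, 0.3, 0.0), 0.0)]
theta_hat = (0.0, 0.0, 0.0)
prior_precision = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Item 0 was already given; its information joins the prior precision.
accumulated = mat_add(prior_precision, item_info(*bank[0], theta_hat))
print(select_bdo(bank, {0}, theta_hat, accumulated))
```

In this toy bank, item 1 is chosen. It is the first item to carry information about the second group factor, so it raises the determinant more than the weakly discriminating item 2 does.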
Affiliation(s)
- Jiahui Zhang
- Michigan State University, East Lansing, MI, USA
- Tao Xin
- Beijing Normal University, Beijing, China
9
Geerards D, Klassen AF, Hoogbergen MM, van der Hulst RRWJ, van den Berg L, Pusic AL, Gibbons CJ. Streamlining the Assessment of Patient-Reported Outcomes in Weight Loss and Body Contouring Patients: Applying Computerized Adaptive Testing to the BODY-Q. Plast Reconstr Surg 2019; 143:946e-955e. [PMID: 31033817 DOI: 10.1097/prs.0000000000005587] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022]
Abstract
BACKGROUND The BODY-Q is a widely used patient-reported outcome measure of surgical outcomes in weight loss and body contouring patients. Reducing the length of the BODY-Q assessment could overcome implementation barriers in busy clinics. A shorter BODY-Q could be achieved by using computerized adaptive testing, a method to shorten and tailor assessments while maintaining reliability and accuracy. In this study, the authors apply computerized adaptive testing to the BODY-Q and assess computerized adaptive testing performance in terms of item reduction and accuracy. METHODS Parameters describing the psychometric properties of 138 BODY-Q items (i.e., questions) were derived from the original validation sample (n = 734). The 138 items are arranged into 18 scales reflecting Appearance, Quality of Life, and Experience of Care domains. The authors simulated 1000 administrations of the computerized adaptive testing until a stopping rule, reflecting assessment accuracy of standard error less than 0.55, was met. The authors describe the reduction of assessment length in terms of the mean and range of items administered. The authors assessed accuracy by determining correlation between full test and computerized adaptive testing scores. RESULTS The authors ran 54 simulations. Mean item reduction was 36.9 percent (51 items; range, 48 to 138 items). Highest item reduction was achieved for the Experience of Care domain (56.2 percent, 22.5 items). Correlation between full test scores and the BODY-Q computerized adaptive test scores averaged 0.99. CONCLUSIONS Substantial item reduction is possible by using BODY-Q computerized adaptive testing. Reduced assessment length using BODY-Q computerized adaptive testing could reduce patient burden while preserving the accuracy of clinical patient-reported outcomes for patients undergoing weight loss and body contouring operations.
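The BODY-Q simulations stop each CAT once the standard error falls below 0.55. Below is a minimal sketch of such a loop. It makes the following assumptions, none of which the abstract states:

- dichotomous Rasch items with hypothetical difficulties (real questionnaire items typically have several response categories);
- an expected a posteriori (EAP) estimate on a fixed grid with a standard normal prior, with the posterior standard deviation serving as the SE;
- maximum-information item selection.

```python
import math

GRID = [-4.0 + 0.1 * k for k in range(81)]  # quadrature points from -4 to 4

def p_rasch(theta, b):
    """Endorsement probability of a dichotomous Rasch item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def info(theta, b):
    """Rasch item information p(1 - p)."""
    p = p_rasch(theta, b)
    return p * (1.0 - p)

def eap(responses, difficulties):
    """EAP estimate and posterior SD under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item, x in responses.items():
            p = p_rasch(t, difficulties[item])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(difficulties, answer, se_stop=0.55):
    """Administer items until SE < se_stop or the bank is exhausted."""
    responses = {}
    theta, se = eap(responses, difficulties)
    while se >= se_stop and len(responses) < len(difficulties):
        remaining = [i for i in range(len(difficulties)) if i not in responses]
        item = max(remaining, key=lambda i: info(theta, difficulties[i]))
        responses[item] = answer(item)
        theta, se = eap(responses, difficulties)
    return theta, se, len(responses)

# Hypothetical 25-item scale; the simulee endorses every item easier than 0.5.
bank = [-3.0 + 0.25 * k for k in range(25)]
theta, se, n_items = run_cat(bank, lambda i: 1 if bank[i] < 0.5 else 0)
print(f"theta={theta:.2f}, SE={se:.2f}, items={n_items} of {len(bank)}")
```

The item reduction reported in the abstract corresponds, per scale, to the gap between `n_items` and the scale length in a loop like this one.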
Affiliation(s)
- Daan Geerards
- From the Patient-Reported Outcomes, Value & Experience Center, Department of Surgery, Brigham and Women's Hospital; the Department of Surgery, Harvard Medical School; the Department of Pediatrics, McMaster University; the Department of Plastic and Reconstructive Surgery, Catharina Hospital; and the Department of Plastic and Reconstructive Surgery, Maastricht University Medical Center
- Anne F Klassen
- From the Patient-Reported Outcomes, Value & Experience Center, Department of Surgery, Brigham and Women's Hospital; the Department of Surgery, Harvard Medical School; the Department of Pediatrics, McMaster University; the Department of Plastic and Reconstructive Surgery, Catharina Hospital; and the Department of Plastic and Reconstructive Surgery, Maastricht University Medical Center
- Maarten M Hoogbergen
- From the Patient-Reported Outcomes, Value & Experience Center, Department of Surgery, Brigham and Women's Hospital; the Department of Surgery, Harvard Medical School; the Department of Pediatrics, McMaster University; the Department of Plastic and Reconstructive Surgery, Catharina Hospital; and the Department of Plastic and Reconstructive Surgery, Maastricht University Medical Center
- René R W J van der Hulst
- From the Patient-Reported Outcomes, Value & Experience Center, Department of Surgery, Brigham and Women's Hospital; the Department of Surgery, Harvard Medical School; the Department of Pediatrics, McMaster University; the Department of Plastic and Reconstructive Surgery, Catharina Hospital; and the Department of Plastic and Reconstructive Surgery, Maastricht University Medical Center
- Lisa van den Berg
- From the Patient-Reported Outcomes, Value & Experience Center, Department of Surgery, Brigham and Women's Hospital; the Department of Surgery, Harvard Medical School; the Department of Pediatrics, McMaster University; the Department of Plastic and Reconstructive Surgery, Catharina Hospital; and the Department of Plastic and Reconstructive Surgery, Maastricht University Medical Center
- Andrea L Pusic
- From the Patient-Reported Outcomes, Value & Experience Center, Department of Surgery, Brigham and Women's Hospital; the Department of Surgery, Harvard Medical School; the Department of Pediatrics, McMaster University; the Department of Plastic and Reconstructive Surgery, Catharina Hospital; and the Department of Plastic and Reconstructive Surgery, Maastricht University Medical Center
- Chris J Gibbons
- From the Patient-Reported Outcomes, Value & Experience Center, Department of Surgery, Brigham and Women's Hospital; the Department of Surgery, Harvard Medical School; the Department of Pediatrics, McMaster University; the Department of Plastic and Reconstructive Surgery, Catharina Hospital; and the Department of Plastic and Reconstructive Surgery, Maastricht University Medical Center
10
Smits N, van der Ark LA, Conijn JM. Measurement versus prediction in the construction of patient-reported outcome questionnaires: can we have our cake and eat it? Qual Life Res 2018; 27:1673-1682. [PMID: 29098607 PMCID: PMC5997739 DOI: 10.1007/s11136-017-1720-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Accepted: 10/12/2017] [Indexed: 02/07/2023]
Abstract
BACKGROUND Two important goals when using questionnaires are (a) measurement: the questionnaire is constructed to assign numerical values that accurately represent the test taker's attribute, and (b) prediction: the questionnaire is constructed to give an accurate forecast of an external criterion. Construction methods aimed at measurement prescribe that items should be reliable. In practice, this leads to questionnaires with high inter-item correlations. By contrast, construction methods aimed at prediction typically prescribe that items have a high correlation with the criterion and low inter-item correlations. The latter approach has often been said to produce a paradox concerning the relation between reliability and validity [1-3], because it is often assumed that good measurement is a prerequisite of good prediction. OBJECTIVE To answer four questions: (1) Why are measurement-based methods suboptimal for questionnaires that are used for prediction? (2) How should one construct a questionnaire that is used for prediction? (3) Do questionnaire-construction methods that optimize measurement and prediction lead to the selection of different items in the questionnaire? (4) Is it possible to construct a questionnaire that can be used for both measurement and prediction? ILLUSTRATIVE EXAMPLE An empirical data set consisting of the scores of 242 respondents on questionnaire items measuring mental health is used to select items by means of two methods: a method that optimizes the predictive value of the scale (i.e., forecasting a clinical diagnosis), and a method that optimizes the reliability of the scale. We show that the two methods select different sets of items and that a scale constructed to meet one goal does not perform optimally with respect to the other. DISCUSSION The answers are as follows: (1) Because measurement-based methods tend to maximize inter-item correlations, which reduces predictive validity.
(2) By selecting items that correlate highly with the criterion and weakly with the remaining items. (3) Yes, these methods may lead to different item selections. (4) For a single questionnaire: yes, but it is problematic because reliability cannot be estimated accurately. For a test battery: yes, but it is very costly. Implications for the construction of patient-reported outcome questionnaires are discussed.
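The difference between the two construction goals in this abstract can be illustrated with a small forward-selection sketch on synthetic data. This is not the procedure used by Smits et al.: the greedy search, the squared multiple correlation as the prediction criterion, Cronbach's alpha as the reliability criterion, the simulated items, and all function names (`greedy_select`, `r_squared`, `cronbach_alpha`) are illustrative assumptions.

```python
import numpy as np


def cronbach_alpha(X):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix, k >= 2."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)


def r_squared(X, y):
    """Squared multiple correlation of y with the columns of X (OLS with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()


def greedy_select(n_items, n_pick, score, start=()):
    """Forward selection: repeatedly add the item that maximizes score(selected)."""
    selected = list(start)
    while len(selected) < n_pick:
        candidates = [j for j in range(n_items) if j not in selected]
        best = max(candidates, key=lambda j: score(selected + [j]))
        selected.append(best)
    return selected


# Hypothetical data: items 0-2 measure trait t almost redundantly, item 3 taps
# a second source u of criterion variance, and item 4 is pure noise.
rng = np.random.default_rng(0)
n = 1000
t, u = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack([t + 0.3 * rng.standard_normal(n) for _ in range(3)]
                    + [u + 0.3 * rng.standard_normal(n), rng.standard_normal(n)])
y = t + u + 0.3 * rng.standard_normal(n)  # external criterion

# Prediction-oriented selection: maximize R^2 with the criterion.
pred = greedy_select(X.shape[1], 2, lambda s: r_squared(X[:, s], y))

# Measurement-oriented selection: start from the most correlated pair, then maximize alpha.
C = np.corrcoef(X, rowvar=False)
np.fill_diagonal(C, -np.inf)
i, j = np.unravel_index(np.argmax(C), C.shape)
rel = greedy_select(X.shape[1], 3, lambda s: cronbach_alpha(X[:, s]), start=(int(i), int(j)))

print("prediction set:", pred, "R^2 =", round(r_squared(X[:, pred], y), 2))
print("reliability set:", rel, "alpha =", round(cronbach_alpha(X[:, rel]), 2))
```

With these assumed data, the prediction-oriented search pairs one trait item with the item that taps the second source of criterion variance, while the reliability-oriented search keeps the three nearly redundant items, which predict the criterion less well despite their higher alpha.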
Affiliation(s)
- Niels Smits
- Research Institute of Child Development and Education, University of Amsterdam, Nieuwe Achtergracht 127, 1018 WS, Amsterdam, The Netherlands.
- L Andries van der Ark
- Research Institute of Child Development and Education, University of Amsterdam, Nieuwe Achtergracht 127, 1018 WS, Amsterdam, The Netherlands
- Judith M Conijn
- Research Institute of Child Development and Education, University of Amsterdam, Nieuwe Achtergracht 127, 1018 WS, Amsterdam, The Netherlands
11
Smits N, Paap MCS, Böhnke JR. Some recommendations for developing multidimensional computerized adaptive tests for patient-reported outcomes. Qual Life Res 2018; 27:1055-1063. [PMID: 29476312 PMCID: PMC5874279 DOI: 10.1007/s11136-018-1821-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 02/21/2018] [Indexed: 10/31/2022]
Abstract
PURPOSE Multidimensional item response theory and computerized adaptive testing (CAT) are increasingly used in mental health, quality of life (QoL), and patient-reported outcome measurement. Although multidimensional assessment techniques hold promise, they are more challenging to apply than unidimensional ones. The authors comment on minimal standards for developing multidimensional CATs. METHODS Prompted by pioneering papers published in QLR, the authors reflect on existing guidance and discussions from different psychometric communities, including guidelines developed for unidimensional CATs in the PROMIS project. RESULTS The commentary focuses on two key topics: (1) the design, evaluation, and calibration of multidimensional item banks and (2) how to study the efficiency and precision of a multidimensional item bank. The authors suggest that the development of a carefully designed and calibrated item bank encompasses a construction phase and a psychometric phase. With respect to efficiency and precision, item banks should be large enough to provide adequate precision over the full range of the latent constructs. Therefore, CAT performance should be studied as a function of the latent constructs and with reference to relevant benchmarks. Solutions are also suggested for simulation studies using real data, which often produce overly optimistic evaluations of an item bank's efficiency and precision. DISCUSSION Multidimensional CAT applications are promising but complex statistical assessment tools that require detailed theoretical frameworks and methodological scrutiny when their appropriateness for practical applications is tested. The authors advise researchers to evaluate item banks with a broad set of methods, describe their choices in detail, and substantiate their approach to validation.
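The recommendation to study CAT performance as a function of the latent constructs can be made concrete by reporting bias and RMSE within bins of the true trait, rather than a single overall average that can mask poor precision at the extremes. The sketch below assumes that simulated true and estimated trait values are already available; the bin edges, the function name `conditional_precision`, and the error model used to fake CAT output are hypothetical.

```python
import numpy as np


def conditional_precision(theta_true, theta_hat, edges):
    """Return {bin: (bias, rmse, n)} for bins of the true trait defined by `edges`."""
    theta_true = np.asarray(theta_true, dtype=float)
    err = np.asarray(theta_hat, dtype=float) - theta_true
    bins = np.digitize(theta_true, edges)  # 0 = below edges[0]; len(edges) = at/above edges[-1]
    result = {}
    for b in range(len(edges) + 1):
        mask = bins == b
        if mask.any():
            result[b] = (float(err[mask].mean()),
                         float(np.sqrt(np.mean(err[mask] ** 2))),
                         int(mask.sum()))
    return result


# Hypothetical simulation output: estimates that are precise near the center
# of the trait distribution and noisier in the tails.
rng = np.random.default_rng(0)
theta = rng.standard_normal(5000)
theta_hat = theta + rng.normal(0.0, 0.2 + 0.3 * np.abs(theta))
edges = [-1.5, -0.5, 0.5, 1.5]

overall_rmse = float(np.sqrt(np.mean((theta_hat - theta) ** 2)))
print("overall RMSE:", round(overall_rmse, 3))
for b, (bias, rmse, n) in conditional_precision(theta, theta_hat, edges).items():
    print(f"bin {b}: n = {n}, bias = {bias:+.3f}, RMSE = {rmse:.3f}")
```

In this made-up example the overall RMSE looks acceptable while the outer bins are much less precise, which is the kind of pattern a single summary statistic would hide.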
Affiliation(s)
- Niels Smits
- Research Institute of Child Development and Education, University of Amsterdam, Nieuwe Achtergracht 127, 1018 WS, Amsterdam, The Netherlands.
- Muirne C S Paap
- Department of Special Needs, Education, and Youth Care, Faculty of Behavioural and Social Sciences, University of Groningen, Groningen, The Netherlands
- Jan R Böhnke
- Dundee Centre for Health and Related Research, School of Nursing and Health Sciences, University of Dundee, Dundee, UK
12
Michel P, Baumstarck K, Ghattas B, Pelletier J, Loundou A, Boucekine M, Auquier P, Boyer L. A Multidimensional Computerized Adaptive Short-Form Quality of Life Questionnaire Developed and Validated for Multiple Sclerosis: The MusiQoL-MCAT. Medicine (Baltimore) 2016; 95:e3068. [PMID: 27057832 PMCID: PMC4998748 DOI: 10.1097/md.0000000000003068] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 01/22/2023]
Abstract
The aim was to develop a multidimensional computerized adaptive short-form questionnaire, the MusiQoL-MCAT, from a fixed-length QoL questionnaire for multiple sclerosis. A total of 1992 patients were enrolled in this international cross-sectional study. The development of the MusiQoL-MCAT was based on the assessment of between-item MIRT model fit followed by real-data simulations. The MCAT algorithm was based on Bayesian maximum a posteriori estimation of latent traits and Kullback-Leibler information item selection. We examined several simulations based on a fixed number of items. Accuracy was assessed using correlations (r) between initial IRT scores and MCAT scores. Precision was assessed using the standard error of measurement (SEM) and the root mean square error (RMSE). The multidimensional graded response model was used to estimate item parameters and IRT scores. Among the MCAT simulations, the 16-item version of the MusiQoL-MCAT was selected because accuracy and precision stabilized at satisfactory levels with 16 items (r ≥ 0.9, SEM ≤ 0.55, and RMSE ≤ 0.3). External validity of the MusiQoL-MCAT was satisfactory. The MusiQoL-MCAT presents satisfactory properties and can individually tailor QoL assessment to each patient, making it less burdensome to patients and better adapted for use in clinical practice.
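The core loop of a fixed-length MCAT such as the one described here (select an item, record the response, update the MAP estimate, repeat) can be sketched as follows. Several substitutions are assumptions of this sketch, not features of the MusiQoL-MCAT: it uses a dichotomous compensatory multidimensional 2PL model instead of the multidimensional graded response model, Bayesian D-optimal selection (maximizing the determinant of the posterior information matrix) in place of Kullback-Leibler item selection, a synthetic two-dimensional item bank, and accuracy measured against the generating trait values rather than against full-length IRT scores.

```python
import numpy as np


def prob(theta, a, d):
    """Compensatory M2PL: P(u = 1) = 1 / (1 + exp(-(a . theta + d)))."""
    return 1.0 / (1.0 + np.exp(-(a @ theta + d)))


def map_estimate(a, d, u, n_iter=50, tol=1e-6):
    """Newton-Raphson MAP estimate of theta under a N(0, I) prior."""
    m = a.shape[1]
    theta = np.zeros(m)
    for _ in range(n_iter):
        p = prob(theta, a, d)
        grad = a.T @ (u - p) - theta                   # gradient of the log posterior
        hess = -(a.T * (p * (1 - p))) @ a - np.eye(m)  # negative definite Hessian
        step = np.linalg.solve(hess, grad)
        norm = np.linalg.norm(step)
        if norm > 1.0:
            step = step / norm                         # cap the step length for stability
        theta = theta - step
        if norm < tol:
            break
    return theta


def run_mcat(theta_true, a, d, n_items, rng):
    """Fixed-length MCAT with Bayesian D-optimal selection and MAP scoring."""
    m = a.shape[1]
    used, resp = [], []
    theta = np.zeros(m)
    for _ in range(n_items):
        idx = np.array(used, dtype=int)
        p = prob(theta, a, d)
        info = (p * (1 - p))[:, None, None] * a[:, :, None] * a[:, None, :]  # (n, m, m)
        posterior_info = np.eye(m) + info[idx].sum(axis=0)  # prior precision + test info
        crit = np.linalg.det(posterior_info + info)          # one determinant per candidate
        crit[idx] = -np.inf                                  # never re-administer an item
        j = int(np.argmax(crit))
        used.append(j)
        # Simulate the response from the generating (true) trait values.
        resp.append(float(rng.random() < prob(theta_true, a[j:j + 1], d[j:j + 1])[0]))
        theta = map_estimate(a[used], d[used], np.array(resp))
    return theta, used


# Hypothetical bank: 100 items loading mainly on each of two dimensions.
rng = np.random.default_rng(1)
n_bank, m, half = 200, 2, 100
a = np.empty((n_bank, m))
a[:half, 0], a[:half, 1] = rng.uniform(1.0, 2.0, half), rng.uniform(0.0, 0.3, half)
a[half:, 0], a[half:, 1] = rng.uniform(0.0, 0.3, half), rng.uniform(1.0, 2.0, half)
d = rng.normal(0.0, 1.0, n_bank)

thetas = rng.standard_normal((150, m))
est = np.array([run_mcat(t, a, d, 16, rng)[0] for t in thetas])
for k in range(m):
    r = np.corrcoef(thetas[:, k], est[:, k])[0, 1]
    rmse = np.sqrt(np.mean((est[:, k] - thetas[:, k]) ** 2))
    print(f"dimension {k + 1}: r = {r:.2f}, RMSE = {rmse:.2f}")
```

D-optimality was used here only because it takes a few lines of linear algebra; a Kullback-Leibler criterion would change only how `crit` is computed.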
Affiliation(s)
- Pierre Michel
- From the Aix-Marseille University, EA 3279 - Public Health, Chronic Diseases and Quality of Life - Research Unit (PM, KB, BG, AL, MB, PA, LB); Aix-Marseille University - I2 M UMR 7373 - Mathematics Institute of Marseille (PM, BG); and Departments of Neurology and CRMBM CNRS6612, La Timone University Hospital, APHM, Marseille, France (JP)
13
Item exposure control for multidimensional computer adaptive testing under maximum likelihood and expected a posteriori estimation. Behav Res Methods 2015; 48:1443-1453. [PMID: 26487053 DOI: 10.3758/s13428-015-0659-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
Item bank stratification has been shown to be an effective method for combating item overexposure in both uni- and multidimensional computer adaptive testing. However, item bank stratification cannot guarantee that items will not be overexposed, that is, exposed at a rate exceeding some prespecified threshold. In this article, we propose enhancing stratification for multidimensional computer adaptive tests by combining it with the item eligibility method, a technique for controlling the maximum exposure rate in computerized tests. The performance of the method was examined via a simulation study and compared to existing methods of item selection and exposure control. Also, for the first time, maximum likelihood (MLE) and expected a posteriori (EAP) estimation of examinee ability were compared side by side in a multidimensional computer adaptive test. The simulation suggested that the proposed method is effective in suppressing the maximum item exposure rate with very little loss of measurement accuracy and precision. Compared with MLE, EAP yielded smaller mean squared errors of the ability estimates in all simulation conditions.
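The abstract does not give the eligibility formulas, so the following is only a rough sketch of the item-eligibility idea: each item is made eligible for an examinee with some probability, and those probabilities are adjusted until no item's exposure rate exceeds a target `r_max`. Assumptions: the update `P_new = min(1, r_max * P_old / observed_exposure)` applied in rounds of simulated examinees, a fixed preference order of items as a hypothetical stand-in for information-based selection, and no stratification or ability estimation.

```python
import numpy as np


def simulate_exposure(p_elig, k, n_examinees, rng):
    """Exposure rates when each examinee receives the k best-ranked eligible items.

    Items are indexed in a fixed order of preference (item 0 = most preferred),
    a hypothetical stand-in for information-based item selection.
    """
    eligible = rng.random((n_examinees, p_elig.size)) < p_elig
    chosen = eligible & (np.cumsum(eligible, axis=1) <= k)  # first k eligible items
    return chosen.mean(axis=0)


def update_eligibility(p_elig, exposure, r_max):
    """Assumed update rule: P_new = min(1, r_max * P_old / observed exposure)."""
    p_new = np.ones_like(p_elig)  # items never administered become fully eligible
    seen = exposure > 0
    p_new[seen] = np.minimum(1.0, r_max * p_elig[seen] / exposure[seen])
    return p_new


rng = np.random.default_rng(0)
n_bank, k, r_max = 60, 5, 0.25

uncontrolled = simulate_exposure(np.ones(n_bank), k, 5000, rng)

p_elig = np.ones(n_bank)
for _ in range(60):  # rounds of simulated examinees
    exposure = simulate_exposure(p_elig, k, 5000, rng)
    p_elig = update_eligibility(p_elig, exposure, r_max)
controlled = simulate_exposure(p_elig, k, 5000, rng)

print("max exposure without control:", uncontrolled.max())
print("max exposure with eligibility control:", round(float(controlled.max()), 3))
```

Without control, the top-ranked items are given to every examinee; with the assumed eligibility update, exposure is spread over more items while each examinee still receives `k` items.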
14
Chang HH. Psychometrics behind Computerized Adaptive Testing. Psychometrika 2015; 80:1-20. [PMID: 24499939 DOI: 10.1007/s11336-014-9401-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 10/27/2013] [Indexed: 05/27/2023]
Abstract
The paper provides a survey of 18 years' progress that my colleagues, students (both former and current) and I made in a prominent research area in Psychometrics: Computerized Adaptive Testing (CAT). We start with a historical review of the establishment of a large-sample foundation for CAT. It is worth noting that the asymptotic results were derived under the framework of Martingale Theory, a very theoretical branch of Probability Theory, which may seem unrelated to educational and psychological testing. In addition, we address a number of issues that emerged from large-scale implementation and show how theoretical work can help solve these problems. Finally, we propose that CAT technology can be very useful to support individualized instruction on a mass scale. We show that even paper-and-pencil tests can be made adaptive to support classroom teaching.
Affiliation(s)
- Hua-Hua Chang
- University of Illinois at Urbana-Champaign, 430 Psychology Building, 630 E. Daniel Street, M/C 716, Champaign, IL 61820, USA.
15
Michel P, Auquier P, Baumstarck K, Pelletier J, Loundou A, Ghattas B, Boyer L. Development of a cross-cultural item bank for measuring quality of life related to mental health in multiple sclerosis patients. Qual Life Res 2015; 24:2261-71. [PMID: 25712324 DOI: 10.1007/s11136-015-0948-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Accepted: 02/17/2015] [Indexed: 11/24/2022]
Abstract
OBJECTIVE Quality of life (QoL) measurements are considered important outcome measures both for research on multiple sclerosis (MS) and in clinical practice. Computerized adaptive testing (CAT) can improve the precision of measurements made using QoL instruments while reducing the burden of testing on patients. Moreover, a cross-cultural approach is also necessary to guarantee the wide applicability of CAT. The aim of this preliminary study was to develop a calibrated item bank that is available in multiple languages and measures QoL related to mental health by combining one generic (SF-36) and one disease-specific questionnaire (MusiQoL). METHODS Patients with MS were enrolled in this international, multicenter, cross-sectional study. The psychometric properties of the item bank were based on classical test and item response theories and approaches, including the evaluation of unidimensionality, item response theory model fitting, and analyses of differential item functioning (DIF). Convergent and discriminant validities of the item bank were examined according to socio-demographic, clinical, and QoL features. RESULTS A total of 1992 patients with MS and from 15 countries were enrolled in this study to calibrate the 22-item bank developed in this study. The strict monotonicity of the Cronbach's alpha curve, the high eigenvalue ratio estimator (5.50), and the adequate CFA model fit (RMSEA = 0.07 and CFI = 0.95) indicated that a strong assumption of unidimensionality was warranted. The infit mean square statistic ranged from 0.76 to 1.27, indicating a satisfactory item fit. DIF analyses revealed no item biases across geographical areas, confirming the cross-cultural equivalence of the item bank. External validity testing revealed that the item bank scores correlated significantly with QoL scores but also showed discriminant validity for socio-demographic and clinical characteristics. 
CONCLUSION This work demonstrated satisfactory psychometric characteristics for a QoL item bank for MS in multiple languages. This work may offer a common measure for the assessment of QoL in different cultural contexts and for international studies conducted on MS.
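The unidimensionality evidence reported in this abstract (an eigenvalue ratio of 5.50 and a strictly monotone Cronbach's alpha curve) can be computed roughly as sketched below. Assumptions: the eigenvalue ratio is taken as the first over the second eigenvalue of the inter-item correlation matrix, the alpha curve is built by backward elimination (each step drops the item whose removal gives the highest alpha), and the items are simulated continuous scores rather than the ordinal responses of the study.

```python
import numpy as np


def cronbach_alpha(X):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix, k >= 2."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))


def eigenvalue_ratio(X):
    """First over second eigenvalue of the inter-item correlation matrix."""
    ev = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending order
    return ev[0] / ev[1]


def alpha_curve(X):
    """Alpha for 2, 3, ..., k retained items, built by backward elimination."""
    items = list(range(X.shape[1]))
    curve = [cronbach_alpha(X[:, items])]
    while len(items) > 2:
        # Drop the item whose removal leaves the highest alpha.
        drop = max(items, key=lambda j: cronbach_alpha(X[:, [i for i in items if i != j]]))
        items.remove(drop)
        curve.append(cronbach_alpha(X[:, items]))
    return curve[::-1]  # ascending number of items


rng = np.random.default_rng(0)
n, k = 2000, 8

# Hypothetical one-factor data: every item loads 0.7 on the same factor.
f = rng.standard_normal(n)
X = 0.7 * f[:, None] + np.sqrt(1 - 0.49) * rng.standard_normal((n, k))

# Hypothetical two-factor data for contrast: items 0-3 on f1, items 4-7 on f2.
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)
X2 = np.column_stack([0.7 * (f1 if c < 4 else f2) + np.sqrt(0.51) * rng.standard_normal(n)
                      for c in range(k)])

curve = alpha_curve(X)
print("eigenvalue ratio, one factor:", round(float(eigenvalue_ratio(X)), 2))
print("eigenvalue ratio, two factors:", round(float(eigenvalue_ratio(X2)), 2))
print("alpha curve:", np.round(curve, 3))
print("strictly increasing:", all(b > a for a, b in zip(curve, curve[1:])))
```

Under these assumptions, the one-factor data give a large eigenvalue ratio and an alpha curve that rises with every added item, while the two-factor data give a ratio close to 1.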
Affiliation(s)
- Pierre Michel
- Aix-Marseille University, EA3279: Public Health, Chronic Diseases and Quality of Life, Research Unit, 13005, Marseille, France.