1
|
Moss J. Measures of Agreement with Multiple Raters: Fréchet Variances and Inference. PSYCHOMETRIKA 2024; 89:517-541. [PMID: 38190018 PMCID: PMC11164747 DOI: 10.1007/s11336-023-09945-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 12/06/2023] [Indexed: 01/09/2024]
Abstract
Most measures of agreement are chance-corrected. They differ in three dimensions: their definition of chance agreement, their choice of disagreement function, and how they handle multiple raters. Chance agreement is usually defined in a pairwise manner, following either Cohen's kappa or Fleiss's kappa. The disagreement function is usually a nominal, quadratic, or absolute value function. But how to handle multiple raters is contentious, with the main contenders being Fleiss's kappa, Conger's kappa, and Hubert's kappa, the variant of Fleiss's kappa where agreement is said to occur only if every rater agrees. More generally, multi-rater agreement coefficients can be defined in a g-wise way, where the disagreement weighting function uses g raters instead of two. This paper contains two main contributions. (a) We propose using Fréchet variances to handle the case of multiple raters. The Fréchet variances are intuitive disagreement measures and turn out to generalize the nominal, quadratic, and absolute value functions to the case of more than two raters. (b) We derive the limit theory of g-wise weighted agreement coefficients, with chance agreement of the Cohen-type or Fleiss-type, for the case where every item is rated by the same number of raters. Trying out three confidence interval constructions, we end up recommending calculating confidence intervals using the arcsine transform or the Fisher transform.
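As a concrete reference point for the chance-corrected coefficients discussed in this abstract, the following is a minimal sketch of Fleiss's kappa with the nominal disagreement function, assuming every item is rated by the same number of raters. The function name `fleiss_kappa` and the list-of-lists input layout are illustrative assumptions, not taken from the paper. The sketch covers neither the Fréchet-variance generalization nor the confidence interval constructions the paper proposes.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss's kappa for nominal categories (illustrative sketch).

    ratings: list of items; each item is a list of category labels,
    one label per rater. Every item must have the same number of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    pairwise_agreement = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of ordered rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        pairwise_agreement += agreeing_pairs / (n_raters * (n_raters - 1))

    p_observed = pairwise_agreement / n_items
    # Fleiss-type chance agreement: squared pooled category proportions.
    n_ratings = n_items * n_raters
    p_chance = sum((t / n_ratings) ** 2 for t in category_totals.values())
    if p_chance == 1.0:
        return float("nan")  # only one category was ever used
    return (p_observed - p_chance) / (1.0 - p_chance)
```

Calling `fleiss_kappa([[1, 1, 1], [2, 2, 2]])` exercises the case where all raters agree on every item while two categories are in use.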
Affiliation(s)
- Jonas Moss
- Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway.
|
2
|
van Oest R. The Dependence of Chance-Corrected Weighted Agreement Coefficients on the Power Parameter of the Weighting Scheme: Analysis and Measurement. PSYCHOMETRIKA 2023; 88:554-579. [PMID: 36066789 PMCID: PMC10188398 DOI: 10.1007/s11336-022-09881-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 07/10/2022] [Accepted: 07/26/2022] [Indexed: 05/17/2023]
Abstract
We consider the dependence of a broad class of chance-corrected weighted agreement coefficients on the weighting scheme that penalizes rater disagreements. The considered class encompasses many existing coefficients with any number of raters, and one real-valued power parameter defines the weighting scheme that includes linear, quadratic, identity, and radical weights. We obtain the first-order and second-order derivatives of the coefficients with respect to the power parameter and decompose them into components corresponding to all pairs of different category distances. Each component compares its two distances in terms of the ratio of observed to expected-by-chance frequency. A larger ratio for the smaller distance than the larger distance contributes to a positive relationship between the power parameter and the coefficient value; the opposite contributes to a negative relationship. We provide necessary and sufficient conditions for the coefficient value to increase or decrease and the relationship to intensify or weaken as the power parameter increases. We use the first-order and second-order derivatives for corresponding measurement. Furthermore, we show how these two derivatives allow other researchers to obtain quite accurate estimates of the coefficient value for unreported values of the power parameter, even without access to the original data.
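To make the role of the power parameter concrete, here is a small sketch of a power-family disagreement weight between two ordinal categories. The function name `disagreement_weight` and the mapping of power values to named schemes (0 for identity, 0.5 for radical, 1 for linear, 2 for quadratic) are assumptions made for illustration. The paper's exact parameterization may differ.

```python
def disagreement_weight(category_a, category_b, power):
    """Disagreement weight |a - b| ** power between two ordinal categories.

    Assumed convention (hypothetical, for illustration only):
      power = 0   -> identity weights (any disagreement counts as 1)
      power = 0.5 -> radical weights
      power = 1   -> linear weights
      power = 2   -> quadratic weights
    """
    distance = abs(category_a - category_b)
    if distance == 0:
        # Agreement carries no penalty; this also avoids 0 ** 0 == 1.
        return 0.0
    return float(distance) ** power


# Weights for a category distance of 2 under the four assumed schemes.
weights_at_distance_two = {
    power: disagreement_weight(0, 2, power) for power in (0, 0.5, 1, 2)
}
```

A larger power penalizes distant disagreements more heavily relative to adjacent ones, which is the dependence the abstract analyzes through derivatives with respect to the power parameter.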
Affiliation(s)
- Rutger van Oest
- Department of Marketing, BI Norwegian Business School, Nydalsveien 37, 0484, Oslo, Norway.
|
3
|
Ferrari L, Leahy I, Staffa SJ, Berry JG. The Pediatric-Specific American Society of Anesthesiologists Physical Status Score: A Multicenter Study. Anesth Analg 2021; 132:807-817. [PMID: 32665468 DOI: 10.1213/ane.0000000000005025] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 12/31/2022]
Abstract
BACKGROUND When applied to the pediatric population, the American Society of Anesthesiologists physical status (ASA-PS) classification has exhibited poor reliability due to its subjective and adult-focused definitions. This study was done to measure interrater agreement of a pediatric-adapted ASA-PS classification and to solicit multicenter perspectives to optimize the pediatric ASA-PS classification. METHODS A prospective, mixed-methods study of 197 pediatric anesthesiologists from 13 academic pediatric hospitals in the United States, Europe, and Australia surveyed in May and July 2019. Participants assigned ASA-PS scores (I to V) for 15 pediatric cases with a heterogeneous mix of acute and chronic health conditions undergoing a variety of surgical and related procedures. Pediatric-adapted definitions of ASA-PS were provided. The intraclass correlation coefficient (ICC) was used to assess interrater reliability of ASA-PS scores. The ICC was estimated using 2-way mixed-effects modeling, accounting for multiple raters assigning scores for the same set of cases. Qualitative feedback on the pediatric-adapted ASA-PS classification was analyzed with line-by-line coding. RESULTS The survey response rate was 83.8% (165 of 197). The ICC agreement among participants on ASA-PS scoring across all 15 clinical cases was 0.58 (95% confidence interval [CI], 0.42-0.77). ICC did not vary significantly by years of anesthesiology practice. ICC varied across hospitals (range: 0.34; 95% CI, 0.12-0.63 to 0.79; 95% CI, 0.66-0.91). The highest level of agreement occurred with cases most often scored as ASA-PS I, IV, and V; the lowest agreement occurred with cases most often scored ASA-PS II and III. Clarification of how well a chronic condition was controlled and presence of an acute illness were 2 common themes suggested to optimize the validity of the pediatric-adapted ASA-PS definitions. 
CONCLUSIONS The pediatric-adapted ASA-PS classification had moderate interrater reliability among pediatric anesthesiologists. The lower reliability of scoring for ASA-PS II and III cases, in particular, supports the need for further ASA-PS definition refinement for pediatric populations.
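For readers unfamiliar with how an agreement-type ICC is obtained from a cases-by-raters score table, the sketch below computes a single-rater, absolute-agreement ICC from two-way ANOVA mean squares. This is an assumption about the general form only: the abstract does not state which ICC variant or software the authors used, and the function name `icc_absolute_agreement` is hypothetical.

```python
def icc_absolute_agreement(scores):
    """Single-rater, absolute-agreement ICC from a complete score table.

    scores: list of rows (one per case), each a list of numeric scores
    (one per rater). Illustrative sketch, not the study's actual model.
    """
    n = len(scores)       # number of cases
    k = len(scores[0])    # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 cases and raters")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    case_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_cases = k * sum((m - grand_mean) ** 2 for m in case_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cases - ss_raters

    ms_cases = ss_cases / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Systematic rater differences lower absolute agreement.
    denominator = ms_cases + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    return (ms_cases - ms_error) / denominator
```

The function assumes the scores vary across cases; a table in which every score is identical makes the denominator zero.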
Affiliation(s)
- Lynne Ferrari
- Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts; Department of Anaesthesiology, Harvard Medical School, Boston, Massachusetts
- Izabela Leahy
- Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts
- Steven J Staffa
- Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts
- Jay G Berry
- Department of Anaesthesiology, Harvard Medical School, Boston, Massachusetts; Complex Care Service, Division of General Pediatrics, Boston Children's Hospital, Boston, Massachusetts
|
4
|
Vreman RA, Mantel-Teeuwisse AK, Hövels AM, Leufkens HGM, Goettsch WG. Differences in Health Technology Assessment Recommendations Among European Jurisdictions: The Role of Practice Variations. VALUE IN HEALTH : THE JOURNAL OF THE INTERNATIONAL SOCIETY FOR PHARMACOECONOMICS AND OUTCOMES RESEARCH 2020; 23:10-16. [PMID: 31952664 DOI: 10.1016/j.jval.2019.07.017] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 11/22/2018] [Revised: 04/02/2019] [Accepted: 07/15/2019] [Indexed: 05/25/2023]
Abstract
BACKGROUND Health technology assessment (HTA) plays an important role in reimbursement decision-making in many countries, but recommendations vary widely throughout jurisdictions, even for the same drug. This variation may be due to differences in the weighing of evidence or differences in the processes or procedures, which are known as HTA practices. OBJECTIVE To provide insight into the effects of differences in practices on interpretation of intercountry differences in HTA recommendations for conditionally approved drugs. METHODS HTA recommendations for conditionally approved drugs (N = 27) up until June 2017 from England/Wales, France, Germany, the Netherlands, and Scotland were included. Recommendations and practice characteristics were extracted from these five jurisdictions and this data was validated. The effect of nonsubmissions, resubmissions, and reassessments; cost-effectiveness assessments; and price negotiations on changes in the percentage of negative recommendations and the interpretation of intercountry differences in HTA outcomes were analyzed using Fisher exact tests. RESULTS The inclusion of cost-effectiveness assessments led to significant increases in the proportion of negative recommendations in England/Wales (from 4% to 50%, P<.01) and Scotland (from 21% to 71%, P<.01). The subsequent inclusion of price negotiations led to significant reductions in the proportion of negative recommendations in England/Wales (from 50% to 14%, P<.01), France (from 31% to 3%, P=.012), and Germany (from 34% to 0%, P<.01). Results indicated that the inclusion of nonsubmissions and resubmissions might affect Scottish negative HTA recommendations (from 7% to 21%), but this effect was not significant. No significant effects were observed in the Netherlands, possibly owing to sample size. 
CONCLUSION Variations in HTA practices between international jurisdictions can have a substantial and significant impact on conclusions about recommendations by HTA bodies, as exemplified in this cohort of conditionally approved products. Studies comparing international HTA recommendations should carefully consider possible practice variations between jurisdictions.
Affiliation(s)
- Rick A Vreman
- Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands; The National Healthcare Institute, Diemen, The Netherlands
- Anke M Hövels
- The National Healthcare Institute, Diemen, The Netherlands
- Wim G Goettsch
- Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands; The National Healthcare Institute, Diemen, The Netherlands
|
5
|
Vreman RA, Bouvy JC, Bloem LT, Hövels AM, Mantel‐Teeuwisse AK, Leufkens HG, Goettsch WG. Weighing of Evidence by Health Technology Assessment Bodies: Retrospective Study of Reimbursement Recommendations for Conditionally Approved Drugs. Clin Pharmacol Ther 2018; 105:684-691. [PMID: 30300938 PMCID: PMC6587700 DOI: 10.1002/cpt.1251] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 05/03/2018] [Accepted: 09/21/2018] [Indexed: 11/07/2022]
Abstract
This study assessed whether five Health Technology Assessment (HTA) bodies in Europe were more negative about drugs with a Conditional Marketing Authorization (CMA) that are approved without controlled studies compared to CMA drugs that are approved based on controlled studies. The HTA recommendations were categorized into positive, restricted, and negative. A total of 92 HTA recommendations were available for 27 drugs. Thirty of 62 (48%) and 17 of 30 (57%) of the recommendations were negative for drugs with and without controlled studies, respectively. Overall, only 12 (13%) recommendations were positive. In all jurisdictions, recommendations between drugs with and drugs without controlled data were comparable, which suggests that the presence of controlled data is not decisive in HTA evaluations. The small proportion of unrestricted positive recommendations highlights difficulties with recommending the drugs in this cohort, which may be caused by scientific uncertainty or other factors. Earlier collaboration between stakeholders is advised in order to improve patient access.
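The counts quoted in this abstract can be checked against the reported percentages with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names and the helper `percent` are illustrative.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


# (negative recommendations, all recommendations), as stated in the abstract.
with_controlled_studies = (30, 62)
without_controlled_studies = (17, 30)
positive_recommendations = 12

# The two groups together should account for all 92 recommendations.
total_recommendations = with_controlled_studies[1] + without_controlled_studies[1]

negative_share_with = percent(*with_controlled_studies)
negative_share_without = percent(*without_controlled_studies)
positive_share_overall = percent(positive_recommendations, total_recommendations)
```

The group sizes sum to the stated total of 92, and the rounded shares match the 48%, 57%, and 13% reported in the abstract.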
Affiliation(s)
- Rick A. Vreman
- Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Utrecht, The Netherlands
- The National Healthcare Institute (ZIN), Diemen, The Netherlands
- Lourens T. Bloem
- Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Utrecht, The Netherlands
- Dutch Medicines Evaluation Board (MEB), Utrecht, The Netherlands
- Anke M. Hövels
- Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Utrecht, The Netherlands
- Aukje K. Mantel‐Teeuwisse
- Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Utrecht, The Netherlands
- Hubert G.M. Leufkens
- Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Utrecht, The Netherlands
- Wim G. Goettsch
- Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Utrecht, The Netherlands
- The National Healthcare Institute (ZIN), Diemen, The Netherlands
|
6
|
Breker S, Rentmeister J, Sick B, Braun M. Hosting capacity of low-voltage grids for distributed generation: Classification by means of machine learning techniques. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.05.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022]
|
7
|
Clausen C, Dahl B, Christiansen Frevert S, Forman JL, Nielsen MB, Lönn L. Inter- and intra-rater agreement in the assessment of the vascularity of spinal metastases using digital subtraction angiography tumor blush. Acta Radiol 2017; 58:734-739. [PMID: 27650032 DOI: 10.1177/0284185116668215] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Background Preoperative embolization is based on the preoperative digital subtraction angiography (DSA) tumor blush, and as such is considered the "gold standard" for determining tumor vascularity. However, to our knowledge reliability studies evaluating vascularity ratings of DSA tumor blush in spinal metastases have not been published previously. Purpose To evaluate inter- and intra-rater agreement in the assessment of the vascularity of spinal metastases using DSA tumor blush. Material and Methods This reliability study included 46 patients with symptomatic metastatic spinal cord compression requiring surgery. DSA data stored in the hospital picture archiving and communication system (PACS) from the participants of a randomized controlled trial were used. Inter- and intra-rater agreement on vascularity assessment using DSA tumor blush according to a three-step ordinal scale was evaluated: no hypervascularity; moderate hypervascularity; and pronounced hypervascularity. The statistical analysis was based on linear weighted kappas for multiple raters that extend Cohen's κ. Three raters and κ = 0.2 in the null hypothesis implied that the power of the study was 0.96. Results Inter- and intra-rater agreements were moderate in rating the vascularity of spinal metastases and the agreements were significantly higher than the κ = 0.20 in the null hypothesis (P = 0.0002 and P = 0.0001). The κ value for inter-rater agreement was 0.57 (95% confidence interval [CI], 0.41-0.72) and for intra-rater agreement 0.55 (95% CI, 0.38-0.71). Conclusion There is moderate inter-rater and intra-rater agreement in classifying the vascularity of spinal metastases on a three-step ordinal scale for DSA tumor blush.
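The study above uses a multi-rater extension of linear weighted kappa. As background, the sketch below shows the simpler two-rater case of Cohen's kappa with linear weights on an ordinal scale such as the three-step vascularity scale. It is not the multi-rater statistic used in the study, and the function name `linear_weighted_kappa` and the 0-based integer coding of categories are assumptions for illustration.

```python
def linear_weighted_kappa(rater_1, rater_2, n_categories):
    """Cohen's kappa with linear weights for two raters (sketch).

    rater_1, rater_2: equal-length lists of integer category codes
    in range(n_categories), e.g. 0, 1, 2 for a three-step scale.
    """
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("both raters must rate the same non-empty item set")
    n = len(rater_1)
    k = n_categories

    # Joint proportions of (rater_1 category, rater_2 category).
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_1, rater_2):
        joint[a][b] += 1.0 / n
    marginal_1 = [sum(joint[i]) for i in range(k)]
    marginal_2 = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear disagreement weight, scaled to [0, 1].
        return abs(i - j) / (k - 1)

    observed = sum(weight(i, j) * joint[i][j]
                   for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * marginal_1[i] * marginal_2[j]
                   for i in range(k) for j in range(k))
    if expected == 0.0:
        return float("nan")  # both raters used a single identical category
    return 1.0 - observed / expected
```

With linear weights, a disagreement between adjacent steps of the scale is penalized half as much as a disagreement between the two extreme steps.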
Affiliation(s)
- Caroline Clausen
- Department of Radiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Department of Radiology, Køge Sygehus, Køge, Denmark
- Benny Dahl
- Department of Orthopaedic Surgery, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Julie Lyng Forman
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Lars Lönn
- Department of Radiology, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- Department of Vascular Surgery, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
|
8
|
Spångfors M, Arvidsson L, Karlsson V, Samuelson K. The National Early Warning Score: Translation, testing and prediction in a Swedish setting. Intensive Crit Care Nurs 2016; 37:62-67. [PMID: 27386753 DOI: 10.1016/j.iccn.2016.05.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/03/2015] [Revised: 01/05/2016] [Accepted: 05/31/2016] [Indexed: 11/19/2022]
Abstract
The National Early Warning Score (NEWS) is a "track and trigger" scale designed to assess in-hospital patients' vital signs and detect clinical deterioration. In this study the NEWS was translated into Swedish and its association with the need for intensive care was investigated. A total of 868 patient charts, recorded by the medical emergency team at a university hospital, containing the parameters needed to calculate the NEWS were audited. The NEWS was translated into Swedish and tested for inter-rater reliability with a perfect agreement (weighted κ=1.0) among the raters. The median score for patients admitted to the ICU was higher than for those who were not (10 vs. 8, p<0.0001). AUROC for discriminating admittance to the ICU was 0.68 (95% CI: 0.622-0.739, p<0.0001). A regression analysis showed that lower oxygen saturation and a lower level of consciousness were significantly associated with ICU admission (OR 1.27 [1.06-1.52], p=0.01 and OR 1.77 [1.12-2.82], p=0.02) and may predict admission to the ICU better than the other parameters. The Swedish translated NEWS seems to have excellent inter-rater reliability and can be used without risk of linguistic misinterpretation. High scores for the parameters oxygen saturation and level of consciousness in the NEWS may predict admission to the ICU.
Affiliation(s)
- Martin Spångfors
- Lund University, Faculty of Medicine, Department of Clinical Sciences Lund, Anaesthesiology and Intensive Care, Lund, Sweden; Kristianstad Hospital, Department of Anaesthesia and Intensive Care, Kristianstad, Sweden
- Lisa Arvidsson
- Lund University, Faculty of Medicine, Department of Health Sciences, Lund, Sweden; Kristianstad Hospital, Department of Anaesthesia and Intensive Care, Kristianstad, Sweden
- Victoria Karlsson
- Lund University, Faculty of Medicine, Department of Health Sciences, Lund, Sweden; Kristianstad Hospital, Department of Anaesthesia and Intensive Care, Kristianstad, Sweden
- Karin Samuelson
- Lund University, Faculty of Medicine, Department of Clinical Sciences Lund, Anaesthesiology and Intensive Care, Lund, Sweden; Lund University, Faculty of Medicine, Department of Health Sciences, Lund, Sweden
|
9
|
Cuppens K, Knippels I, Saey T, Broeckx M, Van den Herrewegen I, Luca S, Creylman V, Labey L, Peeraer L. Prediction of clinical foot characteristics using quantitative features from different measurement set-ups. FOOTWEAR SCIENCE 2015. [DOI: 10.1080/19424280.2015.1036929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
10
|
|