1
Liu CC, Yu RX, Aitkin M. The flaw of averages: Bayes factors as posterior means of the likelihood ratio. Pharm Stat 2024; 23:466-479. [PMID: 38282048 DOI: 10.1002/pst.2355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 09/25/2023] [Accepted: 11/24/2023] [Indexed: 01/30/2024]
Abstract
As an alternative to the Frequentist p-value, the Bayes factor (or ratio of marginal likelihoods) has been regarded as one of the primary tools for Bayesian hypothesis testing. In recent years, several researchers have begun to re-analyze results from prominent medical journals, as well as from trials for FDA-approved drugs, to show that Bayes factors often give divergent conclusions from those of p-values. In this paper, we investigate the claim that Bayes factors are straightforward to interpret as directly quantifying the relative strength of evidence. In particular, we show that for nested hypotheses with consistent priors, the Bayes factor for the null over the alternative hypothesis is the posterior mean of the likelihood ratio. By re-analyzing 39 results previously published in the New England Journal of Medicine, we demonstrate how the posterior distribution of the likelihood ratio can be computed and visualized, providing useful information beyond the posterior mean alone.
Affiliation(s)
- Charles C Liu
- Department of Biostatistics, Gilead Sciences, Foster City, CA, USA
- Ron Xiaolong Yu
- Department of Biostatistics, Gilead Sciences, Foster City, CA, USA
- Murray Aitkin
- School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia
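The identity stated in the abstract above can be checked numerically in the simplest nested setting. Take a point null θ = θ0 nested in an alternative with prior π(θ), and define the likelihood ratio LR(θ) = p(x | θ0) / p(x | θ). Its posterior mean under H1 is ∫ [p(x | θ0) / p(x | θ)] · p(x | θ) π(θ) / p(x | H1) dθ = p(x | θ0) / p(x | H1), which is BF01.

The Python sketch below assumes a normal mean with known σ, H0: θ = 0, and a N(0, τ²) prior under H1. These modelling choices, the function names and the example numbers are illustrative assumptions. They are not the authors' setup and not their NEJM data. The sketch compares the closed-form BF01 with the numerically integrated posterior mean of LR. It then uses Monte Carlo draws of θ to summarize the rest of the posterior distribution of LR.

```python
import numpy as np
from scipy import integrate, stats

# Assumption: xbar | theta ~ N(theta, sigma^2 / n) with sigma known,
# H0: theta = 0, H1: theta ~ N(0, tau^2). Illustrative only.


def posterior_params(xbar, n, sigma, tau):
    """Posterior mean and SD of theta under H1 (conjugate normal update, prior mean 0)."""
    se2 = sigma**2 / n
    post_var = 1.0 / (1.0 / se2 + 1.0 / tau**2)
    return post_var * xbar / se2, np.sqrt(post_var)


def bf01_closed_form(xbar, n, sigma, tau):
    """BF01 = p(xbar | H0) / p(xbar | H1), with theta integrated out under H1."""
    se = sigma / np.sqrt(n)
    log_m0 = stats.norm.logpdf(xbar, 0.0, se)
    log_m1 = stats.norm.logpdf(xbar, 0.0, np.sqrt(se**2 + tau**2))
    return float(np.exp(log_m0 - log_m1))


def log_lr(theta, xbar, n, sigma):
    """log LR(theta) = log p(xbar | theta = 0) - log p(xbar | theta); theta may be an array."""
    se = sigma / np.sqrt(n)
    return stats.norm.logpdf(xbar, 0.0, se) - stats.norm.logpdf(xbar, theta, se)


def posterior_mean_lr(xbar, n, sigma, tau):
    """E[LR(theta) | xbar, H1] by numerical integration, combining terms on the log scale."""
    m, s = posterior_params(xbar, n, sigma, tau)

    def integrand(theta):
        return np.exp(log_lr(theta, xbar, n, sigma) + stats.norm.logpdf(theta, m, s))

    value, _ = integrate.quad(integrand, -np.inf, np.inf)
    return value


def lr_posterior_quantiles(xbar, n, sigma, tau, probs=(0.025, 0.5, 0.975), draws=100_000, seed=0):
    """Quantiles of the posterior distribution of LR(theta), from Monte Carlo draws of theta."""
    rng = np.random.default_rng(seed)
    m, s = posterior_params(xbar, n, sigma, tau)
    theta = rng.normal(m, s, size=draws)
    return np.quantile(np.exp(log_lr(theta, xbar, n, sigma)), probs)


if __name__ == "__main__":
    xbar, n, sigma, tau = 0.3, 50, 1.0, 1.0  # hypothetical summary statistics and prior scale
    print("closed-form BF01:     ", bf01_closed_form(xbar, n, sigma, tau))
    print("posterior mean of LR: ", posterior_mean_lr(xbar, n, sigma, tau))
    print("LR quantiles (2.5/50/97.5%):", lr_posterior_quantiles(xbar, n, sigma, tau))
```

The mean is computed by integration rather than by averaging the draws. In this model the posterior second moment of LR is infinite whenever τ² ≥ σ²/n, so a plain Monte Carlo average converges slowly. Quantiles of the draws are not affected by that heavy tail.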
2
Pittelkow MM, de Vries YA, Monden R, Bastiaansen JA, van Ravenzwaaij D. Comparing the evidential strength for psychotropic drugs: a Bayesian meta-analysis. Psychol Med 2021; 51:2752-2761. [PMID: 34620261 PMCID: PMC8640368 DOI: 10.1017/s0033291721003950] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/08/2021] [Revised: 09/06/2021] [Accepted: 09/09/2021] [Indexed: 11/17/2022]
Abstract
Approval and prescription of psychotropic drugs should be informed by the strength of evidence for efficacy. Using a Bayesian framework, we examined (1) whether psychotropic drugs are supported by substantial evidence (at the time of approval by the Food and Drug Administration), and (2) whether there are systematic differences across drug groups. Data from short-term, placebo-controlled phase II/III clinical trials for 15 antipsychotics, 16 antidepressants for depression, nine antidepressants for anxiety, and 20 drugs for attention deficit hyperactivity disorder (ADHD) were extracted from FDA reviews. Bayesian model-averaged meta-analysis was performed and strength of evidence was quantified (i.e. BF_BMA). Strength of evidence and trialling varied between drugs. Median evidential strength was extreme for ADHD medication (BF_BMA = 1820.4), moderate for antipsychotics (BF_BMA = 365.4), and considerably lower and more frequently classified as weak or moderate for antidepressants for depression (BF_BMA = 94.2) and anxiety (BF_BMA = 49.8). Varying median effect sizes (ES_schizophrenia = 0.45, ES_depression = 0.30, ES_anxiety = 0.37, ES_ADHD = 0.72), sample sizes (N_schizophrenia = 324, N_depression = 218, N_anxiety = 254, N_ADHD = 189.5), and numbers of trials (k_schizophrenia = 3, k_depression = 5.5, k_anxiety = 3, k_ADHD = 2) might account for differences. Although most drugs were supported by strong evidence at the time of approval, some only had moderate or ambiguous evidence. These results show the need for more systematic quantification and classification of statistical evidence for psychotropic drugs. Evidential strength should be communicated transparently and clearly to clinical decision makers.
Affiliation(s)
- Merle-Marie Pittelkow
- Department of Psychometrics and Statistics, University of Groningen, Groningen, the Netherlands
- Ymkje Anna de Vries
- Department of Developmental Psychology, University of Groningen, Groningen, the Netherlands
- Interdisciplinary Center Psychopathology and Emotion Regulation, Department of Psychiatry, University Medical Center Groningen, Groningen, the Netherlands
- Rei Monden
- Interdisciplinary Center Psychopathology and Emotion Regulation, Department of Psychiatry, University Medical Center Groningen, Groningen, the Netherlands
- Department of Biomedical Statistics, Graduate School of Medicine, Osaka University, Suita, Osaka, Japan
- Jojanneke A. Bastiaansen
- Interdisciplinary Center Psychopathology and Emotion Regulation, Department of Psychiatry, University Medical Center Groningen, Groningen, the Netherlands
- Department of Education and Research, Friesland Mental Health Care Services, Leeuwarden, the Netherlands
- Don van Ravenzwaaij
- Department of Psychometrics and Statistics, University of Groningen, Groningen, the Netherlands
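A Bayesian model-averaged meta-analysis typically averages over fixed- and random-effects models, each with and without an effect. It then reports an inclusion Bayes factor for the effect.

The Python sketch below is a simplified illustration, not the authors' implementation. It rests on these assumptions:
- study-level effect sizes are normal with known standard errors;
- the mean effect μ has a N(0, 0.5²) prior and is integrated out analytically;
- the heterogeneity τ has a half-normal prior with scale 0.5 and is integrated out numerically;
- the four models have equal prior probabilities.

The prior choices, function names and input numbers are hypothetical. Published analyses of this kind may use different priors.

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import logsumexp

# Assumptions (illustrative): y_i ~ N(mu + u_i, se_i^2), u_i ~ N(0, tau^2),
# mu ~ N(0, mu_sd^2) under "effect", tau ~ half-normal(tau_scale) under "random".


def log_marginal(y, se, effect, tau=0.0, mu_sd=0.5):
    """log p(y | model, tau). Integrating mu ~ N(0, mu_sd^2) out adds mu_sd^2 to every covariance entry."""
    k = len(y)
    cov = np.diag(se**2 + tau**2)
    if effect:
        cov = cov + mu_sd**2 * np.ones((k, k))
    return stats.multivariate_normal.logpdf(y, mean=np.zeros(k), cov=cov)


def log_marginal_random(y, se, effect, tau_scale=0.5, mu_sd=0.5):
    """Random-effects marginal likelihood: integrate tau over its half-normal prior."""
    ref = log_marginal(y, se, effect, 0.0, mu_sd)  # shift for numerical stability

    def integrand(tau):
        return np.exp(log_marginal(y, se, effect, tau, mu_sd) - ref) * stats.halfnorm.pdf(tau, scale=tau_scale)

    # Prior mass beyond 20 prior-scale units is negligible.
    value, _ = integrate.quad(integrand, 0.0, 20.0 * tau_scale)
    return ref + np.log(value)


def inclusion_bf(y, se, tau_scale=0.5, mu_sd=0.5):
    """Model-averaged BF for 'effect' vs 'no effect' over fixed- and random-effects models."""
    y, se = np.asarray(y, dtype=float), np.asarray(se, dtype=float)
    log_h1 = logsumexp([log_marginal(y, se, True, 0.0, mu_sd),
                        log_marginal_random(y, se, True, tau_scale, mu_sd)])
    log_h0 = logsumexp([log_marginal(y, se, False, 0.0, mu_sd),
                        log_marginal_random(y, se, False, tau_scale, mu_sd)])
    # Equal prior model probabilities make the prior odds 1, so this ratio of
    # summed marginal likelihoods is both the posterior odds and the Bayes factor.
    return float(np.exp(log_h1 - log_h0))


if __name__ == "__main__":
    effects = [0.35, 0.28, 0.42]  # hypothetical standardized effects from three trials
    ses = [0.12, 0.15, 0.10]      # hypothetical standard errors
    print("model-averaged BF (hypothetical data):", inclusion_bf(effects, ses))
```

Summing marginal likelihoods within each hypothesis is what makes the result model-averaged. No single fixed- or random-effects model has to be chosen before the evidence for an effect is quantified.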
3
van Schie K, van Veen SC, Hagenaars MA. The effects of dual-tasks on intrusive memories following analogue trauma. Behav Res Ther 2019; 120:103448. [PMID: 31398536 DOI: 10.1016/j.brat.2019.103448] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/22/2019] [Revised: 07/02/2019] [Accepted: 07/26/2019] [Indexed: 02/05/2023]
Abstract
Patients with post-traumatic stress disorder frequently and involuntarily experience intrusions, which are strongly linked to the trauma hotspot. Voluntary memory characteristics (i.e., vividness and unpleasantness) of this hotspot can be reduced by performing a dual-task, such as making horizontal eye movements, which is frequently used in Eye Movement Desensitization and Reprocessing. We tested whether such dual-task interventions would also reduce involuntary memory (i.e., intrusions). We also examined whether changes in hotspot vividness and unpleasantness predicted intrusion frequency, and whether the effects depended on dual-task modality. We tested this in three experiments. Participants watched a trauma film and, 10 min after the film, performed one of three interventions: (1) Recall + Eye movements, (2) Recall + Counting, or (3) No-Task Control. Before and after the intervention, participants rated hotspot vividness and unpleasantness. They recorded intrusive memories about the film in a diary for a week. Unexpectedly, hotspot vividness and unpleasantness ratings were not affected by the intervention. However, the prolonged (experiment 2), but not the standard (experiment 1), dual-task interventions resulted in fewer intrusions, regardless of modality. This effect was not replicated in experiment 3. We discuss potential explanations and present suggestions for future research.
Affiliation(s)
- Kevin van Schie
- Department of Psychology, Education & Child Studies, Erasmus School of Social and Behavioural Sciences, Erasmus University Rotterdam, Rotterdam, the Netherlands
- Department of Clinical Psychology, Faculty of Social and Behavioural Sciences, Utrecht University, Utrecht, the Netherlands
- Suzanne C van Veen
- Department of Clinical Psychology, Faculty of Social and Behavioural Sciences, Utrecht University, Utrecht, the Netherlands
- Muriel A Hagenaars
- Department of Clinical Psychology, Faculty of Social and Behavioural Sciences, Utrecht University, Utrecht, the Netherlands
4
The comparative evidence basis for the efficacy of second-generation antidepressants in the treatment of depression in the US: A Bayesian meta-analysis of Food and Drug Administration reviews. J Affect Disord 2018; 235:393-398. [PMID: 29677603 DOI: 10.1016/j.jad.2018.04.040] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/08/2017] [Revised: 03/06/2018] [Accepted: 04/04/2018] [Indexed: 11/22/2022]
Abstract
BACKGROUND Studies have shown similar efficacy of different antidepressants in the treatment of depression. METHOD Data from phase 2 and 3 clinical trials for 16 antidepressants (levomilnacipran, desvenlafaxine, duloxetine, venlafaxine, paroxetine, escitalopram, vortioxetine, mirtazapine, venlafaxine XR, sertraline, fluoxetine, citalopram, paroxetine CR, nefazodone, bupropion, vilazodone), approved by the FDA for the treatment of depression between 1987 and 2016, were extracted from the FDA reviews used to evaluate efficacy prior to marketing approval; these reviews are less liable to reporting biases. Meta-analytic Bayes factors, which quantify the strength of evidence for efficacy, were calculated. In addition, posterior pooled effect sizes were calculated and compared with classical estimates. RESULTS The resulting Bayes factors showed that the evidence load for efficacy varied strongly across antidepressants. However, all tested drugs except bupropion and vilazodone showed strong evidence for their efficacy. The posterior effect-size distributions varied across antidepressants, with the highest pooled estimated effect size for venlafaxine, followed by paroxetine, and the lowest for bupropion and vilazodone. LIMITATIONS Not all published trials were included in the study. CONCLUSIONS The results illustrate the importance of considering both the effect size and the evidence load when judging the efficacy of a treatment. In doing so, the Bayesian approach used here provided clear insights beyond those gained with traditional approaches.
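Comparing a posterior pooled effect size with its classical counterpart can be illustrated with a fixed-effect model. The Python sketch below assumes normally distributed study effects with known standard errors and a N(0, 0.5²) prior on the common effect. The prior scale, function names and example inputs are hypothetical. They are not taken from the FDA data analysed in the paper, which may have used a different model. In this model the inverse-variance pooled estimate is sufficient for the common effect. The meta-analytic BF10 therefore reduces to a ratio of two normal densities evaluated at that estimate.

```python
import numpy as np
from scipy import stats

# Assumption (illustrative): y_i ~ N(delta, se_i^2) with known se_i, delta ~ N(0, prior_sd^2) under H1.


def classical_pooled(y, se):
    """Fixed-effect inverse-variance estimate, its standard error and a 95% confidence interval."""
    y, se = np.asarray(y, dtype=float), np.asarray(se, dtype=float)
    w = 1.0 / se**2
    est = float(np.sum(w * y) / np.sum(w))
    pooled_se = float(np.sqrt(1.0 / np.sum(w)))
    z = stats.norm.ppf(0.975)
    return est, pooled_se, (est - z * pooled_se, est + z * pooled_se)


def bayesian_pooled(y, se, prior_sd=0.5):
    """Posterior mean, 95% credible interval and BF10 for the common effect."""
    est, pooled_se, _ = classical_pooled(y, se)
    post_var = 1.0 / (1.0 / pooled_se**2 + 1.0 / prior_sd**2)
    post_mean = post_var * est / pooled_se**2  # shrunk toward the prior mean of 0
    half = stats.norm.ppf(0.975) * np.sqrt(post_var)
    # BF10: marginal density of the pooled estimate under H1 (variance se^2 + prior_sd^2)
    # over its density under H0 (variance se^2), computed on the log scale.
    log_bf10 = (stats.norm.logpdf(est, 0.0, np.sqrt(pooled_se**2 + prior_sd**2))
                - stats.norm.logpdf(est, 0.0, pooled_se))
    return post_mean, (post_mean - half, post_mean + half), float(np.exp(log_bf10))


if __name__ == "__main__":
    y = [0.30, 0.25, 0.40]   # hypothetical standardized drug-placebo differences
    se = [0.10, 0.12, 0.15]  # hypothetical standard errors
    print("classical (estimate, SE, 95% CI):", classical_pooled(y, se))
    print("Bayesian (mean, 95% CrI, BF10):  ", bayesian_pooled(y, se))
```

The posterior mean is always pulled toward zero relative to the classical estimate, with less shrinkage as the pooled standard error falls. The BF10, by contrast, summarizes how strongly the pooled data favour a non-zero effect.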
5
Strawn JR, Mills JA, Cornwall GJ, Mossman SA, Varney ST, Keeshin BR, Croarkin PE. Buspirone in Children and Adolescents with Anxiety: A Review and Bayesian Analysis of Abandoned Randomized Controlled Trials. J Child Adolesc Psychopharmacol 2018; 28:2-9. [PMID: 28846022 PMCID: PMC5771537 DOI: 10.1089/cap.2017.0060] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022]
Abstract
OBJECTIVES An increasing number of abandoned clinical trials have forestalled efforts to advance the evidence base for the treatment of mood and anxiety disorders in children and adolescents. With this in mind, we sought to present and validate a Bayesian approach for the reanalysis of summary data in abandoned clinical trials and to review and re-evaluate available pharmacokinetic, tolerability, and efficacy data from two large, randomized controlled trials of buspirone in pediatric patients with generalized anxiety disorder (GAD). METHODS Prospective, randomized, parallel-group controlled trials of buspirone in pediatric patients with GAD, as well as associated pharmacokinetic studies, were identified and data were extracted. In addition to descriptive statistics, marginal posterior densities were determined for each variable of interest, and a Monte Carlo pseudosample of random draws from the Student's t-distribution was generated to assess differences in these variables with inferential statistics. RESULTS Buspirone was evaluated in one flexibly dosed (N = 227) and one fixed-dose (N = 341) trial in children and adolescents aged 6-17 years with a primary diagnosis of GAD. With regard to improvement in the sum of the Columbia Schedule for Affective Disorders and Schizophrenia GAD items, buspirone did not separate from placebo in the fixed-dose trial at low (95% CI: -0.78 to 2.39, p = 0.32) or high dose (95% CI: -0.87 to 1.87, p = 0.47), nor did it separate from placebo in the flexibly dosed study (95% CI: -0.3 to 1.9, p = 0.15). Dropout due to treatment-emergent adverse events was significantly more frequent in buspirone-treated patients than in placebo-treated patients (p = 0.011). Side effects were consistent with the known profile of buspirone, with lightheadedness occurring more frequently in buspirone-treated patients (p < 0.001).
CONCLUSIONS Buspirone is well tolerated in pediatric patients with GAD, although both randomized controlled trials were underpowered to detect small effect sizes (Cohen's d < 0.15). Finally, Bayesian approaches may facilitate re-examination of data from abandoned clinical trials.
Affiliation(s)
- Jeffrey R. Strawn
- Department of Psychiatry and Behavioral Neuroscience, University of Cincinnati, College of Medicine, Cincinnati, Ohio
- Division of Child and Adolescent Psychiatry, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
- Jeffrey A. Mills
- Department of Economics, Carl H. Lindner College of Business, University of Cincinnati, Cincinnati, Ohio
- Gary J. Cornwall
- Department of Economics, Carl H. Lindner College of Business, University of Cincinnati, Cincinnati, Ohio
- Sarah A. Mossman
- Department of Psychiatry and Behavioral Neuroscience, University of Cincinnati, College of Medicine, Cincinnati, Ohio
- Sara T. Varney
- Department of Psychiatry and Behavioral Neuroscience, University of Cincinnati, College of Medicine, Cincinnati, Ohio
- Paul E. Croarkin
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester, Minnesota
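One common way to build a Monte Carlo pseudosample from summary data uses the reference prior p(μ, σ) ∝ 1/σ. Under that prior the marginal posterior of a normal mean is x̄ + (s/√n)·T, where T follows Student's t-distribution with n − 1 degrees of freedom. The Python sketch below assumes this prior, independent arms, and hypothetical summary statistics (mean improvement, SD, n). It is not a reproduction of the authors' analysis. It draws each arm's mean and subtracts, which gives a posterior sample for the treatment difference. A credible interval and a posterior probability can be read off that sample.

```python
import numpy as np

# Assumption (illustrative): each arm's outcome is normal, the prior is p(mu, sigma)
# proportional to 1/sigma, and the two arms are independent.


def draw_group_means(mean, sd, n, size, rng):
    """Posterior draws of a group mean: mean + (sd / sqrt(n)) * T with n - 1 degrees of freedom."""
    return mean + sd / np.sqrt(n) * rng.standard_t(df=n - 1, size=size)


def compare_arms(drug, placebo, size=200_000, seed=0):
    """Monte Carlo summary of the drug-minus-placebo difference; each arm is (mean, sd, n)."""
    rng = np.random.default_rng(seed)
    diff = draw_group_means(*drug, size, rng) - draw_group_means(*placebo, size, rng)
    lo, hi = np.quantile(diff, [0.025, 0.975])
    return {
        "mean": float(diff.mean()),
        "ci95": (float(lo), float(hi)),
        "p_diff_gt_0": float(np.mean(diff > 0)),
    }


if __name__ == "__main__":
    # Hypothetical summary data per arm: (mean improvement, SD, n).
    print(compare_arms(drug=(6.1, 5.0, 110), placebo=(5.3, 5.2, 112)))
```

Because each arm keeps its own SD, this avoids assuming equal variances. The difference of two t-distributed draws handles the Behrens-Fisher setting without a closed form.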
6
van Ravenzwaaij D, Ioannidis JPA. A simulation study of the strength of evidence in the recommendation of medications based on two trials with statistically significant results. PLoS One 2017; 12:e0173184. [PMID: 28273140 PMCID: PMC5342224 DOI: 10.1371/journal.pone.0173184] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 01/20/2017] [Accepted: 02/16/2017] [Indexed: 11/21/2022]
Abstract
A typical rule that has been used for the endorsement of new medications by the Food and Drug Administration is to have two trials, each convincing on its own, demonstrating effectiveness. "Convincing" may be subjectively interpreted, but the use of p-values and the focus on statistical significance (in particular, the convention of calling p < .05 significant) are pervasive in clinical research. Therefore, in this paper, we use simulations to calculate what it means to have exactly two trials, each with p < .05, in terms of the actual strength of evidence quantified by Bayes factors. Our results show that different cases where two trials have a p-value below .05 have wildly differing Bayes factors. Bayes factors of at least 20 in favor of the alternative hypothesis are not guaranteed and are not reached in a large proportion of cases, in particular when the true effect size is small (0.2 standard deviations) or zero. In a non-trivial number of cases, evidence actually points to the null hypothesis, in particular when the true effect size is zero, when the number of trials is large, and when the number of participants in both groups is low. We recommend the use of Bayes factors as a routine tool to assess endorsement of new medications, because Bayes factors consistently quantify strength of evidence. Use of p-values may lead to paradoxical and spurious decision-making regarding the use of new medications.
Affiliation(s)
- Don van Ravenzwaaij
- Department of Psychology, University of Groningen, Groningen, the Netherlands
- John P. A. Ioannidis
- Department of Medicine, Stanford University, Stanford, California, United States of America
- Department of Health Research and Policy, Stanford University, Stanford, California, United States of America
- Department of Statistics and Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America
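The scenario in the abstract above, two trials that are both significant, can be simulated in a few lines. The Python sketch below makes several assumptions that are not taken from the paper:
- equal group sizes and unit-variance normal outcomes;
- a two-sided test at α = .05 that must also favour treatment;
- a standardized effect δ shared by both trials, with a Cauchy prior of scale √2/2 (a common default for t-test Bayes factors);
- a large-sample normal approximation d_k ~ N(δ, 2/n) for each trial's Cohen's d, used instead of the exact noncentral-t likelihood.

Under those assumptions, `joint_bf10` computes one Bayes factor for the combined evidence from both trials. All names are hypothetical.

```python
import numpy as np
from scipy import integrate, stats


def cohens_d(treat, ctrl):
    """Row-wise standardized mean difference using the pooled standard deviation."""
    n1, n2 = treat.shape[1], ctrl.shape[1]
    pooled_var = ((n1 - 1) * treat.var(axis=1, ddof=1) + (n2 - 1) * ctrl.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    return (treat.mean(axis=1) - ctrl.mean(axis=1)) / np.sqrt(pooled_var)


def joint_bf10(ds, n_per_group, r=np.sqrt(2) / 2):
    """BF10 for a common effect delta across trials. Assumption: d_k ~ N(delta, 2 / n), delta ~ Cauchy(0, r)."""
    ds = np.asarray(ds, dtype=float)
    sd = np.sqrt(2.0 / n_per_group)  # large-sample SE of d with equal group sizes

    def integrand(delta):
        # log N(ds | delta, sd^2) - log N(ds | 0, sd^2); the normalizing constants cancel.
        log_ratio = 0.5 * np.sum((ds / sd) ** 2 - ((ds - delta) / sd) ** 2)
        cauchy_pdf = 1.0 / (np.pi * r * (1.0 + (delta / r) ** 2))
        return np.exp(log_ratio) * cauchy_pdf

    # The likelihood is negligible more than 12 SEs from the mean d, so integrate over that window.
    centre = ds.mean()
    value, _ = integrate.quad(integrand, centre - 12 * sd, centre + 12 * sd)
    return value


def simulate_significant_pairs(delta, n_per_group=50, n_pairs=2_000, alpha=0.05, seed=0):
    """Simulate pairs of two-arm trials; return joint BF10s for pairs where both favour treatment with p < alpha."""
    rng = np.random.default_rng(seed)
    treat = rng.normal(delta, 1.0, size=(2 * n_pairs, n_per_group))
    ctrl = rng.normal(0.0, 1.0, size=(2 * n_pairs, n_per_group))
    t, p = stats.ttest_ind(treat, ctrl, axis=1)
    both = ((p < alpha) & (t > 0)).reshape(n_pairs, 2).all(axis=1)
    d_pairs = cohens_d(treat, ctrl).reshape(n_pairs, 2)[both]
    return np.array([joint_bf10(pair, n_per_group) for pair in d_pairs])


if __name__ == "__main__":
    for delta in (0.0, 0.2, 0.5):
        bfs = simulate_significant_pairs(delta)
        if bfs.size:
            print(f"delta={delta}: {bfs.size} pairs, share with BF10 >= 20: {np.mean(bfs >= 20):.2f}, "
                  f"min BF10: {bfs.min():.1f}")
        else:
            print(f"delta={delta}: no pair with two significant trials")
```

Changing `delta` or `n_per_group` shows how widely the Bayes factor can vary among pairs that share the same "two significant trials" outcome. With `delta=0.0`, only a handful of pairs qualify, because both trials must be significant in the treatment direction by chance.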
7
Monden R, de Vos S, Morey R, Wagenmakers E, de Jonge P, Roest AM. Toward evidence-based medical statistics: a Bayesian analysis of double-blind placebo-controlled antidepressant trials in the treatment of anxiety disorders. Int J Methods Psychiatr Res 2016; 25:299-308. [PMID: 27219132 PMCID: PMC6860243 DOI: 10.1002/mpr.1507] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/03/2015] [Revised: 02/01/2016] [Accepted: 02/15/2016] [Indexed: 11/09/2022]
Abstract
The Food and Drug Administration (FDA) uses a p < 0.05 null-hypothesis significance testing framework to evaluate "substantial evidence" for drug efficacy. This framework only allows dichotomous conclusions and does not quantify the strength of evidence supporting efficacy. The efficacy of FDA-approved antidepressants for the treatment of anxiety disorders was re-evaluated in a Bayesian framework that quantifies the strength of the evidence. Data from 58 double-blind placebo-controlled trials of second-generation antidepressants for the treatment of anxiety disorders were retrieved from the FDA. Bayes factors (BFs) were calculated for all treatment arms compared to placebo and were compared with the corresponding p-values and the FDA conclusion categories. BFs ranged from 0.07 to 131,400, indicating evidence ranging from no support to strong support for efficacy. The strength of evidence also varied among trials with p < 0.05. In sum, there were large differences in BFs across trials. Among trials providing "substantial evidence" according to the FDA, only 27 of 59 dose groups obtained strong support for efficacy according to the typically used cutoff of BF ≥ 20. The Bayesian framework can provide valuable information on the strength of the evidence for drug efficacy. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Rei Monden
- Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), Department of Psychiatry, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Stijn de Vos
- Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), Department of Psychiatry, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Richard Morey
- Faculty of Behavioral and Social Sciences, University of Groningen, Groningen, the Netherlands
- Eric‐Jan Wagenmakers
- Department of Experimental Psychology, University of Groningen, Groningen, the Netherlands
- Peter de Jonge
- Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), Department of Psychiatry, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Annelieke M. Roest
- Interdisciplinary Center Psychopathology and Emotion regulation (ICPE), Department of Psychiatry, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands