1
Esterhuizen TM, Mbuagbaw L, Thabane L. Disparity between statistical and clinical significance in published randomised controlled trials indexed in PubMed: a protocol for a cross-sectional methodological survey. BMJ Open 2024; 14:e084375. PMID: 39059809. DOI: 10.1136/bmjopen-2024-084375.
Abstract
INTRODUCTION: The commonly used frequentist paradigm of null hypothesis significance testing, with its reliance on the p-value and the corresponding notion of 'statistical significance', has been under ongoing criticism. Misinterpretation and misuse of the p-value have contributed to publication bias, unreliable studies, frequent false positives, fraud and mistrust in the results of scientific studies. While p-values themselves are still useful, part of the problem may be confusion between statistical and clinical significance. In randomised controlled trials of health interventions, this confusion can lead to erroneous conclusions about treatment efficacy, research waste and compromised patient outcomes. The extent of mismatch between clinical and statistical significance in published randomised controlled trials is not known. This is a protocol for a methodological study to quantify the extent of disparities between statistical and clinical significance in published clinical trials, and to identify and assess the factors associated with discrepant results in these studies.

METHODS AND ANALYSIS: A methodological survey of published randomised controlled trials is planned. Trials published between 2018 and 2022, together with their protocols, will be searched and screened for inclusion, with a planned sample size of 500 studies. The reported minimum clinically important difference (MCID), the study effect size and confidence intervals will be used to assess the clinical importance of trial results. Disparity will be determined by comparing the statistical significance and clinical importance of each trial's results. Data will be analysed to estimate the outcomes, and factors associated with disparate study results will be assessed using logistic regression.

ETHICS AND DISSEMINATION: Ethical approval for the study has been granted by Stellenbosch University's Health Research Ethics Committee. This work forms part of a larger study towards a PhD in Biostatistics and will be disseminated as a thesis, conference abstract and peer-reviewed manuscript.
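A minimal Python sketch of how the statistical/clinical comparison could be operationalised for a single trial, assuming a continuous outcome where larger effects favour the intervention and zero means no effect; the `classify` function and its decision rule are illustrative assumptions, not the protocol's prespecified algorithm:

```python
def classify(effect: float, ci_low: float, mcid: float) -> str:
    """Compare statistical significance with clinical importance
    for an effect where zero means 'no effect'."""
    statistically_significant = ci_low > 0.0  # 95% CI excludes no effect
    clinically_important = effect >= mcid     # point estimate reaches the MCID
    if statistically_significant and not clinically_important:
        return "disparate: statistically significant but not clinically important"
    if clinically_important and not statistically_significant:
        return "disparate: clinically important but not statistically significant"
    return "concordant"

# Example: CI excludes zero (P < 0.05), but the effect is below a hypothetical MCID of 5
print(classify(effect=3.2, ci_low=0.4, mcid=5.0))
```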
Affiliation(s)
- Lawrence Mbuagbaw
- Division of Epidemiology and Biostatistics, Stellenbosch University, Stellenbosch, South Africa
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
- Department of Anaesthesia, McMaster University, Hamilton, Ontario, Canada
- Department of Pediatrics, McMaster University, Hamilton, Ontario, Canada
- Biostatistics Unit, Father Sean O'Sullivan Research Centre, St Joseph's Healthcare, Hamilton, Ontario, Canada
- Centre for Development of Best Practices in Health, Yaounde Central Hospital, Yaounde, Cameroon
- Lehana Thabane
- Division of Epidemiology and Biostatistics, Stellenbosch University, Stellenbosch, South Africa
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
- Biostatistics Unit, Father Sean O'Sullivan Research Centre, St Joseph's Healthcare, Hamilton, Ontario, Canada
- Faculty of Health Sciences, University of Johannesburg, Johannesburg, South Africa
2
Shi X, Du J. Constructing a finer-grained representation of clinical trial results from ClinicalTrials.gov. Sci Data 2024; 11:41. PMID: 38184674. PMCID: PMC10771511. DOI: 10.1038/s41597-023-02869-7.
Abstract
Randomized controlled trials are essential for evaluating clinical interventions; however, selective reporting and publication bias in medical journals have undermined the integrity of the clinical evidence system. ClinicalTrials.gov serves as a valuable and complementary repository, yet synthesizing the information it holds remains challenging. This study introduces a curated dataset that extends beyond the traditional PICO framework. It links efficacy with safety results at the level of the experimental arm within each trial, and connects them across all trials through a knowledge graph. This representation bridges the gap between the generally described, searchable information and the specifically detailed yet underutilized reported results, and promotes a dual-faceted understanding of interventional effects. Adhering to the "calculate once, use many times" principle, the structured dataset will enhance the reuse and interpretation of ClinicalTrials.gov results data. It aims to facilitate more systematic evidence synthesis and health technology assessment by incorporating both positive and negative results, distinguishing between biomarkers, patient-reported outcomes, and clinical endpoints, and balancing efficacy and safety outcomes for a given medical intervention.
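A toy sketch of the arm-level linkage the dataset describes, assuming the networkx library; all identifiers, fields, and values below are invented for illustration, and the published schema will differ:

```python
import networkx as nx

g = nx.DiGraph()
# One trial arm, with one efficacy result and one safety result attached
g.add_node("NCT00000000/arm1", kind="arm", intervention="drug X 10 mg")
g.add_node("eff1", kind="efficacy", outcome="response rate", value=0.42)
g.add_node("saf1", kind="safety", outcome="serious adverse events", value=0.08)
g.add_edge("NCT00000000/arm1", "eff1", relation="reports")
g.add_edge("NCT00000000/arm1", "saf1", relation="reports")

# Both facets of the interventional effect are reachable from the arm node
for _, result in g.edges("NCT00000000/arm1"):
    attrs = g.nodes[result]
    print(attrs["kind"], attrs["outcome"], attrs["value"])
```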
Affiliation(s)
- Xuanyu Shi
- Institute of Medical Technology, Peking University, Beijing, 100191, China
- National Institute of Health Data Science, Peking University, Beijing, 100191, China
- Jian Du
- Institute of Medical Technology, Peking University, Beijing, 100191, China
- National Institute of Health Data Science, Peking University, Beijing, 100191, China
3
Dungate B, Tucker DR, Goodwin E, Yong PJ. Assessing the utility of artificial intelligence in endometriosis: promises and pitfalls. Womens Health (Lond) 2024; 20:17455057241248121. PMID: 38686828. PMCID: PMC11062212. DOI: 10.1177/17455057241248121.
Abstract
Endometriosis, a chronic condition characterized by the growth of endometrial-like tissue outside of the uterus, poses substantial challenges in terms of diagnosis and treatment. Artificial intelligence (AI) has emerged as a promising tool in the field of medicine, offering opportunities to address the complexities of endometriosis. This review explores the current landscape of endometriosis diagnosis and treatment, highlighting the potential of AI to alleviate some of the associated burdens, and underscores common pitfalls and challenges in employing AI algorithms in this context. Women's health research in endometriosis has suffered from underfunding, leading to limitations in diagnosis, classification, and treatment approaches. The heterogeneity of symptoms in patients with endometriosis has further complicated efforts to address this condition. New, powerful methods of analysis have the potential to uncover previously unidentified patterns in data relating to endometriosis. AI, a collection of algorithms replicating human decision-making in data analysis, has been increasingly adopted in medical research, including endometriosis studies. While AI offers the ability to identify novel patterns in data and to analyze large datasets, its effectiveness hinges on data quality and quantity and on the expertise of those implementing the algorithms. Current applications of AI in endometriosis range from diagnostic tools for ultrasound imaging to predicting treatment success. These applications show promise in reducing diagnostic delays and healthcare costs and in providing patients with more treatment options, improving their quality of life. AI holds significant potential for advancing the diagnosis and treatment of endometriosis, but it must be applied carefully and transparently to avoid pitfalls and ensure reproducibility. This review calls for increased scrutiny and accountability in AI research. Addressing these challenges can lead to more effective AI-driven solutions for endometriosis and other complex medical conditions.
Affiliation(s)
- Brie Dungate
- Faculty of Medicine, The University of British Columbia, Vancouver, BC, Canada
- Department of Obstetrics and Gynecology, The University of British Columbia, Vancouver, BC, Canada
- Women’s Health Research Institute, Vancouver, BC, Canada
- Dwayne R Tucker
- Department of Obstetrics and Gynecology, The University of British Columbia, Vancouver, BC, Canada
- Women’s Health Research Institute, Vancouver, BC, Canada
- Centre for Pelvic Pain & Endometriosis, BC Women’s Hospital & Health Centre, Vancouver, BC, Canada
- Emma Goodwin
- Department of Obstetrics and Gynecology, The University of British Columbia, Vancouver, BC, Canada
- Women’s Health Research Institute, Vancouver, BC, Canada
- Paul J Yong
- Department of Obstetrics and Gynecology, The University of British Columbia, Vancouver, BC, Canada
- Women’s Health Research Institute, Vancouver, BC, Canada
- Centre for Pelvic Pain & Endometriosis, BC Women’s Hospital & Health Centre, Vancouver, BC, Canada
4
White N, Parsons R, Collins G, Barnett A. Evidence of questionable research practices in clinical prediction models. BMC Med 2023; 21:339. PMID: 37667344. PMCID: PMC10478406. DOI: 10.1186/s12916-023-03048-6.
Abstract
BACKGROUND: Clinical prediction models are widely used in health and medical research. The area under the receiver operating characteristic curve (AUC) is a frequently used estimate of a clinical prediction model's discriminatory ability. The AUC is often interpreted relative to thresholds, with "good" or "excellent" models defined at 0.7, 0.8 or 0.9. These thresholds may create targets that result in "hacking", where researchers are motivated to re-analyse their data until they achieve a "good" result.

METHODS: We extracted AUC values from PubMed abstracts to look for evidence of hacking. We plotted histograms of the AUC values in bins of size 0.01 and compared the observed distribution with a smooth distribution fitted using a spline.

RESULTS: The distribution of 306,888 AUC values showed clear excesses above the thresholds of 0.7, 0.8 and 0.9, and shortfalls below them.

CONCLUSIONS: The AUCs for some models are over-inflated, which risks exposing patients to sub-optimal clinical decision-making. Greater modelling transparency is needed, including published protocols and data and code sharing.
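A sketch of the excess-above-threshold check on synthetic data; the 0.01 binning follows the abstract, but the spline specification here is an assumption rather than the paper's exact model:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
aucs = rng.beta(8, 3, size=50_000)          # synthetic "reported" AUC values
bins = np.arange(0.50, 1.01, 0.01)          # bins of size 0.01, as in the paper
counts, edges = np.histogram(aucs, bins=bins)
centers = (edges[:-1] + edges[1:]) / 2

# Roughly Poisson-scale smoothing: target residual sum of squares ~ total count
spline = UnivariateSpline(centers, counts, k=3, s=float(counts.sum()))
expected = spline(centers)

# With real data, bunching would show as observed >> smooth just above each threshold
for t in (0.7, 0.8, 0.9):
    i = int(np.argmin(np.abs(centers - (t + 0.005))))  # first bin at/above t
    print(f"bin [{t:.2f}, {t + 0.01:.2f}): observed {counts[i]}, smooth {expected[i]:.0f}")
```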
Affiliation(s)
- Nicole White
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, Kelvin Grove, Queensland, Australia
- Rex Parsons
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, Kelvin Grove, Queensland, Australia
- Gary Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Adrian Barnett
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Faculty of Health, Queensland University of Technology, Kelvin Grove, Queensland, Australia
5
Polevikov S. Advancing AI in healthcare: a comprehensive review of best practices. Clin Chim Acta 2023; 548:117519. PMID: 37595864. DOI: 10.1016/j.cca.2023.117519.
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are powerful tools shaping the healthcare sector. This review considers twelve key aspects of AI in clinical practice: 1) Ethical AI; 2) Explainable AI; 3) Health Equity and Bias in AI; 4) Sponsorship Bias; 5) Data Privacy; 6) Genomics and Privacy; 7) Insufficient Sample Size and Self-Serving Bias; 8) Bridging the Gap Between Training Datasets and Real-World Scenarios; 9) Open Source and Collaborative Development; 10) Dataset Bias and Synthetic Data; 11) Measurement Bias; 12) Reproducibility in AI Research. These categories represent both the challenges and opportunities of AI implementation in healthcare. While AI holds significant potential for improving patient care, it also presents risks and challenges, such as ensuring privacy, combating bias, and maintaining transparency and ethics. The review underscores the necessity of developing comprehensive best practices for healthcare organizations and fostering a diverse dialogue involving data scientists, clinicians, patient advocates, ethicists, economists, and policymakers. We are on the cusp of a significant AI-powered transformation in healthcare. By continuing to reassess and refine our approach, we can ensure that AI is implemented responsibly and ethically, maximizing its benefit to patient care and public health.
6
Lammers D, McClellan J. Modern statistical methods for the surgeon scientist: the clash of frequentist versus Bayesian paradigms. Surg Clin North Am 2023; 103:259-269. PMID: 36948717. DOI: 10.1016/j.suc.2022.12.001.
Abstract
The practice of evidence-based medicine is the result of a multitude of studies and trials aimed at improving health-care outcomes. An understanding of the associated data remains paramount to optimizing patient outcomes. Medical statistics commonly revolve around frequentist concepts that are convoluted and nonintuitive for nonstatisticians. In this article, we discuss frequentist statistics and their limitations, and introduce Bayesian statistics as an alternative approach to data interpretation. In doing so, we intend to highlight the importance of correct statistical interpretation through clinically relevant examples while providing a deeper understanding of the philosophies underlying frequentist and Bayesian statistics.
Affiliation(s)
- Daniel Lammers
- Department of General Surgery, Madigan Army Medical Center, 9040 Jackson Avenue, Tacoma, WA 98431, USA
- John McClellan
- Department of General Surgery, Madigan Army Medical Center, 9040 Jackson Avenue, Tacoma, WA 98431, USA
7
Lammers D, Richman J, Holcomb JB, Jansen JO. Use of Bayesian statistics to reanalyze data from the Pragmatic Randomized Optimal Platelet and Plasma Ratios trial. JAMA Netw Open 2023; 6:e230421. PMID: 36811858. PMCID: PMC9947730. DOI: 10.1001/jamanetworkopen.2023.0421.
Abstract
IMPORTANCE: Frequentist statistical approaches are the most common strategies for clinical trial design; however, Bayesian trial design may provide a more optimal study technique for trauma-related studies.

OBJECTIVE: To describe the outcomes of Bayesian statistical approaches using data from the Pragmatic Randomized Optimal Platelet and Plasma Ratios (PROPPR) trial.

DESIGN, SETTING, AND PARTICIPANTS: This quality improvement study performed a post hoc Bayesian analysis of the PROPPR trial using multiple hierarchical models to assess the association of resuscitation strategy with mortality. The PROPPR trial took place at 12 US level I trauma centers from August 2012 to December 2013. A total of 680 severely injured trauma patients who were anticipated to require large-volume transfusions were included. Data analysis for this quality improvement study was conducted from December 2021 to June 2022.

INTERVENTIONS: In the PROPPR trial, patients were randomized to receive a balanced transfusion (equal portions of plasma, platelets, and red blood cells [1:1:1]) vs a red blood cell-heavy strategy (1:1:2) during their initial resuscitation.

MAIN OUTCOMES AND MEASURES: Primary outcomes from the PROPPR trial were 24-hour and 30-day all-cause mortality, analyzed using frequentist statistical methods. Bayesian methods were used to define the posterior probabilities associated with each resuscitation strategy at each of the original primary end points.

RESULTS: Overall, 680 patients (546 [80.3%] male; median [IQR] age, 34 [24-51] years; 330 [48.5%] with penetrating injury; median [IQR] Injury Severity Score, 26 [17-41]; 591 [87.0%] with severe hemorrhage) were included in the original PROPPR trial. Between the groups, no significant differences in mortality were originally detected at 24 hours (12.7% vs 17.0%; adjusted risk ratio [RR], 0.75 [95% CI, 0.52-1.08]; P = .12) or 30 days (22.4% vs 26.1%; adjusted RR, 0.86 [95% CI, 0.65-1.12]; P = .26). Using Bayesian approaches, a 1:1:1 resuscitation was found to have a 93% (Bayes factor, 13.7; RR, 0.75 [95% credible interval, 0.45-1.11]) and an 87% (Bayes factor, 6.56; RR, 0.82 [95% credible interval, 0.57-1.16]) probability of being superior to a 1:1:2 resuscitation with regard to 24-hour and 30-day mortality, respectively.

CONCLUSIONS AND RELEVANCE: In this quality improvement study, a post hoc Bayesian analysis of the PROPPR trial found evidence in support of a mortality reduction with a balanced resuscitation strategy for patients in hemorrhagic shock. Bayesian statistical methods offer probability-based results capable of direct comparison between interventions and should be considered for future studies assessing trauma-related outcomes.
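A simplified sketch of the posterior-probability idea, using conjugate Beta-binomial models rather than the study's hierarchical models; the arm sizes and event counts below are hypothetical placeholders chosen to match the reported percentages, not the actual PROPPR data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical counts (~12.7% vs ~17.0% 24-hour mortality), not PROPPR's data
deaths_111, n_111 = 43, 338   # 1:1:1 arm
deaths_112, n_112 = 58, 342   # 1:1:2 arm

# Beta(1, 1) priors -> posterior Beta(deaths + 1, survivors + 1)
p1 = rng.beta(deaths_111 + 1, n_111 - deaths_111 + 1, size=200_000)
p2 = rng.beta(deaths_112 + 1, n_112 - deaths_112 + 1, size=200_000)

rr = p1 / p2
print(f"P(1:1:1 superior at 24 h) = {np.mean(rr < 1):.2f}")
lo, hi = np.quantile(rr, [0.025, 0.975])
print(f"posterior median RR = {np.median(rr):.2f} (95% CrI {lo:.2f}-{hi:.2f})")
```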
Affiliation(s)
- Daniel Lammers
- Department of Surgery, Madigan Army Medical Center and Center for Injury Science, University of Alabama at Birmingham
- Joshua Richman
- Center for Injury Science, University of Alabama at Birmingham
- John B. Holcomb
- Center for Injury Science, University of Alabama at Birmingham
- Jan O. Jansen
- Center for Injury Science, University of Alabama at Birmingham
8
Cautionary observations concerning the introduction of psychophysiological biomarkers into neuropsychiatric practice. Psychiatry International 2022. DOI: 10.3390/psychiatryint3020015.
Abstract
The combination of statistical learning technologies with large databases of psychophysiological data has appropriately generated enthusiastic interest in future clinical applicability. It is argued here that this enthusiasm should be tempered with the understanding that significant obstacles must be overcome before the systematic introduction of psychophysiological measures into neuropsychiatric practice becomes possible. The objective of this study is to identify challenges to this effort. The nonspecificity of psychophysiological measures complicates their use in diagnosis. Low test-retest reliability complicates use in longitudinal assessment, and quantitative psychophysiological measures can normalize in response to placebo intervention. Ten cautionary observations are introduced and, in some instances, possible directions for remediation are suggested.
9
Brown RCH, de Barra M, Earp BD. Broad Medical Uncertainty and the ethical obligation for openness. Synthese 2022; 200:121. PMID: 35431349. PMCID: PMC8994926. DOI: 10.1007/s11229-022-03666-2.
Abstract
This paper argues that there exists a collective epistemic state of 'Broad Medical Uncertainty' (BMU) regarding the effectiveness of many medical interventions. We outline the features of BMU, and describe some of the main contributing factors. These include flaws in medical research methodologies, bias in publication practices, financial and other conflicts of interest, and features of how evidence is translated into practice. These result in a significant degree of uncertainty regarding the effectiveness of many medical treatments and unduly optimistic beliefs about the benefit/harm profiles of such treatments. We argue for an ethical presumption in favour of openness regarding BMU as part of a 'Corrective Response'. We then consider some objections to this position (the 'Anti-Corrective Response'), including concerns that public honesty about flaws in medical research could undermine trust in healthcare institutions. We suggest that, as it stands, the Anti-Corrective Response is unconvincing.
Affiliation(s)
- Mícheál de Barra
- Centre for Culture and Evolution, Brunel University London, London, UK
- Brian D. Earp
- Oxford Uehiro Centre for Practical Ethics, University of Oxford, Oxford, UK
10
Otte WM, Vinkers CH, Habets PC, van IJzendoorn DGP, Tijdink JK. Analysis of 567,758 randomized controlled trials published over 30 years reveals trends in phrases used to discuss results that do not reach statistical significance. PLoS Biol 2022; 20:e3001562. PMID: 35180228. PMCID: PMC8893613. DOI: 10.1371/journal.pbio.3001562.
Abstract
The power of language to shape the reader's interpretation of biomedical results should not be underestimated. Misreporting and misinterpretation are pressing problems in randomized controlled trial (RCT) output. This may be partially related to the statistical significance paradigm used in clinical trials, centered on a P value cutoff of 0.05. Strict use of this cutoff may lead clinical researchers to describe results with P values approaching but not reaching the threshold as "almost significant." The question is how phrases expressing nonsignificant results have been reported in RCTs over the past 30 years. To this end, we conducted a quantitative analysis of the English full texts of 567,758 RCTs recorded in PubMed between 1990 and 2020 (81.5% of all published RCTs in PubMed). We determined the exact presence of 505 predefined phrases denoting results that approach but do not cross the line of formal statistical significance (P < 0.05). We modeled temporal trends in the phrase data with Bayesian linear regression, and evidence for temporal change was obtained through Bayes factor (BF) analysis. In a randomly sampled subset, the associated P values were manually extracted. We identified 61,741 phrases in 49,134 RCTs indicating almost significant results (8.65%; 95% confidence interval (CI): 8.58% to 8.73%). The overall prevalence of these phrases remained stable over time, with the most prevalent phrases being "marginally significant" (in 7,735 RCTs), "all but significant" (7,015), "a nonsignificant trend" (3,442), "failed to reach statistical significance" (2,578), and "a strong trend" (1,700). The strongest evidence for an increased temporal prevalence was found for "a numerical trend," "a positive trend," "an increasing trend," and "nominally significant." In contrast, the phrases "all but significant," "approaches statistical significance," "did not quite reach statistical significance," "difference was apparent," "failed to reach statistical significance," and "not quite significant" decreased over time. In a randomly sampled subset of 29,000 phrases, 11,926 corresponding P values were manually identified; 68.1% ranged between 0.05 and 0.15 (CI: 67. to 69.0; median 0.06). Our results show that RCT reports regularly contain specific phrases describing marginally nonsignificant results in order to report P values close to but above the dominant 0.05 cutoff. The stable prevalence of these phrases over time indicates that the practice of broadly interpreting P values close to a predefined threshold remains widespread. To enhance responsible and transparent interpretation of RCT results, researchers, clinicians, reviewers, and editors should reduce the focus on formal statistical significance thresholds, report P values with corresponding effect sizes and CIs, and focus on the clinical relevance of the statistical differences found in RCTs.
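A minimal sketch of the phrase-detection step, using three of the 505 predefined phrases named in the abstract and invented example texts:

```python
import re

# Illustrative subset of the 505 predefined phrases from the study
PHRASES = ["marginally significant", "a nonsignificant trend", "a strong trend"]
pattern = re.compile("|".join(re.escape(p) for p in PHRASES), re.IGNORECASE)

texts = [
    "The difference was marginally significant (P = 0.06).",
    "We observed a nonsignificant trend toward benefit (P = 0.08).",
    "No effect of treatment was detected (P = 0.47).",
]

hits = [bool(pattern.search(t)) for t in texts]
prevalence = sum(hits) / len(texts)
print(f"reports with 'almost significant' phrasing: {prevalence:.1%}")
```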
Affiliation(s)
- Willem M. Otte
- Biomedical MR Imaging and Spectroscopy, Center for Image Sciences, University Medical Center Utrecht, Utrecht, the Netherlands
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht, Utrecht, the Netherlands
- Christiaan H. Vinkers
- Department of Psychiatry, Department of Anatomy and Neurosciences, Amsterdam UMC, Amsterdam, the Netherlands
- Philippe C. Habets
- Department of Psychiatry, Department of Anatomy and Neurosciences, Amsterdam UMC, Amsterdam, the Netherlands
- David G. P. van IJzendoorn
- Department of Pathology, Stanford University School of Medicine, Stanford, California, United States of America
- Joeri K. Tijdink
- Department of Ethics, Law and Humanities, Amsterdam UMC, Amsterdam, the Netherlands
- Department of Philosophy, Vrije Universiteit, Amsterdam, the Netherlands
11
Making ERP research more transparent: guidelines for preregistration. Int J Psychophysiol 2021; 164:52-63. DOI: 10.1016/j.ijpsycho.2021.02.016.
Abstract
A combination of confirmation bias, hindsight bias, and pressure to publish may prompt the (unconscious) exploration of various methodological options and reporting only the ones that lead to a (statistically) significant outcome. This undisclosed analytic flexibility is particularly relevant in EEG research, where a myriad of preprocessing and analysis pipelines can be used to extract information from complex multidimensional data. One solution to limit confirmation and hindsight bias by disclosing analytic choices is preregistration: researchers write a time-stamped, publicly accessible research plan with hypotheses, data collection plan, and the intended preprocessing and statistical analyses before the start of a research project. In this manuscript, we present an overview of the problems associated with undisclosed analytic flexibility, discuss why and how EEG researchers would benefit from adopting preregistration, provide guidelines and examples on how to preregister data preprocessing and analysis steps in typical ERP studies, and conclude by discussing possibilities and limitations of this open science practice.
12
Adda J, Decker C, Ottaviani M. P-hacking in clinical trials and how incentives shape the distribution of results across phases. Proc Natl Acad Sci U S A 2020; 117:13386-13392. PMID: 32487730. PMCID: PMC7306753. DOI: 10.1073/pnas.1919906117.
Abstract
Clinical research should conform to high standards of ethical and scientific integrity, given that human lives are at stake. However, economic incentives can generate conflicts of interest for investigators, who may be inclined to withhold unfavorable results or even tamper with data in order to achieve desired outcomes. To shed light on the integrity of clinical trial results, this paper systematically analyzes the distribution of P values of primary outcomes for phase II and phase III drug trials reported to the ClinicalTrials.gov registry. First, we detect no bunching of results just above the classical 5% threshold for statistical significance. Second, a density-discontinuity test reveals an upward jump at the 5% threshold for phase III results by small industry sponsors. Third, we document a larger fraction of significant results in phase III compared to phase II. Linking trials across phases, we find that early favorable results increase the likelihood of continuing into the next phase. Once we take into account this selective continuation, we can explain almost completely the excess of significant results in phase III for trials conducted by large industry sponsors. For small industry sponsors, instead, part of the excess remains unexplained.
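A crude sketch of a bunching check at the 5% threshold on synthetic data; this simple binomial comparison is a stand-in for the paper's density-discontinuity test, which is considerably more involved:

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(2)
pvals = rng.uniform(0, 1, size=20_000)   # synthetic reported P values

window = 0.005
below = int(np.sum((pvals >= 0.05 - window) & (pvals < 0.05)))
above = int(np.sum((pvals >= 0.05) & (pvals < 0.05 + window)))

# If the density is locally flat, a P value near the threshold falls below it
# with probability ~0.5; a significant excess below suggests bunching.
result = binomtest(below, below + above, p=0.5, alternative="greater")
print(f"just below: {below}, just above: {above}, binomial P = {result.pvalue:.3f}")
```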
Affiliation(s)
- Jérôme Adda
- Department of Economics, Bocconi University, 20136 Milan, Italy
- Bocconi Institute for Data Science and Analytics, Bocconi University, 20136 Milan, Italy
- Innocenzo Gasparini Institute for Economic Research, Bocconi University, 20136 Milan, Italy
- Christian Decker
- Department of Economics, University of Zurich, 8001 Zurich, Switzerland
- UBS Center for Economics in Society, University of Zurich, 8001 Zurich, Switzerland
- Marco Ottaviani
- Department of Economics, Bocconi University, 20136 Milan, Italy
- Bocconi Institute for Data Science and Analytics, Bocconi University, 20136 Milan, Italy
- Innocenzo Gasparini Institute for Economic Research, Bocconi University, 20136 Milan, Italy