1
Rovetta A, Mansournia MA, Vitale A. For a proper use of frequentist inferential statistics in public health. Global Epidemiology 2024;8:100151. PMID: 39021384; PMCID: PMC11252774; DOI: 10.1016/j.gloepi.2024.100151.
Abstract
As widely noted in the literature and by international bodies such as the American Statistical Association, severe misinterpretations of P-values, confidence intervals, and statistical significance are sadly common in public health. This scenario poses serious risks concerning terminal decisions such as the approval or rejection of therapies. Cognitive distortions about statistics likely stem from poor teaching in schools and universities, overly simplified interpretations, and - as we suggest - the reckless use of calculation software with predefined standardized procedures. In light of this, we present a framework to recalibrate the role of frequentist-inferential statistics within clinical and epidemiological research. In particular, we stress that statistics is a set of rules and numbers that makes sense only when properly placed within a well-defined scientific context beforehand. Practical examples are discussed for educational purposes. Alongside this, we propose tools to better evaluate statistical outcomes, such as multiple compatibility or surprisal intervals and tuples of various point hypotheses. Lastly, we emphasize that every conclusion must be informed by different kinds of scientific evidence (e.g., biochemical, clinical, statistical) and must be based on a careful examination of costs, risks, and benefits.
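The "surprisal intervals" mentioned in this abstract build on S-values: s = -log2(p) converts a P-value into bits of information against the test hypothesis, and reporting intervals at several levels avoids a single 0.05 dichotomy. A minimal illustrative sketch (not code from the paper; the estimate and standard error are hypothetical):

```python
import math
from statistics import NormalDist

def s_value(p):
    """Shannon surprisal of a p-value, in bits: s = -log2(p)."""
    return -math.log2(p)

def compatibility_interval(est, se, level):
    """Normal-approximation interval for an estimate at the given level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return est - z * se, est + z * se

# Hypothetical result: log risk ratio 0.30 with standard error 0.15.
est, se = 0.30, 0.15
for level in (0.50, 0.80, 0.95, 0.99):
    lo, hi = compatibility_interval(est, se, level)
    print(f"{int(level * 100)}% compatibility interval: ({lo:.2f}, {hi:.2f})")

# p = 0.05 carries only about 4.3 bits of information against the
# hypothesis, roughly as surprising as 4-5 heads in a row from a fair coin.
print(f"S-value of p = 0.05: {s_value(0.05):.1f} bits")
```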
Affiliation(s)
- Mohammad Ali Mansournia
- Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Alessandro Vitale
- Department of Surgical, Oncological and Gastroenterological Sciences (DiSCOG), Padova University, Padova, Italy
2
Al-Asadi M, Sherren M, Abdel Khalik H, Leroux T, Ayeni OR, Madden K, Khan M. The Continuous Fragility Index of Statistically Significant Findings in Randomized Controlled Trials That Compare Interventions for Anterior Shoulder Instability. Am J Sports Med 2024;52:2667-2675. PMID: 38258495; PMCID: PMC11344964; DOI: 10.1177/03635465231202522.
Abstract
BACKGROUND Evidence-based care relies on robust research. The fragility index (FI) is used to assess the robustness of statistically significant findings in randomized controlled trials (RCTs). While the traditional FI is limited to dichotomous outcomes, a novel tool, the continuous fragility index (CFI), allows for the assessment of the robustness of continuous outcomes. PURPOSE To calculate the CFI of statistically significant continuous outcomes in RCTs evaluating interventions for managing anterior shoulder instability (ASI). STUDY DESIGN Meta-analysis; Level of evidence, 2. METHODS A search was conducted across the MEDLINE, Embase, and CENTRAL databases for RCTs assessing management strategies for ASI from inception to October 6, 2022. Studies that reported a statistically significant difference between study groups in ≥1 continuous outcome were included. The CFI was calculated and applied to all available RCTs reporting interventions for ASI. Multivariable linear regression was performed with the CFI as the outcome and various study characteristics as predictors. RESULTS A total of 27 RCTs, comprising 1846 shoulders, were included. The median sample size was 61 shoulders (IQR, 43). The median CFI across 27 RCTs was 8.2 (IQR, 17.2; 95% CI, 3.6-15.4). The median CFI was 7.9 (IQR, 21; 95% CI, 1-22) for 11 studies comparing surgical methods, 22.6 (IQR, 16; 95% CI, 8.2-30.4) for 6 studies comparing nonsurgical reduction interventions, 2.8 for 3 studies comparing immobilization methods, and 2.4 for 3 studies comparing surgical versus nonsurgical interventions. Notably, 22 of 57 included outcomes (38.6%) from studies with completed follow-up data had a loss to follow-up exceeding their CFI. Multivariable regression demonstrated a statistically significant positive association between a trial's sample size and the CFI of its outcomes (r = 0.23 [95% CI, 0.13-0.33]; P < .001).
CONCLUSION More than a third of continuous outcomes in ASI trials had a CFI less than the reported loss to follow-up. This carries the significant risk of reversing trial findings and should be considered when evaluating available RCT data. We recommend including the FI, CFI, and loss to follow-up in the abstracts of future RCTs.
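The CFI itself iteratively perturbs continuous outcome data until significance is lost, which requires the raw trial data. As a sketch of the underlying logic, here is the traditional dichotomous fragility index that the CFI generalizes: flip non-events to events in one arm until Fisher's exact P-value reaches 0.05. All counts below are hypothetical, and real implementations choose which arm and direction to modify more carefully.

```python
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)

def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Flip non-events to events in arm 1 (assumed the arm with fewer
    events) until the result is no longer significant at `alpha`."""
    flips = 0
    while e1 < n1 and fisher_p(e1, n1 - e1, e2, n2 - e2) < alpha:
        e1 += 1
        flips += 1
    return flips

# Hypothetical trial: 1/20 events vs. 9/20 events (Fisher p ≈ 0.008).
print(f"fragility index = {fragility_index(1, 20, 9, 20)}")  # → fragility index = 2
```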
Affiliation(s)
- Mohammed Al-Asadi
- Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
- Hassaan Abdel Khalik
- Division of Orthopaedic Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Timothy Leroux
- Division of Orthopaedic Surgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
- Olufemi R. Ayeni
- Division of Orthopaedic Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Kim Madden
- Division of Orthopaedic Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
- Moin Khan
- Division of Orthopaedic Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
3
Schwarzer G, Rücker G, Semaca C. LFK index does not reliably detect small-study effects in meta-analysis: A simulation study. Res Synth Methods 2024;15:603-615. PMID: 38467140; DOI: 10.1002/jrsm.1714.
Abstract
The LFK index has been promoted as an improved method to detect bias in meta-analysis. Putatively, its performance does not depend on the number of studies in the meta-analysis. We conducted a simulation study, comparing the LFK index test to three standard tests for funnel plot asymmetry in settings with smaller or larger group sample sizes. In general, false positive rates of the LFK index test markedly depended on the number and size of studies as well as the between-study heterogeneity, with values between 0% and almost 30%. Egger's test adhered well to the pre-specified significance level of 5% under homogeneity, but was too liberal (smaller groups) or conservative (larger groups) under heterogeneity. The rank test was too conservative for most simulation scenarios. The Thompson-Sharp test was too conservative under homogeneity, but adhered well to the significance level in case of heterogeneity. The true positive rate of the LFK index test was larger than that of the classic tests only when its false positive rate was inflated. The power of the classic tests was similar to or larger than that of the LFK index test when the false positive rate of the LFK index test was used as the significance level for the classic tests. Even under ideal conditions, the false positive rate of the LFK index test markedly and unpredictably depends on the number and sample size of studies as well as the extent of between-study heterogeneity. The LFK index test in its current implementation should not be used to assess funnel plot asymmetry in meta-analysis.
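For context, the classic Egger test referenced above regresses the standardized effect (estimate/SE) on precision (1/SE) and tests whether the intercept differs from zero. A minimal sketch; note it uses a normal approximation for the intercept's p-value, whereas the original test uses a t distribution with n - 2 df, and with only three toy studies the p-value is illustrative only:

```python
from statistics import NormalDist, mean

def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry: OLS of the
    standardized effect (effect/SE) on precision (1/SE). An intercept
    far from zero suggests small-study effects."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1 / s for s in ses]
    n, xbar, ybar = len(x), mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    s2 = sum((yi - intercept - slope * xi) ** 2
             for xi, yi in zip(x, y)) / (n - 2)
    se_int = (s2 * (1 / n + xbar ** 2 / sxx)) ** 0.5
    z = intercept / se_int
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # normal approx.; Egger used t(n-2)
    return intercept, p

# Three toy studies (log odds ratios and standard errors), hand-checkable.
b0, p = egger_test([1.0, 1.0, 1.25], [1.0, 0.5, 0.25])
print(f"intercept = {b0:.2f}, p = {p:.2f}")  # → intercept = -0.50, p = 0.13
```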
Affiliation(s)
- Guido Schwarzer
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Gerta Rücker
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
- Cristina Semaca
- Master's Degree Program, Medical Biometry/Biostatistics, University of Heidelberg, Heidelberg, Germany
4
Mathur MB. P-hacking in meta-analyses: A formalization and new meta-analytic methods. Res Synth Methods 2024;15:483-499. PMID: 38273211; DOI: 10.1002/jrsm.1701.
Abstract
As traditionally conceived, publication bias arises from selection operating on a collection of individually unbiased estimates. A canonical form of such selection across studies (SAS) is the preferential publication of affirmative studies (i.e., those with significant, positive estimates) versus nonaffirmative studies (i.e., those with nonsignificant or negative estimates). However, meta-analyses can also be compromised by selection within studies (SWS), in which investigators "p-hack" results within their study to obtain an affirmative estimate. Published estimates can then be biased even conditional on affirmative status, which compromises the performance of existing methods that consider only SAS. We propose two new analysis methods that accommodate joint SAS and SWS; both analyze only the published nonaffirmative estimates. First, we propose estimating the underlying meta-analytic mean by fitting a "right-truncated meta-analysis" (RTMA) to the published nonaffirmative estimates. This method essentially imputes the entire underlying distribution of population effects. Second, we propose conducting a standard meta-analysis of only the nonaffirmative studies (MAN); this estimate is conservative (negatively biased) under weakened assumptions. We provide an R package (phacking) and website (metabias.io). Our proposed methods supplement existing methods by assessing the robustness of meta-analyses to joint SAS and SWS.
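The paper's simpler proposal, MAN, meta-analyzes only the nonaffirmative studies. A stripped-down sketch using a fixed-effect (inverse-variance) pool; the published method uses robust random-effects estimation, and all numbers here are hypothetical:

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)

def nonaffirmative(effects, ses):
    """Drop 'affirmative' studies: significant (z > 1.96) AND positive."""
    keep = [(e, s) for e, s in zip(effects, ses)
            if not (e > 0 and e / s > Z95)]
    return [e for e, _ in keep], [s for _, s in keep]

def meta_fixed(effects, ses):
    """Inverse-variance fixed-effect pooled estimate with a 95% CI."""
    w = [1 / s ** 2 for s in ses]
    est = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    se = sum(w) ** -0.5
    return est, (est - Z95 * se, est + Z95 * se)

effects = [0.80, 0.50, 0.10, -0.20, 0.05]  # hypothetical study estimates
ses = [0.20] * 5
e_na, s_na = nonaffirmative(effects, ses)
est, (lo, hi) = meta_fixed(e_na, s_na)
print(f"{len(e_na)} nonaffirmative studies; MAN estimate {est:.3f} "
      f"(95% CI {lo:.3f} to {hi:.3f})")
```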
Affiliation(s)
- Maya B Mathur
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, California, USA
5
Boscardin CK, Sewell JL, Tolsgaard MG, Pusic MV. How to Use and Report on p-values. Perspectives on Medical Education 2024;13:250-254. PMID: 38680196; PMCID: PMC11049675; DOI: 10.5334/pme.1324.
Abstract
The use of the p-value in quantitative research, particularly its threshold of "P < 0.05" for determining "statistical significance," has long been a cornerstone of statistical analysis in research. However, this standard has been increasingly scrutinized for its potential to mislead, especially when the practical significance, the number of comparisons, or the suitability of statistical tests is not properly considered. In response to controversy around the use of p-values, the American Statistical Association published a statement in 2016 that challenged the research community to abandon the term "statistically significant". This stance has been echoed by leading scientific journals, which urge a substantial reduction in, or complete elimination of, reliance on p-values when reporting results. To provide guidance to researchers in health professions education, this paper provides a succinct overview of the definition of the p-value and the ongoing debate regarding its use. It reflects on the controversy by highlighting common pitfalls associated with p-value interpretation and usage, such as misinterpretation, overemphasis, and false dichotomization between "significant" and "non-significant" results. The paper also outlines specific recommendations for the effective use of p-values in statistical reporting, including reporting effect sizes, confidence intervals, and the null hypothesis, and conducting sensitivity analyses for appropriate interpretation. These considerations aim to guide researchers toward a more nuanced and informative use of p-values.
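Following the paper's recommendation to report effect sizes and confidence intervals alongside p-values, a minimal sketch of Cohen's d with a large-sample CI (Hedges' small-sample correction is omitted, and the scores are invented):

```python
from statistics import NormalDist, mean, stdev

def cohens_d_ci(x, y, level=0.95):
    """Cohen's d with a large-sample normal-approximation CI."""
    nx, ny = len(x), len(y)
    sp = (((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2)
          / (nx + ny - 2)) ** 0.5  # pooled SD
    d = (mean(x) - mean(y)) / sp
    se = ((nx + ny) / (nx * ny) + d ** 2 / (2 * (nx + ny))) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return d, (d - z * se, d + z * se)

# Hypothetical exam scores for two teaching formats.
control = [62, 70, 75, 68, 73, 66, 71, 69]
workshop = [70, 74, 81, 76, 79, 72, 77, 75]
d, (lo, hi) = cohens_d_ci(workshop, control)
print(f"d = {d:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Reporting the interval alongside d makes clear how much uncertainty a "significant" difference still carries at this sample size.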
Affiliation(s)
- Christy K. Boscardin
- Department of Medicine, University of California, San Francisco, California, US
- Department of Anesthesia, University of California, San Francisco, California, US
- Justin L. Sewell
- Department of Medicine, University of California, San Francisco, California, US
- Martin G. Tolsgaard
- Medical Education, Copenhagen Academy for Medical Education and Simulation, Copenhagen, DK
6
Zettersten M, Cox C, Bergmann C, Tsui ASM, Soderstrom M, Mayor J, Lundwall RA, Lewis M, Kosie JE, Kartushina N, Fusaroli R, Frank MC, Byers-Heinlein K, Black AK, Mathur MB. Evidence for Infant-directed Speech Preference Is Consistent Across Large-scale, Multi-site Replication and Meta-analysis. Open Mind (Camb) 2024;8:439-461. PMID: 38665547; PMCID: PMC11045035; DOI: 10.1162/opmi_a_00134.
Abstract
There is substantial evidence that infants prefer infant-directed speech (IDS) to adult-directed speech (ADS). The strongest evidence for this claim has come from two large-scale investigations: i) a community-augmented meta-analysis of published behavioral studies and ii) a large-scale multi-lab replication study. In this paper, we aim to improve our understanding of the IDS preference and its boundary conditions by combining and comparing these two data sources across key population and design characteristics of the underlying studies. Our analyses reveal that both the meta-analysis and multi-lab replication show moderate effect sizes (d ≈ 0.35 for each estimate) and that both of these effects persist when relevant study-level moderators are added to the models (i.e., experimental methods, infant ages, and native languages). However, while the overall effect size estimates were similar, the two sources diverged in the effects of key moderators: both infant age and experimental method predicted IDS preference in the multi-lab replication study, but showed no effect in the meta-analysis. These results demonstrate that the IDS preference generalizes across a variety of experimental conditions and sampling characteristics, while simultaneously identifying key differences in the empirical picture offered by each source individually and pinpointing areas where substantial uncertainty remains about the influence of theoretically central moderators on IDS preference. Overall, our results show how meta-analyses and multi-lab replications can be used in tandem to understand the robustness and generalizability of developmental phenomena.
Affiliation(s)
- Christopher Cox
- Department of Linguistics, Cognitive Science and Semiotics, School of Communication and Culture, Aarhus University; Interacting Minds Center, School of Culture and Society, Aarhus University
- Julien Mayor
- Department of Linguistics and Scandinavian Studies, University of Oslo
- Molly Lewis
- Department of Psychology/Social and Decision Sciences, Carnegie Mellon University
- Riccardo Fusaroli
- Department of Linguistics, Cognitive Science and Semiotics, School of Communication and Culture, Aarhus University; Interacting Minds Center, School of Culture and Society, Aarhus University
- Alexis K. Black
- School of Audiology and Speech Sciences, University of British Columbia
7
Mathur MB. Sensitivity analysis for the interactive effects of internal bias and publication bias in meta-analyses. Res Synth Methods 2024;15:21-43. PMID: 37743567; PMCID: PMC11164126; DOI: 10.1002/jrsm.1667.
Abstract
Meta-analyses can be compromised by studies' internal biases (e.g., confounding in nonrandomized studies) as well as publication bias. These biases often operate nonadditively: publication bias that favors significant, positive results selects indirectly for studies with more internal bias. We propose sensitivity analyses that address two questions: (1) "For a given severity of internal bias across studies and of publication bias, how much could the results change?"; and (2) "For a given severity of publication bias, how severe would internal bias have to be, hypothetically, to attenuate the results to the null or by a given amount?" These methods consider the average internal bias across studies, obviating specifying the bias in each study individually. The analyst can assume that internal bias affects all studies, or alternatively that it only affects a known subset (e.g., nonrandomized studies). The internal bias can be of unknown origin or, for certain types of bias in causal estimates, can be bounded analytically. The analyst can specify the severity of publication bias or, alternatively, consider a "worst-case" form of publication bias. Robust estimation methods accommodate non-normal effects, small meta-analyses, and clustered estimates. As we illustrate by re-analyzing published meta-analyses, the methods can provide insights that are not captured by simply considering each bias in turn. An R package implementing the methods is available (multibiasmeta).
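The second sensitivity question above has a closed form in the simplest special case: if every study's ratio estimate were inflated by the same multiplicative internal bias B, the corrected pooled risk ratio would be RR/B, so fully attenuating a pooled RR to the null requires B = RR. A sketch of that special case only; the paper's methods additionally handle heterogeneity, publication bias, and bias confined to a subset of studies, and the pooled RR below is hypothetical:

```python
def corrected_rr(rr_pooled, bias_factor):
    """Pooled risk ratio after removing an assumed homogeneous
    multiplicative internal bias (bias away from the null)."""
    return rr_pooled / bias_factor

def bias_to_attenuate(rr_pooled, target=1.0):
    """Average multiplicative bias needed to move the pooled RR to `target`."""
    return rr_pooled / target

rr = 1.50  # hypothetical pooled risk ratio
for b in (1.0, 1.2, 1.5):
    print(f"assumed bias {b:.1f} -> corrected RR {corrected_rr(rr, b):.2f}")
print(f"bias needed to fully attenuate: {bias_to_attenuate(rr):.2f}")  # → 1.50
```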
Affiliation(s)
- Maya B Mathur
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Palo Alto, California, USA
8
Msaouel P, Lee J, Thall PF. Interpreting Randomized Controlled Trials. Cancers (Basel) 2023;15:4674. PMID: 37835368; PMCID: PMC10571666; DOI: 10.3390/cancers15194674.
Abstract
This article describes rationales and limitations for making inferences based on data from randomized controlled trials (RCTs). We argue that obtaining a representative random sample from a patient population is impossible for a clinical trial because patients are accrued sequentially over time and thus comprise a convenience sample, subject only to protocol entry criteria. Consequently, the trial's sample is unlikely to represent a definable patient population. We use causal diagrams to illustrate the difference between random allocation of interventions within a clinical trial sample and true simple or stratified random sampling, as executed in surveys. We argue that group-specific statistics, such as a median survival time estimate for a treatment arm in an RCT, have limited meaning as estimates of larger patient population parameters. In contrast, random allocation between interventions facilitates comparative causal inferences about between-treatment effects, such as hazard ratios or differences between probabilities of response. Comparative inferences also require the assumption of transportability from a clinical trial's convenience sample to a targeted patient population. We focus on the consequences and limitations of randomization procedures in order to clarify the distinctions between pairs of complementary concepts of fundamental importance to data science and RCT interpretation. These include internal and external validity, generalizability and transportability, uncertainty and variability, representativeness and inclusiveness, blocking and stratification, relevance and robustness, forward and reverse causal inference, intention to treat and per protocol analyses, and potential outcomes and counterfactuals.
Affiliation(s)
- Pavlos Msaouel
- Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- David H. Koch Center for Applied Research of Genitourinary Cancers, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Juhee Lee
- Department of Statistics, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Peter F. Thall
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
9
Ávila P, Berruezo A, Jiménez-Candil J, Tercedor L, Calvo D, Arribas F, Fernández-Portales J, Merino JL, Hernández-Madrid A, Fernández-Avilés F, Arenal Á. Bayesian analysis of the Substrate Ablation vs. Antiarrhythmic Drug Therapy for Symptomatic Ventricular Tachycardia trial. Europace 2023;25:euad181. PMID: 37366571; PMCID: PMC10326301; DOI: 10.1093/europace/euad181.
Abstract
BACKGROUND AND AIMS Bayesian analyses can provide additional insights into the results of clinical trials, aiding in the decision-making process. We analysed the Substrate Ablation vs. Antiarrhythmic Drug Therapy for Symptomatic Ventricular Tachycardia (SURVIVE-VT) trial using Bayesian survival models. METHODS AND RESULTS The SURVIVE-VT trial randomized patients with ischaemic cardiomyopathy and monomorphic ventricular tachycardia (VT) to catheter ablation or antiarrhythmic drugs (AAD) as a first-line strategy. The primary outcome was a composite of cardiovascular death, appropriate implantable cardioverter-defibrillator shocks, unplanned heart failure hospitalizations, or severe treatment-related complications. We used informative, skeptical, and non-informative priors with different probabilities of large effects to compute the posterior distributions using Markov Chain Monte Carlo methods. We calculated the probabilities of hazard ratios (HR) being <1, <0.9, and <0.75, as well as 2-year survival estimates. Of the 144 randomized patients, 71 underwent catheter ablation and 73 received AAD. Regardless of the prior, catheter ablation had a >98% probability of reducing the primary outcome (HR < 1) and a >96% probability of achieving a reduction of >10% (HR < 0.9). The probability of a >25% (HR < 0.75) reduction of treatment-related complications was >90%. Catheter ablation had a high probability (>93%) of reducing incessant/slow undetected VT/electric storm, unplanned hospitalizations for ventricular arrhythmias, and overall cardiovascular admissions by >25%, with absolute differences of 15.2%, 21.2%, and 20.2%, respectively. CONCLUSION In patients with ischaemic cardiomyopathy and VT, catheter ablation as a first-line therapy resulted in a high probability of reducing several clinical outcomes compared to AAD. Our study highlights the value of Bayesian analysis in clinical trials and its potential for guiding treatment decisions.
TRIAL REGISTRATION ClinicalTrials.gov identifier: NCT03734562.
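The posterior probabilities reported above come from MCMC survival models fitted to patient-level data. The core logic can be sketched with a conjugate normal approximation on the log hazard ratio scale; the trial summary below is hypothetical, and the skeptical prior SD of 0.35 is an assumption:

```python
import math
from statistics import NormalDist

def posterior_log_hr(log_hr, se, prior_mean=0.0, prior_sd=0.35):
    """Conjugate normal update: precision-weighted average of a skeptical
    prior centered on no effect and the observed log hazard ratio."""
    w_prior, w_lik = prior_sd ** -2, se ** -2
    post_var = 1 / (w_prior + w_lik)
    post_mean = post_var * (w_prior * prior_mean + w_lik * log_hr)
    return post_mean, post_var ** 0.5

# Hypothetical trial summary: HR 0.72, 95% CI 0.50 to 1.03.
log_hr = math.log(0.72)
se = (math.log(1.03) - math.log(0.50)) / (2 * 1.959964)
m, s = posterior_log_hr(log_hr, se)
for thr in (1.0, 0.9, 0.75):
    p = NormalDist(m, s).cdf(math.log(thr))
    print(f"P(HR < {thr}) = {p:.2f}")
```

Note how a "nonsignificant" frequentist CI crossing 1 still yields a high posterior probability of benefit, which is the kind of insight the trial's Bayesian re-analysis provides.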
Affiliation(s)
- Pablo Ávila
- Cardiology Department, Hospital General Universitario Gregorio Marañón, IiSGM, Universidad Complutense, CIBERCV, Dr Esquerdo 46, 28007, Madrid, Spain
- Antonio Berruezo
- Arrhythmia Unit, Cardiology Department, Hospital Clinic and Teknon Medical Centre, c/Villarroel 170, 08036, Barcelona, Spain
- Javier Jiménez-Candil
- Arrhythmia Unit, Cardiology Department, IBSAL-Hospital Universitario, Universidad de Salamanca, CIBERCV, Paseo San Vicente 58-182, 37007, Salamanca, Spain
- Luis Tercedor
- Arrhythmia Unit, Cardiology Department, Hospital Universitario Virgen de las Nieves, Avd. Fuerzas Armadas 2, 18014, Granada, Spain
- David Calvo
- Arrhythmia Unit, Cardiology Department, Hospital Universitario Central de Asturias, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), Avd Roma, s/n, 33011, Oviedo, Spain
- Arrhythmia Unit, Cardiology Department, Hospital Clínico San Carlos, Prof Martín Lagos, S/N, Madrid, 28040, Spain
- Fernando Arribas
- Cardiology Department, Hospital Doce de Octubre, Av. de Córdoba, s/n, 28041, Madrid, Spain
- Javier Fernández-Portales
- Cardiology Department, Complejo Hospitalario Universitario de Cáceres, Av. de la Universidad 75, 10004, Cáceres, Spain
- José Luis Merino
- Arrhythmia Unit, Cardiology Department, Hospital Universitario La Paz, IdiPAZ, Universidad Autónoma, P.º de la Castellana 261, 28046, Madrid, Spain
- Antonio Hernández-Madrid
- Arrhythmia Unit, Hospital Ramón y Cajal, Universidad de Alcalá de Henares, M-607, 9, 100, 28034, Madrid, Spain
- Francisco Fernández-Avilés
- Cardiology Department, Hospital General Universitario Gregorio Marañón, IiSGM, Universidad Complutense, CIBERCV, Dr Esquerdo 46, 28007, Madrid, Spain
- Ángel Arenal
- Cardiology Department, Hospital General Universitario Gregorio Marañón, IiSGM, Universidad Complutense, CIBERCV, Dr Esquerdo 46, 28007, Madrid, Spain
10
Choi WS. Problems and alternatives of testing significance using null hypothesis and P-value in food research. Food Sci Biotechnol 2023;32:1-9. PMID: 37363053; PMCID: PMC10227784; DOI: 10.1007/s10068-023-01348-4.
Abstract
Food research has commonly tested for statistically significant differences by comparing the P-value to a significance level under the null hypothesis significance test (NHST). However, problems with this testing method have been discussed, and several alternatives to NHST and the P-value have been proposed, including lowering the P-value threshold and using confidence intervals (CIs), effect sizes, and Bayesian statistics. The CI estimates the extent of the effect or difference and determines the presence or absence of statistical significance. The effect size index quantifies the magnitude of a difference and allows the comparison of results across studies. Bayesian statistics enable predictions to be made even when only a small amount of data is available. In conclusion, CIs, effect sizes, and Bayesian statistics can complement or replace NHST and the P-value in food research.
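As one concrete example of the Bayesian alternative mentioned above, small acceptance-test counts can be compared through Beta posteriors: with a uniform Beta(1, 1) prior, each acceptance rate has a Beta(successes + 1, failures + 1) posterior. A sketch with invented panel counts:

```python
import random

random.seed(0)

def prob_b_beats_a(succ_a, n_a, succ_b, n_b, draws=100_000):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    Beta(1, 1) priors; betavariate samples each Beta posterior."""
    wins = 0
    for _ in range(draws):
        pa = random.betavariate(1 + succ_a, 1 + n_a - succ_a)
        pb = random.betavariate(1 + succ_b, 1 + n_b - succ_b)
        wins += pb > pa
    return wins / draws

# Hypothetical sensory panel: 9/20 accept formulation A, 14/20 accept B.
p = prob_b_beats_a(9, 20, 14, 20)
print(f"P(B preferred over A) ~ {p:.2f}")
```

Unlike a bare "significant/non-significant" verdict, the output is a direct probability statement that remains interpretable at small sample sizes.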
Affiliation(s)
- Won-Seok Choi
- Department of Food Science and Technology, Korea National University of Transportation, Jeungpyeong-gun, 27909 Chungbuk, Republic of Korea
11
Wang A, Menon R, Li T, Harris L, Harris IA, Naylor J, Adie S. Has the degree of outcome reporting bias in surgical randomized trials changed? A meta-regression analysis. ANZ J Surg 2023;93:76-82. PMID: 36655339; DOI: 10.1111/ans.18273.
Abstract
BACKGROUND Outcome reporting bias in individual trials can compromise the validity of pooled estimates within systematic reviews. Recent strategies have attempted to address outcome reporting bias, which favours the full reporting of statistically significant outcomes over non-significant outcomes. We examined whether the association between full outcome reporting and statistical significance in surgical trials has changed from 2009 to 2019. METHODS We systematically searched for 350 surgical randomized controlled trials (RCTs) from 2009 and 350 surgical RCTs from 2019. Outcomes were classified as fully reported, partially reported, qualitatively reported or unreported. For each outcome, a contingency table was populated with full outcome reporting (yes/no) and statistical significance (yes/no). We combined odds ratios in random effects meta-analysis to estimate the association between full outcome reporting and statistical significance in 2009 compared with 2019. RESULTS Twenty-eight percent of outcomes in 2009 were incompletely reported, compared with 30% in 2019. In 2009, significant outcomes were more likely to be fully reported than non-significant outcomes (OR = 2.4, 95% CI 1.7-3.4, I2 = 35%), but the opposite association was seen in 2019 (OR = 0.51, 95% CI 0.34-0.77, I2 = 43%). RCTs from 2019 were less likely to demonstrate outcome reporting bias favouring significant outcomes (OR = 0.21, 95% CI 0.12-0.35, P < 0.001). CONCLUSION Outcome reporting bias favouring the full reporting of significant over non-significant outcomes was demonstrated in 2009, but the opposite association was seen in 2019. There remains a high prevalence of incomplete outcome reporting. We recommend ongoing adherence to trial protocol guidelines to improve outcome reporting transparency and completeness.
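The per-outcome contingency tables described in the Methods reduce to an odds ratio. A sketch with a Woolf (log-normal) confidence interval, using invented counts for the fully reported × statistically significant cross-tabulation:

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio of the 2x2 table [[a, b], [c, d]] with a Woolf CI."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts: significant outcomes 80 fully / 20 not fully
# reported; non-significant outcomes 60 fully / 40 not fully reported.
or_, lo, hi = odds_ratio_ci(80, 20, 60, 40)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")  # → OR = 2.67 (95% CI 1.42 to 5.02)
```

An OR above 1 here would indicate reporting that favours significant outcomes, the pattern the paper found in 2009; the study then pools such ORs across trials by random-effects meta-analysis.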
Affiliation(s)
- Andy Wang
- School of Clinical Medicine, UNSW Medicine and Health, UNSW Sydney, Australia
- Rahul Menon
- School of Clinical Medicine, UNSW Medicine and Health, UNSW Sydney, Australia
- Tom Li
- School of Clinical Medicine, UNSW Medicine and Health, UNSW Sydney, Australia
- Laura Harris
- SCORe, Sydney Orthopaedic Trauma and Reconstructive Surgery, Sydney, Australia
- Ian A Harris
- School of Clinical Medicine, UNSW Medicine and Health, UNSW Sydney, Australia
- Justine Naylor
- South Western Sydney Clinical School, UNSW Medicine and Health, UNSW Sydney, Australia
- Sam Adie
- St George and Sutherland Clinical School, UNSW Medicine and Health, UNSW Sydney, Australia
12
Neves K, Tan PB, Amaral OB. Are most published research findings false in a continuous universe? PLoS One 2022;17:e0277935. PMID: 36538521; PMCID: PMC9767354; DOI: 10.1371/journal.pone.0277935.
Abstract
Diagnostic screening models for the interpretation of null hypothesis significance test (NHST) results have been influential in highlighting the effect of selective publication on the reproducibility of the published literature, leading to John Ioannidis' much-cited claim that most published research findings are false. These models, however, are typically based on the assumption that hypotheses are dichotomously true or false, without considering that effect sizes for different hypotheses are not the same. To address this limitation, we develop a simulation model that overcomes this by modeling effect sizes explicitly using different continuous distributions, while retaining other aspects of previous models such as publication bias and the pursuit of statistical significance. Our results show that the combination of selective publication, bias, low statistical power and unlikely hypotheses consistently leads to high proportions of false positives, irrespective of the effect size distribution assumed. Using continuous effect sizes also allows us to evaluate the degree of effect size overestimation and prevalence of estimates with the wrong sign in the literature, showing that the same factors that drive false-positive results also lead to errors in estimating effect size direction and magnitude. Nevertheless, the relative influence of these factors on different metrics varies depending on the distribution assumed for effect sizes. The model is made available as an R ShinyApp interface, allowing one to explore features of the literature in various scenarios.
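The dichotomous screening model this paper generalizes reduces to a closed-form positive predictive value. A sketch of that Ioannidis-style baseline, where "bias" is the fraction of would-be-negative analyses that become positive through flexible analysis:

```python
def ppv(prior, power, alpha=0.05, bias=0.0):
    """Probability that a 'significant' finding is true, given the prior
    probability a tested hypothesis is true, study power, alpha, and bias."""
    true_pos = prior * (power + bias * (1 - power))
    false_pos = (1 - prior) * (alpha + bias * (1 - alpha))
    return true_pos / (true_pos + false_pos)

print(f"unbiased, power 0.8, prior 0.1: PPV = {ppv(0.1, 0.8):.2f}")           # → 0.64
print(f"with 20% bias:                  PPV = {ppv(0.1, 0.8, bias=0.2):.2f}")  # → 0.28
```

The paper's contribution is to replace the dichotomous "true/false hypothesis" prior in this formula with continuous effect-size distributions, which also lets it quantify sign and magnitude errors.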
Collapse
Affiliation(s)
- Kleber Neves
- Institute of Medical Biochemistry Leopoldo de Meis, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Pedro B. Tan
- Institute of Medical Biochemistry Leopoldo de Meis, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Olavo B. Amaral
- Institute of Medical Biochemistry Leopoldo de Meis, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| |
Collapse
|
13
|
Transient Acute Kidney Injury Versus Persistent Acute Kidney Injury in Acute Liver Failure-Helpful Differentiation or Confusing Dichotomization. Crit Care Med 2022; 50:1402-1405. [PMID: 35984055 DOI: 10.1097/ccm.0000000000005590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
14
|
Maier M, VanderWeele TJ, Mathur MB. Using selection models to assess sensitivity to publication bias: A tutorial and call for more routine use. CAMPBELL SYSTEMATIC REVIEWS 2022; 18:e1256. [PMID: 36909879 PMCID: PMC9247867 DOI: 10.1002/cl2.1256] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
In meta-analyses, it is critical to assess the extent to which publication bias might have compromised the results. Classical methods based on the funnel plot, including Egger's test and Trim-and-Fill, have become the de facto default methods to do so, with a large majority of recent meta-analyses in top medical journals (85%) assessing for publication bias exclusively using these methods. However, these classical funnel plot methods have important limitations when used as the sole means of assessing publication bias: they essentially assume that the publication process favors large point estimates for small studies and does not affect the largest studies, and they can perform poorly when effects are heterogeneous. In light of these limitations, we recommend that meta-analyses routinely apply other publication bias methods in addition to or instead of classical funnel plot methods. To this end, we describe how to use and interpret selection models. These methods make the often more realistic assumption that publication bias favors "statistically significant" results, and the methods also directly accommodate effect heterogeneity. Selection models have been established for decades in the statistics literature and are supported by user-friendly software, yet remain rarely reported in many disciplines. We use a previously published meta-analysis to demonstrate that selection models can yield insights that extend beyond those provided by funnel plot methods, suggesting the importance of establishing more comprehensive reporting practices for publication bias assessment.
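A deliberately extreme, one-parameter selection model can illustrate the idea (this is a sketch, not the Vevea-Hedges-type models the tutorial covers: it assumes only one-sided "significant" studies are published, and recovers the true effect by conditioning each study's likelihood on publication):

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
mu_true, n = 0.15, 4000
se = rng.uniform(0.1, 0.3, n)          # per-study standard errors
y = rng.normal(mu_true, se)            # observed effect estimates

# Extreme selection: only one-sided "significant" studies get published.
keep = y / se > 1.96
y_pub, se_pub = y[keep], se[keep]

# Naive fixed-effect meta-analysis of the published studies is biased.
naive = np.average(y_pub, weights=1 / se_pub**2)

def neg_loglik(mu):
    # Truncated-normal likelihood: each study's density conditioned on
    # having passed the significance filter, P(z > 1.96 | mu).
    logf = stats.norm.logpdf(y_pub, mu, se_pub)
    logp = stats.norm.logsf(1.96 - mu / se_pub)
    return -(logf - logp).sum()

fit = optimize.minimize_scalar(neg_loglik, bounds=(-1, 1), method="bounded")
print(f"true {mu_true:.2f}, naive {naive:.3f}, selection-model {fit.x:.3f}")
```

Because the selection mechanism is modeled explicitly, the maximum-likelihood estimate lands near the true effect while the funnel-blind average does not.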
Collapse
Affiliation(s)
- Maximilian Maier
- Department of Experimental Psychology, University College London, London, UK
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
| | | | - Maya B. Mathur
- Quantitative Sciences Unit, Department of Pediatrics, Stanford University, Stanford, California, USA
| |
Collapse
|
15
|
O'Brien E. Losing Sight of Piecemeal Progress: People Lump and Dismiss Improvement Efforts That Fall Short of Categorical Change-Despite Improving. Psychol Sci 2022; 33:1278-1299. [PMID: 35920814 DOI: 10.1177/09567976221075302] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Fourteen experiments (N = 10,556 adult participants, including more than 20,000 observed choices across 25 issues) documented how people perceive and respond to relative progress out in the world, revealing a robust "negative-lumping" effect. As problematic entities worked to better their ways, participants shifted to dismiss them if they fell short of categorical reform-despite distinctions in improvement. This increased dismissal of relative gains as "all the same" was driven by the belief that falling short signals an eschewal of doing the bare minimum and lacking serious intent to change, making these gains seem less deserving of recognition. Critically, participants then "checked out": They underrewarded and underinvested in efforts toward "merely" incremental improvement. Finally, in all experiments, participants lumped together absolute failures but not absolute successes, highlighting a unique blindness to gradations of badness. When attempts to eradicate a problem fail, people might dismiss smaller but critical steps that were and can still be made.
Collapse
Affiliation(s)
- Ed O'Brien
- Booth School of Business, The University of Chicago
| |
Collapse
|
16
|
Zaccardi F, Kloecker DE, Khunti K, Davies MJ. Non-inferiority and clinical superiority of glucagon-like peptide-1 receptor agonists and sodium-glucose co-transporter-2 inhibitors: Systematic analysis of cardiorenal outcome trials in type 2 diabetes. Diabetes Obes Metab 2022; 24:1598-1606. [PMID: 35491523 PMCID: PMC9543971 DOI: 10.1111/dom.14735] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Revised: 04/21/2022] [Accepted: 04/28/2022] [Indexed: 11/27/2022]
Abstract
AIMS Most trials leading to the approval of glucagon-like peptide-1 receptor agonists (GLP-1RAs) and sodium-glucose co-transporter-2 inhibitors (SGLT2is) were primarily designed to confirm their non-inferiority to placebo (commonly using an upper 95% confidence limit threshold of 1.3) and, if confirmed, superiority (threshold 1): this asymmetry of margins (1 vs. 1.3) favours the active intervention. We aimed to quantify the probability of clinical superiority of the active treatment by applying the same threshold used to claim non-inferiority. MATERIALS AND METHODS We searched PubMed and Cochrane CENTRAL for cardiorenal outcome trials in subjects with type 2 diabetes published before 5 December 2021, and reconstructed individual-level data for the primary outcome or all-cause mortality from Kaplan-Meier plots. We calculated Bayesian posterior densities to obtain the probability of a treatment effect (hazard ratio) <0.769, which is symmetric to the 1.3 threshold (i.e. its reciprocal 1/1.3), emulating a scenario where the active treatment is placebo and placebo is the active treatment. RESULTS We extracted data from 27 Kaplan-Meier plots (18 for the primary outcome, nine for mortality). Probabilities of clinical superiority to placebo varied widely: for GLP-1RAs, from a minimum of 0% to a maximum of 69% for the primary outcome and from 0% to 8% for mortality; corresponding estimates for SGLT2is were 0% to 96% and 0% to 93%. Probabilities were on average greater for SGLT2is, particularly in trials investigating kidney or heart failure outcomes. CONCLUSIONS The probability of clinical superiority to placebo varies widely across trials previously reported as showing superiority of GLP-1RAs or SGLT2is compared with placebo. These results show within- and between-class differences, highlight the drawbacks of a binary interpretation of trial results, particularly in the context of current non-inferiority trial designs, and have implications for decision makers and future clinical recommendations.
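The core posterior calculation can be sketched in a few lines, assuming a normal likelihood on the log hazard ratio scale and a flat prior (the paper reconstructs individual-level data from Kaplan-Meier plots; the HR and CI below are hypothetical, not taken from any of the 27 trials):

```python
import numpy as np
from scipy import stats

def prob_hr_below(hr, ci_low, ci_high, threshold=1 / 1.3):
    """Posterior P(HR < threshold) under a normal approximation on the
    log scale with a flat prior (so the posterior equals the likelihood)."""
    log_hr = np.log(hr)
    se = (np.log(ci_high) - np.log(ci_low)) / (2 * 1.96)  # from the 95% CI
    return stats.norm.cdf((np.log(threshold) - log_hr) / se)

# Hypothetical trial: HR 0.86 (95% CI 0.77-0.96), assessed against the
# symmetric superiority margin 1/1.3 = 0.769 used in the paper.
print(f"P(HR < 0.769) = {prob_hr_below(0.86, 0.77, 0.96):.2f}")
```

A trial can be convincingly "superior to placebo" at the conventional threshold of 1 while the probability of clearing the symmetric margin 0.769 remains small, which is the asymmetry the paper quantifies.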
Collapse
Affiliation(s)
- Francesco Zaccardi
- Leicester Real World Evidence Unit, University of Leicester, Leicester General Hospital, Leicester, UK
- Diabetes Research Centre, University of Leicester, Leicester General Hospital, Leicester, UK
- NIHR Collaboration for Leadership in Applied Health Research and Care-East Midlands, University of Leicester, Leicester, UK
| | - David E. Kloecker
- Leicester Real World Evidence Unit, University of Leicester, Leicester General Hospital, Leicester, UK
- Diabetes Research Centre, University of Leicester, Leicester General Hospital, Leicester, UK
| | - Kamlesh Khunti
- Leicester Real World Evidence Unit, University of Leicester, Leicester General Hospital, Leicester, UK
- Diabetes Research Centre, University of Leicester, Leicester General Hospital, Leicester, UK
- NIHR Collaboration for Leadership in Applied Health Research and Care-East Midlands, University of Leicester, Leicester, UK
| | - Melanie J. Davies
- Diabetes Research Centre, University of Leicester, Leicester General Hospital, Leicester, UK
- NIHR Leicester Biomedical Research Centre, University Hospitals of Leicester NHS Trust and University of Leicester, Leicester, UK
| |
Collapse
|
17
|
Flutre T, Le Cunff L, Fodor A, Launay A, Romieu C, Berger G, Bertrand Y, Terrier N, Beccavin I, Bouckenooghe V, Roques M, Pinasseau L, Verbaere A, Sommerer N, Cheynier V, Bacilieri R, Boursiquot JM, Lacombe T, Laucou V, This P, Péros JP, Doligez A. A genome-wide association and prediction study in grapevine deciphers the genetic architecture of multiple traits and identifies genes under many new QTLs. G3 (BETHESDA, MD.) 2022; 12:6575896. [PMID: 35485948 PMCID: PMC9258538 DOI: 10.1093/g3journal/jkac103] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/21/2022] [Accepted: 04/21/2022] [Indexed: 12/11/2022]
Abstract
To cope with the challenges facing agriculture, speeding up breeding programs is a worthy endeavor, especially for perennial species such as grapevine, but requires understanding the genetic architecture of target traits. To go beyond the mapping of quantitative trait loci (QTLs) in bi-parental crosses, we exploited a diversity panel of 279 Vitis vinifera L. cultivars planted in 5 blocks in the vineyard. This panel was phenotyped over several years for 127 traits including yield components, organic acids, aroma precursors, polyphenols, and a water stress indicator. The panel was genotyped for 63k single nucleotide polymorphisms (SNPs) by combining an 18K microarray and genotyping-by-sequencing. The experimental design allowed us to reliably assess the genotypic values for most traits. Marker densification via genotyping-by-sequencing markedly increased the proportion of genetic variance explained by SNPs, and 2 multi-SNP models identified QTLs not found by a SNP-by-SNP model. Overall, 489 reliable QTLs were detected for 41% more response variables than by a SNP-by-SNP model with microarray-only SNPs, many of them new compared with the results from bi-parental crosses. A prediction accuracy higher than 0.42 was obtained for 50% of the response variables. Our overall approach, as well as the QTL and prediction results, provides insights into the genetic architecture of target traits. New candidate genes and applications to breeding are discussed.
Collapse
Affiliation(s)
- Timothée Flutre
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, 91190 Gif-sur-Yvette, France
| | - Loïc Le Cunff
- UMT Géno-Vigne, 34398 Montpellier, France
- IFV, 30240 Le Grau-du-Roi, France
| | - Agota Fodor
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Amandine Launay
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Charles Romieu
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Gilles Berger
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Yves Bertrand
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Nancy Terrier
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
| | | | | | - Maryline Roques
- UMT Géno-Vigne, 34398 Montpellier, France
- IFV, 30240 Le Grau-du-Roi, France
| | - Lucie Pinasseau
- SPO, Univ Montpellier, INRAE, Institut Agro, 34060 Montpellier, France
| | - Arnaud Verbaere
- SPO, Univ Montpellier, INRAE, Institut Agro, 34060 Montpellier, France
| | - Nicolas Sommerer
- SPO, Univ Montpellier, INRAE, Institut Agro, 34060 Montpellier, France
| | | | - Roberto Bacilieri
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Jean-Michel Boursiquot
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Thierry Lacombe
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Valérie Laucou
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Patrice This
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Jean-Pierre Péros
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| | - Agnès Doligez
- AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
- UMT Géno-Vigne, 34398 Montpellier, France
| |
Collapse
|
18
|
Rosenberg JM, Kubsch M, Wagenmakers EJ, Dogucu M. Making Sense of Uncertainty in the Science Classroom: A Bayesian Approach. SCIENCE & EDUCATION 2022; 31:1239-1262. [PMID: 35729987 PMCID: PMC9196155 DOI: 10.1007/s11191-022-00341-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 02/15/2022] [Indexed: 06/15/2023]
Abstract
Uncertainty is ubiquitous in science, but scientific knowledge is often represented to the public and in educational contexts as certain and immutable. This contrast can foster distrust when scientific knowledge develops in a way that people perceive as reversals, as we have observed during the ongoing COVID-19 pandemic. Drawing on research in statistics, child development, and several studies in science education, we argue that a Bayesian approach can support science learners in making sense of uncertainty. We provide a brief primer on Bayes' theorem and then describe three ways to make Bayesian reasoning practical in K-12 science education contexts. These are: (a) using principles informed by Bayes' theorem that relate to the nature of knowing and knowledge; (b) interacting with a web-based application (or widget, the Confidence Updater) that makes the calculations needed to apply Bayes' theorem more practical; and (c) adopting strategies for supporting even young learners to engage in Bayesian reasoning. We conclude with directions for future research and sum up how viewing science and scientific knowledge from a Bayesian perspective can build trust in science. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s11191-022-00341-3.
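A minimal sketch of the kind of update such a "confidence updater" performs (this is plain Bayes' theorem for a binary hypothesis, not the authors' widget; the likelihood values are made up):

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior P(H | E) via Bayes' theorem for a binary hypothesis H."""
    numerator = prior * p_evidence_given_h
    marginal = numerator + (1 - prior) * p_evidence_given_not_h
    return numerator / marginal

# Start moderately unsure, then update on two successive observations
# that are 4x as likely if the hypothesis is true.
belief = 0.5
for _ in range(2):
    belief = bayes_update(belief, 0.8, 0.2)
print(f"posterior after two updates: {belief:.3f}")
```

Each new observation shifts the belief gradually rather than flipping it, which is exactly the incremental picture of scientific knowledge the abstract argues for.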
Collapse
Affiliation(s)
- Joshua M. Rosenberg
- University of Tennessee, Knoxville, 1122 Volunteer Blvd, TN 37996 Knoxville, USA
| | - Marcus Kubsch
- IPN–Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, D-24118 Kiel, Germany
| | | | | |
Collapse
|
19
|
Habiger J, Liang Y. Publication Policies for Replicable Research and the Community-Wide False Discovery Rate. AM STAT 2022. [DOI: 10.1080/00031305.2021.1999857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]
Affiliation(s)
- Joshua Habiger
- Department of Statistics, Oklahoma State University, Stillwater, OK
| | - Ye Liang
- Department of Statistics, Oklahoma State University, Stillwater, OK
| |
Collapse
|
20
|
Lytsy P, Hartman M, Pingel R. Misinterpretations of P-values and statistical tests persists among researchers and professionals working with statistics and epidemiology. Ups J Med Sci 2022; 127:8760. [PMID: 35991465 PMCID: PMC9383044 DOI: 10.48101/ujms.v127.8760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/08/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND The aim was to investigate inferences of statistically significant test results among persons with more or less statistical education and research experience. METHODS A total of 75 doctoral students and 64 statisticians/epidemiologists responded to a web questionnaire about inferences of statistically significant findings. Participants were asked about their education and research experience, and also whether a 'statistically significant' test result (P = 0.024, α-level 0.05) could be inferred as proof or as probability statements about the truth or falsehood of the null hypothesis (H0) and the alternative hypothesis (H1). RESULTS Almost all participants reported having a university degree, and among statisticians/epidemiologists, most reported having a university degree in statistics and working professionally with statistics. Overall, 9.4% of statisticians/epidemiologists and 24.0% of doctoral students responded that the statistically significant finding proved that H0 is not true, and 73.4% of statisticians/epidemiologists and 53.3% of doctoral students responded that the statistically significant finding indicated that H0 is improbable. Corresponding numbers for inferences about the alternative hypothesis (H1) were 12.0% and 6.2% for proving H1 to be true, and 62.7% and 62.5% for the conclusion that H1 is probable. Correct responses to both questions, namely that a statistically significant finding can be inferred as neither proof nor a measure of a hypothesis' probability, were given by 10.7% of doctoral students and 12.5% of statisticians/epidemiologists. CONCLUSIONS Misinterpretation of P-values and statistically significant test results persists even among persons who have substantial statistical education and who work professionally with statistics.
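The correct answer in the survey, that a significant result is neither proof of H1 nor a probability statement about H0, can be made concrete with a simulation (illustrative, not from the paper): among "significant" results, the fraction of true nulls depends on power and on how common true effects are, and need not resemble the P-value at all.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_tests = 200_000
h0_true = rng.random(n_tests) < 0.5      # half of all tested nulls are true
effect = np.where(h0_true, 0.0, 0.5)     # modest effect when H0 is false
z = rng.normal(effect, 1.0)              # one test statistic per study
p = 2 * stats.norm.sf(np.abs(z))         # two-sided p-values

# Among "statistically significant" results, how often was H0 in fact true?
sig = p < 0.05
false_discovery = h0_true[sig].mean()
print(f"P(H0 true | p < 0.05) = {false_discovery:.2f}  (not 0.05)")
```

With these assumed settings the power is low, so a large share of significant findings come from true nulls: the P-value is a statement about the data given H0, not about H0 given the data.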
Collapse
Affiliation(s)
- Per Lytsy
- Department of Public Health and Caring Sciences, University of Uppsala, Uppsala, Sweden
| | | | - Ronnie Pingel
- Department of Public Health and Caring Sciences, University of Uppsala, Uppsala, Sweden
- Department of Statistics, University of Uppsala, Uppsala, Sweden
| |
Collapse
|
21
|
Hartnack S, Roos M. Teaching: confidence, prediction and tolerance intervals in scientific practice: a tutorial on binary variables. Emerg Themes Epidemiol 2021; 18:17. [PMID: 34863186 PMCID: PMC8645111 DOI: 10.1186/s12982-021-00108-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2021] [Accepted: 11/16/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND One of the emerging themes in epidemiology is the use of interval estimates. Currently, three interval estimates for confidence (CI), prediction (PI), and tolerance (TI) are at a researcher's disposal and are accessible within the open-source framework R. These three types of statistical intervals serve different purposes. Confidence intervals are designed to describe a parameter with some uncertainty due to sampling errors. Prediction intervals aim to predict future observation(s), including some uncertainty present in the actual and future samples. Tolerance intervals are constructed to capture a specified proportion of a population with a defined confidence. It is well known that interval estimates support a greater knowledge gain than point estimates. Thus, a good understanding and the use of CI, PI, and TI underlie good statistical practice. While CIs are taught in introductory statistics classes, PIs and TIs are less familiar. RESULTS In this paper, we provide a concise tutorial on two-sided CI, PI and TI for binary variables. This hands-on tutorial is based on our teaching materials. It contains an overview of the meaning and applicability from both a classical and a Bayesian perspective. Based on a worked-out example from veterinary medicine, we provide guidance and code that can be directly applied in R. CONCLUSIONS This tutorial can be used by others for teaching, either in a class or for self-instruction of students and senior researchers.
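The tutorial's code is in R; as a rough Python analogue for a binary variable (the counts are illustrative, not the authors' veterinary example, and the prediction interval here uses a Bayesian beta-binomial predictive with a uniform prior rather than their exact construction):

```python
import numpy as np
from scipy import stats

x, n = 14, 50          # observed successes out of n (illustrative numbers)
alpha = 0.05

# Exact (Clopper-Pearson) two-sided confidence interval for the proportion.
lo = stats.beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
hi = stats.beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0

# Prediction interval for the number of successes in a future sample of
# size m, via the beta-binomial predictive distribution (uniform prior).
m = 50
k = np.arange(m + 1)
cdf = np.cumsum(stats.betabinom.pmf(k, m, x + 1, n - x + 1))
pi_lo, pi_hi = k[cdf >= alpha / 2][0], k[cdf >= 1 - alpha / 2][0]

print(f"95% CI for p: ({lo:.3f}, {hi:.3f}); "
      f"95% PI for counts in m={m}: [{pi_lo}, {pi_hi}]")
```

Note how much wider the count-scale prediction interval is than the CI, reflecting the extra sampling uncertainty of a future sample, one of the tutorial's central points.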
Collapse
Affiliation(s)
- Sonja Hartnack
- Section of Epidemiology, Vetsuisse Faculty, University of Zurich, Winterthurerstr. 270, 8057 Zurich, Switzerland
| | - Malgorzata Roos
- Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Hirschengraben 84, 8001 Zurich, Switzerland
| |
Collapse
|
22
|
Helske J, Helske S, Cooper M, Ynnerman A, Besancon L. Can Visualization Alleviate Dichotomous Thinking? Effects of Visual Representations on the Cliff Effect. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3397-3409. [PMID: 33856998 DOI: 10.1109/tvcg.2021.3073466] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Common reporting styles for statistical results in scientific articles, such as p-values and confidence intervals (CIs), have been reported to be prone to dichotomous interpretations, especially with respect to the null hypothesis significance testing framework. For example, when the p-value is small enough or the CIs of the mean effects of a studied drug and a placebo do not overlap, scientists tend to claim significant differences while often disregarding the magnitudes and absolute differences in the effect sizes. This type of reasoning has been shown to be potentially harmful to science. Techniques relying on the visual estimation of the strength of evidence have been recommended to reduce such dichotomous interpretations, but their effectiveness has also been challenged. We ran two experiments on researchers with expertise in statistical analysis to compare several alternative representations of confidence intervals, and used Bayesian multilevel models to estimate the effects of the representation styles on differences in researchers' subjective confidence in the results. We also asked the respondents' opinions and preferences regarding the representation styles. Our results suggest that adding visual information to the classic CI representation can decrease the tendency towards dichotomous interpretations - measured as the 'cliff effect': the sudden drop in confidence around p-value 0.05 - compared with classic CI visualization and textual representation of the CI with p-values. All data and analyses are publicly available at https://github.com/helske/statvis.
Collapse
|
23
|
Saed B, Munaweera R, Anderson J, O'Neill WD, Hu YS. Rapid statistical discrimination of fluorescence images of T cell receptors on immobilizing surfaces with different coating conditions. Sci Rep 2021; 11:15488. [PMID: 34326382 PMCID: PMC8322097 DOI: 10.1038/s41598-021-94730-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2021] [Accepted: 07/15/2021] [Indexed: 11/24/2022] Open
Abstract
The spatial organization of T cell receptors (TCRs) correlates with membrane-associated signal amplification, dispersion, and regulation during T cell activation. Despite its potential clinical importance, quantitative analysis of the spatial arrangement of TCRs from standard fluorescence images remains difficult. Here, we report Statistical Classification Analyses of Membrane Protein Images (SCAMPI), a technique capable of analyzing the spatial arrangement of TCRs on the plasma membrane of T cells. We leveraged medical image analysis techniques that utilize pixel-based values. We transformed grayscale pixel values from fluorescence images of TCRs into estimated model parameters of partial differential equations. The estimated model parameters enabled accurate classification using linear discrimination techniques, including Fisher Linear Discriminant (FLD) and Logistic Regression (LR). In a proof-of-principle study, we modeled and discriminated images of fluorescently tagged TCRs from Jurkat T cells on uncoated cover glass surfaces (Null) or on cover glass surfaces coated with either positively charged poly-L-lysine (PLL) or TCR cross-linking anti-CD3 antibodies (OKT3). Using 80 training images and 20 test images per class, our statistical technique achieved 85% discrimination accuracy for both OKT3 versus PLL and OKT3 versus Null conditions. The total run time of image data download, model construction, and image discrimination was 21.89 s on a laptop computer, comprising 20.43 s for image data download, 1.30 s for the FLD-SCAMPI analysis, and 0.16 s for the LR-SCAMPI analysis. SCAMPI represents an alternative to morphology-based qualifications for discriminating complex patterns of membrane proteins conditioned on a small sample size and fast runtime. The technique paves pathways to characterize various physiological and pathological conditions using the spatial organization of TCRs from patient T cells.
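A sketch of the Fisher Linear Discriminant step on synthetic feature vectors (the features below are random stand-ins for per-image estimated model parameters, not the paper's PDE estimates or image data; class sizes loosely mirror the 80-train/20-test design):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "model parameter" vectors: 3 features per image, two classes.
n, d = 80, 3
class_a = rng.normal([0.0, 0.0, 0.0], 1.0, (n, d))
class_b = rng.normal([1.2, 0.8, -0.5], 1.0, (n, d))

# Fisher Linear Discriminant: w maximizes between- vs within-class scatter,
# i.e. w = Sw^{-1} (mu_b - mu_a).
mu_a, mu_b = class_a.mean(0), class_b.mean(0)
sw = np.cov(class_a.T) * (n - 1) + np.cov(class_b.T) * (n - 1)
w = np.linalg.solve(sw, mu_b - mu_a)
threshold = w @ (mu_a + mu_b) / 2          # midpoint decision rule

# Held-out test images, 20 per class.
x_test = np.vstack([rng.normal([0.0, 0.0, 0.0], 1.0, (20, d)),
                    rng.normal([1.2, 0.8, -0.5], 1.0, (20, d))])
y_test = np.repeat([0, 1], 20)
pred = (x_test @ w > threshold).astype(int)
accuracy = (pred == y_test).mean()
print(f"FLD test accuracy: {accuracy:.0%}")
```

The projection onto a single learned direction is what makes FLD both fast and workable at small sample sizes, the regime the abstract emphasizes.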
Collapse
Affiliation(s)
- Badeia Saed
- Department of Chemistry, College of Liberal Arts and Sciences, University of Illinois at Chicago, Chicago, IL, 60607, USA
| | - Rangika Munaweera
- Department of Chemistry, College of Liberal Arts and Sciences, University of Illinois at Chicago, Chicago, IL, 60607, USA
| | - Jesse Anderson
- Department of Chemical Engineering, University of Illinois at Chicago, Chicago, IL, 60607, USA
| | - William D O'Neill
- Department of Bioengineering, Colleges of Engineering and Medicine, University of Illinois at Chicago, Chicago, IL, 60607, USA.
| | - Ying S Hu
- Department of Chemistry, College of Liberal Arts and Sciences, University of Illinois at Chicago, Chicago, IL, 60607, USA.
| |
Collapse
|
24
|
Kloecker DE, Davies MJ, Khunti K, Zaccardi F. Cardiovascular effects of sodium-glucose co-transporter-2 inhibitors and glucagon-like peptide-1 receptor agonists: The P value and beyond. Diabetes Obes Metab 2021; 23:1685-1691. [PMID: 33764645 DOI: 10.1111/dom.14384] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/25/2020] [Revised: 03/08/2021] [Accepted: 03/19/2021] [Indexed: 12/22/2022]
Abstract
Despite growing awareness of the dangers of a dichotomous interpretation of trial results based on the 'statistical significance' of a treatment effect, the uptake of new approaches has been slow in diabetes medicine. We showcase a number of ways to interpret the evidence for a treatment effect applied to the cardiovascular outcome trials of glucagon-like peptide-1 receptor agonists (GLP-1RAs) and sodium-glucose co-transporter-2 inhibitors (SGLT-2is): the P value function (or confidence curves), which depicts the treatment effect across the whole spectrum of confidence levels; the counternull value, which is the hazard ratio (i.e. treatment effect size) supported by the same amount of evidence as the null value (i.e. no treatment effect); and the S value, which quantifies the strength of the evidence against the null hypothesis in terms of the number of coin tosses yielding the same side. We show how this approach identifies potential treatment effects, highlights similarities among trials straddling the threshold of statistical significance, and quantifies differences in the strength of the evidence from trials reporting statistically significant results. For example, while REWIND, CANVAS and CREDENCE failed to reach statistical significance at the .05 level for all-cause mortality, their counternull values indicate that death rate reductions of 19%, 24% and 31%, respectively, are supported by the same amount of evidence as that indicating no treatment effect. Moreover, similarities among results emerge in trials of GLP-1RAs (REWIND, EXSCEL and LEADER) lying closely around the threshold of 'statistical significance'. Lastly, several S values, such as for the primary outcome in HARMONY Outcomes (S value 10.9) and all-cause death in EMPA-REG OUTCOME (S value 15.0), stand out compared with values for other outcomes and other trials, suggesting much larger differences in the evidence between these studies and several others that cluster around the .05 significance threshold. P value functions, counternull values and S values should complement the standard reporting of the treatment effect to help interpret clinical trials and make decisions among competing glucose-lowering medications.
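The counternull and S value follow directly from a reported HR and 95% CI under a log-normal approximation. A sketch (the HR and CI are illustrative, not taken from a specific trial; note that for a null of HR = 1 the counternull on the HR scale is simply the square of the point estimate):

```python
import numpy as np
from scipy import stats

def beyond_p(hr, ci_low, ci_high):
    """Two-sided p vs HR = 1, S value (bits), and counternull HR,
    assuming normality on the log hazard ratio scale."""
    log_hr = np.log(hr)
    se = (np.log(ci_high) - np.log(ci_low)) / (2 * 1.96)
    p = 2 * stats.norm.sf(abs(log_hr) / se)   # two-sided p-value
    s_value = -np.log2(p)                      # coin-toss (surprisal) scale
    counternull = np.exp(2 * log_hr)           # mirror image of HR = 1
    return p, s_value, counternull

# Hypothetical trial: HR 0.90 (95% CI 0.79-1.02) is "non-significant",
# yet the counternull shows an equally well-supported 19% reduction.
p, s, cn = beyond_p(0.90, 0.79, 1.02)
print(f"p = {p:.3f}, S = {s:.1f} bits, counternull HR = {cn:.2f}")
```

Here "no effect" (HR 1) and a 19% rate reduction (HR 0.81) sit symmetrically around the point estimate and are supported by the same amount of evidence, which is exactly the counternull argument made for the mortality results above.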
Collapse
Affiliation(s)
- David E Kloecker
- Diabetes Research Centre, Leicester Diabetes Centre, University of Leicester, Leicester, UK
- Leicester Real World Evidence Unit, Leicester Diabetes Centre, University of Leicester, Leicester, UK
| | - Melanie J Davies
- Diabetes Research Centre, Leicester Diabetes Centre, University of Leicester, Leicester, UK
- NIHR Leicester Biomedical Research Centre, Leicester General Hospital, Leicester, UK
| | - Kamlesh Khunti
- Diabetes Research Centre, Leicester Diabetes Centre, University of Leicester, Leicester, UK
- Leicester Real World Evidence Unit, Leicester Diabetes Centre, University of Leicester, Leicester, UK
| | - Francesco Zaccardi
- Diabetes Research Centre, Leicester Diabetes Centre, University of Leicester, Leicester, UK
- Leicester Real World Evidence Unit, Leicester Diabetes Centre, University of Leicester, Leicester, UK
| |
Collapse
|
25
|
Reito A. Past, Present and Future With p-Values, Confidence Intervals and Effect Sizes. J Foot Ankle Surg 2021; 60:642-643. [PMID: 33958041 DOI: 10.1053/j.jfas.2020.04.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Revised: 04/08/2020] [Accepted: 04/09/2020] [Indexed: 02/03/2023]
Affiliation(s)
- Aleksi Reito
- Adjunct Professor, Department of Orthopedics, Tampere University, Faculty of Medicine and Health Technology and Tampere University Hospital, Tampere, Finland; Department of Surgery, Central Finland Hospital, Jyväskylä, Finland.
| |
Collapse
|
26
|
Dempsey W, Mukherjee B. Reflecting on "A Statistician in Medicine" in 2020. Stat Med 2021; 40:42-48. [PMID: 33368360 DOI: 10.1002/sim.8830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2020] [Accepted: 11/09/2020] [Indexed: 11/08/2022]
Abstract
In this commentary, we revisit Sir Austin Bradford Hill's seminal Alfred Watson Memorial Lecture of 1962 through the eyes of two practicing biostatisticians of the current era. We summarize some enduring takeaway messages from Hill's lecture regarding observations and experiments, translated through the modern lexicon of causal inference. Finally, we raise a series of questions that we would have liked to pose to Sir Austin Bradford Hill if he were to deliver the lecture in 2020.
Affiliation(s)
- Walter Dempsey
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA; Institute of Social Research, University of Michigan, Ann Arbor, Michigan, USA
- Bhramar Mukherjee
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
27
Watt HC. Reflection on modern methods: Statistics education beyond 'significance': novel plain English interpretations to deepen understanding of statistics and to steer away from misinterpretations. Int J Epidemiol 2021; 49:2083-2088. [PMID: 32710113] [DOI: 10.1093/ije/dyaa080]
Abstract
Concerns have been expressed over standards of statistical interpretation. Results with P <0.05 are often referred to as 'significant' which, in plain English, implies important. This leads some people directly into the misconception that this provides proof that associations are clinically relevant. There are calls for statistics educators to respond to these concerns. This article provides novel plain English interpretations that are designed to deepen understanding. Experience teaching postgraduates at Imperial College is discussed. A key issue with focusing on 'significance' is the common inappropriate practice of implying no association exists, simply because P >0.05. Referring to strengths of association in 'study participants' gives them gravitas, which may help to avoid this. This contrasts with the common practice of focusing on imprecision, by referring to the 'sample' and to 'point estimates'. Unlike formal statistical definitions, the interpretations developed and presented here are rooted in the application of statistics. They are based on one set of study participants (not many random samples). Precision of strengths of association is based on using strengths in study participants to estimate strengths of association in the population (from which participants were selected by probability random sampling). Reference to 'compatibility with study data, dependent on statistical modelling assumptions' reminds us of the importance of data quality and modelling assumptions. A straightforward graph shows the relationship between P-values and test statistics. This figure and associated interpretations were developed to illuminate the continuous nature of P-values. This is designed to discourage focus on whether P <0.05, and encourage interpretation of exact P-values.
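The continuous relationship between test statistics and P-values that the graph described above illustrates can be sketched numerically (an illustrative sketch for a two-sided z-test, not the article's own figure code):

```python
from statistics import NormalDist

def two_sided_p(z: float) -> float:
    """Two-sided P-value for a z statistic: a smooth, continuous function."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Nothing special happens at z = 1.96 / p = 0.05; the curve is continuous.
for z in (1.0, 1.5, 1.96, 2.5):
    print(f"z = {z:.2f}  ->  p = {two_sided_p(z):.3f}")
```

The smooth decline of p with |z| is the point: exact P-values carry graded information, and the 0.05 cut-off is just one arbitrary spot on the curve.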
28
Turner AN, Parmar N, Jovanovski A, Hearne G. Assessing Group-Based Changes in High-Performance Sport. Part 1: Null Hypothesis Significance Testing and the Utility of p Values. Strength Cond J 2021. [DOI: 10.1519/ssc.0000000000000625]
29
Ruberg SJ. Détente: A Practical Understanding of P values and Bayesian Posterior Probabilities. Clin Pharmacol Ther 2021; 109:1489-1498. [PMID: 32748400] [PMCID: PMC8246739] [DOI: 10.1002/cpt.2004]
Abstract
Null hypothesis significance testing (NHST) with its benchmark P value < 0.05 has long been a stalwart of scientific reporting and such statistically significant findings have been used to imply scientifically or clinically significant findings. Challenges to this approach have arisen over the past 6 decades, but they have largely been unheeded. There is a growing movement for using Bayesian statistical inference to quantify the probability that a scientific finding is credible. There have been differences of opinion between the frequentist (i.e., NHST) and Bayesian schools of inference, and warnings about the use or misuse of P values have come from both schools of thought spanning many decades. Controversies in this arena have been heightened by the American Statistical Association statement on P values and the further denouncement of the term "statistical significance" by others. My experience has been that many scientists, including many statisticians, do not have a sound conceptual grasp of the fundamental differences in these approaches, thereby creating even greater confusion and acrimony. If we let A represent the observed data, and B represent the hypothesis of interest, then the fundamental distinction between these two approaches can be described as the frequentist approach using the conditional probability pr(A | B) (i.e., the P value), and the Bayesian approach using pr(B | A) (the posterior probability). This paper will further explain the fundamental differences in NHST and Bayesian approaches and demonstrate how they can co-exist harmoniously to guide clinical trial design and inference.
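The pr(A | B) versus pr(B | A) distinction drawn in this abstract can be made concrete with Bayes' theorem; the prior, power, and α below are hypothetical illustrative values, not numbers from the paper:

```python
# Hypothetical inputs: how often does a "significant" result A reflect a true hypothesis B?
prior_true = 0.10   # assumed prior probability that B is true
power = 0.80        # pr(A | B true): chance of a significant result given a real effect
alpha = 0.05        # pr(A | B false): chance of a significant result with no effect

# Bayes' theorem: pr(B | A) = pr(A | B) * pr(B) / pr(A)
p_sig = power * prior_true + alpha * (1 - prior_true)
posterior = power * prior_true / p_sig
print(round(posterior, 2))  # 0.64
```

Here a "significant" finding leaves only a 64% posterior probability that the hypothesis is true, illustrating why the P value pr(A | B) must not be read as the posterior probability pr(B | A).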
30
Ringland V, Lewis MA, Dunleavy D. Beyond the p-value: Bayesian Statistics and Causation. Journal of Evidence-Based Social Work 2021; 18:284-307. [PMID: 33131464] [DOI: 10.1080/26408066.2020.1832011]
Abstract
Statistical paradigms limit the perspective and tools social work researchers use to study the world and answer questions impacting people and policy. Currently, quantitative social work researchers overwhelmingly rely on the frequentist paradigm of statistics. This paper discusses foundational differences between the frequentist and Bayesian statistical paradigms, describes basic concepts of Bayesian analysis, compares Bayesian and frequentist statistical analysis for a sample social work problem, and introduces two types of causal analyses built on Bayesian statistical thinking: counterfactual causality, and causality based on work by computer scientist Judea Pearl. Implications for social work research are discussed.
Affiliation(s)
- Michael A Lewis
- Silberman College of Social Work, Hunter College, CUNY, New York, New York, USA
- Daniel Dunleavy
- College of Social Work, Florida State University, Tallahassee, Florida, USA
31
Mathur MB, VanderWeele TJ. Estimating publication bias in meta-analyses of peer-reviewed studies: A meta-meta-analysis across disciplines and journal tiers. Res Synth Methods 2021; 12:176-191. [PMID: 33108053] [PMCID: PMC7954980] [DOI: 10.1002/jrsm.1464]
Abstract
Selective publication and reporting in individual papers compromise the scientific record, but are meta-analyses as compromised as their constituent studies? We systematically sampled 63 meta-analyses (each comprising at least 40 studies) in PLoS One, top medical journals, top psychology journals, and Metalab, an online, open-data database of developmental psychology meta-analyses. We empirically estimated publication bias in each, including only the peer-reviewed studies. Across all meta-analyses, we estimated that "statistically significant" results in the expected direction were only 1.17 times more likely to be published than "nonsignificant" results or those in the unexpected direction (95% CI: [0.93, 1.47]), with a confidence interval substantially overlapping the null. Comparable estimates were 0.83 for meta-analyses in PLoS One, 1.02 for top medical journals, 1.54 for top psychology journals, and 4.70 for Metalab. The severity of publication bias did differ across individual meta-analyses; in a small minority (10%; 95% CI: [2%, 21%]), publication bias appeared to favor "significant" results in the expected direction by more than threefold. We estimated that for 89% of meta-analyses, the amount of publication bias that would be required to attenuate the point estimate to the null exceeded the amount of publication bias estimated to be actually present in the vast majority of meta-analyses from the relevant scientific discipline (exceeding the 95th percentile of publication bias). Study-level measures ("statistical significance" with a point estimate in the expected direction and point estimate size) did not indicate more publication bias in higher-tier versus lower-tier journals, nor in the earliest studies published on a topic versus later studies. 
Overall, we conclude that performing a meta-analysis with a large number of studies (at least 40) that includes non-headline results may largely mitigate publication bias in meta-analyses, suggesting optimism about the validity of meta-analytic results.
Affiliation(s)
- Maya B. Mathur
- Quantitative Sciences Unit, Stanford University, Palo Alto, California
- Tyler J. VanderWeele
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
32
Rothman KJ. Rothman Responds to "Surprise!". Am J Epidemiol 2021; 190:194-195. [PMID: 33524113] [DOI: 10.1093/aje/kwaa137]
33
Greenland S. Analysis goals, error-cost sensitivity, and analysis hacking: Essential considerations in hypothesis testing and multiple comparisons. Paediatr Perinat Epidemiol 2021; 35:8-23. [PMID: 33269490] [DOI: 10.1111/ppe.12711]
Abstract
The "replication crisis" has been attributed to perverse incentives that lead to selective reporting and misinterpretations of P-values and confidence intervals. A crude fix offered for this problem is to lower testing cut-offs (α levels), either directly or in the form of null-biased multiple comparisons procedures such as naïve Bonferroni adjustments. Methodologists and statisticians have expressed positions that range from condemning all such procedures to demanding their application in almost all analyses. Navigating between these unjustifiable extremes requires defining analysis goals precisely enough to separate inappropriate from appropriate adjustments. To meet this need, I here review issues arising in single-parameter inference (such as error costs and loss functions) that are often skipped in basic statistics, yet are crucial to understanding controversies in testing and multiple comparisons. I also review considerations that should be made when examining arguments for and against modifications of decision cut-offs and adjustments for multiple comparisons. The goal is to provide researchers a better understanding of what is assumed by each side and to enable recognition of hidden assumptions. Basic issues of goal specification and error costs are illustrated with simple fixed cut-off hypothesis testing scenarios. These illustrations show how adjustment choices are extremely sensitive to implicit decision costs, making it inevitable that different stakeholders will vehemently disagree about what is necessary or appropriate. Because decisions cannot be justified without explicit costs, resolution of inference controversies is impossible without recognising this sensitivity. Pre-analysis statements of funding, scientific goals, and analysis plans can help counter demands for inappropriate adjustments, and can provide guidance as to what adjustments are advisable. 
Hierarchical (multilevel) regression methods (including Bayesian, semi-Bayes, and empirical-Bayes methods) provide preferable alternatives to conventional adjustments, insofar as they facilitate use of background information in the analysis model, and thus can provide better-informed estimates on which to base inferences and decisions.
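The sensitivity of decisions to the choice of cut-off that the article emphasizes can be seen with a naive Bonferroni adjustment (the P-values below are hypothetical; this is a sketch, not the article's analysis):

```python
p_values = [0.001, 0.02, 0.04, 0.20]  # hypothetical results from m = 4 comparisons
alpha = 0.05
m = len(p_values)

unadjusted = [p < alpha for p in p_values]   # per-comparison cut-off
adjusted = [p < alpha / m for p in p_values] # naive Bonferroni cut-off alpha/m

print(unadjusted)  # [True, True, True, False]
print(adjusted)    # [True, False, False, False]
```

Which set of decisions is "correct" cannot be settled by the data alone: it depends on the implicit error costs of each comparison, which is precisely why the article argues that adjustment controversies cannot be resolved without making those costs explicit.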
Affiliation(s)
- Sander Greenland
- Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA, USA
34
Rafi Z, Greenland S. Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise. BMC Med Res Methodol 2020; 20:244. [PMID: 32998683] [PMCID: PMC7528258] [DOI: 10.1186/s12874-020-01105-9]
Abstract
BACKGROUND Researchers often misinterpret and misrepresent statistical outputs. This abuse has led to a large literature on modification or replacement of testing thresholds and P-values with confidence intervals, Bayes factors, and other devices. Because the core problems appear cognitive rather than statistical, we review some simple methods to aid researchers in interpreting statistical outputs. These methods emphasize logical and information concepts over probability, and thus may be more robust to common misinterpretations than are traditional descriptions. METHODS We use the Shannon transform of the P-value p, also known as the binary surprisal or S-value s = -log2(p), to provide a measure of the information supplied by the testing procedure, and to help calibrate intuitions against simple physical experiments like coin tossing. We also use tables or graphs of test statistics for alternative hypotheses, and interval estimates for different percentile levels, to thwart fallacies arising from arbitrary dichotomies. Finally, we reinterpret P-values and interval estimates in unconditional terms, which describe compatibility of data with the entire set of analysis assumptions. We illustrate these methods with a reanalysis of data from an existing record-based cohort study. CONCLUSIONS In line with other recent recommendations, we advise that teaching materials and research reports discuss P-values as measures of compatibility rather than significance, compute P-values for alternative hypotheses whenever they are computed for null hypotheses, and interpret interval estimates as showing values of high compatibility with data, rather than regions of confidence. Our recommendations emphasize cognitive devices for displaying the compatibility of the observed data with various hypotheses of interest, rather than focusing on single hypothesis tests or interval estimates. We believe these simple reforms are well worth the minor effort they require.
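The Shannon transform described in the abstract is easy to compute directly (a minimal sketch of s = -log2(p)):

```python
import math

def s_value(p: float) -> float:
    """Binary surprisal (S-value) of a P-value: bits of information against the test hypothesis."""
    return -math.log2(p)

# p = 0.05 carries about 4.3 bits -- roughly as surprising as seeing
# all heads in four to five tosses of a fair coin.
for p in (0.5, 0.05, 0.005):
    print(f"p = {p}  ->  s = {s_value(p):.1f} bits")
```

The coin-tossing calibration is the cognitive aid: p = 0.5 supplies only one bit (one fair toss), so a "nonsignificant" result of that size carries almost no information against the hypothesis.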
Affiliation(s)
- Zad Rafi
- Department of Population Health, NYU Langone Medical Center, 227 East 30th Street, New York, NY, 10016, USA.
- Sander Greenland
- Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA, USA
35
Mathur MB, VanderWeele TJ. Sensitivity analysis for publication bias in meta-analyses. J R Stat Soc Ser C Appl Stat 2020; 69:1091-1119. [PMID: 33132447] [PMCID: PMC7590147] [DOI: 10.1111/rssc.12440]
Abstract
We propose sensitivity analyses for publication bias in meta‐analyses. We consider a publication process such that ‘statistically significant’ results are more likely to be published than negative or “non‐significant” results by an unknown ratio, η. Our proposed methods also accommodate some plausible forms of selection based on a study's standard error. Using inverse probability weighting and robust estimation that accommodates non‐normal population effects, small meta‐analyses, and clustering, we develop sensitivity analyses that enable statements such as ‘For publication bias to shift the observed point estimate to the null, “significant” results would need to be at least 30 fold more likely to be published than negative or “non‐significant” results’. Comparable statements can be made regarding shifting to a chosen non‐null value or shifting the confidence interval. To aid interpretation, we describe empirical benchmarks for plausible values of η across disciplines. We show that a worst‐case meta‐analytic point estimate for maximal publication bias under the selection model can be obtained simply by conducting a standard meta‐analysis of only the negative and ‘non‐significant’ studies; this method sometimes indicates that no amount of such publication bias could ‘explain away’ the results. We illustrate the proposed methods by using real meta‐analyses and provide an R package: PublicationBias.
36
Bendtsen M. The P Value Line Dance: When Does the Music Stop? J Med Internet Res 2020; 22:e21345. [PMID: 32852275] [PMCID: PMC7484773] [DOI: 10.2196/21345]
Abstract
When should a trial stop? Such a seemingly innocent question evokes concerns about type I and II errors among those who believe that certainty can be the product of uncertainty, and among researchers who have been told that they need to carefully calculate sample sizes, consider multiplicity, and not spend P values on interim analyses. However, the endeavor to dichotomize evidence into significant and nonsignificant has caused the basic driving force of science, namely uncertainty, to take a back seat. In this viewpoint we argue that if testing the null hypothesis is the ultimate goal of science, then we need not worry about writing protocols, considering ethics, applying for funding, or running any experiments at all: all null hypotheses will be rejected at some point, because everything has an effect. The job of science should be to unearth the uncertainties of the effects of treatments, not to test their difference from zero. We also show the fickleness of P values: how they may one day point to statistically significant results, only for the once statistically significant effect to disappear after a few more participants have been recruited. We present plots that, we hope, intuitively highlight that all assessments of evidence fluctuate over time. Finally, we discuss the remedy in the form of Bayesian methods, in which uncertainty leads and which allow continuous decisions to stop or continue recruitment as new data from a trial are accumulated.
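The fluctuation ("dance") of P values under accumulating data can be simulated in a few lines (a sketch of repeated interim z-tests under a true null effect; not the paper's code):

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(42)
norm = NormalDist()

# Accrue participants one at a time from a null distribution (true effect = 0)
# and recompute the one-sample z-test P value at every interim look.
data, p_values = [], []
for n in range(1, 401):
    data.append(random.gauss(0.0, 1.0))
    if n >= 20:
        z = (sum(data) / n) * sqrt(n)  # mean / (sigma / sqrt(n)) with sigma = 1
        p_values.append(2 * (1 - norm.cdf(abs(z))))

# The P value wanders as data accumulate; with enough interim looks it will
# often dip below any fixed threshold purely by chance.
print(f"min p = {min(p_values):.3f}, max p = {max(p_values):.3f}")
```

Plotting `p_values` against `n` reproduces the "line dance": even with no true effect, the trajectory crosses conventional thresholds back and forth, which is why dichotomizing at any single look is hazardous.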
Affiliation(s)
- Marcus Bendtsen
- Department of Health, Medicine and Caring Sciences, Division of Society and Health, Linköping, Sweden
37
Siegenthaler J, Pleyers T, Raillard M, Spadavecchia C, Levionnois OL. Effect of Medetomidine, Dexmedetomidine, and Their Reversal with Atipamezole on the Nociceptive Withdrawal Reflex in Beagles. Animals (Basel) 2020; 10:E1240. [PMID: 32708294] [PMCID: PMC7401557] [DOI: 10.3390/ani10071240]
Abstract
The objectives were: (1) to compare the antinociceptive activity of dexmedetomidine and medetomidine, and (2) to investigate its modulation by atipamezole. This prospective, randomized, blinded experimental trial was carried out on eight beagles. During the first session, dogs received either medetomidine (MED) (0.02 mg kg-1 intravenously (IV)) or dexmedetomidine (DEX) (0.01 mg kg-1 IV), followed by either atipamezole (ATI) (0.1 mg kg-1) or an equivalent volume of saline (SAL) administered intramuscularly 45 min later. The opposite treatments were administered in a second session 10-14 days later. The nociceptive withdrawal reflex (NWR) threshold was determined using a continuous tracking approach. Sedation was scored (0 to 21) every 10 min. Both drugs (MED and DEX) increased the NWR thresholds significantly, up to 5.0 (3.7-5.9) and 4.4 (3.9-4.8) times the baseline (p = 0.547), at seven (3-11) and six (4-9) minutes (p = 0.938), respectively. Sedation scores were not different between MED and DEX during the first 45 min (15 (12-17), p = 0.67). Atipamezole antagonized sedation within 25 (15-25) minutes (p = 0.008) and antinociception within five (3-6) minutes (p = 0.008). Following atipamezole, additional analgesics may be needed to maintain pain relief.
Affiliation(s)
- Joëlle Siegenthaler
- Section of Anaesthesiology and Pain Therapy, Department of Clinical Veterinary Sciences, Vetsuisse Faculty, University of Berne, 3012 Bern, Switzerland; (J.S.); (T.P.); (M.R.); (C.S.)
- Tekla Pleyers
- Section of Anaesthesiology and Pain Therapy, Department of Clinical Veterinary Sciences, Vetsuisse Faculty, University of Berne, 3012 Bern, Switzerland; (J.S.); (T.P.); (M.R.); (C.S.)
- Mathieu Raillard
- Section of Anaesthesiology and Pain Therapy, Department of Clinical Veterinary Sciences, Vetsuisse Faculty, University of Berne, 3012 Bern, Switzerland; (J.S.); (T.P.); (M.R.); (C.S.)
- University Veterinary Teaching Hospital, School of Veterinary Science, Faculty of Science, The University of Sydney, Sydney 2006, Australia
- Claudia Spadavecchia
- Section of Anaesthesiology and Pain Therapy, Department of Clinical Veterinary Sciences, Vetsuisse Faculty, University of Berne, 3012 Bern, Switzerland; (J.S.); (T.P.); (M.R.); (C.S.)
- Olivier Louis Levionnois
- Section of Anaesthesiology and Pain Therapy, Department of Clinical Veterinary Sciences, Vetsuisse Faculty, University of Berne, 3012 Bern, Switzerland; (J.S.); (T.P.); (M.R.); (C.S.)
38
Loux T, Davy O. Adjusting Published Estimates for Exploratory Biases Using the Truncated Normal Distribution. Am Stat 2020. [DOI: 10.1080/00031305.2020.1775700]
Affiliation(s)
- Travis Loux
- Department of Epidemiology and Biostatistics, Saint Louis University, St. Louis, MO
- Orlando Davy
- Department of Epidemiology and Biostatistics, Saint Louis University, St. Louis, MO
39
Colling LJ, Szűcs D, De Marco D, Cipora K, Ulrich R, Nuerk HC, Soltanlou M, Bryce D, Chen SC, Schroeder PA, Henare DT, Chrystall CK, Corballis PM, Ansari D, Goffin C, Sokolowski HM, Hancock PJB, Millen AE, Langton SRH, Holmes KJ, Saviano MS, Tummino TA, Lindemann O, Zwaan RA, Lukavský J, Becková A, Vranka MA, Cutini S, Mammarella IC, Mulatti C, Bell R, Buchner A, Mieth L, Röer JP, Klein E, Huber S, Moeller K, Ocampo B, Lupiáñez J, Ortiz-Tudela J, de la Fuente J, Santiago J, Ouellet M, Hubbard EM, Toomarian EY, Job R, Treccani B, McShane BB. Registered Replication Report on Fischer, Castel, Dodd, and Pratt (2003). Advances in Methods and Practices in Psychological Science 2020. [DOI: 10.1177/2515245920903079]
Abstract
The attentional spatial-numerical association of response codes (Att-SNARC) effect (Fischer, Castel, Dodd, & Pratt, 2003)—the finding that participants are quicker to detect left-side targets when the targets are preceded by small numbers and quicker to detect right-side targets when they are preceded by large numbers—has been used as evidence for embodied number representations and to support strong claims about the link between number and space (e.g., a mental number line). We attempted to replicate Experiment 2 of Fischer et al. by collecting data from 1,105 participants at 17 labs. Across all 1,105 participants and four interstimulus-interval conditions, the proportion of times the effect we observed was positive (i.e., directionally consistent with the original effect) was .50. Further, the effects we observed both within and across labs were minuscule and incompatible with those observed by Fischer et al. Given this, we conclude that we failed to replicate the effect reported by Fischer et al. In addition, our analysis of several participant-level moderators (finger-counting habits, reading and writing direction, handedness, and mathematics fluency and mathematics anxiety) revealed no substantial moderating effects. Our results indicate that the Att-SNARC effect cannot be used as evidence to support strong claims about the link between number and space.
40
Begg CB. In Defense of P Values. JNCI Cancer Spectr 2020; 4:pkaa012. [PMID: 32373778] [PMCID: PMC7191891] [DOI: 10.1093/jncics/pkaa012]
Abstract
Recently, a controversy has erupted regarding the use of statistical significance tests and the associated P values. Prominent academic statisticians have recommended that the use of statistical tests be discouraged or not used at all. This has naturally led to a lot of confusion among research investigators about the support in the academic statistical community for statistical methods in general. In fact, the controversy surrounding the use of P values has a long history. Critics of P values argue that their use encourages bad scientific practice, leading to the publication of far more false-positive and false-negative findings than the methodology would imply. The thesis of this commentary is that the problem is really human nature, the natural proclivity of scientists to believe their own theories and present data in the most favorable light. This is strongly encouraged by a celebrity culture that is fueled by academic institutions, the scientific journals, and the media. The importance of the truth-seeking tradition of the scientific method needs to be reinforced, and this is being helped by current initiatives to improve transparency in science and to encourage reproducible and replicable research. Statistical testing, used correctly, has an important and valuable place in the scientific tradition.
Affiliation(s)
- Colin B Begg
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
41
Harvesting-induced stress in broilers: Comparison of a manual and a mechanical harvesting method under field conditions. Appl Anim Behav Sci 2019. [DOI: 10.1016/j.applanim.2019.104877]
42
Affiliation(s)
- Luigi Pace
- Department of Economics and Statistics University of Udine Udine Italy
- Alessandra Salvan
- Department of Statistical Sciences University of Padova Padova Italy
43
Segal BD. Toward Replicability With Confidence Intervals for the Exceedance Probability. Am Stat 2019. [DOI: 10.1080/00031305.2019.1678521]
44
Barnett AG, Wren JD. Examination of CIs in health and medical journals from 1976 to 2019: an observational study. BMJ Open 2019; 9:e032506. [PMID: 31753893] [PMCID: PMC6887056] [DOI: 10.1136/bmjopen-2019-032506]
Abstract
OBJECTIVES Previous research has shown clear biases in the distribution of published p values, with an excess below the 0.05 threshold due to a combination of p-hacking and publication bias. We aimed to examine the bias for statistical significance using published confidence intervals. DESIGN Observational study. SETTING Papers published in Medline since 1976. PARTICIPANTS Over 968 000 confidence intervals extracted from abstracts and over 350 000 intervals extracted from the full text. OUTCOME MEASURES Cumulative distributions of lower and upper confidence interval limits for ratio estimates. RESULTS We found an excess of statistically significant results, with a glut of lower interval limits just above 1 and upper interval limits just below 1. These excesses have not improved in recent years. The excesses did not appear in a set of over 100 000 confidence intervals that were not subject to p-hacking or publication bias. CONCLUSIONS The huge excesses of published confidence intervals that fall just past the statistically significant threshold are not statistically plausible. Large improvements in research practice are needed to provide more results that better reflect the truth.
Affiliation(s)
- Adrian Gerard Barnett
- Institute of Health and Biomedical Innovation, Queensland University of Technology, Kelvin Grove, Queensland, Australia
| | - Jonathan D Wren
- Arthritis and Clinical Immunology Research Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, Oklahoma, USA
- Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, USA
45
Bendtsen M. Electronic Screening for Alcohol Use and Brief Intervention by Email for University Students: Reanalysis of Findings From a Randomized Controlled Trial Using a Bayesian Framework. J Med Internet Res 2019; 21:e14419. [PMID: 31697242] [PMCID: PMC6873145] [DOI: 10.2196/14419]
Abstract
BACKGROUND Almost a decade ago, Sweden became the first country to implement a national system enabling student health care centers across all universities to routinely administer (via email) an electronic alcohol screening and brief intervention to their students. The Alcohol email assessment and feedback study dismantling effectiveness for university students (AMADEUS-1) trial aimed to assess the effect of the student health care centers' routine practices by exploiting the lack of any standard timing for the email invitation and by masking trial participation from students. The original analyses adopted the conventional null hypothesis framework, and the results were consistently in the expected direction. However, since for some tests the P values did not pass the conventional .05 threshold, some of the analyses were necessarily inconclusive. OBJECTIVE The outcomes of the AMADEUS-1 trial were derived from the first 3 items of the Alcohol Use Disorders Identification Test (AUDIT-C). The aim of this paper was to reanalyze the two primary outcomes of the AMADEUS-1 trial (AUDIT-C scores and prevalence of risky drinking), using the same models used in the original publication but applying a Bayesian inference framework and interpretation. METHODS The same regression models used in the original analysis were employed in this reanalysis (linear and logistic regression). Model parameters were given uniform priors. Markov chain Monte Carlo was used for Bayesian inference, and posterior probabilities were calculated for prespecified thresholds of interest. RESULTS Where the null hypothesis tests showed inconclusive results, the Bayesian analysis showed that offering an intervention at baseline was preferable compared to offering nothing. At follow-up, the probability of a lower AUDIT-C score among those who had been offered an intervention at baseline was greater than 95%, as was the case when comparing the prevalence of risky drinking. 
CONCLUSIONS The Bayesian analysis allows for a more consistent perspective of the data collected in the trial, since dichotomization of evidence is not looked for at some arbitrary threshold. Results are presented that represent the data collected in the trial rather than trying to make conclusions about the existence of a population effect. Thus, policy makers can think about the value of keeping the national system without having to navigate the treacherous landscape of statistical significance. TRIAL REGISTRATION ISRCTN Registry ISRCTN28328154; http://www.isrctn.com/ISRCTN28328154.
Collapse
Affiliation(s)
- Marcus Bendtsen
- Department of Medical and Health Sciences, Linköping University, Linköping, Sweden
| |
Collapse
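The reanalysis pipeline described in the abstract above — regression models with uniform priors, MCMC sampling, and posterior probabilities for prespecified thresholds — can be sketched as follows. This is not the authors' code: the data are simulated stand-ins (not the AMADEUS-1 data), the model is a deliberately simplified two-arm normal comparison, and a basic random-walk Metropolis sampler stands in for whatever MCMC implementation the paper used.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in data (NOT the AMADEUS-1 data): AUDIT-C-like scores
# in a control arm and an arm offered the intervention at baseline.
control = rng.normal(7.0, 3.0, 400)
intervention = rng.normal(6.4, 3.0, 400)

def log_post(mu, delta, sigma):
    # Uniform (flat) priors, as the abstract states, so the log-posterior
    # is just the Gaussian log-likelihood of both arms.
    if sigma <= 0:
        return -np.inf
    def ll(x, m):
        return -len(x) * np.log(sigma) - np.sum((x - m) ** 2) / (2 * sigma ** 2)
    return ll(control, mu) + ll(intervention, mu + delta)

# Random-walk Metropolis: a minimal stand-in for the paper's MCMC.
theta = np.array([7.0, 0.0, 3.0])          # (mu, delta, sigma)
current = log_post(*theta)
draws = []
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.05, 3)
    cand = log_post(*prop)
    if np.log(rng.uniform()) < cand - current:
        theta, current = prop, cand
    draws.append(theta[1])

delta_draws = np.array(draws[5000:])        # discard burn-in

# Posterior probability of a lower score in the intervention arm,
# mirroring the "greater than 95%" statements in the abstract.
p_lower = np.mean(delta_draws < 0)
print(f"P(delta < 0 | data) = {p_lower:.3f}")
```

The key output is a direct probability statement about the effect given the data, rather than a yes/no verdict against an arbitrary significance threshold — which is exactly the contrast the conclusions section draws.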
|
46
|
Abstract
In (educational) psychology, replication studies have so far been extremely rare exceptions. This article sets out that, and why, replication studies are indispensable. It further explores why, despite their enormous added value, almost no replications are published, and why many "findings" of psychological research are not replicable. That these are not mere conjectures is documented by existing investigations. The causes lie at various, partly interdependent, levels of the scientific system: the widespread but mistaken view that "statistical significance" also indicates the probability that a finding can be replicated; the confusion of "statistically significant" with relevant; the bad practice of formulating the tested research hypotheses only after the fact (ex post), i.e., with knowledge of a study's results, while presenting them in the publication as a theoretically derived starting point (i.e., as formulated a priori); the inflation of the alpha error through multiple statistical significance tests; the exclusive reporting of results that support the research hypotheses, combined with the suppression of deviating findings; a lack of construct validity in the measurement instruments used; deceit and fraud in science; and the low regard for replications among journal editors, reviewers, and funding bodies. All of this leads to almost exclusively "statistically significant" and "new" results being published, and to false theories persisting.
Countermeasures named by way of example include: generous financial support for replication projects and their publication; emphatic endorsement by reviewers of publishing methodologically sound replication studies; the willingness of journals to provide sufficient space for them; and recognition of the great scientific value of replication studies, including in academic appointment procedures. It follows that the possibilities and demands outlined here for establishing and promoting replication studies must address several audiences in parallel. Lasting change, however, can be achieved only if the individual actors (researchers, reviewers, journal editors, appointment committees, funding bodies) acknowledge their individual responsibility and act accordingly.
Collapse
Affiliation(s)
- Detlef H. Rost
- Southwest University Chongqing, Faculty of Psychology, Chongqing, P. R. China
- Philipps-Universität Marburg, Fachbereich Psychologie, Marburg, Deutschland
| | - Marc Bienefeld
- Universität Bielefeld, Fakultät für Erziehungswissenschaft, Bielefeld, Deutschland
| |
Collapse
|
47
|
Parsons N, Carey-Smith R, Dritsaki M, Griffin X, Metcalfe D, Perry D, Stengel D, Costa M. Statistical significance and p-values: guidelines for use and reporting. Bone Joint J 2019; 101-B:1179-1183. [PMID: 31564151 DOI: 10.1302/0301-620x.101b10.bjj-2019-0890] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Affiliation(s)
- Nick Parsons
- Statistics and Epidemiology Unit, Warwick Medical School, University of Warwick, Coventry, UK
| | - Richard Carey-Smith
- Sir Charles Gairdner Hospital and The University of Western Australia, Nedlands, Perth, Australia
| | - Melina Dritsaki
- Oxford Clinical Trials Research Unit, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Xavier Griffin
- Oxford Trauma, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - David Metcalfe
- Oxford Trauma, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Daniel Perry
- Oxford Trauma, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Dirk Stengel
- Department of Trauma and Orthopaedic Surgery, Centre for Clinical Research, Unfallkrankenhaus Berlin, Berlin, Germany
| | - Matthew Costa
- Oxford Trauma, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| |
Collapse
|
48
|
Gates S, Ealing E. Reporting and interpretation of results from clinical trials that did not claim a treatment difference: survey of four general medical journals. BMJ Open 2019; 9:e024785. [PMID: 31501094 PMCID: PMC6738699 DOI: 10.1136/bmjopen-2018-024785] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Revised: 06/03/2019] [Accepted: 07/30/2019] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVES To describe and summarise how the results of randomised controlled trials (RCTs) that did not find a significant treatment effect are reported, and to estimate how commonly trial reports make unwarranted claims. DESIGN We performed a retrospective survey of published RCTs, published in four high impact factor general medical journals between June 2016 and June 2017. SETTING Trials conducted in all settings were included. PARTICIPANTS 94 reports of RCTs that did not find a difference in their main comparison or comparisons were included. INTERVENTIONS All interventions. PRIMARY AND SECONDARY OUTCOMES We recorded the way the results of each trial for its primary outcome or outcomes were described in Results and Conclusions sections of the Abstract, using a 10-category classification. Other outcomes were whether confidence intervals (CIs) and p values were presented for the main treatment comparisons, and whether the results and conclusions referred to measures of uncertainty. We estimated the proportion of papers that made claims that were not justified by the results, or were open to multiple interpretations. RESULTS 94 trial reports (120 treatment comparisons) were included. In Results sections, for 58/120 comparisons (48.3%) the results of the study were re-stated, without interpretation, and 38/120 (31.7%) stated that there was no statistically significant difference. In Conclusions, 65/120 treatment comparisons (54.2%) stated that there was no treatment benefit, 14/120 (11.7%) that there was no significant benefit and 16/120 (13.3%) that there was no significant difference. CIs and p values were both presented by 84% of studies (79/94), but only 3/94 studies referred to uncertainty when drawing conclusions. CONCLUSIONS The majority of trials (54.2%) inappropriately interpreted a result that was not statistically significant as indicating no treatment benefit. 
Very few studies interpreted the result as indicating a lack of evidence against the null hypothesis of zero difference between the trial arms.
Collapse
Affiliation(s)
- Simon Gates
- Cancer Research UK Clinical Trials Unit, University of Birmingham, Birmingham, UK
| | - Elizabeth Ealing
- Warwick Clinical Trials Unit, University of Warwick, Coventry, UK
| |
Collapse
|
49
|
Wang H, Snapp SS, Fisher M, Viens F. A Bayesian analysis of longitudinal farm surveys in Central Malawi reveals yield determinants and site-specific management strategies. PLoS One 2019; 14:e0219296. [PMID: 31393872 PMCID: PMC6687183 DOI: 10.1371/journal.pone.0219296] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2018] [Accepted: 06/20/2019] [Indexed: 11/18/2022] Open
Abstract
Understanding the challenges to increasing maize productivity in sub-Saharan Africa, especially agronomic factors that reduce on-farm crop yield, has important implications for policies to reduce national and global food insecurity. Previous research on the maize yield gap has tended to emphasize the size of the gap (theoretical vs. achievable yields), rather than what determines maize yield in specific contexts. As a result, there is insufficient evidence on the key agronomic and environmental factors that influence maize yield in a smallholder farm environment. In this study, we implemented a Bayesian analysis with plot-level longitudinal household survey data covering 1,197 plots and 320 farms in Central Malawi. Households were interviewed and monitored three times per year, in 2015 and 2016, to document farmer management practices and seasonal rainfall, and direct measurements were taken of plant and soil characteristics to quantify impact on plot-level maize yield stability. The results revealed a high positive association between a leaf chlorophyll indicator and maize yield, with significance levels exceeding 95% Bayesian credibility at all sites and a regression coefficient posterior mean from 28% to 42% on a relative scale. A parasitic weed, Striga asiatica, was the variable most consistently negatively associated with maize yield, exceeding 95% credibility in most cases, of high intensity, with regression means ranging from 23% to 38% on a relative scale. The influence of rainfall, either directly or indirectly, varied by site and season. We conclude that the factors preventing Striga infestation and enhancing nitrogen fertility will lead to higher maize yield in Malawi. To improve plant nitrogen status, fertilizer was effective at higher productivity sites, whereas soil carbon and organic inputs were important at marginal sites. 
Uniquely, a Bayesian approach allowed differentiation of response by site for a relatively modest sample size study (given the complexity of farm environments and management practices). Considering the biophysical constraints, our findings highlight management strategies for crop yields, and point towards area-specific recommendations for nitrogen management and crop yield.
Collapse
Affiliation(s)
- Han Wang
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, United States of America
| | - Sieglinde S. Snapp
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, United States of America
| | - Monica Fisher
- International Centre of Insect Physiology and Ecology, Nairobi, Kenya
| | - Frederi Viens
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, United States of America
| |
Collapse
|
50
|
Infanger D, Schmidt‐Trucksäss A. P value functions: An underused method to present research results and to promote quantitative reasoning. Stat Med 2019; 38:4189-4197. [DOI: 10.1002/sim.8293] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2019] [Revised: 06/01/2019] [Accepted: 06/03/2019] [Indexed: 01/30/2023]
Affiliation(s)
- Denis Infanger
- Division of Sports and Exercise Medicine, Department of Sport, Exercise and Health, University of Basel, Basel, Switzerland
| | - Arno Schmidt‐Trucksäss
- Division of Sports and Exercise Medicine, Department of Sport, Exercise and Health, University of Basel, Basel, Switzerland
| |
Collapse
|
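The P value function that Infanger and Schmidt‐Trucksäss advocate can be illustrated with a short sketch. The summary estimate and standard error below are invented for illustration (they come from no study in this list); the construction itself follows the standard definition: a two-sided p-value computed over a grid of point hypotheses, whose horizontal cuts yield compatibility (confidence) intervals at every level.

```python
import math
import numpy as np

# Hypothetical summary result: a mean difference of 1.2 with standard
# error 0.6 (illustrative numbers only).
estimate, se = 1.2, 0.6

def p_value(theta0):
    """Two-sided p-value for the point hypothesis 'true difference = theta0',
    assuming approximate normality of the estimator."""
    z = abs(estimate - theta0) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

# The p-value function: p evaluated over a grid of hypothesised values.
grid = np.linspace(-1.0, 3.5, 181)
pvals = np.array([p_value(t) for t in grid])

# The function peaks at p = 1 at the point estimate, and every horizontal
# cut gives an interval: the theta0 with p >= 0.05 form the familiar
# 95% interval, estimate +/- 1.96 * se.
ci95 = grid[pvals >= 0.05]
print(f"p at estimate: {p_value(estimate):.2f}")
print(f"95% interval (grid approx.): [{ci95.min():.2f}, {ci95.max():.2f}]")
```

Plotting `pvals` against `grid` gives the full curve, which shows at a glance how compatible every candidate effect size is with the data instead of collapsing the result to a single significance verdict.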