1
Fitzpatrick BG, Gorman DM, Trombatore C. Impact of redefining statistical significance on P-hacking and false positive rates: An agent-based model. PLoS One 2024; 19:e0303262. PMID: 38753677; PMCID: PMC11098386; DOI: 10.1371/journal.pone.0303262.
Abstract
In recent years, concern has grown about the inappropriate application and interpretation of P values, especially the use of P<0.05 to denote "statistical significance", the practice of P-hacking to produce results below this threshold, and the selective reporting of such results in publications. This behavior is said to be a major contributor to the large number of false and non-reproducible discoveries in academic journals. In response, it has been proposed that the threshold for statistical significance be changed from 0.05 to 0.005. The aim of the current study was to use an evolutionary agent-based model, comprising researchers who test hypotheses and strive to increase their publication rates, to explore the impact of a 0.005 P value threshold on P-hacking and published false positive rates. Three scenarios were examined: one in which researchers tested a single hypothesis, one in which they tested multiple hypotheses using a P<0.05 threshold, and one in which they tested multiple hypotheses using a P<0.005 threshold. Effect sizes were varied across models, and output was assessed in terms of researcher effort, number of hypotheses tested, number of publications, and the published false positive rate. The results supported the view that a more stringent P value threshold can reduce the rate of published false positive results. Researchers still engaged in P-hacking under the new threshold, but the effort they expended increased substantially and their overall productivity was reduced, resulting in a decline in the published false positive rate. Compared with other proposed interventions to improve the academic publishing system, changing the P value threshold has the advantage of being relatively easy to implement, and it could be monitored and enforced with minimal effort by journal editors and peer reviewers.
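The dynamic described here is easy to illustrate in miniature. The sketch below is not the authors' model; it is a minimal simulation, under invented parameters (prob_true_effect, effect_z and effort_budget are all assumptions), in which each researcher keeps testing hypotheses until one clears the significance threshold or an effort budget runs out. Because the test statistic is simulated directly, statistical power falls automatically as the threshold tightens.

```python
# Minimal sketch of P-hacking under two significance thresholds.
# All parameters are illustrative assumptions, not the paper's values.
import math
import random

def one_sided_p(z):
    """One-sided p-value for a standard-normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def run_researchers(alpha, n_researchers=10_000, effort_budget=20,
                    prob_true_effect=0.1, effect_z=2.5, seed=1):
    """Each researcher tests hypotheses until one is significant at
    `alpha` or the effort budget is spent. Returns the published
    false positive rate, mean tests per publication, and publications."""
    rng = random.Random(seed)
    false_pos = true_pos = total_tests = 0
    for _ in range(n_researchers):
        for test in range(1, effort_budget + 1):
            is_real = rng.random() < prob_true_effect
            z = rng.gauss(effect_z if is_real else 0.0, 1.0)
            if one_sided_p(z) < alpha:
                total_tests += test
                if is_real:
                    true_pos += 1
                else:
                    false_pos += 1
                break
    pubs = false_pos + true_pos
    return false_pos / pubs, total_tests / pubs, pubs

for alpha in (0.05, 0.005):
    fpr, effort, pubs = run_researchers(alpha)
    print(f"alpha={alpha}: published FPR={fpr:.2f}, "
          f"mean tests per publication={effort:.1f}, publications={pubs}")
```

Run as-is, the stricter threshold yields fewer publications, more tests per publication, and a lower published false positive rate, which is the qualitative pattern the abstract reports.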
Affiliation(s)
- Ben G. Fitzpatrick: Department of Mathematics, Loyola Marymount University, Los Angeles, California, United States of America; Tempest Technologies, Los Angeles, California, United States of America
- Dennis M. Gorman: Department of Epidemiology & Biostatistics, School of Public Health, Texas A&M University, College Station, Texas, United States of America
- Caitlin Trombatore: Department of Mathematics, Loyola Marymount University, Los Angeles, California, United States of America
2
Kohrt F, Smaldino PE, McElreath R, Schönbrodt F. Replication of the natural selection of bad science. R Soc Open Sci 2023; 10:221306. PMID: 36844805; PMCID: PMC9943874; DOI: 10.1098/rsos.221306.
Abstract
This study reports an independent replication of the findings presented by Smaldino and McElreath (Smaldino, McElreath 2016 R. Soc. Open Sci. 3, 160384; doi:10.1098/rsos.160384). The replication was successful, with one exception: we find that selection acting on scientists' replication frequency caused a brief period of exuberant replication that was not observed in the original paper because of a coding error there. This difference does not, however, change the authors' original conclusions. We call for more replication studies of simulations as unique contributions to scientific quality assurance.
Affiliation(s)
- Florian Kohrt: Department of Psychology, Ludwig-Maximilians-Universität München, Munich, Germany
- Paul E. Smaldino: Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA; Santa Fe Institute, Santa Fe, NM 87501, USA
- Richard McElreath: Department of Human Behavior, Ecology, and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Felix Schönbrodt: Department of Psychology, Ludwig-Maximilians-Universität München, Munich, Germany
3
Kurreck C, Castaños-Vélez E, Freyer D, Blumenau S, Przesdzing I, Bernard R, Dirnagl U. Improving quality of preclinical academic research through auditing: A feasibility study. PLoS One 2020; 15:e0240719. PMID: 33057427; PMCID: PMC7561085; DOI: 10.1371/journal.pone.0240719.
Abstract
How much can we rely on what was reported in a study actually having been done? Systematic and independent examination of records, documents and processes through audits is a central element of quality management systems. In the context of current concerns about the robustness and reproducibility of experimental biomedical research, audits have been suggested as a remedy a number of times. However, audits are resource-intensive and time-consuming, and due to their very nature may be perceived as an inquisition. Consequently, there is very little experience with, or literature on, auditing and assessment in the complex preclinical biomedical research environment. To gain insight into which audit approaches might best suit academic biomedical research, in this study we applied a number of them in a typical academic neuroscience environment consisting of twelve research groups with about 100 researchers, students and technicians utilizing the full gamut of state-of-the-art methodology. Several types of assessment, as well as internal and external audits (including the novel format of a peer audit), were systematically explored by a team of quality management specialists. An experimental design template was developed (and is provided here) that takes into account, and mitigates, difficulties, risks and systematic errors that may occur during the course of a study. All audits were performed according to a pre-defined workflow that we developed, and outcomes were assessed qualitatively. We asked participating employees for feedback in the final discussion of every audit and documented it in the audit reports; follow-up audits were improved on the basis of these reports. We conclude that several realistic options for auditing exist that have the potential to improve preclinical biomedical research in academia. We list specific recommendations regarding their benefits and provide practical resources for their implementation (e.g. study design and audit templates, and the audit workflow).
Affiliation(s)
- Claudia Kurreck: Department of Experimental Neurology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Dorette Freyer: Department of Experimental Neurology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sonja Blumenau: Department of Experimental Neurology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Ingo Przesdzing: Department of Experimental Neurology, Charité - Universitätsmedizin Berlin, Berlin, Germany; QUEST Center for Transforming Biomedical Research, Berlin Institute of Health, Berlin, Germany
- Rene Bernard: Department of Experimental Neurology, Charité - Universitätsmedizin Berlin, Berlin, Germany; NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, Berlin, Germany; QUEST Center for Transforming Biomedical Research, Berlin Institute of Health, Berlin, Germany
- Ulrich Dirnagl: Department of Experimental Neurology, Charité - Universitätsmedizin Berlin, Berlin, Germany; NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, Berlin, Germany; QUEST Center for Transforming Biomedical Research, Berlin Institute of Health, Berlin, Germany
4
Barnett AG, Moher D. Turning the tables: A university league-table based on quality not quantity. F1000Res 2019; 8:583.
Abstract
Background: Universities closely watch international league tables because these tables influence governments, donors and students. Achieving a high ranking, or an annual rise in ranking, allows a university to promote its achievements using an externally validated measure. However, league tables predominantly reward measures of research output, such as publications and citations, and may therefore promote poor research practices by encouraging a “publish or perish” mentality. Methods: We examined whether a league table could be created based on good research practice. We rewarded researchers who cited a reporting guideline; such guidelines help researchers report their research completely, accurately and transparently, and were created to reduce the waste of poorly described research. We used the EQUATOR guidelines, which means our tables are mostly relevant to health and medical research, and we used Scopus to identify the citations. Results: Our cross-sectional tables for the years 2016 and 2017 included 14,408 papers with 47,876 author affiliations. We ranked universities and included a bootstrap measure of uncertainty in each rank, and we clustered universities into five similar groups to avoid over-interpreting small differences in ranks. Conclusions: We believe there is merit in considering more socially responsible criteria for ranking universities, which could encourage better research practice internationally if such tables become as valued as the current quantity-focused ones.
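A toy version of the bootstrap ranking is sketched below. It is not the authors' pipeline, and the universities and paper counts are invented (the real study used Scopus records of papers citing EQUATOR reporting guidelines): papers are resampled with replacement, ranks are recomputed on each resample, and the spread of ranks shows which differences are too small to interpret.

```python
# Toy sketch of bootstrap uncertainty for a quality-based league table.
# Universities and counts are invented for illustration.
import random
from collections import Counter

# Each "paper" is tagged with the university of an author affiliation.
papers = ["Uni A"] * 120 + ["Uni B"] * 110 + ["Uni C"] * 60 + ["Uni D"] * 55

def ranks(sample):
    """Rank universities by their count of guideline-citing papers."""
    counts = Counter(sample)
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {uni: r + 1 for r, uni in enumerate(ordered)}

rng = random.Random(0)
n_boot = 2000
rank_draws = {uni: [] for uni in set(papers)}
for _ in range(n_boot):
    resample = rng.choices(papers, k=len(papers))  # resample with replacement
    r = ranks(resample)
    for uni in rank_draws:
        rank_draws[uni].append(r.get(uni, len(rank_draws)))  # unseen -> last place

for uni in sorted(rank_draws):
    draws = sorted(rank_draws[uni])
    lo, hi = draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot)]
    print(f"{uni}: median rank {draws[n_boot // 2]}, 95% interval [{lo}, {hi}]")
```

Overlapping intervals (here, Uni A with Uni B, and Uni C with Uni D) are the motivation for grouping universities into clusters rather than reading meaning into single-rank differences.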
Affiliation(s)
- Adrian G Barnett: School of Public Health and Social Work & Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, QLD, 4059, Australia
- David Moher: Centre for Journalology, Ottawa Hospital Research Institute, Ottawa, ON K1H 8L6, Canada
5
Smaldino PE, Turner MA, Contreras Kallens PA. Correction to 'Open science and modified funding lotteries can impede the natural selection of bad science'. R Soc Open Sci 2019; 6:191249. PMID: 31543978; PMCID: PMC6731693; DOI: 10.1098/rsos.191249.
Abstract
This corrects the article DOI: 10.1098/rsos.190194.
6
Smaldino PE, Turner MA, Contreras Kallens PA. Open science and modified funding lotteries can impede the natural selection of bad science. R Soc Open Sci 2019; 6:190194. PMID: 31417725; PMCID: PMC6689639; DOI: 10.1098/rsos.190194.
Abstract
Assessing scientists using exploitable metrics can lead to the degradation of research methods even without any strategic behaviour on the part of individuals, via 'the natural selection of bad science.' Institutional incentives to maximize metrics like publication quantity and impact drive this dynamic. Removing these incentives is necessary, but institutional change is slow. However, recent developments suggest possible solutions with more rapid onsets. These include what we call open science improvements, which can reduce publication bias and improve the efficacy of peer review. In addition, there have been increasing calls for funders to move away from prestige- or innovation-based approaches in favour of lotteries. Using computational modelling, we investigated whether such changes are likely to improve the reproducibility of science even in the presence of persistent incentives for publication quantity. We found that modified lotteries, which allocate funding randomly among proposals that pass a threshold for methodological rigour, effectively reduce the rate of false discoveries, particularly when paired with open science improvements that increase the publication of negative results and improve the quality of peer review. In the absence of funding that targets rigour, open science improvements can still reduce false discoveries in the published literature but are less likely to improve the overall culture of research practices that underlie those publications.
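The selection dynamic at the heart of the model can be compressed into a small sketch. This is not the authors' model, and every parameter below (the effort-quantity trade-off, mutation size and funding fraction) is an invented assumption: labs vary in methodological effort, funded labs are copied with small mutations into the next generation, and the funding rule therefore selects on effort. A publication-count ("prestige") rule rewards low effort, while a lottery restricted to labs above a rigour threshold does not.

```python
# Compressed sketch of funding as selection on methodological effort.
# Not the authors' model; all parameters are illustrative assumptions.
import random

rng = random.Random(42)
N_LABS, GENERATIONS, BASE_RATE = 100, 200, 0.1  # BASE_RATE: share of true hypotheses

def do_science(effort):
    """One round of research. Higher effort -> fewer claims, but a lower
    chance that a published positive is false. Returns (positives, false positives)."""
    n_claims = max(1, int(10 * (1.2 - effort)))   # effort trades off quantity
    pos = fp = 0
    for _ in range(n_claims):
        true_effect = rng.random() < BASE_RATE
        alpha = 0.05 * (1 - 0.9 * effort)         # rigour shrinks false-positive risk
        power = 0.5 + 0.4 * effort
        if true_effect and rng.random() < power:
            pos += 1
        elif not true_effect and rng.random() < alpha:
            pos += 1
            fp += 1
    return pos, fp

def evolve(fund_rule, rigour_threshold=0.5):
    efforts = [rng.random() for _ in range(N_LABS)]
    total_pos = total_fp = 0
    for _ in range(GENERATIONS):
        results = [do_science(e) for e in efforts]
        total_pos += sum(p for p, _ in results)
        total_fp += sum(f for _, f in results)
        if fund_rule == "lottery":
            eligible = [i for i, e in enumerate(efforts)
                        if e >= rigour_threshold] or list(range(N_LABS))
            funded = rng.sample(eligible, k=min(N_LABS // 2, len(eligible)))
        else:  # "prestige": the labs with the most publications win
            funded = sorted(range(N_LABS), key=lambda i: results[i][0],
                            reverse=True)[:N_LABS // 2]
        # Funded labs seed the next generation, with small mutations in effort.
        efforts = [min(1.0, max(0.0, efforts[rng.choice(funded)] + rng.gauss(0, 0.02)))
                   for _ in range(N_LABS)]
    return total_fp / total_pos

for rule in ("prestige", "lottery"):
    print(f"{rule}: published false discovery rate = {evolve(rule):.2f}")
```

Under the prestige rule the population drifts toward low effort and the published false discovery rate climbs; under the thresholded lottery it stays low, mirroring the abstract's central result.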
Affiliation(s)
- Paul E. Smaldino: Department of Cognitive and Information Sciences, University of California, Merced, CA, USA
- Matthew A. Turner: Department of Cognitive and Information Sciences, University of California, Merced, CA, USA
7
Wass MN, Ray L, Michaelis M. Understanding of researcher behavior is required to improve data reliability. Gigascience 2019; 8:giz017. PMID: 30715291; PMCID: PMC6528747; DOI: 10.1093/gigascience/giz017.
Abstract
BACKGROUND: A lack of data reproducibility ("reproducibility crisis") has been extensively debated across many academic disciplines. RESULTS: Although a reproducibility crisis is widely perceived, conclusive data on the scale of the problem and the underlying reasons are largely lacking. The debate is primarily focused on methodological issues. However, examples such as the use of misidentified cell lines illustrate that the availability of reliable methods does not guarantee good practice. Moreover, research is often characterized by a lack of established methods. Despite the crucial importance of researcher conduct, research and conclusive data on the determinants of researcher behavior are largely missing. CONCLUSION: Meta-research that establishes an understanding of the factors that determine researcher behavior is urgently needed. This knowledge can then be used to implement and iteratively improve measures that incentivize researchers to apply the highest standards, resulting in high-quality data.
Affiliation(s)
- Mark N Wass: Industrial Biotechnology Centre and School of Biosciences, University of Kent, Canterbury, CT2 7NJ, UK
- Larry Ray: School of Social Policy, Sociology and Social Research, University of Kent, Canterbury, CT2 7NJ, UK
- Martin Michaelis: Industrial Biotechnology Centre and School of Biosciences, University of Kent, Canterbury, CT2 7NJ, UK