1
Rooprai P, Islam N, Salameh JP, Ebrahimzadeh S, Kazi A, Frank R, Ramsay T, Mathur MB, Absi M, Khalil A, Kazi S, Dawit H, Lam E, Fabiano N, McInnes MDF. Is There Evidence of P-Hacking in Imaging Research? Can Assoc Radiol J 2023; 74:497-507. [PMID: 36412994 PMCID: PMC10338063 DOI: 10.1177/08465371221139418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND: P-hacking, the tendency to run selective analyses until they become significant, is prevalent in many scientific disciplines.
PURPOSE: This study aims to assess whether p-hacking exists in imaging research.
METHODS: Protocol, data, and code are available at https://osf.io/xz9ku/?view_only=a9f7c2d841684cb7a3616f567db273fa. We searched imaging journals in Ovid MEDLINE from 1972 to 2021. A Python text-mining script collected metadata: journal, publication year, title, abstract, and P-values from abstracts. One P-value was randomly sampled per abstract. We assessed for evidence of p-hacking using a p-curve, evaluating for a concentration of P-values just below .05. We conducted a one-tailed binomial test (α = .05 level of significance) to assess whether more P-values fell in the upper range (e.g., .045 < P < .05) than in the lower range (e.g., .04 < P < .045). To assess variation in results introduced by our random sampling of a single P-value per abstract, we repeated the random sampling process 1000 times and pooled results across the samples. The analysis was also divided into 10-year periods to determine whether p-hacking practices evolved over time.
RESULTS: Our search of 136 journals identified 967,981 abstracts. Text mining identified 293,687 P-values, and a total of 4105 randomly sampled P-values were included in the p-hacking analysis. Of the totals, 108/136 journals (80%) and 4105/967,981 abstracts (0.4%) were included in the analysis. P-values did not concentrate just under .05; in fact, more P-values fell in the lower range (e.g., .04 < P < .045) than just below .05 (e.g., .045 < P < .05), indicating a lack of evidence for p-hacking. Time trend analysis did not identify p-hacking in any of the five 10-year periods.
CONCLUSION: We did not identify evidence of p-hacking in abstracts published in over 100 imaging journals since 1972. These analyses cannot detect all forms of p-hacking, and other forms of bias may exist in imaging research, such as publication bias and selective outcome reporting.
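
The bin comparison described in the methods can be sketched roughly as follows. This is a minimal illustration, not the authors' script (their code is at the OSF link above): the function names, the toy `abstracts` data, and the choice to summarize the repeated sampling as a share of significant repeats are all assumptions made here.

```python
import random
from math import comb

def one_tailed_binomial_p(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5), computed with exact integers."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def bin_comparison(p_values, lower=(0.04, 0.045), upper=(0.045, 0.05)):
    """Count P-values in each bin and test whether the upper bin is over-represented."""
    n_lower = sum(lower[0] < p < lower[1] for p in p_values)
    n_upper = sum(upper[0] < p < upper[1] for p in p_values)
    n = n_lower + n_upper
    p = one_tailed_binomial_p(n_upper, n) if n else 1.0
    return n_lower, n_upper, p

# Hypothetical data: each inner list holds the P-values mined from one abstract.
abstracts = [[0.041, 0.30], [0.047], [0.043, 0.049, 0.012], [0.044], [0.046, 0.042]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
significant = 0
repeats = 1000
for _ in range(repeats):
    # Sample one P-value per abstract, then compare the two bins.
    sample = [rng.choice(ps) for ps in abstracts]
    _, _, p = bin_comparison(sample)
    significant += p < 0.05
print(f"share of repeats with upper-bin excess at alpha = .05: {significant / repeats:.3f}")
```

A small one-tailed P-value here would mean more values just below .05 than in the neighbouring lower bin, which is the pattern the study looked for and did not find.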
Affiliation(s)
- Paul Rooprai
- Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Nayaar Islam
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
- Jean-Paul Salameh
- Department of Radiology, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Sanam Ebrahimzadeh
- Department of Radiology, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Robert Frank
- Department of Radiology, Faculty of Medicine, Ottawa Hospital, Ottawa, ON, Canada
- Tim Ramsay
- Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, ON, Canada
- Maya B. Mathur
- Quantitative Sciences Unit and Department of Pediatrics, Stanford University, Ottawa, ON, Canada
- Marissa Absi
- Department of Radiology, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Ahmed Khalil
- Department of Radiology, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Sakib Kazi
- Department of Radiology, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Haben Dawit
- Department of Radiology, Faculty of Medicine, Ottawa Hospital, Ottawa, ON, Canada
- Eric Lam
- Department of Radiology, Faculty of Medicine, Ottawa Hospital, Ottawa, ON, Canada
2
Gupta A, Bosco F. Tempest in a teacup: An analysis of p-Hacking in organizational research. PLoS One 2023; 18:e0281938. [PMID: 36827325 PMCID: PMC9955613 DOI: 10.1371/journal.pone.0281938] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Accepted: 02/04/2023] [Indexed: 02/25/2023] Open
Abstract
We extend questionable research practices (QRPs) research by conducting a robust, large-scale analysis of p-hacking in organizational research. We leverage a manually curated database of more than 1,000,000 correlation coefficients and sample sizes, with which we calculate exact p-values. We test for the prevalence and magnitude of p-hacking across the complete database as well as various subsets of the database according to common bivariate relation types in the organizational literature (e.g., attitudes-behaviors). Results from two analytical approaches (i.e., z-curve, critical bin comparisons) were consistent in both direction and significance in nine of 18 datasets. Critical bin comparisons indicated p-hacking in 12 of 18 subsets, three of which reached statistical significance. Z-curve analyses indicated p-hacking in 11 of 18 subsets, two of which reached statistical significance. Generally, results indicated that p-hacking is detectable but small in magnitude. We also tested for three predictors of p-hacking: Publication year, journal prestige, and authorship team size. Across two analytic approaches, we observed a relatively consistent positive relation between p-hacking and journal prestige, and no relationship between p-hacking and authorship team size. Results were mixed regarding the temporal trends (i.e., evidence for p-hacking over time). In sum, the present study of p-hacking in organizational research indicates that the prevalence of p-hacking is smaller and less concerning than earlier research has suggested.
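
The abstract does not spell out how the exact p-values were obtained from each correlation coefficient and sample size. One standard route, given here as an assumption rather than as the authors' stated method, is the t-transformation of a Pearson correlation $r$ observed in a sample of size $n$:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad
p = 2\,\Pr\!\left(T_{n-2} \ge |t|\right)
```

Here $T_{n-2}$ denotes a Student's t variable with $n-2$ degrees of freedom. As an illustrative case (values chosen for this note, not taken from the study), $r = 0.20$ with $n = 100$ gives $t \approx 2.02$, just past the two-sided .05 critical value of about 1.98 for 98 degrees of freedom.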
Affiliation(s)
- Alisha Gupta
- Department of Management and Entrepreneurship, School of Business, Virginia Commonwealth University, Richmond, VA, United States of America
- Frank Bosco
- Department of Management and Entrepreneurship, School of Business, Virginia Commonwealth University, Richmond, VA, United States of America
3
Picking apart p values: common problems and points of confusion. Knee Surg Sports Traumatol Arthrosc 2022; 30:3245-3248. [PMID: 35920843 DOI: 10.1007/s00167-022-07083-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/07/2022] [Accepted: 07/20/2022] [Indexed: 10/16/2022]
Abstract
Due to its frequent misuse, the p value has become a point of contention in the research community. In this editorial, we seek to clarify some of the common misconceptions about p values and the hazardous implications associated with misunderstanding this commonly used statistical concept. This article will discuss issues related to p value interpretation in addition to problems such as p-hacking and statistical fragility; we will also offer some thoughts on addressing these issues. The aim of this editorial is to provide clarity around the concept of statistical significance for those attempting to increase their statistical literacy in Orthopedic research.
4
Reddy AK, Scott JT, Joshua Stephens B, Patel A, Checketts JX, Stotler WM, Hawkins BJ, Vassar M. Evaluation of Proposed Protocol Changing Statistical Significance From 0.05 to 0.005 in Foot and Ankle Randomized Controlled Trials. J Foot Ankle Surg 2022; 61:925-926. [PMID: 35367112 DOI: 10.1053/j.jfas.2022.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 03/17/2022] [Indexed: 02/03/2023]
Affiliation(s)
- Arjun K Reddy
- Department of Orthopedic Surgery, Oklahoma State University Medical Center, Tulsa, OK; Office of Medical Student Research, Oklahoma State University Center for Health Sciences, Tulsa, OK.
- Jared T Scott
- Department of Orthopedic Surgery, Oklahoma State University Medical Center, Tulsa, OK
- B Joshua Stephens
- Department of Orthopedic Surgery, Oklahoma State University Medical Center, Tulsa, OK
- Ashini Patel
- Department of Orthopedic Surgery, Oklahoma State University Medical Center, Tulsa, OK
- Jake X Checketts
- Department of Orthopedic Surgery, Oklahoma State University Medical Center, Tulsa, OK
- Wesley M Stotler
- Department of Orthopedic Surgery, Oklahoma State University Medical Center, Tulsa, OK
- Bryan J Hawkins
- Department of Orthopedic Surgery, Oklahoma State University Medical Center, Tulsa, OK
- Matt Vassar
- Office of Medical Student Research, Oklahoma State University Center for Health Sciences, Tulsa, OK; Department of Psychiatry and Behavioral Sciences, Oklahoma State University Center for Health Sciences, Tulsa, OK
5
Sankaran A. Significance Chasing in Hand Surgery Literature. J Hand Surg Asian Pac Vol 2022; 27:661-664. [DOI: 10.1142/s2424835522500643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Introduction: Significance chasing occurs when data are manipulated to achieve statistical significance. Evidence for this practice is now well known across scientific disciplines. This study aimed to identify whether the phenomenon exists in Hand Surgery literature.
Methods: All p values contained in the articles published in three prominent Hand Surgery journals were analysed. The preponderance of values just under 0.05 was then studied by statistical methods.
Results: 3,124 p values were recorded, with 1,320 values <0.05. A statistically significant preponderance of values between 0.04 and 0.05 was noted (Binomial test, p = 0.0441). The 0.05 point was also found to have the greatest deviation from a best fit exponential curve.
Conclusions: Significance chasing possibly exists in Hand Surgery literature as well.
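
The "best fit exponential curve" check reported in the results can be illustrated with a short sketch. The abstract does not say how the curve was fitted or how deviation was measured, so the log-scale least-squares fit, the raw-count residuals, the function names, and the bin counts below are all assumptions made for illustration.

```python
from math import exp, log

def fit_exponential(xs, ys):
    """Least-squares fit of y = a * exp(b * x) on the log scale (needs every y > 0)."""
    n = len(xs)
    logs = [log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxy / sxx
    a = exp(mean_l - b * mean_x)
    return a, b

def largest_deviation(xs, ys):
    """Return the x whose observed count is furthest from the fitted curve, with its residual."""
    a, b = fit_exponential(xs, ys)
    residuals = [y - a * exp(b * x) for x, y in zip(xs, ys)]
    idx = max(range(len(xs)), key=lambda i: abs(residuals[i]))
    return xs[idx], residuals[idx]

# Hypothetical counts of p values per 0.01-wide bin, labelled by the bin's upper edge.
bin_edges = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
counts = [400, 300, 230, 180, 210, 100, 80, 60, 45, 35]

edge, residual = largest_deviation(bin_edges, counts)
print(f"largest deviation at bin ending {edge}: residual {residual:+.1f}")
```

A positive residual at the bin ending at 0.05 would correspond to the excess of values just under the threshold that the study describes.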
Affiliation(s)
- Ajeesh SANKARAN
- Department of Hand and Reconstructive Microsurgery, KIMS Al Shifa Hospital, Perinthalmanna, Kerala, India
6
Williams BR, Freking WG, Ridley TJ, Agel J, Swiontkowski MF. The Proportion of Abstracts Presented at the 2010 American Academy of Orthopaedic Surgeons Annual Meeting Ultimately Published. Orthopedics 2020; 43:e263-e269. [PMID: 32324249 DOI: 10.3928/01477447-20200415-02] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/21/2019] [Accepted: 04/15/2019] [Indexed: 02/03/2023]
Abstract
As attendees of orthopedic meetings consider how to integrate presented information into their practice, it is helpful to consider the quality of the data presented. One surrogate metric is the proportion of and changes to presented abstracts that become journal publications. With this study, using the 2010 American Academy of Orthopaedic Surgeons (AAOS) Annual Meeting abstracts, the authors sought to answer the following questions: Did the publications following abstract presentations differ in terms of the conclusions, study subjects, or coauthors? What proportion of abstracts was published? What are the most common subtopics and journals, and what is the most common author country? Keywords and authors from the 2010 AAOS Annual Meeting proceedings program (698 podium and 548 poster abstracts) were searched in PubMed, Embase, and Google Scholar. If a publication resulted, differences in the conclusion, number of study subjects, and authorship between the abstract and the journal publication were tabulated. The proportion of abstracts published, specialty subtopics, authorship country, and journals of publication were collected. At journal publication, 1.7% of podium and 1.7% of poster conclusions changed. Mean number of authors for podium and poster increased significantly (P<.001), and 30% of podium and 44% of poster had a change in the number of study subjects. The overall journal publication percentage was 61% (68% podium and 53% poster). The majority of the authors were from the United States. The most common journal was The Journal of Bone & Joint Surgery. It is important to evaluate the usefulness and clinical applicability of meetings, especially the final disposition of conference abstracts, from various angles to ensure that they are as worthwhile and educational as possible. [Orthopedics. 2020;xx(x):xx-xx.].
7
Reito A, Raittio L, Helminen O. Revisiting the Sample Size and Statistical Power of Randomized Controlled Trials in Orthopaedics After 2 Decades. JBJS Rev 2020; 8:e0079. [DOI: 10.2106/jbjs.rvw.19.00079] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022]
8
Reito A, Raittio L, Helminen O. Fragility Index, power, strength and robustness of findings in sports medicine and arthroscopic surgery: a secondary analysis of data from a study on use of the Fragility Index in sports surgery. PeerJ 2019; 7:e6813. [PMID: 31179168 PMCID: PMC6536113 DOI: 10.7717/peerj.6813] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/07/2018] [Accepted: 03/19/2019] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND: A recent study concluded that most findings reported as significant in sports medicine and arthroscopic surgery are not "robust" when evaluated with the Fragility Index (FI). A secondary analysis of data from a previous study was performed to investigate (1) the correctness of the findings, (2) the association between FI, p-value and post hoc power, (3) median power to detect a medium effect size, and (4) the implementation of sample size analysis in these randomized controlled trials (RCTs).
METHODS: In addition to the 48 studies listed in the appendix accompanying the original study by Khan et al. (2017), we ran a follow-up literature search that found 18 additional studies. In total, 66 studies were included in the analysis. We calculated post hoc power, p-values and confidence intervals associated with the main outcome variable. Use of a priori power analysis was recorded. For each included study, we calculated the median power to detect a small (h > 0.2), medium (h > 0.5), or large (h > 0.8) effect with a baseline proportion of events of 10% and 30%. Three simulation data sets were used to validate our findings.
RESULTS: Inconsistencies were found in eight studies. A priori power analysis was missing in one-fourth of studies (16/66). The median power to detect a medium effect size with a baseline proportion of events of 10% and 30% was 42% and 43%, respectively. The FI was inherently associated with the achieved p-value and post hoc power.
DISCUSSION: A relatively high proportion of studies had inconsistencies. The FI is a surrogate measure for p-value and post hoc power. Based on these studies, the median power in this field of research is suboptimal. There is an urgent need to investigate how well research claims in orthopedics hold in a replicated setting and the validity of research findings.
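
For readers unfamiliar with the effect size h used in the methods, the sketch below shows Cohen's h for two proportions and a normal-approximation power calculation. It is an assumption-laden illustration: the abstract does not state which power formula the authors used, and the function names and the sample size of 30 per group are hypothetical.

```python
from math import asin, sqrt
from statistics import NormalDist

def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

def approx_power(h, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided test with equal group sizes."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(h) * sqrt(n_per_group / 2)
    # Probability of rejecting in either tail when the true effect is h.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical trial: 30 participants per group, medium effect (h = 0.5).
print(f"h for 10% vs 30% events: {cohens_h(0.30, 0.10):.2f}")
print(f"approximate power at n = 30 per group: {approx_power(0.5, 30):.2f}")
```

Under this approximation, power depends only on h, the group size, and alpha, so it is a rough guide rather than a reproduction of the baseline-specific medians reported above.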
Affiliation(s)
- Aleksi Reito
- Department of Surgery, Central Finland Hospital, Jyväskylä, Keski-Suomi, Finland
- Coxa Hospital for Joint Replacement Ltd, Tampere, Pirkanmaa, Finland
- Lauri Raittio
- Medical School, University of Tampere, Tampere, Finland
- Olli Helminen
- Department of Surgery, Central Finland Hospital, Jyväskylä, Keski-Suomi, Finland
- Cancer and Translational Medicine Research Unit, Oulu University Hospital, Oulu, Pohjois-Pohjanmaa, Finland