1. Kulke L. Coregistration of EEG and eye-tracking in infants and developing populations. Atten Percept Psychophys 2024. PMID: 38388851; DOI: 10.3758/s13414-024-02857-y.
Abstract
Infants cannot be instructed where to look; infant researchers therefore rely on observations of their participants' gaze to make inferences about cognitive processes, and they began studying infant attention in the real world early on. Developmental researchers were early adopters of methods combining observations of gaze and behaviour with electroencephalography (EEG) to study attention and other cognitive functions. However, the direct combination of eye-tracking methods and EEG to test infants is still rare, as it poses specific challenges. The current article reviews the development of co-registration research in infancy, points out the specific challenges of co-registration in infant research, and suggests ways to overcome them. It ends with recommendations for implementing the co-registration of EEG and eye-tracking in infant research so as to maximise the benefits of the two measures and their combination while following Open Science principles. In summary, this work shows that the co-registration of EEG and eye-tracking in infant research, despite its challenges, can be beneficial for studying natural and real-world behaviour.
Affiliation(s)
- Louisa Kulke: Department of Developmental Psychology with Educational Psychology, University of Bremen, Hochschulring 18, 28359 Bremen, Germany

2. Schneck A. Are most published research findings false? Trends in statistical power, publication selection bias, and the false discovery rate in psychology (1975-2017). PLoS One 2023; 18:e0292717. PMID: 37847689; PMCID: PMC10581498; DOI: 10.1371/journal.pone.0292717.
Abstract
The validity of scientific findings may be challenged by the replicability crisis (or by cases of fraud), which may result not only in a loss of trust within society but also in wrong or even harmful policy or medical decisions. The question is: how reliable are scientific results that are reported as statistically significant, and how has this reliability developed over time? Based on 35,515 papers in psychology published between 1975 and 2017, containing 487,996 test values, this article empirically examines statistical power, publication bias and p-hacking, and the false discovery rate. Assuming constant true effects, statistical power was found to be lower than the suggested 80% except for large underlying true effects (d = 0.8), and it increased only slightly over time. Publication bias and p-hacking were also found to be substantial. The share of false discoveries among all significant results was estimated at 17.7%, assuming a proportion θ = 50% of all hypotheses being true and assuming that p-hacking is the only mechanism generating a higher proportion of just-significant results compared to just-nonsignificant results. Because the analyses rely on multiple assumptions that cannot be tested, alternative scenarios were also laid out; these again yielded the rather optimistic result that, although research results may suffer from low statistical power and publication selection bias, most results reported as statistically significant may reflect substantive effects rather than statistical artifacts.
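
The arithmetic behind the 17.7% estimate follows directly from Bayes' rule. A minimal Python sketch, assuming a proportion θ of true hypotheses, a significance level α, and an average power value chosen here purely for illustration (it is not a figure taken from the paper):

```python
def false_discovery_rate(alpha, power, theta):
    """Expected share of false positives among all significant results,
    assuming a proportion `theta` of tested hypotheses is true."""
    false_pos = alpha * (1 - theta)  # significant results from true nulls
    true_pos = power * theta         # significant results from true effects
    return false_pos / (false_pos + true_pos)

# theta = 0.5 mirrors the paper's baseline assumption; power = 0.23 is an
# assumed average chosen only so the output lands near the paper's estimate.
print(round(false_discovery_rate(alpha=0.05, power=0.23, theta=0.5), 3))  # 0.179
```
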
Affiliation(s)
- Andreas Schneck: Department of Sociology, Ludwig-Maximilians-University, Munich, Germany

3. Kimmel K, Avolio ML, Ferraro PJ. Empirical evidence of widespread exaggeration bias and selective reporting in ecology. Nat Ecol Evol 2023; 7:1525-1536. PMID: 37537387; DOI: 10.1038/s41559-023-02144-3.
Abstract
In many scientific disciplines, common research practices have led to unreliable and exaggerated evidence about scientific phenomena. Here we describe some of these practices and quantify their pervasiveness in recent ecology publications in five popular journals. In an analysis of over 350 studies published between 2018 and 2020, we detect empirical evidence of exaggeration bias and selective reporting of statistically significant results. This evidence implies that the published effect sizes in ecology journals exaggerate the importance of the ecological relationships that they aim to quantify. An exaggerated evidence base hinders the ability of empirical ecology to reliably contribute to science, policy, and management. To increase the credibility of ecology research, we describe a set of actions that ecologists should take, including changes to scientific norms about what high-quality ecology looks like and expectations about what high-quality studies can deliver.
Affiliation(s)
- Kaitlin Kimmel: Mad Agriculture, Boulder, CO, USA; Department of Earth and Planetary Sciences, Johns Hopkins University, Baltimore, MD, USA
- Meghan L Avolio: Department of Earth and Planetary Sciences, Johns Hopkins University, Baltimore, MD, USA
- Paul J Ferraro: Carey Business School, Johns Hopkins University, Baltimore, MD, USA; Department of Environmental Health and Engineering, a joint department of the Bloomberg School of Public Health and the Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA

4. Keener SK, Kepes S, Torka AK. The trustworthiness of the cumulative knowledge in industrial/organizational psychology: The current state of affairs and a path forward. Acta Psychol (Amst) 2023; 239:104005. PMID: 37625919; DOI: 10.1016/j.actpsy.2023.104005.
Abstract
The goal of industrial/organizational (IO) psychology is to build and organize trustworthy knowledge about people-related phenomena in the workplace. Unfortunately, as with other scientific disciplines, our discipline may be experiencing a "crisis of confidence" stemming from the lack of reproducibility and replicability of many of our field's research findings, which would suggest that much of our research may be untrustworthy. If a scientific discipline's research is deemed untrustworthy, it can have dire consequences, including the withdrawal of funding for future research. In this focal article, we review the current state of reproducibility and replicability in IO psychology and related fields. As part of this review, we discuss factors that make it less likely that research findings will be trustworthy, including the prevalence of scientific misconduct, questionable research practices (QRPs), and errors. We then identify some root causes of these issues and provide several potential remedies. In particular, we highlight the need for improved research methods and statistics training as well as a re-alignment of the incentive structure in academia. To accomplish this, we advocate for changes in the reward structure, improvements to the peer review process, and the implementation of open science practices. Overall, addressing the current "crisis of confidence" in IO psychology requires individual researchers, academic institutions, and publishers to embrace system-wide change.
Affiliation(s)
- Sheila K Keener: Department of Management, Old Dominion University, Norfolk, VA, USA
- Sven Kepes: Department of Management and Entrepreneurship, Virginia Commonwealth University, Richmond, VA, USA
- Ann-Kathrin Torka: Department of Social, Work, and Organizational Psychology, TU Dortmund University, Dortmund, Germany

5. van den Akker OR, Wicherts JM, Alvarez LD, Bakker M, van Assen MALM. How do psychology researchers interpret the results of multiple replication studies? Psychon Bull Rev 2023; 30:1609-1620. PMID: 36635588; PMCID: PMC10482796; DOI: 10.3758/s13423-022-02235-5.
Abstract
Employing two vignette studies, we examined how psychology researchers interpret the results of a set of four experiments that all test a given theory. In both studies, we found that participants' belief in the theory increased with the number of statistically significant results, and that the result of a direct replication had a stronger effect on belief in the theory than the result of a conceptual replication. In Study 2, we additionally found that participants' belief in the theory was lower when they assumed the presence of p-hacking, but that belief in the theory did not differ between preregistered and non-preregistered replication studies. In analyses of individual participant data from both studies, we examined the heuristics academics use to interpret the results of four experiments. Only a small proportion (Study 1: 1.6%; Study 2: 2.2%) of participants used the normative method of Bayesian inference, whereas many of the participants' responses were in line with generally dismissed and problematic vote-counting approaches. Our studies demonstrate that many psychology researchers overestimate the evidence in favor of a theory if one or more results from a set of replication studies are statistically significant, highlighting the need for better statistical education.
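
The normative Bayesian benchmark the authors refer to can be illustrated for a set of four studies: if each study is significant with probability equal to its power when the theory is true and α when it is false, posterior belief follows from the pattern of outcomes. A hedged sketch that ignores the direct/conceptual replication distinction; α, power, and the prior are illustrative assumptions, not values from the paper:

```python
def posterior_prob(k_sig, n_studies, alpha=0.05, power=0.8, prior=0.5):
    """Posterior probability that the theory is true after observing
    `k_sig` significant results out of `n_studies` independent tests."""
    k_ns = n_studies - k_sig
    lik_true = power**k_sig * (1 - power)**k_ns    # theory true
    lik_false = alpha**k_sig * (1 - alpha)**k_ns   # theory false
    post_odds = (prior / (1 - prior)) * (lik_true / lik_false)
    return post_odds / (1 + post_odds)

for k in range(5):  # prints 0.002, 0.13, 0.919, 0.999, 1.0
    print(k, round(posterior_prob(k, 4), 3))
```
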
Affiliation(s)
- Olmo R van den Akker: Department of Methodology and Statistics, Tilburg University, Warandelaan 2, 5037 AB Tilburg, The Netherlands
- Jelte M Wicherts: Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
- Linda Dominguez Alvarez: Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
- Marjan Bakker: Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
- Marcel A L M van Assen: Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands; Department of Sociology, Utrecht University, Utrecht, The Netherlands

6. Winter B, Marghetis T. Multimodality matters in numerical communication. Front Psychol 2023; 14:1130777. PMID: 37564312; PMCID: PMC10411739; DOI: 10.3389/fpsyg.2023.1130777.
Abstract
Modern society depends on numerical information, which must be communicated accurately and effectively. Numerical communication is accomplished in different modalities (speech, writing, sign, gesture, graphs), and in naturally occurring settings it almost always involves more than one modality at once. Yet the modalities of numerical communication are often studied in isolation. Here we argue that, to understand and improve numerical communication, we must take this multimodality seriously. We first discuss each modality on its own terms, identifying their commonalities and differences. We then argue that numerical communication is shaped critically by interactions among modalities. We boil down these interactions to four types: one modality can amplify the message of another; it can direct attention to content from another modality (e.g., using a gesture to guide attention to a relevant aspect of a graph); it can explain another modality (e.g., verbally explaining the meaning of an axis in a graph); and it can reinterpret a modality (e.g., framing an upwards-oriented trend as a bad outcome). We conclude by discussing how a focus on multimodality raises entirely new research questions about numerical communication.
Affiliation(s)
- Bodo Winter: Department of English Language and Linguistics, University of Birmingham, Birmingham, United Kingdom
- Tyler Marghetis: Cognitive and Information Sciences, University of California, Merced, Merced, CA, USA

7. Gupta A, Bosco F. Tempest in a teacup: An analysis of p-hacking in organizational research. PLoS One 2023; 18:e0281938. PMID: 36827325; PMCID: PMC9955613; DOI: 10.1371/journal.pone.0281938.
Abstract
We extend questionable research practices (QRPs) research by conducting a robust, large-scale analysis of p-hacking in organizational research. We leverage a manually curated database of more than 1,000,000 correlation coefficients and sample sizes, with which we calculate exact p-values. We test for the prevalence and magnitude of p-hacking across the complete database as well as various subsets of the database according to common bivariate relation types in the organizational literature (e.g., attitudes-behaviors). Results from two analytical approaches (i.e., z-curve, critical bin comparisons) were consistent in both direction and significance in nine of 18 datasets. Critical bin comparisons indicated p-hacking in 12 of 18 subsets, three of which reached statistical significance. Z-curve analyses indicated p-hacking in 11 of 18 subsets, two of which reached statistical significance. Generally, results indicated that p-hacking is detectable but small in magnitude. We also tested for three predictors of p-hacking: publication year, journal prestige, and authorship team size. Across two analytic approaches, we observed a relatively consistent positive relation between p-hacking and journal prestige, and no relation between p-hacking and authorship team size. Results were mixed regarding temporal trends (i.e., evidence for p-hacking over time). In sum, the present study of p-hacking in organizational research indicates that the prevalence of p-hacking is smaller and less concerning than earlier research has suggested.
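
A critical-bin (caliper) comparison can be sketched in a few lines: count reported p-values just below and just above .05 and test whether the just-significant bin is overfilled, the expected footprint of p-hacking. The bin width and inputs below are assumptions for illustration, not the authors' data or code:

```python
from scipy.stats import binomtest

def caliper_test(p_values, threshold=0.05, width=0.005):
    """Compare counts of p-values just below vs. just above `threshold`;
    absent p-hacking, the two narrow bins should be roughly equally full."""
    below = sum(threshold - width <= p < threshold for p in p_values)
    above = sum(threshold < p <= threshold + width for p in p_values)
    test = binomtest(below, below + above, 0.5, alternative="greater")
    return below, above, test.pvalue

# Hypothetical reported p-values
print(caliper_test([0.047, 0.049, 0.048, 0.052, 0.046, 0.044]))  # (4, 1, 0.1875)
```
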
Affiliation(s)
- Alisha Gupta: Department of Management and Entrepreneurship, School of Business, Virginia Commonwealth University, Richmond, VA, USA
- Frank Bosco: Department of Management and Entrepreneurship, School of Business, Virginia Commonwealth University, Richmond, VA, USA

8. Stefan AM, Schönbrodt FD. Big little lies: a compendium and simulation of p-hacking strategies. R Soc Open Sci 2023; 10:220346. PMID: 36778954; PMCID: PMC9905987; DOI: 10.1098/rsos.220346.
Abstract
In many research fields, the widespread use of questionable research practices has jeopardized the credibility of scientific results. One of the most prominent questionable research practices is p-hacking. Typically, p-hacking is defined as a compound of strategies targeted at rendering non-significant hypothesis testing results significant. However, a comprehensive overview of these p-hacking strategies is missing, and current meta-scientific research often ignores the heterogeneity of strategies. Here, we compile a list of 12 p-hacking strategies based on an extensive literature review, identify factors that control their level of severity, and demonstrate their impact on false-positive rates using simulation studies. We also use our simulation results to evaluate several approaches that have been proposed to mitigate the influence of questionable research practices. Our results show that investigating p-hacking at the level of strategies can provide a better understanding of the process of p-hacking, as well as a broader basis for developing effective countermeasures. By making our analyses available through a Shiny app and R package, we facilitate future meta-scientific research aimed at investigating the ramifications of p-hacking across multiple strategies, and we hope to start a broader discussion about different manifestations of p-hacking in practice.
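
One strategy from such compendia, sequential testing with optional stopping, is simple to simulate and shows how the false-positive rate climbs well above the nominal 5% even when the null is true. A rough sketch of the general idea, not the authors' R package; the sample sizes and number of simulations are arbitrary choices:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

def optional_stopping_trial(n_start=20, n_max=100, step=10, alpha=0.05):
    """Collect data under H0, test after every `step` added observations
    per group, and stop as soon as p < alpha (a p-hacking strategy)."""
    a = list(rng.normal(size=n_start))
    b = list(rng.normal(size=n_start))
    while len(a) <= n_max:
        if ttest_ind(a, b).pvalue < alpha:
            return True  # false positive declared "significant"
        a += list(rng.normal(size=step))
        b += list(rng.normal(size=step))
    return False

fp = np.mean([optional_stopping_trial() for _ in range(2000)])
print(fp)  # typically around 0.12-0.15, far above the nominal 0.05
```
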
Affiliation(s)
- Angelika M. Stefan: Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands; Department of Psychology, Universität der Bundeswehr München, München, Germany
- Felix D. Schönbrodt: Department of Psychology, Ludwig-Maximilians-Universität München, München, Germany

9. Mesquida C, Murphy J, Lakens D, Warne J. Replication concerns in sports and exercise science: a narrative review of selected methodological issues in the field. R Soc Open Sci 2022; 9:220946. PMID: 36533197; PMCID: PMC9748505; DOI: 10.1098/rsos.220946.
Abstract
Methodological issues such as publication bias, questionable research practices, and underpowered study designs are known to decrease the replicability of study findings. The presence of such issues has been widely established across different research fields, especially in psychology. It raised the first concerns that the replicability of study findings could be low and led researchers to conduct large replication projects. These replication projects revealed that a significant portion of original study findings could not be replicated, giving rise to the conceptualization of the replication crisis. Although previous research in the field of sports and exercise science has identified the first warning signs, such as an overwhelming proportion of significant findings, small sample sizes, and lack of data availability, their possible consequences for the replicability of our field have been overlooked. We discuss the consequences of the above issues for the replicability of our field and offer potential solutions to improve replicability.
Affiliation(s)
- Cristian Mesquida: Centre of Applied Science for Health, Technological University Dublin, Tallaght, Dublin, Ireland
- Jennifer Murphy: Centre of Applied Science for Health, Technological University Dublin, Tallaght, Dublin, Ireland
- Daniël Lakens: Human-Technology Interaction Group, Eindhoven University of Technology, Eindhoven, The Netherlands
- Joe Warne: Centre of Applied Science for Health, Technological University Dublin, Tallaght, Dublin, Ireland

10. Schiavone SR, Vazire S. Reckoning With Our Crisis: An Agenda for the Field of Social and Personality Psychology. Perspect Psychol Sci 2022; 18:710-722. PMID: 36301777; DOI: 10.1177/17456916221101060.
Abstract
The replication crisis and credibility revolution in the 2010s brought a wave of doubts about the credibility of social and personality psychology. We argue that as a field, we must reckon with the concerns brought to light during this critical decade. How the field responds to this crisis will reveal our commitment to self-correction. If we do not take the steps necessary to address our problems and simply declare the crisis to be over or the problems to be fixed without evidence, we risk further undermining our credibility. To fully reckon with this crisis, we must empirically assess the state of the field to take stock of how credible our science actually is and whether it is improving. We propose an agenda for metascientific research, and we review approaches to empirically evaluate and track where we are as a field (e.g., analyzing the published literature, surveying researchers). We describe one such project (Surveying the Past and Present State of Published Studies in Social and Personality Psychology) underway in our research group. Empirical evidence about the state of our field is necessary if we are to take self-correction seriously and if we hope to avert future crises.
Affiliation(s)
- Simine Vazire: Melbourne School of Psychological Sciences, University of Melbourne

11. Grant S, Wendt KE, Leadbeater BJ, Supplee LH, Mayo-Wilson E, Gardner F, Bradshaw CP. Transparent, Open, and Reproducible Prevention Science. Prev Sci 2022; 23:701-722. PMID: 35175501; PMCID: PMC9283153; DOI: 10.1007/s11121-022-01336-w.
Abstract
The field of prevention science aims to understand societal problems, identify effective interventions, and translate scientific evidence into policy and practice. There is growing interest among prevention scientists in the potential for transparency, openness, and reproducibility to facilitate this mission by providing opportunities to align scientific practice with scientific ideals, accelerate scientific discovery, and broaden access to scientific knowledge. The overarching goal of this manuscript is to serve as a primer introducing and providing an overview of open science for prevention researchers. In this paper, we discuss factors motivating interest in transparency and reproducibility, research practices associated with open science, and stakeholders engaged in and impacted by open science reform efforts. In addition, we discuss how and why different types of prevention research could incorporate open science practices, as well as ways that prevention science tools and methods could be leveraged to advance the wider open science movement. To promote further discussion, we conclude with potential reservations and challenges for the field of prevention science to address as it transitions to greater transparency, openness, and reproducibility. Throughout, we identify activities that aim to strengthen the reliability and efficiency of prevention science, facilitate access to its products and outputs, and promote collaborative and inclusive participation in research activities. By embracing principles of transparency, openness, and reproducibility, prevention science can better achieve its mission to advance evidence-based solutions to promote individual and collective well-being.
Affiliation(s)
- Sean Grant: Department of Social & Behavioral Sciences, Indiana University Richard M. Fairbanks School of Public Health, 1050 Wishard Blvd, Indianapolis, IN 46202, USA
- Kathleen E Wendt: Department of Human Development and Family Studies, Colorado State University, Fort Collins, CO, USA
- Evan Mayo-Wilson: Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, IN, USA
- Frances Gardner: Department of Social Policy and Intervention, University of Oxford, Oxford, UK
- Catherine P Bradshaw: School of Education & Human Development, University of Virginia, Charlottesville, VA, USA

12. Suter WN. Questionable Research Practices: How to Recognize and Avoid Them. Home Health Care Manag Pract 2020. DOI: 10.1177/1084822320934468.
Abstract
This article focuses on questionable research practices (QRPs) that bias findings and conclusions. QRPs cast doubt on the credibility of research findings in home health and in nursing science in general, and they erode the research integrity of all researchers to the extent that they are permitted to persist. Each QRP is defined via bundles of specific research behaviors under unifying labels, including "deceptive mirages" and "phantom sharpshooters," among others. These questionable behaviors are described in ways that enhance understanding of research and enable careful home health nurse researchers to avoid QRPs by applying higher standards of scientific rigor. QRPs impede scientific progress by generating false conclusions. They threaten the validity and dependability of scientific research and confuse other researchers who practice rigorous science and maintain integrity. QRPs also clog the literature with studies that cannot be replicated. When researchers engage in QRPs at the expense of rigor, overall trust in the scientific knowledge base erodes.

13. Adda J, Decker C, Ottaviani M. P-hacking in clinical trials and how incentives shape the distribution of results across phases. Proc Natl Acad Sci U S A 2020; 117:13386-13392. PMID: 32487730; PMCID: PMC7306753; DOI: 10.1073/pnas.1919906117.
Abstract
Clinical research should conform to high standards of ethical and scientific integrity, given that human lives are at stake. However, economic incentives can generate conflicts of interest for investigators, who may be inclined to withhold unfavorable results or even tamper with data in order to achieve desired outcomes. To shed light on the integrity of clinical trial results, this paper systematically analyzes the distribution of P values of primary outcomes for phase II and phase III drug trials reported to the ClinicalTrials.gov registry. First, we detect no bunching of results just above the classical 5% threshold for statistical significance. Second, a density-discontinuity test reveals an upward jump at the 5% threshold for phase III results by small industry sponsors. Third, we document a larger fraction of significant results in phase III compared to phase II. Linking trials across phases, we find that early favorable results increase the likelihood of continuing into the next phase. Once we take into account this selective continuation, we can explain almost completely the excess of significant results in phase III for trials conducted by large industry sponsors. For small industry sponsors, instead, part of the excess remains unexplained.
Affiliation(s)
- Jérôme Adda: Department of Economics, Bocconi University, 20136 Milan, Italy; Bocconi Institute for Data Science and Analytics, Bocconi University, Milan, Italy; Innocenzo Gasparini Institute for Economic Research, Bocconi University, Milan, Italy
- Christian Decker: Department of Economics, University of Zurich, 8001 Zurich, Switzerland; UBS Center for Economics in Society, University of Zurich, Zurich, Switzerland
- Marco Ottaviani: Department of Economics, Bocconi University, 20136 Milan, Italy; Bocconi Institute for Data Science and Analytics, Bocconi University, Milan, Italy; Innocenzo Gasparini Institute for Economic Research, Bocconi University, Milan, Italy

14. Yeager DS, Krosnick JA, Visser PS, Holbrook AL, Tahk AM. Moderation of classic social psychological effects by demographics in the U.S. adult population: New opportunities for theoretical advancement. J Pers Soc Psychol 2019; 117:e84-e99. PMID: 31464480; PMCID: PMC6918461; DOI: 10.1037/pspa0000171.
Abstract
For decades, social psychologists have collected data primarily from college undergraduates and, recently, from haphazard samples of adults. Yet researchers have routinely presumed that thus-observed treatment effects characterize "people" in general. Tests of seven highly cited social psychological phenomena (two involving opinion change resulting from social influence and five involving the use of heuristics in social judgments) using data collected from randomly sampled, representative groups of American adults documented generalizability of the six phenomena that have been replicated previously with undergraduate samples. The one phenomenon (a cross-over interaction revealing an ease-of-retrieval effect) that has not been replicated successfully in undergraduate samples was also not observed here. However, the observed effect sizes for the replicated phenomena were notably smaller on average than the meta-analytic effect sizes documented by past studies of college students. Furthermore, the phenomena were strongest among participants with the demographic characteristics of the college students who typically provided data for past published studies, even after correcting for publication bias in past studies using a new method, called the behaviorally-informed file-drawer adjustment. The six successful replications suggest that phenomena identified in traditional laboratory research also appear as expected in representative samples, but more weakly, so observed effect sizes should be generalized with caution. The evidence of demographic moderators suggests interesting opportunities for future research to better understand the mechanisms of the effects and their limiting conditions.
Affiliation(s)
- Jon A Krosnick: Department of Communications, Political Science, and Psychology

15. Olsen J, Mosen J, Voracek M, Kirchler E. Research practices and statistical reporting quality in 250 economic psychology master's theses: a meta-research investigation. R Soc Open Sci 2019; 6:190738. PMID: 31903199; PMCID: PMC6936276; DOI: 10.1098/rsos.190738.
Abstract
The replicability of research findings has recently been disputed across multiple scientific disciplines. In constructive reaction, the research culture in psychology is facing fundamental changes, but investigations of research practices that led to these improvements have almost exclusively focused on academic researchers. By contrast, we investigated the statistical reporting quality and selected indicators of questionable research practices (QRPs) in psychology students' master's theses. In a total of 250 theses, we investigated utilization and magnitude of standardized effect sizes, along with statistical power, the consistency and completeness of reported results, and possible indications of p-hacking and further testing. Effect sizes were reported for 36% of focal tests (median r = 0.19), and only a single formal power analysis was reported for sample size determination (median observed power 1 - β = 0.67). Statcheck revealed inconsistent p-values in 18% of cases, while 2% led to decision errors. There were no clear indications of p-hacking or further testing. We discuss our findings in the light of promoting open science standards in teaching and student supervision.
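
The statcheck step referenced here boils down to recomputing a p-value from the reported test statistic and degrees of freedom and checking it against the reported p. A simplified sketch for two-sided t-tests only; statcheck itself is an R package that covers many more test types and reporting formats:

```python
from scipy.stats import t as t_dist

def check_t_report(t_value, df, reported_p, decimals=2):
    """Recompute a two-sided p from a reported t(df) statistic and flag
    inconsistency (allowing for rounding) and gross decision errors."""
    p = 2 * t_dist.sf(abs(t_value), df)
    consistent = round(p, decimals) == round(reported_p, decimals)
    decision_error = (p < 0.05) != (reported_p < 0.05)
    return round(p, 4), consistent, decision_error

# Hypothetical reports: the first is consistent under rounding,
# the second is inconsistent and crosses the .05 boundary.
print(check_t_report(2.20, 28, 0.04))  # (0.0362, True, False)
print(check_t_report(1.70, 28, 0.04))  # (0.1002, False, True)
```
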
Affiliation(s)
- Jerome Olsen: Faculty of Psychology, Department of Applied Psychology: Work, Education and Economy, University of Vienna, Universitaetsstrasse 7, 1010 Vienna, Austria
- Johanna Mosen: Faculty of Psychology, Department of Applied Psychology: Work, Education and Economy, University of Vienna, Vienna, Austria
- Martin Voracek: Faculty of Psychology, Department of Basic Psychological Research and Research Methods, University of Vienna, Liebiggasse 5, 1010 Vienna, Austria
- Erich Kirchler: Faculty of Psychology, Department of Applied Psychology: Work, Education and Economy, University of Vienna, Vienna, Austria

16. Bruns SB, Asanov I, Bode R, Dunger M, Funk C, Hassan SM, Hauschildt J, Heinisch D, Kempa K, König J, Lips J, Verbeck M, Wolfschütz E, Buenstorf G. Reporting errors and biases in published empirical findings: Evidence from innovation research. Res Policy 2019. DOI: 10.1016/j.respol.2019.05.005.

17. Erdfelder E, Heck DW. Detecting Evidential Value and p-Hacking With the p-Curve Tool. Z Psychol 2019. DOI: 10.1027/2151-2604/a000383.
Abstract
Simonsohn, Nelson, and Simmons (2014a) proposed p-curve, the distribution of statistically significant p-values for a set of studies, as a tool to assess the evidential value of these studies. They argued that, whereas right-skewed p-curves indicate true underlying effects, left-skewed p-curves indicate selective reporting of significant results when there is no true effect ("p-hacking"). We first review previous research showing that, in contrast to the first claim, null effects may produce right-skewed p-curves under some conditions. We then question the second claim by showing that not only selective reporting but also selective nonreporting of significant results, due to a significant outcome of a more popular alternative test of the same hypothesis, may produce left-skewed p-curves, even if all studies reflect true effects. Hence, just as right-skewed p-curves do not necessarily imply evidential value, left-skewed p-curves do not necessarily imply p-hacking and absence of true effects in the studies involved.
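
A p-curve analysis converts each significant p-value into a pp-value, its conditional quantile under the null (p/.05 at α = .05), and combines them across studies; a right-skew test asks whether these pp-values pile up near zero. A schematic version using Fisher's combination with made-up inputs, not Simonsohn et al.'s implementation:

```python
import numpy as np
from scipy.stats import chi2

def p_curve_skew_tests(sig_ps, alpha=0.05):
    """Fisher-style tests for right and left skew of a p-curve;
    `sig_ps` must contain only significant p-values (0 < p < alpha)."""
    pp_right = np.asarray(sig_ps) / alpha  # small when ps cluster near 0
    pp_left = 1 - pp_right                 # small when ps cluster near alpha
    def fisher(pp):
        stat = -2 * np.sum(np.log(pp))
        return chi2.sf(stat, df=2 * len(pp))
    return {"right_skew_p": fisher(pp_right), "left_skew_p": fisher(pp_left)}

# Made-up p-values clustering near zero, as expected under a true effect
print(p_curve_skew_tests([0.001, 0.008, 0.012, 0.020, 0.035]))
```
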
Affiliation(s)
- Edgar Erdfelder: Department of Psychology, School of Social Sciences, University of Mannheim, Germany
- Daniel W. Heck: Department of Psychology, School of Social Sciences, University of Mannheim, Germany; Department of Psychology, University of Marburg, Germany

18. P-curving the fusiform face area: Meta-analyses support the expertise hypothesis. Neurosci Biobehav Rev 2019; 104:209-221. DOI: 10.1016/j.neubiorev.2019.07.003.

19. Carter EC, Schönbrodt FD, Gervais WM, Hilgard J. Correcting for Bias in Psychology: A Comparison of Meta-Analytic Methods. Adv Methods Pract Psychol Sci 2019. DOI: 10.1177/2515245919847196.
Abstract
Publication bias and questionable research practices in primary research can lead to badly overestimated effects in meta-analysis. Methodologists have proposed a variety of statistical approaches to correct for such overestimation. However, it is not clear which methods work best for data typically seen in psychology. Here, we present a comprehensive simulation study in which we examined how some of the most promising meta-analytic methods perform on data that might realistically be produced by research in psychology. We simulated several levels of questionable research practices, publication bias, and heterogeneity, and used study sample sizes empirically derived from the literature. Our results clearly indicated that no single meta-analytic method consistently outperformed all the others. Therefore, we recommend that meta-analysts in psychology focus on sensitivity analyses—that is, report on a variety of methods, consider the conditions under which these methods fail (as indicated by simulation studies such as ours), and then report how conclusions might change depending on which conditions are most plausible. Moreover, given the dependence of meta-analytic methods on untestable assumptions, we strongly recommend that researchers in psychology continue their efforts to improve the primary literature and conduct large-scale, preregistered replications. We provide detailed results and simulation code at https://osf.io/rf3ys and interactive figures at http://www.shinyapps.org/apps/metaExplorer/ .
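
One of the simpler correction methods typically included in such comparisons is PET (precision-effect test): a weighted regression of observed effects on their standard errors, whose intercept estimates what a hypothetical, infinitely precise study would find. A compact sketch with hypothetical inputs, not a reimplementation of the authors' simulation code:

```python
import numpy as np

def pet_estimate(effects, std_errors):
    """PET: weighted least-squares regression of effect size on standard
    error; the intercept serves as the bias-corrected estimate."""
    se = np.asarray(std_errors, dtype=float)
    X = np.column_stack([np.ones_like(se), se])
    W = np.diag(1.0 / se**2)  # inverse-variance weights
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ np.asarray(effects, dtype=float))
    return beta[0]

# Hypothetical literature showing small-study inflation (d rises with SE)
d = [0.55, 0.48, 0.30, 0.25, 0.22]
se = [0.30, 0.25, 0.15, 0.10, 0.08]
print(round(pet_estimate(d, se), 2))  # well below the naive mean of ~0.36
```
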
Affiliation(s)
- Evan C. Carter: Human Research and Engineering Directorate, U.S. Army Research Laboratory, Aberdeen, Maryland

20. Carbine KA, Lindsey HM, Rodeback RE, Larson MJ. Quantifying evidential value and selective reporting in recent and 10-year past psychophysiological literature: A pre-registered P-curve analysis. Int J Psychophysiol 2019; 142:33-49. PMID: 31195065; DOI: 10.1016/j.ijpsycho.2019.06.004.
Abstract
Selective reporting (i.e., only reporting significant findings as opposed to all analyses or results) is a questionable research practice that undermines the integrity of published research. Psychophysiology research may be susceptible to selective reporting, given the high number of decision points and methodological complexity in analyses of psychophysiology data. We aimed to assess the presence of selective reporting and evidential value (i.e., that significant results are due to true underlying effects) in recent and past psychophysiological research by utilizing p-curve analyses. Study protocols and methods were pre-registered on the Open Science Framework (OSF). P-values and the associated test statistics were extracted from articles in the most recent issue (as of January 2018) and 10-year previous counterpart issue of three major psychophysiology journals: Psychophysiology, International Journal of Psychophysiology, and Journal of Psychophysiology. Using the p-curve application, 10 primary p-curves were conducted: all recent articles, all past articles, recent articles split by journal, past articles split by journal, recent cognitive electrophysiology articles, and past cognitive electrophysiology articles. Evidential value and generally adequate average power (≥78% average power) were present in all p-curves, except those that only included articles from the Journal of Psychophysiology because of the small number of articles published in the journal. Findings provide some positive news and indicate that, generally, results were not selectively reported, and selective reporting may not be a primary issue for this sample of psychophysiological research. Future p-curve analyses examining sub-disciplines of psychophysiology are recommended.
Affiliation(s)
- Kaylie A Carbine: Department of Psychology, Brigham Young University, Provo, UT 84602, USA
- Hannah M Lindsey: Department of Psychology, Brigham Young University, Provo, UT 84602, USA
- Rebekah E Rodeback: Department of Psychology, Brigham Young University, Provo, UT 84602, USA
- Michael J Larson: Department of Psychology and Neuroscience Center, Brigham Young University, Provo, UT 84602, USA

21. van Aert RCM, Wicherts JM, van Assen MALM. Publication bias examined in meta-analyses from psychology and medicine: A meta-meta-analysis. PLoS One 2019; 14:e0215052. PMID: 30978228; PMCID: PMC6461282; DOI: 10.1371/journal.pone.0215052.
Abstract
Publication bias is a substantial problem for the credibility of research in general and of meta-analyses in particular, as it yields overestimated effects and may suggest the existence of non-existing effects. Although there is consensus that publication bias exists, how strongly it affects different scientific literatures is currently less well-known. We examined evidence of publication bias in a large-scale data set of primary studies that were included in 83 meta-analyses published in Psychological Bulletin (representing meta-analyses from psychology) and 499 systematic reviews from the Cochrane Database of Systematic Reviews (CDSR; representing meta-analyses from medicine). Publication bias was assessed on all homogeneous subsets (3.8% of all subsets of meta-analyses published in Psychological Bulletin) of primary studies included in meta-analyses, because publication bias methods do not have good statistical properties if the true effect size is heterogeneous. Publication bias tests did not reveal evidence for bias in the homogeneous subsets. Overestimation was minimal but statistically significant, providing evidence of publication bias that appeared to be similar in both fields. However, a Monte-Carlo simulation study revealed that the creation of homogeneous subsets resulted in challenging conditions for publication bias methods, since the number of effect sizes in a subset was rather small (the median number of effect sizes equaled 6). Our findings are consistent with publication bias ranging from no bias at all to, in the most extreme case, only 5% of statistically nonsignificant effect sizes being published. These and other findings, in combination with the small percentages of statistically significant primary effect sizes (28.9% and 18.9% for subsets published in Psychological Bulletin and the CDSR, respectively), led to the conclusion that evidence for publication bias in the studied homogeneous subsets is weak, but suggestive of mild publication bias in both psychology and medicine.
Affiliation(s)
- Robbie C. M. van Aert: Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
- Jelte M. Wicherts: Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
- Marcel A. L. M. van Assen: Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands; Department of Sociology, Utrecht University, Utrecht, The Netherlands

22. Olsson-Collentine A, van Assen MALM, Hartgerink CHJ. The Prevalence of Marginally Significant Results in Psychology Over Time. Psychol Sci 2019; 30:576-586. PMID: 30789796; PMCID: PMC6472145; DOI: 10.1177/0956797619830326.
Abstract
We examined the percentage of p values (.05 < p ≤ .10) reported as marginally significant in 44,200 articles, across nine psychology disciplines, published in 70 journals belonging to the American Psychological Association between 1985 and 2016. Using regular expressions, we extracted 42,504 p values between .05 and .10. Almost 40% of p values in this range were reported as marginally significant, although there were considerable differences between disciplines. The practice is most common in organizational psychology (45.4%) and least common in clinical psychology (30.1%). Contrary to what was reported by previous researchers, our results showed no evidence of an increasing trend in any discipline; in all disciplines, the percentage of p values reported as marginally significant was decreasing or constant over time. We recommend against reporting these results as marginally significant because of the low evidential value of p values between .05 and .10.
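
The regular-expression extraction described here can be sketched compactly: pull reported p-values out of article text and keep those in the .05-.10 window. The pattern below is a simplified stand-in; the authors' actual expressions handle many more reporting styles:

```python
import re

P_PATTERN = re.compile(r"p\s*[=<>]\s*(0?\.\d+)", re.IGNORECASE)

def marginal_p_values(text):
    """Extract reported p-values and keep those with .05 < p <= .10."""
    values = [float(m.group(1)) for m in P_PATTERN.finditer(text)]
    return [p for p in values if 0.05 < p <= 0.10]

sample = ("The effect was marginally significant, p = .06, the interaction "
          "was not, p = .43, and the main effect held, p < .001.")
print(marginal_p_values(sample))  # [0.06]
```
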
Affiliation(s)
- Marcel A. L. M. van Assen: Department of Methodology and Statistics, Tilburg University; Department of Sociology, Utrecht University

23. Turner DP. P-Hacking in Headache Research. Headache 2019; 58:196-198. PMID: 29411370; DOI: 10.1111/head.13257.
Affiliation(s)
- Dana P Turner: Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Boston, MA, USA

24. Davis WE, Giner-Sorolla R, Lindsay DS, Lougheed JP, Makel MC, Meier ME, Sun J, Vaughn LA, Zelenski JM. Peer-Review Guidelines Promoting Replicability and Transparency in Psychological Science. Adv Methods Pract Psychol Sci 2018. DOI: 10.1177/2515245918806489.
Abstract
More and more psychological researchers have come to appreciate the perils of common but poorly justified research practices and are rethinking commonly held standards for evaluating research. As this methodological reform expresses itself in psychological research, peer reviewers of such work must also adapt their practices to remain relevant. Reviewers of journal submissions wield considerable power to promote methodological reform, and thereby contribute to the advancement of a more robust psychological literature. We describe concrete practices that reviewers can use to encourage transparency, intellectual humility, and more valid assessments of the methods and statistics reported in articles.
Affiliation(s)
- Jessie Sun: Department of Psychology, University of California, Davis

25. Aczel B, Palfi B, Szollosi A, Kovacs M, Szaszi B, Szecsi P, Zrubka M, Gronau QF, van den Bergh D, Wagenmakers EJ. Quantifying Support for the Null Hypothesis in Psychology: An Empirical Investigation. Adv Methods Pract Psychol Sci 2018. DOI: 10.1177/2515245918773742.
Abstract
In the traditional statistical framework, nonsignificant results leave researchers in a state of suspended disbelief. In this study, we examined, empirically, the treatment and evidential impact of nonsignificant results. Our specific goals were twofold: to explore how psychologists interpret and communicate nonsignificant results and to assess how much these results constitute evidence in favor of the null hypothesis. First, we examined all nonsignificant findings mentioned in the abstracts of the 2015 volumes of Psychonomic Bulletin & Review, Journal of Experimental Psychology: General, and Psychological Science (N = 137). In 72% of these cases, nonsignificant results were misinterpreted, in that the authors inferred that the effect was absent. Second, a Bayes factor reanalysis revealed that fewer than 5% of the nonsignificant findings provided strong evidence (i.e., BF01 > 10) in favor of the null hypothesis over the alternative hypothesis. We recommend that researchers expand their statistical tool kit in order to correctly interpret nonsignificant results and to be able to evaluate the evidence for and against the null hypothesis.
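
How much evidence a nonsignificant t-test provides for the null can be approximated without the full Bayesian machinery via the large-sample BIC approximation BF01 ≈ √n · exp(−t²/2) (Wagenmakers, 2007). A rough stand-in for the default Bayes factors computed in the paper, with hypothetical inputs:

```python
import math

def bf01_bic_approx(t, n):
    """Large-sample BIC approximation of the Bayes factor favoring H0
    over H1 for a t-test based on n observations."""
    return math.sqrt(n) * math.exp(-t**2 / 2)

# Hypothetical nonsignificant results: even a near-zero t with n = 50
# stays short of the BF01 > 10 threshold for strong evidence.
print(round(bf01_bic_approx(t=1.0, n=50), 2))  # ~4.29
print(round(bf01_bic_approx(t=0.2, n=50), 2))  # ~6.93
```
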
Affiliation(s)
- Balazs Aczel: Institute of Psychology, ELTE Eötvös Loránd University
- Bence Palfi: School of Psychology, University of Sussex; Sackler Centre for Consciousness Science, University of Sussex
- Aba Szollosi: School of Psychology, University of New South Wales
- Marton Kovacs: Institute of Psychology, ELTE Eötvös Loránd University
- Barnabas Szaszi: Institute of Psychology, ELTE Eötvös Loránd University; Doctoral School of Psychology, ELTE Eötvös Loránd University
- Peter Szecsi: Institute of Psychology, ELTE Eötvös Loránd University
- Mark Zrubka: Institute of Psychology, ELTE Eötvös Loránd University

26. Steiger A, Kühberger A. A Meta-Analytic Re-Appraisal of the Framing Effect. Z Psychol 2018. DOI: 10.1027/2151-2604/a000321.
Abstract
We reevaluated and reanalyzed the data of Kühberger's (1998) meta-analysis on framing effects in risky decision making by using p-curve. This method uses the distribution of only the significant p-values to correct the effect size, thus taking publication bias into account. We found a corrected overall effect size of d = 0.52, which is considerably higher than the effect reported by Kühberger (d = 0.31). Similarly to the original analysis, most moderators proved to be effective, indicating that there is no single risky-choice framing effect; rather, the effect size varies with different manipulations of the framing task. Taken together, the p-curve analysis shows that there are reliable risky-choice framing effects and that there is no evidence of intense p-hacking. Comparing the corrected estimate to the effect size reported in the Many Labs Replication Project (MLRP) on gain-loss framing (d = 0.60) shows that the two estimates are surprisingly similar in size. Finally, we conducted a new meta-analysis of risk framing experiments published in 2016 and again found a similar effect size (d = 0.56). Thus, although there is discussion about the adequate explanation for framing effects, there is no doubt about their existence: risky-choice framing effects are highly reliable and robust. No replicability crisis there.
Affiliation(s)
- Anton Kühberger: Department of Psychology, University of Salzburg, Austria; Centre of Cognitive Neurosciences, University of Salzburg, Austria

27. Wicherts JM. The Weak Spots in Contemporary Science (and How to Fix Them). Animals (Basel) 2017; 7:E90. PMID: 29186879; PMCID: PMC5742784; DOI: 10.3390/ani7120090.
Abstract
In this review, the author discusses several of the weak spots in contemporary science, including scientific misconduct, the problems of post hoc hypothesizing (HARKing), outcome switching, theoretical bloopers in formulating research questions and hypotheses, selective reading of the literature, selective citing of previous results, improper blinding and other design failures, p-hacking or researchers' tendency to analyze data in many different ways to find positive (typically significant) results, errors and biases in the reporting of results, and publication bias. The author presents some empirical results highlighting problems that lower the trustworthiness of reported results in scientific literatures, including that of animal welfare studies. Some of the underlying causes of these biases are discussed based on the notion that researchers are only human and hence are not immune to confirmation bias, hindsight bias, and minor ethical transgressions. The author discusses solutions in the form of enhanced transparency, sharing of data and materials, (post-publication) peer review, pre-registration, registered reports, improved training, reporting guidelines, replication, dealing with publication bias, alternative inferential techniques, power, and other statistical tools.
Affiliation(s)
- Jelte M Wicherts: Department of Methodology and Statistics, Tilburg University, Warandelaan 2, 5037 AB Tilburg, The Netherlands

28. van Aert RCM, Wicherts JM, van Assen MALM. Conducting Meta-Analyses Based on p Values: Reservations and Recommendations for Applying p-Uniform and p-Curve. Perspect Psychol Sci 2017; 11:713-729. PMID: 27694466; PMCID: PMC5117126; DOI: 10.1177/1745691616650874.
Abstract
Because of overwhelming evidence of publication bias in psychology, techniques to correct meta-analytic estimates for such bias are greatly needed. The methodology on which the p-uniform and p-curve methods are based has great promise for providing accurate meta-analytic estimates in the presence of publication bias. However, in this article, we show that in some situations, p-curve behaves erratically, whereas p-uniform may yield implausible estimates of negative effect size. Moreover, we show that (and explain why) p-curve and p-uniform result in overestimation of effect size under moderate-to-large heterogeneity and may yield unpredictable bias when researchers employ p-hacking. We offer hands-on recommendations on applying and interpreting results of meta-analyses in general and p-uniform and p-curve in particular. Both methods as well as traditional methods are applied to a meta-analysis on the effect of weight on judgments of importance. We offer guidance for applying p-uniform or p-curve using R and a user-friendly web application for applying p-uniform.
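
p-uniform's estimation principle, choosing the effect size under which the significant studies' conditional p-values behave uniformly, can be sketched with a normal approximation and a grid search. Everything below (z-scale effects, the loss criterion, the inputs) is a simplified stand-in for the actual p-uniform implementation in R:

```python
import numpy as np
from scipy.stats import norm

def p_uniform_estimate(z_values, z_crit=1.96):
    """Toy p-uniform: find the effect mu (on the z scale) under which the
    conditional p-values q_i = P(Z > z_i | Z > z_crit) behave uniformly,
    i.e., the mean of -ln(q_i) equals 1 (the expectation of Exp(1))."""
    z = np.asarray(z_values)
    best_mu, best_gap = 0.0, np.inf
    for mu in np.linspace(0, 3, 301):
        q = norm.sf(z - mu) / norm.sf(z_crit - mu)  # conditional survival
        gap = abs(np.mean(-np.log(q)) - 1.0)
        if gap < best_gap:
            best_mu, best_gap = mu, gap
    return best_mu

# Hypothetical z-statistics of significant primary studies
print(p_uniform_estimate([2.1, 2.5, 2.8, 3.4]))  # roughly 1.7 on the z scale
```
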
Affiliation(s)
- Marcel A L M van Assen: Department of Methodology and Statistics, Tilburg University; Department of Social and Behavioral Sciences, Utrecht University

29. Aczel B, Palfi B, Szaszi B. Estimating the evidential value of significant results in psychological science. PLoS One 2017; 12:e0182651. PMID: 28820905; PMCID: PMC5562314; DOI: 10.1371/journal.pone.0182651.
Abstract
Quantifying evidence is an inherent aim of empirical science, yet the customary statistical methods in psychology do not communicate the degree to which the collected data serve as evidence for the tested hypothesis. In order to estimate the distribution of the strength of evidence that individual significant results offer in psychology, we calculated Bayes factors (BF) for 287,424 findings of 35,515 articles published in 293 psychological journals between 1985 and 2016. Overall, 55% of all analyzed results were found to provide BF > 10 (often labeled as strong evidence) for the alternative hypothesis, while more than half of the remaining results do not pass the level of BF = 3 (labeled as anecdotal evidence). The results estimate that at least 82% of all published psychological articles contain one or more significant results that do not provide BF > 10 for the hypothesis. We conclude that due to the threshold of acceptance having been set too low for psychological findings, a substantial proportion of the published results have weak evidential support.
Affiliation(s)
- Balazs Aczel
- Institute of Psychology, ELTE, Eotvos Lorand University, Budapest, Hungary
- Bence Palfi
- School of Psychology, University of Sussex, Brighton, United Kingdom
- Sackler Centre for Consciousness Science, University of Sussex, Brighton, United Kingdom
- Barnabas Szaszi
- Institute of Psychology, ELTE, Eotvos Lorand University, Budapest, Hungary
- Doctoral School of Psychology, ELTE, Eotvos Lorand University, Budapest, Hungary
31
Hartgerink CHJ, Wicherts JM, van Assen MALM. Too Good to be False: Nonsignificant Results Revisited. Collabra: Psychology 2017. [DOI: 10.1525/collabra.71] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Due to its probabilistic nature, Null Hypothesis Significance Testing (NHST) is subject to decision errors. The concern for false positives has overshadowed the concern for false negatives in the recent debates in psychology. This might be unwarranted, since reported statistically nonsignificant findings may just be ‘too good to be false’. We examined evidence for false negatives in nonsignificant results in three different ways. We adapted the Fisher test to detect the presence of at least one false negative in a set of statistically nonsignificant results. Simulations show that the adapted Fisher test is generally a powerful method to detect false negatives. We then examined evidence for false negatives in the psychology literature in three applications of the adapted Fisher method. These applications indicate that (i) the observed effect size distribution of nonsignificant effects exceeds the expected distribution assuming a null effect, and approximately two out of three (66.7%) psychology articles reporting nonsignificant results contain evidence for at least one false negative; (ii) nonsignificant results on gender effects contain evidence of true nonzero effects; and (iii) the statistically nonsignificant replications from the Reproducibility Project: Psychology (RPP) do not warrant strong conclusions about the absence or presence of true zero effects underlying these nonsignificant results. We conclude that false negatives deserve more attention in the current debate on statistical practices in psychology. Potentially neglecting effects due to a lack of statistical power can lead to a waste of research resources and stifle the scientific discovery process.
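A minimal sketch of the adapted Fisher test as the abstract describes it (assuming the standard rescaling of nonsignificant p-values to the unit interval; the function name and example values are illustrative):

```python
import numpy as np
from scipy.stats import chi2

def adapted_fisher(p_nonsig, alpha=0.05):
    """Test for at least one false negative among nonsignificant results.
    If all nulls are true, then conditional on p > alpha the rescaled value
    (p - alpha) / (1 - alpha) is uniform(0, 1), so -2 * sum(log(.)) follows
    a chi-square distribution with 2k degrees of freedom."""
    p = np.asarray(p_nonsig, float)
    if np.any(p <= alpha):
        raise ValueError("supply only nonsignificant p-values (p > alpha)")
    p_star = (p - alpha) / (1 - alpha)
    stat = -2 * np.sum(np.log(p_star))
    return stat, chi2.sf(stat, 2 * len(p))

# nonsignificant p-values from underpowered tests of a real effect pile up
# just above alpha, which inflates the statistic
print(adapted_fisher([0.06, 0.08, 0.21, 0.35]))  # small p: likely false negative
```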
Affiliation(s)
- J. M. Wicherts
- Department of Methodology and Statistics, Tilburg University, NL
32
Hartgerink CHJ. Reanalyzing Head et al. (2015): investigating the robustness of widespread p-hacking. PeerJ 2017; 5:e3068. [PMID: 28265523 PMCID: PMC5337083 DOI: 10.7717/peerj.3068] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2016] [Accepted: 02/05/2017] [Indexed: 02/01/2023] Open
Abstract
Head et al. (2015) provided a large collection of p-values that, from their perspective, indicates widespread statistical significance seeking (i.e., p-hacking). This paper inspects that result for robustness. Theoretically, the p-value distribution should be a smooth, decreasing function, but the distribution of reported p-values shows systematically more reported p-values at .01, .02, .03, .04, and .05 than p-values reported to three decimal places, due to apparent tendencies to round p-values to two decimal places. Head et al. (2015) correctly argue that an aggregate p-value distribution could show a bump below .05 when left-skew p-hacking occurs frequently. However, the elimination of p = .045 and p = .05, as done in the original paper, is debatable. Given that eliminating p = .045 results from the need for symmetric bins, and that systematically more p-values are reported to two decimal places than to three, I did not exclude p = .045 and p = .05. I applied Fisher's method to the range .04 < p < .05 and reanalyzed the data by adjusting the bin selection to .03875 < p ≤ .04 versus .04875 < p ≤ .05. The reanalysis indicates that no evidence for left-skew p-hacking remains when we look at the entire range .04 < p < .05 or when we inspect the second decimal. Taking reporting tendencies into account when selecting the bins to compare is especially important because this dataset does not allow recalculation of the p-values. Moreover, inspecting the bins that include two-decimal reported p-values potentially increases sensitivity if strategic rounding down of p-values, as a form of p-hacking, is widespread. Given the far-reaching implications of supposed widespread p-hacking throughout the sciences (Head et al., 2015), it is important that these findings are robust to data-analysis choices if the conclusion is to be considered unequivocal. Although no evidence of widespread left-skew p-hacking is found in this reanalysis, this does not mean that there is no p-hacking at all. These results nuance the conclusion of Head et al. (2015), indicating that their results are not robust and that the evidence for widespread left-skew p-hacking is ambiguous at best.
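The bin comparison described above reduces to a simple binomial test; a minimal sketch (the bin endpoints follow the abstract, while the function name and test framing are illustrative):

```python
import numpy as np
from scipy.stats import binomtest

def bump_test(p_values, lo_bin=(0.03875, 0.04), hi_bin=(0.04875, 0.05)):
    """Left-skew p-hacking predicts an excess of p-values in the bin just
    below .05 relative to a lower comparison bin. The default bins follow
    the reanalysis: each contains one two-decimal value (.04 or .05) plus
    the three-decimal values that round to it."""
    p = np.asarray(p_values, float)
    n_lo = int(np.sum((p > lo_bin[0]) & (p <= lo_bin[1])))
    n_hi = int(np.sum((p > hi_bin[0]) & (p <= hi_bin[1])))
    # under a smoothly decreasing p-curve the high bin should not exceed the
    # low one, so testing the high-bin share against 0.5 is conservative
    return binomtest(n_hi, n_lo + n_hi, 0.5, alternative="greater")

# usage: bump_test(extracted_p_values).pvalue
```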
Affiliation(s)
- Chris H J Hartgerink
- Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
33
Szucs D, Ioannidis JPA. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biol 2017; 15:e2000797. [PMID: 28253258 PMCID: PMC5333800 DOI: 10.1371/journal.pbio.2000797] [Citation(s) in RCA: 346] [Impact Index Per Article: 49.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2016] [Accepted: 02/06/2017] [Indexed: 11/19/2022] Open
Abstract
We have empirically assessed the distribution of published effect sizes and estimated power by analyzing 26,841 statistical records from 3,801 cognitive neuroscience and psychology papers published recently. The reported median effect size was D = 0.93 (interquartile range: 0.64-1.46) for nominally statistically significant results and D = 0.24 (0.11-0.42) for nonsignificant results. Median power to detect small, medium, and large effects was 0.12, 0.44, and 0.73, reflecting no improvement through the past half-century. This is so because sample sizes have remained small. Assuming similar true effect sizes in both disciplines, power was lower in cognitive neuroscience than in psychology. Journal impact factors negatively correlated with power. Assuming a realistic range of prior probabilities for null hypotheses, false report probability is likely to exceed 50% for the whole literature. In light of our findings, the recently reported low replication success in psychology is realistic, and worse performance may be expected for cognitive neuroscience.
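The two calculations behind these claims, power via the noncentral t distribution and the false report probability, can be sketched as follows (a toy illustration with assumed cell sizes and priors, not the authors' pipeline):

```python
import numpy as np
from scipy.stats import nct, t as t_dist

def two_sample_power(n_per_group, d, alpha=0.05):
    """Power of a two-sided two-sample t-test to detect a true standardized
    effect d, computed from the noncentral t distribution."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)      # noncentrality parameter
    t_crit = t_dist.ppf(1 - alpha / 2, df)
    return nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)

def false_report_probability(power, prior_h1, alpha=0.05):
    """P(H0 true | significant result) given a prior probability of H1."""
    false_pos = alpha * (1 - prior_h1)
    true_pos = power * prior_h1
    return false_pos / (false_pos + true_pos)

# with n = 20 per group and only 1 in 10 tested hypotheses true, the false
# report probability exceeds 50% even for medium effects
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    pw = two_sample_power(20, d)
    print(label, round(pw, 2), round(false_report_probability(pw, 0.1), 2))
```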
Affiliation(s)
- Denes Szucs
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- John P. A. Ioannidis
- Meta-Research Innovation Center at Stanford (METRICS) and Department of Medicine, Department of Health Research and Policy, and Department of Statistics, Stanford University, Stanford, California, United States of America
34
Hartgerink CHJ. 688,112 Statistical Results: Content Mining Psychology Articles for Statistical Test Results. Data 2016; 1:14. [DOI: 10.3390/data1030014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
35
Lakens D. On the challenges of drawing conclusions from p-values just below 0.05. PeerJ 2015; 3:e1142. [PMID: 26246976 PMCID: PMC4525697 DOI: 10.7717/peerj.1142] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2015] [Accepted: 07/10/2015] [Indexed: 11/20/2022] Open
Abstract
In recent years, researchers have attempted to provide an indication of the prevalence of inflated Type 1 error rates by analyzing the distribution of p-values in the published literature. De Winter & Dodou (2015) analyzed the distribution (and its change over time) of a large number of p-values automatically extracted from abstracts in the scientific literature. They concluded there is a 'surge of p-values between 0.041-0.049 in recent decades' which 'suggests (but does not prove) questionable research practices have increased over the past 25 years.' I show that the changes over the years in the ratio of fractions of p-values between 0.041-0.049 are better explained by assuming that average power has decreased over time. Furthermore, I propose that their observation that p-values just below 0.05 increase more strongly than p-values above 0.05 can be explained by an increase in publication bias (or the file-drawer effect) over the years (cf. Fanelli, 2012; Pautasso, 2010), which has led to a relative decrease of 'marginally significant' p-values in abstracts in the literature (instead of an increase in p-values just below 0.05). I explain why researchers analyzing large numbers of p-values need to relate their assumptions to a model of p-value distributions that takes into account the average power of the performed studies, the ratio of true positives to false positives in the literature, the effects of publication bias, and the Type 1 error rate (and possible mechanisms through which it has become inflated). Finally, I discuss why publication bias and underpowered studies might be a bigger problem for science than inflated Type 1 error rates, and explain the challenges when attempting to draw conclusions about inflated Type 1 error rates from a large heterogeneous set of p-values.
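The kind of model the abstract calls for can be sketched as a toy mixture: a share of studies test a true effect at some average power, the rest test a null, and publication bias down-weights nonsignificant results (all parameter values below are assumptions for illustration, not estimates from the paper):

```python
import numpy as np
from scipy.stats import norm

def band_share(a, b, power, prior_h1, alpha=0.05, w_nonsig=0.1):
    """Expected share of published one-sided-z p-values falling in (a, b),
    mixing true-effect studies (at the given power) with null studies, and
    publishing nonsignificant results only with probability w_nonsig."""
    ncp = norm.ppf(1 - alpha) + norm.ppf(power)  # noncentrality matching power

    def mass(lo, hi, d):
        # P(lo < p < hi) for z ~ N(d, 1), with p the one-sided p-value
        return norm.cdf(norm.ppf(1 - lo) - d) - norm.cdf(norm.ppf(1 - hi) - d)

    def published(lo, hi):
        m = prior_h1 * mass(lo, hi, ncp) + (1 - prior_h1) * mass(lo, hi, 0.0)
        return m * (1.0 if hi <= alpha else w_nonsig)  # bias against nonsig

    total = published(0.0, alpha) + published(alpha, 1.0)
    return published(a, b) / total

# the share of p-values between .041 and .049 grows as average power falls,
# with no p-hacking in the model at all
for pw in (0.8, 0.5, 0.3):
    print(pw, round(band_share(0.041, 0.049, power=pw, prior_h1=0.5), 4))
```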
Affiliation(s)
- Daniël Lakens
- School of Innovation Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands