1. Holzmeister F, Johannesson M, Böhm R, Dreber A, Huber J, Kirchler M. Heterogeneity in effect size estimates. Proc Natl Acad Sci U S A 2024; 121:e2403490121. PMID: 39078672; PMCID: PMC11317577; DOI: 10.1073/pnas.2403490121.
Abstract
A typical empirical study involves choosing a sample, a research design, and an analysis path. Variation in such choices across studies leads to heterogeneity in results that introduce an additional layer of uncertainty, limiting the generalizability of published scientific findings. We provide a framework for studying heterogeneity in the social sciences and divide heterogeneity into population, design, and analytical heterogeneity. Our framework suggests that after accounting for heterogeneity, the probability that the tested hypothesis is true for the average population, design, and analysis path can be much lower than implied by nominal error rates of statistically significant individual studies. We estimate each type's heterogeneity from 70 multilab replication studies, 11 prospective meta-analyses of studies employing different experimental designs, and 5 multianalyst studies. In our data, population heterogeneity tends to be relatively small, whereas design and analytical heterogeneity are large. Our results should, however, be interpreted cautiously due to the limited number of studies and the large uncertainty in the heterogeneity estimates. We discuss several ways to parse and account for heterogeneity in the context of different methodologies.
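A minimal numerical sketch (not from the paper) of the framework's core point: once heterogeneity in true effects across populations, designs, and analysis paths is acknowledged, the probability that the average effect has the hypothesized sign can be far lower than a single significant study suggests. All numbers below are illustrative assumptions.

```python
from scipy.stats import norm

# Illustrative (assumed) numbers for a single "significant" study.
estimate = 0.20   # observed effect size
se = 0.10         # standard error (z = 2.0, p < 0.05 two-sided)
tau = 0.25        # assumed SD of true effects across populations,
                  # designs, and analysis paths (heterogeneity)

# Treat the study as one draw from a distribution of true effects with SD tau;
# with a flat prior, P(average true effect > 0) shrinks toward 0.5 as tau grows.
print(f"Ignoring heterogeneity:    P(effect > 0) ~ {norm.cdf(estimate / se):.3f}")
print(f"Accounting for tau = {tau}: P(average effect > 0) ~ "
      f"{norm.cdf(estimate / (se**2 + tau**2) ** 0.5):.3f}")
```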
2. Sarafoglou A, Hoogeveen S, van den Bergh D, Aczel B, Albers CJ, Althoff T, Botvinik-Nezer R, Busch NA, Cataldo AM, Devezer B, van Dongen NNN, Dreber A, Fried EI, Hoekstra R, Hoffman S, Holzmeister F, Huber J, Huntington-Klein N, Ioannidis J, Johannesson M, Kirchler M, Loken E, Mangin JF, Matzke D, Menkveld AJ, Nilsonne G, van Ravenzwaaij D, Schweinsberg M, Schulz-Kuempel H, Shanks DR, Simons DJ, Spellman BA, Stoevenbelt AH, Szaszi B, Trübutschek D, Tuerlinckx F, Uhlmann EL, Vanpaemel W, Wicherts J, Wagenmakers EJ. Subjective evidence evaluation survey for many-analysts studies. Royal Society Open Science 2024; 11:240125. PMID: 39050728; PMCID: PMC11265885; DOI: 10.1098/rsos.240125.
Abstract
Many-analysts studies explore how well an empirical claim withstands plausible alternative analyses of the same dataset by multiple, independent analysis teams. Conclusions from these studies typically rely on a single outcome metric (e.g. effect size) provided by each analysis team. Although informative about the range of plausible effects in a dataset, a single effect size from each team does not provide a complete, nuanced understanding of how analysis choices are related to the outcome. We used the Delphi consensus technique with input from 37 experts to develop an 18-item subjective evidence evaluation survey (SEES) to evaluate how each analysis team views the methodological appropriateness of the research design and the strength of evidence for the hypothesis. We illustrate the usefulness of the SEES in providing richer evidence assessment with pilot data from a previous many-analysts study.
3. Francioli SP, Shakeri A, North MS. Americans harbor much less favorable explicit sentiments toward young adults than toward older adults. Proc Natl Acad Sci U S A 2024; 121:e2311009121. PMID: 38885376; PMCID: PMC11213976; DOI: 10.1073/pnas.2311009121.
Abstract
Public and academic discourse on ageism focuses primarily on prejudices targeting older adults, implicitly assuming that this age group experiences the most age bias. We test this assumption in a large, preregistered study surveying Americans' explicit sentiments toward young, middle-aged, and older adults. Contrary to certain expectations about the scope and nature of ageism, responses from two crowdsourced online samples matched to the US adult population (N = 1,820) revealed that older adults garner the most favorable sentiments and young adults, the least favorable ones. This pattern held across a wide range of participant demographics and outcome variables, in both samples. Signaling derogation of young adults more than benign liking of older adults, participants high in social dominance orientation (SDO, a key antecedent of group prejudice) expressed even less favorable sentiments toward young adults and even more favorable ones toward older adults. In two follow-up, preregistered forecasting surveys, lay participants (N = 500) were generally quite accurate at predicting these results; in contrast, social scientists (N = 241) underestimated how unfavorably respondents viewed young adults and how favorably they viewed older adults. In fact, the more expertise in ageism scientists had, the more biased their forecasts. In a rapidly aging world with exacerbated concerns over older adults' welfare, young adults also face increasing economic, social, political, and ecological hardship. Our findings highlight the need for policymakers and social scientists to broaden their understanding of age biases and develop theory and policies that address discrimination targeting all age groups.
4. González-Márquez R, Schmidt L, Schmidt BM, Berens P, Kobak D. The landscape of biomedical research. Patterns (New York, N.Y.) 2024; 5:100968. PMID: 39005482; PMCID: PMC11240179; DOI: 10.1016/j.patter.2024.100968.
Abstract
The number of publications in biomedicine and life sciences has grown so much that it is difficult to keep track of new scientific works and to have an overview of the evolution of the field as a whole. Here, we present a two-dimensional (2D) map of the entire corpus of biomedical literature, based on the abstract texts of 21 million English articles from the PubMed database. To embed the abstracts into 2D, we used the large language model PubMedBERT, combined with t-SNE tailored to handle samples of this size. We used our map to study the emergence of the COVID-19 literature, the evolution of the neuroscience discipline, the uptake of machine learning, the distribution of gender imbalance in academic authorship, and the distribution of retracted paper mill articles. Furthermore, we present an interactive website that allows easy exploration and will enable further insights and facilitate future research.
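A minimal sketch of the general pipeline the abstract describes (transformer embeddings of abstract texts, then a 2D t-SNE projection). The Hugging Face model identifier, the mean pooling, and the t-SNE settings below are assumptions for illustration; the paper's own pipeline for 21 million abstracts relies on a t-SNE implementation tailored to that scale.

```python
# Sketch only; assumes the model id below and crude mean pooling.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.manifold import TSNE

abstracts = ["First example abstract ...", "Second example abstract ...",
             "Third example abstract ...", "Fourth example abstract ..."]

model_id = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"  # assumed id
tok = AutoTokenizer.from_pretrained(model_id)
bert = AutoModel.from_pretrained(model_id)

with torch.no_grad():
    batch = tok(abstracts, padding=True, truncation=True, return_tensors="pt")
    hidden = bert(**batch).last_hidden_state        # (n, tokens, 768)
    emb = hidden.mean(dim=1).numpy()                # crude mean pooling over tokens

xy = TSNE(n_components=2, init="pca",
          perplexity=min(30, len(abstracts) - 1)).fit_transform(emb)
print(xy.shape)  # one (x, y) map coordinate per abstract
```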
5. Stafford T, Rombach I, Hind D, Mateen B, Woods HB, Dimario M, Wilsdon J. Where next for partial randomisation of research funding? The feasibility of RCTs and alternatives. Wellcome Open Res 2024; 8:309. PMID: 37663796; PMCID: PMC10474338; DOI: 10.12688/wellcomeopenres.19565.1.
Abstract
We outline essential considerations for any study of partial randomisation of research funding, and consider scenarios in which randomised controlled trials (RCTs) would be feasible and appropriate. We highlight the interdependence of target outcomes, sample availability and statistical power for determining the cost and feasibility of a trial. For many choices of target outcome, RCTs may be less practical and more expensive than they at first appear (in large part due to issues pertaining to sample size and statistical power). As such, we briefly discuss alternatives to RCTs. It is worth noting that many of the considerations relevant to experiments on partial randomisation may also apply to other potential experiments on funding processes (as described in The Experimental Research Funder's Handbook. RoRI, June 2022).
6. Clark CJ, Fjeldmark M, Lu L, Baumeister RF, Ceci S, Frey K, Miller G, Reilly W, Tice D, von Hippel W, Williams WM, Winegard BM, Tetlock PE. Taboos and Self-Censorship Among U.S. Psychology Professors. Perspectives on Psychological Science 2024:17456916241252085. PMID: 38752984; DOI: 10.1177/17456916241252085.
Abstract
We identify points of conflict and consensus regarding (a) controversial empirical claims and (b) normative preferences for how controversial scholarship-and scholars-should be treated. In 2021, we conducted qualitative interviews (n = 41) to generate a quantitative survey (N = 470) of U.S. psychology professors' beliefs and values. Professors strongly disagreed on the truth status of 10 candidate taboo conclusions: For each conclusion, some professors reported 100% certainty in its veracity and others 100% certainty in its falsehood. Professors more confident in the truth of the taboo conclusions reported more self-censorship, a pattern that could bias perceived scientific consensus regarding the inaccuracy of controversial conclusions. Almost all professors worried about social sanctions if they were to express their own empirical beliefs. Tenured professors reported as much self-censorship and as much fear of consequences as untenured professors, including fear of getting fired. Most professors opposed suppressing scholarship and punishing peers on the basis of moral concerns about research conclusions and reported contempt for peers who petition to retract papers on moral grounds. Younger, more left-leaning, and female faculty were generally more opposed to controversial scholarship. These results do not resolve empirical or normative disagreements among psychology professors, but they may provide an empirical context for their discussion.
7. Porto de Oliveira JVM, de Oliveira Júnior ALF, de Freitas Martins LP, Dourado HN, Purificação IR, Kolias AG, Paiva WS, Solla DJF. Spin in traumatic brain injury literature: prevalence and associated factors. A systematic review. J Neurosurg 2024:1-8. PMID: 38728757; DOI: 10.3171/2023.11.jns231822.
Abstract
OBJECTIVE: Spin is characterized as a misinterpretation of results that, whether deliberate or unintentional, culminates in misleading conclusions and steers readers toward an excessively optimistic perspective of the data. The primary objective of this systematic review was to estimate the prevalence and nature of spin within the traumatic brain injury (TBI) literature. Additionally, the identification of associated factors is intended to provide guidance for future research practices.
METHODS: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations were followed. A search of the MEDLINE/PubMed database was conducted to identify English-language articles published between January 1960 and July 2020. Inclusion criteria encompassed randomized controlled trials (RCTs) that exclusively enrolled TBI patients, investigating various interventions, whether surgical or nonsurgical, and that were published in high-impact journals. Spin was defined as 1) a focus on statistically significant results not based on the primary outcome; 2) interpreting statistically nonsignificant results for a superiority analysis of the primary outcome; 3) claiming or emphasizing the beneficial effect of the treatment despite statistically nonsignificant results; 4) conclusions focused on the per-protocol or as-treated analysis instead of the intention-to-treat (ITT) results; 5) incorrect statistical analysis; or 6) republication of a significant secondary analysis without proper acknowledgment of the primary outcome analysis result. Primary outcomes were those explicitly reported as such in the published article. Studies without a clear primary outcome were excluded. Study characteristics were described using traditional descriptive statistics, and an exploratory inferential analysis was performed to identify characteristics associated with spin. The studies' risk of bias was evaluated with the Cochrane Risk of Bias tool.
RESULTS: A total of 150 RCTs were included, and 22% (n = 33) had spin, most commonly spin types 1 and 3. The overall risk of bias (p < 0.001), a neurosurgery department member as the first author (p = 0.009), absence of a statistician among authors (p = 0.042), and smaller sample sizes (p = 0.033) were associated with spin.
CONCLUSIONS: The prevalence of spin in the TBI literature is high, even in leading medical journals. Studies with higher risk of bias are more frequently associated with spin. Critical interpretation of results and authors' conclusions is advisable regardless of the study design and the journal of publication.
8. Slaney KL, Graham ME, Dhillon RS, Hohn RE. Rhetoric of psychological measurement theory and practice. Front Psychol 2024; 15:1374330. PMID: 38699572; PMCID: PMC11064813; DOI: 10.3389/fpsyg.2024.1374330.
Abstract
Metascience scholars have long been concerned with tracking the use of rhetorical language in scientific discourse, oftentimes to analyze the legitimacy and validity of scientific claim-making. Psychology, however, has only recently become the explicit target of such metascientific scholarship, much of which has been in response to the recent crises surrounding replicability of quantitative research findings and questionable research practices. The focus of this paper is on the rhetoric of psychological measurement and validity scholarship, in both the theoretical and methodological and empirical literatures. We examine various discourse practices in published psychological measurement and validity literature, including: (a) clear instances of rhetoric (i.e., persuasion or performance); (b) common or rote expressions and tropes (e.g., perfunctory claims or declarations); (c) metaphors and other "literary" styles; and (d) ambiguous, confusing, or unjustifiable claims. The methodological approach we use is informed by a combination of conceptual analysis and exploratory grounded theory, the latter of which we used to identify relevant themes within the published psychological discourse. Examples of both constructive and useful or misleading and potentially harmful discourse practices will be given. Our objectives are both to contribute to the critical methodological literature on psychological measurement and connect metascience in psychology to broader interdisciplinary examinations of science discourse.
9. Grimes DR. Region of Attainable Redaction, an extension of Ellipse of Insignificance analysis for gauging impacts of data redaction in dichotomous outcome trials. eLife 2024; 13:e93050. PMID: 38284745; PMCID: PMC10871715; DOI: 10.7554/elife.93050.
Abstract
In biomedical science, it is a reality that many published results do not withstand deeper investigation, and there is growing concern over a replicability crisis in science. Recently, Ellipse of Insignificance (EOI) analysis was introduced as a tool to allow researchers to gauge the robustness of reported results in dichotomous outcome design trials, giving precise deterministic values for the degree of miscoding between events and non-events tolerable simultaneously in both control and experimental arms (Grimes, 2022). While this is useful for situations where potential miscoding might transpire, it does not account for situations where apparently significant findings might result from accidental or deliberate data redaction in either the control or experimental arms of an experiment, or from missing data or systematic redaction. To address these scenarios, we introduce Region of Attainable Redaction (ROAR), a tool that extends EOI analysis to account for situations of potential data redaction. This produces a bounded cubic curve rather than an ellipse, and we outline how this can be used to identify potential redaction through an approach analogous to EOI. Applications are illustrated, and source code, including a web-based implementation that performs EOI and ROAR analysis in tandem for dichotomous outcome trials, is provided.
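The spirit of the analysis can be illustrated with a brute-force sketch for a 2x2 dichotomous-outcome table: how few hypothetically redacted participants, if restored, would be enough to erase a significant result? This is only an analogy to ROAR's closed-form treatment, with made-up trial counts; the paper's own source code and web tool implement the actual method.

```python
from scipy.stats import fisher_exact

# Made-up trial counts: events and totals per arm (treatment looks beneficial).
ctrl_events, ctrl_total = 30, 100
trt_events,  trt_total  = 15, 100

def pval(ce, cn, te, tn):
    return fisher_exact([[ce, cn], [te, tn]])[1]

p0 = pval(ctrl_events, ctrl_total - ctrl_events, trt_events, trt_total - trt_events)
print(f"reported p = {p0:.4f}")

# Brute force: restore hypothetically redacted events to the experimental arm
# until the comparison is no longer significant at alpha = 0.05.
for redacted in range(0, 100):
    p = pval(ctrl_events, ctrl_total - ctrl_events,
             trt_events + redacted, trt_total - trt_events)
    if p >= 0.05:
        print(f"restoring {redacted} redacted events would give p = {p:.3f}")
        break
```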
10. Grimes DR. Is biomedical research self-correcting? Modelling insights on the persistence of spurious science. Royal Society Open Science 2024; 11:231056. PMID: 38298396; PMCID: PMC10827424; DOI: 10.1098/rsos.231056.
Abstract
The reality that volumes of published biomedical research are not reproducible is an increasingly recognized problem. Spurious results reduce the trustworthiness of reported science, increasing research waste. While science should be self-correcting from a philosophical perspective, that in isolation yields no information on the effort required to nullify suspect findings or on the factors shaping how quickly science may be corrected. There is also a paucity of information on how perverse incentives in the publishing ecosystem, which favour novel positive findings over null results, shape the ability of published science to self-correct. Knowledge of the factors shaping the self-correction of science remains obscure, limiting our ability to mitigate harms. This modelling study introduces a simple model to capture the dynamics of the publication ecosystem, exploring factors influencing research waste, trustworthiness, corrective effort and time to correction. Results from this work indicate that research waste and corrective effort are highly dependent on field-specific false positive rates and on the time delay before corrective results appear, during which spurious findings continue to propagate. The model also suggests conditions under which biomedical science is self-correcting and those under which the publication of correctives alone cannot stem the propagation of untrustworthy results. Finally, this work models a variety of potential mitigation strategies, including researcher- and publisher-driven interventions.
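A toy discrete-time sketch of the kind of publication-ecosystem dynamics the paper models: spurious positive findings enter a field at a field-specific false-positive rate and persist until a corrective eventually appears after some delay. All parameters are arbitrary illustrations, not the paper's calibrated model.

```python
import random

random.seed(1)

YEARS = 20
new_positive_papers = 100    # assumed new positive findings per year
false_positive_rate = 0.2    # assumed share of those that are spurious
p_challenge = 0.05           # assumed yearly chance a spurious finding is challenged
delay = 2                    # assumed years before a corrective can appear

uncorrected = []             # publication year of each uncorrected spurious paper
corrected = 0

for year in range(YEARS):
    uncorrected += [year] * int(new_positive_papers * false_positive_rate)
    survivors = []
    for y in uncorrected:
        if year - y >= delay and random.random() < p_challenge:
            corrected += 1          # a corrective is published
        else:
            survivors.append(y)     # spurious finding keeps circulating
    uncorrected = survivors
    print(f"year {year:2d}: {len(uncorrected):4d} uncorrected, {corrected:4d} corrected")
```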
11. Schneider J. Sorry we're open, come in we're closed: different profiles in the perceived applicability of open science practices to completed research projects. Royal Society Open Science 2024; 11:230595. PMID: 38298393; PMCID: PMC10827419; DOI: 10.1098/rsos.230595.
Abstract
Open science is an increasingly important topic for research, politics and funding agencies. However, the discourse on open science is heavily influenced by certain research fields and paradigms, leading to the risk of generalizing what counts as openness to other research fields, regardless of its applicability. In our paper, we provide evidence that researchers perceive different profiles in the potential to apply open science practices to their projects, making a one-size-fits-all approach unsuitable. In a pilot study, we first systematized the breadth of open science practices. The subsequent survey study examined the perceived applicability of 13 open science practices across completed research projects in a broad variety of research disciplines. We were able to identify four different profiles in the perceived applicability of open science practices. For researchers conducting qualitative-empirical research projects, comprehensively implementing the breadth of open science practices tends not to be feasible. Further, research projects from some disciplines tended to fit a profile with little opportunity for public participation. Yet, disciplines and research paradigms appear not to be the key factors in predicting the perceived applicability of open science practices. Our findings underscore the case for considering project-related conditions when implementing open science practices. This has implications for the establishment of policies, guidelines and standards concerning open science.
12. Watson NM, Thomas JD. Studying Adherence to Reporting Standards in Kinesiology: A Post-publication Peer Review Brief Report. International Journal of Exercise Science 2024; 17:25-37. PMID: 38666001; PMCID: PMC11042891.
Abstract
To demonstrate how post-publication peer reviews-using journal article reporting standards-could improve the design and write-up of kinesiology research, the authors performed a post-publication peer review on one systematic literature review published in 2020. Two raters (1st & 2nd authors) critically appraised the case article between April and May 2021. The latest Journal Article Reporting Standards by the American Psychological Association relevant to the review were used: i.e., Table 1 (quantitative research standards) and Table 9 (research synthesis standards). A standard fully met was deemed satisfactory. Per Krippendorff's alpha-coefficient, inter-rater agreement was moderate for Table 1 (k-alpha = .57, raw-agreement = 72.2%) and poor for Table 9 (k-alpha = .09, raw-agreement = 53.6%). A 100% consensus was reached on all discrepancies. Results suggest the case article's Abstract, Methods, and Discussion sections required clarification or more detail. Per Table 9 standards, four sections were largely incomplete: i.e., Abstract (100%-incomplete), Introduction (66%-incomplete), Methods (75%-incomplete), and Discussion (66%-incomplete). Case article strengths included tabular summary of studies analyzed in the systematic review and a cautionary comment about the review's generalizability. The article's write-up gave detail to help the reader understand the scope of the study and decisions made by the authors. However, adequate detail was not provided to assess the credibility of all claims made in the article. This could affect readers' ability to obtain critical and nuanced understanding of the article's topics. The results of this critique should encourage (continuing) education on journal article reporting standards for diverse stakeholders (e.g., authors, reviewers).
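A minimal sketch of the agreement statistic reported above: Krippendorff's alpha for two raters coding each reporting standard as met (1) or not met (0). The ratings are invented, and the third-party `krippendorff` package is assumed here as one common implementation.

```python
# pip install krippendorff  (assumed third-party package)
import krippendorff
import numpy as np

# Invented ratings: rows = raters, columns = reporting-standard items,
# 1 = standard fully met, 0 = not met, np.nan = item not rated.
ratings = np.array([
    [1, 0, 1, 1, 0, 1, np.nan, 0, 1],   # rater 1
    [1, 0, 0, 1, 0, 1, 1,      0, 1],   # rater 2
])

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")

both_rated = ~np.isnan(ratings).any(axis=0)
raw = (ratings[0, both_rated] == ratings[1, both_rated]).mean()
print(f"Krippendorff's alpha = {alpha:.2f}, raw agreement = {raw:.0%}")
```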
13. Qian W, Zhang C, Piersiak HA, Humphreys KL, Mitchell C. Biomarker adoption in developmental science: A data-driven modelling of trends from 90 biomarkers across 20 years. Infant and Child Development 2024; 33:e2366. PMID: 38389732; PMCID: PMC10882483; DOI: 10.1002/icd.2366.
Abstract
Developmental scientists have adopted numerous biomarkers in their research to better understand the biological underpinnings of development, environmental exposures, and variation in long-term health. Yet, adoption patterns merit investigation given the substantial resources used to collect, analyse, and train to use biomarkers in research with infants and children. We document trends in use of 90 biomarkers between 2000 and 2020 from approximately 430,000 publications indexed by the Web of Science. We provide a tool for researchers to examine each of these biomarkers individually using a data-driven approach to estimate the biomarker growth trajectory based on yearly publication number, publication growth rate, number of author affiliations, National Institutes of Health dedicated funding resources, journal impact factor, and years since the first publication. Results indicate that most biomarkers fit a "learning curve" trajectory (i.e., experience rapid growth followed by a plateau), though a small subset decline in use over time.
14. Dresler M. FENS-Kavli Network of Excellence: Postponed, non-competitive peer review for research funding. Eur J Neurosci 2023; 58:4441-4448. PMID: 36085597; DOI: 10.1111/ejn.15818.
Abstract
Receiving research grants is among the highlights of an academic career, affirming previous accomplishments and enabling new research endeavours. Much of the process of acquiring research funding, however, ranks among the least favourite duties of many researchers: it is time consuming, often stressful and, in the majority of cases, unsuccessful. This resentment towards funding acquisition is backed up by empirical research: the current system of distributing research funding, via competitive calls for extensive research applications that undergo peer review, has repeatedly been shown to fail in its task of reliably ranking proposals according to their merit, while at the same time being highly inefficient. The simplest, fairest and most broadly supported alternative would be to distribute funding more equally across researchers, for example, by an increase of universities' base funding, thereby saving considerable time that can be spent on research instead. Here, I propose how to combine such a 'funding flat rate' model, or other efficient distribution strategies, with quality control through postponed, non-competitive peer review using open science practices.
15. Eidels A. Prior beliefs and the interpretation of scientific results. Royal Society Open Science 2023; 10:231613. PMID: 38126060; PMCID: PMC10731315; DOI: 10.1098/rsos.231613.
Abstract
How do prior beliefs affect the interpretation of scientific results? I discuss a hypothetical scenario where researchers publish results that could either support a theory they believe in, or refute that theory, and ask if the two instances carry the same weight. More colloquially, I ask if we should overweigh scientific results supporting a given theory and reported by a researcher, or a team, that initially did not support that theory. I illustrate the challenge using two examples from psychology: evidence accumulation models, and extra sensory perception.
16. Williams AJ, Botanov Y, Giovanetti AK, Perko VL, Sutherland CL, Youngren W, Sakaluk JK. A Metascientific Review of the Evidential Value of Acceptance and Commitment Therapy for Depression. Behav Ther 2023; 54:989-1005. PMID: 37863589; DOI: 10.1016/j.beth.2022.06.004.
Abstract
In the past three-and-a-half decades, nearly 500 randomized controlled trials (RCTs) have examined Acceptance and Commitment Therapy (ACT) for a range of health problems, including depression. However, emerging concerns regarding the replicability of scientific findings across psychology and mental health treatment outcome research highlight a need to re-examine the strength of evidence for treatment efficacy. Therefore, we conducted a metascientific review of the evidential value of ACT in treating depression. Whereas reporting accuracy was generally high across all trials, we found important differences in evidential value metrics corresponding to the types of control conditions used. RCTs of ACT compared to weaker controls (e.g., no treatment, waitlist) were well-powered, with sample sizes appropriate for detecting plausible effect sizes. They typically yielded stronger Bayesian evidence for (and larger posterior estimates of) ACT efficacy, though there was some evidence of significance inflation among these effects. RCTs of ACT against stronger controls (e.g., other psychotherapies), meanwhile, were poorly powered, designed to detect implausibly large effect sizes, and yielded ambiguous-if not contradicting-Bayesian evidence and estimates of efficacy. Although our review supports a view of ACT as efficacious for treating depression compared to weaker controls, future RCTs must provide more transparent reporting with larger groups of participants to properly assess the difference between ACT and competitor treatments such as behavioral activation and other forms of cognitive behavioral therapy. Clinicians and health organizations should reassess the use of ACT for depression if costs and resources are higher than for other efficacious treatments. Clinical trials contributing effects to our synthesis can be found at https://osf.io/qky35.
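The power critique can be made concrete with a standard a priori power calculation: the per-arm sample size needed to detect a plausible ACT-versus-active-treatment difference is far larger than what is needed against a waitlist. The effect sizes below are illustrative assumptions, not estimates from the review.

```python
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()

# Assumed illustrative effects: d ~ 0.6 vs weak controls (waitlist/no treatment),
# d ~ 0.2 vs strong controls (another bona fide psychotherapy).
for label, d in [("vs. weak control", 0.6), ("vs. strong control", 0.2)]:
    n_per_arm = power.solve_power(effect_size=d, alpha=0.05, power=0.80,
                                  alternative="two-sided")
    print(f"{label}: d = {d} -> ~{n_per_arm:.0f} participants per arm for 80% power")
```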
17. Boyce V, Mathur M, Frank MC. Eleven years of student replication projects provide evidence on the correlates of replicability in psychology. Royal Society Open Science 2023; 10:231240. PMID: 38026006; PMCID: PMC10645069; DOI: 10.1098/rsos.231240.
Abstract
Cumulative scientific progress requires empirical results that are robust enough to support theory construction and extension. Yet in psychology, some prominent findings have failed to replicate, and large-scale studies suggest replicability issues are widespread. The identification of predictors of replication success is limited, however, by the difficulty of conducting large samples of independent replication experiments: most investigations reanalyse the same set of 170 replications. We introduce a new dataset of 176 replications from students in a graduate-level methods course. Replication results were judged to be successful in 49% of replications; of the 136 where effect sizes could be numerically compared, 46% had point estimates within the prediction interval of the original outcome (versus the expected 95%). Larger original effect sizes and within-participants designs were especially related to replication success. Our results indicate that, consistent with prior reports, the robustness of the psychology literature is low enough to limit cumulative progress by student investigators.
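A minimal sketch of the consistency criterion mentioned above: a replication counts as consistent if its point estimate lies inside the 95% prediction interval implied by the original estimate and both standard errors (a standard normal-theory formula; the numbers are invented).

```python
from scipy.stats import norm

# Invented standardized effects and standard errors.
orig_effect, orig_se = 0.45, 0.15
rep_effect,  rep_se  = 0.10, 0.12

# 95% prediction interval for a replication of the original result.
half_width = norm.ppf(0.975) * (orig_se**2 + rep_se**2) ** 0.5
lo, hi = orig_effect - half_width, orig_effect + half_width

verdict = "inside" if lo <= rep_effect <= hi else "outside"
print(f"95% prediction interval: [{lo:.2f}, {hi:.2f}]; "
      f"replication estimate {rep_effect} falls {verdict}")
```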
18. Pownall M, Talbot CV, Kilby L, Branney P. Opportunities, challenges and tensions: Open science through a lens of qualitative social psychology. British Journal of Social Psychology 2023; 62:1581-1589. PMID: 36718588; DOI: 10.1111/bjso.12628.
Abstract
In recent years, there has been a focus in social psychology on efforts to improve the robustness, rigour, transparency and openness of psychological research. This has led to a plethora of new tools, practices and initiatives that each aim to combat questionable research practices and improve the credibility of social psychological scholarship. However, the majority of these efforts derive from quantitative, deductive, hypothesis-testing methodologies, and there has been a notable lack of in-depth exploration about what the tools, practices and values may mean for research that uses qualitative methodologies. Here, we introduce a Special Section of BJSP: Open Science, Qualitative Methods and Social Psychology: Possibilities and Tensions. The authors critically discuss a range of issues, including authorship, data sharing and broader research practices. Taken together, these papers urge the discipline to carefully consider the ontological, epistemological and methodological underpinnings of efforts to improve psychological science, and advocate for a critical appreciation of how mainstream open science discourse may (or may not) be compatible with the goals of qualitative research.
19. Thompson WH, Skau S. On the scope of scientific hypotheses. Royal Society Open Science 2023; 10:230607. PMID: 37650069; PMCID: PMC10465209; DOI: 10.1098/rsos.230607.
Abstract
Hypotheses are frequently the starting point when undertaking the empirical portion of the scientific process. They state something that the scientific process will attempt to evaluate, corroborate, verify or falsify. Their purpose is to guide the types of data we collect, the analyses we conduct, and the inferences we would like to make. Over the last decade, metascience has advocated for stating hypotheses in preregistrations or registered reports, but how to formulate these hypotheses has received less attention. Here, we argue that hypotheses can vary in specificity along at least three independent dimensions: the relationship, the variables, and the pipeline. Together, these dimensions form the scope of the hypothesis. We demonstrate how narrowing the scope of a hypothesis along any of these three dimensions reduces the hypothesis space and that this reduction is a type of novelty. Finally, we discuss how this formulation can guide researchers toward an appropriate scope for their hypotheses, one that is neither too broad nor too narrow. This framework can help hypothesis-makers when formulating their hypotheses by clarifying what is being tested, chaining results to previously known findings, and demarcating what is explicitly tested in the hypothesis.
20. Clark CJ, Connor P, Isch C. Failing to replicate predicts citation declines in psychology. Proc Natl Acad Sci U S A 2023; 120:e2304862120. PMID: 37428904; PMCID: PMC10629524; DOI: 10.1073/pnas.2304862120.
Abstract
With a sample of 228 psychology papers that failed to replicate, we tested whether the trajectory of citation patterns changes following the publication of a failure to replicate. Across models, we found consistent evidence that failing to replicate predicted lower future citations and that the size of this reduction increased over time. In a 14-y postpublication period, we estimated that the publication of a failed replication was associated with an average citation decline of 14% for original papers. These findings suggest that the publication of failed replications may contribute to a self-correcting science by decreasing scholars' reliance on unreplicable original findings.
21. Huber C, Dreber A, Huber J, Johannesson M, Kirchler M, Weitzel U, Abellán M, Adayeva X, Ay FC, Barron K, Berry Z, Bönte W, Brütt K, Bulutay M, Campos-Mercade P, Cardella E, Claassen MA, Cornelissen G, Dawson IGJ, Delnoij J, Demiral EE, Dimant E, Doerflinger JT, Dold M, Emery C, Fiala L, Fiedler S, Freddi E, Fries T, Gasiorowska A, Glogowsky U, M Gorny P, Gretton JD, Grohmann A, Hafenbrädl S, Handgraaf M, Hanoch Y, Hart E, Hennig M, Hudja S, Hütter M, Hyndman K, Ioannidis K, Isler O, Jeworrek S, Jolles D, Juanchich M, Kc RP, Khadjavi M, Kugler T, Li S, Lucas B, Mak V, Mechtel M, Merkle C, Meyers EA, Mollerstrom J, Nesterov A, Neyse L, Nieken P, Nussberger AM, Palumbo H, Peters K, Pirrone A, Qin X, Rahal RM, Rau H, Rincke J, Ronzani P, Roth Y, Saral AS, Schmitz J, Schneider F, Schram A, Schudy S, Schweitzer ME, Schwieren C, Scopelliti I, Sirota M, Sonnemans J, Soraperra I, Spantig L, Steimanis I, Steinmetz J, Suetens S, Theodoropoulou A, Urbig D, Vorlaufer T, Waibel J, Woods D, Yakobi O, Yilmaz O, Zaleskiewicz T, Zeisberger S, Holzmeister F. Competition and moral behavior: A meta-analysis of forty-five crowd-sourced experimental designs. Proc Natl Acad Sci U S A 2023; 120:e2215572120. PMID: 37252958; DOI: 10.1073/pnas.2215572120.
Abstract
Does competition affect moral behavior? This fundamental question has been debated among leading scholars for centuries, and more recently, it has been tested in experimental studies yielding a body of rather inconclusive empirical evidence. A potential source of ambivalent empirical results on the same hypothesis is design heterogeneity-variation in true effect sizes across various reasonable experimental research protocols. To provide further evidence on whether competition affects moral behavior and to examine whether the generalizability of a single experimental study is jeopardized by design heterogeneity, we invited independent research teams to contribute experimental designs to a crowd-sourced project. In a large-scale online data collection, 18,123 experimental participants were randomly allocated to 45 randomly selected experimental designs out of 95 submitted designs. We find a small adverse effect of competition on moral behavior in a meta-analysis of the pooled data. The crowd-sourced design of our study allows for a clean identification and estimation of the variation in effect sizes above and beyond what could be expected due to sampling variance. We find substantial design heterogeneity-estimated to be about 1.6 times as large as the average standard error of effect size estimates of the 45 research designs-indicating that the informativeness and generalizability of results based on a single experimental design are limited. Drawing strong conclusions about the underlying hypotheses in the presence of substantive design heterogeneity requires moving toward much larger data collections on various experimental designs testing the same hypothesis.
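The headline comparison (design heterogeneity roughly 1.6 times the average standard error) can be sketched with a standard random-effects calculation on design-level effects: estimate the between-design SD tau and divide it by the mean standard error. The data below are simulated placeholders, and DerSimonian-Laird is just one common tau estimator, not necessarily the one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated placeholders: one effect estimate and standard error per design.
k = 45
true_effects = rng.normal(-0.10, 0.16, size=k)   # assumed between-design spread
se = rng.uniform(0.05, 0.15, size=k)
est = rng.normal(true_effects, se)

# DerSimonian-Laird estimate of the between-design variance tau^2.
w = 1.0 / se**2
fixed = np.sum(w * est) / np.sum(w)
Q = np.sum(w * (est - fixed) ** 2)
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau = np.sqrt(max(0.0, (Q - (k - 1)) / c))

print(f"tau = {tau:.3f}, mean SE = {se.mean():.3f}, ratio = {tau / se.mean():.2f}")
```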
22. Wintle BC, Smith ET, Bush M, Mody F, Wilkinson DP, Hanea AM, Marcoci A, Fraser H, Hemming V, Thorn FS, McBride MF, Gould E, Head A, Hamilton DG, Kambouris S, Rumpff L, Hoekstra R, Burgman MA, Fidler F. Predicting and reasoning about replicability using structured groups. Royal Society Open Science 2023; 10:221553. PMID: 37293358; PMCID: PMC10245209; DOI: 10.1098/rsos.221553.
Abstract
This paper explores judgements about the replicability of social and behavioural sciences research and what drives those judgements. Using a mixed methods approach, it draws on qualitative and quantitative data elicited from groups using a structured approach called the IDEA protocol ('investigate', 'discuss', 'estimate' and 'aggregate'). Five groups of five people with relevant domain expertise evaluated 25 research claims that were subject to at least one replication study. Participants assessed the probability that each of the 25 research claims would replicate (i.e. that a replication study would find a statistically significant result in the same direction as the original study) and described the reasoning behind those judgements. We quantitatively analysed possible correlates of predictive accuracy, including self-rated expertise and updating of judgements after feedback and discussion. We qualitatively analysed the reasoning data to explore the cues, heuristics and patterns of reasoning used by participants. Participants achieved 84% classification accuracy in predicting replicability. Those who engaged in a greater breadth of reasoning provided more accurate replicability judgements. Some reasons were more commonly invoked by more accurate participants, such as 'effect size' and 'reputation' (e.g. of the field of research). There was also some evidence of a relationship between statistical literacy and accuracy.
23. Clark CJ, Graso M, Redstone I, Tetlock PE. Harm Hypervigilance in Public Reactions to Scientific Evidence. Psychol Sci 2023:9567976231168777. PMID: 37260038; DOI: 10.1177/09567976231168777.
Abstract
Two preregistered studies from two different platforms with representative U.S. adult samples (N = 1,865) tested the harm-hypervigilance hypothesis in risk assessments of controversial behavioral science. As expected, across six sets of scientific findings, people consistently overestimated others' harmful reactions (medium to large average effect sizes) and underestimated helpful ones, even when incentivized for accuracy. Additional analyses found that (a) harm overestimations were associated with support for censoring science, (b) people who were more offended by scientific findings reported greater difficulty understanding them, and (c) evidence was moderately consistent for an association between more conservative ideology and harm overestimations. These findings are particularly relevant because journals have begun evaluating potential downstream harms of scientific findings. We discuss implications of our work and invite scholars to develop rigorous tests of (a) the social pressures that lead science astray and (b) the actual costs and benefits of publishing or not publishing potentially controversial conclusions.
24. Buzbas EO, Devezer B, Baumgaertner B. The logical structure of experiments lays the foundation for a theory of reproducibility. Royal Society Open Science 2023; 10:221042. PMID: 36938532; PMCID: PMC10014247; DOI: 10.1098/rsos.221042.
Abstract
The scientific reform movement has proposed openness as a potential remedy to the putative reproducibility or replication crisis. However, the conceptual relationship among openness, replication experiments and results reproducibility has been obscure. We analyse the logical structure of experiments, define the mathematical notion of idealized experiment and use this notion to advance a theory of reproducibility. Idealized experiments clearly delineate the concepts of replication and results reproducibility, and capture key differences with precision, allowing us to study the relationship among them. We show how results reproducibility varies as a function of the elements of an idealized experiment, the true data-generating mechanism, and the closeness of the replication experiment to an original experiment. We clarify how openness of experiments is related to designing informative replication experiments and to obtaining reproducible results. With formal backing and evidence, we argue that the current 'crisis' reflects inadequate attention to a theoretical understanding of results reproducibility.
25. Kummerfeld E, Jones GL. One data set, many analysts: Implications for practicing scientists. Front Psychol 2023; 14:1094150. PMID: 36865366; PMCID: PMC9971968; DOI: 10.3389/fpsyg.2023.1094150.
Abstract
Researchers routinely face choices throughout the data analysis process. It is often opaque to readers how these choices are made, how they affect the findings, and whether or not data analysis results are unduly influenced by subjective decisions. This concern is spurring numerous investigations into the variability of data analysis results. The findings demonstrate that different teams analyzing the same data may reach different conclusions. This is the "many-analysts" problem. Previous research on the many-analysts problem focused on demonstrating its existence, without identifying specific practices for solving it. We address this gap by identifying three pitfalls that have contributed to the variability observed in many-analysts publications and providing suggestions on how to avoid them.