1. Pollo P, Lagisz M, Yang Y, Culina A, Nakagawa S. Synthesis of sexual selection: a systematic map of meta-analyses with bibliometric analysis. Biol Rev Camb Philos Soc 2024; 99:2134-2175. PMID: 38982618. DOI: 10.1111/brv.13117.
Abstract
Sexual selection has been a popular subject within evolutionary biology because of its central role in explaining odd and counterintuitive traits observed in nature. Consequently, the literature associated with this field of study has become vast. Meta-analytical studies attempting to draw inferences from this literature have now accumulated, varying in scope and quality, thus calling for a synthesis of these syntheses. We conducted a systematic literature search to create a systematic map with a report appraisal of meta-analyses on topics associated with sexual selection, aiming to identify the conceptual and methodological gaps in this secondary literature. We also conducted bibliometric analyses to explore whether these gaps are associated with the gender and origin of the authors of these meta-analyses. We included 152 meta-analytical studies in our systematic map. We found that most meta-analyses focused on males and on certain animal groups (e.g. birds), indicating severe sex and taxonomic biases. The topics in these studies varied greatly, from proximate (e.g. relationship of ornaments with other traits) to ultimate questions (e.g. formal estimates of sexual selection strength), although the former were more common. We also observed several common methodological issues in these studies, such as a lack of detailed information regarding searches, screening, and analyses, which ultimately impairs the reliability of many of these meta-analyses. In addition, most of the meta-analyses' authors were men affiliated with institutions from developed countries, pointing to both gender and geographical authorship biases. Most importantly, we found that certain authorship aspects were associated with conceptual and methodological issues in meta-analytical studies. Many of our findings might simply reflect patterns in the current state of the primary literature and academia, suggesting that our study can serve as an indicator of issues within the field of sexual selection at large. Based on our findings, we provide both conceptual and analytical recommendations to improve future studies in the field of sexual selection.
Affiliation(s)
- Pietro Pollo: Evolution & Ecology Research Centre, School of Biological, Earth & Environmental Sciences, University of New South Wales, Gate 9 High St., Kensington, Sydney, NSW 2052, Australia
- Malgorzata Lagisz: Evolution & Ecology Research Centre, School of Biological, Earth & Environmental Sciences, University of New South Wales, Gate 9 High St., Kensington, Sydney, NSW 2052, Australia
- Yefeng Yang: Evolution & Ecology Research Centre, School of Biological, Earth & Environmental Sciences, University of New South Wales, Gate 9 High St., Kensington, Sydney, NSW 2052, Australia
- Antica Culina: Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia
- Shinichi Nakagawa: Evolution & Ecology Research Centre, School of Biological, Earth & Environmental Sciences, University of New South Wales, Gate 9 High St., Kensington, Sydney, NSW 2052, Australia
2. Best AM, Lang TA, Greenberg BL, Gunsolley JC, Ioannidou E. The Oral Health Statistics Guidelines for Reporting Observational Studies and Clinical Trials in Oral Health Research: Explanation and Elaboration. J Oral Maxillofac Surg 2024; 82:1475-1493. PMID: 39032518. DOI: 10.1016/j.joms.2024.06.174.
Abstract
Adequate and transparent reporting is necessary for critically appraising research. Yet, evidence suggests that the design, conduct, analysis, interpretation, and reporting of oral health research could be greatly improved. Accordingly, the Task Force on Design and Analysis in Oral Health Research (statisticians and trialists from academia and industry) empaneled a group of authors to develop methodological and statistical reporting guidelines identifying the minimum information needed to document and evaluate observational studies and clinical trials in oral health: the Oral Health Statistics (OHStat) Guidelines. Drafts were circulated to the editors of 85 oral health journals and to task force members and sponsors and discussed at a December 2020 workshop attended by 49 researchers. The final version was subsequently approved by the task force in September 2021, submitted for journal review in 2022, and revised in 2023. The checklist consists of 48 guidelines: 5 for introductory information, 17 for methods, 13 for statistical analysis, 6 for results, and 7 for interpretation; 7 are specific to clinical trials. Each guideline identifies relevant information, explains its importance, and often describes best practices. The article was published simultaneously in the Journal of Dental Research Clinical and Translational Research, the Journal of the American Dental Association, and the Journal of Oral and Maxillofacial Surgery. Completed checklists should accompany manuscripts submitted for publication to these and other oral health journals to help authors, journal editors, and reviewers verify that the manuscript provides the information necessary to adequately document and evaluate the research.
Affiliation(s)
- Al M Best: Professor Emeritus, School of Dentistry and Department of Biostatistics, School of Medicine, Virginia Commonwealth University, Richmond, VA
- Thomas A Lang: Adjunct Faculty, University of Chicago Medical Writing Program, Chicago, IL
- Barbara L Greenberg: Adjunct Professor, Epidemiology and Biostatistics, Touro College of Dental Medicine at New York Medical College, Valhalla, NY
- John C Gunsolley: Professor Emeritus, School of Dentistry, Virginia Commonwealth University, Richmond, VA
- Effie Ioannidou: Professor and Chair of Orofacial Sciences, UCSF School of Dentistry, San Francisco, CA
3. Auspurg K, Brüderl J. Toward a more credible assessment of the credibility of science by many-analyst studies. Proc Natl Acad Sci U S A 2024; 121:e2404035121. PMID: 39236231. PMCID: PMC11420151. DOI: 10.1073/pnas.2404035121.
Abstract
We discuss a relatively new meta-scientific research design: many-analyst studies that attempt to assess the replicability and credibility of research based on large-scale observational data. In these studies, a large number of analysts try to answer the same research question using the same data. The key idea is the greater the variation in results, the greater the uncertainty in answering the research question and, accordingly, the lower the credibility of any individual research finding. Compared to individual replications, the large crowd of analysts allows for a more systematic investigation of uncertainty and its sources. However, many-analyst studies are also resource-intensive, and there are some doubts about their potential to provide credible assessments. We identify three issues that any many-analyst study must address: 1) identifying the source of variation in the results; 2) providing an incentive structure similar to that of standard research; and 3) conducting a proper meta-analysis of the results. We argue that some recent many-analyst studies have failed to address these issues satisfactorily and have therefore provided an overly pessimistic assessment of the credibility of science. We also provide some concrete guidance on how future many-analyst studies could provide a more constructive assessment.
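A note on issue 3: pooling team-level estimates is typically done with a random-effects model, so that between-team variation is estimated rather than ignored. The sketch below (Python; the team estimates, variances, and the choice of the DerSimonian-Laird estimator are illustrative assumptions, not taken from the paper) shows the basic computation:

```python
import numpy as np

def random_effects_pool(estimates, variances):
    """DerSimonian-Laird random-effects pooling of per-team estimates."""
    w = 1.0 / variances                        # fixed-effect weights
    fixed = np.sum(w * estimates) / np.sum(w)
    q = np.sum(w * (estimates - fixed) ** 2)   # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)  # between-team variance
    w_re = 1.0 / (variances + tau2)            # random-effects weights
    pooled = np.sum(w_re * estimates) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se, tau2

# Hypothetical effect sizes and variances from 7 analysis teams.
est = np.array([0.12, 0.05, 0.22, -0.03, 0.15, 0.08, 0.18])
var = np.array([0.004, 0.006, 0.005, 0.009, 0.004, 0.007, 0.005])
pooled, se, tau2 = random_effects_pool(est, var)
print(f"pooled={pooled:.3f}, se={se:.3f}, tau^2={tau2:.4f}")
```

A tau^2 near zero would say the teams largely agree and the spread is sampling noise; a large tau^2 flags genuine analytical heterogeneity of the kind the authors argue must be identified before judging credibility.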
Affiliation(s)
- Katrin Auspurg: Department of Sociology, Ludwig-Maximilians-Universität (LMU) Munich, 80801 Munich, Germany
- Josef Brüderl: Department of Sociology, Ludwig-Maximilians-Universität (LMU) Munich, 80801 Munich, Germany
4. Fan K, Subedi S, Yang G, Lu X, Ren J, Wu C. Is Seeing Believing? A Practitioner's Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies. Entropy (Basel) 2024; 26:794. PMID: 39330127. PMCID: PMC11430850. DOI: 10.3390/e26090794.
Abstract
Variable selection methods have been extensively developed for and applied to cancer genomics data to identify important omics features associated with complex disease traits, including cancer outcomes. However, the reliability and reproducibility of the findings are in question if valid inferential procedures are not available to quantify the uncertainty of the findings. In this article, we provide a gentle but systematic review of high-dimensional frequentist and Bayesian inferential tools under sparse models which can yield uncertainty quantification measures, including confidence (or Bayesian credible) intervals, p values and false discovery rates (FDR). Connections in high-dimensional inferences between the two realms have been fully exploited under the "unpenalized loss function + penalty term" formulation for regularization methods and the "likelihood function × shrinkage prior" framework for regularized Bayesian analysis. In particular, we advocate for robust Bayesian variable selection in cancer genomics studies due to its ability to accommodate disease heterogeneity in the form of heavy-tailed errors and structured sparsity while providing valid statistical inference. The numerical results show that robust Bayesian analysis incorporating exact sparsity has yielded not only superior estimation and identification results but also valid Bayesian credible intervals under nominal coverage probabilities compared with alternative methods, especially in the presence of heavy-tailed model errors and outliers.
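The "unpenalized loss function + penalty term" and "likelihood function × shrinkage prior" formulations that the authors connect can be made concrete with the lasso, whose estimate coincides with the posterior mode under independent Laplace priors. This is a standard textbook correspondence, written here in generic notation rather than the paper's:

```latex
\hat{\beta}_{\text{lasso}}
  = \arg\min_{\beta}\left\{ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
      + \lambda \lVert \beta \rVert_1 \right\},
\qquad
\pi(\beta \mid y)
  \;\propto\; \exp\!\left(-\tfrac{1}{2\sigma^{2}}\lVert y - X\beta\rVert_2^2\right)
    \prod_{j=1}^{p} \tfrac{\lambda_0}{2}\, e^{-\lambda_0 \lvert \beta_j \rvert},
```

so that the lasso solution is the maximum a posteriori estimate when the tuning parameter satisfies lambda = sigma^2 * lambda_0. The Bayesian route additionally yields the full posterior, which is what supplies the credible intervals the review advocates.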
Affiliation(s)
- Kun Fan: Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Srijana Subedi: Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Gongshun Yang: Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Xi Lu: Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, Houston, TX 77204, USA
- Jie Ren: Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Cen Wu: Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
5. Goligher EC, Heath A, Harhay MO. Bayesian statistics for clinical research. Lancet 2024; 404:1067-1076. PMID: 39277290. DOI: 10.1016/S0140-6736(24)01295-9.
Abstract
Frequentist and Bayesian statistics represent two differing paradigms for the analysis of data. Frequentism became the dominant mode of statistical thinking in medical practice during the 20th century. The advent of modern computing has made Bayesian analysis increasingly accessible, enabling growing use of Bayesian methods in a range of disciplines, including medical research. Rather than conceiving of probability as the expected frequency of an event (purported to be measurable and objective), Bayesian thinking conceives of probability as a measure of strength of belief (an explicitly subjective concept). Bayesian analysis combines previous information (represented by a mathematical probability distribution, the prior) with information from the study (the likelihood function) to generate an updated probability distribution (the posterior) representing the information available for clinical decision making. Owing to its fundamentally different conception of probability, Bayesian statistics offers an intuitive, flexible, and informative approach that facilitates the design, analysis, and interpretation of clinical trials. In this Review, we provide a brief account of the philosophical and methodological differences between Bayesian and frequentist approaches and survey the use of Bayesian methods for the design and analysis of clinical research.
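The prior-times-likelihood update described here is easiest to see in a conjugate example. The sketch below (Python; the prior parameters and trial counts are invented for illustration) updates a Beta prior on a response rate with binomial trial data:

```python
from scipy import stats

# Prior belief about a response rate: Beta(4, 16), mean 0.20, encoding
# scepticism from earlier evidence. All numbers here are illustrative.
a_prior, b_prior = 4, 16

# New trial data: 14 responders out of 40 patients.
responders, n = 14, 40

# Conjugate update: posterior is Beta(a + successes, b + failures).
posterior = stats.beta(a_prior + responders, b_prior + (n - responders))

print(f"posterior mean        = {posterior.mean():.3f}")
print(f"95% credible interval = {posterior.ppf([0.025, 0.975]).round(3)}")
print(f"P(rate > 0.25)        = {1 - posterior.cdf(0.25):.3f}")
```

The last line is the kind of direct probability statement ("the chance the response rate exceeds 25%") that the review presents as the practical advantage of the Bayesian framing for clinical decision making.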
Affiliation(s)
- Ewan C Goligher: Interdepartmental Division of Critical Care Medicine and Department of Physiology, University of Toronto, Toronto, ON, Canada; Department of Medicine, Division of Respirology, University Health Network, Toronto, ON, Canada; Toronto General Hospital Research Institute, Toronto, ON, Canada
- Anna Heath: Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada; Department of Statistical Science, University College London, London, UK
- Michael O Harhay: MRC Clinical Trials Unit, University College London, London, UK; Department of Biostatistics, Epidemiology, and Informatics and Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
6. Holzmeister F, Johannesson M, Böhm R, Dreber A, Huber J, Kirchler M. Heterogeneity in effect size estimates. Proc Natl Acad Sci U S A 2024; 121:e2403490121. PMID: 39078672. PMCID: PMC11317577. DOI: 10.1073/pnas.2403490121.
Abstract
A typical empirical study involves choosing a sample, a research design, and an analysis path. Variation in such choices across studies leads to heterogeneity in results that introduce an additional layer of uncertainty, limiting the generalizability of published scientific findings. We provide a framework for studying heterogeneity in the social sciences and divide heterogeneity into population, design, and analytical heterogeneity. Our framework suggests that after accounting for heterogeneity, the probability that the tested hypothesis is true for the average population, design, and analysis path can be much lower than implied by nominal error rates of statistically significant individual studies. We estimate each type's heterogeneity from 70 multilab replication studies, 11 prospective meta-analyses of studies employing different experimental designs, and 5 multianalyst studies. In our data, population heterogeneity tends to be relatively small, whereas design and analytical heterogeneity are large. Our results should, however, be interpreted cautiously due to the limited number of studies and the large uncertainty in the heterogeneity estimates. We discuss several ways to parse and account for heterogeneity in the context of different methodologies.
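One way to see the paper's central point is to fold the heterogeneity layers into the uncertainty of a pooled effect: a study that is "significant" on its own standard error can be far less convincing for the average population, design, and analysis path. A minimal sketch (Python; all numbers are invented, and the three variance components are simply added, which assumes independent layers):

```python
import numpy as np
from scipy import stats

# Illustrative inputs (not the paper's estimates): a pooled effect with its
# standard error, plus design and analytical heterogeneity (as SDs).
pooled, se_pooled = 0.20, 0.05
tau_design, tau_analysis = 0.15, 0.10

# Total SD for the effect in a newly drawn design/analysis context.
sd_new = np.sqrt(se_pooled**2 + tau_design**2 + tau_analysis**2)

# 95% prediction interval for a new context, and the probability that the
# effect is positive once heterogeneity is accounted for.
lo, hi = pooled - 1.96 * sd_new, pooled + 1.96 * sd_new
print(f"95% prediction interval: [{lo:.2f}, {hi:.2f}]")
print(f"P(effect > 0): {1 - stats.norm.cdf(0, loc=pooled, scale=sd_new):.3f}")
```

With these toy numbers the pooled effect is four standard errors from zero, yet the probability of a positive effect in an average context drops to roughly 0.86, echoing the paper's warning about nominal error rates.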
Affiliation(s)
- Felix Holzmeister: Department of Economics, University of Innsbruck, A-6020 Innsbruck, Austria
- Magnus Johannesson: Department of Economics, Stockholm School of Economics, SE-113 83 Stockholm, Sweden
- Robert Böhm: Department of Occupational, Economic, and Social Psychology, University of Vienna, A-1010 Vienna, Austria; Department of Psychology and Center for Social Data Science, University of Copenhagen, DK-1353 Copenhagen, Denmark
- Anna Dreber: Department of Economics, University of Innsbruck, A-6020 Innsbruck, Austria; Department of Economics, Stockholm School of Economics, SE-113 83 Stockholm, Sweden
- Jürgen Huber: Department of Banking and Finance, University of Innsbruck, A-6020 Innsbruck, Austria
- Michael Kirchler: Department of Banking and Finance, University of Innsbruck, A-6020 Innsbruck, Austria
7. Sherry AD, Msaouel P, Kupferman GS, Lin TA, Abi Jaoude J, Kouzy R, El-Alam MB, Patel R, Koong A, Lin C, Passy AH, Miller AM, Beck EJ, Fuller CD, Meirson T, McCaw ZR, Ludmir EB. Towards Treatment Effect Interpretability: A Bayesian Re-analysis of 194,129 Patient Outcomes Across 230 Oncology Trials. medRxiv [Preprint] 2024:2024.07.23.24310891. PMID: 39108512. PMCID: PMC11302607. DOI: 10.1101/2024.07.23.24310891.
Abstract
Most oncology trials define superiority of an experimental therapy compared to a control therapy according to frequentist significance thresholds, which are widely misinterpreted. Posterior probability distributions computed by Bayesian inference may be more intuitive measures of uncertainty, particularly for measures of clinical benefit such as the minimum clinically important difference (MCID). Here, we manually reconstructed 194,129 individual patient-level outcomes across 230 phase III, superiority-design, oncology trials. Posteriors were calculated by Markov chain Monte Carlo sampling using standard priors. All trials interpreted as positive had probabilities > 90% for marginal benefits (HR < 1). However, 38% of positive trials had ≤ 90% probabilities of achieving the MCID (HR < 0.8), even under an enthusiastic prior. A subgroup analysis of 82 trials that led to regulatory approval showed 30% had ≤ 90% probability for meeting the MCID under an enthusiastic prior. Conversely, 24% of negative trials had > 90% probability of achieving marginal benefits, even under a skeptical prior, including 12 trials with a primary endpoint of overall survival. Lastly, a phase III oncology-specific prior from previous work, which uses published summary statistics rather than reconstructed data to compute posteriors, validated the individual patient-level data findings. Taken together, these results suggest that Bayesian models add considerable unique interpretative value to phase III oncology trials and provide a robust solution for overcoming the discrepancies between refuting the null hypothesis and obtaining an MCID.
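The kind of posterior probability reported here can be approximated from published summary statistics alone, using a normal likelihood on the log hazard ratio and a conjugate normal prior. This mirrors the summary-statistic approach the authors cite, although the trial numbers and the sceptical-prior calibration below are assumptions for illustration:

```python
import numpy as np
from scipy import stats

# Illustrative trial summary (not from a specific trial): HR with 95% CI.
hr, ci_low, ci_high = 0.85, 0.73, 0.99
log_hr = np.log(hr)
se = (np.log(ci_high) - np.log(ci_low)) / (2 * 1.96)

# Sceptical prior on the log-HR centred on no effect; the SD is chosen so
# that HR < 0.5 has ~5% prior probability (a common heuristic, assumed here).
prior_mean, prior_sd = 0.0, np.log(2) / 1.645

# Conjugate normal-normal update on the log scale.
post_prec = 1 / prior_sd**2 + 1 / se**2
post_mean = (prior_mean / prior_sd**2 + log_hr / se**2) / post_prec
post_sd = np.sqrt(1 / post_prec)

print(f"P(HR < 1.0) = {stats.norm.cdf(np.log(1.0), post_mean, post_sd):.3f}")
print(f"P(HR < 0.8) = {stats.norm.cdf(np.log(0.8), post_mean, post_sd):.3f}  # MCID")
```

A trial like this can be comfortably "positive" against HR < 1 while giving only a modest posterior probability of clearing the MCID, which is exactly the discrepancy the re-analysis quantifies at scale.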
Affiliation(s)
- Alexander D Sherry: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Pavlos Msaouel: Department of Genitourinary Medical Oncology, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Gabrielle S Kupferman: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Timothy A Lin: Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Joseph Abi Jaoude: Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Ramez Kouzy: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Molly B El-Alam: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Roshal Patel: Department of Radiation Oncology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Alex Koong: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Christine Lin: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Adina H Passy: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Avital M Miller: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Esther J Beck: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- C David Fuller: Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tomer Meirson: Davidoff Cancer Center, Rabin Medical Center-Beilinson Hospital, Petach Tikva, Israel
- Zachary R McCaw: Insitro, South San Francisco, CA, USA; Department of Biomedical Informatics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Ethan B Ludmir: Department of Gastrointestinal Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
8. Mandl MM, Becker-Pennrich AS, Hinske LC, Hoffmann S, Boulesteix AL. Addressing researcher degrees of freedom through minP adjustment. BMC Med Res Methodol 2024; 24:152. PMID: 39020325. PMCID: PMC11253496. DOI: 10.1186/s12874-024-02279-2.
Abstract
When different researchers study the same research question using the same dataset, they may obtain different and potentially even conflicting results. This is because there is often substantial flexibility in researchers' analytical choices, an issue also referred to as "researcher degrees of freedom". Combined with selective reporting of the smallest p-value or largest effect, researcher degrees of freedom may lead to an increased rate of false positive and overoptimistic results. In this paper, we address this issue by formalizing the multiplicity of analysis strategies as a multiple testing problem. As the test statistics of different analysis strategies are usually highly dependent, a naive approach such as the Bonferroni correction is inappropriate because it leads to an unacceptable loss of power. Instead, we propose using the "minP" adjustment method, which takes potential test dependencies into account and approximates the underlying null distribution of the minimal p-value through a permutation-based procedure. This procedure is known to achieve more power than simpler approaches while ensuring weak control of the family-wise error rate. We illustrate our approach to addressing researcher degrees of freedom by applying it to a study on the impact of perioperative paO2 on post-operative complications after neurosurgery. A total of 48 analysis strategies are considered and adjusted using the minP procedure. This approach allows researchers to selectively report the result of the analysis strategy yielding the most convincing evidence while controlling the type 1 error, and thus the risk of publishing false positive results that may not be replicable.
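A minimal sketch of the minP idea (Python): compute one p-value per analysis strategy, then calibrate the smallest of them against its permutation null, which preserves the dependence between strategies. Simple correlation tests stand in for the 48 strategy-specific regression models used in the paper, so this is an illustration of the mechanism rather than the authors' implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def strategy_pvalues(X, y):
    """One p-value per analysis strategy (correlation tests as stand-ins)."""
    n, k = X.shape
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(k)])
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(np.abs(t), df=n - 2)

def minp_adjust(X, y, n_perm=2000):
    """Adjusted p-value for the smallest p-value over all strategies.

    Permuting only the outcome preserves the dependence between the
    strategy-specific tests, unlike a Bonferroni correction.
    """
    p_min_obs = strategy_pvalues(X, y).min()
    p_min_null = np.array([strategy_pvalues(X, rng.permutation(y)).min()
                           for _ in range(n_perm)])
    return (p_min_null <= p_min_obs).mean()

# Toy data: 48 correlated codings of one exposure, outcome with no true effect.
n = 100
base = rng.normal(size=n)
X = base[:, None] + 0.5 * rng.normal(size=(n, 48))
y = rng.normal(size=n)
print(minp_adjust(X, y))   # stays well above 0.05 on average under the null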
Affiliation(s)
- Maximilian M Mandl: Institute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, LMU Munich, Marchioninistr. 15, 81377 Munich, Germany; Munich Center for Machine Learning (MCML), Munich, Germany
- Andrea S Becker-Pennrich: Institute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, LMU Munich, Marchioninistr. 15, 81377 Munich, Germany; Department of Anaesthesiology, LMU University Hospital, LMU Munich, Marchioninistr. 15, 81377 Munich, Germany
- Ludwig C Hinske: Department of Anaesthesiology, LMU University Hospital, LMU Munich, Marchioninistr. 15, 81377 Munich, Germany; Institute for Digital Medicine, University Hospital of Augsburg, University of Augsburg, Stenglinstr. 2, 86156 Augsburg, Germany
- Sabine Hoffmann: Department of Statistics, LMU Munich, Ludwigstr. 33, 80539 Munich, Germany
- Anne-Laure Boulesteix: Institute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, LMU Munich, Marchioninistr. 15, 81377 Munich, Germany; Munich Center for Machine Learning (MCML), Munich, Germany
9. Best AM, Lang TA, Greenberg BL, Gunsolley JC, Ioannidou E. The OHStat Guidelines for Reporting Observational Studies and Clinical Trials in Oral Health Research: explanation and elaboration. J Am Dent Assoc 2024:S0002-8177(24)00316-7. PMID: 39001723. DOI: 10.1016/j.adaj.2024.06.007.
Abstract
Identical to the abstract of entry 2; the OHStat guidelines were published simultaneously in three journals.
10. Best AM, Lang TA, Greenberg BL, Gunsolley JC, Ioannidou E. The OHStat Guidelines for Reporting Observational Studies and Clinical Trials in Oral Health Research: Explanation and Elaboration. JDR Clin Trans Res 2024:23800844241247029. PMID: 38993046. DOI: 10.1177/23800844241247029.
Abstract
Identical to the abstract of entry 2; the OHStat guidelines were published simultaneously in three journals.
Affiliation(s)
- A M Best: School of Dentistry and Department of Biostatistics, School of Medicine, Virginia Commonwealth University, Richmond, VA, USA
- T A Lang: University of Chicago Medical Writing Program, Chicago, IL, USA
- B L Greenberg: Epidemiology and Biostatistics, Touro College of Dental Medicine at New York Medical College, Valhalla, NY, USA
- J C Gunsolley: School of Dentistry, Virginia Commonwealth University, Richmond, VA, USA
- E Ioannidou: UCSF School of Dentistry, San Francisco, CA, USA
11. Demidenko MI, Mumford JA, Poldrack RA. Impact of analytic decisions on test-retest reliability of individual and group estimates in functional magnetic resonance imaging: a multiverse analysis using the monetary incentive delay task. bioRxiv [Preprint] 2024:2024.03.19.585755. PMID: 38562804. PMCID: PMC10983911. DOI: 10.1101/2024.03.19.585755.
Abstract
Empirical studies reporting low test-retest reliability of individual blood oxygen-level dependent (BOLD) signal estimates in functional magnetic resonance imaging (fMRI) data have resurrected interest among cognitive neuroscientists in methods that may improve reliability in fMRI. Over the last decade, several individual studies have reported that modeling decisions, such as smoothing, motion correction and contrast selection, may improve estimates of test-retest reliability of BOLD signal estimates. However, it remains an empirical question whether certain analytic decisions consistently improve individual and group level reliability estimates in an fMRI task across multiple large, independent samples. This study used three independent samples (Ns: 60, 81, 119) that collected the same task (Monetary Incentive Delay task) across two runs and two sessions to evaluate the effects of analytic decisions on the individual (intraclass correlation coefficient [ICC(3,1)]) and group (Jaccard/Spearman rho) reliability estimates of BOLD activity of task fMRI data. The analytic decisions in this study vary across four categories: smoothing kernel (five options), motion correction (four options), task parameterization (three options) and task contrasts (four options), totaling 240 different pipeline permutations. Across all 240 pipelines, the median ICC estimates are consistently low, with maximum median ICC estimates of .43-.55 across the three samples. The analytic decisions with the greatest impact on the median ICC and group similarity estimates are the Implicit Baseline contrast, Cue Model parameterization and a larger smoothing kernel. Using an Implicit Baseline in a contrast condition meaningfully increased group similarity and ICC estimates as compared to using the Neutral cue. This effect was largest for the Cue Model parameterization; however, improvements in reliability came at the cost of interpretability. This study illustrates that estimates of reliability in the MID task are consistently low and variable in small samples, and a higher test-retest reliability may not always improve interpretability of the estimated BOLD signal.
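For reference, ICC(3,1) is the two-way mixed-effects, consistency, single-measurement intraclass correlation. A compact implementation (Python; the simulated subject and session variances below are invented to mimic a low-reliability setting, not taken from the preprint):

```python
import numpy as np

def icc_3_1(data):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    data: (n_subjects, k_sessions) array of contrast estimates.
    """
    n, k = data.shape
    grand = data.mean()
    ms_subj = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    sse = ((data
            - data.mean(axis=1, keepdims=True)
            - data.mean(axis=0, keepdims=True)
            + grand) ** 2).sum()
    ms_err = sse / ((n - 1) * (k - 1))
    return (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err)

# Hypothetical session-1 vs session-2 estimates for 60 subjects: modest
# stable between-subject variance swamped by session-level noise.
rng = np.random.default_rng(0)
trait = rng.normal(0.0, 0.5, size=(60, 1))
data = trait + rng.normal(0.0, 1.0, size=(60, 2))
print(f"ICC(3,1) = {icc_3_1(data):.2f}")   # low, echoing the reported medians
```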
12. Phalen PL, Kivisto AJ. Research on Youth Suicide and Sexual Orientation is Impacted by High Rates of Missingness in National Surveillance Systems. Arch Suicide Res 2024; 28:791-799. PMID: 37350065. DOI: 10.1080/13811118.2023.2227233.
Abstract
OBJECTIVE: The sexual orientation of youth who die by suicide in the United States is usually unknown. This study assessed how observed patterns of unknown sexual orientation are likely to affect research findings. METHODS: We analyzed the National Violent Death Reporting System (NVDRS) Restricted Access Dataset to assess whether sexual orientation among youth suicide decedents is disproportionately known for different demographics. We then assessed the degree to which estimated sexual minority rates would be affected if researchers were to assume either (a) that sexual orientation data are missing completely at random, or (b) that orientation information is missing at random after accounting for observed demographic patterns. RESULTS: Fewer than 10% of the sample had a known sexual orientation. Sexual orientation was more frequently known for females, white people, and older people, and missingness varied by geography. The choice between modeling the data as missing completely at random versus at random conditional upon demographics had a more than two-fold impact on estimated sexual minority rates among youth suicide decedents. CONCLUSION: Research on sexual orientation and youth suicide is strongly impacted by how researchers account (or do not account) for missingness.
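The difference between the two assumptions can be reproduced in a toy simulation: under MCAR the complete cases are taken at face value, whereas under MAR-given-demographics the stratum-specific rates are reweighted to the full sample's demographic mix. All numbers below are invented; only the direction of the mechanism follows the paper:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 5000

# Toy decedent records: orientation is known for only a small subset, and
# known more often for female and older decedents, as the paper reports.
df = pd.DataFrame({
    "female": rng.binomial(1, 0.3, n),
    "older": rng.binomial(1, 0.5, n),
})
true_minority = rng.binomial(1, 0.1 + 0.05 * df["female"])
p_known = 0.03 + 0.08 * df["female"] + 0.05 * df["older"]
known = rng.binomial(1, p_known).astype(bool)
df["orient"] = np.where(known, true_minority, np.nan)

# MCAR assumption: the complete-case proportion.
mcar_est = df["orient"].mean()

# MAR given demographics: stratum rates reweighted to the full sample.
strata = df.groupby(["female", "older"])
mar_est = (strata["orient"].mean() * strata.size() / n).sum()

print(f"MCAR estimate: {mcar_est:.3f}   MAR estimate: {mar_est:.3f}")
```

Because the strata with the highest known-orientation rates are not representative of all decedents, the two estimates diverge, which is the sensitivity the authors quantify.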
13. Sarafoglou A, Hoogeveen S, van den Bergh D, Aczel B, Albers CJ, Althoff T, Botvinik-Nezer R, Busch NA, Cataldo AM, Devezer B, van Dongen NNN, Dreber A, Fried EI, Hoekstra R, Hoffman S, Holzmeister F, Huber J, Huntington-Klein N, Ioannidis J, Johannesson M, Kirchler M, Loken E, Mangin JF, Matzke D, Menkveld AJ, Nilsonne G, van Ravenzwaaij D, Schweinsberg M, Schulz-Kuempel H, Shanks DR, Simons DJ, Spellman BA, Stoevenbelt AH, Szaszi B, Trübutschek D, Tuerlinckx F, Uhlmann EL, Vanpaemel W, Wicherts J, Wagenmakers EJ. Subjective evidence evaluation survey for many-analysts studies. R Soc Open Sci 2024; 11:240125. PMID: 39050728. PMCID: PMC11265885. DOI: 10.1098/rsos.240125.
Abstract
Many-analysts studies explore how well an empirical claim withstands plausible alternative analyses of the same dataset by multiple, independent analysis teams. Conclusions from these studies typically rely on a single outcome metric (e.g. effect size) provided by each analysis team. Although informative about the range of plausible effects in a dataset, a single effect size from each team does not provide a complete, nuanced understanding of how analysis choices are related to the outcome. We used the Delphi consensus technique with input from 37 experts to develop an 18-item subjective evidence evaluation survey (SEES) to evaluate how each analysis team views the methodological appropriateness of the research design and the strength of evidence for the hypothesis. We illustrate the usefulness of the SEES in providing richer evidence assessment with pilot data from a previous many-analysts study.
Affiliation(s)
- Don van den Bergh: Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Balazs Aczel: Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary
- Casper J. Albers: Heymans Institute for Psychological Research, University of Groningen, Groningen, The Netherlands
- Tim Althoff: Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Rotem Botvinik-Nezer: Hebrew University of Jerusalem, Jerusalem, Israel; Dartmouth College, Hanover, NH, USA
- Niko A. Busch: Institute for Psychology, University of Münster, Münster, Germany
- Andrea M. Cataldo: Center for Depression, Anxiety and Stress Research, McLean Hospital, Belmont, MA, USA; Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Berna Devezer: Department of Business, University of Idaho, Moscow, ID, USA
- Anna Dreber: Stockholm School of Economics, Stockholm, Sweden; University of Innsbruck, Innsbruck, Tirol, Austria
- Eiko I. Fried: Department of Psychology, Leiden University, Leiden, The Netherlands
- Rink Hoekstra: Nieuwenhuis Institute for Educational Research, University of Groningen, Groningen, The Netherlands
- Sabine Hoffman: Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
- Jürgen Huber: University of Innsbruck, Innsbruck, Tirol, Austria
- John Ioannidis: Meta-Research Innovation Center at Stanford (METRICS) and Departments of Medicine, of Epidemiology and of Population Health, of Biomedical Data Science, and of Statistics, Stanford University, Stanford, CA, USA
- Eric Loken: University of Connecticut, Storrs, CT, USA
- Jan-Francois Mangin: University Paris-Saclay, Gif-sur-Yvette, France; Neurospin CEA, Gif-sur-Yvette, Île-de-France, France
- Dora Matzke: Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Don van Ravenzwaaij: Heymans Institute for Psychological Research, University of Groningen, Groningen, The Netherlands
- Hannah Schulz-Kuempel: Department of Statistics and the Institute for Medical Information Processing, Biometry, and Epidemiology, LMU Munich, Munich, Germany
- David R. Shanks: Division of Psychology and Language Sciences, University College London, 26 Bedford Way, London WC1H 0AP, UK
- Barbara A. Spellman: School of Law, University of Virginia, 580 Massie Road, Charlottesville, VA, USA
- Andrea H. Stoevenbelt: Nieuwenhuis Institute for Educational Research, University of Groningen, Groningen, The Netherlands
- Barnabas Szaszi: Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary
- Jelte Wicherts: Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
14. Scoggins B, Robertson MP. Measuring transparency in the social sciences: political science and international relations. R Soc Open Sci 2024; 11:240313. PMID: 39076374. PMCID: PMC11285849. DOI: 10.1098/rsos.240313.
Abstract
The scientific method is predicated on transparency, yet the pace at which transparent research practices are being adopted by the scientific community is slow. The replication crisis in psychology showed that published findings employing statistical inference are threatened by undetected errors, data manipulation and data falsification. To mitigate these problems and bolster research credibility, open data and preregistration practices have gained traction in the natural and social sciences. However, the extent of their adoption in different disciplines is unknown. We introduce computational procedures to identify the transparency of a research field using large-scale text analysis and machine learning classifiers. Using political science and international relations as an illustrative case, we examine 93,931 articles across the top 160 political science and international relations journals between 2010 and 2021. We find that approximately 21% of all statistical inference papers have open data and 5% of all experiments are preregistered. Despite this shortfall, the example of leading journals in the field shows that change is feasible and can be effected quickly.
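The classification step can be pictured as a standard supervised text pipeline over article full texts. A deliberately tiny sketch (Python/scikit-learn; the training snippets and labels are invented, and the authors' actual features, training data, and model may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Minimal labelled examples (invented); in practice the classifier is
# trained on a much larger set of hand-coded article passages.
texts = [
    "Replication data are available at the Harvard Dataverse.",
    "All code and data are posted on the OSF repository.",
    "We thank seminar participants for helpful comments.",
    "Results are based on confidential administrative records.",
]
labels = [1, 1, 0, 0]   # 1 = statement indicating open data

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["The dataset is deposited in a public repository."]))
```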
Affiliation(s)
- Bermond Scoggins: School of Politics and International Relations, Australian National University, Canberra, Australia; School of Social Sciences, Monash University, Melbourne, Australia
- Matthew P. Robertson: School of Politics and International Relations, Australian National University, Canberra, Australia; Department of Social Data Science, University of Mannheim, Mannheim, Germany
15. Handel DV, Hanushek EA. Contexts of Convenience: Generalizing from Published Evaluations of School Finance Policies. Eval Rev 2024; 48:461-494. PMID: 38297893. DOI: 10.1177/0193841x241228335.
Abstract
Recent attention to the causal identification of spending impacts provides improved estimates of spending outcomes in a variety of circumstances, but the estimates are substantially different across studies. Half of the variation in estimated funding impact on test scores and over three-quarters of the variation of impacts on school attainment reflect differences in the true parameters across study contexts. Unfortunately, inability to describe the circumstances underlying effective school spending impedes any attempts to generalize from the extant results to new policy situations. The evidence indicates that how funds are used is crucial to the outcomes, but such factors as targeting of funds or court interventions fail to explain the existing pattern of results.
16. Girardi P, Vesely A, Lakens D, Altoè G, Pastore M, Calcagnì A, Finos L. Post-selection Inference in Multiverse Analysis (PIMA): An Inferential Framework Based on the Sign Flipping Score Test. Psychometrika 2024; 89:542-568. PMID: 38664342. DOI: 10.1007/s11336-024-09973-6.
Abstract
When analyzing data, researchers make some choices that are either arbitrary, based on subjective beliefs about the data-generating process, or for which equally justifiable alternative choices could have been made. This wide range of data-analytic choices can be abused and has been one of the underlying causes of the replication crisis in several fields. The recent introduction of multiverse analysis provides researchers with a method to evaluate the stability of results across the reasonable choices that could be made when analyzing data. Multiverse analysis is confined to a descriptive role, however, lacking a proper and comprehensive inferential procedure. Specification curve analysis adds an inferential procedure to multiverse analysis, but this approach is limited to simple cases related to the linear model, and only allows researchers to infer whether at least one specification rejects the null hypothesis, not which specifications should be selected. In this paper, we present a Post-selection Inference approach to Multiverse Analysis (PIMA), a flexible and general inferential approach that accounts for all possible models, i.e., the multiverse of reasonable analyses. The approach allows for a wide range of data specifications (i.e., preprocessing) and any generalized linear model; it allows testing the null hypothesis that a given predictor is not associated with the outcome by combining information from all reasonable models of multiverse analysis, and provides strong control of the family-wise error rate, allowing researchers to claim that the null hypothesis can be rejected for any specification that shows a significant effect. The inferential proposal is based on a conditional resampling procedure. We formally prove that the type I error rate is controlled and compute the statistical power of the test through a simulation study. Finally, we apply the PIMA procedure to the analysis of a real dataset on self-reported hesitancy for the COronaVIrus Disease 2019 (COVID-19) vaccine before and after the 2020 lockdown in Italy. We conclude with practical recommendations to be considered when implementing the proposed procedure.
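A stripped-down illustration of the two key ingredients (Python): sign-flipped score statistics computed with one shared set of flips across specifications, combined with a max-statistic rule that yields multiverse-wide family-wise error control. The intercept-only null model and Gaussian toy data are simplifying assumptions; the published procedure handles any generalized linear model and nuisance parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_flips = 200, 2000

# Toy multiverse: one outcome, three data specifications of the predictor.
y = rng.normal(size=n)                         # no true effect
specs = [rng.normal(size=n) for _ in range(3)]
resid = y - y.mean()                           # residuals under the null model

# One shared matrix of sign flips preserves dependence across specifications.
flips = rng.choice([-1.0, 1.0], size=(n_flips, n))

def score_stat(x, signs):
    contrib = x * resid                        # score contributions for beta = 0
    return (signs * contrib).sum() / np.sqrt((contrib ** 2).sum())

t_obs = np.array([score_stat(x, 1.0) for x in specs])
t_null = np.array([[score_stat(x, flips[b]) for x in specs]
                   for b in range(n_flips)])

# Max-statistic combination: each specification is compared with the null
# distribution of the maximum, so any specification that survives can be
# individually claimed as significant with family-wise error control.
null_max = np.abs(t_null).max(axis=1)
adj_p = [(null_max >= t).mean() for t in np.abs(t_obs)]
print(adj_p)
```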
Affiliation(s)
- Paolo Girardi: Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University of Venice, Via Torino 155, 30172 Venezia-Mestre, VE, Italy
- Anna Vesely: Department of Statistical Sciences, University of Bologna, Bologna, Italy
- Daniël Lakens: Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands
- Gianmarco Altoè: Department of Developmental Psychology and Socialisation, University of Padova, Padua, Italy
- Massimiliano Pastore: Department of Developmental Psychology and Socialisation, University of Padova, Padua, Italy
- Antonio Calcagnì: Department of Developmental Psychology and Socialisation, University of Padova, Padua, Italy; GNCS-INdAM Research Group, Rome, Italy
- Livio Finos: Department of Statistical Sciences, University of Padova, Padua, Italy
17. Fitzpatrick BG, Gorman DM, Trombatore C. Impact of redefining statistical significance on P-hacking and false positive rates: An agent-based model. PLoS One 2024; 19:e0303262. PMID: 38753677. PMCID: PMC11098386. DOI: 10.1371/journal.pone.0303262.
Abstract
In recent years, concern has grown about the inappropriate application and interpretation of P values, especially the use of P<0.05 to denote "statistical significance" and the practice of P-hacking to produce results below this threshold and selectively reporting these in publications. Such behavior is said to be a major contributor to the large number of false and non-reproducible discoveries found in academic journals. In response, it has been proposed that the threshold for statistical significance be changed from 0.05 to 0.005. The aim of the current study was to use an evolutionary agent-based model composed of researchers who test hypotheses and strive to increase their publication rates in order to explore the impact of a 0.005 P value threshold on P-hacking and published false positive rates. Three scenarios were examined, one in which researchers tested a single hypothesis, one in which they tested multiple hypotheses using a P<0.05 threshold, and one in which they tested multiple hypotheses using a P<0.005 threshold. Effect sizes were varied across models and output assessed in terms of researcher effort, number of hypotheses tested and number of publications, and the published false positive rate. The results supported the view that a more stringent P value threshold can serve to reduce the rate of published false positive results. Researchers still engaged in P-hacking with the new threshold, but the effort they expended increased substantially and their overall productivity was reduced, resulting in a decline in the published false positive rate. Compared to other proposed interventions to improve the academic publishing system, changing the P value threshold has the advantage of being relatively easy to implement and could be monitored and enforced with minimal effort by journal editors and peer reviewers.
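The core arithmetic of the publish-the-smallest-p strategy can be checked outside the agent-based model: with k independent true-null tests, the chance that the minimum p-value clears a threshold alpha is 1 - (1 - alpha)^k. A quick simulation (Python; independence of the k tests is a simplifying assumption the agent-based model does not need):

```python
import numpy as np

rng = np.random.default_rng(42)

def p_hacked_fpr(alpha, k, n_sim=100_000):
    """Chance that the smallest of k true-null p-values falls below alpha.

    Under the null, p-values are uniform on [0, 1]; reporting only the
    minimum across k tries is the simplest form of P-hacking.
    """
    p_min = rng.uniform(size=(n_sim, k)).min(axis=1)
    return (p_min < alpha).mean()

for alpha in (0.05, 0.005):
    print(f"alpha={alpha}: simulated={p_hacked_fpr(alpha, k=5):.4f}, "
          f"analytic={1 - (1 - alpha) ** 5:.4f}")
```

With five hidden tries, the 0.05 threshold yields roughly a 23% chance of a publishable false positive versus about 2.5% at 0.005, which is the mechanism behind the reduction the model reports, before accounting for researchers' adaptive effort.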
Affiliation(s)
- Ben G. Fitzpatrick: Department of Mathematics, Loyola Marymount University, Los Angeles, California, USA; Tempest Technologies, Los Angeles, California, USA
- Dennis M. Gorman: Department of Epidemiology & Biostatistics, School of Public Health, Texas A&M University, College Station, Texas, USA
- Caitlin Trombatore: Department of Mathematics, Loyola Marymount University, Los Angeles, California, USA
18. Lambert WC, Lambert MW, Emamian MH, Woźniak M, Grzybowski A. Artificial intelligence and the scientific method: How to cope with a complete oxymoron. Clin Dermatol 2024; 42:275-279. PMID: 38216002. DOI: 10.1016/j.clindermatol.2023.12.021.
Abstract
Artificial intelligence (AI) can be a powerful tool for data analysis, but it can also mislead investigators, due in part to a fundamental difference between classic data analysis and data analysis using AI. In classic data analysis, a more or less limited data set is analyzed and a hypothesis is generated. That hypothesis is then tested using a separate data set, and the data are examined again. The premise is either accepted or rejected with a P value indicating the probability that any observed difference is due merely to chance. By contrast, in AI a new hypothesis is generated as each datum is added to the data set. We explore this discrepancy and suggest means to overcome it.
Affiliation(s)
- W Clark Lambert: Departments of Dermatology and of Pathology, Immunology and Laboratory Medicine, Rutgers-New Jersey Medical School, Newark, New Jersey, USA
- Muriel W Lambert: Departments of Dermatology and of Pathology, Immunology and Laboratory Medicine, Rutgers-New Jersey Medical School, Newark, New Jersey, USA
- Mohammad Hassan Emamian: Ophthalmic Epidemiology Research Center, Shahroud University of Medical Sciences, Shahroud, Iran
- Michał Woźniak: Department of Systems and Computer Networks, Faculty of ICT, Wroclaw University of Science and Technology, Wroclaw, Poland
- Andrzej Grzybowski: Institute for Research in Ophthalmology, Foundation for Ophthalmology Development, Poznan, Poland
19. Calignano G, Girardi P, Altoè G. First steps into the pupillometry multiverse of developmental science. Behav Res Methods 2024; 56:3346-3365. PMID: 37442879. PMCID: PMC11133157. DOI: 10.3758/s13428-023-02172-8.
Abstract
Pupillometry has been widely implemented to investigate cognitive functioning since infancy. Like most psychophysiological and behavioral measures, it involves hierarchical levels of arbitrariness in preprocessing before statistical data analysis. By means of an illustrative example, we checked the robustness of the results of a familiarization procedure that compared the impact of audiovisual and visual stimuli in 12-month-olds. We adopted a multiverse approach to pupillometry data analysis to explore the role of (1) the preprocessing phase, that is, handling of extreme values, selection of the areas of interest, management of blinks, baseline correction, and participant inclusion/exclusion, and (2) the modeling structure, that is, the incorporation of smoothers and the fixed and random effects structure, in guiding the parameter estimation. The multiverse of analyses shows how the preprocessing steps influenced the regression results, and when visual stimuli plausibly predicted an increase of resource allocation compared with audiovisual stimuli. Importantly, smoothing time in statistical models increased the plausibility of the results compared with nested models that do not weight the impact of time. Finally, we share theoretical and methodological tools for taking the first steps into (rather than being afraid of) the inherent uncertainty of infant pupillometry.
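In practice, a preprocessing multiverse like the one explored here is just the Cartesian product of the decision points. A skeletal sketch (Python; the option labels and the analyse placeholder are invented, and the paper's actual grid differs):

```python
from itertools import product

# Decision points mirroring those named in the paper; the concrete option
# labels are invented for illustration.
options = {
    "extreme_values": ["keep", "winsorise", "remove"],
    "area_of_interest": ["full_trial", "stimulus_window"],
    "blinks": ["interpolate", "drop_trials"],
    "baseline": ["subtractive", "divisive", "none"],
    "inclusion": ["all_infants", "min_valid_trials"],
}

def analyse(raw_data, spec):
    """Placeholder: preprocess raw_data according to spec, fit the model,
    and return the audiovisual-vs-visual effect estimate."""
    raise NotImplementedError

# One 'universe' per combination of choices: 3 * 2 * 2 * 3 * 2 = 72 pipelines.
multiverse = [dict(zip(options, combo)) for combo in product(*options.values())]
print(len(multiverse))
# results = [{**spec, "effect": analyse(raw_data, spec)} for spec in multiverse]
```

Plotting the estimated effect against the sorted universes (a specification curve) then makes visible which preprocessing choices drive the conclusions, which is the robustness check the authors perform.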
Affiliation(s)
- Giulia Calignano: Department of Developmental and Social Psychology, University of Padua, Padua, Italy
- Paolo Girardi: Department of Developmental and Social Psychology, University of Padua, Padua, Italy; Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy
- Gianmarco Altoè: Department of Developmental and Social Psychology, University of Padua, Padua, Italy
20. Nardo D, Anderson MC. Everything you ever wanted to know about the Think/No-Think task, but forgot to ask. Behav Res Methods 2024; 56:3831-3860. PMID: 38379115. PMCID: PMC11133138. DOI: 10.3758/s13428-024-02349-9.
Abstract
The Think/No-Think (TNT) task has just celebrated 20 years since its inception, and its use has been growing as a tool to investigate the mechanisms underlying memory control and its neural underpinnings. Here, we present a theoretical and practical guide for designing, implementing, and running TNT studies. For this purpose, we provide a step-by-step description of the structure of the TNT task, methodological choices that can be made, parameters that can be chosen, instruments available, aspects to be aware of, and systematic information about how to run a study and analyze the data. Importantly, we provide a TNT training package (as Supplementary Material), that is, a series of multimedia materials (e.g., tutorial videos, informative HTML pages, MATLAB code to run experiments, questionnaires, scoring sheets, etc.) to complement this method paper and facilitate a deeper understanding of the TNT task, its rationale, and how to set it up in practice. Given the recent discussion about the replication crisis in the behavioral sciences, we hope that this contribution will increase standardization, reliability, and replicability across laboratories.
Affiliation(s)
- Davide Nardo: Department of Education, University of Roma Tre, Rome, Italy; MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Michael C Anderson: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
21. Khan K, Hall CL, Babbage C, Dodzo S, Greenhalgh C, Lucassen M, Merry S, Sayal K, Sprange K, Stasiak K, Tench CR, Townsend E, Stallard P, Hollis C. Precision computerised cognitive behavioural therapy (cCBT) for adolescents with depression: a pilot and feasibility randomised controlled trial protocol for SPARX-UK. Pilot Feasibility Stud 2024; 10:53. PMID: 38532490. DOI: 10.1186/s40814-024-01475-7.
Abstract
BACKGROUND: A serious game called SPARX (Smart, Positive, Active, Realistic, X-factor thoughts), originally developed in New Zealand and incorporating cognitive behavioural therapy (CBT) principles, has been shown to help reduce symptoms of depression and anxiety in adolescents with mild to moderate depression in studies undertaken in Australasia. However, SPARX has never been trialled in the United Kingdom (UK), and there have been issues relating to low engagement when it has been used in a real-world context. AIMS: To conduct the first pilot and feasibility randomised controlled trial (RCT) in England to explore the use of SPARX in different settings. The trial will explore whether SPARX supported by an e-coach (assistant psychologists) improves adherence and engagement compared with self-directed (i.e. self-help) use. The trial results will be used to inform the optimal mode of delivery (SPARX supported vs. SPARX self-directed), to calculate an appropriate sample size for a full RCT, and to decide which setting is most suitable. METHODS: Following consultation with young people to ensure study suitability/appropriateness, a total of 120 adolescents (11-19 years) will be recruited for this three-arm study. Adolescents recruited across England will be randomised to SPARX with human support (from an e-coach), self-directed SPARX, or a waitlist control group. Assessments will be conducted online at baseline, week 4, and 8-10 weeks post-randomisation. The assessments will include measures which capture demographic, depression (Patient Health Questionnaire modified for adolescents [PHQ-A]) and anxiety (Revised Child Anxiety and Depression Scale [RCADS]) symptomatology, and health-related quality-of-life data (EQ-5D-Y and proxy version). Analyses will be primarily descriptive. Qualitative interviews will be undertaken with a proportion of the participants and clinical staff as part of a process evaluation, and the qualitative data gathered will be thematically analysed. Finally, feasibility data will be collected on recruitment details, overall study uptake and engagement with SPARX, participant retention, and youth-reported acceptability of the intervention. DISCUSSION: The findings will inform the design of a future definitive RCT of SPARX in the UK. If the subsequent definitive RCT demonstrates that SPARX is effective, then an online serious game utilising CBT principles ultimately has the potential to improve the provision of care within the UK's health services if delivered en masse. TRIAL REGISTRATION: ISRCTN15124804. Registered on 16 January 2023, https://www.isrctn.com/ISRCTN15124804.
Collapse
Affiliation(s)
- K Khan
- Mental Health & Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, UK.
- NIHR MindTech MedTech Co-operative, Institute of Mental Health, University of Nottingham, Nottingham, NG7 2TU, UK.
| | - C L Hall
- Mental Health & Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, UK
- NIHR MindTech MedTech Co-operative, Institute of Mental Health, University of Nottingham, Nottingham, NG7 2TU, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham, UK
| | - C Babbage
- Mental Health & Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, UK
- NIHR MindTech MedTech Co-operative, Institute of Mental Health, University of Nottingham, Nottingham, NG7 2TU, UK
| | - S Dodzo
- NIHR MindTech MedTech Co-operative, Institute of Mental Health, University of Nottingham, Nottingham, NG7 2TU, UK
| | - C Greenhalgh
- School of Computer Science, University of Nottingham, Nottingham, UK
| | - M Lucassen
- School of Health and Psychological Sciences, University of London, London, UK
- School of Medicine, University of Auckland, Auckland, New Zealand
| | - S Merry
- School of Medicine, University of Auckland, Auckland, New Zealand
| | - K Sayal
- Mental Health & Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, UK
- Centre for Mood Disorders, Institute of Mental Health, University of Nottingham, Nottingham, UK
| | - K Sprange
- Nottingham Clinical Trials Unit, University of Nottingham, Nottingham, UK
| | - K Stasiak
- School of Medicine, University of Auckland, Auckland, New Zealand
| | - C R Tench
- Mental Health & Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham, UK
- Precision Imaging Beacon, Queen's Medical Centre, Nottingham, UK
| | - E Townsend
- School of Psychology, University of Nottingham, Nottingham, UK
| | - P Stallard
- Department for Health, University of Bath, Bath, UK
| | - C Hollis
- Mental Health & Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, UK
- NIHR MindTech MedTech Co-operative, Institute of Mental Health, University of Nottingham, Nottingham, NG7 2TU, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham, UK
| |
Collapse
|
22
|
Clayson PE. Beyond single paradigms, pipelines, and outcomes: Embracing multiverse analyses in psychophysiology. Int J Psychophysiol 2024; 197:112311. [PMID: 38296000 DOI: 10.1016/j.ijpsycho.2024.112311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2023] [Revised: 01/02/2024] [Accepted: 01/24/2024] [Indexed: 02/10/2024]
Abstract
Psychophysiological research is an inherently complex undertaking due to the nature of the data, and its analysis is characterized by many decision points that shape the final dataset and a study's findings. These decisions create a "multiverse" of possible outcomes, and each decision from study conceptualization to statistical analysis can lead to different results and interpretations. This review describes the concept of multiverse analyses, a methodological approach designed to understand the impact of different decisions on the robustness of a study's findings and interpretation. The emphasis is on transparently showcasing different reasonable approaches for constructing a final dataset and on highlighting the influence of various decision points, from experimental design to data processing and outcome selection. For example, the choice of an experimental task can significantly impact event-related brain potential (ERP) scores or skin conductance responses (SCRs), and different tasks might elicit unique variances in each measure. This review underscores the importance of transparently embracing the flexibility inherent in psychophysiological research and the potential consequences of not understanding the fragility or robustness of experimental findings. By navigating the intricate terrain of the psychophysiological multiverse, this review serves as an introduction, helping researchers to make informed decisions, improve the collective understanding of psychophysiological findings, and push the boundaries of the field.
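To make the multiverse idea concrete, the following minimal Python sketch runs one analysis per combination of processing decisions on simulated ERP-like data and counts how many "universes" reach significance. The decision grid, effect size, and all variable names are illustrative assumptions, not taken from the paper.

    import itertools
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Simulated trial-averaged 'ERP' data: 40 subjects x 2 conditions x 200 samples
    n_sub, n_time = 40, 200
    data = rng.normal(0.0, 1.0, (n_sub, 2, n_time))
    data[:, 1, 80:120] += 0.3  # condition effect confined to a component window

    # A small, illustrative grid of processing decisions
    baselines = [(0, 20), (0, 40)]      # baseline windows (samples)
    windows = [(80, 120), (70, 130)]    # scoring windows (samples)
    scorings = ["mean", "peak"]         # amplitude scoring method

    results = []
    for bl, win, score in itertools.product(baselines, windows, scorings):
        d = data - data[:, :, bl[0]:bl[1]].mean(axis=2, keepdims=True)
        seg = d[:, :, win[0]:win[1]]
        amp = seg.mean(axis=2) if score == "mean" else seg.max(axis=2)
        t, p = stats.ttest_rel(amp[:, 1], amp[:, 0])
        results.append((bl, win, score, t, p))

    n_sig = sum(p < 0.05 for *_, p in results)
    print(f"{n_sig}/{len(results)} universes significant at p < .05")

A real multiverse analysis would report the full distribution of effect sizes across universes, not just the count of significant ones.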
Collapse
Affiliation(s)
- Peter E Clayson
- Department of Psychology, University of South Florida, Tampa, FL, USA.
| |
Collapse
|
23
|
Shatz I. Assumption-checking rather than (just) testing: The importance of visualization and effect size in statistical diagnostics. Behav Res Methods 2024; 56:826-845. [PMID: 36869217 PMCID: PMC10830673 DOI: 10.3758/s13428-023-02072-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/17/2023] [Indexed: 03/05/2023]
Abstract
Statistical methods generally have assumptions (e.g., normality in linear regression models). Violations of these assumptions can cause various issues, like statistical errors and biased estimates, whose impact can range from inconsequential to critical. Accordingly, it is important to check these assumptions, but this is often done in a flawed way. Here, I first present a prevalent but problematic approach to diagnostics-testing assumptions using null hypothesis significance tests (e.g., the Shapiro-Wilk test of normality). Then, I consolidate and illustrate the issues with this approach, primarily using simulations. These issues include statistical errors (i.e., false positives, especially with large samples, and false negatives, especially with small samples), false binarity, limited descriptiveness, misinterpretation (e.g., of p-value as an effect size), and potential testing failure due to unmet test assumptions. Finally, I synthesize the implications of these issues for statistical diagnostics, and provide practical recommendations for improving such diagnostics. Key recommendations include maintaining awareness of the issues with assumption tests (while recognizing they can be useful), using appropriate combinations of diagnostic methods (including visualization and effect sizes) while recognizing their limitations, and distinguishing between testing and checking assumptions. Additional recommendations include judging assumption violations as a complex spectrum (rather than a simplistic binary), using programmatic tools that increase replicability and decrease researcher degrees of freedom, and sharing the material and rationale involved in the diagnostics.
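A small simulation in the spirit of the paper's argument (the distribution and sample sizes below are assumptions of this sketch, not the author's): the same mildly skewed distribution "passes" the Shapiro-Wilk test at n = 30 but "fails" at n = 5,000, even though an effect-size-like diagnostic (skewness) is essentially identical in both cases.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # The same mildly skewed distribution (gamma, skewness ~ 0.28) at two sample sizes
    x_small = rng.gamma(shape=50.0, scale=1.0, size=30)
    x_large = rng.gamma(shape=50.0, scale=1.0, size=5000)

    for x in (x_small, x_large):
        w, p = stats.shapiro(x)  # null hypothesis significance test of normality
        print(f"n={len(x):5d}  Shapiro-Wilk p={p:.2e}  sample skewness={stats.skew(x):+.3f}")

    # Only the large sample is likely to 'fail' the test: the test answers
    # 'is the deviation detectable?', not 'is it severe?'. A Q-Q plot
    # (scipy.stats.probplot) would show how mild the deviation actually is.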
Collapse
|
24
|
Nivison M, Caldo PD, Magro SW, Raby KL, Groh AM, Vandell DL, Booth-LaForce C, Fraley RC, Carlson EA, Simpson JA, Roisman GI. The predictive validity of the strange situation procedure: Evidence from registered analyses of two landmark longitudinal studies. Dev Psychopathol 2023:1-17. [PMID: 38086607 PMCID: PMC11169091 DOI: 10.1017/s0954579423001487] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/14/2024]
Abstract
Meta-analyses demonstrate that the quality of early attachment is modestly associated with peer social competence (r = .19) and externalizing behavior (r = -.15), but weakly associated with internalizing symptoms (r = -.07) across early development (Groh et al., Child Development Perspectives, 11(1), 70-76, 2017). Nonetheless, these reviews suffer from limitations that undermine confidence in reported estimates, including evidence for publication bias and the lack of comprehensive assessments of outcome measures from longitudinal studies in the literature. Moreover, theoretical claims regarding the specificity of the predictive significance of early attachment variation for socioemotional versus academic outcomes had not been evaluated when the analyses for this report were registered (but see Dagan et al., Child Development, 1-20, 2023; Deneault et al., Developmental Review, 70, 101093, 2023). To address these limitations, we conducted a set of registered analyses to evaluate the predictive validity of infant attachment in two landmark studies of the Strange Situation: the Minnesota Longitudinal Study of Risk and Adaptation (MLSRA) and the NICHD Study of Early Child Care and Youth Development (SECCYD). Across-time composite assessments reflecting teacher report, mother report, and self-reports of each outcome measure were created. Bivariate associations between infant attachment security and socioemotional outcomes in the MLSRA were comparable to, or slightly weaker than, those reported in the recent meta-analyses, whereas those in the SECCYD were weaker for these outcomes. Controlling for four demographic covariates, partial correlation coefficients between infant attachment and all socioemotional outcomes were r ≤ .10 to .15 in both samples. Compositing Strange Situations at ages 12 and 18 months did not substantively alter the predictive validity of the measure in the MLSRA, though a composite measure of three different early attachment measures in the SECCYD did increase predictive validity coefficients. Associations between infant attachment security and academic skills were unexpectedly comparable to (SECCYD) or larger than (MLSRA) those observed with respect to socioemotional outcomes.
Collapse
Affiliation(s)
- Marissa Nivison
- Institute of Child Development, University of Minnesota, Minneapolis, MN, USA
| | - Paul D. Caldo
- Institute of Child Development, University of Minnesota, Minneapolis, MN, USA
| | - Sophia W. Magro
- Institute of Child Development, University of Minnesota, Minneapolis, MN, USA
| | - K. Lee Raby
- Department of Psychology, University of Utah, Salt Lake City, UT, USA
| | | | | | | | | | | | - Jeffry A. Simpson
- Institute of Child Development, University of Minnesota, Minneapolis, MN, USA
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Glenn I. Roisman
- Institute of Child Development, University of Minnesota, Minneapolis, MN, USA
| |
Collapse
|
25
|
Chén OY, Bodelet JS, Saraiva RG, Phan H, Di J, Nagels G, Schwantje T, Cao H, Gou J, Reinen JM, Xiong B, Zhi B, Wang X, de Vos M. The roles, challenges, and merits of the p value. Patterns (N Y) 2023; 4:100878. [PMID: 38106615 PMCID: PMC10724370 DOI: 10.1016/j.patter.2023.100878] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/19/2023]
Abstract
Since the 18th century, the p value has been an important part of hypothesis-based scientific investigation. As statistical and data science engines accelerate, questions emerge: to what extent are scientific discoveries based on p values reliable and reproducible? Should one adjust the significance level or find alternatives for the p value? Inspired by these questions and everlasting attempts to address them, here, we provide a systematic examination of the p value from its roles and merits to its misuses and misinterpretations. For the latter, we summarize modest recommendations to handle them. In parallel, we present the Bayesian alternatives for seeking evidence and discuss the pooling of p values from multiple studies and datasets. Overall, we argue that the p value and hypothesis testing form a useful probabilistic decision-making mechanism, facilitating causal inference, feature selection, and predictive modeling, but that the interpretation of the p value must be contextual, considering the scientific question, experimental design, and statistical principles.
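For the pooling of p values from multiple studies that the abstract mentions, SciPy implements the classic combining rules. A minimal sketch with made-up p values:

    import numpy as np
    from scipy import stats

    # p values for the same hypothesis from five hypothetical independent studies
    p_values = np.array([0.08, 0.12, 0.04, 0.21, 0.03])

    # Fisher's method: -2 * sum(ln p_i) follows chi-square with 2k df under H0
    chi2, p_fisher = stats.combine_pvalues(p_values, method="fisher")

    # Stouffer's method: pools z scores; weighting (e.g., by sample size) is possible
    z, p_stouffer = stats.combine_pvalues(p_values, method="stouffer")

    print(f"Fisher:   chi2 = {chi2:.2f}, pooled p = {p_fisher:.4f}")
    print(f"Stouffer: z = {z:.2f},  pooled p = {p_stouffer:.4f}")

Note that pooling assumes the studies are independent and test the same hypothesis; neither method adjusts for publication bias.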
Collapse
Affiliation(s)
- Oliver Y. Chén
- Département Médecine de Laboratoire et Pathologie, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Faculté de Biologie et de Médecine, Université de Lausanne, Lausanne, Switzerland
| | - Julien S. Bodelet
- Département Médecine de Laboratoire et Pathologie, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
| | - Raúl G. Saraiva
- Department of Molecular Microbiology and Immunology, Johns Hopkins University, Baltimore, MD, USA
| | - Huy Phan
- Department of Computer Science, Queen Mary University of London, London, UK
| | - Junrui Di
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | - Guy Nagels
- St. Edmund Hall, University of Oxford, Oxford, UK
- Department of Neurology, Universitair Ziekenhuis Brussel, Vrije Universiteit Brussel, Jette, Belgium
| | - Tom Schwantje
- Department of Economics, University of Oxford, Oxford, UK
| | - Hengyi Cao
- Institute of Behavioral Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA
- Division of Psychiatry Research, Zucker Hillside Hospital, Glen Oaks, NY, USA
| | - Jiangtao Gou
- Department of Mathematics and Statistics, Villanova University, Villanova, PA, USA
| | - Jenna M. Reinen
- IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
| | - Bin Xiong
- Department of Statistics, Northwestern University, Evanston, IL, USA
| | - Bangdong Zhi
- School of Business, University of Bristol, Bristol, UK
| | - Xiaojun Wang
- Birmingham Business School, University of Birmingham, Birmingham, UK
| | - Maarten de Vos
- Faculty of Engineering Science, KU Leuven, Leuven, Belgium
- Faculty of Medicine, KU Leuven, Leuven, Belgium
| |
Collapse
|
26
|
Sirois S, Brisson J, Blaser E, Calignano G, Donenfeld J, Hepach R, Hochmann JR, Kaldy Z, Liszkowski U, Mayer M, Ross-Sheehy S, Russo S, Valenza E. The pupil collaboration: A multi-lab, multi-method analysis of goal attribution in infants. Infant Behav Dev 2023; 73:101890. [PMID: 37944367 DOI: 10.1016/j.infbeh.2023.101890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2022] [Revised: 09/27/2023] [Accepted: 09/27/2023] [Indexed: 11/12/2023]
Abstract
The rise of pupillometry in infant research over the last decade is associated with a variety of methods for data preprocessing and analysis. Although pupil diameter is increasingly recognized as an alternative to the popular cumulative looking time measure used in many studies (Jackson & Sirois, 2022), an open question is whether the many approaches used to analyse this variable converge. To this end, we proposed a crowdsourced approach to pupillometry analysis. A dataset from 30 nine-month-old infants (15 girls; mean age = 282.9 days, SD = 8.10) was provided to 7 distinct teams for analysis. The data were obtained from infants watching video sequences showing a hand, initially resting between two toys, grabbing one of them (after Woodward, 1998). After habituation, infants were shown (in random order) a sequence of four test events that varied target position and target toy. Results show that looking times primarily reflect the familiar path of the hand, regardless of target toy. Gaze data similarly show this familiarity effect of path. The pupil dilation analyses show that features of pupil baseline measures (duration and temporal location), as well as variation in data retention (by trial and/or participant) due to the different inclusion criteria of the various analysis methods, are linked to divergences in findings. Two of the seven teams found no significant findings, whereas the remaining five teams differ in the pattern of findings for main and interaction effects. The discussion proposes guidelines for best practice in the analysis of pupillometry data.
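To illustrate the kind of baseline decisions the teams varied, here is a hedged sketch on a hypothetical pupil trace; the sampling rate, windows, trace shape, and function names are assumptions of this example, not the paper's pipeline.

    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical pupil trace: 1 s pre-stimulus + 3 s trial, sampled at 60 Hz
    fs = 60
    t = np.arange(-1.0, 3.0, 1.0 / fs)
    pupil = 4.0 + 0.3 * np.exp(-((t - 1.2) ** 2) / 0.5) + rng.normal(0, 0.02, t.size)

    def baseline_correct(trace, t, dur, mode):
        """Correct against the mean of the `dur` seconds preceding stimulus onset."""
        base = trace[(t >= -dur) & (t < 0)].mean()
        return trace - base if mode == "subtractive" else trace / base

    # Two decision points that analysis teams commonly vary
    for dur in (0.2, 1.0):                        # baseline duration
        for mode in ("subtractive", "divisive"):  # correction method
            peak = baseline_correct(pupil, t, dur, mode)[t > 0].max()
            print(f"baseline={dur:.1f}s  {mode:12s}  peak response={peak:.3f}")

Even on clean simulated data, the numeric peak differs across these choices; on noisy infant data, such choices can plausibly flip which effects reach significance.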
Collapse
Affiliation(s)
- Sylvain Sirois
- Département de Psychologie, Université du Québec à Trois-Rivières, Canada.
| | - Julie Brisson
- Centre de Recherche sur les fonctionnements et dysfonctionnements psychologiques (EA7475), Université de Rouen Normandie, France
| | - Erik Blaser
- Department of Psychology, University of Massachusetts Boston, USA
| | - Giulia Calignano
- Department of Developmental and Social Psychology, University of Padova, Italy
| | - Jamie Donenfeld
- Department of Psychology, University of Massachusetts Boston, USA
| | - Robert Hepach
- Department of Experimental Psychology, University of Oxford, UK
| | - Jean-Rémy Hochmann
- CNRS UMR5229 - Institut des Sciences Cognitives Marc Jeannerod, Université Lyon 1, France
| | - Zsuzsa Kaldy
- Department of Psychology, University of Massachusetts Boston, USA
| | - Ulf Liszkowski
- Department of Developmental Psychology, University of Hamburg, Germany
| | - Marlena Mayer
- Department of Developmental Psychology, University of Hamburg, Germany
| | | | - Sofia Russo
- Department of Developmental and Social Psychology, University of Padova, Italy
| | - Eloisa Valenza
- Department of Developmental and Social Psychology, University of Padova, Italy
| |
Collapse
|
27
|
Grebe NM, Eckardt W, Stoinski TS, Umuhoza R, Santymire RM, Rosenbaum S. An empirical comparison of several commercial enzyme immunoassays for the non-invasive assessment of adrenocortical and gonadal function in mountain gorillas. Gen Comp Endocrinol 2023; 342:114351. [PMID: 37532156 DOI: 10.1016/j.ygcen.2023.114351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/20/2023] [Revised: 06/29/2023] [Accepted: 07/27/2023] [Indexed: 08/04/2023]
Abstract
Wildlife researchers seeking to non-invasively examine endocrine function in their study species are presented with a dense and technical 'garden of forking paths' to navigate between collecting a biological sample and obtaining a final measurement. In particular, the choice of which enzyme immunoassay (EIA) to use with collected fecal samples, out of the many options offered by different manufacturers and research laboratories, may be one of the most consequential for final results. However, guidance for making this decision is still emerging. With this gap in mind, we performed a head-to-head comparison of results obtained from four different EIAs for fecal glucocorticoid metabolites (FGCMs), and three different EIAs for fecal androgen metabolites (FAMs), applied to the same set of fecal samples collected from the mountain gorillas (Gorilla beringei beringei) monitored by the Dian Fossey Gorilla Fund in Volcanoes National Park, Rwanda. We provide a) an analytical validation of the different EIAs via tests of parallelism and linearity; b) an estimate of inter-assay correlation between EIA kits designed for the same metabolites; and c) a test of the kits' ecological validity, in which we examine how well each captures endocrine changes following events that theory predicts should result in elevated FGCM and/or FAM concentrations. Our results show that kits differ to some degree in their performance; at the same time, nearly all assays exhibited at least moderate evidence of validity and covariance with others for the same analyte. Our findings, which differ somewhat from similar comparisons performed in other species, demonstrate the need to directly assess assay performance in a species- and context-specific manner as part of efforts to develop the burgeoning discipline of wildlife endocrinology.
Collapse
Affiliation(s)
- Nicholas M Grebe
- Department of Anthropology, University of Michigan, Ann Arbor MI, United States.
| | | | | | | | - Rachel M Santymire
- Department of Biology, Georgia State University, Atlanta GA, United States
| | - Stacy Rosenbaum
- Department of Anthropology, University of Michigan, Ann Arbor MI, United States
| |
Collapse
|
28
|
Wiens S, Andersson A, Gravenfors J. Neural electrophysiological correlates of detection and identification awareness. Cogn Affect Behav Neurosci 2023; 23:1303-1321. [PMID: 37656374 PMCID: PMC10545648 DOI: 10.3758/s13415-023-01120-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 06/28/2023] [Indexed: 09/02/2023]
Abstract
Humans have conscious experiences of the events in their environment. Previous research from electroencephalography (EEG) has shown visual awareness negativity (VAN) at about 200 ms to be a neural correlate of consciousness (NCC). However, when considering VAN as an NCC, it is important to explore which particular experiences are associated with VAN. Recent research proposes that VAN is an NCC of lower-level experiences (detection) rather than higher-level experiences (identification). However, previous results are mixed and have several limitations. In the present study, the stimulus was a ring with a Gabor patch tilting either left or right. On each trial, subjects rated their awareness on a three-level perceptual awareness scale that captured both detection (something vs. nothing) and identification (identification vs. something). Separate staircases were used to adjust stimulus opacity to the detection threshold and the identification threshold. Bayesian linear mixed models provided extreme evidence (BF10 = 131) that VAN was stronger at the detection threshold than at the identification threshold. Mean VAN decreased from −2.12 μV [−2.86, −1.42] at detection to −0.46 μV [−0.79, −0.11] at identification. These results strongly support the claim that VAN is an NCC of lower-level experiences of seeing something rather than of higher-level experiences of specific properties of the stimuli. Thus, results are consistent with recurrent processing theory in that phenomenal visual consciousness is reflected by VAN. Further, results emphasize that it is important to consider the level of experience when searching for NCC.
Collapse
Affiliation(s)
- Stefan Wiens
- Department of Psychology, Stockholm University, Stockholm, Sweden.
| | - Annika Andersson
- Department of Psychology, Stockholm University, Stockholm, Sweden
| | | |
Collapse
|
29
|
Affiliation(s)
- David J Hunter
- From the Nuffield Department of Population Health (D.J.H.) and the Department of Statistics and Nuffield Department of Medicine (C.H.), University of Oxford, Oxford, and the Alan Turing Institute, London (C.H.) - both in the United Kingdom
| | - Christopher Holmes
- From the Nuffield Department of Population Health (D.J.H.) and the Department of Statistics and Nuffield Department of Medicine (C.H.), University of Oxford, Oxford, and the Alan Turing Institute, London (C.H.) - both in the United Kingdom
| |
Collapse
|
30
|
Spiess M, Jordan P. In models we trust: preregistration, large samples, and replication may not suffice. Front Psychol 2023; 14:1266447. [PMID: 37809287 PMCID: PMC10551181 DOI: 10.3389/fpsyg.2023.1266447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Accepted: 09/04/2023] [Indexed: 10/10/2023] Open
Abstract
Despite discussions about the replicability of findings in psychological research, two issues have been largely ignored: selection mechanisms and model assumptions. Both topics address the same fundamental question: Does the chosen statistical analysis tool adequately model the data generation process? In this article, we address both issues and first show that, in the face of selective samples and contrary to common practice, the validity of inferences, even those based on experimental designs, can be claimed without further justification and adaptation of standard methods only in very specific situations. We then broaden our perspective to discuss the consequences of violated assumptions in linear models in the context of psychological research in general and in generalized linear mixed models as used in item response theory. These types of misspecification are often ignored in the psychological research literature. It is emphasized that the above problems cannot be overcome by strategies such as preregistration, large samples, replications, or a ban on testing null hypotheses. To avoid biased conclusions, we briefly discuss tools such as model diagnostics, statistical methods to compensate for selectivity, and semi- or non-parametric estimation. At a more fundamental level, however, a twofold strategy seems indispensable: (1) iterative, cumulative theory development based on statistical methods with theoretically justified assumptions, and (2) empirical research on variables that affect (self-)selection into the observed part of the sample and the use of this information to compensate for selectivity.
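A minimal simulation of the selection problem the authors describe (the selection mechanism and all numbers are illustrative assumptions): even with a randomized treatment, selection into the observed sample that depends on the outcome biases the naive estimate, and no amount of preregistration or sample size fixes it.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200_000

    # Randomized binary treatment with a true effect of 0.5 on the outcome
    z = rng.integers(0, 2, n)
    y = 0.5 * z + rng.normal(0.0, 1.0, n)

    # Self-selection into the observed sample depends on the outcome itself:
    # units with higher y are more likely to end up in the analyzed data
    p_observe = 1.0 / (1.0 + np.exp(-(y - 1.0)))
    observed = rng.random(n) < p_observe

    est_full = y[z == 1].mean() - y[z == 0].mean()
    est_sel = y[observed & (z == 1)].mean() - y[observed & (z == 0)].mean()
    print(f"full sample estimate:     {est_full:.3f}  (true effect 0.50)")
    print(f"selected sample estimate: {est_sel:.3f}  (biased despite randomization)")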
Collapse
Affiliation(s)
- Martin Spiess
- Institute of Psychology, Department of Psychology and Human Movement Science, University of Hamburg, Hamburg, Germany
| | | |
Collapse
|
31
|
Kimmel K, Avolio ML, Ferraro PJ. Empirical evidence of widespread exaggeration bias and selective reporting in ecology. Nat Ecol Evol 2023; 7:1525-1536. [PMID: 37537387 DOI: 10.1038/s41559-023-02144-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Accepted: 06/28/2023] [Indexed: 08/05/2023]
Abstract
In many scientific disciplines, common research practices have led to unreliable and exaggerated evidence about scientific phenomena. Here we describe some of these practices and quantify their pervasiveness in recent ecology publications in five popular journals. In an analysis of over 350 studies published between 2018 and 2020, we detect empirical evidence of exaggeration bias and selective reporting of statistically significant results. This evidence implies that the published effect sizes in ecology journals exaggerate the importance of the ecological relationships that they aim to quantify. An exaggerated evidence base hinders the ability of empirical ecology to reliably contribute to science, policy, and management. To increase the credibility of ecology research, we describe a set of actions that ecologists should take, including changes to scientific norms about what high-quality ecology looks like and expectations about what high-quality studies can deliver.
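The exaggeration mechanism can be reproduced in a few lines with a standard winner's-curse simulation (this is a generic illustration, not the authors' analysis): when power is low, the subset of statistically significant estimates systematically overstates the true effect.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    true_d, n, n_studies = 0.2, 20, 10_000  # small true effect, low-powered studies

    d_hat = np.empty(n_studies)
    pvals = np.empty(n_studies)
    for i in range(n_studies):
        x = rng.normal(true_d, 1.0, n)       # one-sample design for simplicity
        _, pvals[i] = stats.ttest_1samp(x, 0.0)
        d_hat[i] = x.mean() / x.std(ddof=1)  # estimated Cohen's d

    sig = pvals < 0.05
    print(f"power ~ {sig.mean():.2f}")
    print(f"mean |d| among significant studies: {np.abs(d_hat[sig]).mean():.2f} "
          f"(true d = {true_d})")
    # Publishing mainly the significant studies would make the literature
    # report an effect substantially larger than the true one.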
Collapse
Affiliation(s)
- Kaitlin Kimmel
- Mad Agriculture, Boulder, CO, USA
- Department of Earth and Planetary Sciences, Johns Hopkins University, Baltimore, MD, USA
| | - Meghan L Avolio
- Department of Earth and Planetary Sciences, Johns Hopkins University, Baltimore, MD, USA
| | - Paul J Ferraro
- Carey Business School, Johns Hopkins University, Baltimore, MD, USA.
- Department of Environmental Health and Engineering, a joint department of the Bloomberg School of Public Health and the Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
| |
Collapse
|
32
|
Parker TH, Yang Y. Exaggerated effects in ecology. Nat Ecol Evol 2023; 7:1356-1357. [PMID: 37537386 DOI: 10.1038/s41559-023-02156-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/05/2023]
Affiliation(s)
| | - Yefeng Yang
- Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia.
| |
Collapse
|
33
|
Nebe S, Reutter M, Baker DH, Bölte J, Domes G, Gamer M, Gärtner A, Gießing C, Gurr C, Hilger K, Jawinski P, Kulke L, Lischke A, Markett S, Meier M, Merz CJ, Popov T, Puhlmann LMC, Quintana DS, Schäfer T, Schubert AL, Sperl MFJ, Vehlen A, Lonsdorf TB, Feld GB. Enhancing precision in human neuroscience. eLife 2023; 12:e85980. [PMID: 37555830 PMCID: PMC10411974 DOI: 10.7554/elife.85980] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2023] [Accepted: 07/23/2023] [Indexed: 08/10/2023] Open
Abstract
Human neuroscience has always been pushing the boundary of what is measurable. During the last decade, concerns about statistical power and replicability (in science in general, but also specifically in human neuroscience) have fueled an extensive debate. One important insight from this discourse is the need for larger samples, which naturally increases statistical power. An alternative is to increase the precision of measurements, which is the focus of this review. This option is often overlooked, even though statistical power benefits from increasing precision as much as from increasing sample size. Nonetheless, precision has always been at the heart of good scientific practice in human neuroscience, with researchers relying on lab traditions or rules of thumb to ensure sufficient precision for their studies. In this review, we encourage a more systematic approach to precision. We start by introducing measurement precision and its importance for well-powered studies in human neuroscience. Then, determinants of precision in a range of neuroscientific methods (MRI, M/EEG, EDA, eye-tracking, and endocrinology) are elaborated. We end by discussing how a more systematic evaluation of precision and the application of respective insights can lead to an increase in reproducibility in human neuroscience.
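The review's point that precision buys power much as sample size does can be checked with a quick Monte Carlo sketch; all effect sizes, noise levels, and sample sizes below are illustrative assumptions.

    import numpy as np
    from scipy import stats

    def power_one_sample(effect, sd_true, sd_meas, n, alpha=0.05, sims=20_000, seed=5):
        """Monte Carlo power of a one-sample t test when measurement noise
        (sd_meas) adds to true between-subject variability (sd_true)."""
        rng = np.random.default_rng(seed)
        sd_obs = np.sqrt(sd_true ** 2 + sd_meas ** 2)
        x = rng.normal(effect, sd_obs, (sims, n))
        t = x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(n))
        return (np.abs(t) > stats.t.ppf(1 - alpha / 2, n - 1)).mean()

    # Route 1: many subjects measured with a noisy instrument
    print(power_one_sample(effect=0.5, sd_true=1.0, sd_meas=1.0, n=60))  # ~0.77
    # Route 2: fewer subjects measured precisely (e.g., more trials per subject)
    print(power_one_sample(effect=0.5, sd_true=1.0, sd_meas=0.2, n=35))  # ~0.80

Under these assumptions, 35 precisely measured subjects match the power of 60 noisily measured ones, which is the trade-off the review asks researchers to evaluate systematically.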
Collapse
Affiliation(s)
- Stephan Nebe
- Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland
| | - Mario Reutter
- Department of Psychology, Julius-Maximilians-University, Würzburg, Germany
| | - Daniel H Baker
- Department of Psychology and York Biomedical Research Institute, University of York, York, United Kingdom
| | - Jens Bölte
- Institute for Psychology, University of Münster, Otto-Creuzfeldt Center for Cognitive and Behavioral Neuroscience, Münster, Germany
| | - Gregor Domes
- Department of Biological and Clinical Psychology, University of Trier, Trier, Germany
- Institute for Cognitive and Affective Neuroscience, Trier, Germany
| | - Matthias Gamer
- Department of Psychology, Julius-Maximilians-University, Würzburg, Germany
| | - Anne Gärtner
- Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
| | - Carsten Gießing
- Biological Psychology, Department of Psychology, School of Medicine and Health Sciences, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
| | - Caroline Gurr
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, University Hospital, Goethe University, Frankfurt, Germany
- Brain Imaging Center, Goethe University, Frankfurt, Germany
| | - Kirsten Hilger
- Department of Psychology, Julius-Maximilians-University, Würzburg, Germany
- Department of Psychology, Psychological Diagnostics and Intervention, Catholic University of Eichstätt-Ingolstadt, Eichstätt, Germany
| | - Philippe Jawinski
- Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Louisa Kulke
- Department of Developmental with Educational Psychology, University of Bremen, Bremen, Germany
| | - Alexander Lischke
- Department of Psychology, Medical School Hamburg, Hamburg, Germany
- Institute of Clinical Psychology and Psychotherapy, Medical School Hamburg, Hamburg, Germany
| | - Sebastian Markett
- Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Maria Meier
- Department of Psychology, University of Konstanz, Konstanz, Germany
- University Psychiatric Hospitals, Child and Adolescent Psychiatric Research Department (UPKKJ), University of Basel, Basel, Switzerland
| | - Christian J Merz
- Department of Cognitive Psychology, Institute of Cognitive Neuroscience, Faculty of Psychology, Ruhr University Bochum, Bochum, Germany
| | - Tzvetan Popov
- Department of Psychology, Methods of Plasticity Research, University of Zurich, Zurich, Switzerland
| | - Lara MC Puhlmann
- Leibniz Institute for Resilience Research, Mainz, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Daniel S Quintana
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- NevSom, Department of Rare Disorders & Disabilities, Oslo University Hospital, Oslo, Norway
- KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo, Oslo, Norway
- Norwegian Centre for Mental Disorders Research (NORMENT), University of Oslo, Oslo, Norway
| | - Tim Schäfer
- Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, University Hospital, Goethe University, Frankfurt, Germany
- Brain Imaging Center, Goethe University, Frankfurt, Germany
| | | | - Matthias FJ Sperl
- Department of Clinical Psychology and Psychotherapy, University of Giessen, Giessen, Germany
- Center for Mind, Brain and Behavior, Universities of Marburg and Giessen, Giessen, Germany
| | - Antonia Vehlen
- Department of Biological and Clinical Psychology, University of Trier, Trier, Germany
| | - Tina B Lonsdorf
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Department of Psychology, Biological Psychology and Cognitive Neuroscience, University of Bielefeld, Bielefeld, Germany
| | - Gordon B Feld
- Department of Clinical Psychology, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Department of Psychology, Heidelberg University, Heidelberg, Germany
- Department of Addiction Behavior and Addiction Medicine, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| |
Collapse
|
34
|
Cotton K, Sandry J, Ricker TJ. Secondary task engagement drives the McCabe effect in long-term memory. Mem Cognit 2023:10.3758/s13421-023-01450-2. [PMID: 37552382 DOI: 10.3758/s13421-023-01450-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/31/2023] [Indexed: 08/09/2023]
Abstract
Processing that occurs while information is held in working memory is critical in long-term retention of that information. One counterintuitive finding is that the concurrent processing required during complex span tasks typically impairs immediate memory, while also leading to improved delayed memory. One proposed mechanism for this effect is retrieval practice that occurs each time memory items are displaced to allow for concurrent processing during complex span tasks. Other research has instead suggested that increased free time during complex span procedures underlies this effect. In the present study, we presented participants with memory items in simple, complex, and slow span tasks and compared their performance on immediate and delayed memory tests. We found that how much a participant engaged with the secondary task of the complex span task corresponded with how strongly they exhibited a complex span boost on delayed memory performance. We also probed what participants were thinking about during the task, and found that participants' focus varied depending both on task type and secondary task engagement. The results support repeated retrieval as a key mechanism in the relationship between working memory processing and long-term retention. Further, the present study highlights the importance of variation in individual cognitive processing in predicting long-term outcomes even when objective conditions remain unchanged.
Collapse
Affiliation(s)
- Kelly Cotton
- Department of Psychology, The Graduate Center, City University of New York, 365 5th Ave, New York, NY, 10016, USA.
| | - Joshua Sandry
- Department of Psychology, Montclair State University, Montclair, NJ, USA
| | - Timothy J Ricker
- Department of Psychology, University of South Dakota, Vermillion, SD, USA
| |
Collapse
|
35
|
Thompson WH, Skau S. On the scope of scientific hypotheses. R Soc Open Sci 2023; 10:230607. [PMID: 37650069 PMCID: PMC10465209 DOI: 10.1098/rsos.230607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/06/2023] [Accepted: 08/04/2023] [Indexed: 09/01/2023]
Abstract
Hypotheses are frequently the starting point when undertaking the empirical portion of the scientific process. They state something that the scientific process will attempt to evaluate, corroborate, verify or falsify. Their purpose is to guide the types of data we collect, analyses we conduct, and inferences we would like to make. Over the last decade, metascience has advocated for including hypotheses in preregistrations or registered reports, but how to formulate these hypotheses has received less attention. Here, we argue that hypotheses can vary in specificity along at least three independent dimensions: the relationship, the variables, and the pipeline. Together, these dimensions form the scope of the hypothesis. We demonstrate how narrowing the scope of a hypothesis along any of these three dimensions reduces the hypothesis space, and that this reduction is a type of novelty. Finally, we discuss how this formulation can guide researchers toward an appropriate scope for their hypotheses, one that is neither too broad nor too narrow. The framework helps hypothesis-makers clarify what is being tested, chain results to previously known findings, and demarcate what is explicitly tested in the hypothesis.
Collapse
Affiliation(s)
- William Hedley Thompson
- Department of Applied Information Technology, University of Gothenburg, Gothenburg, Sweden
- Institute of Neuroscience and Physiology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
| | - Simon Skau
- Department of Pedagogical, Curricular and Professional Studies, Faculty of Education, University of Gothenburg, Gothenburg, Sweden
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
| |
Collapse
|
36
|
Wiechert S, Loewy L, Wessel I, Fawcett JM, Ben-Shakhar G, Pertzov Y, Verschuere B. Suppression-induced forgetting: a pre-registered replication of the think/no-think paradigm. Memory 2023; 31:989-1002. [PMID: 37165713 DOI: 10.1080/09658211.2023.2208791] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Accepted: 04/25/2023] [Indexed: 05/12/2023]
Abstract
Post-traumatic stress disorder is characterised by recurring memories of a traumatic experience despite deliberate attempts to forget (i.e., suppression). The Think/No-Think (TNT) task has been used widely in the laboratory to study suppression-induced forgetting. During the task, participants learn a series of cue-target word pairs. Subsequently, they are presented with a subset of the cue words and are instructed to think (respond items) or not think about the corresponding target (suppression items). Baseline items are not shown during this phase. Successful suppression-induced forgetting is indicated by the reduced recall of suppression compared to baseline items in recall tests using either the same or different cues than originally studied (i.e., same- and independent-probe tests, respectively). The current replication was a pre-registered collaborative effort to evaluate an online experimenter-present version of the paradigm in 150 English-speaking healthy individuals (89 females; mean age = 31.14 years, SD = 7.73). Overall, we did not replicate the suppression-induced forgetting effect (same-probe: BF01 = 7.84; d = 0.03 [95% CI: -0.13; 0.20]; independent-probe: BF01 = 5.71; d = 0.06 [95% CI: -0.12; 0.24]). These null results should be considered in light of our online implementation of the paradigm. Nevertheless, our findings call into question the robustness of suppression-induced forgetting.
Collapse
Affiliation(s)
- Sera Wiechert
- Department of Clinical Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Leonie Loewy
- Department of Clinical Psychology, University of Amsterdam, Amsterdam, The Netherlands
| | - Ineke Wessel
- Department of Clinical Psychology and Experimental Psychopathology, University of Groningen, Groningen, The Netherlands
| | - Jonathan M Fawcett
- Department of Psychology, Memorial University of Newfoundland, St. John's, Canada
| | - Gershon Ben-Shakhar
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Yoni Pertzov
- Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Bruno Verschuere
- Department of Clinical Psychology, University of Amsterdam, Amsterdam, The Netherlands
| |
Collapse
|
37
|
Axfors C, Patel CJ, Ioannidis JPA. Published registry-based pharmacoepidemiologic associations show limited concordance with agnostic medication-wide analyses. J Clin Epidemiol 2023; 160:33-45. [PMID: 37224981 DOI: 10.1016/j.jclinepi.2023.05.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2022] [Revised: 04/13/2023] [Accepted: 05/16/2023] [Indexed: 05/26/2023]
Abstract
OBJECTIVES To assess how the results of published national registry-based pharmacoepidemiology studies (where select associations are of interest) compare with an agnostic medication-wide approach (where all possible drug associations are tested). STUDY DESIGN AND SETTING We systematically searched for publications that reported drug associations with any, breast, colon/colorectal, or prostate cancer in the Swedish Prescribed Drug Registry. Results were compared against a previously performed agnostic medication-wide study on the same registry. PROTOCOL https://osf.io/kqj8n. RESULTS Most published studies (25/32) investigated previously reported associations. 421/913 (46%) associations had statistically significant results. 134 of the 162 unique drug-cancer associations could be paired with 70 associations in the agnostic study (corresponding drug categories and cancer types). Published studies reported smaller effect sizes and absolute effect sizes than the agnostic study, and generally used more adjustments. Agnostic analyses were less likely to report statistically significant protective associations (based on a multiplicity-corrected threshold) than their paired associations in published studies (McNemar odds ratio 0.13, P = 0.0022). Among 162 published associations, 36 (22%) showed increased risk signal and 25 (15%) protective signal at P < 0.05, while for agnostic associations, 237 (11%) showed increased risk signal and 108 (5%) protective signal at a multiplicity-corrected threshold. Associations belonging to drug categories targeted by individual published studies vs. nontargeted had smaller average effect sizes; smaller P values; and more frequent risk signals. CONCLUSION Published pharmacoepidemiology studies using a national registry addressed mostly previously proposed associations, were mostly "negative", and showed only modest concordance with their respective agnostic analyses in the same registry.
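The McNemar comparison of paired significance calls can be reproduced with statsmodels. The 2x2 counts below are invented for illustration only, chosen to give an odds ratio near the paper's reported 0.13; they are not the study's data.

    import numpy as np
    from statsmodels.stats.contingency_tables import mcnemar

    # Hypothetical paired calls of 'significant protective signal?' for the same
    # associations; rows = published study (yes, no), columns = agnostic analysis
    table = np.array([[4, 21],   # published yes: agnostic yes / agnostic no
                      [3, 42]])  # published no:  agnostic yes / agnostic no

    res = mcnemar(table, exact=True)        # exact binomial test on discordant pairs
    odds_ratio = table[1, 0] / table[0, 1]  # agnostic-only vs published-only signals
    print(f"McNemar OR = {odds_ratio:.2f}, p = {res.pvalue:.4f}")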
Collapse
Affiliation(s)
- Cathrine Axfors
- Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA
- Department for Women's and Children's Health, Uppsala University, Uppsala, Sweden
| | - Chirag J Patel
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - John P A Ioannidis
- Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA
- Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, Stanford University, Stanford, CA, USA
| |
Collapse
|
38
|
Korbmacher M, Azevedo F, Pennington CR, Hartmann H, Pownall M, Schmidt K, Elsherif M, Breznau N, Robertson O, Kalandadze T, Yu S, Baker BJ, O'Mahony A, Olsnes JØS, Shaw JJ, Gjoneska B, Yamada Y, Röer JP, Murphy J, Alzahawi S, Grinschgl S, Oliveira CM, Wingen T, Yeung SK, Liu M, König LM, Albayrak-Aydemir N, Lecuona O, Micheli L, Evans T. The replication crisis has led to positive structural, procedural, and community changes. Commun Psychol 2023; 1:3. [PMID: 39242883 PMCID: PMC11290608 DOI: 10.1038/s44271-023-00003-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/20/2023] [Accepted: 05/22/2023] [Indexed: 09/09/2024]
Abstract
The emergence of large-scale replication projects yielding success rates substantially lower than expected caused the behavioural, cognitive, and social sciences to experience a so-called 'replication crisis'. In this Perspective, we reframe this 'crisis' through the lens of a credibility revolution, focusing on positive structural, procedural, and community-driven changes. We then outline a path to expand ongoing advances and improvements. The credibility revolution has been an impetus for several substantive changes that will have a positive, long-term impact on our research environment.
Collapse
Affiliation(s)
- Max Korbmacher
- Department of Health and Functioning, Western Norway University of Applied Sciences, Bergen, Norway
- NORMENT Centre for Psychosis Research, University of Oslo and Oslo University Hospital, Oslo, Norway
- Mohn Medical Imaging and Visualisation Center, Bergen, Norway
| | - Flavio Azevedo
- Department of Psychology, University of Cambridge, Cambridge, UK.
- Department of Social Psychology, University of Groningen, Groningen, The Netherlands.
| | | | - Helena Hartmann
- Department of Neurology, University of Essen, Essen, Germany
| | | | | | | | - Nate Breznau
- SOCIUM Research Center on Inequality and Social Policy, University of Bremen, Bremen, Germany
| | - Olly Robertson
- Department of Psychiatry, University of Oxford, Oxford, UK
| | - Tamara Kalandadze
- Department of Education, ICT and Learning, Ostfold University College, Halden, Norway
| | - Shijun Yu
- School of Psychology, University of Birmingham, Birmingham, UK
| | - Bradley J Baker
- Department of Sport and Recreation Management, Temple University, Philadelphia, USA
| | | | - Jørgen Ø-S Olsnes
- Kavli Institute for Systems Neuroscience, Norwegian University of Science and Technology, Trondheim, Norway
| | - John J Shaw
- Division of Psychology, De Montfort University, Leicester, UK
| | - Biljana Gjoneska
- Macedonian Academy of Sciences and Arts, Skopje, North Macedonia
| | - Yuki Yamada
- Faculty of Arts and Science, Kyushu University, Fukuoka, Japan
| | - Jan P Röer
- Department of Psychology and Psychotherapy, Witten/Herdecke University, Witten, Germany
| | - Jennifer Murphy
- Department of Applied Science, Technological University Dublin, Dublin, Ireland
| | - Shilaan Alzahawi
- Graduate School of Business, Stanford University, Stanford, USA
| | | | | | - Tobias Wingen
- Institute of General Practice and Family Medicine, University of Bonn, Bonn, Germany
| | - Siu Kit Yeung
- Department of Psychology, Chinese University of Hong Kong, Hong Kong, China
| | - Meng Liu
- Faculty of Education, University of Cambridge, Cambridge, UK
| | - Laura M König
- Faculty of Life Sciences: Food, Nutrition and Health, University of Bayreuth, Bayreuth, Germany
| | - Nihan Albayrak-Aydemir
- Open Psychology Research Centre, Open University, Milton Keynes, UK
- Department of Psychological and Behavioural Science, London School of Economics and Political Science, London, UK
| | - Oscar Lecuona
- Department of Psychology, Universidad Rey Juan Carlos, Madrid, Spain
- Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
| | - Leticia Micheli
- Institute of Psychology, Leiden University, Leiden, The Netherlands
| | - Thomas Evans
- School of Human Sciences, University of Greenwich, Greenwich, UK
- Institute for Lifecourse Development, University of Greenwich, Greenwich, UK
| |
Collapse
|
39
|
Taylor PA, Reynolds RC, Calhoun V, Gonzalez-Castillo J, Handwerker DA, Bandettini PA, Mejia AF, Chen G. Highlight results, don't hide them: Enhance interpretation, reduce biases and improve reproducibility. Neuroimage 2023; 274:120138. [PMID: 37116766 PMCID: PMC10233921 DOI: 10.1016/j.neuroimage.2023.120138] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 04/05/2023] [Accepted: 04/26/2023] [Indexed: 04/30/2023] Open
Abstract
Most neuroimaging studies display results that represent only a tiny fraction of the collected data. While it is conventional to present "only the significant results" to the reader, here we suggest that this practice has several negative consequences for both reproducibility and understanding. This practice hides away most of the results of the dataset and leads to problems of selection bias and irreproducibility, both of which have been recognized as major issues in neuroimaging studies recently. Opaque, all-or-nothing thresholding, even if well-intentioned, places undue influence on arbitrary filter values, hinders clear communication of scientific results, wastes data, is antithetical to good scientific practice, and leads to conceptual inconsistencies. It is also inconsistent with the properties of the acquired data and the underlying biology being studied. Instead of presenting only a few statistically significant locations and hiding away the remaining results, studies should "highlight" the former while also showing as much as possible of the rest. This is distinct from but complementary to utilizing data sharing repositories: the initial presentation of results has an enormous impact on the interpretation of a study. We present practical examples and extensions of this approach for voxelwise, regionwise and cross-study analyses using publicly available data that was analyzed previously by 70 teams (NARPS; Botvinik-Nezer et al., 2020), showing that it is possible to balance the goals of displaying a full set of results with providing the reader reasonably concise and "digestible" findings. In particular, the highlighting approach sheds useful light on the kind of variability present among the NARPS teams' results, which is primarily a varied strength of agreement rather than disagreement. Using a meta-analysis built on the informative "highlighting" approach shows this relative agreement, while one using the standard "hiding" approach does not. We describe how this simple but powerful change in practice (focusing on highlighting results rather than hiding all but the strongest ones) can help address many large concerns within the field, or at least provide more complete information about them. We include a list of practical suggestions for results reporting to improve reproducibility, cross-study comparisons and meta-analyses.
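A toy illustration of "highlighting" versus "hiding" (simulated map; the threshold and transparency level are arbitrary choices of this sketch, not the authors' settings; array-valued alpha in imshow requires Matplotlib >= 3.2):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(6)

    # Toy 'statistical map': two smooth effects plus noise on a 60 x 60 grid
    yy, xx = np.mgrid[0:60, 0:60]
    effect = (0.8 * np.exp(-((xx - 20) ** 2 + (yy - 30) ** 2) / 120)
              - 0.5 * np.exp(-((xx - 45) ** 2 + (yy - 15) ** 2) / 80))
    stat = effect + rng.normal(0, 0.25, effect.shape)
    sig = np.abs(stat) > 0.5  # stand-in for a voxelwise significance threshold

    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    # 'Hiding': opaque thresholding discards all sub-threshold structure
    axes[0].imshow(np.where(sig, stat, np.nan), cmap="RdBu_r", vmin=-1, vmax=1)
    axes[0].set_title("thresholded (hide)")
    # 'Highlighting': show everything, render sub-threshold values translucently
    axes[1].imshow(stat, cmap="RdBu_r", vmin=-1, vmax=1,
                   alpha=np.where(sig, 1.0, 0.35))
    axes[1].set_title("transparent thresholding (highlight)")
    plt.tight_layout()
    plt.show()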
Collapse
Affiliation(s)
- Paul A Taylor
- Scientific and Statistical Computing Core, NIMH, NIH, Bethesda, MD, USA.
| | | | - Vince Calhoun
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, and Emory University, Atlanta, GA, USA
| | | | | | - Peter A Bandettini
- Section on Functional Imaging Methods, NIMH, NIH, Bethesda, MD, USA; Functional MRI Core Facility, NIMH, NIH, Bethesda, MD, USA
| | | | - Gang Chen
- Scientific and Statistical Computing Core, NIMH, NIH, Bethesda, MD, USA
| |
Collapse
|
40
|
Cintron DW, Gottlieb LM, Hagan E, Tan ML, Vlahov D, Glymour MM, Matthay EC. A quantitative assessment of the frequency and magnitude of heterogeneous treatment effects in studies of the health effects of social policies. SSM Popul Health 2023; 22:101352. [PMID: 36873266 PMCID: PMC9975308 DOI: 10.1016/j.ssmph.2023.101352] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Revised: 01/25/2023] [Accepted: 01/26/2023] [Indexed: 02/05/2023] Open
Abstract
Substantial heterogeneity in effects of social policies on health across subgroups may be common, but has not been systematically characterized. Using a sample of 55 contemporary studies on health effects of social policies, we recorded how often heterogeneous treatment effects (HTEs) were assessed, for what subgroups (e.g., male, female), and the subgroup-specific effect estimates expressed as Standardized Mean Differences (SMDs). For each study, outcome, and dimension (e.g., gender), we fit a random-effects meta-analysis. We characterized the magnitude of heterogeneity in policy effects using the standard deviation of the subgroup-specific effect estimates (τ). Among the 44% of studies reporting subgroup-specific estimates, policy effects were generally small (<0.1 SMDs) with mixed impacts on health (67% beneficial) and disparities (50% implied narrowing of disparities). Across study-outcome-dimensions, 54% indicated any heterogeneity in effects, and 20% had τ > 0.1 SMDs. For 26% of study-outcome-dimensions, the magnitude of τ indicated that effects of opposite signs were plausible across subgroups. Heterogeneity was more common in policy effects not specified a priori. Our findings suggest social policies commonly have heterogeneous effects on health of different populations; these HTEs may substantially impact disparities. Studies of social policies and health should routinely evaluate HTEs.
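The paper's random-effects summaries can be sketched with a textbook DerSimonian-Laird estimator; the subgroup estimates and standard errors below are hypothetical, chosen only to show how tau is computed.

    import numpy as np

    def dersimonian_laird(est, se):
        """Random-effects meta-analysis (DerSimonian-Laird). Returns the pooled
        effect and tau, the SD of the true subgroup-specific effects."""
        est, se = np.asarray(est, float), np.asarray(se, float)
        w = 1.0 / se ** 2                          # fixed-effect weights
        mu_fe = np.sum(w * est) / np.sum(w)
        q = np.sum(w * (est - mu_fe) ** 2)         # Cochran's Q
        c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
        tau2 = max(0.0, (q - (len(est) - 1)) / c)  # method-of-moments tau^2
        w_re = 1.0 / (se ** 2 + tau2)
        return np.sum(w_re * est) / np.sum(w_re), np.sqrt(tau2)

    # Hypothetical subgroup-specific policy effects (SMDs) with standard errors
    effects = [0.12, -0.05, 0.20, 0.02]  # e.g., four demographic subgroups
    ses = [0.06, 0.07, 0.09, 0.05]
    mu, tau = dersimonian_laird(effects, ses)
    print(f"pooled effect = {mu:.3f} SMD, tau = {tau:.3f} SMD")
    # By the paper's rule of thumb, tau > 0.1 SMD would flag substantial
    # effect heterogeneity across subgroups.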
Collapse
Affiliation(s)
- Dakota W Cintron
- Center for Health and Community, University of California, San Francisco, 3333 California St., Suite 465, Campus Box 0844, San Francisco, CA, 94143, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, 550 16th Street, 2nd Floor, Campus Box 0560, San Francisco, CA, 94143, USA
| | - Laura M Gottlieb
- Center for Health and Community, University of California, San Francisco, 3333 California St., Suite 465, Campus Box 0844, San Francisco, CA, 94143, USA
| | - Erin Hagan
- Center for Health and Community, University of California, San Francisco, 3333 California St., Suite 465, Campus Box 0844, San Francisco, CA, 94143, USA
| | - May Lynn Tan
- Center for Health and Community, University of California, San Francisco, 3333 California St., Suite 465, Campus Box 0844, San Francisco, CA, 94143, USA
| | - David Vlahov
- Yale School of Nursing at Yale University, 400 West Campus Drive, Room 32306, Orange, CT, 06477, USA
| | - M Maria Glymour
- Center for Health and Community, University of California, San Francisco, 3333 California St., Suite 465, Campus Box 0844, San Francisco, CA, 94143, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, 550 16th Street, 2nd Floor, Campus Box 0560, San Francisco, CA, 94143, USA
| | - Ellicott C Matthay
- Center for Opioid Epidemiology and Policy, Division of Epidemiology, Department of Population Health, New York University School of Medicine, 180 Madison Ave, New York, NY, 10016, USA
| |
Collapse
|
41
|
Kozlowski AC, Van Gunten TS. Are economists overconfident? Ideology and uncertainty in expert opinion. Br J Sociol 2023; 74:476-500. [PMID: 36792913 DOI: 10.1111/1468-4446.13001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/01/2021] [Revised: 01/12/2023] [Accepted: 01/24/2023] [Indexed: 06/07/2023]
Abstract
Economics frequently serves as an advisory discipline to policymakers, bolstered in part by its claims to a unified intellectual framework and high disciplinary consensus. Recent research challenges this perspective, providing empirical evidence that economists' professional opinions are divided by ideological commitments to either free markets on one hand or state intervention on the other. We investigate the influence of ideology in economics by examining the relation between economists' ideological commitments and the certainty with which they express their expert opinions. To examine this relationship, we analyze data from the Initiative on Global Markets Economic Experts Panel, a unique survey of 51 economists at seven elite American universities. Our results suggest that economists with ideologically patterned views report higher levels of certainty in their opinions than their less ideologically consistent peers, but this boost in confidence is limited to topics that closely pertain to the free market versus interventionism divide.
|
42
|
Bansal R, Peterson BS. Geometry-derived statistical significance: A probabilistic framework for detecting true positive findings in MRI data. Brain Behav 2023; 13:e2865. [PMID: 36869597 PMCID: PMC10097156 DOI: 10.1002/brb3.2865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/28/2022] [Revised: 11/29/2022] [Accepted: 12/06/2022] [Indexed: 03/05/2023] Open
Abstract
INTRODUCTION The false discovery rate (FDR) procedure does not incorporate the geometry of the random field and requires high statistical power at each voxel, a requirement not satisfied by the limited number of participants in imaging studies. Topological FDR, threshold-free cluster enhancement (TFCE), and probabilistic TFCE improve statistical power by incorporating local geometry. However, topological FDR requires specifying a cluster-defining threshold, and TFCE requires specifying transformation weights. METHODS The geometry-derived statistical significance (GDSS) procedure overcomes these limitations by combining voxelwise p-values for the test statistic with probabilities computed from the local geometry of the random field, thereby providing substantially greater statistical power than the procedures currently used to control for multiple comparisons. We used synthetic and real-world data to compare its performance against that of these previously developed procedures. RESULTS GDSS provided substantially greater statistical power than the comparator procedures, and its power was less sensitive to the number of participants. GDSS was more conservative than TFCE: that is, it rejected null hypotheses at voxels with much higher effect sizes than TFCE did. Our experiments also showed that Cohen's d effect sizes decrease as the number of participants increases; sample size calculations based on small studies may therefore underestimate the number of participants required in larger studies. Our findings also suggest that effect size maps should be presented alongside p-value maps for correct interpretation of findings. CONCLUSIONS Compared with the other procedures, GDSS provides considerably greater statistical power for detecting true positives while limiting false positives, especially in small (<40 participants) imaging cohorts.
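GDSS itself is the paper's contribution, and its computation is not reproduced here. For context, the voxelwise FDR baseline it is compared against is conventionally the Benjamini-Hochberg procedure, sketched below on simulated p-values; note that it operates on the p-values alone, ignoring the spatial geometry that GDSS exploits:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR control: boolean mask of rejected tests.

    Operates purely on the p-values; the spatial structure of the
    random field plays no role, which is the limitation GDSS targets.
    """
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, n + 1) / n
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(n, dtype=bool)
    mask[order[:k]] = True  # reject the k smallest p-values
    return mask

# Example: 10,000 "voxels", 100 of which carry a true effect
rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=9_900),             # null voxels
                        rng.uniform(high=0.001, size=100)])  # signal voxels
print(benjamini_hochberg(pvals).sum(), "voxels survive FDR correction")
```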
Affiliation(s)
- Ravi Bansal
- Institute for the Developing Mind, Children's Hospital Los Angeles, California, USA
- Department of Pediatrics and Psychiatry, Keck School of Medicine at the University of Southern California, Los Angeles, California, USA
| | - Bradley S. Peterson
- Institute for the Developing Mind, Children's Hospital Los Angeles, California, USA
- Department of Psychiatry, Keck School of Medicine at the University of Southern California, Los Angeles, California, USA
| |
|
43
|
Bouchard TJ. The Garden of Forking Paths; An Evaluation of Joseph's 'A Reevaluation of the 1990 "Minnesota Study of Twins Reared Apart" IQ Study'. Twin Res Hum Genet 2023; 26:133-142. [PMID: 37272376 DOI: 10.1017/thg.2023.19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Joseph has written what purports to be a refutation of twins-reared-apart (TRA) studies, with a singular focus on the Minnesota Study of Twins Reared Apart (MISTRA). I show, in detail, that (a) his criticisms of previous TRA studies depend on sources that were discredited prior to MISTRA, as they all failed the test of replicability; (b) the list of biases he uses to invalidate MISTRA does not support his arguments; (c) his accusations of questionable research practices are unsubstantiated; and (d) his claim that MISTRA should be evaluated in the context of psychology's replication crisis does not hold, because the TRA studies are constructive replications. Like many other scholars, past and present, he has been misled by the variation introduced by small samples (sampling error) and by the distortion created by walking in the garden of forking paths. His endeavor is a concatenation of elisions and erroneous statistical and scientific reasoning.
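The sampling-error point is easy to make concrete: with small samples, correlation estimates scatter widely around the true value, so apparent discrepancies between small TRA studies are expected even when the underlying effect is identical. A short simulation (the true correlation of 0.75 and n = 30 are illustrative choices, not figures taken from MISTRA):

```python
import numpy as np

# How widely do sample correlations scatter around a fixed true value?
rng = np.random.default_rng(42)
true_r, n, reps = 0.75, 30, 10_000
cov = [[1.0, true_r], [true_r, 1.0]]

rs = np.empty(reps)
for i in range(reps):
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    rs[i] = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]

lo, hi = np.percentile(rs, [2.5, 97.5])
print(f"true r = {true_r}, n = {n}: 95% of sample r's fall in [{lo:.2f}, {hi:.2f}]")
```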
Affiliation(s)
- Thomas J Bouchard
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, USA
| |
|
44
|
Ernst NA, Baldassarre MT. Registered reports in software engineering. EMPIRICAL SOFTWARE ENGINEERING 2023; 28:55. [PMID: 36937703 PMCID: PMC10006549 DOI: 10.1007/s10664-022-10277-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Accepted: 12/14/2022] [Indexed: 06/18/2023]
Abstract
Registered reports are scientific publications in which the detailed research protocol, including the key research questions, is peer-reviewed and approved before the rest of the publication process begins. Subsequent analysis and results are published with minimal additional review, even if there was no clear support for the underlying hypothesis, as long as the approved protocol is followed. Registered reports can prevent several questionable research practices and give researchers early feedback on their study designs. In software engineering research, registered reports were first introduced at the International Conference on Mining Software Repositories (MSR) in 2020. They are now established at three conferences and in two pre-eminent journals, including this one (EMSE). We explain the motivation for registered reports, outline how they have been implemented in software engineering, and discuss some ongoing challenges for producing high-quality software engineering research.
Affiliation(s)
- Neil A. Ernst
- Department of Computer Science, University of Victoria, Victoria, BC, Canada
| | | |
|
45
|
Schütz LM, Betsch T, Plessner H, Schweizer G. The impact of physical load on duration estimation in sport. PSYCHOLOGY OF SPORT AND EXERCISE 2023; 65:102368. [PMID: 37665840 DOI: 10.1016/j.psychsport.2022.102368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/18/2022] [Revised: 11/04/2022] [Accepted: 12/15/2022] [Indexed: 09/06/2023]
Abstract
We investigated whether physical load influences the accuracy of duration estimates for sporting activities presented in real time and in slow motion. Eighty-six participants were studied in two separate sessions of 45 min each. Our results showed no general effect of physical load when comparing exercise with rest. However, we replicated findings of past research (Schütz et al., 2021) showing that the duration of sports performance is estimated more accurately when presented in real time than in slow motion. Further, we found that under physical load, participants who perceived the exercise as hard (RPE ≥ 15) gave significantly shorter and more accurate duration estimates than participants who perceived it as light or moderate (RPE < 15). Our results thus suggest that using slow motion may worsen the assessment of sports performance, and that intense physical exertion reduces the overestimation of time.
Affiliation(s)
- Lisa-Marie Schütz
- Heidelberg University, Institute of Sports and Sports Sciences, Im Neuenheimer Feld 720, 69120, Heidelberg, Germany.
| | - Tilmann Betsch
- University of Erfurt, Social, Organizational, and Economic Psychology, Germany.
| | - Henning Plessner
- Heidelberg University, Institute of Sports and Sports Sciences, Im Neuenheimer Feld 720, 69120, Heidelberg, Germany.
| | - Geoffrey Schweizer
- Heidelberg University, Institute of Sports and Sports Sciences, Im Neuenheimer Feld 720, 69120, Heidelberg, Germany.
| |
|
46
|
Mokros A, Habermeyer E, Poeppl TB, Santtila P, Ziogas A. Tower of Babel or Lighthouse? The State of Research on Neuroelectric Correlates of Human Sexuality: A Response to the Commentaries. ARCHIVES OF SEXUAL BEHAVIOR 2023; 52:611-615. [PMID: 36481834 PMCID: PMC9886606 DOI: 10.1007/s10508-022-02496-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/14/2022] [Revised: 11/15/2022] [Accepted: 11/16/2022] [Indexed: 06/17/2023]
Affiliation(s)
- Andreas Mokros
- Faculty of Psychology, FernUniversität in Hagen, 58084, Hagen, Germany.
| | - Elmar Habermeyer
- Department of Forensic Psychiatry, University Hospital of Psychiatry Zurich, Zurich, Switzerland
| | - Timm B Poeppl
- Department of Psychiatry and Psychotherapy, University of Regensburg, Regensburg, Germany
| | - Pekka Santtila
- Institute for Social Development, New York University Shanghai, Shanghai, China
| | | |
|
47
|
On p-Values and Statistical Significance. J Clin Med 2023; 12:jcm12030900. [PMID: 36769547 PMCID: PMC9917591 DOI: 10.3390/jcm12030900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2023] [Accepted: 01/20/2023] [Indexed: 01/24/2023] Open
Abstract
At the beginning of our research training, we learned about hypothesis testing, p-values, and statistical inference [...].
|
48
|
De A. Statistical Considerations and Challenges for Pivotal Clinical Studies of Artificial Intelligence Medical Tests for Widespread Use: Opportunities for Inter-Disciplinary Collaboration. Stat Biopharm Res 2023. [DOI: 10.1080/19466315.2023.2169752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Affiliation(s)
- Arkendra De
- Agilent Technologies, 1005 Mark Avenue, Carpinteria, CA 93013, Tel: 408-553-7111
| |
|
49
|
Ullmann T, Peschel S, Finger P, Müller CL, Boulesteix AL. Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering. PLoS Comput Biol 2023; 19:e1010820. [PMID: 36608142 PMCID: PMC9873197 DOI: 10.1371/journal.pcbi.1010820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Revised: 01/24/2023] [Accepted: 12/15/2022] [Indexed: 01/07/2023] Open
Abstract
In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity, and many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the "best" ones. However, if only the best results are selectively reported, this may cause over-optimism: the "best" method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an example dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the "best" method combination to the validation dataset. The results are then compared between discovery and validation data. In all four research tasks, there are notable over-optimism effects: averaged over multiple random splits into discovery and validation data, results on the validation set are worse than on the discovery set. Our study thus highlights the importance of validation and replication in microbiome analysis for obtaining reliable results, and demonstrates that the issue of over-optimism goes beyond statistical testing and fishing for significance.
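The over-optimism design described above is straightforward to reproduce in miniature. In the sketch below, a "researcher" tries several clustering pipelines on discovery data, keeps the one with the best silhouette score, and re-scores it on validation data; the methods and criterion are generic stand-ins rather than the paper's microbiome-specific pipelines, and the data are pure noise, so any discovery-validation gap reflects over-optimism:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 20))  # noise stand-in for normalized abundance data

methods = {
    "kmeans_k3": lambda d: KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(d),
    "kmeans_k5": lambda d: KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(d),
    "ward_k3": lambda d: AgglomerativeClustering(n_clusters=3).fit_predict(d),
    "ward_k5": lambda d: AgglomerativeClustering(n_clusters=5).fit_predict(d),
}

gaps = []
for seed in range(20):  # repeated random splits, as in the study design
    disc, val = train_test_split(X, test_size=0.5, random_state=seed)
    # Pick the method with the best silhouette on the discovery set ...
    scores = {name: silhouette_score(disc, fit(disc)) for name, fit in methods.items()}
    best = max(scores, key=scores.get)
    # ... then apply the same method to the validation set
    gaps.append(scores[best] - silhouette_score(val, methods[best](val)))

print(f"mean discovery-minus-validation silhouette gap: {np.mean(gaps):.3f}")
```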
Affiliation(s)
- Theresa Ullmann
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, München, Germany
- Munich Center for Machine Learning (MCML), München, Germany
| | - Stefanie Peschel
- Institute for Asthma and Allergy Prevention, Helmholtz Zentrum München, Neuherberg, Germany
- Department of Statistics, Ludwig-Maximilians-Universität München, München, Germany
| | - Philipp Finger
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, München, Germany
| | - Christian L. Müller
- Department of Statistics, Ludwig-Maximilians-Universität München, München, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Computational Mathematics, Flatiron Institute, New York, New York, United States of America
| | - Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-Universität München, München, Germany
- Munich Center for Machine Learning (MCML), München, Germany
| |
|
50
|
Hardwicke TE, Wagenmakers EJ. Reducing bias, increasing transparency and calibrating confidence with preregistration. Nat Hum Behav 2023; 7:15-26. [PMID: 36707644 DOI: 10.1038/s41562-022-01497-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2021] [Accepted: 11/09/2022] [Indexed: 01/29/2023]
Abstract
Flexibility in the design, analysis and interpretation of scientific studies creates a multiplicity of possible research outcomes. Scientists are granted considerable latitude to selectively use and report the hypotheses, variables and analyses that create the most positive, coherent and attractive story while suppressing those that are negative or inconvenient. This creates a risk of bias that can lead to scientists fooling themselves and fooling others. Preregistration involves declaring a research plan (for example, hypotheses, design and statistical analyses) in a public registry before the research outcomes are known. Preregistration (1) reduces the risk of bias by encouraging outcome-independent decision-making and (2) increases transparency, enabling others to assess the risk of bias and calibrate their confidence in research outcomes. In this Perspective, we briefly review the historical evolution of preregistration in medicine, psychology and other domains, clarify its pragmatic functions, discuss relevant meta-research, and provide recommendations for scientists and journal editors.
Affiliation(s)
- Tom E Hardwicke
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands.
| | | |
|