1
Replication of null results: Absence of evidence or evidence of absence? eLife 2024; 12:RP92311. [PMID: 38739437 PMCID: PMC11090505 DOI: 10.7554/elife.92311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/14/2024] Open
Abstract
In several large-scale replication projects, statistically non-significant results in both the original and the replication study have been interpreted as a 'replication success.' Here, we discuss the logical problems with this approach: Non-significance in both studies does not ensure that the studies provide evidence for the absence of an effect and 'replication success' can virtually always be achieved if the sample sizes are small enough. In addition, the relevant error rates are not controlled. We show how methods, such as equivalence testing and Bayes factors, can be used to adequately quantify the evidence for the absence of an effect and how they can be applied in the replication setting. Using data from the Reproducibility Project: Cancer Biology, the Experimental Philosophy Replicability Project, and the Reproducibility Project: Psychology we illustrate that many original and replication studies with 'null results' are in fact inconclusive. We conclude that it is important to also replicate studies with statistically non-significant results, but that they should be designed, analyzed, and interpreted appropriately.
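The distinction between a non-significant result and evidence of absence can be illustrated with an equivalence test. The following is a minimal sketch of two one-sided tests (TOST) under a normal approximation; the function names, the equivalence margin, and the example numbers are illustrative assumptions and are not taken from the paper.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_sided_p_value(estimate: float, se: float) -> float:
    """Usual two-sided p-value for H0: effect = 0 (normal approximation)."""
    return 2.0 * (1.0 - normal_cdf(abs(estimate) / se))


def tost_p_value(estimate: float, se: float, margin: float) -> float:
    """TOST p-value for H0: |effect| >= margin (hypothetical margin > 0).

    A small value indicates evidence that the effect lies inside
    (-margin, +margin), i.e. evidence for the absence of a relevant effect.
    """
    p_lower = 1.0 - normal_cdf((estimate + margin) / se)  # H0: effect <= -margin
    p_upper = normal_cdf((estimate - margin) / se)        # H0: effect >= +margin
    return max(p_lower, p_upper)


# Same point estimate, very different precision (illustrative numbers).
imprecise = {"estimate": 0.05, "se": 0.5}
precise = {"estimate": 0.05, "se": 0.05}
margin = 0.3  # assumed smallest effect size of interest

for label, study in (("imprecise", imprecise), ("precise", precise)):
    p_null = two_sided_p_value(**study)
    p_equiv = tost_p_value(margin=margin, **study)
    print(f"{label}: two-sided p = {p_null:.3f}, TOST p = {p_equiv:.2g}")
```

Both hypothetical studies are non-significant in the usual sense, but only the precise one supports equivalence at the chosen margin; the imprecise one is inconclusive.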
2
Using Bayesian statistics in confirmatory clinical trials in the regulatory setting: a tutorial review. BMC Med Res Methodol 2024; 24:110. [PMID: 38714936 PMCID: PMC11077897 DOI: 10.1186/s12874-024-02235-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 04/24/2024] [Indexed: 05/12/2024] Open
Abstract
Bayesian statistics plays a pivotal role in advancing medical science by enabling healthcare companies, regulators, and stakeholders to assess the safety and efficacy of new treatments, interventions, and medical procedures. The Bayesian framework offers a unique advantage over the classical framework, especially when incorporating prior information into a new trial with quality external data, such as historical data or another source of co-data. In recent years, there has been a significant increase in regulatory submissions using Bayesian statistics due to its flexibility and ability to provide valuable insights for decision-making, addressing the modern complexity of clinical trials where frequentist trials are inadequate. For regulatory submissions, companies often need to consider the frequentist operating characteristics of the Bayesian analysis strategy, regardless of the design complexity. In particular, the focus is on the frequentist type I error rate and power for all realistic alternatives. This tutorial review aims to provide a comprehensive overview of the use of Bayesian statistics in sample size determination, control of type I error rate, multiplicity adjustments, external data borrowing, etc., in the regulatory environment of clinical trials. Fundamental concepts of Bayesian sample size determination and illustrative examples are provided to serve as a valuable resource for researchers, clinicians, and statisticians seeking to develop more complex and innovative designs.
3
Power priors for replication studies. TEST-SPAIN 2023; 33:127-154. [PMID: 38585622 PMCID: PMC10991061 DOI: 10.1007/s11749-023-00888-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 08/31/2023] [Indexed: 04/09/2024]
Abstract
The ongoing replication crisis in science has increased interest in the methodology of replication studies. We propose a novel Bayesian analysis approach using power priors: The likelihood of the original study's data is raised to the power of α, and then used as the prior distribution in the analysis of the replication data. Posterior distribution and Bayes factor hypothesis tests related to the power parameter α quantify the degree of compatibility between the original and replication study. Inferences for other parameters, such as effect sizes, dynamically borrow information from the original study. The degree of borrowing depends on the conflict between the two studies. The practical value of the approach is illustrated on data from three replication studies, and the connection to hierarchical modeling approaches is explored. We generalize the known connection between normal power priors and normal hierarchical models for fixed parameters and show that normal power prior inferences with a beta prior on the power parameter α align with normal hierarchical model inferences using a generalized beta prior on the relative heterogeneity variance I². The connection illustrates that power prior modeling is unnatural from the perspective of hierarchical modeling since it corresponds to specifying priors on a relative rather than an absolute heterogeneity scale.
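For intuition, the power prior construction has a closed form in the simplest normal case. The sketch below assumes normally distributed effect estimates with known standard errors, a flat initial prior, and a fixed α; the paper itself places a prior on α, which this simplified illustration does not do. All names and numbers are hypothetical.

```python
from math import sqrt


def power_prior_posterior(est_orig: float, se_orig: float,
                          est_rep: float, se_rep: float,
                          alpha: float) -> tuple[float, float]:
    """Posterior mean and sd of the effect for a fixed power parameter alpha.

    Raising the original normal likelihood to the power alpha multiplies
    its precision by alpha, so the prior is N(est_orig, se_orig**2 / alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    prior_precision = alpha / se_orig**2
    rep_precision = 1.0 / se_rep**2
    post_precision = prior_precision + rep_precision
    post_mean = (prior_precision * est_orig + rep_precision * est_rep) / post_precision
    return post_mean, sqrt(1.0 / post_precision)


# alpha = 0 ignores the original study; alpha = 1 pools both studies fully.
for alpha in (0.0, 0.5, 1.0):
    mean, sd = power_prior_posterior(0.4, 0.1, 0.2, 0.1, alpha)
    print(f"alpha = {alpha:.1f}: posterior mean = {mean:.3f}, sd = {sd:.3f}")
```

With α = 0 the posterior reproduces the replication estimate alone, and with α = 1 it equals the usual fixed-effect pooled estimate, which is the sense in which α controls the degree of borrowing.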
4
Bayesian hypothesis testing of mediation: Methods and the impact of prior odds specifications. Behav Res Methods 2023; 55:1108-1120. [PMID: 35581435 DOI: 10.3758/s13428-022-01860-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/11/2022] [Indexed: 11/08/2022]
Abstract
Mediation analysis is widely used to study whether the effect of an independent variable on an outcome is transmitted through a mediator. Bayesian methods have become increasingly popular for mediation analysis. However, limited research has been done on formal Bayesian hypothesis testing of mediation. Although hypothesis testing using the Bayes factor for a single path is readily available, how to integrate the Bayes factors of two paths (from input to mediator and from mediator to outcome) while incorporating prior beliefs on the two paths and/or mediation is under-studied. In the current study, we propose a general approach to Bayesian hypothesis testing of mediation. The proposed approach allows researchers to specify prior odds based on the substantive research context and can be used in mediation modeling with latent variables. The impact of prior odds specifications on the Bayesian hypothesis test of mediation is demonstrated via both real and hypothetical data examples. Both R functions and a user-friendly R web app are provided for the implementation of the proposed approach. Our study can add to researchers' toolbox of mediation analysis and raise researchers' awareness of the importance of prior odds specifications in Bayesian hypothesis testing of mediation.
5
How to Choose between Different Bayesian Posterior Indices for Hypothesis Testing in Practice. Multivariate Behavioral Research 2023; 58:160-188. [PMID: 34582284 DOI: 10.1080/00273171.2021.1967716] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/13/2023]
Abstract
Hypothesis testing is an essential statistical method in experimental psychology and the cognitive sciences. The problems of traditional null hypothesis significance testing (NHST) have been discussed widely, and among the proposed solutions to the replication problems caused by the inappropriate use of significance tests and p-values is a shift toward Bayesian data analysis. However, Bayesian hypothesis testing is concerned with various posterior indices for significance and the size of an effect. This complicates Bayesian hypothesis testing in practice, as the availability of multiple Bayesian alternatives to the traditional p-value causes confusion about which one to select and why. In this paper, various Bayesian posterior indices which have been proposed in the literature are compared and their benefits and limitations are discussed. The comparison shows that conceptually not all proposed Bayesian alternatives to NHST and p-values are beneficial, and the usefulness of some indices strongly depends on the study design and research goal. However, the comparison also reveals that there exist at least two candidates among the available Bayesian posterior indices which have appealing theoretical properties and are widely underused in the cognitive sciences.
6
The evidence interval and the Bayesian evidence value: On a unified theory for Bayesian hypothesis testing and interval estimation. The British Journal of Mathematical and Statistical Psychology 2022; 75:550-592. [PMID: 36200811 DOI: 10.1111/bmsp.12267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/20/2020] [Revised: 01/06/2022] [Indexed: 06/16/2023]
Abstract
Interval estimation is one of the most frequently used methods in statistical science, employed to provide a range of credible values a parameter is located in after taking into account the uncertainty in the data. However, while this interpretation only holds for Bayesian interval estimates, these suffer from two problems. First, Bayesian interval estimates can include values which have not been corroborated by observing the data. Second, Bayesian interval estimates and hypothesis tests can yield contradictory conclusions. In this paper a new theory for Bayesian hypothesis testing and interval estimation is presented. A new interval estimate is proposed, the Bayesian evidence interval, which is inspired by the Pereira-Stern theory of the full Bayesian significance test (FBST). It is shown that the evidence interval is a generalization of existing Bayesian interval estimates, that it solves the problems of standard Bayesian interval estimates and that it unifies Bayesian hypothesis testing and parameter estimation. The Bayesian evidence value is introduced, which quantifies the evidence for the (interval) null and alternative hypothesis. Based on the evidence interval and the evidence value, the (full) Bayesian evidence test (FBET) is proposed as a new, model-independent Bayesian hypothesis test. Additionally, a decision rule for hypothesis testing is derived which shows the relationship to a widely used decision rule based on the region of practical equivalence and Bayesian highest posterior density intervals and to the e-value in the FBST. In summary, the proposed method is universally applicable, computationally efficient, and while the evidence interval can be seen as an extension of existing Bayesian interval estimates, the FBET is a generalization of the FBST and contains it as a special case. Together, the theory developed provides a unification of Bayesian hypothesis testing and interval estimation and is made available in the R package fbst.
7
Influence of background preprocessing on the performance of deep learning retinal vessel detection. J Med Imaging (Bellingham) 2021; 8:064001. [PMID: 34746333 PMCID: PMC8562352 DOI: 10.1117/1.jmi.8.6.064001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Accepted: 10/18/2021] [Indexed: 11/14/2022] Open
Abstract
Purpose: Segmentation of the vessel tree from retinal fundus images can be used to track changes in the retina and be an important first step in a diagnosis. Manual segmentation is a time-consuming process that is prone to error; effective and reliable automation can alleviate these problems but one of the difficulties is uneven image background, which may affect segmentation performance. Approach: We present a patch-based deep learning framework, based on a modified U-Net architecture, that automatically segments the retinal blood vessels from fundus images. In particular, we evaluate how various pre-processing techniques, images with either no processing, N4 bias field correction, contrast limited adaptive histogram equalization (CLAHE), or a combination of N4 and CLAHE, can compensate for uneven image background and impact final segmentation performance. Results: We achieved competitive results on three publicly available datasets as a benchmark for our comparisons of pre-processing techniques. In addition, we introduce Bayesian statistical testing, which indicates little practical difference (Pr > 0.99) between pre-processing methods apart from the sensitivity metric. In terms of sensitivity and pre-processing, the combination of N4 correction and CLAHE performs better in comparison to unprocessed and N4 pre-processing (Pr > 0.87); but compared to CLAHE alone, the differences are not significant (Pr ≈ 0.38 to 0.88). Conclusions: We conclude that deep learning is an effective method for retinal vessel segmentation and that CLAHE pre-processing has the greatest positive impact on segmentation performance, with N4 correction helping only in images with extremely inhomogeneous background illumination.
8
Abstract
Measures of association play a central role in the social sciences to quantify the strength of a linear relationship between the variables of interest. In many applications researchers can translate scientific expectations to hypotheses with equality and/or order constraints on these measures of association. In this paper a Bayes factor test is proposed for testing multiple hypotheses with constraints on the measures of association between ordinal and/or continuous variables, possibly after correcting for certain covariates. This test can be used to obtain a direct answer to the research question of how much evidence there is in the data for a social science theory relative to competing theories. The stand-alone software package 'BCT' allows users to apply the methodology in an easy manner. The methodology will also be available in the R package 'BFpack'. An empirical application from leisure studies about the associations between life, leisure and relationship satisfaction and an application about the differences in egalitarian justice beliefs across countries are used to illustrate the methodology.
9
fbst: An R package for the Full Bayesian Significance Test for testing a sharp null hypothesis against its alternative via the e value. Behav Res Methods 2021; 54:1114-1130. [PMID: 34471963 PMCID: PMC9170675 DOI: 10.3758/s13428-021-01613-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 05/03/2021] [Indexed: 12/23/2022]
Abstract
Hypothesis testing is a central statistical method in psychology and the cognitive sciences. However, the problems of null hypothesis significance testing (NHST) and p values have been debated widely, but few attractive alternatives exist. This article introduces the fbst R package, which implements the Full Bayesian Significance Test (FBST) to test a sharp null hypothesis against its alternative via the e value. The statistical theory of the FBST was introduced more than two decades ago and since then the FBST has been shown to be a Bayesian alternative to NHST and p values with highly appealing theoretical and practical properties. The algorithm provided in the fbst package is applicable to any Bayesian model as long as the posterior distribution can be obtained at least numerically. The core function of the package provides the Bayesian evidence against the null hypothesis, the e value. Additionally, p values based on asymptotic arguments can be computed and rich visualizations for communication and interpretation of the results can be produced. Three examples of frequently used statistical procedures in the cognitive sciences are given in this paper, which demonstrate how to apply the FBST in practice using the fbst package. Based on the success of the FBST in statistical science, the fbst package should be of interest to a broad range of researchers and hopefully will encourage researchers to consider the FBST as a possible alternative when conducting hypothesis tests of a sharp null hypothesis.
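To make the idea of the e value concrete, the sketch below computes the evidence against a sharp null hypothesis for the special case of a normal posterior with a flat reference function. This is an illustrative assumption rather than the fbst package's interface, and the function name and example numbers are hypothetical. In this case the tangential set (all parameter values with higher posterior density than the null value) is an interval centred on the posterior mean, so its posterior probability has a closed form.

```python
from math import erf, sqrt


def e_value_against_null(theta0: float, post_mean: float, post_sd: float) -> float:
    """Evidence against H0: theta = theta0 under a normal posterior.

    Assumes a flat reference function, so the tangential set is
    {theta : |theta - post_mean| < |theta0 - post_mean|} and its posterior
    mass equals 2 * Phi(z) - 1 = erf(z / sqrt(2)).
    """
    z = abs(theta0 - post_mean) / post_sd
    return erf(z / sqrt(2.0))


# Hypothetical posterior N(0.3, 0.15**2) and sharp null theta0 = 0.
against = e_value_against_null(theta0=0.0, post_mean=0.3, post_sd=0.15)
print(f"evidence against H0:    {against:.3f}")
print(f"evidence supporting H0: {1.0 - against:.3f}")
```

For posteriors that are only available as samples, the same quantity would typically be approximated numerically, for example from a density estimate.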
10
An Elementary Introduction to Information Geometry. Entropy (Basel, Switzerland) 2020; 22:E1100. [PMID: 33286868 PMCID: PMC7650632 DOI: 10.3390/e22101100] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 09/06/2020] [Revised: 09/25/2020] [Accepted: 09/27/2020] [Indexed: 11/17/2022]
Abstract
In this survey, we describe the fundamental differential-geometric structures of information manifolds, state the fundamental theorem of information geometry, and illustrate some use cases of these information manifolds in information sciences. The exposition is self-contained by concisely introducing the necessary concepts of differential geometry. Proofs are omitted for brevity.
11
Simulation data for the analysis of Bayesian posterior significance and effect size indices for the two-sample t-test to support reproducible medical research. BMC Res Notes 2020; 13:452. [PMID: 32962722 PMCID: PMC7510139 DOI: 10.1186/s13104-020-05291-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/08/2020] [Accepted: 09/12/2020] [Indexed: 11/10/2022] Open
Abstract
Objectives: The data presented herein represent the simulated datasets of a recently conducted larger study which investigated the behaviour of Bayesian indices of significance and effect size as alternatives to traditional p-values. The study considered the setting of Student's and Welch's two-sample t-test often used in medical research. It investigated the influence of the sample size, noise, the selected prior hyperparameters and the sensitivity to type I errors. The posterior indices used included the Bayes factor, the region of practical equivalence, the probability of direction, the MAP-based p-value and the e-value in the Full Bayesian Significance Test. The simulation study was conducted in the statistical programming language R. Data description: The R script files for simulation of the datasets used in the study are presented in this article. These script files can both simulate the raw datasets and run the analyses. As researchers may be faced with different effect sizes, noise levels or priors in their domain than the ones studied in the original paper, the scripts extend the original results by allowing users to recreate all analyses of interest in different contexts. Therefore, they should be relevant to other researchers.
12
Bayesian alternatives to null hypothesis significance testing in biomedical research: a non-technical introduction to Bayesian inference with JASP. BMC Med Res Methodol 2020; 20:142. [PMID: 32503439 PMCID: PMC7275319 DOI: 10.1186/s12874-020-00980-6] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 10/31/2019] [Accepted: 04/16/2020] [Indexed: 11/20/2022] Open
Abstract
Background: Although null hypothesis significance testing (NHST) is the agreed gold standard in medical decision making and the most widespread inferential framework used in medical research, it has several drawbacks. Bayesian methods can complement or even replace frequentist NHST, but these methods have been underutilised mainly due to a lack of easy-to-use software. JASP is an open-source software for common operating systems, which has recently been developed to make Bayesian inference more accessible to researchers, including the most common tests, an intuitive graphical user interface and publication-ready output plots. This article provides a non-technical introduction to Bayesian hypothesis testing in JASP by comparing traditional tests and statistical methods with their Bayesian counterparts. Results: The comparison shows the strengths and limitations of JASP for frequentist NHST and Bayesian inference. Specifically, Bayesian hypothesis testing via Bayes factors can complement and even replace NHST in most situations in JASP. While p-values can only reject the null hypothesis, the Bayes factor can state evidence for both the null and the alternative hypothesis, making confirmation of hypotheses possible. Also, effect sizes can be precisely estimated in the Bayesian paradigm via JASP. Conclusions: Bayesian inference has not been widely used so far due to the dearth of accessible software. Medical decision making can be complemented by Bayesian hypothesis testing in JASP, providing richer information than single p-values and thus strengthening the credibility of an analysis. Through an easy point-and-click interface, researchers used to other graphical statistical packages like SPSS can seamlessly transition to JASP and benefit from the listed advantages with only a few limitations.
13
Creative Flexibility Performance Is Neither Related to Anxiety, Nor to Self-Control Strength, Nor to Their Interaction. Front Psychol 2019; 10:1999. [PMID: 31551865 PMCID: PMC6748354 DOI: 10.3389/fpsyg.2019.01999] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/10/2019] [Accepted: 08/15/2019] [Indexed: 12/18/2022] Open
Abstract
Previous research has reliably found that self-control strength moderates the anxiety-performance relationship for cognitive and perceptual-motor tasks that involve executive functioning. In the present preregistered experiment (N = 200; https://aspredicted.org/a775h.pdf), we investigated whether the interaction of anxiety and self-control also predicts creative flexibility performance. According to the Attentional Control Theory, anxiety can impair executive functioning. In the case that creative flexibility relies on executive functions, anxiety should therefore interfere with creative flexibility performance. However, self-control strength has been demonstrated to serve as a buffer against the negative effects of anxiety on executive functioning. Therefore, we assumed that there would be a negative relationship between anxiety and creative flexibility performance, and that this negative relationship would be more pronounced for participants who are low compared to high in momentary self-control strength. Analogous to the previous studies, we manipulated the participants’ self-control strength (ego depletion vs. no depletion) and subsequently induced a potentially threatening test situation. The participants then completed a measure of their state anxiety and a standardized test of creative flexibility. Contrary to our expectation, self-control strength, state anxiety, and their interaction did not predict creative flexibility performance. Complementary Bayesian hypothesis testing revealed strong support for the null hypothesis. Therefore, we conclude that, at least under certain conditions, creative flexibility performance may be unrelated to resource-dependent executive functions.
14
Bayes factor in one-sample tests of means with a sensitivity analysis: A discussion of separate prior distributions. Behav Res Methods 2019; 51:1998-2021. [PMID: 31161425 DOI: 10.3758/s13428-019-01262-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Abstract
Due to some widely known critiques of traditional hypothesis testing, Bayesian hypothesis testing using the Bayes factor has been considered as a better alternative. Previous research about the influence of the prior focuses on the prior for the effect size and there is a debate about how to specify the prior. Thus, the focus of this paper is to explore the impact of different priors on the population mean and variance separately (separate priors) on the Bayes factor, and compare the separate priors with the priors on the effect size. Our simulation results show that both the prior distributions on mean and variance have a considerable influence on the Bayes factor, and different types of priors (different separate priors and priors on the effect size) have different influence patterns. We also find that regardless of separate priors or priors on the effect size, and shapes and centers of the priors, different priors could yield similar Bayes factors. Because noninformative prior distributions bias the Bayes factor in support of the null hypothesis, and very informative priors could be risky, we suggest that researchers use weakly informative priors as reasonable priors and they are expected to provide similar conclusions across different shapes and centers of prior distributions. Conducting sensitivity analysis is helpful in examining the influence of prior distributions and specifying reasonable prior distributions for the Bayes factor. A real data example is used to illustrate how to choose reasonable priors by a sensitivity analysis. We hope our results will help researchers choose prior distributions when conducting Bayesian hypothesis testing.
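The claim that very diffuse priors push the Bayes factor toward the null hypothesis can be checked with a small sensitivity analysis. The sketch below uses a deliberately simplified setting that is an assumption of this illustration and not the paper's model: a normal mean with known standard deviation and a N(0, τ²) prior on the mean under the alternative. Function names and numbers are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x: float, sd: float) -> float:
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return exp(-0.5 * (x / sd) ** 2) / (sd * sqrt(2.0 * pi))


def bf01(ybar: float, sigma: float, n: int, tau: float) -> float:
    """Bayes factor for H0: mu = 0 versus H1: mu ~ N(0, tau**2).

    With known sigma, the sample mean is N(0, sigma**2 / n) under H0 and
    N(0, sigma**2 / n + tau**2) marginally under H1.
    """
    se = sigma / sqrt(n)
    return normal_pdf(ybar, se) / normal_pdf(ybar, sqrt(se**2 + tau**2))


# Sensitivity analysis over the prior scale tau for one hypothetical data set.
ybar, sigma, n = 0.3, 1.0, 50
for tau in (0.5, 1.0, 10.0, 100.0):
    print(f"tau = {tau:6.1f}: BF01 = {bf01(ybar, sigma, n, tau):.2f}")
```

For these illustrative numbers the same data favour the alternative under a moderately informative prior and favour the null once the prior becomes very wide, which is why reporting the Bayes factor across a range of prior scales is informative.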
15
Abstract
Scientific theories can often be formulated using equality and order constraints on the relative effects in a linear regression model. For example, it may be expected that the effect of the first predictor is larger than the effect of the second predictor, and that the effect of the second predictor is larger than the effect of the third predictor. The goal is then to test such expectations against competing scientific expectations or theories. In this paper, a simple default Bayes factor test is proposed for testing multiple hypotheses with equality and order constraints on the effects of interest. The proposed testing criterion can be computed without requiring external prior information about the expected effects before observing the data. The method is implemented in an R package called 'lmhyp', which is freely downloadable and ready to use. The usability of the method and software is illustrated using empirical applications from the social and behavioral sciences.
16
A Bayesian Assessment of an Approximate Model for Unconfined Water Flow in Sloping Layered Porous Media. Transp Porous Media 2019; 126:177-197. [PMID: 30872878 PMCID: PMC6390717 DOI: 10.1007/s11242-018-1094-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/30/2017] [Accepted: 05/24/2018] [Indexed: 11/13/2022]
Abstract
The prediction of water table height in unconfined layered porous media is a difficult modelling problem that typically requires numerical simulation. This paper proposes an analytical model to approximate the exact solution based on a steady-state Dupuit–Forchheimer analysis. The key contribution in relation to a similar model in the literature lies in the ability of the proposed model to consider more than two layers with different thicknesses and slopes, so that the existing model becomes a special case of the model proposed herein. In addition, a model assessment methodology based on the Bayesian inverse problem is proposed to efficiently identify the values of the physical parameters for which the proposed model is accurate when compared against a reference model given by MODFLOW-NWT, the open-source finite-difference code by the U.S. Geological Survey. Based on numerical results for a representative case study, the ratio of vertical recharge rate to hydraulic conductivity emerges as a key parameter in terms of model accuracy so that, when appropriately bounded, both the proposed model and MODFLOW-NWT provide almost identical results.
17
ADHD diagnosis from multiple data sources with batch effects. Front Syst Neurosci 2012; 6:70. [PMID: 23060755 PMCID: PMC3465911 DOI: 10.3389/fnsys.2012.00070] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 06/12/2012] [Accepted: 09/20/2012] [Indexed: 11/13/2022] Open
Abstract
Attention Deficit Hyperactivity Disorder (ADHD) affects the school-age population and has large social costs. The scientific community is still lacking a pathophysiological model of the disorder and there are no objective biomarkers to support the diagnosis. In 2011 the ADHD-200 Consortium provided a rich, heterogeneous neuroimaging dataset aimed at studying neural correlates of ADHD and at promoting the development of systems for automated diagnosis. Concurrently a competition was set up with the goal of addressing the wide range of different types of data for the accurate prediction of the presence of ADHD. Phenotypic information, structural magnetic resonance imaging (MRI) scans and resting state fMRI recordings were provided for nearly 1000 typical and non-typical young individuals. Data were collected by eight different research centers in the consortium. This work is not concerned with the main task of the contest, i.e., achieving a high prediction accuracy on the competition dataset; rather, we address the proper handling of such a heterogeneous dataset when performing classification-based analysis. Our interest lies in the clustered structure of the data causing the so-called batch effects, which have a strong impact when assessing the performance of classifiers built on the ADHD-200 dataset. We propose a method to eliminate the biases introduced by such batch effects. Its application on the ADHD-200 dataset generates such a significant drop in prediction accuracy that most of the conclusions from a standard analysis had to be revised. In addition we propose to adopt the dissimilarity representation to set up effective representation spaces for the heterogeneous ADHD-200 dataset. Moreover we propose to evaluate the quality of predictions through a recently proposed test of independence in order to cope with the imbalance of the dataset.