1. Nichols TE, Holmes AP. Nonparametric permutation tests for functional neuroimaging: a primer with examples. Hum Brain Mapp 2001;15:1-25. PMID: 11747097; PMCID: PMC6871862; DOI: 10.1002/hbm.1058. Cited in RCA: 4789.
Abstract
Requiring only minimal assumptions for validity, nonparametric permutation testing provides a flexible and intuitive methodology for the statistical analysis of data from functional neuroimaging experiments, at some computational expense. Introduced into the functional neuroimaging literature by Holmes et al. ([1996]: J Cereb Blood Flow Metab 16:7-22), the permutation approach readily accounts for the multiple comparisons problem implicit in the standard voxel-by-voxel hypothesis testing framework. When the appropriate assumptions hold, the nonparametric permutation approach gives results similar to those obtained from a comparable Statistical Parametric Mapping approach using a general linear model with multiple comparisons corrections derived from random field theory. For analyses with low degrees of freedom, such as single subject PET/SPECT experiments or multi-subject PET/SPECT or fMRI designs assessed for population effects, the nonparametric approach employing a locally pooled (smoothed) variance estimate can outperform the comparable Statistical Parametric Mapping approach. Thus, these nonparametric techniques can be used to verify the validity of less computationally expensive parametric approaches. Although the theory and relative advantages of permutation approaches have been discussed by various authors, there has been no accessible explication of the method, and no freely distributed software implementing it. Consequently, there have been few practical applications of the technique. This article, and the accompanying MATLAB software, attempts to address these issues. The standard nonparametric randomization and permutation testing ideas are developed at an accessible level, using practical examples from functional neuroimaging, and the extensions for multiple comparisons described. Three worked examples from PET and fMRI are presented, with discussion, and comparisons with standard parametric approaches made where appropriate. Practical considerations are given throughout, and relevant statistical concepts are expounded in appendices.
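As a concrete illustration of the single-threshold permutation idea described above, the following is a minimal sketch of a two-sample permutation test with a maximum-statistic correction across voxels. It is not the authors' accompanying MATLAB implementation; the pooled-variance t statistic, array shapes, and number of permutations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_stat_permutation_test(group_a, group_b, n_perm=1000):
    """group_a: (n_a, n_voxels) and group_b: (n_b, n_voxels) arrays of images."""
    data = np.vstack([group_a, group_b])
    n_a = group_a.shape[0]

    def t_map(d, labels):
        # two-sample t statistic per voxel for a given labelling
        a, b = d[labels], d[~labels]
        num = a.mean(0) - b.mean(0)
        den = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))
        return num / den

    labels = np.zeros(data.shape[0], dtype=bool)
    labels[:n_a] = True
    observed = t_map(data, labels)

    # Null distribution of the maximum statistic over voxels: referring each
    # voxel's statistic to this distribution controls the family-wise error
    # rate across all voxels simultaneously.
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(labels)
        max_null[i] = t_map(data, perm).max()

    # corrected p-value per voxel: proportion of permuted maxima >= observed t
    p_corrected = (max_null[None, :] >= observed[:, None]).mean(axis=1)
    return observed, p_corrected
```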
2. Bullmore E, Long C, Suckling J, Fadili J, Calvert G, Zelaya F, Carpenter TA, Brammer M. Colored noise and computational inference in neurophysiological (fMRI) time series analysis: resampling methods in time and wavelet domains. Hum Brain Mapp 2000;12:61-78. PMID: 11169871; PMCID: PMC6871881; DOI: 10.1002/1097-0193(200102)12:2<61::aid-hbm1004>3.0.co;2-w. Cited in RCA: 429.
Abstract
Even in the absence of an experimental effect, functional magnetic resonance imaging (fMRI) time series generally demonstrate serial dependence. This colored noise or endogenous autocorrelation typically has disproportionate spectral power at low frequencies, i.e., its spectrum is (1/f)-like. Various pre-whitening and pre-coloring strategies have been proposed to make valid inference on standardised test statistics estimated by time series regression in this context of residually autocorrelated errors. Here we introduce a new method based on random permutation after orthogonal transformation of the observed time series to the wavelet domain. This scheme exploits the general whitening or decorrelating property of the discrete wavelet transform and is implemented using a Daubechies wavelet with four vanishing moments to ensure exchangeability of wavelet coefficients within each scale of decomposition. For (1/f)-like or fractal noises, e.g., realisations of fractional Brownian motion (fBm) parameterised by Hurst exponent 0 < H < 1, this resampling algorithm exactly preserves wavelet-based estimates of the second order stochastic properties of the (possibly nonstationary) time series. Performance of the method is assessed empirically using (1/f)-like noise simulated by multiple physical relaxation processes, and experimental fMRI data. Nominal type 1 error control in brain activation mapping is demonstrated by analysis of 13 images acquired under null or resting conditions. Compared to autoregressive pre-whitening methods for computational inference, a key advantage of wavelet resampling seems to be its robustness in activation mapping of experimental fMRI data acquired at 3 Tesla field strength. We conclude that wavelet resampling may be a generally useful method for inference on naturally complex time series.
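A minimal sketch of the scale-wise permutation ("wavestrapping") idea, assuming the PyWavelets package and a Daubechies wavelet with four vanishing moments ('db4'); it is a simplified illustration of the resampling scheme, not the authors' exact algorithm.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)

def wavelet_resample(x, wavelet="db4"):
    """Return a surrogate of x that preserves scale-wise second-order structure."""
    coeffs = pywt.wavedec(x, wavelet)            # [cA_n, cD_n, ..., cD_1]
    # Permute detail coefficients within each scale; approximate exchangeability
    # within a scale is what preserves the (1/f)-like spectrum of the series.
    resampled = [coeffs[0]] + [rng.permutation(c) for c in coeffs[1:]]
    return pywt.waverec(resampled, wavelet)[: len(x)]
```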
3. Li J, Tibshirani R. Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Stat Methods Med Res 2011;22:519-36. PMID: 22127579; DOI: 10.1177/0962280211428386. Cited in RCA: 297.
Abstract
We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or 'sequencing depths'. Existing methods for this problem are based on Poisson or negative binomial models: they are useful but can be heavily influenced by 'outliers' in the data. We introduce a simple, non-parametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class or multiple-class outcomes. We compare our proposed method to Poisson and negative binomial-based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods.
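The core idea, resampling libraries down to a comparable sequencing depth and then applying a rank statistic per gene, can be sketched as follows; the Poisson down-sampling scheme and function names here are simplifying assumptions, not the authors' published implementation.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)

def resampled_rank_test(counts, groups, n_resample=20):
    """counts: (n_genes, n_samples) integer matrix; groups: boolean array of length n_samples."""
    depths = counts.sum(axis=0)
    target = depths.min()
    stats = np.zeros(counts.shape[0])
    for _ in range(n_resample):
        # Poisson down-sampling puts every library on a comparable depth,
        # removing the dependence of the rank statistic on sequencing depth.
        downsampled = rng.poisson(counts * (target / depths))
        for g in range(counts.shape[0]):
            stats[g] += ranksums(downsampled[g, groups], downsampled[g, ~groups]).statistic
    # Average the rank-sum statistic over resamples for each gene.
    return stats / n_resample
```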
4. Cited in RCA: 133.
Abstract
Identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. In 2007, a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision P values for putative recombinants. This exact computation meant that multiple-comparisons corrected P values also had high precision, which is crucial when performing millions or billions of tests in large databases. Here, we introduce an improvement to the algorithmic complexity of this computation from O(mn³) to O(mn²), where m and n are the numbers of recombination-informative sites in the candidate recombinant. This new computation allows recombination analysis to be performed in alignments with thousands of polymorphic sites. Benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed.
5
|
Howard R, Carriquiry AL, Beavis WD. Parametric and nonparametric statistical methods for genomic selection of traits with additive and epistatic genetic architectures. G3 (BETHESDA, MD.) 2014; 4:1027-46. [PMID: 24727289 PMCID: PMC4065247 DOI: 10.1534/g3.114.010298] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/08/2014] [Accepted: 03/18/2014] [Indexed: 01/12/2023]
Abstract
Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE.
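One of the nonparametric predictors named above, the Nadaraya-Watson kernel regression estimator, can be sketched in a few lines; the Gaussian kernel and fixed bandwidth are illustrative choices rather than the settings tuned in the paper.

```python
import numpy as np

def nadaraya_watson(X_train, y_train, X_new, bandwidth=1.0):
    """Kernel-weighted average of training phenotypes for each new genotype row."""
    # Squared Euclidean distances between new rows and training rows.
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * bandwidth**2))          # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)          # locally weighted mean phenotype
```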
6. Cited in RCA: 66.
Abstract
A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with symmetric Dirichlet weights and put a prior on the number of components; that is, to use a mixture of finite mixtures (MFM). The most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces. Meanwhile, there are samplers for Dirichlet process mixture (DPM) models that are relatively simple and are easily adapted to new applications. It turns out that, in fact, many of the essential properties of DPMs are also exhibited by MFMs: an exchangeable partition distribution, restaurant process, random measure representation, and stick-breaking representation. Crucially, the MFM analogues are simple enough that they can be used much like the corresponding DPM properties. Consequently, many of the powerful methods developed for inference in DPMs can be directly applied to MFMs as well; this simplifies the implementation of MFMs and can substantially improve mixing. We illustrate with real and simulated data, including high-dimensional gene expression data used to discriminate cancer subtypes.
7. Åsberg A, Midtvedt K, van Guilder M, Størset E, Bremer S, Bergan S, Jelliffe R, Hartmann A, Neely MN. Inclusion of CYP3A5 genotyping in a nonparametric population model improves dosing of tacrolimus early after transplantation. Transpl Int 2013;26:1198-207. PMID: 24118301; PMCID: PMC3852421; DOI: 10.1111/tri.12194. Cited in RCA: 59.
Abstract
Following organ engraftment, initial dosing of tacrolimus is based on recipient weight and adjusted according to measured C0 concentrations. The bioavailability and elimination of tacrolimus are affected by the patient's CYP3A5 genotype. Prospective data on the clinical advantage of knowing a patient's CYP3A5 genotype prior to transplantation are lacking. A nonparametric population model was developed for tacrolimus in renal transplant recipients. Data from 99 patients were used for model development and validation. A three-compartment model with first-order absorption and lag time from the dosing compartment described the data well. Clearances and volumes of distribution were allometrically scaled to body size. The final model included fat-free mass, body mass index, hematocrit, time after transplantation, and CYP3A5 genotype as covariates. The bias and imprecision were 0.35 and 1.38, respectively, in the external data set. Patients with functional CYP3A5 had 26% higher clearance and 37% lower bioavailability. Knowledge of CYP3A5 genotype provided an initial advantage, but only until 3-4 tacrolimus concentrations were known. After this, a model without CYP3A5 genotype predicted just as well. The present models seem applicable for clinical individual dose predictions but need a prospective evaluation.
8. Mander AP, Sweeting MJ. A product of independent beta probabilities dose escalation design for dual-agent phase I trials. Stat Med 2015;34:1261-76. PMID: 25630638; PMCID: PMC4409822; DOI: 10.1002/sim.6434. Cited in RCA: 53.
Abstract
Dual-agent trials are now increasingly common in oncology research, and many proposed dose-escalation designs are available in the statistical literature. Despite this, the translation from statistical design to practical application is slow, as has been highlighted in single-agent phase I trials, where a 3 + 3 rule-based design is often still used. To expedite this process, new dose-escalation designs need to be not only scientifically beneficial but also easy for clinicians to understand and implement. In this paper, we propose a curve-free (nonparametric) design for a dual-agent trial in which the model parameters are the probabilities of toxicity at each of the dose combinations. We show that it is relatively trivial to incorporate a clinician's prior beliefs or historical information in the model, and that updating is fast and computationally simple through the use of conjugate Bayesian inference. Monotonicity is ensured by considering only a set of monotonic contours for the distribution of the maximum tolerated contour, which defines the dose-escalation decision process. Varied experimentation around the contour is achievable, and multiple dose combinations can be recommended to take forward to phase II. Code for R, Stata, and Excel is available for implementation.
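A minimal sketch of the conjugate Beta-Binomial updating that makes such a curve-free design computationally simple; the grid size, prior strengths, and example counts are illustrative assumptions, and the search over monotonic contours is omitted.

```python
import numpy as np

def update_toxicity_grid(prior_a, prior_b, n_toxic, n_treated):
    """All arguments are (n_doses_A, n_doses_B) arrays; returns posterior mean toxicity."""
    # Beta(a, b) prior plus binomial toxicity data gives a Beta posterior per combination.
    post_a = prior_a + n_toxic
    post_b = prior_b + (n_treated - n_toxic)
    return post_a / (post_a + post_b)

# Example: a 3 x 3 dose-combination grid with weakly informative Beta(0.5, 0.5) priors.
prior_a = np.full((3, 3), 0.5)
prior_b = np.full((3, 3), 0.5)
n_toxic = np.array([[0, 0, 1], [0, 1, 2], [1, 2, 3]])
n_treated = np.full((3, 3), 3)
print(update_toxicity_grid(prior_a, prior_b, n_toxic, n_treated))
```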
9. Houpt JW, Townsend JT. Statistical measures for workload capacity analysis. J Math Psychol 2012;56:341-355. PMID: 23175582; PMCID: PMC3501136; DOI: 10.1016/j.jmp.2012.05.004. Cited in RCA: 44.
Abstract
A critical component of how we understand a mental process is given by measuring the effect of varying the workload. The capacity coefficient (Townsend & Nozawa, 1995; Townsend & Wenger, 2004) is a measure on response times for quantifying changes in performance due to workload. Despite its precise mathematical foundation, until now rigorous statistical tests have been lacking. In this paper, we demonstrate statistical properties of the components of the capacity measure and propose a significance test for comparing the capacity coefficient to a baseline measure or two capacity coefficients to each other.
10. Parast L, McDermott MM, Tian L. Robust estimation of the proportion of treatment effect explained by surrogate marker information. Stat Med 2015;35:1637-53. PMID: 26631934; DOI: 10.1002/sim.6820. Cited in RCA: 31.
Abstract
In randomized treatment studies where the primary outcome requires long follow-up of patients and/or expensive or invasive measurement procedures, the availability of a surrogate marker that could be used to estimate the treatment effect, and that could potentially be observed earlier than the primary outcome, would allow researchers to draw conclusions regarding the treatment effect with less follow-up time and fewer resources. The Prentice criterion for a valid surrogate marker requires that a test for treatment effect on the surrogate marker also be a valid test for treatment effect on the primary outcome of interest. Based on this criterion, methods have been developed to define and estimate the proportion of treatment effect on the primary outcome that is explained by the treatment effect on the surrogate marker. These methods aim to identify useful statistical surrogates that capture a large proportion of the treatment effect. However, current methods to estimate this proportion usually require restrictive model assumptions that may not hold in practice and thus may lead to biased estimates of this quantity. In this paper, we propose a nonparametric procedure to estimate the proportion of treatment effect on the primary outcome that is explained by the treatment effect on a potential surrogate marker, and we extend this procedure to a setting with multiple surrogate markers. We compare our approach with previously proposed model-based approaches and propose a variance estimation procedure based on a perturbation-resampling method. Simulation studies demonstrate that the procedure performs well in finite samples and outperforms model-based procedures when the specified models are not correct. We illustrate our proposed procedure using a data set from a randomized study investigating a group-mediated cognitive behavioral intervention for peripheral artery disease participants.
11. Golicki D, Niewada M, Hout BV, Janssen MF, Pickard AS. Interim EQ-5D-5L value set for Poland: first crosswalk value set in Central and Eastern Europe. Value Health Reg Issues 2014;4:19-23. PMID: 29702801; DOI: 10.1016/j.vhri.2014.06.001. Cited in RCA: 30.
Abstract
OBJECTIVE: To estimate an interim five-level EuroQol five-dimensional (EQ-5D-5L) value set for Poland on the basis of the crosswalk methodology developed by the EuroQol Group.
METHODS: On the basis of data from 3691 respondents from six European countries, the EuroQol Group has developed a method of obtaining interim value sets for the EQ-5D-5L by mapping to the available three-level EuroQol five-dimensional (EQ-5D-3L) value sets ("crosswalk" methodology). A significant part of the data in this study came from Polish respondents (n = 972; 26.3%). Poland is the first Central European country to have published an EQ-5D-3L time trade-off-based social value set. To obtain an interim EQ-5D-5L value set, we applied the crosswalk methodology to the Polish EQ-5D-3L value set.
RESULTS: Estimated Polish values for 3125 EQ-5D-5L health states are presented. The EQ-5D-5L and EQ-5D-3L value sets have the same range (from -0.523 to 1.000) but different means (0.448 vs. 0.380) and medians (0.483 vs. 0.403), respectively. Proportionately fewer states worse than dead were observed in the EQ-5D-5L value set (5.4%) than in the EQ-5D-3L value set (13.2%).
CONCLUSIONS: The crosswalk-based value set is available for use in EQ-5D-5L studies in Poland to calculate health state utilities. It should be considered an interim value set until values based on preferences elicited directly from a representative sample of the Polish general population become available. This study helps users of the crosswalk algorithm understand the properties of EQ-5D-5L values generated with this method, in comparison to EQ-5D-3L values obtained with the Polish time trade-off value set. Similar results would likely be observed for value sets in other countries because the same crosswalk methodology applies across all countries.
12. Suckling J, Davis MH, Ooi C, Wink AM, Fadili J, Salvador R, Welchew D, Sendur L, Maxim V, Bullmore ET. Permutation testing of orthogonal factorial effects in a language-processing experiment using fMRI. Hum Brain Mapp 2006;27:425-33. PMID: 16596618; PMCID: PMC6871336; DOI: 10.1002/hbm.20252. Cited in RCA: 26.
Abstract
The block-paradigm of the Functional Image Analysis Contest (FIAC) dataset was analysed with the Brain Activation and Morphological Mapping software. Permutation methods in the wavelet domain were used for inference on cluster-based test statistics of orthogonal contrasts relevant to the factorial design of the study, namely: the average response across all active blocks, the main effect of speaker, the main effect of sentence, and the interaction between sentence and speaker. Extensive activation was seen with all these contrasts. In particular, different vs. same-speaker blocks produced elevated activation in bilateral regions of the superior temporal lobe and repetition suppression for linguistic materials (same vs. different-sentence blocks) in left inferior frontal regions. These are regions previously reported in the literature. Additional regions were detected in this study, perhaps due to the enhanced sensitivity of the methodology. Within-block sentence suppression was tested post-hoc by regression of an exponential decay model onto the extracted time series from the left inferior frontal gyrus, but no strong evidence of such an effect was found. The significance levels set for the activation maps are P-values at which we expect <1 false-positive cluster per image. Nominal type I error control was verified by empirical testing of a test statistic corresponding to a randomly ordered design matrix. The small size of the BOLD effect necessitates sensitive methods of detection of brain activation. Permutation methods permit the necessary flexibility to develop novel test statistics to meet this challenge.
13. Posner AB, Tranah GJ, Blackwell T, Yaffe K, Ancoli-Israel S, Redline S, Leng Y, Zeitzer JM, Chen DM, Webber KR, Stone KL. Predicting incident dementia and mild cognitive impairment in older women with nonparametric analysis of circadian activity rhythms in the Study of Osteoporotic Fractures. Sleep 2021;44:zsab119. PMID: 33964167; PMCID: PMC8503832; DOI: 10.1093/sleep/zsab119. Cited in RCA: 22.
Abstract
STUDY OBJECTIVES: Disrupted daily rhythms are associated with mild cognitive impairment (MCI) and dementia. The specific nature of how rhythms and cognition are related, however, is unknown. We hypothesized that characteristics from a nonparametric estimate of circadian rest-activity rhythm patterns would be associated with the development of MCI or dementia.
METHODS: Wrist actigraphy from 1232 cognitively healthy, community-dwelling women (mean age 82.6 years) from the Study of Osteoporotic Fractures was used to estimate rest-activity patterns, including intradaily variability (IV), interdaily stability (IS), most active 10-hour period (M10), least active 5-hour period (L5), and relative amplitude (RA). Logistic regression examined associations of these predictors with 5-year incidence of MCI or dementia. Models were adjusted for potential confounders.
RESULTS: Women with earlier sleep/wake times had higher risk of dementia, but not MCI (early vs. average L5 midpoint: OR, 1.66; 95% CI, 1.08-2.55), as did women with smaller day/night activity differentials (low vs. high RA: OR, 1.96; 95% CI, 1.14-3.35). IV, IS, and M10 were not associated with MCI or dementia.
CONCLUSION: The timing and day/night amplitude of activity, but not its variability, may be useful as predictors of dementia.
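For reference, the standard nonparametric rest-activity measures named in the methods (IS, IV, M10, L5, RA) can be computed from hourly actigraphy counts roughly as follows; this is a sketch of the usual formulas under the assumption of complete 24-hour days, not the cohort's analysis pipeline.

```python
import numpy as np

def rest_activity_measures(hourly, hours_per_day=24):
    """hourly: 1-D array of hourly activity counts spanning whole 24-hour days."""
    x = np.asarray(hourly, dtype=float)
    n = len(x)
    mean = x.mean()
    days = x.reshape(-1, hours_per_day)

    # Interdaily stability: variance of the average 24-h profile relative to total variance.
    hourly_profile = days.mean(axis=0)
    IS = (n * np.sum((hourly_profile - mean) ** 2)) / (hours_per_day * np.sum((x - mean) ** 2))

    # Intradaily variability: mean squared hour-to-hour difference relative to variance.
    IV = (n * np.sum(np.diff(x) ** 2)) / ((n - 1) * np.sum((x - mean) ** 2))

    # M10 / L5: most active 10 h and least active 5 h of the average profile.
    def window_mean(profile, width):
        padded = np.concatenate([profile, profile[: width - 1]])  # wrap around midnight
        return np.array([padded[i : i + width].mean() for i in range(len(profile))])

    M10 = window_mean(hourly_profile, 10).max()
    L5 = window_mean(hourly_profile, 5).min()
    RA = (M10 - L5) / (M10 + L5)   # relative amplitude of the day/night differential
    return {"IS": IS, "IV": IV, "M10": M10, "L5": L5, "RA": RA}
```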
14. Halliday DM. Nonparametric directionality measures for time series and point process data. J Integr Neurosci 2015;14:253-77. PMID: 25958923; DOI: 10.1142/s0219635215300127. Cited in RCA: 21.
Abstract
The need to determine the directionality of interactions between neural signals is a key requirement for analysis of multichannel recordings. The approaches most commonly used are parametric, typically relying on autoregressive models. A number of concerns have been expressed regarding parametric approaches, so there is a need to consider alternatives. We present an alternative nonparametric approach for constructing directionality measures for bivariate random processes. The method combines time and frequency domain representations of bivariate data to decompose the correlation by direction. Our framework generates two sets of complementary measures: a set of scalar measures, which decompose the total product-moment correlation coefficient summatively into three terms by direction, and a set of functions, which decompose the coherence summatively at each frequency into three terms by direction: forward direction, reverse direction, and instantaneous interaction. The analysis can be undertaken as an addition to a standard bivariate spectral and coherence analysis, and applied to either time series or point-process (spike train) data or mixtures of the two (hybrid data). In this paper, we demonstrate application to spike train data using simulated cortical neurone networks and application to experimental data from isolated muscle spindle sensory endings subject to random efferent stimulation.
15. Eden SK, Li C, Shepherd BE. Nonparametric estimation of Spearman's rank correlation with bivariate survival data. Biometrics 2022;78:421-434. PMID: 33704769; PMCID: PMC8453584; DOI: 10.1111/biom.13453. Cited in RCA: 21.
Abstract
We study rank-based approaches to estimate the correlation between two right-censored variables. With end-of-study censoring, it is often impossible to nonparametrically identify the complete bivariate survival distribution, and therefore it is impossible to nonparametrically compute Spearman's rank correlation. As a solution, we propose two measures that can be nonparametrically estimated. The first measure is Spearman's correlation in a restricted region. The second measure is Spearman's correlation for an altered but estimable joint distribution. We describe population parameters for these measures and illustrate how they are similar to and different from the overall Spearman's correlation. We propose consistent estimators of these measures and study their performance through simulations. We illustrate our methods with a study assessing the correlation between the time to viral failure and the time to regimen change among persons living with HIV in Latin America who start antiretroviral therapy.
16. Caldwell AR, Cheuvront SN. Basic statistical considerations for physiology: the journal Temperature toolbox. Temperature (Austin) 2019;6:181-210. PMID: 31608303; PMCID: PMC6773229; DOI: 10.1080/23328940.2019.1624131. Cited in RCA: 15.
Abstract
The average environmental and occupational physiologist may find statistics difficult to interpret and use, since their formal training in statistics is limited. Unfortunately, poor statistical practices can generate erroneous or at least misleading results and distort the evidence in the scientific literature. These problems are exacerbated when statistics are used as a thoughtless ritual performed after the data are collected. The situation is worsened when statistics are then treated as strict judgements about the data (i.e., significant versus non-significant) without a thought given to how these statistics were calculated or their practical meaning. We propose that researchers should consider statistics at every step of the research process, whether that be designing experiments, collecting data, analysing the data, or disseminating the results. When statistics are considered as an integral part of the research process, from start to finish, several problematic practices can be mitigated. Further, proper practices in disseminating the results of a study can greatly improve the quality of the literature. Within this review, we have included a number of reminders and statistical questions researchers should answer throughout the scientific process. Rather than treat statistics as a strict rule-following procedure, we hope that readers will use this review to stimulate a discussion around their current practices and attempt to improve them. The code to reproduce all analyses and figures within the manuscript can be found at https://doi.org/10.17605/OSF.IO/BQGDH.
18. Zou GY, Yue L. Using confidence intervals to compare several correlated areas under the receiver operating characteristic curves. Stat Med 2013;32:5077-90. PMID: 23824874; DOI: 10.1002/sim.5889. Cited in RCA: 14.
Abstract
The performance of a diagnostic tool yielding quantitative or ordinal measurements is often assessed in terms of its area under the receiver operating characteristic curve (AUC). As new diagnostic tools are constantly being developed, a frequently occurring task is to compare multiple AUCs as derived from the same group of subjects. For this purpose, previous methods have usually used an omnibus chi-square test, which may not be very informative. We present here methods for comparing several correlated AUCs using simultaneous confidence intervals. To improve small sample properties, we adopt the method of variance estimates recovery in which confidence limits for each AUC are obtained on the basis of the logit and inverse hyperbolic sine transformations. A simulation study demonstrates the superior performance of the proposed approach. The methods are illustrated with two examples.
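One ingredient of this approach, a logit-transformed confidence interval for a single AUC, can be sketched as follows; the Hanley-McNeil variance approximation used here is a simplifying assumption rather than the authors' exact estimator, and the MOVER combination step for differences between correlated AUCs is omitted.

```python
import numpy as np
from scipy.stats import norm

def auc_logit_ci(scores_pos, scores_neg, alpha=0.05):
    """Mann-Whitney AUC with a confidence interval built on the logit scale."""
    scores_pos = np.asarray(scores_pos, float)
    scores_neg = np.asarray(scores_neg, float)
    m, n = len(scores_pos), len(scores_neg)

    # Mann-Whitney estimate of the AUC, counting ties as 1/2.
    diff = scores_pos[:, None] - scores_neg[None, :]
    auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (m * n)

    # Hanley-McNeil variance approximation for the AUC estimate.
    q1, q2 = auc / (2 - auc), 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc) + (m - 1) * (q1 - auc**2) + (n - 1) * (q2 - auc**2)) / (m * n)

    # Confidence limits on the logit scale (delta method), back-transformed to (0, 1).
    z = norm.ppf(1 - alpha / 2)
    logit = np.log(auc / (1 - auc))
    se_logit = np.sqrt(var) / (auc * (1 - auc))
    lo, hi = logit - z * se_logit, logit + z * se_logit
    return auc, 1 / (1 + np.exp(-lo)), 1 / (1 + np.exp(-hi))
```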
19. Awate SP, Whitaker RT. Multiatlas segmentation as nonparametric regression. IEEE Trans Med Imaging 2014;33:1803-17. PMID: 24802528; PMCID: PMC4440593; DOI: 10.1109/tmi.2014.2321281. Cited in RCA: 14.
Abstract
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior, which characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
20. Cabreros I, Storey JD. A likelihood-free estimator of population structure bridging admixture models and principal components analysis. Genetics 2019;212:1009-1029. PMID: 31028112; PMCID: PMC6707457; DOI: 10.1534/genetics.119.302159. Cited in RCA: 11.
Abstract
We introduce a simple and computationally efficient method for fitting the admixture model of genetic population structure, called ALStructure. The strategy of ALStructure is to first estimate the low-dimensional linear subspace of the population admixture components and then search for a model within this subspace that is consistent with the admixture model's natural probabilistic constraints. Central to this strategy is the observation that all models belonging to this constrained space of solutions are risk-minimizing and have equal likelihood, rendering any additional optimization unnecessary. The low-dimensional linear subspace is estimated through a recently introduced principal components analysis method that is appropriate for genotype data, thereby providing a solution that has both principal components and probabilistic admixture interpretations. Our approach differs fundamentally from other existing methods for estimating admixture, which aim to fit the admixture model directly by searching for parameters that maximize the likelihood function or the posterior probability. We observe that ALStructure typically outperforms existing methods in both accuracy and computational speed under a wide array of simulated and real human genotype datasets. Throughout this work, we emphasize that the admixture model is a special case of a much broader class of models for which algorithms similar to ALStructure may be successfully employed.
21. Carter KM, Lu M, Jiang H, An L. An information-based approach for mediation analysis on high-dimensional metagenomic data. Front Genet 2020;11:148. PMID: 32231681; PMCID: PMC7083016; DOI: 10.3389/fgene.2020.00148. Cited in RCA: 10.
Abstract
The human microbiome plays a critical role in the development of gut-related illnesses such as inflammatory bowel disease and clinical pouchitis. A mediation model can be used to describe the interaction between host gene expression, the gut microbiome, and clinical/health status (e.g., diseased or not, inflammation level) and may provide insights into underlying disease mechanisms. Current mediation regression methodology cannot adequately model high-dimensional exposures and mediators or mixed data types. Additionally, regression-based mediation models require assumptions about the model parameters, and the relationships are usually assumed to be linear and additive. With the microbiome as the mediator, these assumptions are violated. We propose two novel nonparametric procedures that utilize information theory to detect significant mediation effects with high-dimensional exposures and mediators and varying data types while avoiding standard regression assumptions. In comprehensive simulation studies, the proposed method shows higher power and lower error than available methods. The method is also applied to clinical pouchitis data, and interesting results are obtained.
22. Preisser JS, Sen PK, Offenbacher S. Multiple hypothesis testing for experimental gingivitis based on Wilcoxon signed rank statistics. Stat Biopharm Res 2011;3:372-384. PMID: 21984957; PMCID: PMC3186946; DOI: 10.1198/sbr.2011.10025. Cited in RCA: 8.
Abstract
Dental research often involves repeated multivariate outcomes on a small number of subjects, for which there is interest in identifying outcomes that exhibit change in their levels over time as well as in characterizing the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects in whom gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with a choice of control of the false discovery rate or strong control of the family-wise error rate are examined.
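A minimal sketch of the univariate counterpart discussed above, Wilcoxon signed-rank tests across many biomarkers with Benjamini-Hochberg false discovery rate control; the data shapes and function names are illustrative assumptions, and the multivariate signed-rank tests are not shown.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

def paired_signed_rank_fdr(baseline, induced, alpha=0.05):
    """baseline, induced: (n_subjects, n_biomarkers) paired measurements."""
    # One signed-rank p-value per biomarker for the paired baseline/induced change.
    pvals = np.array([
        wilcoxon(induced[:, j], baseline[:, j]).pvalue
        for j in range(baseline.shape[1])
    ])
    # Benjamini-Hochberg adjustment controls the false discovery rate across biomarkers.
    reject, p_adjusted, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    return reject, p_adjusted
```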
23. Cox LAT, Liu X, Shi L, Zu K, Goodman J. Applying nonparametric methods to analyses of short-term fine particulate matter exposure and hospital admissions for cardiovascular diseases among older adults. Int J Environ Res Public Health 2017;14:1051. PMID: 28895893; PMCID: PMC5615588; DOI: 10.3390/ijerph14091051. Cited in RCA: 8.
Abstract
Short-term exposure to fine particulate matter (PM2.5) has been associated with increased risks of cardiovascular diseases (CVDs), but whether such associations are supportive of a causal relationship is unclear, and few studies have employed formal causal analysis methods to address this. We employed nonparametric methods to examine the associations between daily concentrations of PM2.5 and hospital admissions (HAs) for CVD among adults aged 75 years and older in Texas, USA. We first quantified the associations in partial dependence plots generated using the random forest approach. We next used a Bayesian network learning algorithm to identify conditional dependencies between CVD HAs of older men and women and several predictor variables. We found that geographic location (county), time (e.g., month and year), and temperature satisfied necessary information conditions for being causes of CVD HAs among older men and women, but daily PM2.5 concentrations did not. We also found that CVD HAs of disjoint subpopulations were strongly predictive of CVD HAs among older men and women, indicating the presence of unmeasured confounders. Our findings from nonparametric analyses do not support PM2.5 as a direct cause of CVD HAs among older adults.
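The first step described above, fitting a random forest and inspecting partial dependence on PM2.5, can be sketched with scikit-learn as follows; the simulated data and feature layout are illustrative assumptions, and the Bayesian network learning step is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(0)
# Illustrative simulated predictors: column 0 stands in for PM2.5, column 1 for temperature.
X = rng.normal(size=(500, 4))
y = 0.5 * X[:, 1] + rng.normal(size=500)   # outcome driven by temperature, not PM2.5

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
# Partial dependence of the predicted outcome on the PM2.5 surrogate (column 0).
pd_result = partial_dependence(forest, X, features=[0], kind="average")
print(pd_result["average"][0][:5])         # roughly flat if PM2.5 carries no signal
```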
24. Li L, Tchetgen ET, van der Vaart A, Robins JM. Higher order inference on a treatment effect under low regularity conditions. Stat Probab Lett 2011;81:821-828. PMID: 21552339; PMCID: PMC3088168; DOI: 10.1016/j.spl.2011.02.030. Cited in RCA: 8.
Abstract
We describe a novel approach to nonparametric point and interval estimation of a treatment effect in the presence of many continuous confounders. We show the problem can be reduced to that of point and interval estimation of the expected conditional covariance between treatment and response given the confounders. Our estimators are higher order U-statistics. The approach applies equally to the regular case where the expected conditional covariance is root-n estimable and to the irregular case where slower non-parametric rates prevail.
25. Şendur L, Suckling J, Whitcher B, Bullmore E. Resampling methods for improved wavelet-based multiple hypothesis testing of parametric maps in functional MRI. Neuroimage 2007;37:1186-94. PMID: 17651989; PMCID: PMC2633606; DOI: 10.1016/j.neuroimage.2007.05.057. Cited in RCA: 8.
Abstract
Two- or three-dimensional wavelet transforms have been considered as a basis for multiple hypothesis testing of parametric maps derived from functional magnetic resonance imaging (fMRI) experiments. Most of the previous approaches have assumed that the noise variance is equally distributed across levels of the transform. Here we show that this assumption is unrealistic; fMRI parameter maps typically have more similarity to a 1/f-type spatial covariance with greater variance in 2D wavelet coefficients representing lower spatial frequencies, or coarser spatial features, in the maps. To address this issue we resample the fMRI time series data in the wavelet domain (using a 1D discrete wavelet transform [DWT]) to produce a set of permuted parametric maps that are decomposed (using a 2D DWT) to estimate level-specific variances of the 2D wavelet coefficients under the null hypothesis. These resampling-based estimates of the "wavelet variance spectrum" are substituted in a Bayesian bivariate shrinkage operator to denoise the observed 2D wavelet coefficients, which are then inverted to reconstitute the observed, denoised map in the spatial domain. Multiple hypothesis testing controlling the false discovery rate in the observed, denoised maps then proceeds in the spatial domain, using thresholds derived from an independent set of permuted, denoised maps. We show empirically that this more realistic, resampling-based algorithm for wavelet-based denoising and multiple hypothesis testing has good Type I error control and can detect experimentally engendered signals in data acquired during auditory-linguistic processing.