1. A stable sequential multiple test for Koopman–Darmois family. J Stat Plan Inference 2023. [DOI: 10.1016/j.jspi.2023.01.006]

2. Zhai J, Jiang H. Two-sample test with g-modeling and its applications. Stat Med 2023; 42:89-104. [PMID: 36412978; PMCID: PMC10099579; DOI: 10.1002/sim.9603]
Abstract
Many real data analyses involve two-sample comparisons in location or in distribution. Most existing methods focus on problems where observations are independently and identically distributed in each group. However, in some applications the observed data are not identically distributed but associated with some unobserved parameters which are identically distributed. To address this challenge, we propose a novel two-sample testing procedure as a combination of the g-modeling density estimation introduced by Efron and the two-sample Kolmogorov-Smirnov test. We also propose efficient bootstrap algorithms to estimate the statistical significance for such tests. We demonstrate the utility of the proposed approach with two biostatistical applications: the analysis of surgical nodes data with binomial model and differential expression analysis of single-cell RNA sequencing data with zero-inflated Poisson model.
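The Kolmogorov-Smirnov part of this procedure can be sketched in Python. This is a minimal illustration, not the authors' implementation: `ks_statistic` computes the classical two-sample statistic from raw observations, and `ks_distance_on_grid` assumes (our assumption) that each group's g-modeling estimate is available as a probability vector over a shared grid of latent-parameter values, in which case the KS distance is the largest gap between the two cumulative sums. Both function names are hypothetical; the g-modeling fit and the bootstrap calibration are not shown.

```python
from itertools import accumulate


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Visit the pooled distinct values in increasing order; after each step,
    # i/n and j/m are the two empirical CDFs evaluated at that value.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF is 1 and the gap can only shrink.
    return d


def ks_distance_on_grid(g1, g2):
    """KS distance between two probability vectors defined on the same grid."""
    return max(abs(a - b) for a, b in zip(accumulate(g1), accumulate(g2)))
```

For example, `ks_statistic([1, 2], [1, 3])` returns 0.5, because the two empirical CDFs differ by 1/2 at t = 2.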
Affiliation(s)
- Jingyi Zhai: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Hui Jiang: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA

3. He X, Bartroff J. Asymptotically optimal sequential FDR and pFDR control with (or without) prior information on the number of signals. J Stat Plan Inference 2021. [DOI: 10.1016/j.jspi.2020.05.002]

4. Rosenblatt JD, Ritov Y, Goeman JJ. Discussion of ‘Gene hunting with hidden Markov model knockoffs’. Biometrika 2019. [DOI: 10.1093/biomet/asy062]
Affiliation(s)
- Jonathan D Rosenblatt: Department of Industrial Engineering and Management, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Ya’acov Ritov: Department of Statistics, University of Michigan, 1085 South University, Ann Arbor, Michigan, U.S.A
- Jelle J Goeman: Department of Biomedical Data Sciences, Leiden University Medical Center, Albinusdreef 2, ZA Leiden, The Netherlands

5. Watt ED, Judson RS. Uncertainty quantification in ToxCast high throughput screening. PLoS One 2018; 13:e0196963. [PMID: 30044784; PMCID: PMC6059398; DOI: 10.1371/journal.pone.0196963]
Abstract
High-throughput screening (HTS) projects like the U.S. Environmental Protection Agency's ToxCast program are required to address the large and rapidly increasing number of chemicals for which we have little to no toxicity measurements. Concentration-response parameters such as potency and efficacy are extracted from HTS data using nonlinear regression, and models and analyses built from these parameters are used to predict in vivo and in vitro toxicity of thousands of chemicals. How these predictions are impacted by uncertainties that stem from parameter estimation and are propagated through the models and analyses has not been well explored. While data size and complexity make uncertainty quantification computationally expensive for HTS datasets, continued advancements in computational resources have allowed these computational challenges to be met. This study uses nonparametric bootstrap resampling to calculate uncertainties in concentration-response parameters from a variety of HTS assays. Using the ToxCast estrogen receptor model for bioactivity as a case study, we highlight how these uncertainties can be propagated through models to quantify the uncertainty in model outputs. Uncertainty quantification in model outputs is used to identify potential false positives and false negatives and to determine the distribution of model values around semi-arbitrary activity cutoffs, increasing confidence in model predictions. At the individual chemical-assay level, curves with high variability are flagged for manual inspection or retesting, focusing subject-matter-expert time on results that need further input. This work improves the confidence of predictions made using HTS data, increasing the ability to use these data in risk assessment.
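The nonparametric bootstrap mentioned in this abstract can be illustrated generically. The sketch below resamples observations with replacement and reports a percentile interval for an arbitrary statistic; the function name, the percentile method and the default of 2,000 replicates are assumptions for illustration. In the ToxCast setting the statistic would be a parameter (for example, potency) from a concentration-response curve refitted to each resample, which is not shown here.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data) (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(data)
    replicates = sorted(
        # Draw n observations with replacement, then recompute the statistic
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    # Empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap replicates
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Usage, with made-up response values: `bootstrap_percentile_ci([12.1, 15.3, 9.8, 14.0, 11.7, 13.2, 10.5, 16.4], statistics.median)` returns an interval for the median; a wide interval is the kind of signal that would flag a curve for inspection.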
Affiliation(s)
- Eric D. Watt: U.S. Environmental Protection Agency, National Center for Computational Toxicology, Research Triangle Park, North Carolina, United States of America; Oak Ridge Institute for Science Education Postdoctoral Fellow, Oak Ridge, Tennessee, United States of America
- Richard S. Judson: U.S. Environmental Protection Agency, National Center for Computational Toxicology, Research Triangle Park, North Carolina, United States of America

6. Segal BD, Braun T, Elliott MR, Jiang H. Fast approximation of small p-values in permutation tests by partitioning the permutations. Biometrics 2017. [PMID: 29542118; DOI: 10.1111/biom.12731]
Abstract
Researchers in genetics and other life sciences commonly use permutation tests to evaluate differences between groups. Permutation tests have desirable properties, including exactness if data are exchangeable, and are applicable even when the distribution of the test statistic is analytically intractable. However, permutation tests can be computationally intensive. We propose both an asymptotic approximation and a resampling algorithm for quickly estimating small permutation p-values (e.g., < 10^-6) for the difference and ratio of means in two-sample tests. Our methods are based on the distribution of test statistics within and across partitions of the permutations, which we define. In this article, we present our methods and demonstrate their use through simulations and an application to cancer genomic data. Through simulations, we find that our resampling algorithm is more computationally efficient than another leading alternative, particularly for extremely small p-values (e.g., < 10^-30). Through application to cancer genomic data, we find that our methods can successfully identify up- and down-regulated genes. While we focus on the difference and ratio of means, we speculate that our approaches may work in other settings.
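For context, the plain Monte Carlo permutation test that this work aims to speed up can be sketched as follows. This is the standard baseline, not the partition-based approximation proposed in the paper; the function name and defaults are hypothetical.

```python
import random
from statistics import fmean


def permutation_pvalue(x, y, n_perm=9999, seed=0):
    """Monte Carlo two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling, valid under exchangeability
        diff = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        # Small tolerance so exact ties are not lost to floating-point rounding
        if diff >= observed - 1e-12:
            exceed += 1
    # Counting the observed labelling as one permutation keeps p above zero
    return (exceed + 1) / (n_perm + 1)
```

The smallest value this estimator can return is 1/(n_perm + 1), so resolving a p-value near 10^-6 requires on the order of a million permutations or more; that cost is what motivates faster approximations for small p-values.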
Affiliation(s)
- Brian D Segal: Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109-2029, U.S.A
- Thomas Braun: Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109-2029, U.S.A
- Michael R Elliott: Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109-2029, U.S.A
- Hui Jiang: Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109-2029, U.S.A

7. Jelsema CM, Peddada SD. CLME: An R Package for Linear Mixed Effects Models under Inequality Constraints. J Stat Softw 2016; 75. [PMID: 32655332; DOI: 10.18637/jss.v075.i01]
Abstract
In many applications researchers are typically interested in testing for inequality constraints in the context of linear fixed effects and mixed effects models. Although there exists a large body of literature for performing statistical inference under inequality constraints, user-friendly statistical software for implementing such methods is lacking, especially in the context of linear fixed and mixed effects models. In this article we introduce CLME, a package in the R language that can be used for testing a broad collection of inequality constraints. It uses a residual-bootstrap-based methodology which is reasonably robust to non-normality as well as heteroscedasticity. The package is illustrated using two data sets. The package also contains a graphical interface built using the shiny package.
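The residual bootstrap named in this abstract can be illustrated on the simplest possible model. The sketch below uses ordinary least squares with one covariate, keeps the design fixed, and adds resampled residuals back to the fitted values; it is a hypothetical helper and does not reproduce CLME's constrained mixed-model machinery.

```python
import random


def ols_fit(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residual_bootstrap_slopes(x, y, n_boot=1000, seed=0):
    """Bootstrap replicates of the slope obtained by resampling residuals."""
    rng = random.Random(seed)
    a, b = ols_fit(x, y)
    fitted = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        # Design stays fixed; only the residuals are resampled with replacement
        y_star = [fi + rng.choice(residuals) for fi in fitted]
        slopes.append(ols_fit(x, y_star)[1])
    return slopes
```

The spread of the returned slopes approximates the sampling variability of the estimate without assuming normal errors; handling order constraints and random effects, as CLME does, is beyond this sketch.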

8. Geis-Asteggiante L, Ostrand-Rosenberg S, Fenselau C, Edwards NJ. Evaluation of Spectral Counting for Relative Quantitation of Proteoforms in Top-Down Proteomics. Anal Chem 2016; 88:10900-10907. [PMID: 27748581; PMCID: PMC6178225; DOI: 10.1021/acs.analchem.6b02151]
Abstract
Spectral counting is a straightforward label-free quantitation strategy used in bottom-up proteomics workflows. The application of spectral counting in label-free top-down proteomics workflows can be similarly straightforward but has not been applied as widely as quantitation by chromatographic peak areas or peak intensities. In this study, we evaluate spectral counting for quantitative comparisons in label-free top-down proteomics workflows by comparison with chromatographic peak areas and intensities. We tested these quantitation approaches by spiking standard proteins into a complex protein background and comparing relative quantitation by spectral counts with normalized chromatographic peak areas and peak intensities from deconvoluted extracted ion chromatograms of the spiked proteins. Ratio estimates and statistical significance of differential abundance from each quantitation technique are evaluated against the expected ratios and each other. In this experiment, spectral counting was able to detect differential abundance of spiked proteins for expected ratios ≥2, with comparable or higher sensitivity than normalized areas and intensities. We also found that while ratio estimates using peak areas and intensities are usually more accurate, the spectral-counting-based estimates are not substantially worse. Following the evaluation and comparison of these label-free top-down quantitation strategies using spiked proteins, spectral counting, along with normalized chromatographic peak areas and intensities, was used to analyze the complex protein cargo of exosomes shed by myeloid-derived suppressor cells collected under high and low conditions of inflammation, revealing statistically significant differences in abundance for several proteoforms, including the active pro-inflammatory proteins S100A8 and S100A9.
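A spectral-count ratio estimate of the kind compared in this study can be computed in a few lines. In this sketch each protein's count is divided by the total number of spectra in its run and a pseudocount of 0.5 is added; both the normalization and the pseudocount are assumptions for illustration, not necessarily the authors' choices, and the function name is hypothetical. An expected ratio of 2 corresponds to a log2 ratio of 1.

```python
import math


def spectral_count_log2_ratio(count_a, total_a, count_b, total_b, pseudocount=0.5):
    """log2 ratio of a protein's spectral counts in runs A and B, each count
    normalized by its run's total spectral count (illustrative normalization)."""
    # The pseudocount keeps the ratio finite when a protein has zero spectra
    rate_a = (count_a + pseudocount) / total_a
    rate_b = (count_b + pseudocount) / total_b
    return math.log2(rate_a / rate_b)
```

For instance, with a pseudocount of 1 and equal run totals of 1000 spectra, counts of 7 and 3 give (8/1000)/(4/1000) = 2, i.e. a log2 ratio of 1.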
Affiliation(s)
- Nathan J. Edwards: Georgetown University Medical Center, Washington DC 20007, United States

9.
Affiliation(s)
- Axel Gandy: Department of Mathematics, Imperial College London
- Georg Hahn: Department of Mathematics, Imperial College London

10. Sun W, Liu Y, Crowley JJ, Chen TH, Zhou H, Chu H, Huang S, Kuan PF, Li Y, Miller DR, Shaw GD, Wu Y, Zhabotynsky V, McMillan L, Zou F, Sullivan PF, de Villena FPM. IsoDOT Detects Differential RNA-isoform Expression/Usage with respect to a Categorical or Continuous Covariate with High Sensitivity and Specificity. J Am Stat Assoc 2015; 110:975-986. [PMID: 26617424; DOI: 10.1080/01621459.2015.1040880]
Abstract
We have developed a statistical method named IsoDOT to assess differential isoform expression (DIE) and differential isoform usage (DIU) using RNA-seq data. Here isoform usage refers to relative isoform expression given the total expression of the corresponding gene. IsoDOT performs two tasks that cannot be accomplished by existing methods: to test DIE/DIU with respect to a continuous covariate, and to test DIE/DIU for one case versus one control. The latter task is not an uncommon situation in practice, e.g., comparing the paternal and maternal alleles of one individual or comparing tumor and normal samples of one cancer patient. Simulation studies demonstrate the high sensitivity and specificity of IsoDOT. We apply IsoDOT to study the effects of haloperidol treatment on the mouse transcriptome and identify a group of genes whose isoform usages respond to haloperidol treatment.
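The difference between DIE and DIU defined in this abstract can be made concrete with a toy calculation. The function name and the expression values are hypothetical; IsoDOT itself estimates isoform expression from RNA-seq reads with a statistical model, which this sketch does not attempt.

```python
def isoform_usage(expression):
    """Relative usage of each isoform: its expression divided by the gene total."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("gene total expression must be positive")
    return [e / total for e in expression]


# Hypothetical two-isoform gene measured in two conditions
condition_a = [30.0, 10.0]
condition_b = [60.0, 20.0]  # both isoforms doubled: expression changes (DIE)
# ...but relative usage is unchanged, so there is no DIU
print(isoform_usage(condition_a), isoform_usage(condition_b))
```

Both conditions give usage [0.75, 0.25], so this gene would show differential isoform expression without differential isoform usage.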
Affiliation(s)
- Wei Sun: Department of Biostatistics, Department of Genetics, UNC Chapel Hill, NC 27599
- Yufeng Liu: Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, UNC Chapel Hill
- Hua Zhou: Department of Statistics, NC State University
- Haitao Chu: Department of Biostatistics, University of Minnesota
- Pei-Fen Kuan: Department of Applied Mathematics and Statistics, Stony Brook University
- Yuan Li: Department of Statistics, NC State University
- Darla R Miller: Department of Genetics, Lineberger Comprehensive Cancer Center, UNC Chapel Hill
- Ginger D Shaw: Department of Genetics, Lineberger Comprehensive Cancer Center, UNC Chapel Hill
- Yichao Wu: Department of Statistics, NC State University
- Fei Zou: Department of Biostatistics, UNC Chapel Hill
- Patrick F Sullivan: Department of Genetics, Department of Psychiatry, Department of Epidemiology, UNC Chapel Hill

11. Shockley KR. Quantitative high-throughput screening data analysis: challenges and recent advances. Drug Discov Today 2014; 20:296-300. [PMID: 25449657; DOI: 10.1016/j.drudis.2014.10.005]
Abstract
In vitro HTS holds much potential to advance drug discovery and provide cell-based alternatives for toxicity testing. In quantitative HTS, concentration-response data can be generated simultaneously for thousands of different compounds and mixtures. However, nonlinear modeling in these multiple-concentration assays presents important statistical challenges that are not problematic for linear models. The uncertainty of parameter estimates obtained from the widely used Hill equation model can be extremely large when using standard designs. Failure to properly consider standard errors of these parameter estimates would greatly hinder chemical genomics and toxicity testing efforts. In this light, optimal study designs should be developed to improve nonlinear parameter estimation; or alternative approaches with reliable performance characteristics should be used to describe concentration-response profiles.
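The Hill equation model referred to in this abstract can be written as a short function. This uses one common four-parameter form (bottom, top, AC50, Hill slope n); the function and parameter names are our own, and HTS pipelines may parameterize the curve differently, for example on a log-concentration scale. The demo shows one mechanism consistent with the large parameter uncertainty the abstract describes: when every tested concentration lies far below AC50, quite different (top, AC50) pairs produce almost the same responses.

```python
def hill(conc, bottom, top, ac50, n):
    """Four-parameter Hill concentration-response curve.

    Equals bottom at zero concentration, (bottom + top) / 2 at conc == ac50,
    and approaches top at high concentration when n > 0.
    """
    c_n = conc ** n
    return bottom + (top - bottom) * c_n / (ac50 ** n + c_n)


if __name__ == "__main__":
    # Hypothetical design: all tested concentrations are far below AC50
    for c in [0.01, 0.02, 0.05, 0.1]:
        # Doubling both top and AC50 (with n = 1) barely changes the response
        print(c, hill(c, 0.0, 100.0, 10.0, 1.0), hill(c, 0.0, 200.0, 20.0, 1.0))
```

Over these concentrations the two curves differ by less than 1%, so data from such a design cannot pin down top and AC50 separately.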
Affiliation(s)
- Keith R Shockley: Biostatistics and Computational Biology Branch, The National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA.

12. Gandy A, Hahn G. MMCTest – A Safe Algorithm for Implementing Multiple Monte Carlo Tests. Scand Stat Theory Appl 2014. [DOI: 10.1111/sjos.12085]
Affiliation(s)
- Axel Gandy: Department of Mathematics, Imperial College London
- Georg Hahn: Department of Mathematics, Imperial College London

13. Shi Y, Jiang H. rSeqDiff: detecting differential isoform expression from RNA-Seq data using hierarchical likelihood ratio test. PLoS One 2013; 8:e79448. [PMID: 24260225; PMCID: PMC3832546; DOI: 10.1371/journal.pone.0079448]
Abstract
High-throughput sequencing of transcriptomes (RNA-Seq) has recently become a powerful tool for the study of gene expression. We present rSeqDiff, an efficient algorithm for the detection of differential expression and differential splicing of genes from RNA-Seq experiments across multiple conditions. Unlike existing approaches, which detect differential expression of transcripts, our approach considers three cases for each gene: 1) no differential expression, 2) differential expression without differential splicing and 3) differential splicing. We specify statistical models characterizing each of these three cases and use a hierarchical likelihood ratio test for model selection. Simulation studies show that our approach achieves good power for detecting differentially expressed or differentially spliced genes. Comparisons with competing methods on two real RNA-Seq datasets demonstrate that our approach provides accurate estimates of isoform abundances and biologically meaningful rankings of differentially spliced genes. The proposed approach is implemented as an R package named rSeqDiff.
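The likelihood ratio test behind this model selection can be sketched for a single pair of nested models. This is an illustration under stated assumptions, not rSeqDiff's procedure: the example compares a common Poisson mean for one gene's counts in two conditions against separate means (one extra parameter, so df = 1), with made-up counts and equal sequencing depth assumed. In a hierarchical scheme, tests of this form could be applied to nested cases in sequence. All function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(y)), 0.5  # closed form for df = 1
    else:
        q, a = math.exp(-y), 1.0  # closed form for df = 2
    # Add two degrees of freedom per step using the regularized upper gamma
    # recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Statistic 2 * (l_alt - l_null) referred to a chi-square with df degrees."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))  # guard tiny negatives
    return stat, chi2_sf(stat, df)


def poisson_loglik(k, mu):
    """Poisson log-likelihood of count k under mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


# Hypothetical counts for one gene in two conditions
k1, k2 = 30, 60
ll_null = poisson_loglik(k1, 45.0) + poisson_loglik(k2, 45.0)  # common mean
ll_alt = poisson_loglik(k1, 30.0) + poisson_loglik(k2, 60.0)  # separate means
stat, p = likelihood_ratio_test(ll_null, ll_alt, df=1)
```

Here the statistic is about 10.2 and the p-value is well below 0.01, so the model with separate means would be preferred.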
Affiliation(s)
- Yang Shi: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Hui Jiang: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America; Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America