51
Baldoni PL, Rashid NU, Ibrahim JG. Improved detection of epigenomic marks with mixed-effects hidden Markov models. Biometrics 2019; 75:1401-1413. PMID: 31081192; PMCID: PMC6851437; DOI: 10.1111/biom.13083.
Abstract
Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is a technique to detect genomic regions containing protein-DNA interactions, such as transcription factor binding sites or regions containing histone modifications. One goal of the analysis of ChIP-seq experiments is to identify genomic loci enriched for sequencing reads pertaining to DNA bound to the factor of interest. The accurate identification of such regions aids in the understanding of epigenomic marks and gene regulatory mechanisms. Given the reduction of massively parallel sequencing costs, methods to detect consensus regions of enrichment across multiple samples are of interest. Here, we present a statistical model to detect broad consensus regions of enrichment from ChIP-seq technical or biological replicates through a class of zero-inflated mixed-effects hidden Markov models. We show that the proposed model outperforms existing methods for consensus peak calling in common epigenomic marks by accounting for the excess zeros and sample-specific biases. We apply our method to data from the Encyclopedia of DNA Elements and Roadmap Epigenomics projects and evaluate it in an extensive simulation study.
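The zero-inflated emission idea can be sketched with a scaled forward recursion for a two-state hidden Markov model; the transition matrix, rates, and zero-inflation probabilities below are invented for illustration, and the mixed-effects, multi-replicate machinery of the paper is omitted.

```python
import math

def zip_pmf(y, lam, pi0):
    """Zero-inflated Poisson: extra mass pi0 at zero, else Poisson(lam)."""
    pois = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi0 * (y == 0) + (1.0 - pi0) * pois

def forward_loglik(counts, trans, init, params):
    """Log-likelihood of a count sequence under a 2-state ZIP-HMM
    via the scaled forward recursion."""
    alpha = [init[s] * zip_pmf(counts[0], *params[s]) for s in (0, 1)]
    c = sum(alpha)
    loglik = math.log(c)
    alpha = [a / c for a in alpha]
    for y in counts[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in (0, 1)) * zip_pmf(y, *params[s])
                 for s in (0, 1)]
        c = sum(alpha)
        loglik += math.log(c)
        alpha = [a / c for a in alpha]
    return loglik

# Illustrative parameters: state 0 = background (low rate, many excess zeros),
# state 1 = enriched (high rate, few excess zeros).
trans = [[0.95, 0.05], [0.10, 0.90]]
params = [(0.5, 0.6), (8.0, 0.05)]
ll = forward_loglik([0, 0, 1, 7, 9, 6, 0, 0], trans, [0.5, 0.5], params)
```

In the paper this emission model sits inside a mixed-effects formulation shared across replicates; the recursion above only shows how excess zeros enter the likelihood of a single window track.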
52
Psioda MA, Hu K, Zhang Y, Pan J, Ibrahim JG. Bayesian design of biosimilars clinical programs involving multiple therapeutic indications. Biometrics 2019; 76:630-642. PMID: 31631321; DOI: 10.1111/biom.13163.
Abstract
In this paper, we propose a Bayesian design framework for a biosimilars clinical program that entails conducting concurrent trials in multiple therapeutic indications to establish equivalent efficacy for a proposed biologic compared to a reference biologic in each indication to support approval of the proposed biologic as a biosimilar. Our method facilitates information borrowing across indications through the use of a multivariate normal correlated parameter prior (CPP), which is constructed from easily interpretable hyperparameters that represent direct statements about the equivalence hypotheses to be tested. The CPP accommodates different endpoints and data types across indications (eg, binary and continuous) and can, therefore, be used in a wide context of models without having to modify the data (eg, rescaling) to provide reasonable information-borrowing properties. We illustrate how one can evaluate the design using Bayesian versions of the type I error rate and power with the objective of determining the sample size required for each indication such that the design has high power to demonstrate equivalent efficacy in each indication, reasonably high power to demonstrate equivalent efficacy simultaneously in all indications (ie, globally), and reasonable type I error control from a Bayesian perspective. We illustrate the method with several examples, including designing biosimilars trials for follicular lymphoma and rheumatoid arthritis using binary and continuous endpoints, respectively.
53
Wu J, Chen MH, Schifano ED, Ibrahim JG, Fisher JD. A new Bayesian joint model for longitudinal count data with many zeros, intermittent missingness, and dropout with applications to HIV prevention trials. Stat Med 2019; 38:5565-5586. PMID: 31691322; DOI: 10.1002/sim.8379.
Abstract
In longitudinal clinical trials, it is common that subjects may permanently withdraw from the study (dropout) or return to the study after missing one or more visits (intermittent missingness). Count response data in HIV prevention clinical trials also routinely contain a large proportion of zeros. In this paper, a sequential multinomial model is adopted for dropout and subsequently a conditional model is constructed for intermittent missingness. The new model captures the complex structure of missingness and incorporates dropout and intermittent missingness simultaneously. The model also allows us to easily compute the predictive probabilities of different missing data patterns. A zero-inflated Poisson mixed-effects regression model is assumed for the longitudinal count response data. We also propose an approach to assess the overall treatment effects under the zero-inflated Poisson model. We further show that the joint posterior distribution is improper if uniform priors are specified for the regression coefficients under the proposed model. Variations of the g-prior, Jeffreys prior, and maximally dispersed normal prior are thus established as remedies for the improper posterior distribution. An efficient Gibbs sampling algorithm is developed using a hierarchical centering technique. A modified logarithm of the pseudomarginal likelihood and a concordance-based area under the curve criterion are used to compare the models under different missing data mechanisms. We then conduct an extensive simulation study to investigate the empirical performance of the proposed methods and further illustrate the methods using real data from an HIV prevention clinical trial.
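The marginal mean under a zero-inflated Poisson regression, which underlies the overall-treatment-effect idea, can be sketched as follows; the coefficients and the single treatment covariate are illustrative, and the random effects and missingness models of the paper are omitted.

```python
import math

def zip_mean(beta, gamma, x):
    """Marginal mean of a zero-inflated Poisson response:
    E[Y | x] = (1 - pi(x)) * lambda(x), with a logistic model for the
    structural-zero probability pi and a log-linear model for lambda."""
    lam = math.exp(beta[0] + beta[1] * x)                     # Poisson rate
    pi = 1.0 / (1.0 + math.exp(-(gamma[0] + gamma[1] * x)))   # excess-zero prob
    return (1.0 - pi) * lam

# Overall treatment effect on the mean-count scale (x = 1 treated, x = 0 control),
# using made-up coefficients: treatment lowers the rate and raises the zero mass.
beta, gamma = (1.0, -0.7), (-0.5, 0.8)
effect = zip_mean(beta, gamma, 1) - zip_mean(beta, gamma, 0)
```

The point of working on this scale is that the treatment acts through both submodels at once, so neither coefficient alone summarizes the effect on the expected count.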
54
Rashid NU, Li Q, Yeh JJ, Ibrahim JG. Modeling between-study heterogeneity for improved replicability in gene signature selection and clinical prediction. J Am Stat Assoc 2019; 115:1125-1138. PMID: 33012902; DOI: 10.1080/01621459.2019.1671197.
Abstract
In the genomic era, the identification of gene signatures associated with disease is of significant interest. Such signatures are often used to predict clinical outcomes in new patients and aid clinical decision-making. However, recent studies have shown that gene signatures are often not replicable. This occurrence has practical implications regarding the generalizability and clinical applicability of such signatures. To improve replicability, we introduce a novel approach to select gene signatures from multiple datasets whose effects are consistently non-zero and account for between-study heterogeneity. We build our model upon some rank-based quantities, facilitating integration over different genomic datasets. A high dimensional penalized Generalized Linear Mixed Model (pGLMM) is used to select gene signatures and address data heterogeneity. We compare our method to some commonly used strategies that select gene signatures ignoring between-study heterogeneity. We provide asymptotic results justifying the performance of our method and demonstrate its advantage in the presence of heterogeneity through thorough simulation studies. Lastly, we motivate our method through a case study subtyping pancreatic cancer patients from four gene expression studies.
55
Zhu A, Srivastava A, Ibrahim JG, Patro R, Love MI. Nonparametric expression analysis using inferential replicate counts. Nucleic Acids Res 2019; 47:e105. PMID: 31372651; PMCID: PMC6765120; DOI: 10.1093/nar/gkz622.
Abstract
A primary challenge in the analysis of RNA-seq data is to identify differentially expressed genes or transcripts while controlling for technical biases. Ideally, a statistical testing procedure should incorporate the inherent uncertainty of the abundance estimates arising from the quantification step. Most popular methods for RNA-seq differential expression analysis fit a parametric model to the counts for each gene or transcript, and a subset of methods can incorporate uncertainty. Previous work has shown that nonparametric models for RNA-seq differential expression may have better control of the false discovery rate, and adapt well to new data types without requiring reformulation of a parametric model. Existing nonparametric models do not take into account inferential uncertainty, leading to an inflated false discovery rate, in particular at the transcript level. We propose a nonparametric model for differential expression analysis using inferential replicate counts, extending the existing SAMseq method to account for inferential uncertainty. We compare our method, Swish, with popular differential expression analysis methods. Swish has improved control of the false discovery rate, in particular for transcripts with high inferential uncertainty. We apply Swish to a single-cell RNA-seq dataset, assessing differential expression between sub-populations of cells, and compare its performance to the Wilcoxon test.
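The core idea of averaging a rank statistic over inferential replicates can be sketched in a few lines; this is a toy version of the approach, not the Swish package's code, and the replicate counts below are invented.

```python
def ranksum(xs, ys):
    """Wilcoxon rank-sum statistic (sum of group-1 midranks) for xs vs ys."""
    pooled = [(v, 0) for v in xs] + [(v, 1) for v in ys]
    pooled.sort()
    r = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid = (i + j + 1) / 2.0          # average 1-based rank of the tied block
        for k in range(i, j):
            r[k] = mid
        i = j
    return sum(r[k] for k, (v, g) in enumerate(pooled) if g == 0)

def swish_stat(replicates_x, replicates_y):
    """Average the rank-sum statistic over inferential replicates, so that
    quantification uncertainty propagates into the test statistic."""
    stats = [ranksum(x, y) for x, y in zip(replicates_x, replicates_y)]
    return sum(stats) / len(stats)

# Two hypothetical inferential replicates for one transcript (condition A vs B).
reps_a = [[5, 7, 6], [4, 8, 6]]
reps_b = [[12, 15, 11], [13, 14, 12]]
s = swish_stat(reps_a, reps_b)
```

A transcript whose counts vary wildly across inferential replicates will yield unstable per-replicate statistics, and the averaging damps its evidence, which is how high inferential uncertainty is penalized.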
56
Wilson DR, Jin C, Ibrahim JG, Sun W. ICeD-T provides accurate estimates of immune cell abundance in tumor samples by allowing for aberrant gene expression patterns. J Am Stat Assoc 2019; 115:1055-1065. PMID: 33012900; DOI: 10.1080/01621459.2019.1654874.
Abstract
Immunotherapies have attracted substantial research interest in recent years. The need to understand the underlying mechanisms of immunotherapies and to develop precision immunotherapy regimens has spurred great interest in characterizing immune cell composition within the tumor microenvironment. Several methods have been developed to estimate immune cell composition using gene expression data from bulk tumor samples. However, these methods are not flexible enough to handle aberrant patterns of gene expression data, e.g., inconsistent cell type-specific gene expression between purified reference samples and tumor samples. We propose a novel statistical method for expression deconvolution called ICeD-T (Immune Cell Deconvolution in Tumor tissues). ICeD-T automatically identifies aberrant genes whose expression is inconsistent with the deconvolution model and down-weights their contributions to cell type abundance estimates. We evaluated the performance of ICeD-T versus existing methods in simulation studies and several real data analyses. ICeD-T displayed comparable or superior performance to these competing methods. Applying these methods to assess the relationship between immunotherapy response and immune cell composition, ICeD-T is able to identify significant associations that are missed by its competitors.
57
Diao G, Zeng D, Hu K, Ibrahim JG. Semiparametric frailty models for zero-inflated event count data in the presence of informative dropout. Biometrics 2019; 75:1168-1178. PMID: 31106400; DOI: 10.1111/biom.13085.
Abstract
Recurrent events data are commonly encountered in medical studies. In many applications, only the number of events during the follow-up period rather than the recurrent event times is available. Two important challenges arise in such studies: (a) a substantial portion of subjects may not experience the event, and (b) we may not observe the event count for the entire study period due to informative dropout. To address the first challenge, we assume that the underlying population consists of two subpopulations: a subpopulation nonsusceptible to the event of interest and a subpopulation susceptible to the event of interest. In the susceptible subpopulation, the event count is assumed to follow a Poisson distribution given the follow-up time and the subject-specific characteristics. We then introduce a frailty to account for informative dropout. The proposed semiparametric frailty models consist of three submodels: (a) a logistic regression model for the probability such that a subject belongs to the nonsusceptible subpopulation; (b) a nonhomogeneous Poisson process model with an unspecified baseline rate function; and (c) a Cox model for the informative dropout time. We develop likelihood-based estimation and inference procedures. The maximum likelihood estimators are shown to be consistent. Additionally, the proposed estimators of the finite-dimensional parameters are asymptotically normal and the covariance matrix attains the semiparametric efficiency bound. Simulation studies demonstrate that the proposed methodologies perform well in practical situations. We apply the proposed methods to a clinical trial on patients with myelodysplastic syndromes.
58
Duan R, Cao M, Ning Y, Zhu M, Zhang B, McDermott A, Chu H, Zhou X, Moore JH, Ibrahim JG, Scharfstein DO, Chen Y. Global identifiability of latent class models with applications to diagnostic test accuracy studies: a Gröbner basis approach. Biometrics 2019; 76:98-108. PMID: 31444807; DOI: 10.1111/biom.13133.
Abstract
Identifiability of statistical models is a fundamental regularity condition that is required for valid statistical inference. Investigation of model identifiability is mathematically challenging for complex models such as latent class models. Jones et al. used Goodman's technique to investigate the identifiability of latent class models with applications to diagnostic tests in the absence of a gold standard test. The tool they used was based on examining the singularity of the Jacobian or the Fisher information matrix, in order to obtain insights into local identifiability (ie, there exists a neighborhood of a parameter such that no other parameter in the neighborhood leads to the same probability distribution as the parameter). In this paper, we investigate a stronger condition: global identifiability (ie, no two parameters in the parameter space give rise to the same probability distribution), by introducing a powerful mathematical tool from computational algebra: the Gröbner basis. Using several well-known examples, we argue that the Gröbner basis method is easy to implement and powerful for studying the global identifiability of latent class models, and is an attractive alternative to the information matrix analysis by Rothenberg and the Jacobian analysis by Goodman and Jones et al.
59
Tan X, Liu GF, Zeng D, Wang W, Diao G, Heyse JF, Ibrahim JG. Controlling false discovery proportion in identification of drug-related adverse events from multiple system organ classes. Stat Med 2019; 38:4378-4389. PMID: 31313376; DOI: 10.1002/sim.8304.
Abstract
Analyzing safety data from clinical trials to detect safety signals worth further examination involves testing multiple hypotheses, one for each observed adverse event (AE) type. These hypotheses have a hierarchical structure due to the classification of the AEs into system organ classes, and the AEs are also likely correlated. Many approaches have been proposed to identify safety signals under the multiple testing framework while aiming to control the false discovery rate (FDR). The FDR control concerns the expectation of the false discovery proportion (FDP). In practice, control of the actual random variable FDP can be more relevant and has recently drawn much attention. In this paper, we propose a two-stage procedure for safety signal detection with direct control of the FDP, through a permutation-based approach for screening groups of AEs and a permutation-based construction of simultaneous upper bounds for the false discovery proportion. Our simulation studies showed that this new approach controls the FDP. We demonstrate our approach using data sets derived from a drug clinical trial.
60
Li H, Chen MH, Ibrahim JG, Kim S, Shah AK, Lin J, Tershakovec AM. Bayesian inference for network meta-regression using multivariate random effects with applications to cholesterol lowering drugs. Biostatistics 2019; 20:499-516. PMID: 29912318; PMCID: PMC6676556; DOI: 10.1093/biostatistics/kxy014.
Abstract
Low-density lipoprotein cholesterol (LDL-C) has been identified as a causative factor for atherosclerosis and related coronary heart disease, and as the main target for cholesterol- and lipid-lowering therapy. Statin drugs inhibit cholesterol synthesis in the liver and are typically the first line of therapy to lower elevated levels of LDL-C. On the other hand, a different drug, Ezetimibe, inhibits the absorption of cholesterol by the small intestine and provides a different mechanism of action. Many clinical trials have been carried out on safety and efficacy evaluation of cholesterol lowering drugs. To synthesize the results from different clinical trials, we examine treatment level (aggregate) network meta-data from 29 double-blind, randomized, active, or placebo-controlled statins +/- Ezetimibe clinical trials on adult treatment-naïve patients with primary hypercholesterolemia. In this article, we propose a new approach to carry out Bayesian inference for arm-based network meta-regression. Specifically, we develop a new strategy of grouping the variances of random effects, in which we first formulate possible sets of the groups of the treatments based on their clinical mechanisms of action and then use Bayesian model comparison criteria to select the best set of groups. The proposed approach is especially useful when some treatment arms are involved in only a single trial. In addition, a Markov chain Monte Carlo sampling algorithm is developed to carry out the posterior computations. In particular, the correlation matrix is generated from its full conditional distribution via partial correlations. The proposed methodology is further applied to analyze the network meta-data from 29 trials with 11 treatment arms.
61
Diao G, Ibrahim JG. Quantifying time-varying cause-specific hazard and subdistribution hazard ratios with competing risks data. Clin Trials 2019; 16:363-374. PMID: 31165631; DOI: 10.1177/1740774519852708.
Abstract
Various non-proportional hazard models have been developed in the literature for competing risks data. The regression coefficients under these models, however, typically cannot be compared directly. We propose new methods to quantify the average of the time-varying cause-specific hazard ratios and subdistribution hazard ratios through two general classes of transformations and weight functions that are chosen to reflect the relative importance of the hazard ratios in different time periods. We further propose an L∞-norm type of test statistic that incorporates the test statistics for all possible pairs of the transformation function and weight function under consideration. Extensive simulations are conducted under various settings of the hazards and demonstrate that the proposed test performs well under all settings. An application to a clinical trial in follicular lymphoma is examined in detail.
62
Psioda MA, Xu J, Jiang Q, Ke C, Yang Z, Ibrahim JG. Bayesian adaptive basket trial design using model averaging. Biostatistics 2019; 22:19-34. PMID: 31107534; DOI: 10.1093/biostatistics/kxz014.
Abstract
In this article, we develop a Bayesian adaptive design methodology for oncology basket trials with binary endpoints using a Bayesian model averaging framework. Most existing methods seek to borrow information based on the degree of homogeneity of estimated response rates across all baskets. In reality, an investigational product may only demonstrate activity for a subset of baskets, and the degree of activity may vary across the subset. A key benefit of our Bayesian model averaging approach is that it explicitly accounts for the possibility that any subset of baskets may have similar activity and that some may not. Our proposed approach performs inference on the basket-specific response rates by averaging over the complete model space for the response rates, which can include thousands of models. We present results that demonstrate that this computationally feasible Bayesian approach performs favorably compared to existing state-of-the-art approaches, even when held to stringent requirements regarding false positive rates.
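The model-averaging idea can be sketched for binary endpoints: each model in the space is a partition of the baskets into blocks that share a response rate, each block gets a conjugate beta-binomial marginal likelihood, and basket-level posterior means are averaged with the resulting model weights. This is a minimal sketch under a uniform model prior with invented data, not the paper's calibrated design.

```python
import math
from math import lgamma

def partitions(items):
    """All set partitions of items (each model: which baskets share a rate)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def log_betabinom_marg(y, n, a=1.0, b=1.0):
    """Log marginal likelihood of y responders in n subjects, rate ~ Beta(a, b)."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + y) + lgamma(b + n - y) - lgamma(a + b + n))

def bma_post_means(ys, ns, a=1.0, b=1.0):
    """Model-averaged posterior mean response rate for each basket,
    averaging over every partition with a uniform prior on models."""
    weights, means = [], []
    for part in partitions(list(range(len(ys)))):
        lm, mean = 0.0, [0.0] * len(ys)
        for blk in part:
            y = sum(ys[i] for i in blk)
            n = sum(ns[i] for i in blk)
            lm += log_betabinom_marg(y, n, a, b)
            for i in blk:
                mean[i] = (a + y) / (a + b + n)
        weights.append(lm)
        means.append(mean)
    mx = max(weights)
    w = [math.exp(v - mx) for v in weights]
    tot = sum(w)
    return [sum(wi * m[i] for wi, m in zip(w, means)) / tot
            for i in range(len(ys))]

# Three hypothetical baskets: 8/10, 7/10, and 1/10 responders. Baskets 1 and 2
# borrow from each other; basket 3 is mostly left alone.
post = bma_post_means([8, 7, 1], [10, 10, 10])
```

The partition space grows as the Bell numbers, which is why the paper notes that the full model space can contain thousands of models even for modest numbers of baskets.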
63
Diao G, Liu GF, Zeng D, Wang W, Tan X, Heyse JF, Ibrahim JG. Efficient methods for signal detection from correlated adverse events in clinical trials. Biometrics 2019; 75:1000-1008. PMID: 30690717; DOI: 10.1111/biom.13031.
Abstract
It is an important and yet challenging task to identify true signals from many adverse events that may be reported during the course of a clinical trial. One unique feature of drug safety data from clinical trials, unlike data from post-marketing spontaneous reporting, is that many types of adverse events are reported by only very few patients, leading to rare events. Due to the limited study size, the p-values of testing whether the rate is higher in the treatment group across all types of adverse events are in general not uniformly distributed under the null hypothesis that there is no difference between the treatment group and the placebo group. A consequence is that typically fewer than 100α percent of the hypotheses are rejected under the null at the nominal significance level α. The other challenge is multiplicity control. Adverse events from the same body system may be correlated. There may also be correlations between adverse events from different body systems. To tackle these challenging issues, we develop Monte-Carlo-based methods for the signal identification from patient-reported adverse events in clinical trials. The proposed methodologies account for the rare events and arbitrary correlation structures among adverse events within and/or between body systems. Extensive simulation studies demonstrate that the proposed method can accurately control the family-wise error rate and is more powerful than existing methods under many practical situations. Application to two real examples is provided.
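The Monte Carlo flavor of this problem can be illustrated with a Westfall-Young-style max-T permutation adjustment, which preserves the correlation among AE types by permuting whole subjects; the data and this particular statistic are illustrative stand-ins, not the paper's exact procedure.

```python
import random

def maxT_adjusted(treat, placebo, n_perm=500, seed=7):
    """Permutation max-T adjustment: each AE type's adjusted p-value compares
    its observed rate difference with the permutation distribution of the
    MAXIMUM difference across all types, keeping within-subject AE
    correlation intact.
    treat/placebo: one 0/1 vector per subject, one entry per AE type."""
    rng = random.Random(seed)
    k = len(treat[0])
    nt = len(treat)
    pooled = treat + placebo

    def rate_diffs(a, b):
        return [sum(s[j] for s in a) / len(a) - sum(s[j] for s in b) / len(b)
                for j in range(k)]

    obs = rate_diffs(treat, placebo)
    exceed = [0] * k
    for _ in range(n_perm):
        perm = pooled[:]
        rng.shuffle(perm)                      # permute subjects across arms
        mx = max(rate_diffs(perm[:nt], perm[nt:]))
        for j in range(k):
            if mx >= obs[j]:
                exceed[j] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]

# Invented data: AE type 1 is strongly elevated in the treated arm; type 2 is not.
treat = [[1, 0]] * 15 + [[1, 1]] * 3 + [[0, 0]] * 2
placebo = [[1, 0]] * 2 + [[0, 1]] * 3 + [[0, 0]] * 15
adj_p = maxT_adjusted(treat, placebo)
```

Because the reference distribution is simulated rather than taken from an asymptotic formula, the rare-event non-uniformity of the raw p-values is handled automatically.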
64
Ma X, Lian Q, Chu H, Ibrahim JG, Chen Y. A Bayesian hierarchical model for network meta-analysis of multiple diagnostic tests. Biostatistics 2019; 19:87-102. PMID: 28586407; DOI: 10.1093/biostatistics/kxx025.
Abstract
To compare the accuracy of multiple diagnostic tests in a single study, three designs are commonly used: (i) the multiple test comparison design; (ii) the randomized design; and (iii) the non-comparative design. Existing meta-analysis methods for diagnostic tests (MA-DT) have focused on evaluating the performance of a single test by comparing it with a reference test. The increasing number of available diagnostic instruments for a disease condition and the different study designs being used have generated the need for an efficient and flexible meta-analysis framework that combines all designs for simultaneous inference. In this article, we develop a missing data framework and a Bayesian hierarchical model for network MA-DT (NMA-DT) that offers important advantages over traditional MA-DT: (i) it combines studies using all three designs; (ii) it pools studies with or without a gold standard; (iii) it combines studies with different sets of candidate tests; and (iv) it accounts for heterogeneity across studies and the complex correlation structure among multiple tests. We illustrate our method through a case study: network meta-analysis of deep vein thrombosis tests.
65
Abstract
In this paper, we propose the hard thresholding regression (HTR) for estimating high-dimensional sparse linear regression models. HTR uses a two-stage convex algorithm to approximate the ℓ0-penalized regression: The first stage calculates a coarse initial estimator, and the second stage identifies the oracle estimator by borrowing information from the first one. Theoretically, the HTR estimator achieves the strong oracle property over a wide range of regularization parameters. Numerical examples and a real data example lend further support to our proposed methodology.
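The two-stage structure can be sketched as follows; marginal least squares stands in for the paper's first-stage convex estimator, and the data, threshold, and dimensions are invented for illustration.

```python
import random

def solve_small(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def htr_sketch(X, y, tau):
    """Two-stage sketch: a coarse initial estimator (marginal least squares),
    hard thresholding at tau, then an OLS refit on the surviving support."""
    n, p = len(X), len(X[0])
    init = [sum(X[i][j] * y[i] for i in range(n)) /
            sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    S = [j for j in range(p) if abs(init[j]) > tau]          # hard threshold
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in S] for a in S]
    rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in S]
    coef = solve_small(A, rhs)                               # second-stage OLS
    beta = [0.0] * p
    for idx, j in enumerate(S):
        beta[j] = coef[idx]
    return beta

rng = random.Random(1)
n, p = 200, 8
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_beta = [2.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0]
y = [sum(b * v for b, v in zip(true_beta, row)) + rng.gauss(0, 0.5) for row in X]
beta_hat = htr_sketch(X, y, tau=0.7)
```

The refit on the selected support is what makes the second-stage estimator behave like the oracle least squares fit when the threshold separates signal from noise.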
66
67
O'Brien JJ, Gunawardena HP, Paulo JA, Chen X, Ibrahim JG, Gygi SP, Qaqish BF. The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments. Ann Appl Stat 2018; 12:2075-2095. PMID: 30473739; PMCID: PMC6249692; DOI: 10.1214/18-aoas1144.
Abstract
An idealized version of a label-free discovery mass spectrometry proteomics experiment would provide absolute abundance measurements for a whole proteome, across varying conditions. Unfortunately, this ideal is not realized. Measurements are made on peptides requiring an inferential step to obtain protein level estimates. The inference is complicated by experimental factors that necessitate relative abundance estimation and result in widespread non-ignorable missing data. Relative abundance on the log scale takes the form of parameter contrasts. In a complete-case analysis, contrast estimates may be biased by missing data and a substantial amount of useful information will often go unused. To avoid problems with missing data, many analysts have turned to single imputation solutions. Unfortunately, these methods often create further difficulties by hiding inestimable contrasts, preventing the recovery of interblock information and failing to account for imputation uncertainty. To mitigate many of the problems caused by missing values, we propose the use of a Bayesian selection model. Our model is tested on simulated data, real data with simulated missing values, and on a ground truth dilution experiment where all of the true relative changes are known. The analysis suggests that our model, compared with various imputation strategies and complete-case analyses, can increase accuracy and provide substantial improvements to interval coverage.
68
Psioda MA, Soukup M, Ibrahim JG. A practical Bayesian adaptive design incorporating data from historical controls. Stat Med 2018. PMID: 30033617; DOI: 10.1002/sim.7897.
Abstract
In this paper, we develop the fixed-borrowing adaptive design, a Bayesian adaptive design which facilitates information borrowing from a historical trial using subject-level control data while assuring a reasonable upper bound on the maximum type I error rate and lower bound on the minimum power. First, one constructs an informative power prior from the historical data to be used for design and analysis of the new trial. At an interim analysis opportunity, one evaluates the degree of prior-data conflict. If there is too much conflict between the new trial data and the historical control data, the prior information is discarded and the study proceeds to the final analysis opportunity at which time a noninformative prior is used for analysis. Otherwise, the trial is stopped early and the informative power prior is used for analysis. Simulation studies are used to calibrate the early stopping rule. The proposed design methodology seamlessly accommodates covariates in the statistical model, which the authors argue is necessary to justify borrowing information from historical controls. Implementation of the proposed methodology is straightforward for many common data models, including linear regression models, generalized linear regression models, and proportional hazards models. We demonstrate the methodology to design a cardiovascular outcomes trial for a hypothetical new therapy for treatment of type 2 diabetes mellitus and borrow information from the SAVOR trial, one of the earliest cardiovascular outcomes trials designed to assess cardiovascular risk in antidiabetic therapies.
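For a binary endpoint, the power prior mechanics reduce to conjugate updating; the sketch below uses a beta-binomial model and a toy rate-difference conflict rule, both of which are illustrative simplifications rather than the paper's calibrated interim stopping rule or its covariate-adjusted models.

```python
def power_prior_posterior(y, n, y0, n0, a0, a=1.0, b=1.0):
    """Beta-binomial power prior: historical control data (y0 events in n0
    subjects) enter the Beta(a, b) prior downweighted by a0 in [0, 1];
    a0 = 0 ignores the historical trial, a0 = 1 pools it fully.
    Returns the Beta posterior parameters for the control response rate."""
    return a + y + a0 * y0, b + (n - y) + a0 * (n0 - y0)

def fixed_borrowing_posterior(y, n, y0, n0, a0, conflict_tol=0.1):
    """Toy prior-data-conflict check: discard the historical information
    when the observed control rates are too far apart (hypothetical rule
    and threshold, standing in for the paper's calibrated criterion)."""
    if abs(y / n - y0 / n0) > conflict_tol:
        a0 = 0.0          # too much conflict: fall back to no borrowing
    return power_prior_posterior(y, n, y0, n0, a0)

# Agreeing data keep the discounted historical controls in the prior;
# conflicting data drop them entirely.
agree = fixed_borrowing_posterior(5, 10, 5, 10, a0=0.5)
conflict = fixed_borrowing_posterior(5, 10, 9, 10, a0=0.5)
```

The all-or-nothing choice of a0 at the interim look is the "fixed-borrowing" part of the design: the discount is set in advance rather than estimated, which is what makes the operating characteristics easy to bound.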
69
Ibrahim JG, Kim S, Chen MH, Shah AK, Lin J. Bayesian multivariate skew meta-regression models for individual patient data. Stat Methods Med Res 2018; 28:3415-3436. PMID: 30309294; DOI: 10.1177/0962280218801147.
Abstract
We examine a class of multivariate meta-regression models in the presence of individual patient data. The methodology is well motivated from several studies of cholesterol-lowering drugs where the goal is to jointly analyze the multivariate outcomes, low density lipoprotein cholesterol, high density lipoprotein cholesterol, and triglycerides. These three continuous outcome measures are correlated and shed much light on a subject's lipid status. One of the main goals in lipid research is the joint analysis of these three outcome measures in a meta-regression setting. Since these outcome measures are not typically multivariate normal, one must consider classes of distributions that allow for skewness in one or more of the outcomes. In this paper, we consider a new general class of multivariate skew distributions for multivariate meta-regression and examine their theoretical properties. Using these distributions, we construct a Bayesian model for the meta-data and develop an efficient Markov chain Monte Carlo computational scheme for carrying out the computations. In addition, we develop a multivariate L measure for model comparison, Bayesian residuals for model assessment, and a Bayesian procedure for detecting outlying trials. The proposed multivariate L measure, Bayesian residuals, and Bayesian outlying trial detection procedure are particularly suitable and computationally attractive in the multivariate meta-regression setting. A detailed case study demonstrating the usefulness of the proposed methodology is carried out in an individual patient data multivariate meta-regression setting using 26 pivotal Merck clinical trials that compare statins (cholesterol-lowering drugs) in combination with ezetimibe and statins alone on treatment-naïve patients and those continuing on statins at baseline.
|
70
|
Gao F, Zeng D, Wei H, Wang X, Ibrahim JG. Estimating treatment effects for recurrent events in the presence of rescue medications: an application to the immune thrombocytopenia study. Stat Biosci 2018; 10:473-489. [PMID: 30298095] [DOI: 10.1007/s12561-016-9164-x]
Abstract
In many clinical studies, patients may experience the same type of event of interest repeatedly over time. However, the assessment of treatment effects is often complicated by the use of rescue medications, which are permitted for ethical reasons. For example, in the motivating trial studying immune thrombocytopenia (ITP), where the interest lies in evaluating the benefit of the investigational product (IP) in reducing patients' repeated bleeding events, rescue medications such as platelet transfusions may be allowed to raise platelet counts. Both an intention-to-treat analysis and treating the intermediate rescue medication as a covariate tend to attenuate the treatment benefit, and the resulting estimates can be biased if interpreted as causal. In this paper, we propose a general causal framework for settings in which intermediate rescue medications are informative. We adopt an inverse probability weighting approach to estimate the treatment effect, with weights constructed to reflect time-dependent medication-use probabilities. The proposed estimators are shown to be asymptotically normal and are demonstrated to perform well in small-sample simulation studies. The application to the ITP studies reveals a stronger benefit of the IP in reducing bleeding.
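The weighting idea can be shown in a toy discrete-time setting. The sketch below is not the authors' estimator: the rescue probabilities, period structure, and event rates are all invented. Subjects are censored at rescue initiation, and each period is up-weighted by the inverse probability of having remained rescue-free, which recovers the rescue-free mean cumulative event count.

```python
import numpy as np

def ipw_event_rate(events, rescue_time, p_rescue, T):
    """Toy IPW estimate of the mean cumulative event count absent rescue.
    events: (n, T) 0/1 event indicators per period
    rescue_time: first period of rescue use (T means never)
    p_rescue: (n, T) model-based rescue probabilities per period"""
    n, _ = events.shape
    total = 0.0
    for t in range(T):
        at_risk = rescue_time > t                       # still rescue-free
        # P(rescue-free through period t) = prod_{s<=t} (1 - p_rescue)
        surv = np.prod(1.0 - p_rescue[:, :t + 1], axis=1)
        w = at_risk / surv
        total += (w * events[:, t]).sum() / n
    return total

rng = np.random.default_rng(1)
n, T = 50000, 5
events = (rng.random((n, T)) < 0.2).astype(float)  # 0.2 event rate per period
resc = rng.random((n, T)) < 0.3                    # rescue independent of events
rescue_time = np.where(resc.any(1), resc.argmax(1), T)
p_rescue = np.full((n, T), 0.3)
est = ipw_event_rate(events, rescue_time, p_rescue, T)
```

Because the true rescue-free expectation here is 5 x 0.2 = 1.0, the weighted estimate should land close to 1, whereas a naive analysis of only rescue-free person-time without weights would not change the per-period rate in this independent-rescue toy but would be biased once rescue depends on the event history.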
|
71
|
Li T, Xie F, Feng X, Ibrahim JG, Zhu H. Functional linear regression models for nonignorable missing scalar responses. Stat Sin 2018; 28:1867-1886. [PMID: 30344426] [PMCID: PMC6191855] [DOI: 10.5705/ss.202016.0350]
Abstract
As an important part of modern health care, medical imaging data, which can be regarded as densely sampled functional data, have been widely used for diagnosis, screening, treatment, and prognosis, such as finding breast cancer through mammograms. The aim of this paper is to propose a functional linear regression model for using functional (or imaging) predictors to predict clinical outcomes (e.g., disease status), while addressing missing clinical outcomes. We introduce an exponential tilting semiparametric model to account for the nonignorable missing data mechanism. We develop a set of estimating equations and its associated computational methods for both parameter estimation and the selection of the tuning parameters. We also propose a bootstrap resampling procedure for carrying out statistical inference. Under some regularity conditions, we systematically establish the asymptotic properties (e.g., consistency and convergence rate) of the estimates calculated from the proposed estimating equations. Simulation studies and a real data analysis are used to illustrate the finite sample performance of the proposed methods.
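The exponential tilting idea can be seen in a toy simulation. Under a logistic nonignorable missingness model, the density of the missing outcomes is an exponentially tilted version of the observed one, so moments of the missing part are recoverable by reweighting observed values. In this sketch the tilting parameter is treated as known, which it is not in practice, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200000
y = rng.normal(0.0, 1.0, n)
gamma, a = 1.0, -1.0
# Logistic nonignorable missingness: logit P(missing | y) = a + gamma * y
p_miss = 1.0 / (1.0 + np.exp(-(a + gamma * y)))
observed = rng.random(n) > p_miss
y_obs = y[observed]

# Exponential tilting identity: f(y | missing) is proportional to
# exp(gamma * y) * f(y | observed), so the missing-part mean follows
# from a weighted mean of the observed values alone.
w = np.exp(gamma * y_obs)
mean_missing_tilt = (w * y_obs).sum() / w.sum()
mean_missing_true = y[~observed].mean()
```

The tilted estimate, computed from observed data only, matches the oracle mean of the missing outcomes; a complete-case mean would instead be pulled downward, since high values of `y` are preferentially missing here.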
|
72
|
Miranda MF, Zhu H, Ibrahim JG. TPRM: tensor partition regression models with applications in imaging biomarker detection. Ann Appl Stat 2018; 12:1422-1450. [PMID: 30416640] [PMCID: PMC6221472] [DOI: 10.1214/17-aoas1116]
Abstract
Medical imaging studies have collected high dimensional imaging data to identify imaging biomarkers for diagnosis, screening, and prognosis, among many others. These imaging data are often represented in the form of a multi-dimensional array, called a tensor. The aim of this paper is to develop a tensor partition regression modeling (TPRM) framework to establish a relationship between low-dimensional clinical outcomes (e.g., diagnosis) and high dimensional tensor covariates. Our TPRM is a hierarchical model and efficiently integrates four components: (i) a partition model, (ii) a canonical polyadic decomposition model, (iii) a principal components model, and (iv) a generalized linear model with a sparsity-inducing normal mixture prior. This framework not only reduces ultra-high dimensionality to a manageable level, resulting in efficient estimation, but also optimizes prediction accuracy in the search for informative subtensors. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. Simulation shows that TPRM outperforms several other competing methods. We apply TPRM to predict disease status (Alzheimer's disease versus control) by using structural magnetic resonance imaging data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study.
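A rough sketch of the partition and dimension-reduction stages is given below. It covers only components (i) and (iii) of the pipeline: the CP decomposition and the Bayesian GLM with a sparsity prior are omitted, and the block size and component counts are arbitrary.

```python
import numpy as np

def block_pca_features(X, block=8, k=2):
    """Partition each 3-D image into non-overlapping cubic blocks, then
    keep the top-k principal component scores per block (a sketch of a
    TPRM-like partition + dimension-reduction stage only)."""
    n = X.shape[0]
    d = X.shape[1] // block   # blocks per axis (cubic images assumed)
    feats = []
    for i in range(d):
        for j in range(d):
            for l in range(d):
                B = X[:, i*block:(i+1)*block, j*block:(j+1)*block,
                      l*block:(l+1)*block].reshape(n, -1)
                B = B - B.mean(0)
                # PCA via SVD; U * S gives the component scores
                U, S, _ = np.linalg.svd(B, full_matrices=False)
                feats.append(U[:, :k] * S[:k])
    return np.concatenate(feats, axis=1)

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 16, 16, 16))   # 40 subjects, 16^3 "images"
Z = block_pca_features(X)                # (40, 8 blocks x 2 scores) = (40, 16)
```

The resulting low-dimensional feature matrix `Z` is what a downstream classifier (in the paper, a Bayesian GLM over selected subtensors) would consume.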
|
73
|
Yang H, Zhu H, Ibrahim JG. MILFM: multiple index latent factor model based on high-dimensional features. Biometrics 2018; 74:834-844. [PMID: 29665616] [PMCID: PMC6158073] [DOI: 10.1111/biom.12866]
Abstract
The aim of this article is to develop a multiple-index latent factor modeling (MILFM) framework to build an accurate prediction model for clinical outcomes based on a massive number of features. We develop a three-stage estimation procedure to build the prediction model. MILFM uses an independent screening method to select a set of informative features, which may have a complex nonlinear relationship with the outcome variables. Moreover, we develop a latent factor model to project all informative predictors onto a small number of local subspaces, which lead to a few key features that capture reliable and informative covariate information. Finally, we fit the regularized empirical estimate to those key features in order to accurately predict clinical outcomes. We systematically investigate the theoretical properties of MILFM, such as risk bounds and selection consistency. Our simulation results and real data analysis show that MILFM outperforms many state-of-the-art methods in terms of prediction accuracy.
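The screening stage can be sketched as marginal-correlation ranking. This is a generic sure-independence-style screen, not necessarily the exact MILFM screen, and the data-generating model below is invented; the latent-factor and regularized-fit stages are not shown.

```python
import numpy as np

def screen(X, y, d):
    """Rank features by absolute marginal correlation with the outcome
    and keep the indices of the top d."""
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    score = np.abs(Xc.T @ yc) / len(y)   # |sample correlation| per feature
    return np.argsort(score)[::-1][:d]

rng = np.random.default_rng(5)
n, p = 400, 2000
X = rng.normal(size=(n, p))
# Only features 3 and 77 carry signal; the rest are noise
y = 2.0 * X[:, 3] - 1.5 * X[:, 77] + rng.normal(size=n)
keep = screen(X, y, 20)
```

With n = 400, the two signal features have population correlations near 0.8 and 0.6 and comfortably survive a top-20 screen over 2000 candidates, illustrating why a cheap marginal pass is a sensible first stage before factor extraction.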
|
74
|
Psioda MA, Soukup M, Ibrahim JG. A practical Bayesian adaptive design incorporating data from historical controls. Stat Med 2018; 37:4054-4070. [PMID: 30033617] [DOI: 10.1002/sim.7897]
Abstract
In this paper, we develop the fixed-borrowing adaptive design, a Bayesian adaptive design which facilitates information borrowing from a historical trial using subject-level control data while assuring a reasonable upper bound on the maximum type I error rate and lower bound on the minimum power. First, one constructs an informative power prior from the historical data to be used for design and analysis of the new trial. At an interim analysis opportunity, one evaluates the degree of prior-data conflict. If there is too much conflict between the new trial data and the historical control data, the prior information is discarded and the study proceeds to the final analysis opportunity at which time a noninformative prior is used for analysis. Otherwise, the trial is stopped early and the informative power prior is used for analysis. Simulation studies are used to calibrate the early stopping rule. The proposed design methodology seamlessly accommodates covariates in the statistical model, which the authors argue is necessary to justify borrowing information from historical controls. Implementation of the proposed methodology is straightforward for many common data models, including linear regression models, generalized linear regression models, and proportional hazards models. We demonstrate the methodology to design a cardiovascular outcomes trial for a hypothetical new therapy for treatment of type 2 diabetes mellitus and borrow information from the SAVOR trial, one of the earliest cardiovascular outcomes trials designed to assess cardiovascular risk in antidiabetic therapies.
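The borrowing mechanism can be illustrated for a normal mean with known variance, a conjugate sketch rather than the paper's general regression setting, with all numbers hypothetical. Setting the discounting parameter a0 = 0 corresponds to discarding the prior after a prior-data conflict is detected at the interim look, and a0 = 1 to fully pooling the historical controls.

```python
import numpy as np

def power_prior_posterior(ybar, n, ybar0, n0, sigma2, a0):
    """Posterior mean and variance for a normal mean under a power prior
    that raises the historical likelihood to a0 in [0, 1] (known variance,
    flat initial prior): the historical data count as a0 * n0 observations."""
    mean = (a0 * n0 * ybar0 + n * ybar) / (a0 * n0 + n)
    var = sigma2 / (a0 * n0 + n)
    return mean, var

# Hypothetical numbers: new-trial mean 0.3 on 100 subjects, historical
# control mean 0.0 on 400 subjects, unit variance.
m_no, v_no = power_prior_posterior(0.3, 100, 0.0, 400, 1.0, a0=0.0)   # discard
m_full, v_full = power_prior_posterior(0.3, 100, 0.0, 400, 1.0, a0=1.0)  # pool
```

Full borrowing shrinks the estimate toward the historical mean (0.06 versus 0.3) and sharply reduces the posterior variance, which is exactly the gain, and the type I error risk under conflict, that the interim conflict check is designed to arbitrate.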
|
75
|
Psioda MA, Ibrahim JG. Bayesian design of a survival trial with a cured fraction using historical data. Stat Med 2018; 37:3814-3831. [PMID: 29938817] [DOI: 10.1002/sim.7846]
Abstract
In this paper, we develop a general Bayesian clinical trial design methodology, tailored for time-to-event trials with a cured fraction in scenarios where a previously completed clinical trial is available to inform the design and analysis of the new trial. Our methodology provides a conceptually appealing and computationally feasible framework that allows one to construct a fixed, maximally informative prior a priori while simultaneously identifying the minimum sample size required for the new trial so that the design has high power and reasonable type I error control from a Bayesian perspective. This strategy is particularly well suited for scenarios where adaptive borrowing approaches are not practical due to the nature of the trial, complexity of the model, or the source of the prior information. Control of a Bayesian type I error rate offers a sensible balance between wanting to use high-quality information in the design and analysis of future trials while still controlling type I errors in an equitable way. Moreover, sample size determination based on our Bayesian view of power can lead to a more adequately sized trial by virtue of taking into account all the uncertainty in the treatment effect. We demonstrate our methodology by designing a cancer clinical trial in high-risk melanoma.
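The data-generating step of such a simulation-based design can be sketched under a mixture cure model with an exponential latency, with hypothetical cure rate and hazard; the actual design methodology requires repeatedly simulating and analyzing full trials, which is not shown.

```python
import numpy as np

rng = np.random.default_rng(4)
n, pi_cure, lam = 100000, 0.3, 0.5   # hypothetical cure fraction and hazard
cured = rng.random(n) < pi_cure
# Cured subjects never experience the event; susceptible ones follow Exp(lam)
t_event = np.where(cured, np.inf, rng.exponential(1.0 / lam, n))

# Mixture cure survival function: S(t) = pi + (1 - pi) * exp(-lam * t)
tau = 10.0
emp = (t_event > tau).mean()
theo = pi_cure + (1 - pi_cure) * np.exp(-lam * tau)
```

The long-term survival plateau at roughly the cure fraction (about 0.30 here at t = 10) is the feature that a standard proportional hazards design ignores and that motivates sizing the trial under the cure model.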
|