1
Menacher A, Nichols TE, Holmes C, Ganjgahi H. Bayesian Lesion Estimation with a Structured Spike-and-Slab Prior. J Am Stat Assoc 2024; 119:66-80. [PMID: 39132605 PMCID: PMC11315456 DOI: 10.1080/01621459.2023.2278201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 10/24/2023] [Indexed: 08/13/2024]
Abstract
Neural demyelination and brain damage accumulated in white matter appear as hyperintense areas on T2-weighted MRI scans in the form of lesions. Modeling binary images at the population level, where each voxel represents the existence of a lesion, plays an important role in understanding aging and inflammatory diseases. We propose a scalable hierarchical Bayesian spatial model, called BLESS, capable of handling binary responses by placing continuous spike-and-slab mixture priors on spatially-varying parameters and enforcing spatial dependency on the parameter dictating the amount of sparsity within the probability of inclusion. The use of mean-field variational inference with dynamic posterior exploration, which is an annealing-like strategy that improves optimization, allows our method to scale to large sample sizes. Our method also accounts for underestimation of posterior variance due to variational inference by providing an approximate posterior sampling approach based on Bayesian bootstrap ideas and spike-and-slab priors with random shrinkage targets. Besides accurate uncertainty quantification, this approach is capable of producing novel cluster size based imaging statistics, such as credible intervals of cluster size, and measures of reliability of cluster occurrence. Lastly, we validate our results via simulation studies and an application to the UK Biobank, a large-scale lesion mapping study with a sample size of 40,000 subjects.
Affiliation(s)
- Habib Ganjgahi
- Department of Statistics, University of Oxford
- Nuffield Department of Population Health, University of Oxford
2
San Valentin EMD, Do KA, Yeung SCJ, Reyes-Gibby CC. Attempts to Understand Oral Mucositis in Head and Neck Cancer Patients through Omics Studies: A Narrative Review. Int J Mol Sci 2023; 24:16995. [PMID: 38069314 PMCID: PMC10706892 DOI: 10.3390/ijms242316995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 11/27/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023] Open
Abstract
Oral mucositis (OM) is a common and clinically impactful side effect of cytotoxic cancer treatment, particularly in patients with head and neck squamous cell carcinoma (HNSCC) who undergo radiotherapy with or without concomitant chemotherapy. The etiology and pathogenic mechanisms of OM are complex, multifaceted and elicit both direct and indirect damage to the mucosa. In this narrative review, we describe studies that use various omics methodologies (genomics, transcriptomics, microbiomics and metabolomics) in attempts to elucidate the biological pathways associated with the development or severity of OM. Integrating different omics into multi-omics approaches carries the potential to discover links among host factors (genomics), host responses (transcriptomics, metabolomics), and the local environment (microbiomics).
Affiliation(s)
- Erin Marie D. San Valentin
- Department of Emergency Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Department of Interventional Radiology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Kim-Anh Do
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Sai-Ching J. Yeung
- Department of Emergency Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Cielito C. Reyes-Gibby
- Department of Emergency Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
3
Liang M, Koslovsky MD, Hébert ET, Kendzor DE, Businelle MS, Vannucci M. Bayesian continuous-time hidden Markov models with covariate selection for intensive longitudinal data with measurement error. Psychol Methods 2023; 28:880-894. [PMID: 34928674 PMCID: PMC9207158 DOI: 10.1037/met0000433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Intensive longitudinal data collected with ecological momentary assessment methods capture information on participants' behaviors, feelings, and environment in near real-time. While these methods can reduce recall biases typically present in survey data, they may still suffer from other biases commonly found in self-reported data (e.g., measurement error and social desirability bias). To accommodate potential biases, we develop a Bayesian hidden Markov model to simultaneously identify risk factors for subjects transitioning between discrete latent states as well as risk factors potentially associated with them misreporting their true behaviors. We use simulated data to demonstrate how ignoring potential measurement error can negatively affect variable selection performance and estimation accuracy. We apply our proposed model to smartphone-based ecological momentary assessment data collected within a randomized controlled trial that evaluated the impact of incentivizing abstinence from cigarette smoking among socioeconomically disadvantaged adults.
Affiliation(s)
- Emily T. Hébert
- Department of Health Promotion and Behavioral Sciences, University of Texas Health Science Center at Austin (UTHealth) School of Public Health
- Darla E. Kendzor
- Department of Family and Preventive Medicine, University of Oklahoma Health Sciences Center
- Michael S. Businelle
- Department of Family and Preventive Medicine, University of Oklahoma Health Sciences Center
4
Bodnar O, Touli EF. Exact test theory in Gaussian graphical models. J MULTIVARIATE ANAL 2023. [DOI: 10.1016/j.jmva.2023.105185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
5
Wang ET, Chiang S, Haneef Z, Rao VR, Moss R, Vannucci M. Bayesian non-homogeneous hidden Markov model with variable selection for investigating drivers of seizure risk cycling. Ann Appl Stat 2023; 17:333-356. [PMID: 38486612 PMCID: PMC10939012 DOI: 10.1214/22-aoas1630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
A major issue in the clinical management of epilepsy is the unpredictability of seizures. Yet, traditional approaches to seizure forecasting and risk assessment in epilepsy rely heavily on raw seizure frequencies, which are a stochastic measurement of seizure risk. We consider a Bayesian non-homogeneous hidden Markov model for unsupervised clustering of zero-inflated seizure count data. The proposed model allows for a probabilistic estimate of the sequence of seizure risk states at the individual level. It also offers significant improvement over prior approaches by incorporating a variable selection prior for the identification of clinical covariates that drive seizure risk changes and accommodating highly granular data. For inference, we implement an efficient sampler that employs stochastic search and data augmentation techniques. We evaluate model performance on simulated seizure count data. We then demonstrate the clinical utility of the proposed model by analyzing daily seizure count data from 133 patients with Dravet syndrome collected through the Seizure Tracker™ system, a patient-reported electronic seizure diary. We report on the dynamics of seizure risk cycling, including validation of several known pharmacologic relationships. We also uncover novel findings characterizing the presence and volatility of risk states in Dravet syndrome, which may directly inform counseling to reduce the unpredictability of seizures for patients with this devastating cause of epilepsy.
6
Zhou J, Hoen AG, Mcritchie S, Pathmasiri W, Viles WD, Nguyen QP, Madan JC, Dade E, Karagas MR, Gui J. Information enhanced model selection for Gaussian graphical model with application to metabolomic data. Biostatistics 2022; 23:926-948. [PMID: 33720330 PMCID: PMC9608647 DOI: 10.1093/biostatistics/kxab006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 11/12/2022] Open
Abstract
In light of the low signal-to-noise nature of many large biological data sets, we propose a novel method to learn the structure of association networks using Gaussian graphical models combined with prior knowledge. Our strategy includes two parts. In the first part, we propose a model selection criterion called structural Bayesian information criterion, in which the prior structure is modeled and incorporated into the Bayesian information criterion. It is shown that the popular extended Bayesian information criterion is a special case of structural Bayesian information criterion. In the second part, we propose a two-step algorithm to construct the candidate model pool. The algorithm is data-driven and the prior structure is embedded into the candidate model automatically. Theoretical investigation shows that under some mild conditions structural Bayesian information criterion is a consistent model selection criterion for high-dimensional Gaussian graphical models. Simulation studies validate the superiority of the proposed algorithm over the existing ones and show its robustness to model misspecification. Application to relative concentration data from infant feces collected from subjects enrolled in a large molecular epidemiological cohort study validates that metabolic pathway involvement is a statistically significant factor for the conditional dependence between metabolites. Furthermore, new relationships among metabolites are discovered which cannot be identified by the conventional methods of pathway analysis. Some of them have been widely recognized in biological literature.
Affiliation(s)
- Jie Zhou
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Anne G Hoen
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Susan Mcritchie
- Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Wimal Pathmasiri
- Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Weston D Viles
- Department of Mathematics and Statistics, University of Southern Maine, 96 Falmouth St, Portland, ME 04103, USA
- Quang P Nguyen
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Juliette C Madan
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Erika Dade
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Margaret R Karagas
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Jiang Gui
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
7
Denis M, Varghese RS, Barefoot ME, Tadesse MG, Ressom HW. A Bayesian two-step integrative procedure incorporating prior knowledge for the identification of miRNA-mRNAs involved in hepatocellular carcinoma. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:81-86. [PMID: 36085997 PMCID: PMC9473151 DOI: 10.1109/embc48229.2022.9871330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Recent studies have confirmed the role of miRNA regulation of gene expression in oncogenesis for various cancers. In parallel, prior knowledge about relationships between miRNA and mRNA has been accumulated from biological experiments or statistical analyses. Improved identification of disease-associated miRNA-mRNA pairs may be achieved by incorporating prior knowledge into integrative genomic analyses. In this study we focus on 39 patients with hepatocellular carcinoma (HCC) and 25 patients with liver cirrhosis and use a flexible Bayesian two-step integrative method. We found 66 significant miRNA-mRNA pairs, several of which contain molecules that have previously been identified as potential biomarkers. These results demonstrate the utility of the proposed approach in providing a better understanding of relationships between different biological levels, thereby giving insights into the biological mechanisms underlying the diseases, while providing a better selection of biomarkers that may serve as diagnostic, prognostic, or therapeutic biomarker candidates.
8
Yousef M, Goy G, Bakir-Gungor B. miRModuleNet: Detecting miRNA-mRNA Regulatory Modules. Front Genet 2022; 13:767455. [PMID: 35495139 PMCID: PMC9039401 DOI: 10.3389/fgene.2022.767455] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/30/2021] [Accepted: 03/24/2022] [Indexed: 12/13/2022] Open
Abstract
Increasing evidence that microRNAs (miRNAs) play a key role in carcinogenesis has revealed the need for elucidating the mechanisms of miRNA regulation and the roles of miRNAs in gene-regulatory networks. A better understanding of the interactions between miRNAs and their mRNA targets will provide a better understanding of the complex biological processes that occur during carcinogenesis. Increased efforts to reveal these interactions have led to the development of a variety of tools to detect and understand these interactions. We have recently described a machine learning approach miRcorrNet, based on grouping and scoring (ranking) groups of genes, where each group is associated with a miRNA and the group members are genes with expression patterns that are correlated with this specific miRNA. The miRcorrNet tool requires two types of -omics data, miRNA and mRNA expression profiles, as an input file. In this study we describe miRModuleNet, which groups mRNA (genes) that are correlated with each miRNA to form a star shape, which we identify as a miRNA-mRNA regulatory module. A scoring procedure is then applied to each module to further assess their contribution in terms of classification. An important output of miRModuleNet is that it provides a hierarchical list of significant miRNA-mRNA regulatory modules. miRModuleNet was further validated on external datasets for their disease associations, and functional enrichment analysis was also performed. The application of miRModuleNet aids the identification of functional relationships between significant biomarkers and reveals essential pathways involved in cancer pathogenesis. The miRModuleNet tool and all other supplementary files are available at https://github.com/malikyousef/miRModuleNet/
Affiliation(s)
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- *Correspondence: Malik Yousef,
- Gokhan Goy
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- The Scientific and Technological Research Council of Turkey, Ankara, Turkey
- Burcu Bakir-Gungor
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
9
Osborne N, Peterson CB, Vannucci M. Latent Network Estimation and Variable Selection for Compositional Data Via Variational EM. J Comput Graph Stat 2022; 31:163-175. [PMID: 36776345 PMCID: PMC9909885 DOI: 10.1080/10618600.2021.1935971] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/21/2022]
Abstract
Network estimation and variable selection have been extensively studied in the statistical literature, but only recently have those two challenges been addressed simultaneously. In this article, we seek to develop a novel method to simultaneously estimate network interactions and associations to relevant covariates for count data, and specifically for compositional data, which have a fixed sum constraint. We use a hierarchical Bayesian model with latent layers and employ spike-and-slab priors for both edge and covariate selection. For posterior inference, we develop a novel variational inference scheme with an expectation-maximization step, to enable efficient estimation. Through simulation studies, we demonstrate that the proposed model outperforms existing methods in its accuracy of network recovery. We show the practical utility of our model via an application to microbiome data. The human microbiome has been shown to contribute to many of the functions of the human body, and also to be linked with a number of diseases. In our application, we seek to better understand the interaction between microbes and relevant covariates, as well as the interaction of microbes with each other. We call our algorithm simultaneous inference for networks and covariates and provide a Python implementation, which is available online.
Affiliation(s)
- Christine B. Peterson
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
10
Ni Y, Baladandayuthapani V, Vannucci M, Stingo FC. Bayesian graphical models for modern biological applications. STAT METHOD APPL-GER 2021; 31:197-225. [PMID: 35673326 PMCID: PMC9165295 DOI: 10.1007/s10260-021-00572-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 05/16/2021] [Indexed: 12/14/2022]
Abstract
Graphical models are powerful tools that are regularly used to investigate complex dependence structures in high-throughput biomedical datasets. They allow for a holistic, systems-level view of the various biological processes, for intuitive and rigorous understanding and interpretations. In the context of large networks, Bayesian approaches are particularly suitable because they encourage sparsity of the graphs, incorporate prior information, and most importantly account for uncertainty in the graph structure. These features are particularly important in applications with limited sample size, including genomics and imaging studies. In this paper, we review several recently developed techniques for the analysis of large networks under non-standard settings, including but not limited to, multiple graphs for data observed from multiple related subgroups, graphical regression approaches used for the analysis of networks that change with covariates, and other complex sampling and structural settings. We also illustrate the practical utility of some of these methods using examples in cancer genomics and neuroimaging.
Affiliation(s)
- Yang Ni
- Department of Statistics, Texas A&M University, College Station, USA
- Francesco C. Stingo
- Department of Statistics, Computer Science, Applications “G. Parenti”, The University of Florence, Florence, Italy
11
Hinoveanu LC, Leisen F, Villa C. A loss-based prior for Gaussian graphical models. AUST NZ J STAT 2021. [DOI: 10.1111/anzs.12307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Laurenţiu Cătălin Hinoveanu
- School of Mathematics, Statistics and Actuarial Science, University of Kent, Sibson Building, Canterbury CT2 7FS, UK
- Fabrizio Leisen
- School of Mathematical Sciences, University of Nottingham, University Park, Nottingham NG7 2RD, UK
- Cristiano Villa
- School of Mathematics, Statistics and Physics, Newcastle University, Herschel Building, Newcastle NE1 7RU, UK
12
Paul S, Madhumita. Pattern Recognition Algorithms for Multi-Omics Data Analysis. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11538-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Open
13
Castelletti F, La Rocca L, Peluso S, Stingo FC, Consonni G. Bayesian learning of multiple directed networks from observational data. Stat Med 2020; 39:4745-4766. [PMID: 32969059 DOI: 10.1002/sim.8751] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/02/2019] [Revised: 06/29/2020] [Accepted: 08/25/2020] [Indexed: 11/08/2022]
Abstract
Graphical modeling represents an established methodology for identifying complex dependencies in biological networks, as exemplified in the study of co-expression, gene regulatory, and protein interaction networks. The available observations often exhibit an intrinsic heterogeneity, which impacts on the network structure through the modification of specific pathways for distinct groups, such as disease subtypes. We propose to infer the resulting multiple graphs jointly in order to benefit from potential similarities across groups; on the other hand our modeling framework is able to accommodate group idiosyncrasies. We consider directed acyclic graphs (DAGs) as network structures, and develop a Bayesian method for structural learning of multiple DAGs. We explicitly account for Markov equivalence of DAGs, and propose a suitable prior on the collection of graph spaces that induces selective borrowing strength across groups. The resulting inference allows in particular to compute the posterior probability of edge inclusion, a useful summary for representing flow directions within the network. Finally, we detail a simulation study addressing the comparative performance of our method, and present an analysis of two protein networks together with a substantive interpretation of our findings.
Affiliation(s)
- Federico Castelletti
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
- Luca La Rocca
- Department of Physics, Informatics and Mathematics, Università degli Studi di Modena e Reggio Emilia, Modena, Italy
- Stefano Peluso
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
- Francesco C Stingo
- Department of Statistics, Computer Science, Applications "G. Parenti", Università degli Studi di Firenze, Florence, Italy
- Guido Consonni
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
14
Koslovsky MD, Hoffman KL, Daniel CR, Vannucci M. A Bayesian model of microbiome data for simultaneous identification of covariate associations and prediction of phenotypic outcomes. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1354] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
15
Cremaschi A, Argiento R, Shoemaker K, Peterson C, Vannucci M. Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling. BAYESIAN ANALYSIS 2019; 14:1271-1301. [PMID: 32431780 PMCID: PMC7237071 DOI: 10.1214/19-ba1153] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 05/09/2023]
Abstract
Gaussian graphical models are useful tools for exploring network structures in multivariate normal data. In this paper we are interested in situations where data show departures from Gaussianity, therefore requiring alternative modeling distributions. The multivariate t-distribution, obtained by dividing each component of the data vector by a gamma random variable, is a straightforward generalization to accommodate deviations from normality such as heavy tails. Since different groups of variables may be contaminated to a different extent, Finegold and Drton (2014) introduced the Dirichlet t-distribution, where the divisors are clustered using a Dirichlet process. In this work, we consider a more general class of nonparametric distributions as the prior on the divisor terms, namely the class of normalized completely random measures (NormCRMs). To improve the effectiveness of the clustering, we propose modeling the dependence among the divisors through a nonparametric hierarchical structure, which allows for the sharing of parameters across the samples in the data set. This desirable feature enables us to cluster together different components of multivariate data in a parsimonious way. We demonstrate through simulations that this approach provides accurate graphical model inference, and apply it to a case study examining the dependence structure in radiomics data derived from The Cancer Imaging Atlas.
Affiliation(s)
- Andrea Cremaschi
- Department of Cancer Immunology, Institute of Cancer Research, Oslo University Hospital, Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway
- Raffaele Argiento
- ESOMAS Department, University of Torino, Torino, Italy
- Collegio Carlo Alberto, Torino, Italy
- Katherine Shoemaker
- Department of Statistics, Rice University, Houston, TX, USA
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Christine Peterson
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
16
Sinclair D, Hooker G. Sparse inverse covariance estimation for high-throughput microRNA sequencing data in the Poisson log-normal graphical model. J STAT COMPUT SIM 2019. [DOI: 10.1080/00949655.2019.1657116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- David Sinclair
- Department of Statistical Science, Cornell University, Ithaca, NY, USA
- Giles Hooker
- Department of Statistical Science, Cornell University, Ithaca, NY, USA
17
Chakraborty S, Lozano AC. A graph Laplacian prior for Bayesian variable selection and grouping. Comput Stat Data Anal 2019. [DOI: 10.1016/j.csda.2019.01.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/14/2023]
18
Tang Z, Shen Y, Zhang X, Yi N. The spike-and-slab lasso Cox model for survival prediction and associated genes detection. Bioinformatics 2018; 33:2799-2807. [PMID: 28472220 DOI: 10.1093/bioinformatics/btx300] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 02/02/2017] [Accepted: 05/05/2017] [Indexed: 12/20/2022] Open
Abstract
Motivation: Large-scale molecular profiling data have offered extraordinary opportunities to improve survival prediction of cancers and other diseases and to detect disease associated genes. However, there are considerable challenges in analyzing large-scale molecular data. Results: We propose new Bayesian hierarchical Cox proportional hazards models, called the spike-and-slab lasso Cox, for predicting survival outcomes and detecting associated genes. We also develop an efficient algorithm to fit the proposed models by incorporating Expectation-Maximization steps into the extremely fast cyclic coordinate descent algorithm. The performance of the proposed method is assessed via extensive simulations and compared with the lasso Cox regression. We demonstrate the proposed procedure on two cancer datasets with censored survival outcomes and thousands of molecular features. Our analyses suggest that the proposed procedure can generate powerful prognostic models for predicting cancer survival and can detect associated genes. Availability and implementation: The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Contact: nyi@uab.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zaixiang Tang
- Department of Biostatistics, School of Public Health; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, and Center for Genetic Epidemiology and Genomics, Medical College of Soochow University, Suzhou 215123, China; Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Yueping Shen
- Department of Biostatistics, School of Public Health; Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, and Center for Genetic Epidemiology and Genomics, Medical College of Soochow University, Suzhou 215123, China
- Xinyan Zhang
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Nengjun Yi
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
19
Abstract
Background: Computational network biology is an emerging interdisciplinary research area. Among many other network approaches, probabilistic graphical models provide a comprehensive probabilistic characterization of interaction patterns between molecules and the associated uncertainties. Results: In this article, we first review graphical models, including directed, undirected, and reciprocal graphs (RG), with an emphasis on the RG models that are curiously under-utilized in biostatistics and bioinformatics literature. RGs strictly contain chain graphs as a special case and are suitable to model reciprocal causality such as feedback mechanisms in molecular networks. We then extend the RG approach to modeling molecular networks by integrating DNA-, RNA- and protein-level data. We apply the extended RG method to The Cancer Genome Atlas multi-platform ovarian cancer data and reveal several interesting findings. Conclusions: This study aims to review the basics of different probabilistic graphical models as well as recent developments in RG approaches for network modeling. The extension presented in this paper provides a principled and efficient way of integrating DNA copy number, DNA methylation, mRNA gene expression and protein expression.
Affiliation(s)
- Yang Ni
- Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, 78712 TX USA
| | - Peter Müller
- Department of Mathematics, The University of Texas at Austin, Austin, 78712 TX USA
| | - Lin Wei
- NorthShore University HealthSystem, Evanston, 60201 IL USA
| | - Yuan Ji
- NorthShore University HealthSystem, Evanston, 60201 IL USA
- Department of Public Health Sciences, The University of Chicago, Chicago, 60637 IL USA
|
20
|
He Q, Liu Y, Sun W. Statistical analysis of non-coding RNA data. Cancer Lett 2018; 417:161-167. [PMID: 29306017 DOI: 10.1016/j.canlet.2017.12.029] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/15/2017] [Revised: 12/13/2017] [Accepted: 12/20/2017] [Indexed: 12/15/2022]
Abstract
With rapid progress in high-throughput genome technology, the study of noncoding RNA has arisen as a highly popular topic in biomedical research. Noncoding RNA plays fundamental roles in cell proliferation, cell differentiation and epigenetic regulation, and the study of noncoding RNA will yield novel insights into gene regulation and provide new clues for disease treatment. However, due to the large volume and diverse functions of noncoding RNAs, the analysis of these RNAs has proved to be a challenging task. In this review, we survey the commonly used computational tools for the identification of noncoding RNAs, and discuss popular statistical tools for their analysis. Because there are many classes of noncoding RNA, we focus on the analysis of microRNA and long noncoding RNA, two of the most widely studied classes. Specific examples are provided to show the context of the analysis. This review aims to provide up-to-date information on existing tools and methods for identifying and analyzing noncoding RNA.
Affiliation(s)
- Qianchuan He
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
| | - Yang Liu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Wei Sun
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
|
21
|
Shaddox E, Stingo FC, Peterson CB, Jacobson S, Cruickshank-Quinn C, Kechris K, Bowler R, Vannucci M. A Bayesian Approach for Learning Gene Networks Underlying Disease Severity in COPD. STATISTICS IN BIOSCIENCES 2018; 10:59-85. [PMID: 33912251 DOI: 10.1007/s12561-016-9176-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/15/2023]
Abstract
In this paper, we propose a Bayesian hierarchical approach to infer network structures across multiple sample groups where both shared and differential edges may exist across the groups. In our approach, we link graphs through a Markov random field prior. This prior on network similarity provides a measure of pairwise relatedness that borrows strength only between related groups. We incorporate the computational efficiency of continuous shrinkage priors, improving scalability for network estimation in cases of larger dimensionality. Our model is applied to patient groups with increasing levels of chronic obstructive pulmonary disease severity, with the goal of better understanding the breakdown of gene pathways as the disease progresses. Our approach is able to identify critical hub genes for four targeted pathways. Furthermore, it identifies gene connections that are disrupted with increased disease severity and that characterize the disease evolution. We also demonstrate the superior performance of our approach with respect to competing methods, using simulated data.
Affiliation(s)
- Elin Shaddox
- Department of Statistics, Rice University, Houston, USA
| | - Francesco C Stingo
- Dipartimento di Statistica, Informatica, Applicazioni "G.Parenti", University of Florence, Florence, Italy
| | | | - Sean Jacobson
- Department of Medicine, National Jewish Health, Denver, CO, USA
| | - Charmion Cruickshank-Quinn
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Colorado Denver, Denver, CO, USA
| | - Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver, Denver, CO, USA
| | - Russell Bowler
- Department of Medicine, National Jewish Health, Denver, CO, USA
|
22
|
Modeling miRNA-mRNA interactions that cause phenotypic abnormality in breast cancer patients. PLoS One 2017; 12:e0182666. [PMID: 28793339 PMCID: PMC5549916 DOI: 10.1371/journal.pone.0182666] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 04/27/2017] [Accepted: 07/13/2017] [Indexed: 01/04/2023]
Abstract
Background The dysregulation of microRNAs (miRNAs) alters the expression levels of pro-oncogenic or tumor-suppressive mRNAs in breast cancer and, in the long run, causes multiple biological abnormalities. Identifying such miRNA-mRNA interactions requires integrative analysis of miRNA-mRNA expression profile data. However, current approaches are limited in their ability to consider the regulatory relationship between miRNAs and mRNAs and to relate that relationship to phenotypic abnormality and cancer pathogenesis. Methodology/Findings We modeled causal relationships between genomic expression and clinical data using a Bayesian Network (BN), with the goal of discovering miRNA-mRNA interactions that are associated with cancer pathogenesis. The Multiple Beam Search (MBS) algorithm learned interactions from data and discovered that hsa-miR-21, hsa-miR-10b, hsa-miR-448, and hsa-miR-96 interact with oncogenes, such as CCND2, ESR1, MET, NOTCH1, TGFBR2 and TGFB1, that promote tumor metastasis, invasion, and cell proliferation. We also calculated the Bayesian network posterior probability (BNPP) for the models discovered by the MBS algorithm to validate true models with high likelihood. Conclusion/Significance The MBS algorithm successfully learned miRNA and mRNA expression profile data using a BN, and identified miRNA-mRNA interactions that probabilistically affect breast cancer pathogenesis. The MBS algorithm is a potentially useful tool for identifying interacting gene pairs implicated by the deregulation of expression.
|
23
|
Chu SH, Huang YT. Integrated genomic analysis of biological gene sets with applications in lung cancer prognosis. BMC Bioinformatics 2017; 18:336. [PMID: 28697753 PMCID: PMC5505153 DOI: 10.1186/s12859-017-1737-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Accepted: 06/22/2017] [Indexed: 01/22/2023]
Abstract
Background Burgeoning interest in integrative analyses has produced a rise in studies that incorporate data from multiple genomic platforms. The literature on conducting formal hypothesis testing at the integrative gene set level is considerably sparse. This paper is biologically motivated by our interest in the joint effects of epigenetic methylation loci and their associated mRNA gene expressions on lung cancer survival status. Results We provide an efficient screening approach across multiplatform genomic data on the level of biologically related sets of genes, and our methods are applicable to various disease models regardless of whether the underlying true model is known (iTEGS) or unknown (iNOTE). Our proposed testing procedure dominated two competing methods. Using our methods, we identified a total of 28 gene sets with significant joint epigenomic and transcriptomic effects on one-year lung cancer survival. Conclusions We propose efficient variance component-based testing procedures to facilitate the joint testing of multiplatform genomic data across an entire gene set. The testing procedure for the gene set is self-contained, and can easily be extended to include more or different genetic platforms. iTEGS and iNOTE implemented in R are freely available through the inote package at https://cran.r-project.org//. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1737-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Su Hee Chu
- Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Channing Division of Network Medicine, Brigham and Women's Hospital Harvard Medical School, 181 Longwood Ave, Boston, MA, USA
| | - Yen-Tsung Huang
- Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Department of Biostatistics, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Institute of Statistical Science, Academia Sinica, No. 128, Section 2, Academia Rd, Taipei City, Taiwan.
|
24
|
Morris JS, Baladandayuthapani V. Statistical Contributions to Bioinformatics: Design, Modeling, Structure Learning, and Integration. STAT MODEL 2017; 17:245-289. [PMID: 29129969 PMCID: PMC5679480 DOI: 10.1177/1471082x17698255] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/24/2022]
Abstract
The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data, whose analysis poses significant quantitative challenges. The field of Bioinformatics has emerged to deal with these challenges, and comprises many quantitative and biological scientists working together to effectively process these data and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, focusing on four areas: (1) experimental design and reproducibility, (2) preprocessing and feature extraction, (3) unified modeling, and (4) structure learning and integration. In each of these areas, we highlight some key contributions and try to elucidate the key statistical principles underlying these methods and approaches. Our goals are to demonstrate major ways in which statisticians have contributed to bioinformatics, to encourage statisticians to get involved early in methods development as new technologies emerge, and to stimulate future methodological work based on the statistical principles elucidated in this article and utilizing all available information to uncover new biological insights.
Affiliation(s)
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
|
25
|
Ni Y, Stingo FC, Baladandayuthapani V. Sparse Multi-Dimensional Graphical Models: A Unified Bayesian Framework. J Am Stat Assoc 2017. [DOI: 10.1080/01621459.2016.1167694] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/20/2023]
Affiliation(s)
- Yang Ni
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
- Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX
| | - Francesco C. Stingo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
- Dipartimento di Statistica, Informatica, Applicazioni “G.Parenti,” University of Florence, Florence, Italy
|
26
|
Zhao Y, Chung M, Johnson BA, Moreno CS, Long Q. Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence. J Am Stat Assoc 2017; 111:1427-1439. [PMID: 28435175 DOI: 10.1080/01621459.2016.1164051] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 02/08/2023]
Abstract
Our work is motivated by a prostate cancer study aimed at identifying mRNA and miRNA biomarkers that are predictive of cancer recurrence after prostatectomy. It has been shown in the literature that incorporating known biological information on pathway memberships and interactions among biomarkers improves feature selection of high-dimensional biomarkers in relation to disease risk. Biological information is often represented by graphs or networks, in which biomarkers are represented by nodes and interactions among them are represented by edges; however, biological information is often not fully known. For example, the role of microRNAs (miRNAs) in regulating gene expression is not fully understood and the miRNA regulatory network is not fully established, in which case new strategies are needed for feature selection. To this end, we treat unknown biological information as missing data (i.e., missing edges in graphs), different from commonly encountered missing data problems where variable values are missing. We propose a new concept of imputing unknown biological information based on observed data and define the imputed information as the novel biological information. In addition, we propose a hierarchical group penalty to encourage sparsity and feature selection at both the pathway level and the within-pathway level, which, combined with the imputation step, allows for incorporation of known and novel biological information. While it is applicable to general regression settings, we develop and investigate the proposed approach in the context of semiparametric accelerated failure time models motivated by our data example. Data application and simulation studies show that incorporation of novel biological information improves performance in risk prediction and feature selection and the proposed penalty outperforms the extensions of several existing penalties.
Affiliation(s)
- Yize Zhao
- Postdoctoral Fellow, Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, NC 27709
| | - Matthias Chung
- Assistant Professor, Department of Mathematics, Virginia Tech, Blacksburg, VA 24061
| | - Brent A Johnson
- Associate Professor, Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642
| | - Carlos S Moreno
- Associate Professor, Department of Pathology and Laboratory Medicine
| | - Qi Long
- Associate Professor, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322
|
27
|
Chekouo T, Stingo FC, Doecke JD, Do KA. A Bayesian integrative approach for multi-platform genomic data: A kidney cancer case study. Biometrics 2016; 73:615-624. [DOI: 10.1111/biom.12587] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/01/2015] [Revised: 05/01/2016] [Accepted: 08/01/2016] [Indexed: 12/17/2022]
Affiliation(s)
- Thierry Chekouo
- Department of Mathematics and Statistics, University of Minnesota Duluth; Duluth, MN 55812 USA
| | - Francesco C. Stingo
- Dipartimento di Statistica, Informatica, Applicazioni “G.Parenti”, University of Florence; 50134 Florence Italy
| | - James D. Doecke
- CSIRO Health and Biosecurity/Australian e-Health Research Center Level 5; Queensland 4029 Australia
| | - Kim-Anh Do
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center; Houston, TX 77030 USA
|
28
|
A Meta-Path-Based Prediction Method for Human miRNA-Target Association. BIOMED RESEARCH INTERNATIONAL 2016; 2016:7460740. [PMID: 27703979 PMCID: PMC5040835 DOI: 10.1155/2016/7460740] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/30/2016] [Revised: 08/14/2016] [Accepted: 08/21/2016] [Indexed: 01/21/2023]
Abstract
MicroRNAs (miRNAs) are short noncoding RNAs that play important roles in regulating gene expression, and perturbed miRNAs are often associated with development and tumorigenesis through their effects on target mRNAs. Predicting potential miRNA-target associations from multiple types of genomic data is a considerable problem in bioinformatics research. However, most existing methods do not fully use the experimentally validated miRNA-mRNA interactions. Here, we developed RMLM and RMLMSe to predict the relationship between miRNAs and their targets. RMLM and RMLMSe are global approaches, as they can reconstruct the missing associations for all miRNA-target pairs simultaneously, and RMLMSe demonstrates that the integration of sequence information can improve the performance of RMLM. In RMLM, we use the RM measure to evaluate different relatedness between a miRNA and its target based on different meta-paths; logistic regression and the MLE method are employed to estimate the weights of the different meta-paths. In RMLMSe, sequence information is utilized to improve the performance of RMLM. We carry out fivefold cross-validation and pathway enrichment analysis to assess the performance of our methods. The fivefold experiments show that our methods have higher AUC scores than other methods and that the integration of sequence information can improve the performance of miRNA-target association prediction.
|
29
|
Chekouo T, Stingo FC, Guindani M, Do KA. A Bayesian predictive model for imaging genetics with application to schizophrenia. Ann Appl Stat 2016. [DOI: 10.1214/16-aoas948] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/19/2022]
|
30
|
Richardson S, Tseng GC, Sun W. Statistical Methods in Integrative Genomics. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION 2016; 3:181-209. [PMID: 27482531 PMCID: PMC4963036 DOI: 10.1146/annurev-statistics-041715-033506] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Indexed: 05/28/2023]
Abstract
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
Affiliation(s)
- Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, CB2 0SR, United Kingdom
| | - George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261
| | - Wei Sun
- Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 27516
|
31
|
Wang T, Ren Z, Ding Y, Fang Z, Sun Z, MacDonald ML, Sweet RA, Wang J, Chen W. FastGGM: An Efficient Algorithm for the Inference of Gaussian Graphical Model in Biological Networks. PLoS Comput Biol 2016; 12:e1004755. [PMID: 26872036 PMCID: PMC4752261 DOI: 10.1371/journal.pcbi.1004755] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 09/30/2015] [Accepted: 01/14/2016] [Indexed: 11/19/2022]
Abstract
Biological networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single variables. Gaussian graphical model (GGM), a probability model that characterizes the conditional dependence structure of a set of random variables by a graph, has wide applications in the analysis of biological networks, such as inferring interaction or comparing differential networks. However, existing approaches are either not statistically rigorous or are inefficient for high-dimensional data that include tens of thousands of variables for making inference. In this study, we propose an efficient algorithm to implement the estimation of GGM and obtain p-value and confidence interval for each edge in the graph, based on a recent proposal by Ren et al., 2015. Through simulation studies, we demonstrate that the algorithm is faster by several orders of magnitude than the current implemented algorithm for Ren et al. without losing any accuracy. Then, we apply our algorithm to two real data sets: transcriptomic data from a study of childhood asthma and proteomic data from a study of Alzheimer’s disease. We estimate the global gene or protein interaction networks for the disease and healthy samples. The resulting networks reveal interesting interactions and the differential networks between cases and controls show functional relevance to the diseases. In conclusion, we provide a computationally fast algorithm to implement a statistically sound procedure for constructing Gaussian graphical model and making inference with high-dimensional biological data. The algorithm has been implemented in an R package named “FastGGM”.
Gaussian graphical model (GGM), a probability model for characterizing conditional dependence among a set of random variables, has been widely used in studying biological networks. It is important and practical to make inference with rigorous statistical properties and high efficiency under a high-dimensional setting, which is common in biological systems that usually contain tens of thousands of molecular elements, such as genes and proteins. This work proposes a novel efficient algorithm, FastGGM, to implement asymptotically normal estimation of large GGM established by Ren et al [1]. It quickly estimates the precision matrix, partial correlations, as well as p-values and confidence intervals for the graph. Simulation studies demonstrate our algorithm outperforms the current algorithm for Ren et al. and algorithms for some other estimation methods, and real data analyses further prove its efficiency in studying biological networks. In conclusion, FastGGM is a statistically sound and computationally fast algorithm for constructing GGM with high-dimensional data. An R package for implementation can be downloaded from http://www.pitt.edu/~wec47/FastGGM.html.
Affiliation(s)
- Ting Wang
- Division of Pulmonary Medicine, Allergy and Immunology; Department of Pediatrics, Children’s Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Zhao Ren
- Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Ying Ding
- Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania, United States of America
| | - Zhou Fang
- Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania, United States of America
| | - Zhe Sun
- Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania, United States of America
| | - Matthew L. MacDonald
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Robert A. Sweet
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- VISN 4 Mental Illness Research, Education and Clinical Center (MIRECC), VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, United States of America
| | - Jieru Wang
- Division of Pulmonary Medicine, Allergy and Immunology; Department of Pediatrics, Children’s Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Wei Chen
- Division of Pulmonary Medicine, Allergy and Immunology; Department of Pediatrics, Children’s Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania, United States of America
- Department of Human Genetics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania, United States of America
|
32
|
Cooper GF, Bahar I, Becich MJ, Benos PV, Berg J, Espino JU, Glymour C, Jacobson RC, Kienholz M, Lee AV, Lu X, Scheines R. The center for causal discovery of biomedical knowledge from big data. J Am Med Inform Assoc 2015; 22:1132-6. [PMID: 26138794 PMCID: PMC5009908 DOI: 10.1093/jamia/ocv059] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 02/18/2015] [Revised: 04/27/2015] [Accepted: 05/02/2015] [Indexed: 01/12/2023]
Abstract
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers.
Affiliation(s)
- Gregory F Cooper
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Ivet Bahar
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Michael J Becich
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Panayiotis V Benos
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Jeremy Berg
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA; Institute for Personalized Medicine, University of Pittsburgh and University of Pittsburgh Medical Center (UPMC), Pittsburgh, PA, USA
| | - Jeremy U Espino
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Clark Glymour
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, USA
| | | | - Michelle Kienholz
- Institute for Personalized Medicine, University of Pittsburgh and University of Pittsburgh Medical Center (UPMC), Pittsburgh, PA, USA
| | - Adrian V Lee
- Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Xinghua Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Richard Scheines
- Dietrich College of Humanities and Social Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
|
33
|
Lewin A, Saadi H, Peters JE, Moreno-Moral A, Lee JC, Smith KGC, Petretto E, Bottolo L, Richardson S. MT-HESS: an efficient Bayesian approach for simultaneous association detection in OMICS datasets, with application to eQTL mapping in multiple tissues. Bioinformatics 2015; 32:523-32. [PMID: 26504141 PMCID: PMC4743623 DOI: 10.1093/bioinformatics/btv568] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 12/05/2014] [Accepted: 09/03/2015] [Indexed: 01/22/2023]
Abstract
MOTIVATION Analysing the joint association between a large set of responses and predictors is a fundamental statistical task in integrative genomics, exemplified by numerous expression Quantitative Trait Loci (eQTL) studies. Of particular interest are the so-called 'hotspots', important genetic variants that regulate the expression of many genes. Recently, attention has focussed on whether eQTLs are common to several tissues, cell-types or, more generally, conditions or whether they are specific to a particular condition. RESULTS We have implemented MT-HESS, a Bayesian hierarchical model that analyses the association between a large set of predictors, e.g. SNPs, and many responses, e.g. gene expression, in multiple tissues, cells or conditions. Our Bayesian sparse regression algorithm goes beyond 'one-at-a-time' association tests between SNPs and responses and uses a fully multivariate model search across all linear combinations of SNPs, coupled with a model of the correlation between condition/tissue-specific responses. In addition, we use a hierarchical structure to leverage shared information across different genes, thus improving the detection of hotspots. We show the increase of power resulting from our new approach in an extensive simulation study. Our analysis of two case studies highlights new hotspots that would remain undetected by standard approaches and shows how greater prediction power can be achieved when several tissues are jointly considered. AVAILABILITY AND IMPLEMENTATION C++ source code and documentation including compilation instructions are available under GNU licence at http://www.mrc-bsu.cam.ac.uk/software/.
Affiliation(s)
- Alex Lewin
- Department of Mathematics, Brunel University London
| | - Habib Saadi
- Department of Epidemiology and Biostatistics, Imperial College London, London
| | - James E Peters
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge, MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge
| | | | - James C Lee
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge
| | - Kenneth G C Smith
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge
| | - Enrico Petretto
- MRC Clinical Sciences Centre, Imperial College London, London, UK, Duke-NUS Graduate Medical School, Singapore, Singapore
| | - Leonardo Bottolo
- Department of Mathematics, Imperial College London, London, UK and Department of Medical Genetics, University of Cambridge
| | - Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge
|
34
|
Cava C, Bertoli G, Castiglioni I. Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential. BMC SYSTEMS BIOLOGY 2015; 9:62. [PMID: 26391647 PMCID: PMC4578257 DOI: 10.1186/s12918-015-0211-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 05/14/2015] [Accepted: 09/15/2015] [Indexed: 12/11/2022]
Abstract
BACKGROUND Development of human cancer can proceed through the accumulation of different genetic changes affecting the structure and function of the genome. Combined analyses of molecular data at multiple levels, such as DNA copy-number alteration, mRNA and miRNA expression, can clarify biological functions and pathways deregulated in cancer. The integrative methods that are used to investigate these data involve different fields, including biology, bioinformatics, and statistics. RESULTS These methodologies are presented in this review, and their implementation in breast cancer is discussed with a focus on integration strategies. We report current applications, recent studies and interesting results leading to the identification of candidate biomarkers for diagnosis, prognosis, and therapy in breast cancer by using both individual and combined analyses. CONCLUSION This review presents the state of the art on the role of different technologies in breast cancer based on the integration of genetics and epigenetics, and discusses some issues related to the new opportunities and challenges offered by the application of such integrative approaches.
Affiliation(s)
- Claudia Cava
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Gloria Bertoli
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Isabella Castiglioni
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
|
35
|
Integrative analysis of the microRNA-mRNA response to radiochemotherapy in primary head and neck squamous cell carcinoma cells. BMC Genomics 2015; 16:654. [PMID: 26328888 PMCID: PMC4557600 DOI: 10.1186/s12864-015-1865-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 10/17/2014] [Accepted: 08/19/2015] [Indexed: 01/01/2023]
Abstract
Background Head and neck squamous cell carcinoma (HNSCC) is a very heterogeneous disease resulting in huge differences in the treatment response. New individualized therapy strategies including molecular targeting might help to improve treatment success. In order to identify potential targets, we developed a HNSCC radiochemotherapy cell culture model of primary HNSCC cells derived from two different patients (HN1957 and HN2092) and applied an integrative microRNA (miRNA) and mRNA analysis in order to gain information on the biological networks and processes of the cellular therapy response. We further identified potential target genes of four therapy-responsive miRNAs detected previously in the circulation of HNSCC patients by pathway enrichment analysis. Results The two primary cell cultures differ in global copy number alterations and P53 mutational status, thus reflecting heterogeneity of HNSCC. However, they also share many copy number alterations and chromosomal rearrangements as well as deregulated therapy-responsive miRNAs and mRNAs. Accordingly, six common therapy-responsive pathways (direct P53 effectors, apoptotic execution phase, DNA damage/telomere stress induced senescence, cholesterol biosynthesis, unfolded protein response, dissolution of fibrin clot) were identified in both cell cultures based on deregulated mRNAs. However, inflammatory pathways represented an important part of the treatment response only in HN1957, pointing to differences in the treatment responses of the two primary cultures. Focused analysis of target genes of four therapy-responsive circulating miRNAs, identified in a previous study on HNSCC patients, revealed a major impact on the pathways direct P53 effectors, the E2F transcription factor network and pathways in cancer (mainly represented by the PTEN/AKT signaling pathway). 
Conclusions The integrative analysis combining miRNA expression, mRNA expression and the related cellular pathways revealed that the majority of radiochemotherapy-responsive pathways in primary HNSCC cells are related to cell cycle, proliferation, cell death and stress response (including inflammation). Despite the heterogeneity of HNSCC, the two primary cell cultures exhibited strong similarities in the treatment response. The findings of our study suggest potential therapeutic targets in the E2F transcription factor network and the PTEN/AKT signaling pathway.
|
36
|
Peterson CB, Stingo FC, Vannucci M. Bayesian Inference of Multiple Gaussian Graphical Models. J Am Stat Assoc 2015; 110:159-174. [PMID: 26078481 DOI: 10.1080/01621459.2014.896806] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Indexed: 12/24/2022]
Abstract
In this paper, we propose a Bayesian approach to inference on multiple Gaussian graphical models. Specifically, we address the problem of inferring multiple undirected networks in situations where some of the networks may be unrelated, while others share common features. We link the estimation of the graph structures via a Markov random field (MRF) prior which encourages common edges. We learn which sample groups have a shared graph structure by placing a spike-and-slab prior on the parameters that measure network relatedness. This approach allows us to share information between sample groups, when appropriate, as well as to obtain a measure of relative network similarity across groups. Our modeling framework incorporates relevant prior knowledge through an edge-specific informative prior and can encourage similarity to an established network. Through simulations, we demonstrate the utility of our method in summarizing relative network similarity and compare its performance against related methods. We find improved accuracy of network estimation, particularly when the sample sizes within each subgroup are moderate. We also illustrate the application of our model to infer protein networks for various cancer subtypes and under different experimental conditions.
Affiliation(s)
- Francesco C Stingo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center
|
37
|
Ni Y, Stingo FC, Baladandayuthapani V. Bayesian nonlinear model selection for gene regulatory networks. Biometrics 2015; 71:585-95. [PMID: 25854759 DOI: 10.1111/biom.12309] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 02/01/2014] [Revised: 12/01/2014] [Accepted: 02/01/2015] [Indexed: 12/31/2022]
Abstract
Gene regulatory networks represent the regulatory relationships between genes and their products and are important for exploring and defining the underlying biological processes of cellular systems. We develop a novel framework to recover the structure of nonlinear gene regulatory networks using semiparametric spline-based directed acyclic graphical models. Our use of splines allows the model to have both flexibility in capturing nonlinear dependencies as well as control of overfitting via shrinkage, using mixed model representations of penalized splines. We propose a novel discrete mixture prior on the smoothing parameter of the splines that allows for simultaneous selection of both linear and nonlinear functional relationships as well as inducing sparsity in the edge selection. Using simulation studies, we demonstrate the superior performance of our methods in comparison with several existing approaches in terms of network reconstruction and functional selection. We apply our methods to a gene expression dataset in glioblastoma multiforme, which reveals several interesting and biologically relevant nonlinear relationships.
Affiliation(s)
- Yang Ni
- Department of Statistics, Rice University, Houston, Texas, U.S.A
- Francesco C Stingo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, U.S.A
|
38
|
Cassese A, Guindani M, Antczak P, Falciani F, Vannucci M. A Bayesian model for the identification of differentially expressed genes in Daphnia magna exposed to munition pollutants. Biometrics 2015; 71:803-11. [PMID: 25771699 DOI: 10.1111/biom.12303] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/01/2014] [Revised: 12/01/2014] [Accepted: 02/01/2015] [Indexed: 11/29/2022]
Abstract
In this article we propose a Bayesian hierarchical model for the identification of differentially expressed genes in Daphnia magna organisms exposed to chemical compounds, specifically munition pollutants in water. The model we propose constitutes one of the very first attempts at a rigorous modeling of the biological effects of water purification. We have data acquired from a purification system that comprises four consecutive purification stages, which we refer to as "ponds," of progressively more contaminated water. We model the expected expression of a gene in a pond as the sum of the mean of the same gene in the previous pond plus a gene-pond specific difference. We incorporate a variable selection mechanism for the identification of the differential expressions, with a prior distribution on the probability of a change that accounts for the available information on the concentration of chemical compounds present in the water. We carry out posterior inference via MCMC stochastic search techniques. In the application, we reduce the complexity of the data by grouping genes according to their functional characteristics, based on the KEGG pathway database. This also increases the biological interpretability of the results. Our model successfully identifies a number of pathways that show differential expression between consecutive purification stages. We also find that changes in the transcriptional response are more strongly associated to the presence of certain compounds, with the remaining contributing to a lesser extent. We discuss the sensitivity of these results to the model parameters that measure the influence of the prior information on the posterior inference.
Affiliation(s)
- Alberto Cassese
- Department of Statistics, Rice University, Houston, Texas 77005, U.S.A.; Department of Biostatistics, UT MD Anderson Cancer Center, Houston, Texas, U.S.A
- Michele Guindani
- Department of Biostatistics, UT MD Anderson Cancer Center, Houston, Texas, U.S.A
- Philipp Antczak
- Institute of Integrative Biology, University of Liverpool, Liverpool, U.K
- Francesco Falciani
- Institute of Integrative Biology, University of Liverpool, Liverpool, U.K
- Marina Vannucci
- Department of Statistics, Rice University, Houston, Texas 77005, U.S.A
|
39
|
Chekouo T, Stingo FC, Doecke JD, Do KA. miRNA-target gene regulatory networks: A Bayesian integrative approach to biomarker selection with application to kidney cancer. Biometrics 2015; 71:428-38. [PMID: 25639276 DOI: 10.1111/biom.12266] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/01/2013] [Revised: 09/01/2014] [Accepted: 10/01/2014] [Indexed: 11/30/2022]
Abstract
The availability of cross-platform, large-scale genomic data has enabled the investigation of complex biological relationships for many cancers. Identification of reliable cancer-related biomarkers requires the characterization of multiple interactions across complex genetic networks. MicroRNAs are small non-coding RNAs that regulate gene expression; however, the direct relationship between a microRNA and its target gene is difficult to measure. We propose a novel Bayesian model to identify microRNAs and their target genes that are associated with survival time by incorporating the microRNA regulatory network through prior distributions. We assume that biomarkers involved in regulatory networks are likely associated with survival time. We employ non-local prior distributions and a stochastic search method for the selection of biomarkers associated with the survival outcome. We use KEGG pathway information to incorporate correlated gene effects within regulatory networks. Using simulation studies, we assess the performance of our method, and apply it to experimental data of kidney renal cell carcinoma (KIRC) obtained from The Cancer Genome Atlas. Our novel method validates previously identified cancer biomarkers and identifies biomarkers specific to KIRC progression that were not previously discovered. Using the KIRC data, we confirm that biomarkers involved in regulatory networks are more likely to be associated with survival time, showing connections in one regulatory network for five out of six such genes we identified.
Affiliation(s)
- Thierry Chekouo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 1400 Pressler Street, Unit 1411, Texas, 77030-3722, U.S.A
- Francesco C Stingo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 1400 Pressler Street, Unit 1411, Texas, 77030-3722, U.S.A
- James D Doecke
- CSIRO Computational Informatics/Australian e-Health Research Centre Level 5, UQ Health Sciences Building, 901/16 Royal Brisbane, Queensland, 4029, Australia
- Kim-Anh Do
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 1400 Pressler Street, Unit 1411, Texas, 77030-3722, U.S.A
|
40
|
Tabas-Madrid D, Muniategui A, Sánchez-Caballero I, Martínez-Herrera DJ, Sorzano COS, Rubio A, Pascual-Montano A. Improving miRNA-mRNA interaction predictions. BMC Genomics 2014; 15 Suppl 10:S2. [PMID: 25559987 PMCID: PMC4304206 DOI: 10.1186/1471-2164-15-s10-s2] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 02/02/2023]
Abstract
Background MicroRNAs are short RNA molecules that post-transcriptionally regulate gene expression. Today, microRNA target prediction remains challenging since very few have been experimentally validated and sequence-based predictions have large numbers of false positives. Furthermore, due to the different measuring rules used in each database of predicted interactions, the selection of the most reliable ones requires extensive knowledge about each algorithm. Results Here we propose two methods to measure the confidence of predicted interactions based on experimentally validated information. The output of the methods is a combined database where new scores and statistical confidences are re-assigned to each predicted interaction. The new scores allow the robust combination of several databases without the effect of low-performing algorithms dragging down good-performing ones. The combined databases obtained using both algorithms described in this paper outperform each of the existing predictive algorithms that were considered for the combination. Conclusions Our approaches are a useful way to integrate predicted interactions from different databases. They reduce the selection of interactions to a unique database based on an intuitive score and allow comparing databases between them.
|
41
|
Wang Z, Xu W, Zhu H, Liu Y. A Bayesian Framework to Improve MicroRNA Target Prediction by Incorporating External Information. Cancer Inform 2014; 13:19-25. [PMID: 25452690 PMCID: PMC4238384 DOI: 10.4137/cin.s16348] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/12/2014] [Revised: 10/14/2014] [Accepted: 10/16/2014] [Indexed: 01/10/2023]
Abstract
MicroRNAs (miRNAs) are small regulatory RNAs that play key gene-regulatory roles in diverse biological processes, particularly in cancer development. Therefore, inferring miRNA targets is an essential step to fully understanding the functional properties of miRNA actions in regulating tumorigenesis. Bayesian linear regression modeling has been proposed for identifying the interactions between miRNAs and mRNAs on the basis of the integrated sequence information and matched miRNA and mRNA expression data; however, this approach does not use the full spectrum of available features of putative miRNA targets. In this study, we integrated four important sequence and structural features of miRNA targeting with paired miRNA and mRNA expression data to improve miRNA-target prediction in a Bayesian framework. We have applied this approach to a gene-expression study of liver cancer patients and examined the posterior probability of each miRNA-mRNA interaction being functional in the development of liver cancer. Our method achieved better performance, in terms of the number of true targets identified, than did other methods.
Affiliation(s)
- Zixing Wang
- Department of Neurobiology and Anatomy, University of Texas Health Science Center at Houston, Houston, TX, USA
- Wenlong Xu
- Department of Neurobiology and Anatomy, University of Texas Health Science Center at Houston, Houston, TX, USA
- Haifeng Zhu
- Department of Melanoma Medical Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Yin Liu
- Department of Neurobiology and Anatomy, University of Texas Health Science Center at Houston, Houston, TX, USA; University of Texas Graduate School of Biomedical Science, Houston, TX, USA
|
42
|
Cassese A, Guindani M, Vannucci M. A Bayesian integrative model for genetical genomics with spatially informed variable selection. Cancer Inform 2014; 13:29-37. [PMID: 25288877 PMCID: PMC4179607 DOI: 10.4137/cin.s13784] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/28/2014] [Revised: 04/10/2014] [Accepted: 04/16/2014] [Indexed: 11/05/2022]
Abstract
We consider a Bayesian hierarchical model for the integration of gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. The approach defines a measurement error model that relates the gene expression levels to latent copy number states. In turn, the latent states are related to the observed surrogate CGH measurements via a hidden Markov model. The model further incorporates variable selection with a spatial prior based on a probit link that exploits dependencies across adjacent DNA segments. Posterior inference is carried out via Markov chain Monte Carlo stochastic search techniques. We study the performance of the model in simulations and show better results than those achieved with recently proposed alternative priors. We also show an application to data from a genomic study on lung squamous cell carcinoma, where we identify potential candidates of associations between copy number variants and the transcriptional activity of target genes. Gene ontology (GO) analyses of our findings reveal enrichments in genes that code for proteins involved in cancer. Our model also identifies a number of potential candidate biomarkers for further experimental validation.
Affiliation(s)
- Alberto Cassese
- Department of Statistics, Rice University, Houston, TX, USA; Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Michele Guindani
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
|
43
|
Ni Y, Stingo FC, Baladandayuthapani V. Integrative Bayesian network analysis of genomic data. Cancer Inform 2014; 13:39-48. [PMID: 25288878 PMCID: PMC4179606 DOI: 10.4137/cin.s13786] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 03/06/2014] [Revised: 05/04/2014] [Accepted: 05/05/2014] [Indexed: 01/09/2023]
Abstract
Rapid development of genome-wide profiling technologies has made it possible to conduct integrative analysis on genomic data from multiple platforms. In this study, we develop a novel integrative Bayesian network approach to investigate the relationships between genetic and epigenetic alterations as well as how these mutations affect a patient's clinical outcome. We take a Bayesian network approach that admits a convenient decomposition of the joint distribution into local distributions. Exploiting the prior biological knowledge about regulatory mechanisms, we model each local distribution as linear regressions. This allows us to analyze multi-platform genome-wide data in a computationally efficient manner. We illustrate the performance of our approach through simulation studies. Our methods are motivated by and applied to a multi-platform glioblastoma dataset, from which we reveal several biologically relevant relationships that have been validated in the literature as well as new genes that could potentially be novel biomarkers for cancer progression.
Affiliation(s)
- Yang Ni
- Department of Statistics, Rice University, Houston, Texas, USA
- Francesco C Stingo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
|
44
|
|
45
|
Chen YA, Eschrich SA. Computational methods and opportunities for phosphorylation network medicine. Transl Cancer Res 2014; 3:266-278. [PMID: 25530950 PMCID: PMC4271781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Protein phosphorylation, one of the most ubiquitous post-translational modifications (PTM) of proteins, is known to play an essential role in cell signaling and regulation. With the increasing understanding of the complexity and redundancy of cell signaling, there is a growing recognition that targeting the entire network or system could be a necessary and advantageous strategy for treating cancer. Protein kinases, the proteins that add a phosphate group to the substrate proteins during phosphorylation events, have become one of the largest groups of 'druggable' targets in cancer therapeutics in recent years. Kinase inhibitors are being regularly used in clinics for cancer treatment. This therapeutic paradigm shift in cancer research is partly due to the generation and availability of high-dimensional proteomics data. Generation of this data, in turn, is enabled by increased use of mass-spectrometry (MS)-based or other high-throughput proteomics platforms as well as companion public databases and computational tools. This review briefly summarizes the current state and progress on phosphoproteomics identification, quantification, and platform related characteristics. We review existing database resources, computational tools, methods for phosphorylation network inference, and ultimately demonstrate the connection to therapeutics. Finally, many research opportunities exist for bioinformaticians or biostatisticians based on developments and limitations of the current and emerging technologies.
Affiliation(s)
- Yian Ann Chen
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 Magnolia Drive Tampa, FL 33612, USA
- Steven A Eschrich
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 Magnolia Drive Tampa, FL 33612, USA
|
46
|
Li Y, Liang C, Wong KC, Jin K, Zhang Z. Inferring probabilistic miRNA-mRNA interaction signatures in cancers: a role-switch approach. Nucleic Acids Res 2014; 42:e76. [PMID: 24609385 PMCID: PMC4027195 DOI: 10.1093/nar/gku182] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 12/20/2013] [Revised: 01/22/2014] [Accepted: 02/14/2014] [Indexed: 11/14/2022]
Abstract
Aberrant microRNA (miRNA) expression is implicated in tumorigenesis. The underlying mechanisms are unclear because the regulations of each miRNA on potentially hundreds of mRNAs are sample specific. We describe a novel approach to infer Probabilistic MiRNA-mRNA Interaction Signature ('ProMISe') from a single pair of miRNA-mRNA expression profile. Our model considers mRNA and miRNA competition as a probabilistic function of the expressed seeds (matches). To demonstrate ProMISe, we extensively exploited The Cancer Genome Atlas data. As a target predictor, ProMISe identifies more confidence/validated targets than other methods. Importantly, ProMISe confers higher cancer diagnostic power than using expression profiles alone. Gene set enrichment analysis on averaged ProMISe uniquely revealed respective target enrichments of oncomirs miR-21 and 145 in glioblastoma and ovarian cancers. Moreover, comparing matched breast (BRCA) and thyroid (THCA) tumor/normal samples uncovered thousands of tumor-related interactions. For example, ProMISe-BRCA network involves miR-155/183/21, which exhibits higher ProMISe coupled with coherently higher miRNA expression and lower target expression; oncomirs miR-221/222 in the ProMISe-THCA network engage with many downregulated target genes. Together, our probabilistic approach of integrating expression and sequence scores establishes a functional link between the aberrant miRNA and mRNA expression, which was previously under-appreciated due to the methodological differences.
Affiliation(s)
- Yue Li
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada; The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Cheng Liang
- The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Ka-Chun Wong
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada; The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Ke Jin
- The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A4, Canada
- Zhaolei Zhang
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4, Canada; The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A4, Canada; Banting and Best Department of Medical Research, University of Toronto, Toronto, ON M5S 3E1, Canada
|
47
|
Cassese A, Guindani M, Tadesse MG, Falciani F, Vannucci M. A hierarchical Bayesian model for inference of copy number variants and their association to gene expression. Ann Appl Stat 2014; 8:148-175. [PMID: 24834139 DOI: 10.1214/13-aoas705] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
Abstract
A number of statistical models have been successfully developed for the analysis of high-throughput data from a single source, but few methods are available for integrating data from different sources. Here we focus on integrating gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. We specify a measurement error model that relates the gene expression levels to latent copy number states which, in turn, are related to the observed surrogate CGH measurements via a hidden Markov model. We employ selection priors that exploit the dependencies across adjacent copy number states and investigate MCMC stochastic search techniques for posterior inference. Our approach results in a unified modeling framework for simultaneously inferring copy number variants (CNV) and identifying their significant associations with mRNA transcripts abundance. We show performance on simulated data and illustrate an application to data from a genomic study on human cancer cell lines.
|
48
|
Abstract
Inferring microRNA (miRNA) functions and activities has been extremely important to understand their system-level roles and the mechanisms behind the cellular behaviors of their target genes. This chapter first details methodologies necessary for prediction of function and activity. It then introduces the computational methods available for investigation of sequence and experimental data and for analysis of the information flow mediated through miRNAs.
Affiliation(s)
- Hasan Oğul
- Department of Computer Engineering, Baskent University, Ankara, Turkey
|
49
|
Wang W, Baladandayuthapani V, Holmes CC, Do KA. Integrative network-based Bayesian analysis of diverse genomics data. BMC Bioinformatics 2013; 14 Suppl 13:S8. [PMID: 24267288 PMCID: PMC3849715 DOI: 10.1186/1471-2105-14-s13-s8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
BACKGROUND In order to better understand cancer as a complex disease with multiple genetic and epigenetic factors, it is vital to model the fundamental biological relationships among these alterations as well as their relationships with important clinical outcomes. METHODS We develop an integrative network-based Bayesian analysis (iNET) approach that allows us to jointly analyze multi-platform high-dimensional genomic data in a computationally efficient manner. The iNET approach is formulated as an objective Bayesian model selection problem for Gaussian graphical models to model joint dependencies among platform-specific features using known biological mechanisms. Using both simulated datasets and a glioblastoma (GBM) study from The Cancer Genome Atlas (TCGA), we illustrate the iNET approach via integrating three data types, microRNA, gene expression (mRNA), and patient survival time. RESULTS We show that the iNET approach has greater power in identifying cancer-related microRNAs than non-integrative approaches based on realistic simulated datasets. In the TCGA GBM study, we found many mRNA-microRNA pairs and microRNAs that are associated with patient survival time, with some of these associations identified in previous studies. CONCLUSIONS The iNET discovers relationships consistent with the underlying biological mechanisms among these variables, as well as identifying important biomarkers that are potentially relevant to patient survival. In addition, we identified some microRNAs that can potentially affect patient survival which are missed by non-integrative approaches.
Affiliation(s)
- Wenting Wang
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, USA
- Chris C Holmes
- Department of Statistics, University of Oxford, Oxford, UK
- MRC Harwell, Oxon, UK
- Kim-Anh Do
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, USA
|
50
|
Jadhav V, Hackl M, Druz A, Shridhar S, Chung CY, Heffner KM, Kreil DP, Betenbaugh M, Shiloach J, Barron N, Grillari J, Borth N. CHO microRNA engineering is growing up: recent successes and future challenges. Biotechnol Adv 2013; 31:1501-13. [PMID: 23916872 PMCID: PMC3854872 DOI: 10.1016/j.biotechadv.2013.07.007] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.1] [Received: 03/11/2013] [Revised: 07/18/2013] [Accepted: 07/20/2013] [Indexed: 12/16/2022]
Abstract
microRNAs with their ability to regulate complex pathways that control cellular behavior and phenotype have been proposed as potential targets for cell engineering in the context of optimization of biopharmaceutical production cell lines, specifically of Chinese Hamster Ovary cells. However, until recently, research was limited by a lack of genomic sequence information on this industrially important cell line. With the publication of the genomic sequence and other relevant data sets for CHO cells since 2011, the doors have been opened for an improved understanding of CHO cell physiology and for the development of the necessary tools for novel engineering strategies. In the present review we discuss both knowledge on the regulatory mechanisms of microRNAs obtained from other biological models and proof of concepts already performed on CHO cells, thus providing an outlook of potential applications of microRNA engineering in production cell lines.
Affiliation(s)
- Vaibhav Jadhav
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
|