1
Long DK, Bangerth W, Handwerk DR, Whitehead CB, Shipman PD, Finke RG. Estimating reaction parameters in mechanism-enabled population balance models of nanoparticle size distributions: A Bayesian inverse problem approach. J Comput Chem 2022; 43:43-56. [PMID: 34672375 DOI: 10.1002/jcc.26770] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/17/2021] [Revised: 09/03/2021] [Accepted: 10/01/2021] [Indexed: 01/03/2023]
Abstract
In order to quantitatively predict nano- as well as other particle-size distributions, one needs both a mathematical model and estimates of the parameters that appear in that model. Here, we show how one can use Bayesian inversion to obtain statistical estimates for the parameters that appear in recently derived mechanism-enabled population balance models (ME-PBM) of nanoparticle growth. The Bayesian approach addresses the question "how well do we know our parameters, along with their uncertainties?" The results reveal that Bayesian inversion statistical analysis of an example, prototype Ir(0)n nanoparticle formation system allows one to estimate not just the most likely rate constants and other parameter values, but also their SDs, confidence intervals, and other statistical information. Moreover, knowing the reliability of the mechanistic model's parameters in turn helps inform one about the reliability of the proposed mechanism, as well as the reliability of its predictions. The paper can also be seen as a tutorial, with the additional goal of providing a "Gold Standard" Bayesian inversion ME-PBM benchmark that others can use as a control to check their own use of this methodology for other systems of interest throughout nature. Overall, the results provide strong support for the hypothesis that there is substantial value in using a Bayesian inversion methodology for parameter estimation in particle formation systems.
Affiliation(s)
- Danny K Long
- Department of Mathematics, Colorado State University, Fort Collins, Colorado, USA
- Wolfgang Bangerth
- Department of Mathematics, Colorado State University, Fort Collins, Colorado, USA; Department of Geosciences, Colorado State University, Fort Collins, Colorado, USA
- Derek R Handwerk
- Department of Chemistry, Colorado State University, Fort Collins, Colorado, USA
- Christopher B Whitehead
- Department of Chemistry, Colorado State University, Fort Collins, Colorado, USA; Department of Chemistry, University of Basel, Basel, Switzerland
- Patrick D Shipman
- Department of Mathematics, Colorado State University, Fort Collins, Colorado, USA
- Richard G Finke
- Department of Chemistry, Colorado State University, Fort Collins, Colorado, USA
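The Long et al. abstract describes Bayesian inversion as a way to obtain not only the most likely rate constants but also their SDs and confidence intervals. A minimal sketch of that idea follows. It assumes a hypothetical one-parameter first-order decay model A(t) = A0·exp(-kt) in place of the authors' ME-PBM, Gaussian measurement noise with known SD, a flat prior on k > 0, and a random-walk Metropolis sampler; every function name and number here is illustrative, not taken from the paper.

```python
import math
import random
import statistics


def model(k, t, a0=1.0):
    """Hypothetical forward model: first-order decay A(t) = a0 * exp(-k t)."""
    return a0 * math.exp(-k * t)


def log_posterior(k, times, data, sigma):
    """Flat prior on k > 0 plus a Gaussian likelihood with known noise SD sigma."""
    if k <= 0:
        return -math.inf
    sse = sum((y - model(k, t)) ** 2 for t, y in zip(times, data))
    return -sse / (2 * sigma ** 2)


def metropolis(times, data, sigma, k0=1.0, step=0.02, n=20000, burn=2000, seed=0):
    """Random-walk Metropolis sampler; returns the post-burn-in samples of k."""
    rng = random.Random(seed)
    k, lp = k0, log_posterior(k0, times, data, sigma)
    samples = []
    for i in range(n):
        k_new = k + rng.gauss(0.0, step)
        lp_new = log_posterior(k_new, times, data, sigma)
        # Accept with probability min(1, exp(lp_new - lp)).
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            k, lp = k_new, lp_new
        if i >= burn:
            samples.append(k)
    return samples


# Synthetic data generated with a "true" k = 0.5 (illustrative only).
rng = random.Random(42)
sigma = 0.02
times = [0.5 * i for i in range(1, 21)]
data = [model(0.5, t) + rng.gauss(0.0, sigma) for t in times]

samples = metropolis(times, data, sigma)
mean_k = statistics.fmean(samples)
sd_k = statistics.stdev(samples)
ordered = sorted(samples)
ci = (ordered[int(0.025 * len(ordered))], ordered[int(0.975 * len(ordered))])
print(f"k = {mean_k:.3f} +/- {sd_k:.3f}, 95% interval {ci[0]:.3f} to {ci[1]:.3f}")
```

The posterior samples give the mean, the SD and a 95% credible interval directly. A multi-parameter model would be sampled the same way with a parameter vector, usually with a better-tuned proposal.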
2
Fan A, Wang H, Xiang H, Zou X. Inferring Large-Scale Gene Regulatory Networks Using a Randomized Algorithm Based on Singular Value Decomposition. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1997-2008. [PMID: 29993839 DOI: 10.1109/tcbb.2018.2825446] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
Reconstructing large-scale gene regulatory networks (GRNs) is a challenging problem in computational biology. Various methods for inferring GRNs have been developed, but they fail to accurately infer GRNs with a large number of genes. Additionally, the existing indexes for evaluating the constructed networks have obvious disadvantages, because GRNs in most biological systems are sparse. In this paper, we develop a new method, denoted IGRSVD, for inferring GRNs from large-scale, noisy time-series data based on randomized singular value decomposition (RSVD) and ordinary differential equation (ODE)-based optimization. The three major contributions of this paper are as follows. First, the IGRSVD algorithm uses RSVD to handle the noise and to reduce the original large-scale problem to small-scale problems. Second, we propose two new evaluation indexes, the expected value accuracy (EVA) and the expected value error (EVE), which evaluate the performance of inferred networks while accounting for network sparsity. Third, the proposed IGRSVD algorithm is compared with the existing SVD algorithm and the PCA_CMI algorithm using four subsets from E. coli and datasets from the DREAM challenge. The experimental results demonstrate that the IGRSVD algorithm is effective and more suitable for reconstructing large-scale networks.
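The RSVD step mentioned in the Fan et al. abstract relies on random projections to capture the dominant singular structure of a large matrix cheaply. The dependency-free sketch below shows only the rank-1 special case of randomized power (subspace) iteration: a random Gaussian starting vector refined by alternating multiplication with A and its transpose. It is an illustration under that simplifying assumption, not the IGRSVD algorithm; a general RSVD projects onto several random vectors at once and orthonormalizes them, which is omitted here. The function names are hypothetical.

```python
import math
import random


def matvec(a, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, y):
    """Compute A^T y without forming the transpose."""
    return [sum(a[i][j] * y[i] for i in range(len(a))) for j in range(len(a[0]))]


def normalize(v):
    """Return (v / ||v||, ||v||)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v], norm


def leading_singular_triplet(a, n_iter=50, seed=0):
    """Rank-1 randomized power iteration for the top singular value and vectors.

    Starts from a random Gaussian vector; each pass applies A, then A^T, and the
    norm of A^T u converges to the largest singular value.
    """
    rng = random.Random(seed)
    v, _ = normalize([rng.gauss(0.0, 1.0) for _ in range(len(a[0]))])
    sigma, u = 0.0, []
    for _ in range(n_iter):
        u, _ = normalize(matvec(a, v))
        v, sigma = normalize(rmatvec(a, u))
    return sigma, u, v


# Usage: the dominant singular value of diag(3, 1) padded with a zero row is 3.
sigma, u, v = leading_singular_triplet([[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
print(round(sigma, 6))  # → 3.0
```

Convergence speed depends on the gap between the first and second singular values (here the error shrinks by roughly (1/3)² per pass), which is why practical RSVD codes add a few power iterations to the random projection.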
3
Zhang LQ, Li QZ, Su WX, Jin W. Predicting gene expression level by the transcription factor binding signals in human embryonic stem cells. Biosystems 2016; 150:92-98. [DOI: 10.1016/j.biosystems.2016.08.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 06/13/2016] [Revised: 08/17/2016] [Accepted: 08/18/2016] [Indexed: 11/28/2022]
4
Chen X, Jung JG, Shajahan-Haq AN, Clarke R, Shih IM, Wang Y, Magnani L, Wang TL, Xuan J. ChIP-BIT: Bayesian inference of target genes using a novel joint probabilistic model of ChIP-seq profiles. Nucleic Acids Res 2016; 44:e65. [PMID: 26704972 PMCID: PMC4838354 DOI: 10.1093/nar/gkv1491] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/21/2015] [Revised: 11/16/2015] [Accepted: 12/09/2015] [Indexed: 11/16/2022]
Abstract
Chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-seq) has greatly improved the reliability with which transcription factor binding sites (TFBSs) can be identified from genome-wide profiling studies. Many computational tools have been developed to detect binding events or peaks; however, the robust detection of weak binding events remains a challenge for current peak-calling tools. We have developed a novel Bayesian approach (ChIP-BIT) to reliably detect TFBSs and their target genes by jointly modeling the binding signal intensities and binding locations of TFBSs. Specifically, a Gaussian mixture model is used to capture both binding and background signals in sample data. As a unique feature of ChIP-BIT, background signals are modeled by a local Gaussian distribution that is accurately estimated from the input data. Extensive simulation studies showed significantly improved performance of ChIP-BIT in target gene prediction, particularly for detecting weak binding signals at gene promoter regions. We applied ChIP-BIT to find target genes from NOTCH3 and PBX1 ChIP-seq data acquired from MCF-7 breast cancer cells. TF knockdown experiments initially validated about 30% of the co-regulated target genes identified by ChIP-BIT as being differentially expressed in MCF-7 cells. Functional analysis of these genes further revealed the existence of crosstalk between the Notch and Wnt signaling pathways.
Affiliation(s)
- Xi Chen
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 North Glebe Road, Arlington, VA 22203, USA
- Jin-Gyoung Jung
- Department of Pathology, Johns Hopkins Medical Institutions, 1550 Orleans Street, CRB-II, Baltimore, MD 21231, USA
- Ayesha N Shajahan-Haq
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, 3970 Reservoir Road NW, Washington, DC 20057, USA
- Robert Clarke
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, 3970 Reservoir Road NW, Washington, DC 20057, USA
- Ie-Ming Shih
- Department of Pathology, Johns Hopkins Medical Institutions, 1550 Orleans Street, CRB-II, Baltimore, MD 21231, USA
- Yue Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 North Glebe Road, Arlington, VA 22203, USA
- Luca Magnani
- Department of Surgery and Cancer, Imperial College London, ICTEM building, Hammersmith Hospital, DuCane Road, London W12 0NN, UK
- Tian-Li Wang
- Department of Pathology, Johns Hopkins Medical Institutions, 1550 Orleans Street, CRB-II, Baltimore, MD 21231, USA
- Jianhua Xuan
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 North Glebe Road, Arlington, VA 22203, USA
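The ChIP-BIT abstract states that a Gaussian mixture model separates binding signals from background signals. The sketch below illustrates only that mixture idea: a two-component, one-dimensional Gaussian mixture fitted by expectation-maximization (EM) to hypothetical signal intensities. It assumes a single intensity per site and a simple min/max initialization; ChIP-BIT itself models intensities and binding locations jointly and uses a locally estimated background, neither of which is reproduced here.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(xs, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Component 0 starts at the smallest value ("background") and component 1 at
    the largest ("binding"); both start with the pooled SD. Returns weights,
    means, SDs and each point's responsibility for the binding component.
    """
    n = len(xs)
    w = [0.5, 0.5]
    mu = [min(xs), max(xs)]
    sd0 = statistics.pstdev(xs)
    sd = [sd0, sd0]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            resp.append(p1 / (p0 + p1))
        # M-step: responsibility-weighted estimates of weights, means and SDs.
        n1 = sum(resp)
        n0 = n - n1
        w = [n0 / n, n1 / n]
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
              sum(r * x for r, x in zip(resp, xs)) / n1]
        sd = [max(1e-6, math.sqrt(sum((1 - r) * (x - mu[0]) ** 2
                                      for r, x in zip(resp, xs)) / n0)),
              max(1e-6, math.sqrt(sum(r * (x - mu[1]) ** 2
                                      for r, x in zip(resp, xs)) / n1))]
    return w, mu, sd, resp


# Hypothetical intensities: 300 background sites and 100 bound sites.
rng = random.Random(1)
xs = [rng.gauss(1.0, 0.5) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(100)]
w, mu, sd, resp = em_two_gaussians(xs)
binding_calls = sum(r > 0.5 for r in resp)
print([round(v, 2) for v in w], [round(v, 2) for v in mu], binding_calls)
```

Thresholding the responsibility at 0.5 is one simple way to turn the fitted mixture into binding calls; weak binding events correspond to sites whose responsibility sits near that boundary.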
5
Naifang S, Minping Q, Minghua D. Integrative Approaches for microRNA Target Prediction: Combining Sequence Information and the Paired mRNA and miRNA Expression Profiles. Curr Bioinform 2013; 8:37-45. [PMID: 23467572 PMCID: PMC3583062 DOI: 10.2174/1574893611308010008] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 03/20/2012] [Revised: 05/01/2012] [Accepted: 05/10/2012] [Indexed: 11/30/2022]
Abstract
Gene regulation is a key factor in gaining a full understanding of molecular biology. microRNAs (miRNAs), a novel class of non-coding RNA, have recently been found to be a crucial class of post-transcriptional regulators and to play important roles in cancer. One essential step in understanding the regulatory effect of miRNAs is the reliable prediction of their target mRNAs. Typically, such predictions are based solely on sequence information, which unavoidably yields high false detection rates. Recently, several novel approaches have been developed to predict miRNA targets by integrating a typical sequence-based algorithm with paired expression profiles of miRNA and mRNA. Here we review and discuss these integrative approaches and propose a new algorithm called HCTarget. Applying HCTarget to expression data in multiple myeloma, we predict target genes for ten specific miRNAs. Experimental verification and a loss-of-function study validate our predictions. Therefore, the integrative approach is a reliable and effective way to predict miRNA targets, and could improve our comprehensive understanding of gene regulation.
Affiliation(s)
- Su Naifang
- LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, P.R. China ; Beijing International Center for Mathematical Research, Peking University, Beijing 100871, P.R. China
6
Hierarchical modularity in ERα transcriptional network is associated with distinct functions and implicates clinical outcomes. Sci Rep 2012; 2:875. [PMID: 23166858 PMCID: PMC3500769 DOI: 10.1038/srep00875] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 09/12/2012] [Accepted: 10/30/2012] [Indexed: 12/18/2022]
Abstract
Recent genome-wide profiling reveals highly complex regulatory networks among ERα and its targets. We integrated estrogen (E2)-stimulated time-series ERα ChIP-seq and gene expression data to identify the ERα-centered transcription factor (TF) hubs and their target genes, and inferred the time-variant hierarchical network structures using a Bayesian multivariate modeling approach. Based on recurrent motif patterns, we identified three embedded regulatory modules in the ERα core transcriptional network. GO analyses revealed a distinct biological function associated with each of the three embedded modules. Survival analysis showed that the genes in each module were significantly correlated with survival in breast cancer patient cohorts. In summary, our Bayesian statistical modeling and modularity analysis not only reveal the dynamic properties of the ERα-centered regulatory network and its associated distinct biological functions, but also provide a reliable and effective genomic analytical approach for analyzing the dynamic regulatory network of any given TF.
7
Zhang X, Liu K, Liu ZP, Duval B, Richer JM, Zhao XM, Hao JK, Chen L. NARROMI: a noise and redundancy reduction technique improves accuracy of gene regulatory network inference. Bioinformatics 2012; 29:106-13. [DOI: 10.1093/bioinformatics/bts619] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Indexed: 12/30/2022]
8
Gülsoy G, Bandhyopadhyay N, Kahveci T. HIDEN: Hierarchical decomposition of regulatory networks. BMC Bioinformatics 2012; 13:250. [PMID: 23016513 PMCID: PMC3556311 DOI: 10.1186/1471-2105-13-250] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 03/22/2012] [Accepted: 09/21/2012] [Indexed: 12/15/2022]
Abstract
Background Transcription factors regulate numerous cellular processes by controlling the rate at which each gene is expressed. The regulatory relations are modeled using transcriptional regulatory networks. Recent studies have shown that such networks have an underlying hierarchical organization. We consider the problem of discovering the underlying hierarchy in transcriptional regulatory networks. Results We first transform this problem into a mixed integer programming problem. We then use existing tools to solve the resulting problem. For larger networks, this strategy does not work because of the rapid increase in running time and space usage, so we use a divide-and-conquer strategy for such networks. We use our method to analyze the transcriptional regulatory networks of E. coli, H. sapiens and S. cerevisiae. Conclusions Our experiments demonstrate that: (i) our method gives statistically better results than three existing state-of-the-art methods; (ii) our method is robust against errors in the data; and (iii) our method's performance is not affected by the different topologies in the data.
Affiliation(s)
- Günhan Gülsoy
- Computer and Information Sciences and Engineering, University of Florida, Gainesville, FL 32611, USA.
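HIDEN discovers network hierarchy by solving a mixed integer programming problem, which is not reproduced here. For intuition only, the sketch below swaps in a much simpler and different technique: in an acyclic regulatory network, each gene is assigned the length of the longest regulatory chain ending at it, so top regulators sit at level 0. The acyclicity assumption, the function name and the toy gene names are all hypothetical; real regulatory networks may contain cycles, which this sketch rejects. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def hierarchy_levels(edges):
    """Assign each node the length of the longest regulatory chain ending at it.

    edges are (regulator, target) pairs. Nodes without regulators get level 0.
    Assumes the network is acyclic; graphlib raises CycleError otherwise.
    """
    preds = {}
    for regulator, target in edges:
        preds.setdefault(target, set()).add(regulator)
        preds.setdefault(regulator, set())
    level = {}
    # static_order() yields every node only after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        level[node] = max((level[p] + 1 for p in preds[node]), default=0)
    return level


# Hypothetical toy network: tfA regulates tfB, and both regulate downstream genes.
edges = [("tfA", "tfB"), ("tfB", "gene1"), ("tfA", "gene2"), ("tfB", "gene2")]
levels = hierarchy_levels(edges)
print(sorted(levels.items()))  # → [('gene1', 2), ('gene2', 2), ('tfA', 0), ('tfB', 1)]
```

Longest-path levelling is exact only for acyclic graphs, which is one reason an optimization formulation such as HIDEN's is needed for general networks.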
9
Wei P, Pan W. Bayesian Joint Modeling of Multiple Gene Networks and Diverse Genomic Data to Identify Target Genes of a Transcription Factor. Ann Appl Stat 2012; 6:334-355. [PMID: 22408712 PMCID: PMC3298193 DOI: 10.1214/11-aoas502] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023]
Abstract
We consider integrative modeling of multiple gene networks and diverse genomic data, including protein-DNA binding, gene expression and DNA sequence data, to accurately identify the regulatory target genes of a transcription factor (TF). Rather than treating all genes equally and independently a priori, as existing joint modeling approaches do, we incorporate the biological prior knowledge that neighboring genes on a gene network tend to be (or not to be) regulated together by a TF. A key contribution of our work is that, to make maximal use of existing biological knowledge, we allow multiple gene networks to be incorporated into the joint modeling of genomic data by introducing a mixture model based on multiple Markov random fields (MRFs). Another important contribution is that we allow different genomic data to be correlated and examine the validity and effect of the independence assumption adopted in existing methods. Because the approach is fully Bayesian, inference about model parameters can be carried out based on MCMC samples. Application to an E. coli data set, together with simulation studies, demonstrates the utility and statistical efficiency gains of the proposed joint model.
Affiliation(s)
- Peng Wei
- Division of Biostatistics and Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
10
Shen C, Huang Y, Liu Y, Wang G, Zhao Y, Wang Z, Teng M, Wang Y, Flockhart DA, Skaar TC, Yan P, Nephew KP, Huang TH, Li L. A modulated empirical Bayes model for identifying topological and temporal estrogen receptor α regulatory networks in breast cancer. BMC Syst Biol 2011; 5:67. [PMID: 21554733 PMCID: PMC3117732 DOI: 10.1186/1752-0509-5-67] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 11/24/2010] [Accepted: 05/09/2011] [Indexed: 12/27/2022]
Abstract
BACKGROUND Estrogens regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression. Transcriptional regulation upon estrogen stimulation is a critical biological process underlying the onset and progression of the majority of breast cancers. Dynamic gene expression changes have been shown to characterize the breast cancer cell response to estrogens, but the underlying molecular mechanisms are still not well understood. RESULTS We developed a modulated empirical Bayes model and constructed a novel topological and temporal transcription factor (TF) regulatory network in the MCF7 breast cancer cell line upon 17β-estradiol stimulation. In the network, significant TF genomic hubs were identified, including ER-alpha and AP-1; significant non-genomic hubs included ZFP161, TFDP1, NRF1, TFAP2A, EGR1, E2F1, and PITX2. Although the early and late networks were distinct (<5% overlap of ERα target genes between the 4 and 24 h time points), all nine hubs were significantly represented in both networks. In MCF7 cells with acquired resistance to tamoxifen, the ERα regulatory network was unresponsive to 17β-estradiol stimulation. The significant loss of hormone responsiveness was associated with marked epigenomic changes, including hyper- or hypo-methylation of promoter CpG islands and repressive histone methylations. CONCLUSIONS We identified a number of estrogen-regulated target genes and established an estrogen-regulated network that distinguishes the genomic and non-genomic actions of the estrogen receptor. Many gene targets of this network were no longer active in anti-estrogen-resistant cell lines, possibly because their DNA methylation and histone acetylation patterns had changed.
Affiliation(s)
- Changyu Shen
- Center for Computational Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
11
Abstract
In many organisms, the expression level of each gene is controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, greatly increasing the difficulty of the problem. Previous experimental work often provides partial information about the TRN. For example, certain TFs may be known to regulate a given gene, or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates that there will be very few connections between TFs and genes. Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that can directly utilize prior information about the network structure in conjunction with observed gene expression data to estimate the TRN. Our approach uses L1 penalties on the network to ensure a sparse structure. This has the advantage of being computationally efficient as well as making far fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates.
Affiliation(s)
- Gareth M James
- University of Southern California, Stanford University, University of Michigan and University of Michigan
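The abstract above uses L1 penalties to force a sparse TF-gene network. The sketch below shows how an L1 penalty produces exact zeros, using a standard lasso solved by cyclic coordinate descent with soft-thresholding for a single gene. One strong simplifying assumption: it treats the TF activation levels as observed regressors, whereas the authors stress that these are usually unknown and must be estimated; it also ignores prior network information. The data and names are hypothetical, and this is not the authors' estimator.

```python
import random


def soft_threshold(z, lam):
    """Proximal operator of lam * |b|: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(x, y, lam, n_iter=500):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    x is a list of n rows (samples) with p columns (candidate regulators);
    no intercept is fitted.
    """
    n, p = len(x), len(x[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in x) / n for j in range(p)]
    resid = list(y)  # y - X b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual excluding column j's own fit.
            rho = sum(x[i][j] * (resid[i] + x[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= x[i][j] * delta
                b[j] = new_bj
    return b


# Hypothetical data: one gene driven by regulators 0 and 3 out of 5 candidates.
rng = random.Random(7)
x = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(50)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0.0, 0.1) for row in x]
b = lasso_cd(x, y, lam=0.1)
print([round(v, 2) for v in b])
```

Coefficients whose partial correlation with the residual stays below `lam` are set exactly to zero, which is how an L1 penalty yields "very few connections"; the surviving coefficients are shrunk slightly toward zero.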
12
Xie Y, Pan W, Jeong KS, Xiao G, Khodursky AB. A Bayesian approach to joint modeling of protein-DNA binding, gene expression and sequence data. Stat Med 2010; 29:489-503. [PMID: 20049751 DOI: 10.1002/sim.3815] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/27/2022]
Abstract
Genome-wide protein-DNA binding data, DNA sequence data and gene expression data represent complementary means of deciphering global and local transcriptional regulatory circuits. Combining these different types of data can not only improve statistical power but also provide a more comprehensive picture of gene regulation. In this paper, we propose a novel statistical model to augment protein-DNA binding data with gene expression and DNA sequence data when available. We specify a hierarchical Bayes model and use Markov chain Monte Carlo simulations to draw inferences. Both simulation studies and an analysis of an experimental data set show that the proposed joint modeling method can significantly improve the specificity and sensitivity of identifying target genes compared with conventional approaches that rely on a single data source.
Affiliation(s)
- Yang Xie
- Division of Biostatistics, Department of Clinical Sciences, University of Texas Southwestern Medical Center at Dallas, Dallas, TX, USA.
13
Merret R, Moulia B, Hummel I, Cohen D, Dreyer E, Bogeat-Triboulot MB. Monitoring the regulation of gene expression in a growing organ using a fluid mechanics formalism. BMC Biol 2010; 8:18. [PMID: 20202192 PMCID: PMC2845557 DOI: 10.1186/1741-7007-8-18] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 09/10/2009] [Accepted: 03/04/2010] [Indexed: 01/09/2023]
Abstract
Background Technological advances have enabled the accurate quantification of gene expression, even within single cell types. While transcriptome analyses are routinely performed, most experimental designs only provide snapshots of gene expression. Molecular mechanisms underlying cell fate or positional signalling have been revealed through these discontinuous datasets. However, in developing multicellular structures, temporal and spatial cues, known to directly influence transcriptional networks, become entangled as the cells are displaced and expand. Access to an unbiased view of the spatiotemporal regulation of gene expression occurring during development requires a specific framework that properly quantifies the rate of change of a property in a moving and expanding element, such as a cell or an organ segment. Results We show how the rate of change in gene expression can be quantified by combining kinematics and real-time polymerase chain reaction data in a mechanistic model that treats any organ as a continuum. This framework was applied to assess the developmental regulation of the two reference genes Actin11 and Elongation Factor 1-β in the apex of the poplar root. The growth field was determined by time-lapse photography, and transcript density was obtained at high spatial resolution. The net accumulation rates of the transcripts of the two genes were found to display highly contrasting developmental profiles. Actin11 showed pulses of up- and down-regulation in the accelerating and decelerating parts of the growth zone, while the dynamics of EF1β were much slower. This framework provides key information about gene regulation in a developing organ, such as the location, the duration and the intensity of gene induction/repression. Conclusions We demonstrated that gene expression patterns can be monitored using the continuity equation without using mutants or reporter constructs. 
Given the rise of imaging technologies, this framework in our view opens a new way to dissect the molecular basis of growth regulation, even in non-model species or complex structures.
Affiliation(s)
- Rémy Merret
- INRA, Nancy Université, UMR1137 Ecologie et Ecophysiologie Forestières, IFR 110 EFABA, F-54280 Champenoux, France
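The Merret et al. abstract relies on the continuity equation without stating it. Assuming growth along a single axis x (distance from the root tip), a transcript density ρ(x, t) and a tissue displacement velocity v(x, t) measured relative to the tip, the one-dimensional form commonly used in growth kinematics gives the net local accumulation rate R of the transcript as follows (a standard form offered for orientation, not necessarily the authors' exact notation):

```latex
R(x,t) \;=\; \frac{\partial \rho}{\partial t} + \frac{\partial (\rho\, v)}{\partial x}
       \;=\; \frac{\partial \rho}{\partial t} + v\,\frac{\partial \rho}{\partial x} + \rho\,\frac{\partial v}{\partial x}
```

Under steady growth, where the density profile seen from the tip does not change with time, the first term vanishes and R = v ∂ρ/∂x + ρ ∂v/∂x. Here ∂v/∂x is the relative elemental growth rate, which is why a growth field from time-lapse imaging plus a transcript density profile is enough to estimate R.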
14
Revealing a signaling role of phytosphingosine-1-phosphate in yeast. Mol Syst Biol 2010; 6:349. [PMID: 20160710 PMCID: PMC2835565 DOI: 10.1038/msb.2010.3] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 07/16/2009] [Accepted: 12/28/2009] [Indexed: 12/02/2022]
Abstract
- Perturbing metabolic systems of bioactive sphingolipids with a genetic approach
- Multiple types of “omics” data collected from the system
- Systems approach for integrating multiple “omics” information
- Predicting signal transduction information flow: lipid; TF activation; gene expression
In contemporary biomedical research, gene mutation remains the most powerful and commonly used tool in molecular and systems biology for perturbation and dissection of biological systems. However, as biological systems consist of highly connected networks, for example, metabolic networks or signal transduction networks, perturbing one portion could result in widespread effects across the network. Such ‘ripple effects’ pose a challenge to the paradigm of investigating the role of a metabolite by mutating enzymes required for its production. In this study, we have developed a systems biology approach that integrates different types of ‘-omics’ data to identify signal transduction pathways involving sphingolipids and gene expression. See Figure 1 for an overall scheme of our approaches. Sphingolipids are a family of bioactive lipids that have important signaling functions in cells; in yeast, de novo synthesis is required to mediate the cell response to heat shock. We hypothesized that a specific sphingolipid, phytosphingosine-1-phosphate (PHS1P), functions as a signaling molecule in the heat stress response (HSR) because, although its mammalian counterparts are known to have important signaling roles, the function of this metabolite in yeast remains unknown. To identify a putative role of PHS1P in the HSR, we deleted the genes involved in production (LCB4 and LCB5) and degradation (DPL1) of PHS1P to perturb its levels in cells. In wild-type cells, heat shock induces a significant increase in PHS1P. Over the same time course, the expression of over a thousand genes was modulated. While deleting the genes involved in PHS1P metabolism ‘clamped’ the PHS1P concentration as expected, these mutations also resulted in widespread changes in many sphingolipids in addition to PHS1P. This ‘ripple effect’ prevented direct identification of a signaling role of PHS1P in gene expression. 
We overcame this difficulty by using a set of systems approaches as follows: (1) identifying the relationships between the levels of each individual sphingolipid species and gene expression by combining correlation analysis and clustering; (2) identifying the putative PHS1P-sensitive subset of genes by analyzing the results from step 1; (3) identifying transcription factors (TFs) that potentially regulate these PHS1P-sensitive genes through promoter analysis; (4) modeling the activation states of the TFs by combining gene expression data and promoter sequence data; and finally, (5) modeling the relationship between sphingolipids and activation of TFs. Our study showed that 441 genes were differentially expressed in the lcb4Δ/lcb5Δ strain in comparison to the wild-type strain; however, only 77 genes among them showed a significant correlation with PHS1P, with 22 genes positively correlated and 54 genes negatively correlated. The results led to the hypothesis that the genes showing significant correlation were PHS1P sensitive, whereas differential expression of other genes resulted from the compounding ‘ripple effects’ of the gene deletions. We tested this hypothesis by directly treating cells with PHS1P and monitoring the expression levels of the genes that were PHS1P sensitive and PHS1P insensitive, and the results showed that the expression of PHS1P-sensitive genes indeed changed in response to the treatment whereas that of the others did not. We developed a statistical model, referred to as the Bayesian transcription factor state model, to infer the activation states of TFs in cells under a specific condition based on genomic information and gene expression data. We then used Bayesian logistic regression to further model the relationship between the lipid concentrations and the activation states of the TFs. 
Combined TF enrichment analysis and TF state modeling indicated that the HAP TF complex was likely responding to the signal from PHS1P and mediating the regulation of PHS1P-sensitive genes. We tested this hypothesis by treating wild-type yeast and a strain with a deletion of the HAP4 gene (hap4Δ), a component of the HAP complex, with PHS1P and monitoring the expression of PHS1P-sensitive genes. Indeed, PHS1P induced the genes in the wild-type strain but not in hap4Δ, indicating that induction of the PHS1P-sensitive genes required a functioning HAP complex (see Figure 5). In summary, our experiments demonstrated that, although gene mutation remains one of the most powerful tools to perturb biological systems, the high connectivity of biological systems poses a challenge for using this approach to identify signaling roles of bioactive metabolites. Here, we demonstrated that by combining the information from multiple types of ‘-omics’ data using systems approaches, it is possible to circumvent these difficulties and reveal novel signal transduction pathways. Sphingolipids including sphingosine-1-phosphate and ceramide participate in numerous cell programs through signaling mechanisms. This class of lipids has important functions in stress responses; however, determining which sphingolipid mediates specific events has been encumbered by the numerous metabolic interconnections of sphingolipids, such that modulating a specific lipid of interest by manipulating metabolic enzymes causes ‘ripple effects’, which change the levels of many other lipids. Here, we develop a method of integrative analysis of genomic, transcriptomic, and lipidomic data to address this previously intractable problem. This method revealed a specific signaling role for phytosphingosine-1-phosphate, a lipid with no previously defined specific function in yeast, in regulating genes required for mitochondrial respiration through the HAP complex transcription factor. 
This approach could be applied to extract meaningful biological information from a similar experimental design that produces multiple sets of high-throughput data.
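Steps (1) and (2) of the PHS1P study correlate lipid levels with gene expression across samples and keep the significantly correlated genes as putatively lipid-sensitive, split into positively and negatively correlated sets. A minimal sketch of that filtering idea follows, using Pearson correlation and a fixed cutoff |r| ≥ 0.8. The cutoff, the gene names and all values are hypothetical; the study combined correlation analysis with clustering and significance testing, whose exact settings are not given here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lipid_sensitive_genes(lipid, expression, cutoff=0.8):
    """Split genes into positively and negatively lipid-correlated sets.

    cutoff is a hypothetical fixed threshold on |r|, standing in for a
    proper significance test.
    """
    pos, neg = [], []
    for gene, values in expression.items():
        r = pearson(lipid, values)
        if r >= cutoff:
            pos.append(gene)
        elif r <= -cutoff:
            neg.append(gene)
    return pos, neg


phs1p = [0.1, 0.2, 0.9, 1.5, 2.0, 0.3]  # hypothetical lipid level per sample
expression = {
    "geneA": [1.0, 1.1, 2.0, 2.9, 3.6, 1.2],  # rises with the lipid
    "geneB": [5.0, 4.8, 3.1, 2.0, 1.1, 4.7],  # falls with the lipid
    "geneC": [2.0, 2.5, 2.1, 2.4, 2.2, 2.3],  # unrelated
}
pos, neg = lipid_sensitive_genes(phs1p, expression)
print(pos, neg)  # → ['geneA'] ['geneB']
```

Genes outside both sets, like geneC, would correspond to the study's PHS1P-insensitive genes whose differential expression was attributed to ‘ripple effects’.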
15
Wang J, Tian T. Quantitative model for inferring dynamic regulation of the tumour suppressor gene p53. BMC Bioinformatics 2010; 11:36. [PMID: 20085646 PMCID: PMC2832896 DOI: 10.1186/1471-2105-11-36] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 02/03/2009] [Accepted: 01/19/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The availability of various "omics" datasets makes it possible to study genome-wide genetic regulatory networks. However, a major challenge in using mathematical models to infer genetic regulation from microarray datasets is the lack of information on protein concentrations and activities. Most previous studies assumed that the mRNA level of a gene is consistent with the activity of its protein, although this is not always the case. Therefore, a more sophisticated modelling framework, together with corresponding inference methods, is needed to estimate genetic regulation accurately from "omics" datasets. RESULTS This work developed a novel approach, based on a nonlinear mathematical model, to infer genetic regulation from microarray gene expression data. Using the p53 network as a test system, we used the nonlinear model to estimate the activity of the transcription factor (TF) p53 from the expression levels of its target genes, and to identify whether p53 activates or inhibits each of its target genes. The top 317 predicted putative p53 target genes were supported by DNA sequence analysis. A comparison between our predictions and other published predictions of p53 targets suggests that most putative p53 targets may share a common depleted or enriched sequence signal in their upstream non-coding regions. CONCLUSIONS The proposed quantitative model can be used not only to infer the regulatory relationship between a TF and its downstream genes, but also to estimate the protein activity of a TF from the expression levels of its target genes.
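As an illustration of estimating TF activity from target-gene expression, the sketch below assumes a simple Hill-type steady-state response of each target to p53 activity and recovers, by least squares over all targets, the activity that best explains the observed levels. The functional form, the parameter names `b`, `v`, `K`, and the grid search are assumptions, not the authors' model; the sign of `v` stands in for the activation/inhibition status described above.

```python
# Minimal sketch (assumption): target gene i responds to TF activity p as
#   x_i = b_i + v_i * p**n / (K_i**n + p**n)
# A positive v_i models activation, a negative v_i models inhibition.
# This is a hypothetical model, not necessarily the one used by Wang and Tian.


def hill(p: float, k: float, n: float = 2.0) -> float:
    """Fractional promoter occupancy at TF activity p (hypothetical form)."""
    return p**n / (k**n + p**n)


def predicted_expression(p: float, params: list[tuple[float, float, float]]) -> list[float]:
    """Expression of every target at TF activity p; params holds (b_i, v_i, K_i)."""
    return [b + v * hill(p, k) for b, v, k in params]


def estimate_tf_activity(expr: list[float], params: list[tuple[float, float, float]],
                         p_max: float = 10.0, steps: int = 10001) -> float:
    """Grid search for the TF activity minimising squared error over all targets."""
    best_p, best_sse = 0.0, float("inf")
    for i in range(steps):
        p = p_max * i / (steps - 1)
        sse = sum((x - y) ** 2 for x, y in zip(expr, predicted_expression(p, params)))
        if sse < best_sse:
            best_p, best_sse = p, sse
    return best_p


# Hypothetical parameters for three targets (two activated, one inhibited).
params = [(1.0, 4.0, 1.5), (0.5, 2.0, 3.0), (2.0, -1.0, 0.8)]
observed = predicted_expression(2.0, params)  # noiseless synthetic "measurements"
print(estimate_tf_activity(observed, params))
```

With real, noisy data one would estimate the per-target parameters jointly with the activity time course; the grid search is used here only because it is transparent.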
Affiliation(s)
- Junbai Wang
- Division of Pathology, The Norwegian Radium Hospital, Rikshospitalet University Hospital, Montebello 0310 Oslo, Norway
16
ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc Natl Acad Sci U S A 2009; 106:21521-6. [PMID: 19995984 DOI: 10.1073/pnas.0904863106] [Citation(s) in RCA: 243] [Impact Index Per Article: 16.2] [Indexed: 01/07/2023]
Abstract
Next-generation sequencing has greatly increased the scope and resolution of studies of transcriptional regulation. RNA sequencing (RNA-Seq) and ChIP-Seq experiments now generate comprehensive data on transcript abundance and on regulator-DNA interactions. We propose an approach for the integrated analysis of these data based on feature extraction from ChIP-Seq signals, principal component analysis, and regression-based component selection. Compared with traditional methods, our approach not only offers higher power in predicting gene expression from ChIP-Seq data but also provides a way to capture cooperation among regulators. In mouse embryonic stem cells (ESCs), we find that a remarkably high proportion of the variation in gene expression (65%) can be explained by the binding signals of 12 transcription factors (TFs). Two groups of TFs are identified. Whereas the first group (E2f1, Myc, Mycn, and Zfx) generally acts as activators, the second group (Oct4, Nanog, Sox2, Smad1, Stat3, Tcfcp2l1, and Esrrb) may serve as either activator or repressor depending on the target. The two groups of TFs cooperate tightly to activate genes that are differentially up-regulated in ESCs. In the absence of binding by the first group, binding by the second group is associated with genes that are repressed in ESCs and derepressed upon early differentiation.
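The 65% figure above is a fraction of expression variance explained by binding signals, that is, an R² from a regression. The following minimal sketch, assuming ordinary least squares with an intercept on raw per-gene binding features, shows how such a figure is computed; the authors' feature extraction, principal component analysis, and component selection steps are deliberately omitted, and the toy data are made up.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_r_squared(signals: list[list[float]], expression: list[float]):
    """OLS of expression on binding signals (plus intercept); returns (beta, R^2)."""
    rows = [[1.0] + list(s) for s in signals]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, expression)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(bk * v for bk, v in zip(beta, r)) for r in rows]
    mean_y = sum(expression) / len(expression)
    ss_res = sum((y - f) ** 2 for y, f in zip(expression, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in expression)
    return beta, 1.0 - ss_res / ss_tot


# Toy data (hypothetical): two TF binding signals per gene.
signals = [[0, 1], [1, 0], [1, 1], [2, 1], [0, 2], [3, 1]]
expression = [0.2, 2.9, 2.1, 4.2, -0.8, 5.7]
beta, r2 = fit_r_squared(signals, expression)
print(beta, r2)
```

A negative fitted coefficient would correspond to a repressive association, which is how a regression of this kind can distinguish activators from repressors.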
17
Abstract
One central problem in biology is to understand how gene expression is regulated under different conditions. Microarray gene expression data and other high throughput data have made it possible to dissect transcriptional regulatory networks at the genomics level. Owing to the very large number of genes that need to be studied, the relatively small number of data sets available, the noise in the data and the different natures of the distinct data types, network inference presents great challenges. In this article, we review statistical and computational methods that have been developed in the last decade in response to genomics data for inferring transcriptional regulatory networks.
Affiliation(s)
- Ning Sun
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA.
18
Wang RS, Jin G, Zhang XS, Chen L. Modeling post-transcriptional regulation activity of small non-coding RNAs in Escherichia coli. BMC Bioinformatics 2009; 10 Suppl 4:S6. [PMID: 19426454 PMCID: PMC2681065 DOI: 10.1186/1471-2105-10-s4-s6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
BACKGROUND Transcriptional regulation is a fundamental process in biological systems, in which transcription factors (TFs) have been shown to play crucial roles. In recent years, in addition to TFs, an increasing number of non-coding RNAs (ncRNAs) have been shown to mediate post-transcriptional processes and regulate many critical pathways in both prokaryotes and eukaryotes. Meanwhile, as more and more high-throughput biological data become available, it is both possible and imperative to study gene regulation quantitatively, in a systematic and detailed manner. RESULTS Most existing studies that infer transcriptional regulatory interactions and TF activity ignore the possible post-transcriptional effects of ncRNAs. In this work, we propose a novel framework to infer the activity of regulators, including both TFs and ncRNAs, by using the expression profiles of target genes and their (post-)transcriptional regulatory relationships. We model the integrated regulatory system as a set of biochemical reactions, which leads to a log-bilinear problem. Inference is performed by an iterative algorithm in which two linear programming models are solved efficiently. In contrast to related studies, this work considers the effects of ncRNAs on the transcription process, so a more reasonable and accurate reconstruction can be expected. In addition, the approach is computationally suitable for large-scale problems. Experiments on two synthetic data sets and on a model system of the Escherichia coli (E. coli) carbon source transition from glucose to acetate illustrate the effectiveness of our model and algorithm. CONCLUSION Our results show that incorporating the post-transcriptional regulation of ncRNAs into the system model can reveal effects hidden in the regulatory activity of TFs during transcription, and can thus uncover the biological mechanisms of gene regulation more accurately.
The software for the algorithm in this paper is available upon request.
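The abstract reduces inference to a log-bilinear problem solved by alternating between two linear programs. As a simplified illustration only, the sketch below assumes a rank-one log-bilinear model, log x_ij ≈ c_i · a_j (a gene-specific regulatory strength c_i times a regulator activity a_j in sample j), and replaces the linear programs with alternating least-squares updates; the connectivity constraints and the TF/ncRNA distinction are not modelled.

```python
def alternating_rank1(y: list[list[float]], iters: int = 100):
    """Fit y[i][j] ~ c[i] * a[j] by alternating least squares (assumed simplification).

    y holds log-transformed expression: rows are genes, columns are samples.
    Each half-step fixes one factor and solves a closed-form least-squares
    problem for the other (the paper uses linear programs instead).
    """
    n_genes, n_samples = len(y), len(y[0])
    a = [1.0] * n_samples  # initial regulator activities
    c = [0.0] * n_genes
    for _ in range(iters):
        # Fix activities a; update each gene's regulatory strength c_i.
        norm_a = sum(v * v for v in a)
        c = [sum(y[i][j] * a[j] for j in range(n_samples)) / norm_a for i in range(n_genes)]
        # Fix strengths c; update each sample's regulator activity a_j.
        norm_c = sum(v * v for v in c)
        a = [sum(y[i][j] * c[i] for i in range(n_genes)) / norm_c for j in range(n_samples)]
    return c, a


# Hypothetical log-expression matrix generated from known factors.
c_true = [1.0, -0.5, 2.0]
a_true = [0.3, 1.2, -0.7, 0.9]
log_expr = [[ci * aj for aj in a_true] for ci in c_true]
c_hat, a_hat = alternating_rank1(log_expr)
print(c_hat, a_hat)
```

The factors are identifiable only up to a common scale (c_i · s and a_j / s fit equally well), so in practice one fixes a normalisation, for example by constraining the strengths of known interactions.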
Affiliation(s)
- Rui-Sheng Wang
- School of Information, Renmin University of China, Beijing 100872, PR China.
19
Wei P, Pan W. Incorporating gene functions into regression analysis of DNA-protein binding data and gene expression data to construct transcriptional networks. IEEE/ACM Trans Comput Biol Bioinform 2008; 5:401-415. [PMID: 18670043 DOI: 10.1109/tcbb.2007.1062] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/26/2023]
Abstract
Useful information on transcriptional networks has been extracted by regression analyses of gene expression data and DNA-protein binding data. However, a potential limitation of these approaches is their assumption of a common and constant activity level of a transcription factor (TF) across all genes in any given experimental condition; for example, each TF is assumed to be either an activator or a repressor, but not both, whereas some TFs are known to be dual regulators. Rather than assuming a common linear regression model for all genes, we propose using separate regression models for different gene groups; the genes can be grouped by function or by clustering results. Furthermore, to take advantage of the hierarchical structure of many existing gene function annotation systems, such as Gene Ontology (GO), we propose a shrinkage method that borrows information from relevant gene groups. Applications to a yeast dataset and simulations lend support to our proposed methods. In particular, we find that the shrinkage method consistently works well under various scenarios. We recommend the shrinkage method as a useful alternative to existing methods.
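The shrinkage idea can be illustrated with a single TF and a two-level hierarchy. In this minimal sketch, each gene group gets its own regression slope of expression on binding, which is then pulled toward the pooled slope of the parent group using a pseudo-count weight `lam`. This weighted-average form is a generic assumption, not Wei and Pan's estimator, and the group names and data are hypothetical. Opposite-signed group slopes show how a dual regulator can be represented.

```python
def slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def shrunken_group_slopes(groups: dict, lam: float = 5.0):
    """Shrink each group's slope toward the pooled (parent-group) slope.

    groups maps a group name to (binding values, expression values).
    A group with n genes gets weight n / (n + lam) on its own estimate;
    lam is an assumed tuning constant, not taken from the paper.
    """
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    parent = slope(all_x, all_y)
    shrunk = {}
    for name, (xs, ys) in groups.items():
        n = len(xs)
        shrunk[name] = (n * slope(xs, ys) + lam * parent) / (n + lam)
    return parent, shrunk


# Hypothetical groups: the TF activates one group and represses the other.
groups = {
    "respiration": ([0, 1, 2, 3], [0, 2, 4, 6]),
    "ribosome": ([0, 1, 2, 3], [0, -1, -2, -3]),
}
parent, shrunk = shrunken_group_slopes(groups)
print(parent, shrunk)
```

Small groups are pulled strongly toward the parent estimate, while large groups keep estimates close to their own data, which is the behaviour that makes borrowing information across a hierarchy useful when many groups are small.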
Affiliation(s)
- Peng Wei
- Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, MMC 303, Minneapolis, MN 55455-0378, USA.
20
Lin S, Ding J. Integration of ranked lists via cross entropy Monte Carlo with applications to mRNA and microRNA studies. Biometrics 2008; 65:9-18. [PMID: 18479487 DOI: 10.1111/j.1541-0420.2008.01044.x] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Indexed: 11/26/2022]
Abstract
One of the major challenges facing researchers studying complex biological systems is the integration of data from -omics platforms. Omic-scale data include DNA variations, transcriptome profiles, and RAomics. Selecting an appropriate approach for a data-integration task is problem dependent, primarily dictated by the information contained in the data. In situations where jointly modeling multiple raw datasets might be extremely challenging because of their vast differences, rankings from each dataset provide a commonality on which results can be integrated. Aggregating microRNA targets predicted by different computational algorithms is one such problem. Integrating results from multiple mRNA studies based on different platforms is another example that will be discussed. Formulating the integration of ranked lists as the minimization of an objective criterion, we explore the use of a cross entropy Monte Carlo method for solving such a combinatorial problem. Instead of placing a discrete uniform distribution on all potential solutions, an iterative importance sampling technique is used "to slowly tighten the net" so that most of the distributional mass is placed on the optimal solution and its neighbors. Extensive simulation studies were performed to assess the performance of the method. With satisfactory simulation results, the method was applied to the microRNA and mRNA problems to illustrate its utility.
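The cross entropy idea of "slowly tightening the net" can be sketched for full rankings of a common item set. This version assumes the Spearman footrule distance as the objective and a matrix of item-by-position probabilities as the sampling distribution; both are illustrative choices, and the paper's exact criterion, parameter values, and handling of partial lists may differ.

```python
import random


def footrule(order: list[str], ranked: list[str]) -> int:
    """Spearman footrule distance between two full rankings of the same items."""
    pos = {item: i for i, item in enumerate(ranked)}
    return sum(abs(i - pos[item]) for i, item in enumerate(order))


def sample_order(prob, items, rng):
    """Draw a permutation position by position from prob[item][position], without replacement."""
    remaining = list(range(len(items)))
    order = []
    for position in range(len(items)):
        # A tiny floor keeps every remaining item selectable.
        weights = [prob[i][position] + 1e-12 for i in remaining]
        pick = rng.choices(remaining, weights=weights)[0]
        remaining.remove(pick)
        order.append(items[pick])
    return order


def ce_aggregate(ranked_lists, n_samples=200, elite_frac=0.1, alpha=0.7, iters=50, seed=0):
    """Cross entropy Monte Carlo search for the ranking closest to all input lists (sketch)."""
    rng = random.Random(seed)
    items = sorted(ranked_lists[0])  # assumes every list ranks the same items
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    prob = [[1.0 / n] * n for _ in range(n)]  # uniform start
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_score = None, float("inf")
    for _ in range(iters):
        scored = []
        for _ in range(n_samples):
            order = sample_order(prob, items, rng)
            scored.append((sum(footrule(order, r) for r in ranked_lists), order))
        scored.sort(key=lambda pair: pair[0])
        if scored[0][0] < best_score:
            best_score, best = scored[0]
        # Re-estimate item-position frequencies from the elite samples.
        freq = [[0.0] * n for _ in range(n)]
        for _, order in scored[:n_elite]:
            for position, item in enumerate(order):
                freq[index[item]][position] += 1.0 / n_elite
        # Smoothed update: move the sampling distribution toward the elite frequencies.
        prob = [[alpha * f + (1 - alpha) * p for f, p in zip(f_row, p_row)]
                for f_row, p_row in zip(freq, prob)]
    return best, best_score


lists = [["a", "b", "c", "d"], ["a", "b", "d", "c"], ["a", "c", "b", "d"]]
print(ce_aggregate(lists))
```

The smoothing constant `alpha` controls how quickly the net tightens: values close to 1 converge fast but risk locking onto a poor ranking, while smaller values explore longer.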
Affiliation(s)
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, Ohio 43210-1247, USA.
21
Datta D, Zhao H. Statistical methods to infer cooperative binding among transcription factors in Saccharomyces cerevisiae. Bioinformatics 2007; 24:545-52. [PMID: 17989095 DOI: 10.1093/bioinformatics/btm523] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
MOTIVATION Transcription factors regulate transcription in prokaryotes and eukaryotes by binding to specific DNA sequences in the regulatory regions of genes. This regulation usually occurs in a coordinated manner involving multiple transcription factors. Genome-wide location data, also called ChIP-chip data, have enabled researchers to infer the binding sites of individual regulatory proteins. However, current methods for inferring binding sites, such as simple thresholding based on p-values, are not optimal for a number of study objectives, such as combinatorial regulation, and can lead to a loss of information. Hence, there is a need for more efficient statistical methods for analyzing such data. RESULTS We propose using log-linear models to study cooperative binding among transcription factors and have developed an Expectation-Maximization algorithm for statistical inference. Both simulation and real-data studies show that our method outperforms simple thresholding methods. We apply our method to infer the cooperative network of 204 regulators in rich medium, and of a subset of them in four different environmental conditions. Our results indicate that the cooperative network is condition specific: for a set of regulators, the network structure changes under different environmental conditions. AVAILABILITY Our program is available at http://bioinformatics.med.yale.edu/TFcooperativity.
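For intuition about the log-linear formulation, consider two TFs with binary binding calls per promoter. In a saturated log-linear model for the resulting 2×2 table, log μ_ab = λ + λ_A·a + λ_B·b + λ_AB·a·b, the maximum-likelihood interaction term λ_AB equals the log odds ratio, and a positive value indicates co-binding beyond what independence predicts. This sketch works from thresholded calls and therefore skips the EM step that the authors use precisely to avoid thresholding; the pseudo-count is an added assumption.

```python
from math import log, sqrt


def loglinear_interaction(pairs, pseudo=0.5):
    """Interaction term lambda_AB of a saturated log-linear model for two binary binding calls.

    pairs  -- one (a, b) tuple per promoter; a / b = 1 if TF A / TF B is called bound
    pseudo -- count added to every cell to avoid log(0) (an added assumption)
    Returns (lambda_AB, approximate standard error).
    """
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in pairs:
        counts[(a, b)] += 1
    n = {cell: c + pseudo for cell, c in counts.items()}
    lam_ab = log(n[(1, 1)] * n[(0, 0)] / (n[(1, 0)] * n[(0, 1)]))  # log odds ratio
    se = sqrt(sum(1.0 / v for v in n.values()))  # Woolf's large-sample standard error
    return lam_ab, se


# Hypothetical binding calls for 24 promoters.
pairs = [(1, 1)] * 10 + [(0, 0)] * 10 + [(1, 0)] * 2 + [(0, 1)] * 2
print(loglinear_interaction(pairs))
```

With more than two regulators, the same idea extends to higher-dimensional contingency tables in which pairwise interaction terms encode the cooperative network.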
Affiliation(s)
- Debayan Datta
- Department of Biomedical Engineering, Yale University, New Haven, CT 06520, USA
22
Androulakis IP, Yang E, Almon RR. Analysis of time-series gene expression data: methods, challenges, and opportunities. Annu Rev Biomed Eng 2007; 9:205-28. [PMID: 17341157 PMCID: PMC4181347 DOI: 10.1146/annurev.bioeng.9.060906.151904] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.9] [Indexed: 11/09/2022]
Abstract
Monitoring the change in expression patterns over time provides the distinct possibility of unraveling the mechanistic drivers characterizing cellular responses. Gene arrays measuring the level of mRNA expression of thousands of genes simultaneously provide a method of high-throughput data collection necessary for obtaining the scope of data required for understanding the complexities of living organisms. Unraveling the coherent complex structures of transcriptional dynamics is the goal of a large family of computational methods aiming at upgrading the information content of time-course gene expression data. In this review, we summarize the qualitative characteristics of these approaches, discuss the main challenges that this type of complex data present, and, finally, explore the opportunities in the context of developing mechanistic models of cellular response.
Affiliation(s)
- I P Androulakis
- Biomedical Engineering Department, Rutgers University, Piscataway, New Jersey 08854, USA.
23
Hart CE, Mjolsness E, Wold BJ. Connectivity in the yeast cell cycle transcription network: inferences from neural networks. PLoS Comput Biol 2006; 2:e169. [PMID: 17194216 PMCID: PMC1761652 DOI: 10.1371/journal.pcbi.0020169] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 05/10/2006] [Accepted: 10/30/2006] [Indexed: 02/02/2023]
Abstract
A current challenge is to develop computational approaches to infer gene network regulatory relationships based on multiple types of large-scale functional genomic data. We find that single-layer feed-forward artificial neural network (ANN) models can effectively discover gene network structure by integrating global in vivo protein:DNA interaction data (ChIP/Array) with genome-wide microarray RNA data. We test this on the yeast cell cycle transcription network, which is composed of several hundred genes with phase-specific RNA outputs. These ANNs were robust to noise in the data and to a variety of perturbations. They reliably identified and ranked 10 of 12 known major cell cycle factors at the top of a set of 204, based on a sum-of-squared-weights metric. Comparative analysis of motif occurrences among multiple yeast species independently confirmed relationships inferred from the ANN weights analysis. ANN models can capitalize on properties of biological gene networks that other kinds of models do not. ANNs naturally take advantage of patterns of absence, as well as presence, of factor binding associated with specific expression output; they are easily subjected to in silico "mutation" to uncover biological redundancies; and they can use the full range of factor binding values. A prominent feature of the cell cycle ANNs suggested that an analogous property might exist in the biological network. This led to the hypothesis that "network-local discrimination" occurs when regulatory connections (here between MBF and target genes) are explicitly disfavored in one network module (G2), relative to others and to the class of genes outside the mitotic network. If correct, this hypothesis predicts that MBF motifs will be significantly depleted from the discriminated class and that the discrimination will persist through evolution.
Analysis of distantly related Schizosaccharomyces pombe confirmed this, suggesting that network-local discrimination is real and complements well-known enrichment of MBF sites in G1 class genes.
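A single-layer network with a sum-of-squared-weights ranking can be sketched in a few lines. The sketch below assumes one sigmoid output unit per expression class, a cross-entropy loss, and plain stochastic gradient descent; the factor names and the four-gene toy data set are hypothetical, and the authors' architecture and training details may differ.

```python
import math
import random


def train_single_layer(x, y, n_out, epochs=500, lr=0.5, seed=0):
    """Single-layer network: one sigmoid output per expression class, SGD on cross-entropy.

    x -- per-gene binding values for each factor (the network inputs)
    y -- expression class index of each gene
    """
    rng = random.Random(seed)
    n_in = len(x[0])
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            for k in range(n_out):
                z = b[k] + sum(wk * v for wk, v in zip(w[k], xi))
                # Derivative of the cross-entropy loss with respect to z.
                err = 1.0 / (1.0 + math.exp(-z)) - (1.0 if yi == k else 0.0)
                b[k] -= lr * err
                for j in range(n_in):
                    w[k][j] -= lr * err * xi[j]
    return w, b


def rank_factors(w, names):
    """Rank input factors by the sum of their squared weights over all outputs."""
    score = {name: sum(w_k[j] ** 2 for w_k in w) for j, name in enumerate(names)}
    return sorted(names, key=score.get, reverse=True)


# Hypothetical data: factor_A marks class 0, factor_B marks class 1, "noise" is uninformative.
names = ["factor_A", "factor_B", "noise"]
x = [[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1]]
y = [0, 0, 1, 1]
w, b = train_single_layer(x, y, n_out=2)
print(rank_factors(w, names))
```

Because every weight connects one factor directly to one output, squared weights are directly interpretable as factor importance; in silico "mutation" amounts to zeroing a factor's input column and re-checking the predictions.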
Affiliation(s)
- Christopher E Hart
- Division of Biology, California Institute of Technology, Pasadena, California, United States of America
- Eric Mjolsness
- Institute for Genomics and Bioinformatics, School of Information and Computer Science, University of California Irvine, Irvine, California, United States of America
- Biological Network Modeling Center, Beckman Institute, California Institute of Technology, Pasadena, California, United States of America
- Barbara J Wold
- Division of Biology, California Institute of Technology, Pasadena, California, United States of America
- Biological Network Modeling Center, Beckman Institute, California Institute of Technology, Pasadena, California, United States of America
24
Brynildsen MP, Tran LM, Liao JC. A Gibbs sampler for the identification of gene expression and network connectivity consistency. Bioinformatics 2006; 22:3040-6. [PMID: 17060361 DOI: 10.1093/bioinformatics/btl541] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 12/11/2022]
Abstract
MOTIVATION Data from DNA microarrays and ChIP-chip binding assays often form the basis of transcriptional regulatory analyses. However, experimental noise in both data types, combined with environmental dependence and the lack of correlation between binding and regulation in ChIP-chip data, complicates analyses that use these complementary data sources. Therefore, to minimize the impact of these inaccuracies on transcription analyses, it is desirable to identify instances of agreement between gene expression and ChIP-chip data, on the premise that inaccuracies are less likely to be present when separate data sources corroborate each other. Current methods for such identification make key assumptions that limit their applicability and/or yield high false-positive and false-negative rates. The goal of this work was to develop a method that requires minimal assumptions, and is thus widely applicable, and that can identify agreement between gene expression and ChIP-chip data at a higher confidence level than current methods. RESULTS We demonstrate in Saccharomyces cerevisiae that currently available ChIP-chip binding data explain microarray data from a variety of environments only as well as randomized networks with the same connectivity density do. This suggests a high degree of inconsistency between the two data types and illustrates the need for a method that can identify consistency between the two data sources. Here we have developed a Gibbs sampling technique to identify genes whose expression and ChIP-chip binding data are mutually consistent. Compared with current methods that could perform the same task, the Gibbs sampling method developed here outperforms them at high levels (>50%) of transcription network and gene expression error, while performing similarly at lower levels.
Using this technique, we show that on average 73% more gene expression features can be captured per gene as compared to the unfiltered use of gene expression and ChIP-chip-derived network connectivity data. It is important to note that the method described here can be generalized to other transcription connectivity data (e.g. sequence analysis, etc.). AVAILABILITY Our algorithm is available on request from the authors and soon to be posted on the web. See author's homepage for details, http://www.seas.ucla.edu/~liaoj/
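To show the mechanics of a Gibbs sampler for flagging mutually consistent genes, the sketch below assumes a hypothetical per-gene agreement score between expression and ChIP-chip data, modelled as a two-component Gaussian mixture with known means and a shared standard deviation, and a Beta(1, 1) prior on the fraction of consistent genes. It alternates between sampling each gene's consistency label and the mixing fraction; this is a generic mixture sampler, not the authors' algorithm.

```python
import math
import random


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def gibbs_consistency(scores, mu_cons=1.0, mu_incons=0.0, sd=0.3,
                      iters=2000, burn_in=500, seed=0):
    """Posterior probability that each gene is 'consistent' (hypothetical mixture model).

    scores -- hypothetical expression/ChIP-chip agreement score per gene
    Component means, sd, and the Beta(1, 1) prior are assumptions.
    """
    rng = random.Random(seed)
    n = len(scores)
    pi = 0.5          # current fraction of consistent genes
    z = [0] * n       # current consistency labels
    tally = [0] * n   # how often each gene was labelled consistent after burn-in
    for it in range(iters):
        # Step 1: sample each label z_i given pi and the gene's score.
        for i, s in enumerate(scores):
            p1 = pi * normal_pdf(s, mu_cons, sd)
            p0 = (1 - pi) * normal_pdf(s, mu_incons, sd)
            z[i] = 1 if rng.random() < p1 / (p1 + p0) else 0
        # Step 2: sample pi given the labels (Beta-Bernoulli conjugacy).
        k = sum(z)
        pi = rng.betavariate(1 + k, 1 + n - k)
        if it >= burn_in:
            for i in range(n):
                tally[i] += z[i]
    kept = iters - burn_in
    return [t / kept for t in tally]


# Hypothetical agreement scores for six genes.
scores = [0.95, 1.1, 1.05, 0.02, -0.1, 0.05]
print(gibbs_consistency(scores))
```

In a fuller model the component parameters would also be sampled rather than fixed, but the alternating conditional draws shown here are the defining step of any Gibbs sampler.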
Affiliation(s)
- Mark P Brynildsen
- Department of Chemical and Biomolecular Engineering, University of California Los Angeles, CA 90095, USA