101
Chen L, Cai C, Chen V, Lu X. Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model. BMC Bioinformatics 2016; 17 Suppl 1:9. [PMID: 26818848 PMCID: PMC4895523 DOI: 10.1186/s12859-015-0852-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Indexed: 11/10/2022]
Abstract
Background: A living cell has a complex, hierarchically organized signaling system that encodes and assimilates diverse environmental and intracellular signals, and it further transmits signals that control cellular responses, including a tightly controlled transcriptional program. An important and yet challenging task in systems biology is to reconstruct the cellular signaling system in a data-driven manner. In this study, we investigate the utility of deep hierarchical neural networks in learning and representing the hierarchical organization of yeast transcriptomic machinery.
Results: We have designed a sparse autoencoder model consisting of a layer of observed variables and four layers of hidden variables. We applied the model to over a thousand yeast microarrays to learn the encoding system of yeast transcriptomic machinery. After model selection, we evaluated whether the trained models captured biologically sensible information. We show that the latent variables in the first hidden layer correctly captured the signals of yeast transcription factors (TFs), obtaining a close to one-to-one mapping between latent variables and TFs. We further show that genes regulated by latent variables at higher hidden layers are often involved in a common biological process, and the hierarchical relationships between latent variables conform to existing knowledge. Finally, we show that information captured by the latent variables provides more abstract and concise representations of each microarray, enabling the identification of better separated clusters in comparison to gene-based representation.
Conclusions: Contemporary deep hierarchical latent variable models, such as the autoencoder, can be used to partially recover the organization of transcriptomic machinery.
Affiliation(s)
- Lujia Chen
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, 15237, Pittsburgh, PA, USA.
- Chunhui Cai
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, 15237, Pittsburgh, PA, USA.
- Vicky Chen
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, 15237, Pittsburgh, PA, USA.
- Xinghua Lu
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Blvd, 15237, Pittsburgh, PA, USA.
102
Reconstruction of temporal activity of microRNAs from gene expression data in breast cancer cell line. BMC Genomics 2015; 16:1077. [PMID: 26763900 PMCID: PMC4712512 DOI: 10.1186/s12864-015-2260-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/02/2015] [Accepted: 11/30/2015] [Indexed: 12/20/2022]
Abstract
Background: MicroRNAs (miRNAs) are small non-coding RNAs that regulate genes at the post-transcriptional level in a spatiotemporal manner. Several miRNAs are identified as prognostic and diagnostic markers in many human cancers. Estimation of the temporal activities of the miRNAs is an important step toward understanding the complex interactions of these important regulatory elements with transcription factors (TFs) and target genes (TGs). However, current research on miRNA activities excludes network dynamics from the studies, disregarding the important element of time in the regulatory network analysis.
Results: In the current study, we combined experimentally verified miRNA-TG interactions with breast cancer microarray TG expression data to identify key miRNAs and compute their temporal activity using network component analysis (NCA). The computed activities showed that miRNAs were regulated in a time-dependent manner. Our results allowed us to construct a synergistic network of miRNAs using the computed miRNA activities and their shared regulation of TGs. We further extended this network by incorporating miRNA-TG, miRNA-TF, TF-miRNA and TF-TG regulations in the context of breast cancer. Our integrated network identified several miRNAs known to be involved in breast cancer regulation and revealed several novel miRNAs. Our further analysis detected substantial involvement of the miRNAs miR-324, miR-93, miR-615 and miR-1 in breast cancer, which was not known previously. Next, combining our integrated networks with functional annotation of differentially expressed genes resulted in new sub-networks. These sub-networks allowed us to identify the key miRNAs and their interactions with TFs and TGs of several biological processes involved in breast cancer. The identified markers are validated for their potential as prognostic markers for breast cancer through survival analysis.
Conclusions: Our dynamical analysis of the miRNA interactions greatly helps to discover new network-based markers, and is highly applicable (but not limited) to cancer research.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2260-3) contains supplementary material, which is available to authorized users.
103
Barah P, B N MN, Jayavelu ND, Sowdhamini R, Shameer K, Bones AM. Transcriptional regulatory networks in Arabidopsis thaliana during single and combined stresses. Nucleic Acids Res 2015; 44:3147-64. [PMID: 26681689 PMCID: PMC4838348 DOI: 10.1093/nar/gkv1463] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 02/23/2015] [Accepted: 11/28/2015] [Indexed: 11/25/2022]
Abstract
Differentially evolved responses to various stress conditions in plants are controlled by complex regulatory circuits of transcriptional activators, and repressors, such as transcription factors (TFs). To understand the general and condition-specific activities of the TFs and their regulatory relationships with the target genes (TGs), we have used a homogeneous stress gene expression dataset generated on ten natural ecotypes of the model plant Arabidopsis thaliana, during five single and six combined stress conditions. Knowledge-based profiles of binding sites for 25 stress-responsive TF families (187 TFs) were generated and tested for their enrichment in the regulatory regions of the associated TGs. Condition-dependent regulatory sub-networks have shed light on the differential utilization of the underlying network topology, by stress-specific regulators and multifunctional regulators. The multifunctional regulators maintain the core stress response processes while the transient regulators confer the specificity to certain conditions. Clustering patterns of transcription factor binding sites (TFBS) have reflected the combinatorial nature of transcriptional regulation, and suggested the putative role of the homotypic clusters of TFBS towards maintaining transcriptional robustness against cis-regulatory mutations to facilitate the preservation of stress response processes. The Gene Ontology enrichment analysis of the TGs reflected sequential regulation of stress response mechanisms in plants.
Affiliation(s)
- Pankaj Barah
  - Cell, Molecular Biology and Genomics Group, Department of Biology, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Mahantesha Naika B N
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK campus, Bangalore 560 065, India
- Naresh Doni Jayavelu
  - Department of Chemical Engineering, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Ramanathan Sowdhamini
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK campus, Bangalore 560 065, India
- Khader Shameer
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK campus, Bangalore 560 065, India
- Atle M Bones
  - Cell, Molecular Biology and Genomics Group, Department of Biology, Norwegian University of Science and Technology, Trondheim N-7491, Norway
104
Arrieta-Ortiz ML, Hafemeister C, Bate AR, Chu T, Greenfield A, Shuster B, Barry SN, Gallitto M, Liu B, Kacmarczyk T, Santoriello F, Chen J, Rodrigues CDA, Sato T, Rudner DZ, Driks A, Bonneau R, Eichenberger P. An experimentally supported model of the Bacillus subtilis global transcriptional regulatory network. Mol Syst Biol 2015; 11:839. [PMID: 26577401 PMCID: PMC4670728 DOI: 10.15252/msb.20156236] [Citation(s) in RCA: 138] [Impact Index Per Article: 15.3] [Indexed: 12/21/2022]
Abstract
Organisms from all domains of life use gene regulation networks to control cell growth, identity, function, and responses to environmental challenges. Although accurate global regulatory models would provide critical evolutionary and functional insights, they remain incomplete, even for the best studied organisms. Efforts to build comprehensive networks are confounded by challenges including network scale, degree of connectivity, complexity of organism–environment interactions, and difficulty of estimating the activity of regulatory factors. Taking advantage of the large number of known regulatory interactions in Bacillus subtilis and two transcriptomics datasets (including one with 38 separate experiments collected specifically for this study), we use a new combination of network component analysis and model selection to simultaneously estimate transcription factor activities and learn a substantially expanded transcriptional regulatory network for this bacterium. In total, we predict 2,258 novel regulatory interactions and recall 74% of the previously known interactions. We obtained experimental support for 391 (out of 635 evaluated) novel regulatory edges (62% accuracy), thus significantly increasing our understanding of various cell processes, such as spore formation.
Affiliation(s)
- Mario L Arrieta-Ortiz
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Christoph Hafemeister
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Ashley Rose Bate
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Timothy Chu
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Alex Greenfield
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Bentley Shuster
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Samantha N Barry
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Matthew Gallitto
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Brian Liu
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Thadeous Kacmarczyk
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Francis Santoriello
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Jie Chen
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- Tsutomu Sato
  - Department of Frontier Bioscience, Hosei University, Koganei, Tokyo, Japan
- David Z Rudner
  - Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA, USA
- Adam Driks
  - Department of Microbiology and Immunology, Stritch School of Medicine, Loyola University Chicago, Maywood, IL, USA
- Richard Bonneau
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
  - Courant Institute of Mathematical Science, Computer Science Department, New York, NY, USA
  - Simons Foundation, Simons Center for Data Analysis, New York, NY, USA
- Patrick Eichenberger
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
105
Wang X, Alshawaqfeh M, Dang X, Wajid B, Noor A, Qaraqe M, Serpedin E. An Overview of NCA-Based Algorithms for Transcriptional Regulatory Network Inference. Microarrays (Basel) 2015; 4:596-617. [PMID: 27600242 PMCID: PMC4996402 DOI: 10.3390/microarrays4040596] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/01/2015] [Revised: 10/07/2015] [Accepted: 11/11/2015] [Indexed: 01/08/2023]
Abstract
In systems biology, the regulation of gene expressions involves a complex network of regulators. Transcription factors (TFs) represent an important component of this network: they are proteins that control which genes are turned on or off in the genome by binding to specific DNA sequences. Transcription regulatory networks (TRNs) describe gene expressions as a function of regulatory inputs specified by interactions between proteins and DNA. A complete understanding of TRNs helps to predict a variety of biological processes and to diagnose, characterize and eventually develop more efficient therapies. Recent advances in biological high-throughput technologies, such as DNA microarray data and next-generation sequence (NGS) data, have made the inference of transcription factor activities (TFAs) and TF-gene regulations possible. Network component analysis (NCA) represents an efficient computational framework for TRN inference from the information provided by microarrays, ChIP-on-chip and the prior information about TF-gene regulation. However, NCA suffers from several shortcomings. Recently, several algorithms based on the NCA framework have been proposed to overcome these shortcomings. This paper first overviews the computational principles behind NCA, and then, it surveys the state-of-the-art NCA-based algorithms proposed in the literature for TRN reconstruction.
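For orientation, the bilinear model underlying NCA can be written compactly. The notation below ($E$, $A$, $P$, $\Gamma$, and the dimensions $N$, $M$, $L$) follows common usage in the NCA literature and is an assumption here, since the abstract itself defines no symbols; it is a sketch of the general framework, not of any specific algorithm surveyed in the paper.

```latex
% Assumed notation: N genes, M samples, L transcription factors.
% E: (log) expression data, A: TF-gene connectivity strengths,
% P: transcription factor activities (TFAs), Gamma: noise.
\[
  E = A P + \Gamma, \qquad
  E \in \mathbb{R}^{N \times M},\quad
  A \in \mathbb{R}^{N \times L},\quad
  P \in \mathbb{R}^{L \times M}
\]
\[
  \min_{A,\,P}\; \lVert E - A P \rVert_F^{2}
  \quad \text{subject to} \quad
  A_{ij} = 0 \ \text{whenever TF } j \text{ is not known to regulate gene } i
\]
```

The prior TF-gene information (for example from ChIP-on-chip) enters only through the fixed zero pattern of $A$; the rows of $P$ are the inferred TFAs.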
Affiliation(s)
- Xu Wang
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Mustafa Alshawaqfeh
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Xuan Dang
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Bilal Wajid
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Amina Noor
  - Institute of Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA.
- Marwa Qaraqe
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
- Erchin Serpedin
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
106
Jayavelu ND, Aasgaard LS, Bar N. Iterative sub-network component analysis enables reconstruction of large scale genetic networks. BMC Bioinformatics 2015; 16:366. [PMID: 26537518 PMCID: PMC4634733 DOI: 10.1186/s12859-015-0768-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/23/2015] [Accepted: 10/09/2015] [Indexed: 11/28/2022]
Abstract
Background: Network component analysis (NCA) has become a popular tool to understand complex regulatory networks. The method uses high-throughput gene expression data and a priori topology to reconstruct transcription factor activity profiles. Current NCA algorithms are constrained by several conditions posed on the network topology, to guarantee unique reconstruction (termed compliancy). However, the restrictions these conditions pose are not necessarily true from a biological perspective and they force network size reduction, pruning potentially important components.
Results: To address this, we developed a novel method, Iterative Sub-Network Component Analysis (ISNCA), for reconstructing networks of any size. By dividing the initial network into smaller, compliant subnetworks, the algorithm first predicts the reconstruction of each subnetwork using standard NCA algorithms. It then subtracts from the reconstruction the contribution of the shared components from the other subnetwork. We tested the ISNCA on real, large datasets using various NCA algorithms. The size of the networks we tested and the accuracy of the reconstruction increased significantly. Importantly, FOXA1, ATF2, ATF3 and many other known key regulators in breast cancer could not be incorporated by any NCA algorithm because of the necessary conditions. However, their temporal activities could be reconstructed by our algorithm, and therefore their involvement in breast cancer could be analyzed.
Conclusions: Our framework enables reconstruction of large gene expression data networks, without reducing their size or pruning potentially important components, and at the same time rendering the results more biologically plausible. Our ISNCA method is not only suitable for prediction of key regulators in cancer studies, but it can be applied to any high-throughput gene expression data.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0768-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Naresh Doni Jayavelu
  - Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Salandsvei 4, Trondheim, Norway.
- Lasse S Aasgaard
  - Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Salandsvei 4, Trondheim, Norway.
- Nadav Bar
  - Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Salandsvei 4, Trondheim, Norway.
107
Pseudo-transition Analysis Identifies the Key Regulators of Dynamic Metabolic Adaptations from Steady-State Data. Cell Syst 2015; 1:270-82. [DOI: 10.1016/j.cels.2015.09.008] [Citation(s) in RCA: 108] [Impact Index Per Article: 12.0] [Received: 03/10/2015] [Revised: 08/12/2015] [Accepted: 09/30/2015] [Indexed: 11/20/2022]
108
Lv QY, Wan B, Guo LH, Yang Y, Ren XM, Zhang H. In vivo immunotoxicity of perfluorooctane sulfonate in BALB/c mice: Identification of T-cell receptor and calcium-mediated signaling pathway disruption through gene expression profiling of the spleen. Chem Biol Interact 2015; 240:84-93. [PMID: 26300304 DOI: 10.1016/j.cbi.2015.07.015] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/17/2015] [Revised: 06/05/2015] [Accepted: 07/30/2015] [Indexed: 11/16/2022]
Abstract
Perfluorooctane sulfonate (PFOS) is a persistent organic pollutant that is used worldwide and is continuously being detected in biota and the environment, thus presenting potential threats to the ecosystem and human health. Although PFOS is highly immunotoxic, its underlying molecular mechanisms remain largely unknown. The present study examined PFOS-induced immunotoxicity in the mouse spleen and explored its underlying mechanisms by gene expression profiling. Oral exposure of male BALB/c mice for three weeks followed by one-week recovery showed that a 10 mg/kg/day PFOS exposure damaged the splenic architecture, inhibited T-cell proliferation in response to mitogen, and increased the percentages of T helper (CD3(+)CD4(+)) and cytotoxic T (CD3(+)CD8(+)) cells, despite the decrease in the absolute number of these cells. A delayed type of PFOS immunotoxicity was observed, which mainly occurred during the recovery period. Global gene expression profiling of mouse spleens and QRT-PCR analyses suggest that PFOS inhibited the expression of genes involved in cell cycle regulation and NRF2-mediated oxidative stress response, and upregulated those in TCR signaling, calcium signaling, and p38/MAPK signaling pathways. Western blot analysis confirmed that the expressions of CAMK4, THEMIS, and CD3G, which were involved in the upregulated pathways, were induced upon PFOS exposure. Acute PFOS exposure modulated calcium homoeostasis in splenocytes. These results indicate that PFOS exposure can activate TCR signaling and calcium ion influx, which provides a clue for the potential mechanism of PFOS immunotoxicity. The signaling pathways altered by PFOS treatment, as revealed in the present study, might facilitate a better understanding of PFOS immunotoxicity and explain the association between immune disease and PFOS exposure.
Affiliation(s)
- Qi-Yan Lv
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 10085, China
- Bin Wan
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 10085, China.
- Liang-Hong Guo
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 10085, China.
- Yu Yang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 10085, China
- Xiao-Min Ren
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 10085, China
- Hui Zhang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 10085, China
109
Wu WS. A Computational Method for Identifying Yeast Cell Cycle Transcription Factors. Methods Mol Biol 2015; 1342:209-19. [PMID: 26254926 DOI: 10.1007/978-1-4939-2957-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
The eukaryotic cell cycle is a complex process and is precisely regulated at many levels. Many genes specific to the cell cycle are regulated transcriptionally and are expressed just before they are needed. To understand the cell cycle process, it is important to identify the cell cycle transcription factors (TFs) that regulate the expression of cell cycle-regulated genes. Here, we describe a computational method to identify cell cycle TFs in yeast by integrating current ChIP-chip, mutant, transcription factor-binding site (TFBS), and cell cycle gene expression data. For each identified cell cycle TF, our method also assigned specific cell cycle phases in which the TF functions and identified the time lag for the TF to exert regulatory effects on its target genes. Moreover, our method can identify novel cell cycle-regulated genes as a by-product.
Affiliation(s)
- Wei-Sheng Wu
  - Department of Electrical Engineering, National Cheng Kung University, No. 1 Daxue Road, East District, Tainan City, 701, Taiwan
110
Yang H, Kang K, Cheng C, Mamillapalli R, Taylor HS. Integrative Analysis Reveals Regulatory Programs in Endometriosis. Reprod Sci 2015; 22:1060-72. [PMID: 26134036 DOI: 10.1177/1933719115592709] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 11/16/2022]
Abstract
Endometriosis is a common gynecological disease found in approximately 10% of reproductive-age women. Gene expression analysis has been performed to explore alterations in gene expression associated with endometriosis; however, the underlying transcription factors (TFs) governing such expression changes have not been investigated in a systematic way. In this study, we propose a method to integrate gene expression with TF binding data and protein-protein interactions to construct an integrated regulatory network (IRN) for endometriosis. The IRN has shown that the most regulated gene in endometriosis is RUNX1, which is targeted by 14 of 26 TFs also involved in endometriosis. Using 2 published cohorts, GSE7305 (Hover, n = 20) and GSE7307 (Roth, n = 36) from the Gene Expression Omnibus database, we identified a network of TFs, which bind to target genes that are differentially expressed in endometriosis. Enrichment analysis based on the hypergeometric distribution allowed us to predict the TFs involved in endometriosis (n = 40). This included known TFs such as androgen receptor (AR) and critical factors in the pathology of endometriosis, estrogen receptor α, and estrogen receptor β. We also identified several new ones from which we selected FOXA2 and TFAP2C, and their regulation was confirmed by quantitative real-time polymerase chain reaction and immunohistochemistry (IHC). Further, our analysis revealed that the function of AR and p53 in endometriosis is regulated by posttranscriptional changes and not by differential gene expression. Our integrative analysis provides new insights into the regulatory programs involved in endometriosis.
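The hypergeometric enrichment test mentioned in the abstract can be sketched as follows. This is a generic upper-tail test and not the authors' actual pipeline; the function name `hypergeom_enrichment_pvalue`, its parameters, and the example counts are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population: int, targets: int,
                                drawn: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: all genes considered (hypothetical background set)
    targets:    genes bound by one TF within that background
    drawn:      differentially expressed genes
    overlap:    differentially expressed genes that are also TF targets
    """
    if not (0 <= targets <= population and 0 <= drawn <= population):
        raise ValueError("targets and drawn must lie in [0, population]")
    upper = min(targets, drawn)
    total = comb(population, drawn)
    # math.comb returns 0 when k > n, so impossible terms vanish on their own.
    tail = sum(
        comb(targets, i) * comb(population - targets, drawn - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p = hypergeom_enrichment_pvalue(population=20000, targets=300,
                                drawn=500, overlap=20)
print(f"enrichment p-value: {p:.3g}")
```

A small p-value suggests that the TF's targets are over-represented among the differentially expressed genes; in practice the values would also need multiple-testing correction across TFs.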
Affiliation(s)
- Huan Yang
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT, USA
  - Department of Gynecology, Minimally Invasive Gynecology Center, Beijing Obstetrics and Gynecology Hospital Capital Medical University, Beijing, China
- Kai Kang
  - Department of Gynecology, Minimally Invasive Gynecology Center, Beijing Obstetrics and Gynecology Hospital Capital Medical University, Beijing, China
- Chao Cheng
  - Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Ramanaiah Mamillapalli
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT, USA
- Hugh S Taylor
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, Yale School of Medicine, New Haven, CT, USA
111
A Semiquantitative Framework for Gene Regulatory Networks: Increasing the Time and Quantitative Resolution of Boolean Networks. PLoS One 2015; 10:e0130033. [PMID: 26067297 PMCID: PMC4489432 DOI: 10.1371/journal.pone.0130033] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 01/12/2015] [Accepted: 05/15/2015] [Indexed: 12/29/2022]
Abstract
Boolean models have been instrumental in predicting general features of gene networks and more recently also as explorative tools in specific biological applications. In this study we introduce a basic quantitative and a limited time resolution to a discrete (Boolean) framework. Quantitative resolution is improved through the employ of normalized variables in unison with an additive approach. Increased time resolution stems from the introduction of two distinct priority classes. Through the implementation of a previously published chondrocyte network and T helper cell network, we show that this addition of quantitative and time resolution broadens the scope of biological behaviour that can be captured by the models. Specifically, the quantitative resolution readily allows models to discern qualitative differences in dosage response to growth factors. The limited time resolution, in turn, can influence the reachability of attractors, delineating the likely long term system behaviour. Importantly, the information required for implementation of these features, such as the nature of an interaction, is typically obtainable from the literature. Nonetheless, a trade-off is always present between additional computational cost of this approach and the likelihood of extending the model’s scope. Indeed, in some cases the inclusion of these features does not yield additional insight. This framework, incorporating increased and readily available time and semi-quantitative resolution, can help in substantiating the litmus test of dynamics for gene networks, firstly by excluding unlikely dynamics and secondly by refining falsifiable predictions on qualitative behaviour.
112
Gehan MA, Greenham K, Mockler TC, McClung CR. Transcriptional networks-crops, clocks, and abiotic stress. Curr Opin Plant Biol 2015; 24:39-46. [PMID: 25646668 DOI: 10.1016/j.pbi.2015.01.004] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 11/12/2014] [Revised: 01/07/2015] [Accepted: 01/08/2015] [Indexed: 05/20/2023]
Abstract
Several factors affect the yield potential and geographical range of crops including the circadian clock, water availability, and seasonal temperature changes. In order to sustain and increase plant productivity on marginal land in the face of both biotic and abiotic stresses, we need to more efficiently generate stress-resistant crops through marker-assisted breeding, genetic modification, and new genome-editing technologies. To leverage these strategies for producing the next generation of crops, future transcriptomic data acquisition should be pursued with an appropriate temporal design and analyzed with a network-centric approach. The following review focuses on recent developments in abiotic stress transcriptional networks in economically important crops and will highlight the utility of correlation-based network analysis and applications.
Affiliation(s)
- Malia A Gehan
  - Donald Danforth Plant Science Center, St. Louis, MO 63132, United States
- Kathleen Greenham
  - Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States
- Todd C Mockler
  - Donald Danforth Plant Science Center, St. Louis, MO 63132, United States
- C Robertson McClung
  - Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States.
113
Chung NC, Huang YH, Chang CH, Liao JC, Yang CH, Chen CC, Liu IY. Behavior training reverses asymmetry in hippocampal transcriptome of the cav3.2 knockout mice. PLoS One 2015; 10:e0118832. [PMID: 25768289 PMCID: PMC4358833 DOI: 10.1371/journal.pone.0118832] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/05/2013] [Accepted: 01/23/2015] [Indexed: 12/13/2022]
Abstract
Homozygous Cav3.2 knockout mice, which are defective in the pore-forming subunit of a low voltage activated T-type calcium channel, have been documented to show impaired maintenance of late-phase long-term potentiation (L-LTP) and defective retrieval of context-associated fear memory. To investigate the role of Cav3.2 in global gene expression, we performed a microarray transcriptome study on the hippocampi of the Cav3.2-/- mice and their wild-type littermates, either naïve (untrained) or trace fear conditioned. We found a significant left-right asymmetric effect on the hippocampal transcriptome caused by the Cav3.2 knockout. Between the naïve Cav3.2-/- and the naïve wild-type mice, 3522 differentially expressed genes (DEGs) were found in the left hippocampus, but only 4 DEGs were found in the right hippocampus. Remarkably, the effect of Cav3.2 knockout was partially reversed by trace fear conditioning. The number of DEGs in the left hippocampus was reduced to 6 in the Cav3.2 knockout mice after trace fear conditioning, compared with the wild-type naïve mice. To our knowledge, these results demonstrate for the first time the asymmetric effects of the Cav3.2 knockout on the hippocampal transcriptome and their partial reversal by behavior training.
Affiliation(s)
- Ni-Chun Chung
- Department of Molecular Biology and Human Genetics, Tzu Chi University, Hualien, Taiwan
- Ying-Hsueh Huang
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Chuan-Hsiung Chang
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- James C. Liao
- Department of Chemical and Biomolecular Engineering, University of California Los Angeles, Los Angeles, California, United States of America
- Chih-Hsien Yang
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Chien-Chang Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ingrid Y. Liu
- Department of Molecular Biology and Human Genetics, Tzu Chi University, Hualien, Taiwan
- Institute of Medical Sciences, Tzu Chi University, Hualien, Taiwan

114
Zhang X, Zhao J, Hao JK, Zhao XM, Chen L. Conditional mutual inclusive information enables accurate quantification of associations in gene regulatory networks. Nucleic Acids Res 2015; 43:e31. [PMID: 25539927 PMCID: PMC4357691 DOI: 10.1093/nar/gku1315] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 09/15/2014] [Revised: 12/03/2014] [Accepted: 12/05/2014] [Indexed: 11/13/2022] Open
Abstract
Mutual information (MI), a quantity describing the nonlinear dependence between two random variables, has been widely used to construct gene regulatory networks (GRNs). Despite its good performance, MI cannot separate the direct regulations from indirect ones among genes. Although the conditional mutual information (CMI) is able to identify the direct regulations, it generally underestimates the regulation strength, i.e. it may result in false negatives when inferring gene regulations. In this work, to overcome the problems, we propose a novel concept, namely conditional mutual inclusive information (CMI2), to describe the regulations between genes. Furthermore, with CMI2, we develop a new approach, namely CMI2NI (CMI2-based network inference), for reverse-engineering GRNs. In CMI2NI, CMI2 is used to quantify the mutual information between two genes given a third one through calculating the Kullback-Leibler divergence between the postulated distributions of including and excluding the edge between the two genes. The benchmark results on the GRNs from DREAM challenge as well as the SOS DNA repair network in Escherichia coli demonstrate the superior performance of CMI2NI. Specifically, even for gene expression data with small sample size, CMI2NI can not only infer the correct topology of the regulation networks but also accurately quantify the regulation strength between genes. As a case study, CMI2NI was also used to reconstruct cancer-specific GRNs using gene expression data from The Cancer Genome Atlas (TCGA). CMI2NI is freely accessible at http://www.comp-sysbio.org/cmi2ni.
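The contrast between MI and CMI drawn in this abstract can be illustrated with a small sketch. It is not the CMI2 measure or the CMI2NI software; it assumes jointly Gaussian variables, for which MI and CMI have closed forms in terms of the correlation and the partial correlation. The toy chain X -> Z -> Y and all function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def gaussian_mi(x, y):
    """MI(X;Y) in nats, assuming X and Y are jointly Gaussian."""
    r = pearson(x, y)
    return -0.5 * math.log(1.0 - r * r)


def gaussian_cmi(x, y, z):
    """CMI(X;Y|Z) in nats under the same Gaussian assumption,
    computed from the partial correlation of X and Y given Z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    partial = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return -0.5 * math.log(1.0 - partial * partial)


def simulate_chain(n=2000, seed=0):
    """Hypothetical toy data: a chain X -> Z -> Y with no direct X -> Y edge."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
    y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
    return x, y, z


x, y, z = simulate_chain()
mi_xy = gaussian_mi(x, y)
cmi_xy_given_z = gaussian_cmi(x, y, z)
print(f"MI(X;Y) = {mi_xy:.3f} nats, CMI(X;Y|Z) = {cmi_xy_given_z:.3f} nats")
```

In this toy chain the MI between X and Y is clearly positive even though the dependence is indirect, while conditioning on Z drives the CMI toward zero. The underestimation problem that motivates CMI2 is not reproduced here.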
Affiliation(s)
- Xiujun Zhang
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Department of Mathematics, Xinyang Normal University, Xinyang 464000, China
- School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore 637459, Singapore
- Juan Zhao
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jin-Kao Hao
- LERIA, Department of Computer Science, University of Angers, Angers 49045, France
- Xing-Ming Zhao
- Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Luonan Chen
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Collaborative Research Center for Innovative Mathematical Modelling, Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan

115
Chen YH, Yang CD, Tseng CP, Huang HD, Ho SY. GeNOSA: inferring and experimentally supporting quantitative gene regulatory networks in prokaryotes. Bioinformatics 2015; 31:2151-8. [DOI: 10.1093/bioinformatics/btv075] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/19/2014] [Accepted: 01/30/2015] [Indexed: 11/14/2022] Open

116
Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet 2015; 16:133-45. [PMID: 25628217 DOI: 10.1038/nrg3833] [Citation(s) in RCA: 759] [Impact Index Per Article: 84.3] [Indexed: 12/12/2022]
Abstract
The development of high-throughput RNA sequencing (RNA-seq) at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the study of global patterns of stochastic gene expression. Alongside the technological breakthroughs that have facilitated the large-scale generation of single-cell transcriptomic data, it is important to consider the specific computational and analytical challenges that still have to be overcome. Although some tools for analysing RNA-seq data from bulk cell populations can be readily applied to single-cell RNA-seq data, many new computational strategies are required to fully exploit this data type and to enable a comprehensive yet detailed study of gene expression at the single-cell level.
Affiliation(s)
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sarah A Teichmann
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- John C Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK

117
Hu H, Dai Y. A model-based approach to transcription regulatory network reconstruction from time-course gene expression data. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2015; 2014:4767-70. [PMID: 25571058 DOI: 10.1109/embc.2014.6944690] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Time-course gene expression profiling provides valuable data on the dynamic behavior of cellular responses to external stimulation. Investigation of transcription factors (TFs) that regulate co-expressed genes in a dynamic process can yield insights into the underlying molecular mechanisms. As ChIP-seq technology is only suitable for a fraction of TFs in mammalian organisms, the computational identification of relevant TFs remains critical. We propose a regression-based model to infer the functional binding sites of TFs from time-course gene expression profiles. Our approach incorporates an association strength for each potential TF and target gene pair based on computational analysis of binding sites in promoter sequences of co-expressed genes. Our model further uses the Lasso-penalized technique to search for the most informative TF-target pairs. The application of our method to a gene expression study on E2-induced apoptosis in a variant of MCF-7 cells revealed that the findings are biologically meaningful.
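The role of the Lasso penalty in selecting a few informative predictors can be sketched as follows. This is a generic coordinate-descent Lasso, not the authors' model: the objective (1/2n)·||y − Xβ||² + λ·||β||₁, the absence of an intercept, the toy data, and every name in the code are assumptions made for illustration. Columns of X stand in for candidate TF association features.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Hypothetical toy data: 5 candidate predictors, only columns 0 and 2 matter.
rng = random.Random(1)
n_samples, n_features = 200, 5
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [2.0 * row[0] - 3.0 * row[2] + 0.1 * rng.gauss(0, 1) for row in X]

beta = lasso_cd(X, y, lam=0.1)
print([round(b, 3) for b in beta])
```

The soft-threshold step is what sets weak coefficients exactly to zero, so the surviving nonzero entries play the role of the selected TF-target pairs.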

118
Srihari S, Madhamshettiwar PB, Song S, Liu C, Simpson PT, Khanna KK, Ragan MA. Complex-based analysis of dysregulated cellular processes in cancer. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 4:S1. [PMID: 25521701 PMCID: PMC4290683 DOI: 10.1186/1752-0509-8-s4-s1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
Abstract
Background Differential expression analysis of (individual) genes is often used to study their roles in diseases. However, diseases such as cancer are a result of the combined effect of multiple genes. Gene products such as proteins seldom act in isolation, but instead constitute stable multi-protein complexes performing dedicated functions. Therefore, complexes aggregate the effect of individual genes (proteins) and can be used to gain a better understanding of cancer mechanisms. Here, we observe that complexes show considerable changes in their expression, in turn directed by the concerted action of transcription factors (TFs), across cancer conditions. We seek to gain novel insights into cancer mechanisms through a systematic analysis of complexes and their transcriptional regulation. Results We integrated large-scale protein-interaction (PPI) and gene-expression datasets to identify complexes that exhibit significant changes in their expression across different conditions in cancer. We devised a log-linear model to relate these changes to the differential regulation of complexes by TFs. The application of our model on two case studies involving pancreatic and familial breast tumour conditions revealed: (i) complexes in core cellular processes, especially those responsible for maintaining genome stability and cell proliferation (e.g. DNA damage repair and cell cycle) show considerable changes in expression; (ii) these changes include decrease and countering increase for different sets of complexes indicative of compensatory mechanisms coming into play in tumours; and (iii) TFs work in cooperative and counteractive ways to regulate these mechanisms. Such aberrant complexes and their regulating TFs play vital roles in the initiation and progression of cancer. Conclusions Complexes in core cellular processes display considerable decreases and countering increases in expression, strongly reflective of compensatory mechanisms in cancer. 
These changes are directed by the concerted action of cooperative and counteractive TFs. Our study highlights the roles of these complexes and TFs and presents several case studies of compensatory processes, thus providing novel insights into cancer mechanisms.

119
TimeXNet: identifying active gene sub-networks using time-course gene expression profiles. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 4:S2. [PMID: 25522063 PMCID: PMC4290689 DOI: 10.1186/1752-0509-8-s4-s2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022]
Abstract
Background Time-course gene expression profiles are frequently used to provide insight into the changes in cellular state over time and to infer the molecular pathways involved. When combined with large-scale molecular interaction networks, such data can provide information about the dynamics of cellular response to stimulus. However, few tools are currently available to predict a single active gene sub-network from time-course gene expression profiles. Results We introduce a tool, TimeXNet, which identifies active gene sub-networks with temporal paths using time-course gene expression profiles in the context of a weighted gene regulatory and protein-protein interaction network. TimeXNet uses a specialized form of the network flow optimization approach to identify the most probable paths connecting the genes with significant changes in expression at consecutive time intervals. TimeXNet has been extensively evaluated for its ability to predict novel regulators and their associated pathways within active gene sub-networks in the mouse innate immune response and the yeast osmotic stress response. Compared to other similar methods, TimeXNet identified up to 50% more novel regulators from independent experimental datasets. It predicted paths within a greater number of known pathways with longer overlaps (up to 7 consecutive edges) within these pathways. TimeXNet was also shown to be robust in the presence of varying amounts of noise in the molecular interaction network. Conclusions TimeXNet is a reliable tool that can be used to study cellular response to stimuli through the identification of time-dependent active gene sub-networks in diverse biological systems. It is significantly better than other similar tools. TimeXNet is implemented in Java as a stand-alone application and supported on Linux, MS Windows and Macintosh. The output of TimeXNet can be directly viewed in Cytoscape. TimeXNet is freely available for non-commercial users.

120
Hagen DR, Tidor B. Efficient Bayesian estimates for discrimination among topologically different systems biology models. MOLECULAR BIOSYSTEMS 2014; 11:574-84. [PMID: 25460000 DOI: 10.1039/c4mb00276h] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
A major effort in systems biology is the development of mathematical models that describe complex biological systems at multiple scales and levels of abstraction. Determining the topology (the set of interactions) of a biological system from observations of the system's behavior is an important and difficult problem. Here we present and demonstrate new methodology for efficiently computing the probability distribution over a set of topologies based on consistency with existing measurements. Key features of the new approach include derivation in a Bayesian framework, incorporation of prior probability distributions of topologies and parameters, and use of an analytically integrable linearization based on the Fisher information matrix that is responsible for large gains in efficiency. The new method was demonstrated on a collection of four biological topologies representing a kinase and phosphatase that operate in opposition to each other with either processive or distributive kinetics, giving 8-12 parameters for each topology. The linearization produced an approximate result very rapidly (CPU minutes) that was highly accurate on its own, as compared to a Monte Carlo method guaranteed to converge to the correct answer but at greater cost (CPU weeks). The Monte Carlo method developed and applied here used the linearization method as a starting point and importance sampling to approach the Bayesian answer in acceptable time. Other inexpensive methods to estimate probabilities produced poor approximations for this system, with likelihood estimation showing its well-known bias toward topologies with more parameters and the Akaike and Schwarz Information Criteria showing a strong bias toward topologies with fewer parameters. These results suggest that this linear approximation may be an effective compromise, providing an answer whose accuracy is near the true Bayesian answer, but at a cost near the common heuristics.
Affiliation(s)
- David R Hagen
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.

121
Abstract
Background High-throughput expression data, such as gene expression and metabolomics data, exhibit modular structures. Groups of features in each module follow a latent factor model, while between modules, the latent factors are quasi-independent. Recovering the latent factors can shed light on the hidden regulation patterns of the expression. The difficulty in detecting such modules and recovering the latent factors lies in the high dimensionality of the data, and the lack of knowledge in module membership. Methods Here we describe a method based on community detection in the co-expression network. It consists of inference-based network construction, module detection, and interacting latent factor detection from modules. Results In simulations, the method outperformed projection-based modular latent factor discovery when the input signals were not Gaussian. We also demonstrate the method's value in real data analysis. Conclusions The new method nMLSA (network-based modular latent structure analysis) is effective in detecting latent structures, and is easy to extend to non-linear cases. The method is available as R code at http://web1.sph.emory.edu/users/tyu8/nMLSA/.

122
Regression analysis of combined gene expression regulation in acute myeloid leukemia. PLoS Comput Biol 2014; 10:e1003908. [PMID: 25340776 PMCID: PMC4207489 DOI: 10.1371/journal.pcbi.1003908] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 03/24/2014] [Accepted: 09/13/2014] [Indexed: 12/04/2022] Open
Abstract
Gene expression is a combinatorial function of genetic/epigenetic factors such as copy number variation (CNV), DNA methylation (DM), transcription factors (TF) occupancy, and microRNA (miRNA) post-transcriptional regulation. At the maturity of microarray/sequencing technologies, large amounts of data measuring the genome-wide signals of those factors became available from Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA). However, there is a lack of an integrative model to take full advantage of these rich yet heterogeneous data. To this end, we developed RACER (Regression Analysis of Combined Expression Regulation), which fits the mRNA expression as response using as explanatory variables, the TF data from ENCODE, and CNV, DM, miRNA expression signals from TCGA. Briefly, RACER first infers the sample-specific regulatory activities by TFs and miRNAs, which are then used as inputs to infer specific TF/miRNA-gene interactions. Such a two-stage regression framework circumvents a common difficulty in integrating ENCODE data measured in generic cell-line with the sample-specific TCGA measurements. As a case study, we integrated Acute Myeloid Leukemia (AML) data from TCGA and the related TF binding data measured in K562 from ENCODE. As a proof-of-concept, we first verified our model formalism by 10-fold cross-validation on predicting gene expression. We next evaluated RACER on recovering known regulatory interactions, and demonstrated its superior statistical power over existing methods in detecting known miRNA/TF targets. Additionally, we developed a feature selection procedure, which identified 18 regulators, whose activities clustered consistently with cytogenetic risk groups. One of the selected regulators is miR-548p, whose inferred targets were significantly enriched for leukemia-related pathway, implicating its novel role in AML pathogenesis. 
Moreover, survival analysis using the inferred activities identified C-Fos as a potential AML prognostic marker. Together, we provided a novel framework that successfully integrated the TCGA and ENCODE data in revealing AML-specific regulatory program at global level. Recent studies from The Cancer Genome Atlas (TCGA) showed that most Acute Myeloid Leukemia (AML) patients lack DNA mutations, which can potentially explain the tumorigenesis, and motivated a systematic approach to elucidate aberrant molecular signatures at the transcriptional and epigenetic levels. Using recently available data from two large consortia namely Encyclopedia of DNA Elements and TCGA, we developed a novel computational model to infer the regulatory activities of the expression regulators and their target genes in AML samples. Our analysis revealed 18 regulators whose dysregulation contributed significantly to explaining the global mRNA expression changes. Encouragingly, the inferred activities of these regulatory features followed a consistent pattern with cytogenetic phenotypes of the AML patients. Among these regulators, we identified microRNA hsa-miR-548p, whose regulatory relationships with leukemia-related genes including YY1 suggest its novel role in AML pathogenesis. Additionally, we discovered that the inferred activities of transcription factor C-Fos can be used as a prognostic marker to characterize survival rate of the AML patients. Together, we demonstrated an effective model that can integrate useful information from a large amount of heterogeneous data to dissect regulatory effects. Furthermore, the novel biological findings from this study may be constructive to future experimental research in AML.

123
Cai C, Chen L, Jiang X, Lu X. Modeling signal transduction from protein phosphorylation to gene expression. Cancer Inform 2014; 13:59-67. [PMID: 25392684 PMCID: PMC4216050 DOI: 10.4137/cin.s13883] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/20/2014] [Revised: 05/04/2014] [Accepted: 05/04/2014] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Signaling networks are of great importance for us to understand the cell’s regulatory mechanism. The rise of large-scale genomic and proteomic data, and prior biological knowledge has paved the way for the reconstruction and discovery of novel signaling pathways in a data-driven manner. In this study, we investigate computational methods that integrate proteomics and transcriptomic data to identify signaling pathways transmitting signals in response to specific stimuli. Such methods can be applied to cancer genomic data to infer perturbed signaling pathways. METHOD We proposed a novel Bayesian Network (BN) framework to integrate transcriptomic data with proteomic data reflecting protein phosphorylation states for the purpose of identifying the pathways transmitting the signal of diverse stimuli in rat and human cells. We represented the proteins and genes as nodes in a BN in which edges reflect the regulatory relationship between signaling proteins. We designed an efficient inference algorithm that incorporated the prior knowledge of pathways and searched for a network structure in a data-driven manner. RESULTS We applied our method to infer rat and human specific networks given gene expression and proteomic datasets. We were able to effectively identify sparse signaling networks that modeled the observed transcriptomic and proteomic data. Our methods were able to identify distinct signaling pathways for rat and human cells in a data-driven manner, based on the facts that rat and human cells exhibited distinct transcriptomic and proteomics responses to a common set of stimuli. Our model performed well in the SBV IMPROVER challenge in comparison to other models addressing the same task. The capability of inferring signaling pathways in a data-driven fashion may contribute to cancer research by identifying distinct aberrations in signaling pathways underlying heterogeneous cancers subtypes.
Affiliation(s)
- Chunhui Cai
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Lujia Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Xia Jiang
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Xinghua Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA

124
Li LX, Wu L, Zhang HS, Wu FX. A fast algorithm for nonnegative matrix factorization and its convergence. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2014; 25:1855-1863. [PMID: 25291738 DOI: 10.1109/tnnls.2013.2296627] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 06/03/2023]
Abstract
Nonnegative matrix factorization (NMF) has recently become a very popular unsupervised learning method because of the representational properties of its factors and the simple multiplicative update algorithms available for solving it. However, for the common NMF approach of minimizing the Euclidean distance between approximate and true values, the convergence of multiplicative update algorithms has not been well resolved. This paper first discusses the convergence of existing multiplicative update algorithms. We then propose a new multiplicative update algorithm for minimizing the Euclidean distance between approximate and true values. Based on the optimization principle and the auxiliary function method, we prove that our new algorithm not only converges to a stationary point, but also does so faster than existing ones. To verify our theoretical results, we conducted experiments on three data sets, comparing our proposed algorithm with other existing methods.
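For orientation, the kind of existing multiplicative update this abstract refers to can be sketched as below. This is the classical Lee and Seung update for the Euclidean objective ||V − WH||², not the faster algorithm proposed in the paper. The small `eps` guard against division by zero, the random initialization, and the toy matrix are assumptions of this sketch.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))


def nmf_multiplicative(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Classical multiplicative updates; factors stay nonnegative throughout."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [squared_error(V, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(squared_error(V, W, H))
    return W, H, errors


# Hypothetical toy input: an exactly rank-2 nonnegative 4 x 5 matrix.
V = matmul([[1, 0], [0, 1], [1, 1], [2, 1]],
           [[1, 2, 0, 1, 3], [0, 1, 2, 1, 0]])
W, H, errors = nmf_multiplicative(V, rank=2)
print(f"squared error: {errors[0]:.3f} -> {errors[-1]:.6f}")
```

Because each update multiplies the current factor by a ratio of nonnegative quantities, nonnegativity is preserved without any projection step, which is the simplicity the abstract alludes to.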

125
The comprehensive transcriptional analysis in Caenorhabditis elegans by integrating ChIP-seq and gene expression data. Genet Res (Camb) 2014; 96:e005. [PMID: 25023089 DOI: 10.1017/s0016672314000081] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/28/2023] Open
Abstract
The fundamental step in understanding the mechanisms of transcriptional regulation is to identify the target genes regulated by transcription factors (TFs). Although numerous target genes have been identified by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) assays, it is not possible to infer function from binding alone in vivo. This is equally true in one of the best model systems, the nematode Caenorhabditis elegans (C. elegans), where regulation often occurs through diverse TF binding features of transcriptional networks identified in modENCODE. Here, we integrated ten ChIP-seq datasets with genome-wide expression data derived from tiling arrays, involving six TFs (HLH-1, ELT-3, PQM-1, SKN-1, CEH-14 and LIN-11) with tissue-specific expression patterns and four TFs (CEH-30, LIN-13, LIN-15B and MEP-1) with broad expression patterns. Across these ten studies, genes with TF binding within 3 kb upstream of, or within, the gene showed significantly elevated expression compared with non-target controls, indicating that these sites may be more likely to be functional through up-regulation of their target genes. Intriguingly, target genes bound more than 5 kb upstream of their transcription start site also showed high expression levels, which was consistent with the results of the subsequent network component analysis. Our study has identified similar transcriptional regulation mechanisms for tissue-specific and broadly expressed TFs in C. elegans using ChIP-seq and gene expression data. It may also provide novel insight into the mechanism of transcriptional regulation, not only for simple organisms but also for more complex species.

126
Martin F, Sewer A, Talikka M, Xiang Y, Hoeng J, Peitsch MC. Quantification of biological network perturbations for mechanistic insight and diagnostics using two-layer causal models. BMC Bioinformatics 2014; 15:238. [PMID: 25015298 PMCID: PMC4227138 DOI: 10.1186/1471-2105-15-238] [Citation(s) in RCA: 92] [Impact Index Per Article: 9.2] [Received: 04/04/2014] [Accepted: 06/26/2014] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND High-throughput measurement technologies such as microarrays provide complex datasets reflecting mechanisms perturbed in an experiment, typically a treatment vs. control design. Analysis of these information rich data can be guided based on a priori knowledge, such as networks or set of related proteins or genes. Among those, cause-and-effect network models are becoming increasingly popular and more than eighty such models, describing processes involved in cell proliferation, cell fate, cell stress, and inflammation have already been published. A meaningful systems toxicology approach to study the response of a cell system, or organism, exposed to bio-active substances requires a quantitative measure of dose-response at network level, to go beyond the differential expression of single genes. RESULTS We developed a method that quantifies network response in an interpretable manner. It fully exploits the (signed graph) structure of cause-and-effect networks models to integrate and mine transcriptomics measurements. The presented approach also enables the extraction of network-based signatures for predicting a phenotype of interest. The obtained signatures are coherent with the underlying network perturbation and can lead to more robust predictions across independent studies. The value of the various components of our mathematically coherent approach is substantiated using several in vivo and in vitro transcriptomics datasets. As a proof-of-principle, our methodology was applied to unravel mechanisms related to the efficacy of a specific anti-inflammatory drug in patients suffering from ulcerative colitis. A plausible mechanistic explanation of the unequal efficacy of the drug is provided. Moreover, by utilizing the underlying mechanisms, an accurate and robust network-based diagnosis was built to predict the response to the treatment. 
CONCLUSION The presented framework efficiently integrates transcriptomics data and "cause and effect" network models to enable a mathematically coherent framework from quantitative impact assessment and data interpretation to patient stratification for diagnosis purposes.
Affiliation(s)
- Florian Martin
- Philip Morris International, R&D, Biological Systems Research, Quai Jeanrenaud 5, 2000 Neuchatel, Switzerland.

127
Chen Y, Wang Z, Wang Y. Spatiotemporal positioning of multipotent modules in diverse biological networks. Cell Mol Life Sci 2014; 71:2605-24. [PMID: 24413666 PMCID: PMC11113103 DOI: 10.1007/s00018-013-1547-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 10/04/2013] [Revised: 12/05/2013] [Accepted: 12/19/2013] [Indexed: 02/06/2023]
Abstract
A biological network exhibits a modular organization. The modular structure dependent on functional module is of great significance in understanding the organization and dynamics of network functions. A huge variety of module identification methods as well as approaches to analyze modularity and dynamics of the inter- and intra-module interactions have emerged recently, but they are facing unexpected challenges in further practical applications. Here, we discuss recent progress in understanding how such a modular network can be deconstructed spatiotemporally. We focus particularly on elucidating how various deciphering mechanisms operate to ensure precise module identification and assembly. In this case, a system-level understanding of the entire mechanism of module construction is within reach, with important implications for reasonable perspectives in both constructing a modular analysis framework and deconstructing different modular hierarchical structures.
Affiliation(s)
- Yinying Chen
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Dongzhimen, Beijing, 100700 China
- Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, 100053 China
- Zhong Wang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Dongzhimen, Beijing, 100700 China
- Yongyan Wang
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Dongzhimen, Beijing, 100700 China
|
128
|
Dynamic regulatory network reconstruction for Alzheimer's disease based on matrix decomposition techniques. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2014; 2014:891761. [PMID: 25024739 PMCID: PMC4082865 DOI: 10.1155/2014/891761] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/18/2014] [Revised: 05/19/2014] [Accepted: 05/26/2014] [Indexed: 11/18/2022]
Abstract
Alzheimer's disease (AD) is the most common form of dementia and leads to irreversible neurodegenerative damage of the brain. Finding the dynamic responses of genes, signaling proteins, transcription factor (TF) activities, and regulatory networks during the progressive deterioration of AD would represent a significant advance in discovering the pathogenesis of AD. However, high-throughput technologies for measuring TF activities are not yet available on a genome-wide scale. In this study, based on DNA microarray gene expression data and a priori information on TFs, the network component analysis (NCA) algorithm is applied to determine the TF activities and regulatory influences on target genes (TGs) in incipient, moderate, and severe AD. On that basis, the dynamic gene regulatory networks of the deteriorative courses of AD were reconstructed. To select significant genes that are differentially expressed in different courses of AD, independent component analysis (ICA), which is better than the traditional clustering methods and can assign a single gene to several meaningful biological processes, was used. The molecular biological analysis showed that changes in TF activities and interactions of signaling proteins in mitosis, cell cycle, immune response, and inflammation play an important role in the deterioration of AD.
|
129
|
Chen YA, Eschrich SA. Computational methods and opportunities for phosphorylation network medicine. Transl Cancer Res 2014; 3:266-278. [PMID: 25530950 PMCID: PMC4271781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Protein phosphorylation, one of the most ubiquitous post-translational modifications (PTM) of proteins, is known to play an essential role in cell signaling and regulation. With the increasing understanding of the complexity and redundancy of cell signaling, there is a growing recognition that targeting the entire network or system could be a necessary and advantageous strategy for treating cancer. Protein kinases, the proteins that add a phosphate group to the substrate proteins during phosphorylation events, have become one of the largest groups of 'druggable' targets in cancer therapeutics in recent years. Kinase inhibitors are being regularly used in clinics for cancer treatment. This therapeutic paradigm shift in cancer research is partly due to the generation and availability of high-dimensional proteomics data. Generation of this data, in turn, is enabled by increased use of mass-spectrometry (MS)-based or other high-throughput proteomics platforms as well as companion public databases and computational tools. This review briefly summarizes the current state and progress on phosphoproteomics identification, quantification, and platform related characteristics. We review existing database resources, computational tools, methods for phosphorylation network inference, and ultimately demonstrate the connection to therapeutics. Finally, many research opportunities exist for bioinformaticians or biostatisticians based on developments and limitations of the current and emerging technologies.
Affiliation(s)
- Yian Ann Chen
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 Magnolia Drive Tampa, FL 33612, USA
- Steven A Eschrich
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, 12902 Magnolia Drive Tampa, FL 33612, USA
|
130
|
Modeling the transcriptional regulatory network that controls the early hypoxic response in Candida albicans. EUKARYOTIC CELL 2014; 13:675-90. [PMID: 24681685 DOI: 10.1128/ec.00292-13] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
We determined the changes in transcriptional profiles that occur in the first hour following the transfer of Candida albicans to hypoxic growth conditions. The impressive speed of this response is not compatible with current models of fungal adaptation to hypoxia that depend on the depletion of sterol and heme. Functional analysis using Gene Set Enrichment Analysis (GSEA) identified the Sit4 phosphatase, Ccr4 mRNA deacetylase, and Sko1 transcription factor (TF) as potential regulators of the early hypoxic response. Cells mutated in these and other regulators exhibit a delay in their transcriptional responses to hypoxia. Promoter occupancy data for 29 TFs were combined with the transcriptional profiles of 3,111 in vivo target genes in a Network Component Analysis (NCA) to produce a model of the dynamic and highly interconnected TF network that controls this process. With data from the TF network obtained from a variety of sources, we generated an edge and node model that was capable of separating many of the hypoxia-upregulated and -downregulated genes. Upregulated genes are centered on Tye7, Upc2, and Mrr1, which are associated with many of the gene promoters that exhibit the strongest activations. The connectivity of the model illustrates the high redundancy of this response system and the challenges that lie in determining the individual contributions of specific TFs. Finally, treating cells with an inhibitor of the oxidative phosphorylation chain mimics most of the early hypoxic profile, which suggests that this response may be initiated by a drop in ATP production.
|
131
|
Reshetova P, Smilde AK, van Kampen AHC, Westerhuis JA. Use of prior knowledge for the analysis of high-throughput transcriptomics and metabolomics data. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 2:S2. [PMID: 25033193 PMCID: PMC4101693 DOI: 10.1186/1752-0509-8-s2-s2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
BACKGROUND High-throughput omics technologies have enabled the measurement of many genes or metabolites simultaneously. The resulting high dimensional experimental data poses significant challenges to transcriptomics and metabolomics data analysis methods, which may lead to spurious instead of biologically relevant results. One strategy to improve the results is the incorporation of prior biological knowledge in the analysis. This strategy is used to reduce the solution space and/or to focus the analysis on biological meaningful regions. In this article, we review a selection of these methods used in transcriptomics and metabolomics. We combine the reviewed methods in three groups based on the underlying mathematical model: exploratory methods, supervised methods and estimation of the covariance matrix. We discuss which prior knowledge has been used, how it is incorporated and how it modifies the mathematical properties of the underlying methods.
|
132
|
Karczewski KJ, Snyder M, Altman RB, Tatonetti NP. Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association. PLoS Genet 2014; 10:e1004122. [PMID: 24516403 PMCID: PMC3916285 DOI: 10.1371/journal.pgen.1004122] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2013] [Accepted: 12/03/2013] [Indexed: 12/17/2022] Open
Abstract
Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease. Transcription factors (TFs) are crucial to the precise regulation of many cellular processes and thus, are responsible for many human phenotypes and diseases. Now that the ENCODE project has mapped hundreds of TFs to their genomic binding locations, extracting functional biological signals is the next step in understanding their role in disease. In this paper, we present a novel approach to identifying TF targets and use these targets to find regulatory relationships between TFs and diseases. We present a large open dataset of putative TF-TF interactions and TF-disease associations which includes known connections as well as novel ones. We validate the association of one of our novel TF-disease associations, MEF2A and Crohn's disease, suggesting that our approach generates testable disease association hypotheses. 
Integrating these datasets will be crucial for understanding phenotypes and complex diseases.
Affiliation(s)
- Konrad J. Karczewski
- Biomedical Informatics Training Program, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Michael Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Russ B. Altman
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Bioengineering, Stanford University School of Medicine, Stanford, California, United States of America
- Nicholas P. Tatonetti
- Department of Biomedical Informatics, Department of Systems Biology, and Department of Medicine, Columbia University, New York, New York, United States of America
|
133
|
Kuczenski RS, Aggarwal K, Lee KH. Improved understanding of gene expression regulation using systems biology. Expert Rev Proteomics 2014; 2:915-24. [PMID: 16307520 DOI: 10.1586/14789450.2.6.915] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
This article reviews the current state of systems biology approaches, including the experimental tools used to generate 'omic' data and computational frameworks to interpret this data. Through illustrative examples, systems biology approaches to understand gene expression and gene expression regulation are discussed. Some of the challenges facing this field and the future opportunities in the systems biology era are highlighted.
Affiliation(s)
- Robert S Kuczenski
- Cornell University, School of Chemical & Biomolecular Engineering, 120 Olin Hall, Ithaca, NY 14853, USA.
|
134
|
Doni Jayavelu N, Bar N. Dynamics of regulatory networks in gastrin-treated adenocarcinoma cells. PLoS One 2014; 9:e78349. [PMID: 24416123 PMCID: PMC3885390 DOI: 10.1371/journal.pone.0078349] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2013] [Accepted: 09/20/2013] [Indexed: 12/29/2022] Open
Abstract
Understanding gene transcription regulatory networks is critical to deciphering the molecular mechanisms of different cellular states. Most studies focus on static transcriptional networks. In the current study, we used the gastrin-regulated system as a model to understand the dynamics of transcriptional networks composed of transcription factors (TFs) and target genes (TGs). The hormone gastrin activates and stimulates signaling pathways leading to various cellular states through transcriptional programs. Dysregulation of gastrin can result in cancerous tumors, for example. However, the regulatory networks involving gastrin are highly complex, and the roles of most of the components of these networks are unknown. We used time series microarray data of AR42J adenocarcinoma cells treated with gastrin combined with static TF-TG relationships integrated from different sources, and we reconstructed the dynamic activities of TFs using network component analysis (NCA). Based on the peak expression of TGs and activity of TFs, we created active sub-networks at four time ranges after gastrin treatment, namely immediate-early (IE), mid-early (ME), mid-late (ML) and very late (VL). Network analysis revealed that the active sub-networks were topologically different at the early and late time ranges. Gene ontology analysis unveiled that each active sub-network was highly enriched in a particular biological process. Interestingly, network motif patterns were also distinct between the sub-networks. This analysis can be applied to other time series microarray datasets, focusing on smaller sub-networks that are activated in a cascade, allowing better overview of the mechanisms involved at each time range.
Affiliation(s)
- Naresh Doni Jayavelu
- Department of Chemical Engineering, Norwegian University of Science and Technology, Trondheim, Norway
- Nadav Bar
- Department of Chemical Engineering, Norwegian University of Science and Technology, Trondheim, Norway
|
135
|
Shi X, Gu J, Chen X, Shajahan A, Hilakivi-Clarke L, Clarke R, Xuan J. mAPC-GibbsOS: an integrated approach for robust identification of gene regulatory networks. BMC SYSTEMS BIOLOGY 2013; 7 Suppl 5:S4. [PMID: 24564939 PMCID: PMC4028818 DOI: 10.1186/1752-0509-7-s5-s4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]
Abstract
Background Identification of cooperative gene regulatory networks is an important topic for biological study, especially in cancer research. Traditional approaches suffer from large noise in gene expression data and false positive connections in motif binding data; they also fail to identify the modularized structure of gene regulatory networks. Methods that can reveal the underlying modularized structure and are robust to noise and false positives need to be developed. Results We proposed and developed an integrated approach to identify gene regulatory networks, which consists of a novel clustering method, motif-guided affinity propagation clustering (mAPC), and a sampling-based method, a Gibbs sampler based on the outlier sum statistic (GibbsOS). mAPC is used in the first step to obtain co-regulated gene modules by clustering genes with a similarity measurement that takes into account both gene expression data and binding motif information. This clustering method can reduce the noise effect from microarray data to obtain modularized gene clusters. However, due to many false positives in motif binding data, some genes not regulated by certain transcription factors (TFs) will be falsely clustered with true target genes. To overcome this problem, GibbsOS is applied in the second step to refine each cluster for the identification of true target genes. To evaluate the performance of the proposed method, we generated simulation data under different signal-to-noise ratios and false positive ratios. The experimental results show an improved accuracy in terms of clustering and transcription factor identification. Moreover, an improved performance is demonstrated in target gene identification as compared with GibbsOS.
Finally, we applied the proposed method to two breast cancer patient datasets to identify cooperative transcriptional regulatory networks associated with recurrence of breast cancer, as supported by their functional annotations. Conclusions We have developed a two-step approach for gene regulatory network identification, featuring an integrated method to identify modularized regulatory structures and refine their target genes subsequently. Simulation studies have shown the robustness of the method against noise in gene expression data and false positives in motif binding data. The proposed method has been applied to two breast cancer gene expression datasets to infer the hidden regulation mechanisms. The experimental results demonstrate the efficacy of the method in identifying key regulatory networks related to the progression and recurrence of breast cancer.
|
136
|
Misra A, Sriram G. Network component analysis provides quantitative insights on an Arabidopsis transcription factor-gene regulatory network. BMC SYSTEMS BIOLOGY 2013; 7:126. [PMID: 24228871 PMCID: PMC3843564 DOI: 10.1186/1752-0509-7-126] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/24/2013] [Accepted: 11/05/2013] [Indexed: 01/01/2023]
Abstract
Background Gene regulatory networks (GRNs) are models of molecule-gene interactions instrumental in the coordination of gene expression. Transcription factor (TF)-GRNs are an important subset of GRNs that characterize gene expression as the effect of TFs acting on their target genes. Although such networks can qualitatively summarize TF-gene interactions, it is highly desirable to quantitatively determine the strengths of the interactions in a TF-GRN as well as the magnitudes of TF activities. To our knowledge, such analysis is rare in plant biology. A computational methodology developed for this purpose is network component analysis (NCA), which has been used for studying large-scale microbial TF-GRNs to obtain nontrivial, mechanistic insights. In this work, we employed NCA to quantitatively analyze a plant TF-GRN important in floral development using available regulatory information from AGRIS, by processing previously reported gene expression data from four shoot apical meristem cell types. Results The NCA model satisfactorily accounted for gene expression measurements in a TF-GRN of seven TFs (LFY, AG, SEPALLATA3 [SEP3], AP2, AGL15, HY5 and AP3/PI) and 55 genes. NCA found strong interactions between certain TF-gene pairs including LFY → MYB17, AG → CRC, AP2 → RD20, AGL15 → RAV2 and HY5 → HLH1, and the direction of the interaction (activation or repression) for some AGL15 targets for which this information was not previously available. The activity trends of four TFs - LFY, AG, HY5 and AP3/PI as deduced by NCA correlated well with the changes in expression levels of the genes encoding these TFs across all four cell types; such a correlation was not observed for SEP3, AP2 and AGL15. Conclusions For the first time, we have reported the use of NCA to quantitatively analyze a plant TF-GRN important in floral development for obtaining nontrivial information about connectivity strengths between TFs and their target genes as well as TF activity. 
However, since NCA relies on documented connectivity information about the underlying TF-GRN, it is currently limited in its application to larger plant networks because of the lack of documented connectivities. In the future, the identification of interactions between plant TFs and their target genes on a genome scale would allow the use of NCA to provide quantitative regulatory information about plant TF-GRNs, leading to improved insights on cellular regulatory programs.
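For readers unfamiliar with NCA, the decomposition that this abstract relies on can be illustrated with a minimal sketch. It assumes the usual formulation, in which an expression matrix E (genes x samples) is approximated by A @ P, where A (genes x TFs) holds connectivity strengths and is forced to zero wherever no TF-gene interaction is documented, and P (TFs x samples) holds TF activities. The alternating least-squares solver, the function name `nca_als`, and the synthetic data below are illustrative assumptions, not the authors' implementation, and the sketch ignores the identifiability conditions and scaling conventions that a real NCA analysis needs.

```python
import numpy as np

def nca_als(E, mask, n_iter=200, seed=0):
    """Approximate E (genes x samples) by A @ P, with A zero outside `mask`.

    `mask` is a 0/1 array (genes x TFs) marking documented TF-gene links.
    This is a plain alternating least-squares sketch (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_tfs = mask.shape
    A = rng.standard_normal((n_genes, n_tfs)) * mask
    for _ in range(n_iter):
        # Update TF activities P with A held fixed.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Update each gene's connectivity strengths, using only its allowed TFs.
        for g in range(n_genes):
            idx = np.flatnonzero(mask[g])
            if idx.size:
                A[g, idx] = np.linalg.lstsq(P[idx].T, E[g], rcond=None)[0]
    return A, P

# Synthetic example: 40 genes, 3 TFs, 10 samples (all values are made up).
rng = np.random.default_rng(1)
mask = (rng.random((40, 3)) < 0.5).astype(int)
A_true = rng.standard_normal((40, 3)) * mask
P_true = rng.standard_normal((3, 10))
E = A_true @ P_true + 0.01 * rng.standard_normal((40, 10))

A_hat, P_hat = nca_als(E, mask)
print("relative residual:", np.linalg.norm(E - A_hat @ P_hat) / np.linalg.norm(E))
```

Note that A and P are only determined up to a per-TF scale factor (a column of A can be multiplied by a constant if the matching row of P is divided by it), so estimated strengths and activities are comparable only after a normalization is chosen.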
Affiliation(s)
- Ganesh Sriram
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD 20742, USA.
|
137
|
Patil A, Kumagai Y, Liang KC, Suzuki Y, Nakai K. Linking transcriptional changes over time in stimulated dendritic cells to identify gene networks activated during the innate immune response. PLoS Comput Biol 2013; 9:e1003323. [PMID: 24244133 PMCID: PMC3820512 DOI: 10.1371/journal.pcbi.1003323] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2013] [Accepted: 09/21/2013] [Indexed: 01/09/2023] Open
Abstract
The innate immune response is primarily mediated by the Toll-like receptors functioning through the MyD88-dependent and TRIF-dependent pathways. Despite being widely studied, it is not yet completely understood and systems-level analyses have been lacking. In this study, we identified a high-probability network of genes activated during the innate immune response using a novel approach to analyze time-course gene expression profiles of activated immune cells in combination with a large gene regulatory and protein-protein interaction network. We classified the immune response into three consecutive time-dependent stages and identified the most probable paths between genes showing a significant change in expression at each stage. The resultant network contained several novel and known regulators of the innate immune response, many of which did not show any observable change in expression at the sampled time points. The response network shows the dominance of genes from specific functional classes during different stages of the immune response. It also suggests a role for the protein phosphatase 2a catalytic subunit α in the regulation of the immunoproteasome during the late phase of the response. In order to clarify the differences between the MyD88-dependent and TRIF-dependent pathways in the innate immune response, time-course gene expression profiles from MyD88-knockout and TRIF-knockout dendritic cells were analyzed. Their response networks suggest the dominance of the MyD88-dependent pathway in the innate immune response, and an association of the circadian regulators and immunoproteasomal degradation with the TRIF-dependent pathway. The response network presented here provides the most probable associations between genes expressed in the early and the late phases of the innate immune response, while taking into account the intermediate regulators. 
We propose that the method described here can also be used in the identification of time-dependent gene sub-networks in other biological systems.
Affiliation(s)
- Ashwini Patil
- The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Yutaro Kumagai
- WPI Immunology Frontier Research Center, Osaka University, Osaka, Japan
- Kuo-ching Liang
- The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Yutaka Suzuki
- Department of Medical Genome Sciences, The University of Tokyo, Tokyo, Japan
- Kenta Nakai
- The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
|
138
|
Chen X, Xuan J, Wang C, Shajahan AN, Riggins RB, Clarke R. Reconstruction of transcriptional regulatory networks by stability-based network component analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:1347-1358. [PMID: 24407294 PMCID: PMC3652899 DOI: 10.1109/tcbb.2012.146] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Reliable inference of transcriptional regulatory networks is a challenging task in computational biology. Network component analysis (NCA) has become a powerful scheme to uncover regulatory networks behind complex biological processes. However, the performance of NCA is impaired by the high rate of false connections in binding information. In this paper, we integrate stability analysis with NCA to form a novel scheme, namely stability-based NCA (sNCA), for regulatory network identification. The method mainly addresses the inconsistency between gene expression data and binding motif information. Small perturbations are introduced to the prior regulatory network, and the distance among multiple estimated transcription factor (TF) activities is computed to reflect the stability of each TF's binding network. For target gene identification, multivariate regression and the t-statistic are used to calculate the significance of each TF-gene connection. Simulation studies are conducted and the experimental results show that sNCA can achieve an improved and robust performance in TF identification as compared to NCA. The approach for target gene identification is also demonstrated to be suitable for identifying true connections between TFs and their target genes. Furthermore, we have successfully applied sNCA to breast cancer data to uncover the role of TFs in regulating endocrine resistance in breast cancer.
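The stability idea described in this abstract (perturb the prior network, re-estimate TF activities, and measure how far the estimates move) can be sketched roughly as follows. Everything here is an assumption made for illustration: the helper names `fit_nca` and `tf_instability`, the random flipping of a small fraction of prior links, the plain alternating least-squares fit, and the use of one minus the absolute cosine similarity as the distance. The abstract does not specify these details, so this should not be read as the sNCA algorithm itself.

```python
import numpy as np

def fit_nca(E, mask, n_iter=50, seed=0):
    """Alternating least squares for E ~ A @ P with A zero outside `mask` (sketch)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal(mask.shape) * mask
    for _ in range(n_iter):
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        for g in range(mask.shape[0]):
            idx = np.flatnonzero(mask[g])
            if idx.size:
                A[g, idx] = np.linalg.lstsq(P[idx].T, E[g], rcond=None)[0]
    return A, P

def tf_instability(E, mask, n_perturb=5, flip_frac=0.05, seed=0):
    """Per-TF instability score in [0, 1]; larger means less stable activities.

    Each round flips a random `flip_frac` of entries in the 0/1 prior network
    (an assumed perturbation scheme), refits, and stores unit-norm TF profiles.
    """
    rng = np.random.default_rng(seed)
    profiles = []
    for k in range(n_perturb):
        flip = rng.random(mask.shape) < flip_frac
        perturbed = np.where(flip, 1 - mask, mask)
        _, P = fit_nca(E, perturbed, seed=k)
        norms = np.linalg.norm(P, axis=1, keepdims=True)
        profiles.append(P / np.where(norms > 0, norms, 1.0))
    score = np.zeros(mask.shape[1])
    n_pairs = 0
    for i in range(n_perturb):
        for j in range(i + 1, n_perturb):
            # Absolute cosine similarity ignores the sign ambiguity of NCA.
            cos = np.abs(np.sum(profiles[i] * profiles[j], axis=1))
            score += 1.0 - np.clip(cos, 0.0, 1.0)
            n_pairs += 1
    return score / n_pairs

# Synthetic example: 30 genes, 3 TFs, 12 samples (all values are made up).
rng = np.random.default_rng(2)
mask = (rng.random((30, 3)) < 0.5).astype(int)
E = (rng.standard_normal((30, 3)) * mask) @ rng.standard_normal((3, 12))
scores = tf_instability(E, mask)
print(scores)
```

In this sketch a TF whose estimated activity profile barely changes across perturbed networks gets a score near 0, which loosely mirrors the abstract's notion of a stable binding network.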
Affiliation(s)
- Xi Chen
- Virginia Polytechnic Institute and State University, Arlington
- Jianhua Xuan
- Virginia Polytechnic Institute and State University, Arlington
- Chen Wang
- Virginia Polytechnic Institute and State University, Arlington
|
139
|
Barah P, Jayavelu ND, Rasmussen S, Nielsen HB, Mundy J, Bones AM. Genome-scale cold stress response regulatory networks in ten Arabidopsis thaliana ecotypes. BMC Genomics 2013; 14:722. [PMID: 24148294 PMCID: PMC3829657 DOI: 10.1186/1471-2164-14-722] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2013] [Accepted: 10/11/2013] [Indexed: 12/30/2022] Open
Abstract
Background Low temperature leads to major crop losses every year. Although several studies have been conducted focusing on diversity of cold tolerance level in multiple phenotypically divergent Arabidopsis thaliana (A. thaliana) ecotypes, genome-scale molecular understanding is still lacking. Results In this study, we report genome-scale transcript response diversity of 10 A. thaliana ecotypes originating from different geographical locations to non-freezing cold stress (10°C). To analyze the transcriptional response diversity, we initially compared transcriptome changes in all 10 ecotypes using Arabidopsis NimbleGen ATH6 microarrays. In total 6061 transcripts were significantly cold regulated (p < 0.01) in 10 ecotypes, including 498 transcription factors and 315 transposable elements. The majority of the transcripts (75%) showed ecotype specific expression pattern. By using sequence data available from Arabidopsis thaliana 1001 genome project, we further investigated sequence polymorphisms in the core cold stress regulon genes. Significant numbers of non-synonymous amino acid changes were observed in the coding region of the CBF regulon genes. Considering the limited knowledge about regulatory interactions between transcription factors and their target genes in the model plant A. thaliana, we have adopted a powerful systems genetics approach- Network Component Analysis (NCA) to construct an in-silico transcriptional regulatory network model during response to cold stress. The resulting regulatory network contained 1,275 nodes and 7,720 connections, with 178 transcription factors and 1,331 target genes. Conclusions A. thaliana ecotypes exhibit considerable variation in transcriptome level responses to non-freezing cold stress treatment. Ecotype specific transcripts and related gene ontology (GO) categories were identified to delineate natural variation of cold stress regulated differential gene expression in the model plant A. thaliana. 
The predicted regulatory network model was able to identify new ecotype specific transcription factors and their regulatory interactions, which might be crucial for their local geographic adaptation to cold temperature. Additionally, since the approach presented here is general, it could be adapted to study networks regulating biological process in any biological systems.
Affiliation(s)
- Atle M Bones
- Department of Biology, Norwegian University of Science and Technology, Trondheim N-7491, Norway.
|
140
|
Asif HMS, Sanguinetti G. Simultaneous inference and clustering of transcriptional dynamics in gene regulatory networks. Stat Appl Genet Mol Biol 2013; 12:545-57. [PMID: 24051920 DOI: 10.1515/sagmb-2012-0010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
We present a novel method for simultaneous inference and nonparametric clustering of transcriptional dynamics from gene expression data. The proposed method uses gene expression data to infer time-varying TF profiles and cluster these temporal profiles according to the dynamics they exhibit. We use the latent structure of factorial hidden Markov model to model the transcription factor profiles as Markov chains and cluster these profiles using nonparametric mixture modeling. An efficient Gibbs sampling scheme is proposed for inference of latent variables and grouping of transcriptional dynamics into a priori unknown number of clusters. We test our model on simulated data and analyse its performance on two expression datasets; S. cerevisiae cell cycle data and E. coli oxygen starvation response data. Our results show the applicability of the method for genome wide analysis of expression data.
|
141
|
Yan B, Li H, Yang X, Shao J, Jang M, Guan D, Zou S, Van Waes C, Chen Z, Zhan M. Unraveling regulatory programs for NF-kappaB, p53 and microRNAs in head and neck squamous cell carcinoma. PLoS One 2013; 8:e73656. [PMID: 24069219 PMCID: PMC3777940 DOI: 10.1371/journal.pone.0073656] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2013] [Accepted: 07/20/2013] [Indexed: 12/14/2022] Open
Abstract
In head and neck squamous cell carcinoma (HNSCC), mutations of p53 usually coexist with aberrant activation of NF-kappaB (NF-κB), other transcription factors and microRNAs, which promote tumor pathogenesis. However, how these factors and microRNAs interact to globally modulate gene expression and mediate oncogenesis is not fully understood. We devised a novel bioinformatics method to uncover interactive relationships between transcription factors or microRNAs and genes. This approach is based on matrix decomposition modeling under the joint constraints of sparseness and regulator-target connectivity, and able to integrate gene expression profiling and binding data of regulators. We employed this method to infer the gene regulatory networks in HNSCC. We found that the majority of the predicted p53 targets overlapped with those for NF-κB, suggesting that the two transcription factors exert a concerted modulation on regulatory programs in tumor cells. We further investigated the interrelationships of p53 and NF-κB with five additional transcription factors, AP1, CEBPB, EGR1, SP1 and STAT3, and microRNAs mir21 and mir34ac. The resulting gene networks indicate that interactions among NF-κB, p53, and the two miRNAs likely regulate progression of HNSCC. We experimentally validated our findings by determining expression of the predicted NF-κB and p53 target genes by siRNA knock down, and by examining p53 binding activity on promoters of predicted target genes in the tumor cell lines. Our results elucidating the cross-regulations among NF-κB, p53, and microRNAs provide insights into the complex regulatory mechanisms underlying HNSCC, and shows an efficient approach to inferring gene regulatory programs in biological complex systems.
Affiliation(s)
- Bin Yan
- Department of Biology, Hong Kong Baptist University, Kowloon, Hong Kong
- Huai Li
- Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
- Xinping Yang
- Head and Neck Surgery Branch, National Institute on Deafness and Communication Disorder, National Institutes of Health, Bethesda, Maryland, United States of America
- Jiaofang Shao
- Department of Biology, Hong Kong Baptist University, Kowloon, Hong Kong
- Minyoung Jang
- Head and Neck Surgery Branch, National Institute on Deafness and Communication Disorder, National Institutes of Health, Bethesda, Maryland, United States of America
- Clinical Research Training Program, sponsored by National Institutes of Health and Pfizer, Bethesda, Maryland, United States of America
- Daogang Guan
- Department of Biology, Hong Kong Baptist University, Kowloon, Hong Kong
- Sige Zou
- Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
- Carter Van Waes
- Head and Neck Surgery Branch, National Institute on Deafness and Communication Disorder, National Institutes of Health, Bethesda, Maryland, United States of America
- Zhong Chen
- Head and Neck Surgery Branch, National Institute on Deafness and Communication Disorder, National Institutes of Health, Bethesda, Maryland, United States of America
- Ming Zhan
- Methodist Hospital Research Institute, Weill Cornell Medical College, Houston, Texas, United States of America
|
142
|
Gunawardana Y, Niranjan M. Bridging the gap between transcriptome and proteome measurements identifies post-translationally regulated genes. Bioinformatics 2013; 29:3060-6. [PMID: 24045772 DOI: 10.1093/bioinformatics/btt537] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION Despite much dynamical cellular behaviour being achieved by accurate regulation of protein concentrations, messenger RNA abundances, measured by microarray technology, and more recently by deep sequencing techniques, are widely used as proxies for protein measurements. Although for some species and under some conditions, there is good correlation between transcriptome and proteome level measurements, such correlation is by no means universal due to post-transcriptional and post-translational regulation, both of which are highly prevalent in cells. Here, we seek to develop a data-driven machine learning approach to bridging the gap between these two levels of high-throughput omic measurements on Saccharomyces cerevisiae and deploy the model in a novel way to uncover mRNA-protein pairs that are candidates for post-translational regulation. RESULTS The application of feature selection by sparsity-inducing regression (l₁ norm regularization) leads to a stable set of features, i.e. mRNA, ribosomal occupancy, ribosome density, tRNA adaptation index and codon bias, while achieving a feature reduction from 37 to 5. A linear predictor used with these features is capable of predicting protein concentrations fairly accurately (R² = 0.86). Proteins whose concentration cannot be predicted accurately, taken as outliers with respect to the predictor, are shown to have annotation evidence of post-translational modification significantly more often than random subsets of similar size (P < 0.02). In a data mining sense, this work also shows a wider point that outliers with respect to a learning method can carry meaningful information about a problem domain.
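The two ideas in this abstract, l₁-regularized feature selection followed by treating poorly predicted items as informative outliers, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the data are synthetic, the eight features, the penalty `alpha`, the planted outlier indices and the cut-off of six robust standard deviations are hypothetical, and the solver is a plain cyclic coordinate descent rather than whatever implementation the authors used.

```python
import random
import statistics


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; the proximal step of the l1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(features, target, alpha, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(features), len(features[0])
    weights = [0.0] * p
    residuals = list(target)  # equals y - Xw while all weights are zero
    col_sq = [sum(row[j] ** 2 for row in features) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                features[i][j] * (residuals[i] + features[i][j] * weights[j])
                for i in range(n)
            ) / n
            new_weight = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residuals[i] -= features[i][j] * delta
                weights[j] = new_weight
    return weights, residuals


# Synthetic data (hypothetical): only features 0 and 3 drive the target,
# and samples 5 and 17 are shifted so that the linear model cannot explain them.
random.seed(0)
n_samples, n_features = 200, 8
true_weights = [2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0]
planted_outliers = {5, 17}
features = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
target = []
for i, row in enumerate(features):
    value = sum(w * x for w, x in zip(true_weights, row)) + random.gauss(0.0, 0.1)
    if i in planted_outliers:
        value += 4.0
    target.append(value)

weights, residuals = lasso_coordinate_descent(features, target, alpha=0.2)

# Features with a non-zero weight survive the l1 penalty.
selected = [j for j, w in enumerate(weights) if w != 0.0]

# Flag samples whose residual is large relative to a robust (MAD-based) scale.
scale = statistics.median(abs(r) for r in residuals) / 0.6745
outliers = [i for i, r in enumerate(residuals) if abs(r) > 6.0 * scale]

print("selected features:", selected)
print("flagged samples:", outliers)
```

In the study the flagged items are proteins whose concentration the predictor misses; here they are simply the samples that were deliberately perturbed.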
Affiliation(s)
- Yawwani Gunawardana
- School of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK
|
143
|
Noor A, Ahmad A, Serpedin E, Nounou M, Nounou H. ROBNCA: robust network component analysis for recovering transcription factor activities. Bioinformatics 2013; 29:2410-8. [PMID: 23940252 DOI: 10.1093/bioinformatics/btt433] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION Network component analysis (NCA) is an efficient method of reconstructing the transcription factor activity (TFA), which makes use of the gene expression data and prior information available about transcription factor (TF)-gene regulations. Most of the contemporary algorithms either exhibit the drawback of inconsistency and poor reliability, or suffer from prohibitive computational complexity. In addition, the existing algorithms do not possess the ability to counteract the presence of outliers in the microarray data. Hence, robust and computationally efficient algorithms are needed to enable practical applications. RESULTS We propose ROBust Network Component Analysis (ROBNCA), a novel iterative algorithm that explicitly models the possible outliers in the microarray data. An attractive feature of the ROBNCA algorithm is the derivation of a closed form solution for estimating the connectivity matrix, which was not available in prior contributions. The ROBNCA algorithm is compared with FastNCA and the non-iterative NCA (NI-NCA). ROBNCA estimates the TF activity profiles as well as the TF-gene control strength matrix with a much higher degree of accuracy than FastNCA and NI-NCA, irrespective of varying noise, correlation and/or amount of outliers in the case of synthetic data. The ROBNCA algorithm is also tested on Saccharomyces cerevisiae data and Escherichia coli data, and it is observed to outperform the existing algorithms. The run time of the ROBNCA algorithm is comparable with that of FastNCA, and ROBNCA is hundreds of times faster than NI-NCA. AVAILABILITY The ROBNCA software is available at http://people.tamu.edu/~amina/ROBNCA
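As a reading aid, the kind of model the abstract describes can be written down in a few symbols. This is only a sketch: the symbol names are assumptions chosen here, and the abstract does not state the exact objective or the penalty placed on the outlier term.

```latex
% X      : gene expression matrix (genes x samples)
% A      : TF-gene connectivity / control strength matrix (genes x TFs)
% S      : transcription factor activity profiles (TFs x samples)
% O      : sparse matrix collecting outlying measurements
% \Gamma : measurement noise
X = A S + O + \Gamma,
\qquad
A_{ij} = 0 \ \text{whenever prior knowledge says TF } j \text{ does not regulate gene } i .
```

Classical NCA corresponds to the same factorization without the outlier term O; the prior TF-gene information enters only through the fixed zero pattern of A.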
Affiliation(s)
- Amina Noor
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA, Corporate Research and Development, Qualcomm Technologies Inc., San Diego, CA 92121, USA, Department of Chemical Engineering and Department of Electrical Engineering, Texas A&M University at Qatar, Doha Qatar
|
144
|
COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst Biol 2013; 7:74. [PMID: 23927696 PMCID: PMC3751080 DOI: 10.1186/1752-0509-7-74] [Citation(s) in RCA: 687] [Impact Index Per Article: 62.5] [Received: 08/07/2012] [Accepted: 08/02/2013] [Indexed: 12/21/2022]
Abstract
Background COnstraint-Based Reconstruction and Analysis (COBRA) methods are widely used for genome-scale modeling of metabolic networks in both prokaryotes and eukaryotes. Due to the successes with metabolism, there is an increasing effort to apply COBRA methods to reconstruct and analyze integrated models of cellular processes. The COBRA Toolbox for MATLAB is a leading software package for genome-scale analysis of metabolism; however, it was not designed to elegantly capture the complexity inherent in integrated biological networks and lacks an integration framework for the multiomics data used in systems biology. The openCOBRA Project is a community effort to promote constraints-based research through the distribution of freely available software. Results Here, we describe COBRA for Python (COBRApy), a Python package that provides support for basic COBRA methods. COBRApy is designed in an object-oriented fashion that facilitates the representation of the complex biological processes of metabolism and gene expression. COBRApy does not require MATLAB to function; however, it includes an interface to the COBRA Toolbox for MATLAB to facilitate use of legacy codes. For improved performance, COBRApy includes parallel processing support for computationally intensive processes. Conclusion COBRApy is an object-oriented framework designed to meet the computational challenges associated with the next generation of stoichiometric constraint-based models and high-density omics data sets. Availability http://opencobra.sourceforge.net/
|
145
|
Almario MP, Reyes LH, Kao KC. Evolutionary engineering of Saccharomyces cerevisiae for enhanced tolerance to hydrolysates of lignocellulosic biomass. Biotechnol Bioeng 2013; 110:2616-23. [DOI: 10.1002/bit.24938] [Citation(s) in RCA: 102] [Impact Index Per Article: 9.3] [Received: 01/29/2013] [Revised: 03/14/2013] [Accepted: 04/08/2013] [Indexed: 11/09/2022]
Affiliation(s)
- María P. Almario
- Department of Chemical Engineering; Texas A&M University; 3122 TAMU; College Station; Texas; 77843-3122
- Luis H. Reyes
- Department of Chemical Engineering; Texas A&M University; 3122 TAMU; College Station; Texas; 77843-3122
- Katy C. Kao
- Department of Chemical Engineering; Texas A&M University; 3122 TAMU; College Station; Texas; 77843-3122
|
146
|
Rolfe MD, Ocone A, Stapleton MR, Hall S, Trotter EW, Poole RK, Sanguinetti G, Green J. Systems analysis of transcription factor activities in environments with stable and dynamic oxygen concentrations. Open Biol 2013; 2:120091. [PMID: 22870390 PMCID: PMC3411108 DOI: 10.1098/rsob.120091] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 05/10/2012] [Accepted: 06/20/2012] [Indexed: 11/13/2022] Open
Abstract
Understanding gene regulation requires knowledge of changes in transcription factor (TF) activities. Simultaneous direct measurement of numerous TF activities is currently impossible. Nevertheless, statistical approaches to infer TF activities have yielded non-trivial and verifiable predictions for individual TFs. Here, global statistical modelling identifies changes in TF activities from transcript profiles of Escherichia coli growing in stable (fixed oxygen availabilities) and dynamic (changing oxygen availability) environments. A core oxygen-responsive TF network, supplemented by additional TFs acting under specific conditions, was identified. The activities of the cytoplasmic oxygen-responsive TF, FNR, and the membrane-bound terminal oxidases implied that, even on the scale of the bacterial cell, spatial effects significantly influence oxygen-sensing. Several transcripts exhibited asymmetrical patterns of abundance in aerobic to anaerobic and anaerobic to aerobic transitions. One of these transcripts, ndh, encodes a major component of the aerobic respiratory chain and is regulated by oxygen-responsive TFs ArcA and FNR. Kinetic modelling indicated that ArcA and FNR behaviour could not explain the ndh transcript profile, leading to the identification of another TF, PdhR, as the source of the asymmetry. Thus, this approach illustrates how systematic examination of regulatory responses in stable and dynamic environments yields new mechanistic insights into adaptive processes.
Affiliation(s)
- Matthew D Rolfe
- Department of Molecular Biology and Biotechnology, The Krebs Institute, University of Sheffield, Sheffield S10 2TN, UK
|
147
|
Modular pharmacology: deciphering the interacting structural organization of the targeted networks. Drug Discov Today 2013; 18:560-6. [DOI: 10.1016/j.drudis.2013.01.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 11/11/2012] [Revised: 12/14/2012] [Accepted: 01/16/2013] [Indexed: 12/24/2022]
|
148
|
Liu JX, Wang YT, Zheng CH, Sha W, Mi JX, Xu Y. Robust PCA based method for discovering differentially expressed genes. BMC Bioinformatics 2013; 14 Suppl 8:S3. [PMID: 23815087 PMCID: PMC3654929 DOI: 10.1186/1471-2105-14-s8-s3] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022] Open
Abstract
How to identify a set of genes that are relevant to a key biological process is an important issue in current molecular biology. In this paper, we propose a novel method to discover differentially expressed genes based on robust principal component analysis (RPCA). In our method, we treat the differentially and non-differentially expressed genes as perturbation signals S and low-rank matrix A, respectively. Perturbation signals S can be recovered from the gene expression data by using RPCA. To discover the differentially expressed genes associated with specific biological processes or functions, the scheme is as follows. First, the matrix D of expression data is decomposed into two additive matrices, A and S, by using RPCA. Second, the differentially expressed genes are identified based on matrix S. Finally, the differentially expressed genes are evaluated with tools based on Gene Ontology. A large number of experiments on hypothetical and real gene expression data are also provided, and the experimental results show that our method is efficient and effective.
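For orientation, the decomposition of D into A and S is usually posed as the convex program below (often called principal component pursuit). This is the standard RPCA formulation, shown as a sketch; the abstract does not say which variant or which value of the trade-off parameter the authors used.

```latex
% D          : observed gene expression matrix (m x n)
% A          : low-rank part (non-differentially expressed background)
% S          : sparse part (perturbation signals)
% \|A\|_{*}  : nuclear norm, the sum of the singular values of A
% \|S\|_{1}  : entrywise l1 norm, the sum of |S_{ij}|
\min_{A,\,S} \; \|A\|_{*} + \lambda \, \|S\|_{1}
\quad \text{subject to} \quad D = A + S,
\qquad \lambda > 0 .
```

A common default in the RPCA literature is \lambda = 1/\sqrt{\max(m, n)}; larger values of \lambda push more of D into the low-rank part and leave S sparser.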
Affiliation(s)
- Jin-Xing Liu
- Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
|
149
|
Wu M, Liu L, Hijazi H, Chan C. A multi-layer inference approach to reconstruct condition-specific genes and their regulation. Bioinformatics 2013; 29:1541-52. [PMID: 23610368 DOI: 10.1093/bioinformatics/btt186] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
An important topic in systems biology is the reverse engineering of regulatory mechanisms through reconstruction of context-dependent gene networks. A major challenge is to identify the genes and the regulations specific to a condition or phenotype, given that regulatory processes are highly connected such that a specific response is typically accompanied by numerous collateral effects. In this study, we design a multi-layer approach that is able to reconstruct condition-specific genes and their regulation through an integrative analysis of large-scale information of gene expression, protein interaction and transcriptional regulation (transcription factor-target gene relationships). We establish the accuracy of our methodology against synthetic datasets, as well as a yeast dataset. We then extend the framework to higher eukaryotic systems, including human breast cancer and Arabidopsis thaliana cold acclimation. Our study identified TACSTD2 (TROP2) as a target gene for human breast cancer and discovered its regulation by the transcription factors CREB and NFkB. We also predict that KIF2C is a target gene for ER-/HER2- breast cancer and is positively regulated by E2F1. The predictions were further confirmed through experimental studies. AVAILABILITY The implementation and detailed protocol of the layer approach is available at http://www.egr.msu.edu/changroup/Protocols/Three-layer%20approach%20to%20reconstruct%20condition.html.
Affiliation(s)
- Ming Wu
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
|
150
|
Dissecting specific and global transcriptional regulation of bacterial gene expression. Mol Syst Biol 2013; 9:658. [PMID: 23591774 PMCID: PMC3658269 DOI: 10.1038/msb.2013.14] [Citation(s) in RCA: 95] [Impact Index Per Article: 8.6] [Received: 12/18/2012] [Accepted: 03/06/2013] [Indexed: 12/18/2022] Open
Abstract
Gene expression is regulated by specific transcriptional circuits but also by the global expression machinery as a function of growth. Simultaneous specific and global regulation thus constitutes an additional, but often neglected, layer of complexity in gene expression. Here, we develop an experimental-computational approach to dissect specific and global regulation in the bacterium Escherichia coli. By using fluorescent promoter reporters, we show that global regulation is growth rate dependent not only during steady state but also during dynamic changes in growth rate and can be quantified through two promoter-specific parameters. By applying our approach to arginine biosynthesis, we obtain a quantitative understanding of both specific and global regulation that allows accurate prediction of the temporal response to simultaneous perturbations in arginine availability and growth rate. We thereby uncover two principles of joint regulation: (i) specific regulation by repression dominates the transcriptional response during metabolic steady states, largely repressing the biosynthesis genes even when biosynthesis is required and (ii) global regulation sets the maximum promoter activity that is exploited during the transition between steady states.
|