1
Hecker D, Lauber M, Behjati Ardakani F, Ashrafiyan S, Manz Q, Kersting J, Hoffmann M, Schulz MH, List M. Computational tools for inferring transcription factor activity. Proteomics 2023; 23:e2200462. [PMID: 37706624 DOI: 10.1002/pmic.202200462] [Received: 05/17/2023] [Revised: 08/11/2023] [Accepted: 08/22/2023] [Indexed: 09/15/2023]
Abstract
Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell type-specific manner and each TF has its own characteristics, untangling their regulatory interactions from an experimental point of view is laborious and convoluted. Thus, there is an ongoing development of computational tools that estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. We want to give an overview of available computational tools that estimate TFA, illustrate examples of their application, debate common result validation strategies, and discuss assumptions and concomitant limitations.
Affiliation(s)
- Dennis Hecker
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Michael Lauber
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Fatemeh Behjati Ardakani
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Shamim Ashrafiyan
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Quirin Manz
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Johannes Kersting
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - GeneSurge GmbH, München, Germany
- Markus Hoffmann
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - Institute for Advanced Study, Technical University of Munich, Garching, Germany
  - National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Marcel H Schulz
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Markus List
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
2
Morin A, Chu ECP, Sharma A, Adrian-Hamazaki A, Pavlidis P. Characterizing the targets of transcription regulators by aggregating ChIP-seq and perturbation expression data sets. Genome Res 2023; 33:763-778. [PMID: 37308292 PMCID: PMC10317128 DOI: 10.1101/gr.277273.122] [Received: 08/31/2022] [Accepted: 04/26/2023] [Indexed: 06/14/2023]
Abstract
Mapping the gene targets of chromatin-associated transcription regulators (TRs) is a major goal of genomics research. ChIP-seq of TRs and experiments that perturb a TR and measure the differential abundance of gene transcripts are a primary means by which direct relationships are tested on a genomic scale. It has been reported that there is a poor overlap in the evidence across gene regulation strategies, emphasizing the need for integrating results from multiple experiments. Although research consortia interested in gene regulation have produced a valuable trove of high-quality data, there is an even greater volume of TR-specific data throughout the literature. In this study, we show a workflow for the identification, uniform processing, and aggregation of ChIP-seq and TR perturbation experiments for the ultimate purpose of ranking human and mouse TR-target interactions. Focusing on an initial set of eight regulators (ASCL1, HES1, MECP2, MEF2C, NEUROD1, PAX6, RUNX1, and TCF4), we identified 497 experiments suitable for analysis. We used this corpus to examine data concordance, to identify systematic patterns of the two data types, and to identify putative orthologous interactions between human and mouse. We build upon commonly used strategies to forward a procedure for aggregating and combining these two genomic methodologies, assessing these rankings against independent literature-curated evidence. Beyond a framework extensible to other TRs, our work also provides empirically ranked TR-target listings, as well as transparent experiment-level gene summaries for community use.
Affiliation(s)
- Alexander Morin
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Eric Ching-Pan Chu
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Aman Sharma
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Alex Adrian-Hamazaki
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Paul Pavlidis
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
3
Abid D, Brent MR. NetProphet 3: a machine learning framework for transcription factor network mapping and multi-omics integration. Bioinformatics 2023; 39:7000334. [PMID: 36692138 PMCID: PMC9912366 DOI: 10.1093/bioinformatics/btad038] [Received: 08/16/2022] [Revised: 01/11/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
MOTIVATION Many methods have been proposed for mapping the targets of transcription factors (TFs) from gene expression data. It is known that combining outputs from multiple methods can improve performance. To date, outputs have been combined by using either simplistic formulae, such as geometric mean, or carefully hand-tuned formulae that may not generalize well to new inputs. Finally, the evaluation of accuracy has been challenging due to the lack of genome-scale, ground-truth networks. RESULTS We developed NetProphet3, which combines scores from multiple analyses automatically, using a tree boosting algorithm trained on TF binding location data. We also developed three independent, genome-scale evaluation metrics. By these metrics, NetProphet3 is more accurate than other commonly used packages, including NetProphet 2.0, when gene expression data from direct TF perturbations are available. Furthermore, its integration mode can forge a consensus network from gene expression data and TF binding location data. AVAILABILITY AND IMPLEMENTATION All data and code are available at https://zenodo.org/record/7504131#.Y7Wu3i-B2x8. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
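For reference, the kind of "simplistic formula" the abstract contrasts with NetProphet 3 can be sketched in a few lines: per-method TF-target scores combined by their geometric mean. This is a minimal sketch of that baseline, not NetProphet 3's tree-boosting combiner; it assumes each method's scores are already scaled to (0, 1], and the TF, gene, and method names are hypothetical.

```python
import math


def geometric_mean_combine(score_tables):
    """Combine per-method TF-target scores by their geometric mean.

    score_tables: list of dicts mapping (tf, target) -> score in (0, 1].
    Only edges scored by every method are combined.
    """
    shared = set(score_tables[0])
    for table in score_tables[1:]:
        shared &= set(table)
    combined = {}
    for edge in shared:
        # The exponentiated mean of the logs is the geometric mean.
        log_mean = sum(math.log(table[edge]) for table in score_tables) / len(score_tables)
        combined[edge] = math.exp(log_mean)
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical edge scores from two inference methods.
method_a = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.2, ("TF2", "G1"): 0.5}
method_b = {("TF1", "G1"): 0.4, ("TF1", "G2"): 0.8, ("TF2", "G1"): 0.5}
ranking = geometric_mean_combine([method_a, method_b])
print(ranking)
```

Here an edge scored 0.9 and 0.4 (geometric mean 0.6) outranks one scored 0.5 by both methods, so the final order depends heavily on how each method's scores were scaled beforehand; a fixed rule cannot adapt to that, which is the gap a learned combination targets.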
Affiliation(s)
- Dhoha Abid
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Michael R Brent
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
4
Hecker D, Behjati Ardakani F, Karollus A, Gagneur J, Schulz MH. The adapted Activity-By-Contact model for enhancer-gene assignment and its application to single-cell data. Bioinformatics 2023; 39:btad062. [PMID: 36708003 PMCID: PMC9931646 DOI: 10.1093/bioinformatics/btad062] [Received: 06/23/2022] [Revised: 12/05/2022] [Accepted: 01/26/2023] [Indexed: 01/29/2023]
Abstract
MOTIVATION Identifying regulatory regions in the genome is of great interest for understanding the epigenomic landscape in cells. One fundamental challenge in this context is to find the target genes whose expression is affected by the regulatory regions. A recent successful method is the Activity-By-Contact (ABC) model which scores enhancer-gene interactions based on enhancer activity and the contact frequency of an enhancer to its target gene. However, it describes regulatory interactions entirely from a gene's perspective, and does not account for all the candidate target genes of an enhancer. In addition, the ABC model requires two types of assays to measure enhancer activity, which limits the applicability. Moreover, there is no implementation available that allows either integration with transcription factor (TF) binding information or efficient analysis of single-cell data. RESULTS We demonstrate that the ABC score can yield a higher accuracy by adapting the enhancer activity according to the number of contacts the enhancer has to its candidate target genes and also by considering all annotated transcription start sites of a gene. Further, we show that the model is comparably accurate with only one assay to measure enhancer activity. We combined our generalized ABC model with TF binding information and illustrated an analysis of a single-cell ATAC-seq dataset of the human heart, where we were able to characterize cell type-specific regulatory interactions and predict gene expression based on TF affinities. All executed processing steps are incorporated into our new computational pipeline STARE. AVAILABILITY AND IMPLEMENTATION The software is available at https://github.com/schulzlab/STARE. CONTACT marcel.schulz@em.uni-frankfurt.de. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
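The score described above can be written down compactly. The sketch below follows the original ABC definition (enhancer activity as the geometric mean of an accessibility signal and H3K27ac, multiplied by contact frequency, then normalized over all candidate enhancers of the gene). It does not implement the adapted activity or the multi-TSS handling introduced with STARE, and all enhancer names and signal values are hypothetical.

```python
import math


def enhancer_activity(accessibility, h3k27ac):
    """Original ABC activity: geometric mean of accessibility and H3K27ac signal."""
    return math.sqrt(accessibility * h3k27ac)


def abc_scores(candidates):
    """ABC score of each candidate enhancer for one gene.

    candidates: dict enhancer_id -> (activity, contact_frequency_with_gene).
    Each score is activity * contact divided by the sum of that product
    over all candidate enhancers of the gene, so the scores sum to 1.
    """
    products = {enh: act * contact for enh, (act, contact) in candidates.items()}
    total = sum(products.values())
    if total == 0:
        return {enh: 0.0 for enh in products}
    return {enh: value / total for enh, value in products.items()}


# Hypothetical signals: (accessibility, H3K27ac, contact frequency with the gene).
raw = {"enh1": (4.0, 9.0, 0.5), "enh2": (1.0, 1.0, 2.0), "enh3": (2.0, 8.0, 0.25)}
candidates = {
    enh: (enhancer_activity(acc, k27), contact) for enh, (acc, k27, contact) in raw.items()
}
scores = abc_scores(candidates)
print(scores)  # enh1 ~ 0.5, enh2 ~ 0.333, enh3 ~ 0.167
```

The normalization runs over the enhancers of one gene, which is what makes the original score gene-centric: nothing in it accounts for the other genes an enhancer contacts.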
Affiliation(s)
- Dennis Hecker
  - Institute of Cardiovascular Regeneration, Goethe University Hospital
  - Cardio-Pulmonary Institute, Goethe University
  - German Centre for Cardiovascular Research, Partner site Rhine-Main, Frankfurt am Main 60590
- Fatemeh Behjati Ardakani
  - Institute of Cardiovascular Regeneration, Goethe University Hospital
  - Cardio-Pulmonary Institute, Goethe University
  - German Centre for Cardiovascular Research, Partner site Rhine-Main, Frankfurt am Main 60590
- Alexander Karollus
  - School of Computation, Information and Technology, Technical University of Munich, Garching 85748
- Julien Gagneur
  - School of Computation, Information and Technology, Technical University of Munich, Garching 85748
  - Institute of Human Genetics, Technical University of Munich, Munich 81675
  - Computational Health Center, Helmholtz Center Munich, Neuherberg 85764
  - Munich Data Science Institute, Technical University of Munich, Garching 85748, Germany
- Marcel H Schulz
  - Institute of Cardiovascular Regeneration, Goethe University Hospital
  - Cardio-Pulmonary Institute, Goethe University
  - German Centre for Cardiovascular Research, Partner site Rhine-Main, Frankfurt am Main 60590
5
Deshpande A, Chu LF, Stewart R, Gitter A. Network inference with Granger causality ensembles on single-cell transcriptomics. Cell Rep 2022; 38:110333. [PMID: 35139376 PMCID: PMC9093087 DOI: 10.1016/j.celrep.2022.110333] [Received: 02/05/2019] [Revised: 02/19/2021] [Accepted: 01/12/2022] [Indexed: 12/20/2022]
Abstract
Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells' progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.
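To make "Granger causality regression" concrete, here is a minimal sketch of the classic pairwise test on an evenly spaced series: does adding the regulator's previous value improve prediction of the target beyond the target's own previous value? It uses plain least squares with a single lag of 1 (an assumption) and synthetic data; SINGE's kernel-based smoothing of irregular pseudotimes and its ensemble aggregation are not reproduced.

```python
import random


def ols_rss(rows, y):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    X = [[1.0] + list(row) for row in rows]
    k = len(X[0])
    # Build the normal equations (X^T X) beta = X^T y.
    A = [[sum(xn[i] * xn[j] for xn in X) for j in range(k)] for i in range(k)]
    b = [sum(xn[i] * yn for xn, yn in zip(X, y)) for i in range(k)]
    # Solve them by Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    beta = [b[i] / A[i][i] for i in range(k)]
    return sum((yn - sum(bi * xi for bi, xi in zip(beta, xn))) ** 2 for xn, yn in zip(X, y))


def granger_f(x, y):
    """F statistic for 'x Granger-causes y' with one time lag.

    Restricted model: y[t] ~ 1 + y[t-1]
    Full model:       y[t] ~ 1 + y[t-1] + x[t-1]
    """
    target = y[1:]
    restricted = [(y[t - 1],) for t in range(1, len(y))]
    full = [(y[t - 1], x[t - 1]) for t in range(1, len(y))]
    rss_r = ols_rss(restricted, target)
    rss_f = ols_rss(full, target)
    n, params_full, added = len(target), 3, 1
    return ((rss_r - rss_f) / added) / (rss_f / (n - params_full))


# Hypothetical regulator x and target y: y follows x one step later, plus noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.9 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
f_xy = granger_f(x, y)  # past x strongly improves prediction of y
f_yx = granger_f(y, x)  # past y adds little about x
print(round(f_xy), round(f_yx, 2))
```

The direction matters: the test is asymmetric, so scoring both orderings of every regulator-gene pair yields a directed, rankable list of candidate interactions, which is the kind of output SINGE aggregates across its ensemble.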
Affiliation(s)
- Atul Deshpande
  - Department of Electrical and Computer Engineering, University of Wisconsin - Madison, Madison, WI 53706, USA
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Li-Fang Chu
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Ron Stewart
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Anthony Gitter
  - Morgridge Institute for Research, Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53792, USA
6
Yang TH, Wang CY, Tsai HC, Yang YC, Liu CT. YTLR: Extracting yeast transcription factor-gene associations from the literature using automated literature readers. Comput Struct Biotechnol J 2022; 20:4636-4644. [PMID: 36090812 PMCID: PMC9449546 DOI: 10.1016/j.csbj.2022.08.041] [Received: 05/06/2022] [Revised: 08/16/2022] [Accepted: 08/18/2022] [Indexed: 11/19/2022]
Abstract
Cells adapt to environmental stresses mainly via transcription reprogramming. Correct transcription control is mediated by the interactions between transcription factors (TF) and their target genes. These TF-gene associations can be probed by chromatin immunoprecipitation techniques and knockout experiments, revealing TF binding (TFB) and regulatory (TFR) evidence, respectively. Nevertheless, most evidence is still fragmentary in the literature and requires tremendous human resources to curate. We developed the first pipeline called YTLR (Yeast Transcription-regulation Literature Reader) to automate TF-gene relation extraction from the literature. YTLR first identifies articles with TFB and TFR information. Then TF-gene binding pairs are extracted from the TFB articles, and TF-gene regulatory associations are recognized from the TFR papers. On gathered test sets, YTLR achieves an AUC value of 98.8% in identifying articles with TFB evidence and AUC = 83.4% in extracting the detailed TF-gene binding pairs. Similarly, YTLR obtains an AUC value of 98.2% in identifying TFR articles and AUC = 80.4% in extracting the detailed TF-gene regulatory associations. Furthermore, YTLR outperforms previous methods in both tasks. To facilitate researchers in extracting TF-gene transcriptional relations from large-scale queried articles, an automated and easy-to-use software tool based on the YTLR pipeline is constructed. In summary, YTLR aims to provide easier literature pre-screening for curators and help researchers gather yeast TF-gene transcriptional relation conclusions from articles in a high-throughput fashion. The YTLR pipeline software tool can be downloaded at https://github.com/cobisLab/YTLR/.
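The AUC figures above summarize how well a classifier's scores separate positive from negative examples. A minimal sketch of that metric, computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count half); the labels and scores are hypothetical and unrelated to YTLR's test sets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical article-level predictions: 1 = article reports TF binding evidence.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.10]
auc = roc_auc(labels, scores)
print(auc)  # 8 of 9 positive-negative pairs are ordered correctly
```

The pairwise loop is quadratic and only suits small sets; a rank-sum formulation gives the same value for large ones. An AUC of 0.5 corresponds to random ordering, so values such as 98.8% mean almost every positive article outscores almost every negative one.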
Affiliation(s)
- Tzu-Hsien Yang (corresponding author)
  - Department of Biomedical Engineering, National Cheng Kung University, No.1, University Road, Tainan 701, Taiwan
- Chung-Yu Wang
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Hsiu-Chun Tsai
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Ya-Chiao Yang
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Cheng-Tse Liu
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
7
Bergenholm D, Dabirian Y, Ferreira R, Siewers V, David F, Nielsen J. Rational gRNA design based on transcription factor binding data. Synth Biol (Oxf) 2021; 6:ysab014. [PMID: 34712839 PMCID: PMC8546606 DOI: 10.1093/synbio/ysab014] [Received: 11/09/2020] [Revised: 04/21/2021] [Accepted: 06/08/2021] [Indexed: 11/14/2022]
Abstract
The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system has become a standard tool in many genome engineering endeavors. The endonuclease-deficient version of Cas9 (dCas9) is also a powerful programmable tool for gene regulation. In this study, we made use of Saccharomyces cerevisiae transcription factor (TF) binding data to obtain a better understanding of the interplay between TF binding and binding of dCas9 fused to an activator domain, VPR. More specifically, we targeted dCas9–VPR toward binding sites of Gcr1–Gcr2 and Tye7 present in several promoters of genes encoding enzymes engaged in the central carbon metabolism. From our data, we observed an upregulation of gene expression when dCas9–VPR was targeted next to a TF binding motif, whereas a downregulation or no change was observed when dCas9 was bound on a TF motif. This suggests a steric competition between dCas9 and the specific TF. Integrating TF binding data, therefore, proved to be useful for designing guide RNAs for CRISPR interference or CRISPR activation applications.
Affiliation(s)
- David Bergenholm
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Yasaman Dabirian
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Raphael Ferreira
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Verena Siewers
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Florian David
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Jens Nielsen
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
8
Alvarez JM, Brooks MD, Swift J, Coruzzi GM. Time-Based Systems Biology Approaches to Capture and Model Dynamic Gene Regulatory Networks. Annu Rev Plant Biol 2021; 72:105-131. [PMID: 33667112 PMCID: PMC9312366 DOI: 10.1146/annurev-arplant-081320-090914] [Indexed: 05/13/2023]
Abstract
All aspects of transcription and its regulation involve dynamic events. However, capturing these dynamic events in gene regulatory networks (GRNs) offers both a promise and a challenge. The promise is that capturing and modeling the dynamic changes in GRNs will allow us to understand how organisms adapt to a changing environment. The ability to mount a rapid transcriptional response to environmental changes is especially important in nonmotile organisms such as plants. The challenge is to capture these dynamic, genome-wide events and model them in GRNs. In this review, we cover recent progress in capturing dynamic interactions of transcription factors with their targets (at both the local and genome-wide levels) and how they are used to learn how GRNs operate as a function of time. We also discuss recent advances that employ time-based machine learning approaches to forecast gene expression at future time points, a key goal of systems biology.
Affiliation(s)
- Jose M Alvarez
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
  - ANID-Millennium Science Initiative Program-Millennium Institute for Integrative Biology (iBio), Santiago, Chile
- Matthew D Brooks
  - Global Change and Photosynthesis Research Unit, US Department of Agriculture Agricultural Research Service, Urbana, Illinois 61801, USA
- Joseph Swift
  - Salk Institute for Biological Studies, La Jolla, California 92037, USA
- Gloria M Coruzzi
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
9
Brooks MD, Juang CL, Katari MS, Alvarez JM, Pasquino A, Shih HJ, Huang J, Shanks C, Cirrone J, Coruzzi GM. ConnecTF: A platform to integrate transcription factor-gene interactions and validate regulatory networks. Plant Physiol 2021; 185:49-66. [PMID: 33631799 PMCID: PMC8133578 DOI: 10.1093/plphys/kiaa012] [Received: 07/07/2020] [Accepted: 10/27/2020] [Indexed: 05/08/2023]
Abstract
Deciphering gene regulatory networks (GRNs) is both a promise and challenge of systems biology. The promise lies in identifying key transcription factors (TFs) that enable an organism to react to changes in its environment. The challenge lies in validating GRNs that involve hundreds of TFs with hundreds of thousands of interactions with their genome-wide targets experimentally determined by high-throughput sequencing. To address this challenge, we developed ConnecTF, a species-independent, web-based platform that integrates genome-wide studies of TF-target binding, TF-target regulation, and other TF-centric omic datasets and uses these to build and refine validated or inferred GRNs. We demonstrate the functionality of ConnecTF by showing how integration within and across TF-target datasets uncovers biological insights. Case study 1 uses integration of TF-target gene regulation and binding datasets to uncover TF mode-of-action and identify potential TF partners for 14 TFs in abscisic acid signaling. Case study 2 demonstrates how genome-wide TF-target data and automated functions in ConnecTF are used in precision/recall analysis and pruning of an inferred GRN for nitrogen signaling. Case study 3 uses ConnecTF to chart a network path from NLP7, a master TF in nitrogen signaling, to direct secondary TF2s and to its indirect targets in a Network Walking approach. The public version of ConnecTF (https://ConnecTF.org) contains 3,738,278 TF-target interactions for 423 TFs in Arabidopsis, 839,210 TF-target interactions for 139 TFs in maize (Zea mays), and 293,094 TF-target interactions for 26 TFs in rice (Oryza sativa). The database and tools in ConnecTF will advance the exploration of GRNs in plant systems biology applications for model and crop species.
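Precision/recall analysis of an inferred GRN, as in case study 2, compares predicted TF-target edges with experimentally supported ones. A minimal set-based sketch, assuming both are given as (TF, target) pairs and that only predictions for TFs with validation data are scored; that scoring rule is an assumption of this sketch, not a description of ConnecTF's internals, and all identifiers are hypothetical.

```python
def precision_recall(predicted, validated):
    """Precision and recall of predicted (tf, target) edges against validated edges.

    Assumption: only predictions for TFs that have validation data are scored,
    because edges of other TFs can be neither confirmed nor refuted.
    """
    validated_tfs = {tf for tf, _ in validated}
    scored = {edge for edge in predicted if edge[0] in validated_tfs}
    true_positives = scored & validated
    precision = len(true_positives) / len(scored) if scored else 0.0
    recall = len(true_positives) / len(validated) if validated else 0.0
    return precision, recall


# Hypothetical inferred network and validated TF-target set.
predicted = {("TF_A", "G1"), ("TF_A", "G2"), ("TF_A", "G3"), ("TF_B", "G1")}
validated = {("TF_A", "G1"), ("TF_A", "G3"), ("TF_A", "G4"), ("TF_A", "G5")}
p, r = precision_recall(predicted, validated)
print(p, r)  # 2 of 3 scored predictions are validated; 2 of 4 validated edges are recovered
```

Excluding TFs without validation data keeps unverifiable edges from counting as false positives; whether that exclusion is appropriate is a choice each analysis has to make explicitly.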
Affiliation(s)
- Matthew D Brooks
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
  - USDA ARS Global Change and Photosynthesis Research Unit, Urbana, IL, USA
- Che-Lun Juang
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Manpreet Singh Katari
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- José M Alvarez
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
  - Millennium Institute for Integrative Biology (iBio), Santiago, Chile
- Angelo Pasquino
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Hung-Jui Shih
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Ji Huang
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Carly Shanks
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
- Jacopo Cirrone
  - Courant Institute for Mathematical Sciences, Department of Computer Science, New York University, NY, USA
- Gloria M Coruzzi (corresponding author)
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
10
Lai X, Stigliani A, Lucas J, Hugouvieux V, Parcy F, Zubieta C. Genome-wide binding of SEPALLATA3 and AGAMOUS complexes determined by sequential DNA-affinity purification sequencing. Nucleic Acids Res 2020; 48:9637-9648. [PMID: 32890394 PMCID: PMC7515736 DOI: 10.1093/nar/gkaa729] [Received: 05/27/2020] [Revised: 08/17/2020] [Accepted: 08/24/2020] [Indexed: 01/18/2023]
Abstract
The MADS transcription factors (TF), SEPALLATA3 (SEP3) and AGAMOUS (AG) are required for floral organ identity and floral meristem determinacy. While dimerization is obligatory for DNA binding, SEP3 and SEP3–AG also form tetrameric complexes. How homo and hetero-dimerization and tetramerization of MADS TFs affect genome-wide DNA-binding and gene regulation is not known. Using sequential DNA affinity purification sequencing (seq-DAP-seq), we determined genome-wide binding of SEP3 homomeric and SEP3–AG heteromeric complexes, including SEP3Δtet-AG, a complex with a SEP3 splice variant, SEP3Δtet, which is largely dimeric and SEP3–AG tetramer. SEP3 and SEP3–AG share numerous bound regions, however each complex bound unique sites, demonstrating that protein identity plays a role in DNA-binding. SEP3–AG and SEP3Δtet-AG share a similar genome-wide binding pattern; however the tetrameric form could access new sites and demonstrated a global increase in DNA-binding affinity. Tetramerization exhibited significant cooperative binding with preferential distances between two sites, allowing efficient binding to regions that are poorly recognized by dimeric SEP3Δtet-AG. By intersecting seq-DAP-seq with ChIP-seq and expression data, we identified unique target genes bound either in SEP3–AG seq-DAP-seq or in SEP3/AG ChIP-seq. Seq-DAP-seq is a versatile genome-wide technique and complements in vivo methods to identify putative direct regulatory targets.
Affiliation(s)
- Xuelei Lai
  - Laboratoire de Physiologie Cellulaire et Végétale, Université Grenoble-Alpes, CNRS, CEA, INRAE, IRIG-DBSCI, 38000 Grenoble, France
- Arnaud Stigliani
  - Laboratoire de Physiologie Cellulaire et Végétale, Université Grenoble-Alpes, CNRS, CEA, INRAE, IRIG-DBSCI, 38000 Grenoble, France
  - Biotech Research and Innovation Centre, University of Copenhagen, Copenhagen, DK-2200, Denmark
  - Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark
- Jérémy Lucas
  - Laboratoire de Physiologie Cellulaire et Végétale, Université Grenoble-Alpes, CNRS, CEA, INRAE, IRIG-DBSCI, 38000 Grenoble, France
- Véronique Hugouvieux
  - Laboratoire de Physiologie Cellulaire et Végétale, Université Grenoble-Alpes, CNRS, CEA, INRAE, IRIG-DBSCI, 38000 Grenoble, France
- François Parcy
  - Laboratoire de Physiologie Cellulaire et Végétale, Université Grenoble-Alpes, CNRS, CEA, INRAE, IRIG-DBSCI, 38000 Grenoble, France
- Chloe Zubieta
  - Laboratoire de Physiologie Cellulaire et Végétale, Université Grenoble-Alpes, CNRS, CEA, INRAE, IRIG-DBSCI, 38000 Grenoble, France
11
Ko DK, Brandizzi F. Network-based approaches for understanding gene regulation and function in plants. Plant J 2020; 104:302-317. [PMID: 32717108 PMCID: PMC8922287 DOI: 10.1111/tpj.14940] [Received: 04/12/2020] [Accepted: 07/14/2020] [Indexed: 05/03/2023]
Abstract
Expression reprogramming directed by transcription factors is a primary gene regulation underlying most aspects of the biology of any organism. Our views of how gene regulation is coordinated are dramatically changing thanks to the advent and constant improvement of high-throughput profiling and transcriptional network inference methods: from activities of individual genes to functional interactions across genes. These technical and analytical advances can reveal the topology of transcriptional networks in which hundreds of genes are hierarchically regulated by multiple transcription factors at systems level. Here we review the state of the art of experimental and computational methods used in plant biology research to obtain large-scale datasets and model transcriptional networks. Examples of direct use of these network models and perspectives on their limitations and future directions are also discussed.
Collapse
Affiliation(s)
- Dae Kwan Ko
- MSU-DOE Plant Research Lab, Michigan State University, East Lansing, MI 48824, USA
- Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
| | - Federica Brandizzi
- MSU-DOE Plant Research Lab, Michigan State University, East Lansing, MI 48824, USA
- Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
12
Cirrone J, Brooks MD, Bonneau R, Coruzzi GM, Shasha DE. OutPredict: multiple datasets can improve prediction of expression and inference of causality. Sci Rep 2020; 10:6804. [PMID: 32321967 PMCID: PMC7176633 DOI: 10.1038/s41598-020-63347-3] [Received: 08/30/2019] [Accepted: 03/26/2020] [Indexed: 01/09/2023]
Abstract
The ability to accurately predict the causal relationships from transcription factors to genes would greatly enhance our understanding of transcriptional dynamics. This could lead to applications in which one or more transcription factors could be manipulated to effect a change in genes leading to the enhancement of some desired trait. Here we present a method called OutPredict that constructs a model for each gene based on time-series (and other) data and predicts that gene's expression at a previously unseen subsequent time point. The model also infers causal relationships based on the most important transcription factors for each gene model, some of which have been validated by previous physical experiments. The method benefits from known network edges and steady-state data to enhance predictive accuracy. Our results across B. subtilis, Arabidopsis, E. coli, Drosophila and the DREAM4 simulated in silico dataset show improvements in predictive accuracy ranging from 40% to 60% over other state-of-the-art methods. We find that gene expression models can benefit from the addition of steady-state data to predict expression values of time series. Finally, we validate, based on limited available data, that the influential edges we infer correspond to known relationships significantly more often than expected by chance or by state-of-the-art methods.
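OutPredict's central idea, that a TF measured at one time point should help predict a target gene at the next, can be illustrated with a deliberately simplified Python sketch. As an assumption, it replaces the per-gene learned model with a plain lagged Pearson correlation between each candidate TF at time t and the target at time t + 1, and ranks TFs by its absolute value. The function names and toy series are hypothetical; this is not the OutPredict algorithm.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_regulators_by_lag(series, target, candidate_tfs):
    """Rank TFs by |corr(TF at time t, target at time t + 1)|, strongest first.

    series: dict mapping gene name -> expression values at consecutive time points.
    """
    target_next = series[target][1:]  # target shifted one step forward in time
    scores = {
        tf: abs(pearson(series[tf][:-1], target_next))
        for tf in candidate_tfs
        if tf != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy time course: G at t + 1 equals 2 * TF1 at t.
    series = {
        "TF1": [0, 1, 2, 3, 4],
        "TF2": [1, 0, 1, 0, 1],
        "G": [0, 0, 2, 4, 6],
    }
    # TF1 is ranked first with |r| = 1.0.
    print(rank_regulators_by_lag(series, "G", ["TF1", "TF2"]))
```

Lagged correlation captures only linear, pairwise dependence; the abstract describes OutPredict as learning a model per gene that can also draw on steady-state data and known network edges.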
Affiliation(s)
- Jacopo Cirrone
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, 10012, USA
- Matthew D Brooks
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Richard Bonneau
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, 10012, USA
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, 10010, USA
- Gloria M Coruzzi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Dennis E Shasha
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, 10012, USA
13
Kang Y, Patel NR, Shively C, Recio PS, Chen X, Wranik BJ, Kim G, McIsaac RS, Mitra R, Brent MR. Dual threshold optimization and network inference reveal convergent evidence from TF binding locations and TF perturbation responses. Genome Res 2020; 30:459-471. [PMID: 32060051 PMCID: PMC7111528 DOI: 10.1101/gr.259655.119] [Received: 11/27/2019] [Accepted: 02/11/2020] [Indexed: 12/22/2022]
Abstract
A high-confidence map of the direct, functional targets of each transcription factor (TF) requires convergent evidence from independent sources. Two significant sources of evidence are TF binding locations and the transcriptional responses to direct TF perturbations. Systematic data sets of both types exist for yeast and human, but they rarely converge on a common set of direct, functional targets for a TF. Even the few genes that are both bound and responsive may not be direct functional targets. Our analysis shows that when there are many nonfunctional binding sites and many indirect targets, nonfunctional sites are expected to occur in the cis-regulatory DNA of indirect targets by chance. To address this problem, we introduce dual threshold optimization (DTO), a new method for setting significance thresholds on binding and perturbation-response data, and show that it improves convergence. It also enables comparison of binding data to perturbation-response data that have been processed by network inference algorithms, which further improves convergence. The combination of dual threshold optimization and network inference greatly expands the high-confidence TF network map in both yeast and human. Next, we analyze a comprehensive new data set measuring the transcriptional response shortly after inducing overexpression of a yeast TF. We also present a new yeast binding location data set obtained by transposon calling cards and compare it to recent ChIP-exo data. These new data sets improve convergence and expand the high-confidence network synergistically.
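The thresholding problem can be illustrated with a minimal Python sketch. It assumes each gene has one binding score and one perturbation-response score for a single TF, and it scores every pair of cutoffs by a one-sided hypergeometric p-value for the overlap between the bound and responsive gene sets, keeping the most significant pair. The function names, the brute-force search over observed score values, and the toy scores are illustrative assumptions, not the published DTO implementation.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K bound, n responsive)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def dual_threshold_search(binding, response):
    """Try every pair of cutoffs and keep the most significant overlap.

    binding, response: dicts mapping gene -> score (higher = stronger evidence).
    Returns (p_value, binding_cutoff, response_cutoff, overlapping_genes).
    """
    genes = sorted(set(binding) & set(response))
    N = len(genes)
    best = (1.0, None, None, set())
    # Candidate cutoffs are simply the observed score values of each data type.
    for b_cut in sorted({binding[g] for g in genes}):
        bound = {g for g in genes if binding[g] >= b_cut}
        for r_cut in sorted({response[g] for g in genes}):
            responsive = {g for g in genes if response[g] >= r_cut}
            overlap = bound & responsive
            p = hypergeom_sf(len(overlap), N, len(bound), len(responsive))
            if p < best[0]:
                best = (p, b_cut, r_cut, overlap)
    return best


if __name__ == "__main__":
    # Hypothetical toy scores for one TF over six genes.
    binding = {"g1": 9, "g2": 8, "g3": 7, "g4": 1, "g5": 1, "g6": 0}
    response = {"g1": 5, "g2": 4, "g3": 0, "g4": 0, "g5": 1, "g6": 0}
    p, b_cut, r_cut, overlap = dual_threshold_search(binding, response)
    # Best cutoffs here: binding >= 8, response >= 4, overlap {g1, g2}, p = 1/15.
    print(p, b_cut, r_cut, sorted(overlap))
```

Because every pair of observed cutoffs is tried, the cost grows with the product of the numbers of distinct binding and response scores, and the minimum p-value is optimistically biased; a practical implementation would restrict the candidate cutoffs and correct for having optimized over many threshold pairs.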
Affiliation(s)
- Yiming Kang
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
- Nikhil R Patel
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
- Christian Shively
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Pamela Samantha Recio
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Xuhua Chen
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Bernd J Wranik
- Calico Life Sciences LLC, South San Francisco, California 94080, USA
- Griffin Kim
- Calico Life Sciences LLC, South San Francisco, California 94080, USA
- R Scott McIsaac
- Calico Life Sciences LLC, South San Francisco, California 94080, USA
- Robi Mitra
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Michael R Brent
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
14
Singh A, Choudhuri P, Chandradoss KR, Lal M, Mishra SK, Sandhu KS. Does genome surveillance explain the global discrepancy between binding and effect of chromatin factors? FEBS Lett 2020; 594:1339-1353. [PMID: 31930486 DOI: 10.1002/1873-3468.13729] [Received: 09/17/2019] [Revised: 12/16/2019] [Accepted: 12/19/2019] [Indexed: 11/11/2022]
Abstract
Knocking out a chromatin factor often does not alter the transcription of its binding targets. What explains the observed disconnect between binding and effect? We hypothesize that this discrepancy could be associated with the role of chromatin factors in maintaining genetic and epigenetic integrity at promoters, and not necessarily with transcription. Through re-analysis of published datasets, we present several lines of evidence that support our hypothesis and deflate the popular assumptions. We also tested the hypothesis through mutation accumulation assays on yeast knockouts of chromatin factors. Altogether, the proposed hypothesis presents a simple explanation for the global discord between chromatin factor binding and effect. Future work in this direction might fortify the hypothesis and elucidate the underlying mechanisms.
Affiliation(s)
- Arashdeep Singh
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER)-Mohali, India
- Poulami Choudhuri
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER)-Mohali, India
- Mohan Lal
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER)-Mohali, India
- Shravan Kumar Mishra
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER)-Mohali, India
- Kuljeet Singh Sandhu
- Department of Biological Sciences, Indian Institute of Science Education and Research (IISER)-Mohali, India
15
Yang TH. Transcription factor regulatory modules provide the molecular mechanisms for functional redundancy observed among transcription factors in yeast. BMC Bioinformatics 2019; 20:630. [PMID: 31881824 PMCID: PMC6933673 DOI: 10.1186/s12859-019-3212-8] [Indexed: 01/15/2023]
Abstract
BACKGROUND Current technologies for understanding transcriptional reprogramming in cells include transcription factor (TF) chromatin immunoprecipitation (ChIP) experiments and TF knockout experiments. ChIP experiments reveal the binding targets of the TF against which the antibody is directed, while knockout experiments identify the regulatory gene targets of the knocked-out TF. However, these two complementary approaches have been shown to share few common targets. Researchers have used the concept of TF functional redundancy to explain the low overlap between the two techniques, but the detailed molecular mechanisms behind TF functional redundancy remain unknown. Without knowing the possible molecular mechanisms, it is hard for biologists to fully unravel the cause of TF functional redundancy. RESULTS To uncover these molecular mechanisms, a novel algorithm was devised to extract TF regulatory modules that help explain the observed TF functional redundancy. The method first searches for candidate TF sets in the TF binding data. Based on these candidate sets, it then uses a modified Steiner tree construction algorithm to build possible TF regulatory modules from protein-protein interaction data, and finally filters out noise-induced results using confidence tests. The extracted regulatory modules were shown to correlate with functional redundancy and provided testable hypotheses about the molecular mechanisms behind it. The biological significance of the extracted modules was demonstrated in three aspects: ontology enrichment, protein interaction prevalence and expression coherence. About 23.5% of the extracted TF regulatory modules were verified in the literature. Finally, the biological applicability of the proposed method was shown in a detailed example of a verified TF regulatory module for pheromone response and filamentous growth in yeast.
CONCLUSION This research proposes a novel method that extracts potential TF regulatory modules that elucidate the functional redundancy observed among TFs. The extracted TF regulatory modules not only link molecular mechanisms to the observed functional redundancy among TFs, but also show biological significance in inferring functional TF binding target genes. The results provide testable hypotheses to guide the design of further research and experiments.
Affiliation(s)
- Tzu-Hsien Yang
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd, Kaohsiung, 81148, Taiwan.
16
Harrop TWR, Mantegazza O, Luong AM, Béthune K, Lorieux M, Jouannic S, Adam H. A set of AP2-like genes is associated with inflorescence branching and architecture in domesticated rice. J Exp Bot 2019; 70:5617-5629. [PMID: 31346594 PMCID: PMC6812710 DOI: 10.1093/jxb/erz340] [Received: 04/17/2019] [Accepted: 07/15/2019] [Indexed: 05/25/2023]
Abstract
Rice yield is influenced by inflorescence size and architecture, and inflorescences from domesticated rice accessions produce more branches and grains. Neither the molecular control of branching nor the developmental differences between wild and domesticated rice accessions are fully understood. We surveyed phenotypes related to branching, size, and grain yield across 91 wild and domesticated African and Asian accessions. Characteristics related to axillary meristem identity were the main phenotypic differences between inflorescences from wild and domesticated accessions. We used whole transcriptome sequencing in developing inflorescences to measure gene expression before and after the transition from branching axillary meristems to determinate spikelet meristems. We identified a core set of genes associated with axillary meristem identity in Asian and African rice, and another set associated with phenotypic variability between wild and domesticated accessions. AP2/EREBP-like genes were enriched in both sets, suggesting that they are key factors in inflorescence branching and rice domestication. Our work has identified new candidates in the molecular control of inflorescence development and grain yield, and provides a detailed description of the effects of domestication on phenotype and gene expression.
Affiliation(s)
- Thomas W R Harrop
- Laboratory for Evolution and Development, Department of Biochemistry, University of Otago, Dunedin, Aotearoa, New Zealand
- Ai My Luong
- University of Montpellier, DIADE, IRD, France
- Mathias Lorieux
- Rice Genetics and Genomics Laboratory, International Center for Tropical Agriculture, Cali 6713, Colombia
- Hélène Adam
- University of Montpellier, DIADE, IRD, France
17
Thormann V, Rothkegel MC, Schöpflin R, Glaser LV, Djuric P, Li N, Chung HR, Schwahn K, Vingron M, Meijsing SH. Genomic dissection of enhancers uncovers principles of combinatorial regulation and cell type-specific wiring of enhancer-promoter contacts. Nucleic Acids Res 2018; 46:2868-2882. [PMID: 29385519 PMCID: PMC5888794 DOI: 10.1093/nar/gky051] [Received: 08/11/2017] [Accepted: 01/19/2018] [Indexed: 12/19/2022]
Abstract
Genomic binding of transcription factors, like the glucocorticoid receptor (GR), is linked to the regulation of genes. However, as we show here, GR binding is a poor predictor of GR-dependent gene regulation even when taking the 3D organization of the genome into account. To connect GR binding sites to the regulation of genes in the endogenous genomic context, we turned to genome editing. By deleting GR binding sites, individually or in combination, we uncovered how cooperative interactions between binding sites contribute to the regulation of genes. Specifically, for the GR target gene GILZ, we show that the simultaneous presence of a cluster of GR binding sites is required for the activity of an individual enhancer and that the GR-dependent regulation of GILZ depends on multiple GR-bound enhancers. Further, by deleting GR binding sites that are shared between different cell types, we show how cell type-specific genome organization and enhancer-blocking can result in cell type-specific wiring of promoter–enhancer contacts. This rewiring allows an individual GR binding site shared between different cell types to direct the expression of distinct transcripts and thereby contributes to the cell type-specific consequences of glucocorticoid signaling.
Affiliation(s)
- Verena Thormann
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Maika C Rothkegel
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Robert Schöpflin
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Laura V Glaser
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Petar Djuric
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Na Li
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Ho-Ryun Chung
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Kevin Schwahn
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Martin Vingron
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
- Sebastiaan H Meijsing
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-67, 14195 Berlin, Germany
18
Kuang Z, Ji Z, Boeke JD, Ji H. Dynamic motif occupancy (DynaMO) analysis identifies transcription factors and their binding sites driving dynamic biological processes. Nucleic Acids Res 2018; 46:e2. [PMID: 29325176 PMCID: PMC5758894 DOI: 10.1093/nar/gkx905] [Received: 12/26/2016] [Accepted: 09/26/2017] [Indexed: 01/02/2023]
Abstract
Biological processes are usually associated with genome-wide remodeling of transcription driven by transcription factors (TFs). Identifying key TFs and their spatiotemporal binding patterns is indispensable for understanding how dynamic processes are programmed. However, most methods are designed only to predict TF binding sites. We present a computational method, dynamic motif occupancy analysis (DynaMO), to infer important TFs and their spatiotemporal binding activities in dynamic biological processes using chromatin profiling data from multiple biological conditions, such as time-course histone modification ChIP-seq data. In the first step, DynaMO predicts TF binding sites with a random forests approach. Next, and uniquely, DynaMO infers dynamic TF binding activities at predicted binding sites using their local chromatin profiles from multiple biological conditions. Another distinctive feature of DynaMO is that it identifies key TFs in a dynamic process using clustering and enrichment analysis of dynamic TF binding patterns. Application of DynaMO to the yeast ultradian cycle, the mouse circadian clock and human neural differentiation demonstrates its accuracy and versatility. We anticipate DynaMO will be generally useful for elucidating transcriptional programs in dynamic processes.
Affiliation(s)
- Zheng Kuang
- Institute for Systems Genetics, NYU Langone Medical Center, New York City, NY 10016, USA
- Department of Biochemistry and Molecular Pharmacology, NYU Langone Medical Center, New York City, NY 10016, USA
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Zhicheng Ji
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Jef D Boeke
- Institute for Systems Genetics, NYU Langone Medical Center, New York City, NY 10016, USA
- Department of Biochemistry and Molecular Pharmacology, NYU Langone Medical Center, New York City, NY 10016, USA
- Hongkai Ji
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, USA
19
Holland P, Bergenholm D, Börlin CS, Liu G, Nielsen J. Predictive models of eukaryotic transcriptional regulation reveals changes in transcription factor roles and promoter usage between metabolic conditions. Nucleic Acids Res 2019; 47:4986-5000. [PMID: 30976803 PMCID: PMC6547448 DOI: 10.1093/nar/gkz253] [Received: 01/11/2019] [Revised: 03/26/2019] [Accepted: 04/04/2019] [Indexed: 01/08/2023]
Abstract
Transcription factors (TF) are central to transcriptional regulation, but they are often studied in relative isolation and without close control of the metabolic state of the cell. Here, we describe genome-wide binding (by ChIP-exo) of 15 yeast TFs in four chemostat conditions that cover a range of metabolic states. We integrate these data with transcriptomics and six additional recently mapped TFs to identify predictive models describing how TFs control gene expression in different metabolic conditions. Contributions by TFs to gene regulation are predicted to be mostly activating, additive and well approximated by assuming linear effects from TF binding signal. Notably, using TF binding peaks from peak-finding algorithms gave distinctly worse predictions than simply summing the low-noise, high-resolution TF ChIP-exo reads on promoters. Finally, we discover indications of a novel functional role for three TFs, Gcn4, Ert1 and Sut1, during nitrogen-limited aerobic fermentation. In this condition only, the three TFs show correlated binding to a large number of genes (enriched for glycolytic and translation processes) and a negative correlation with target gene transcript levels.
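The finding that TF contributions are largely additive and approximately linear in the binding signal suggests a simple model: predict each gene's expression as an intercept plus a weighted sum of the ChIP-exo read counts of each TF on its promoter. The minimal Python sketch below fits such a model by ordinary least squares via the normal equations. The toy matrix, the unregularized fit and the helper names are illustrative assumptions, not the authors' modelling pipeline.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_additive_model(promoter_reads, expression):
    """Ordinary least squares fit of expression ~ b0 + sum_j b_j * reads_j.

    promoter_reads: one row per gene, one summed read count per TF.
    expression: one expression value per gene.
    Returns [b0, b1, ..., bk]; a positive b_j suggests an activating TF.
    """
    X = [[1.0] + [float(v) for v in row] for row in promoter_reads]  # intercept column
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, expression)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coefs, reads_row):
    """Predicted expression for one gene under the fitted additive model."""
    return coefs[0] + sum(b * r for b, r in zip(coefs[1:], reads_row))


if __name__ == "__main__":
    # Hypothetical toy data: 5 genes x 2 TFs, generated as 1 + 2*TF_A + 0.5*TF_B.
    reads = [[0, 0], [1, 0], [0, 2], [3, 1], [2, 4]]
    expression = [1.0, 3.0, 2.0, 7.5, 7.0]
    coefs = fit_additive_model(reads, expression)
    print([round(c, 6) for c in coefs])  # recovers approximately [1.0, 2.0, 0.5]
```

Summing reads on promoters rather than using called peaks, as the abstract reports, only changes how the input matrix is built; the fit itself is unchanged.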
Affiliation(s)
- Petter Holland
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- David Bergenholm
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Christoph S Börlin
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Guodong Liu
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Gothenburg SE-41296, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby DK-2800, Denmark
20
Castro DM, de Veaux NR, Miraldi ER, Bonneau R. Multi-study inference of regulatory networks for more accurate models of gene regulation. PLoS Comput Biol 2019; 15:e1006591. [PMID: 30677040 PMCID: PMC6363223 DOI: 10.1371/journal.pcbi.1006591] [Received: 03/06/2018] [Revised: 02/05/2019] [Accepted: 10/23/2018] [Indexed: 12/16/2022]
Abstract
Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell-types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization and limited network recovery. In this study, we explore previous integration strategies, such as batch-correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method initially estimates the activities of transcription factors, and subsequently, infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets. Due to increasing availability of biological data, methods to properly integrate data generated across the globe become essential for extracting reproducible insights into relevant research questions. 
In this work, we developed a framework to reconstruct gene regulatory networks from expression datasets generated in separate studies, which are challenging to integrate because of technical variation (different dates, handlers, laboratories, protocols, etc.). Since regulatory mechanisms are often shared across conditions, we hypothesized that drawing conclusions from various data sources would improve the performance of gene regulatory network inference. By transferring knowledge among regulatory models, our method is able to detect weaker patterns that are conserved across datasets, while also being able to detect dataset-unique interactions. We also allow incorporation of prior knowledge on network structure to favor models that are somewhat similar to the prior itself. Using two model organisms, we show that joint network inference outperforms inference from a single dataset. We also demonstrate that our method is robust to false edges in the prior and to low condition overlap across datasets, and that it can outperform current data integration strategies.
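The abstract states that the method first estimates TF activities and only then infers network topology. A common way to perform that first step, assumed here rather than taken from the paper, is to model expression as hidden TF activities weighted by a signed prior network, X ≈ P·A, and to solve for A by least squares one sample at a time. The pure-Python sketch below uses hypothetical names and toy data and shows only this step; it does not implement the multitask model with dataset-specific and conserved coefficients.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def estimate_tfa(prior, expr):
    """Least-squares TF activities: minimize ||expr[:, s] - prior @ a_s||^2 per sample s.

    prior: genes x TFs; +1 activating edge, -1 repressing edge, 0 no edge.
    expr:  genes x samples expression matrix (assumed centered per gene).
    Returns a TFs x samples activity matrix.
    """
    n_genes, n_tfs, n_samples = len(prior), len(prior[0]), len(expr[0])
    # The normal-equation matrix P^T P is shared by all samples.
    PtP = [[sum(prior[g][i] * prior[g][j] for g in range(n_genes))
            for j in range(n_tfs)] for i in range(n_tfs)]
    activities = [[0.0] * n_samples for _ in range(n_tfs)]
    for s in range(n_samples):
        Ptx = [sum(prior[g][i] * expr[g][s] for g in range(n_genes)) for i in range(n_tfs)]
        for i, a in enumerate(solve(PtP, Ptx)):
            activities[i][s] = a
    return activities


if __name__ == "__main__":
    # Hypothetical prior: 4 genes x 2 TFs (TF2 represses gene g4).
    prior = [[1, 0], [1, 1], [0, 1], [0, -1]]
    # Expression generated from activities TF1 = (2, 0.5), TF2 = (-1, 3) in two samples.
    expr = [[2.0, 0.5], [1.0, 3.5], [-1.0, 3.0], [1.0, -3.0]]
    for tf, row in zip(["TF1", "TF2"], estimate_tfa(prior, expr)):
        print(tf, [round(v, 6) for v in row])
```

If two TFs share exactly the same prior targets, P^T P is singular and `solve` raises an error; a pseudoinverse or a ridge penalty is the usual remedy in that case.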
Affiliation(s)
- Nicholas R de Veaux
- Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA
- Emily R Miraldi
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Divisions of Immunobiology & Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA
- Richard Bonneau
- New York University, New York, NY 10003, USA
- Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA
21
Reconstruction of a Global Transcriptional Regulatory Network for Control of Lipid Metabolism in Yeast by Using Chromatin Immunoprecipitation with Lambda Exonuclease Digestion. mSystems 2018; 3:mSystems00215-17. [PMID: 30073202 PMCID: PMC6068829 DOI: 10.1128/msystems.00215-17] [Received: 12/15/2017] [Accepted: 07/04/2018] [Indexed: 11/20/2022]
Abstract
To build transcription regulatory networks, transcription factor binding must be analyzed in cells grown under different conditions, because transcription factor responses and targets differ depending on environmental conditions. We performed whole-genome analysis of the DNA binding of five Saccharomyces cerevisiae transcription factors involved in lipid metabolism, Ino2, Ino4, Hap1, Oaf1, and Pip2, in response to four different environmental conditions in chemostat cultures, which allowed us to keep the specific growth rate constant. Chromatin immunoprecipitation with lambda exonuclease digestion (ChIP-exo) enabled the detection of binding events at high resolution. We discovered a large number of previously unidentified targets and thus expanded the known functions of each transcription factor (e.g., glutamate biosynthesis as a target of Oaf1 and Pip2). Moreover, condition-dependent binding of transcription factors in response to cell metabolic state (e.g., differential binding of Ino2 between fermentative and respiratory metabolic conditions) was clearly suggested. Combining the new binding data with previously published data from transcription factor deletion studies revealed the high complexity of the transcriptional regulatory network for lipid metabolism in yeast, which involves combinatorial and complementary regulation by multiple transcription factors. We anticipate that our work will provide insights into transcription factor binding dynamics that will prove useful for understanding transcription regulatory networks. IMPORTANCE Transcription factors play a crucial role in the regulation of gene expression and adaptation to different environments. To better understand the underlying roles of these adaptations, we performed experiments that map the binding of transcription factors to their targets at high resolution.
We investigated five transcription factors involved in lipid metabolism in yeast, and we discovered multiple novel targets and condition-specific responses that allow us to draw a better regulatory map of the lipid metabolism.
22
Seberg HE, Van Otterloo E, Cornell RA. Beyond MITF: Multiple transcription factors directly regulate the cellular phenotype in melanocytes and melanoma. Pigment Cell Melanoma Res 2018. [PMID: 28649789 DOI: 10.1111/pcmr.12611] [Indexed: 11/29/2022]
Abstract
MITF governs multiple steps in the development of melanocytes, including specification from neural crest, growth, survival, and terminal differentiation. In addition, the level of MITF activity determines the phenotype adopted by melanoma cells, whether invasive, proliferative, or differentiated. However, MITF does not act alone. Here, we review literature on the transcription factors that co-regulate MITF-dependent genes. ChIP-seq studies have indicated that the transcription factors SOX10, YY1, and TFAP2A co-occupy subsets of regulatory elements bound by MITF in melanocytes. Analyses at single loci also support roles for LEF1, RB1, IRF4, and PAX3 acting in combination with MITF, while sequence motif analyses suggest that additional transcription factors colocalize with MITF at many melanocyte-specific regulatory elements. However, the precise biochemical functions of each of these MITF collaborators and their contributions to gene expression remain to be elucidated. Analogous to the transcriptional networks in morphogen-patterned tissues during embryogenesis, we anticipate that the level of MITF activity is controlled not only by the concentration of activated MITF, but also by additional transcription factors that either quantitatively or qualitatively influence the expression of MITF-target genes.
Affiliation(s)
- Hannah E Seberg
- Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA, USA
- Eric Van Otterloo
- SDM-Craniofacial Biology, University of Colorado - Anschutz Medical Campus, Aurora, CO, USA
- Robert A Cornell
- Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA, USA
- Department of Anatomy and Cell Biology, University of Iowa, Iowa City, IA, USA
23
Kang Y, Liow HH, Maier EJ, Brent MR. NetProphet 2.0: mapping transcription factor networks by exploiting scalable data resources. Bioinformatics 2017; 34:249-257. [PMID: 28968736 PMCID: PMC5860202 DOI: 10.1093/bioinformatics/btx563] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/14/2017] [Revised: 03/14/2017] [Accepted: 09/11/2017] [Indexed: 11/15/2022]
Abstract
Motivation: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and ‘integrative’ algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. Results: We present NetProphet 2.0, a ‘data light’ algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. Availability and implementation: Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yiming Kang
- Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
- Hien-Haw Liow
- Department of Mathematics, Washington University, Saint Louis, MO, USA
- Ezekiel J Maier
- Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
- Michael R Brent
- Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
24
Rudnik R, Bulcha JT, Reifschneider E, Ellersiek U, Baier M. Specificity versus redundancy in the RAP2.4 transcription factor family of Arabidopsis thaliana: transcriptional regulation of genes for chloroplast peroxidases. BMC PLANT BIOLOGY 2017; 17:144. [PMID: 28835225 PMCID: PMC5569508 DOI: 10.1186/s12870-017-1092-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/18/2017] [Accepted: 08/14/2017] [Indexed: 05/07/2023]
Abstract
BACKGROUND: The Arabidopsis ERFIb / RAP2.4 transcription factor family consists of eight members with highly conserved DNA binding domains. Selected members have been characterized individually, but a systematic comparison is pending. The redox-sensitive transcription factor RAP2.4a mediates chloroplast-to-nucleus redox signaling and controls induction of the three most prominent chloroplast peroxidases, namely 2-Cys peroxiredoxin A (2CPA) and thylakoid- and stromal ascorbate peroxidase (tAPx and sAPx). To test the specificity and redundancy of RAP2.4 transcription factors in the regulation of genes for chloroplast peroxidases, we compared the DNA-binding sites of the transcription factors in tertiary structure models, analyzed transcription factor and target gene regulation by qRT-PCR in RAP2.4, 2-Cys peroxiredoxin and ascorbate peroxidase T-DNA insertion lines and in RAP2.4-overexpressing lines of Arabidopsis thaliana, and performed promoter binding studies. RESULTS: All RAP2.4 proteins bound the tAPx promoter, but only the four RAP2.4 proteins with identical DNA contact sites, namely RAP2.4a, RAP2.4b, RAP2.4d and RAP2.4h, interacted stably with the redox-sensitive part of the 2CPA promoter. Gene expression analysis in RAP2.4 knockout lines revealed that RAP2.4a is the only member supporting 2CPA and chloroplast APx expression. RAP2.4h binds to the same promoter region as RAP2.4a and antagonizes 2CPA expression. Like the other six RAP2.4 proteins, RAP2.4h promotes APx mRNA accumulation. Chloroplast ROS signals induced RAP2.4b and RAP2.4d expression, but these two transcription factor genes are (in contrast to RAP2.4a) insensitive to low 2CP availability, and their expression decreased in APx knockout lines. RAP2.4e and RAP2.4f responded gradually to chloroplast APx availability and specifically activated APx expression. Like RAP2.4c and RAP2.4g, these transcription factors bound the tAPx promoter but barely bound the 2CPA promoter. 
CONCLUSIONS: The RAP2.4 transcription factors form an environmentally and developmentally regulated transcription factor network, in which the various members affect the expression intensity of the others. Within the transcription factor family, RAP2.4a has a unique function as a general transcriptional activator of chloroplast peroxidase activity. The other RAP2.4 proteins mediate the fine-control and adjust the relative availability of 2CPA, sAPx and tAPx.
Affiliation(s)
- Radoslaw Rudnik
- Dahlem Center of Plant Sciences, Plant Physiology, Freie Universität Berlin, Königin-Luise-Straße 12-16, 14195, Berlin, Germany
- Jote Tafese Bulcha
- Dahlem Center of Plant Sciences, Plant Physiology, Freie Universität Berlin, Königin-Luise-Straße 12-16, 14195, Berlin, Germany
- Elena Reifschneider
- Dahlem Center of Plant Sciences, Plant Physiology, Freie Universität Berlin, Königin-Luise-Straße 12-16, 14195, Berlin, Germany
- Ulrike Ellersiek
- Heinrich-Heine-Universität Düsseldorf, Plant Sciences, Universitätsstraße 25, 40225, Düsseldorf, Germany
- Margarete Baier
- Dahlem Center of Plant Sciences, Plant Physiology, Freie Universität Berlin, Königin-Luise-Straße 12-16, 14195, Berlin, Germany
25
Wang L, Michoel T. Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data. PLoS Comput Biol 2017; 13:e1005703. [PMID: 28821014 PMCID: PMC5576763 DOI: 10.1371/journal.pcbi.1005703] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 02/02/2017] [Revised: 08/30/2017] [Accepted: 07/26/2017] [Indexed: 02/07/2023]
Abstract
Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis allows one to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking hidden confounders and weak regulations into consideration. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. Understanding how genetic variation between individuals determines variation in observable traits or disease risk is one of the core aims of genetics. It is known that genetic variation often affects gene regulatory DNA elements and directly causes variation in expression of nearby genes. This effect in turn cascades down to other genes via the complex pathways and gene interaction networks that ultimately govern how cells operate in an ever-changing environment. In theory, when genetic variation and gene expression levels are measured simultaneously in a large number of individuals, the causal effects of genes on each other can be inferred using statistical models similar to those used in randomized controlled trials. We developed a novel method and ultra-fast software, Findr, which, unlike existing methods, takes into account the complex but unknown network context when predicting causality between specific gene pairs. Findr’s predictions have a significantly higher overlap with known gene networks compared to existing methods, using both simulated and real data. 
Findr is also nearly a million times faster, and hence the only software in its class that can handle modern datasets where the expression levels of tens of thousands of genes are simultaneously measured in hundreds to thousands of individuals.
Affiliation(s)
- Lingfei Wang
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, United Kingdom
- Tom Michoel
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, United Kingdom
26
Perez-Cerezales S, Boryshpolets S, Eisenbach M. Behavioral mechanisms of mammalian sperm guidance. Asian J Androl 2016; 17:628-32. [PMID: 25999361 PMCID: PMC4492055 DOI: 10.4103/1008-682x.154308] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Indexed: 02/07/2023]
Abstract
In mammals, sperm guidance in the oviduct appears essential for successful sperm arrival at the oocyte. Hitherto, three different potential sperm guidance mechanisms have been recognized: thermotaxis, rheotaxis, and chemotaxis, each of them using specific stimuli (a temperature gradient, fluid flow, and a chemoattractant gradient, respectively). Here, we review sperm behavior in these mechanisms and indicate commonalities and differences between them.
Affiliation(s)
- Michael Eisenbach
- Department of Biological Chemistry, The Weizmann Institute of Science, 7610001 Rehovot, Israel
27
Wu WS, Lai FJ. Functional redundancy of transcription factors explains why most binding targets of a transcription factor are not affected when the transcription factor is knocked out. BMC SYSTEMS BIOLOGY 2015; 9 Suppl 6:S2. [PMID: 26678747 PMCID: PMC4674858 DOI: 10.1186/1752-0509-9-s6-s2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Background: Biologists are puzzled by the extremely low percentage (3%) of the binding targets of a yeast transcription factor (TF) that are affected when the TF is knocked out, a phenomenon observed by comparing TF binding datasets with TF knockout effect datasets. Results: This study gives a plausible biological explanation of this counterintuitive phenomenon. Our analyses find that TFs with high functional redundancy show a significantly lower percentage than do TFs with low functional redundancy. This suggests that functional redundancy may allow one TF to compensate for another, thus masking the effect of the knockout on the binding targets of the knocked-out TF. In addition, we show that seven classes of genes (lowly expressed genes, TATA box-less genes, genes containing a nucleosome-free region immediately upstream of the transcriptional start site (TSS), genes with low transcriptional plasticity, genes with a low number of bound TFs, genes with a low number of TF binding sites (TFBSs), and genes with a short average distance of TFBSs to the TSS) are insensitive to the knockout of their promoter-binding TFs, providing clues for finding other biological explanations of the surprisingly low percentage of the binding targets of a TF affected when the TF is knocked out. Conclusions: This study shows that one property of TFs (functional redundancy) and seven properties of genes (expression level, TATA box, nucleosome, transcriptional plasticity, the number of bound TFs, the number of TFBSs, and the average distance of TFBSs to the TSS) may be useful for explaining a counterintuitive phenomenon: most binding targets of a yeast transcription factor are not affected when the transcription factor is knocked out.
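The quantity this abstract is built on, the share of a TF's binding targets whose expression changes when that TF is knocked out, is a simple set ratio. A minimal Python sketch, assuming binding targets and knockout-affected genes are already available as sets of gene IDs; the function names, TF names, gene IDs, and redundancy labels are hypothetical toy values, and the study's statistical comparison of high- and low-redundancy TFs is not reproduced.

```python
from statistics import mean


def fraction_affected(bound_targets: set[str], ko_affected: set[str]) -> float:
    """Fraction of a TF's binding targets that respond to knockout of that TF."""
    if not bound_targets:
        raise ValueError("TF has no binding targets")
    return len(bound_targets & ko_affected) / len(bound_targets)


def mean_fraction_by_group(
    data: dict[str, tuple[set[str], set[str]]], group_of: dict[str, str]
) -> dict[str, float]:
    """Average the per-TF fraction within each (hypothetical) redundancy group."""
    per_group: dict[str, list[float]] = {}
    for tf, (bound, affected) in data.items():
        per_group.setdefault(group_of[tf], []).append(fraction_affected(bound, affected))
    return {group: mean(values) for group, values in per_group.items()}


# Toy data: TF -> (binding targets, genes changed on knockout)
toy = {
    "TF_A": ({"g1", "g2", "g3", "g4"}, {"g1", "g9"}),
    "TF_B": ({"g2", "g5"}, {"g2", "g5"}),
    "TF_C": ({"g6", "g7"}, {"g7"}),
}
redundancy = {"TF_A": "high", "TF_B": "low", "TF_C": "low"}
print(mean_fraction_by_group(toy, redundancy))  # {'high': 0.25, 'low': 0.75}
```

Genes affected by the knockout but not bound (g9 above) do not enter the ratio, since the question is about bound targets only.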
28
Pettersson ME, Carlborg O. Capacitating epistasis--detection and role in the genetic architecture of complex traits. Methods Mol Biol 2015; 1253:185-196. [PMID: 25403533 DOI: 10.1007/978-1-4939-2155-3_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/04/2023]
Abstract
Here, we discuss the potential role of capacitating epistasis in the genetic architecture of complex traits. Two alternative methods for identifying such gene-gene interactions in genetic association studies-mapping of variance controlling loci and the variance plane ratio (VPR) method-are introduced. An overview of the theoretical foundation of the methods is presented together with a discussion on their implementation and available software for performing these analyses. We conclude by highlighting a few examples of capacitating epistasis described in the literature and its potential impacts on the genetics of complex traits.
Affiliation(s)
- Mats E Pettersson
- Division of Computational Genetics, Department of Clinical Sciences, Swedish University of Agricultural Sciences, Box 7078, SE-750 07, Uppsala, Sweden
29
Vermeirssen V, De Clercq I, Van Parys T, Van Breusegem F, Van de Peer Y. Arabidopsis ensemble reverse-engineered gene regulatory network discloses interconnected transcription factors in oxidative stress. THE PLANT CELL 2014; 26:4656-79. [PMID: 25549671 PMCID: PMC4311199 DOI: 10.1105/tpc.114.131417] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 08/23/2014] [Revised: 11/27/2014] [Accepted: 12/10/2014] [Indexed: 05/19/2023]
Abstract
The abiotic stress response in plants is complex and tightly controlled by gene regulation. We present an abiotic stress gene regulatory network of 200,014 interactions for 11,938 target genes by integrating four complementary reverse-engineering solutions through average rank aggregation on an Arabidopsis thaliana microarray expression compendium. This ensemble performed the most robustly in benchmarking and greatly expands upon the availability of interactions currently reported. Besides recovering 1182 known regulatory interactions, cis-regulatory motifs and coherent functionalities of target genes corresponded with the predicted transcription factors. We provide a valuable resource of 572 abiotic stress modules of coregulated genes with functional and regulatory information, from which we deduced functional relationships for 1966 uncharacterized genes and many regulators. Using gain- and loss-of-function mutants of seven transcription factors grown under control and salt stress conditions, we experimentally validated 141 out of 271 predictions (52% precision) for 102 selected genes and mapped 148 additional transcription factor-gene regulatory interactions (49% recall). We identified an intricate core oxidative stress regulatory network where NAC13, NAC053, ERF6, WRKY6, and NAC032 transcription factors interconnect and function in detoxification. Our work shows that ensemble reverse-engineering can generate robust biological hypotheses of gene regulation in a multicellular eukaryote that can be tested by medium-throughput experimental validation.
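Average rank aggregation, as named in this abstract, can be illustrated in a few lines: each method ranks all candidate TF-target edges by its own score, and edges are re-ordered by their mean rank across methods. A minimal Python sketch, assuming every method scores the same edge set and that a higher score means a more confident edge; the function name, the three toy "methods", and the TF/gene IDs are hypothetical, and this is not the authors' pipeline (which combined four specific reverse-engineering methods on a microarray compendium).

```python
Edge = tuple[str, str]  # (transcription factor, target gene)


def average_rank_aggregate(score_tables: list[dict[Edge, float]]) -> list[tuple[Edge, float]]:
    """Combine several methods' edge scores by averaging their per-method ranks.

    Assumes every method scores the same edge set and that a higher score means
    a more confident edge. Ties are broken by edge name so the output is deterministic.
    """
    if not score_tables:
        raise ValueError("need at least one score table")
    edges = set(score_tables[0])
    if any(set(table) != edges for table in score_tables):
        raise ValueError("all methods must score the same edges")
    rank_sum = dict.fromkeys(edges, 0)
    for table in score_tables:
        # Rank 1 = the edge this method is most confident about.
        ordered = sorted(edges, key=lambda e: (-table[e], e))
        for rank, edge in enumerate(ordered, start=1):
            rank_sum[edge] += rank
    n = len(score_tables)
    averaged = {edge: total / n for edge, total in rank_sum.items()}
    # Lowest mean rank first.
    return sorted(averaged.items(), key=lambda item: (item[1], item[0]))


method_1 = {("TF1", "geneA"): 0.9, ("TF1", "geneB"): 0.5, ("TF2", "geneA"): 0.1}
method_2 = {("TF1", "geneA"): 0.7, ("TF1", "geneB"): 0.8, ("TF2", "geneA"): 0.2}
method_3 = {("TF1", "geneA"): 0.6, ("TF1", "geneB"): 0.3, ("TF2", "geneA"): 0.4}
ranked = average_rank_aggregate([method_1, method_2, method_3])
print([edge for edge, _ in ranked])
# [('TF1', 'geneA'), ('TF1', 'geneB'), ('TF2', 'geneA')]
```

Working with ranks rather than raw scores means methods whose scores live on very different scales (correlations, mutual information, regression weights) contribute equally to the consensus.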
Affiliation(s)
- Vanessa Vermeirssen
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Inge De Clercq
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Thomas Van Parys
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Frank Van Breusegem
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium; Genomics Research Institute, University of Pretoria, Pretoria 0028, South Africa
30
Navlakha S, He X, Faloutsos C, Bar-Joseph Z. Topological properties of robust biological and computational networks. J R Soc Interface 2014; 11:20140283. [PMID: 24789562 DOI: 10.1098/rsif.2014.0283] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/25/2022]
Abstract
Network robustness is an important principle in biology and engineering. Previous studies of global networks have identified both redundancy and sparseness as topological properties used by robust networks. By focusing on molecular subnetworks, or modules, we show that module topology is tightly linked to the level of environmental variability (noise) the module expects to encounter. Modules internal to the cell that are less exposed to environmental noise are more connected and less robust than external modules. A similar design principle is used by several other biological networks. We propose a simple change to the evolutionary gene duplication model which gives rise to the rich range of module topologies observed within real networks. We apply these observations to evaluate and design communication networks that are specifically optimized for noisy or malicious environments. Combined, joint analysis of biological and computational networks leads to novel algorithms and insights benefiting both fields.
Affiliation(s)
- Saket Navlakha
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
31
Liu G, Marras A, Nielsen J. The future of genome-scale modeling of yeast through integration of a transcriptional regulatory network. QUANTITATIVE BIOLOGY 2014. [DOI: 10.1007/s40484-014-0027-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
32
Abstract
The term “transcriptional network” refers to the mechanism(s) that underlies coordinated expression of genes, typically involving transcription factors (TFs) binding to the promoters of multiple genes, and individual genes controlled by multiple TFs. A multitude of studies in the last two decades have aimed to map and characterize transcriptional networks in the yeast Saccharomyces cerevisiae. We review the methodologies and accomplishments of these studies, as well as challenges we now face. For most yeast TFs, data have been collected on their sequence preferences, in vivo promoter occupancy, and gene expression profiles in deletion mutants. These systematic studies have led to the identification of new regulators of numerous cellular functions and shed light on the overall organization of yeast gene regulation. However, many yeast TFs appear to be inactive under standard laboratory growth conditions, and many of the available data were collected using techniques that have since been improved. Perhaps as a consequence, comprehensive and accurate mapping among TF sequence preferences, promoter binding, and gene expression remains an open challenge. We propose that the time is ripe for renewed systematic efforts toward a complete mapping of yeast transcriptional regulatory mechanisms.
33
Yang TH, Wang CC, Wang YC, Wu WS. YTRP: a repository for yeast transcriptional regulatory pathways. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau014. [PMID: 24608172 PMCID: PMC3948430 DOI: 10.1093/database/bau014] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/18/2023]
Abstract
Regulatory targets of transcription factors (TFs) can be identified by TF perturbation experiments (TFPEs), which reveal the expression changes caused by the perturbation (deletion or overexpression) of TFs. However, the identified targets of a given TF include both direct and indirect regulatory targets. It has been shown that most TFPE-identified regulatory targets are indirect, indicating that TF-gene regulation occurs mainly through transcriptional regulatory pathways (TRPs) consisting of intermediate TFs. Without identifying these TRPs, it is not easy to understand how a TF regulates its indirect targets. Because no database currently stores potential TRPs for Saccharomyces cerevisiae, we constructed the YTRP (Yeast Transcriptional Regulatory Pathway) database. For each TF-gene regulatory pair under different experimental conditions, all possible TRPs in two underlying networks (constructed using experimentally verified TF-gene binding pairs and TF-gene regulatory pairs from the literature) for the specified experimental conditions were automatically enumerated by TRP mining procedures based on graph theory. The enumerated TRPs of a TF-gene regulatory pair provide experimentally testable hypotheses for the molecular mechanisms linking a TF and its regulatory target. YTRP is available online at http://cosbi3.ee.ncku.edu.tw/YTRP/. We believe that the TRPs deposited in this database will greatly improve the usefulness of TFPE data for yeast biologists studying the regulatory mechanisms between a knocked-out TF and its targets. Database URL: http://cosbi3.ee.ncku.edu.tw/YTRP/
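Enumerating candidate regulatory pathways from a TF to an indirect target is, at its core, enumerating simple directed paths in a graph. A minimal Python sketch using depth-first search on a single toy network with a path-length cap; the function name, the `max_edges` limit, and the TF/gene IDs are hypothetical, and the sketch does not reproduce YTRP's actual mining procedures, which work over two separate networks (binding and regulatory) and condition-specific data.

```python
def enumerate_paths(
    network: dict[str, set[str]], source: str, target: str, max_edges: int = 3
) -> list[list[str]]:
    """All simple directed paths source -> ... -> target with at most max_edges edges."""
    paths: list[list[str]] = []

    def dfs(node: str, path: list[str]) -> None:
        if len(path) - 1 > max_edges:
            return  # path already too long
        if node == target and len(path) > 1:
            paths.append(list(path))
            return
        for nxt in sorted(network.get(node, set())):
            if nxt in path:
                continue  # skip cycles so every path is simple
            path.append(nxt)
            dfs(nxt, path)
            path.pop()

    dfs(source, [source])
    return paths


# Toy network: regulator -> regulated genes (only TFs have outgoing edges,
# so every intermediate node on a path is a TF).
toy_network = {
    "TF1": {"TF2", "geneX"},
    "TF2": {"TF3", "geneX"},
    "TF3": {"geneX"},
}
for p in enumerate_paths(toy_network, "TF1", "geneX"):
    print(" -> ".join(p))
# TF1 -> TF2 -> TF3 -> geneX
# TF1 -> TF2 -> geneX
# TF1 -> geneX
```

The length cap matters in practice: the number of simple paths can grow combinatorially with network density, so short pathways are usually the most tractable and most testable hypotheses.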
Affiliation(s)
- Tzu-Hsien Yang
- Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan and Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
34
Cusanovich DA, Pavlovic B, Pritchard JK, Gilad Y. The functional consequences of variation in transcription factor binding. PLoS Genet 2014; 10:e1004226. [PMID: 24603674 PMCID: PMC3945204 DOI: 10.1371/journal.pgen.1004226] [Citation(s) in RCA: 150] [Impact Index Per Article: 15.0] [Received: 10/21/2013] [Accepted: 01/22/2014] [Indexed: 01/24/2023]
Abstract
One goal of human genetics is to understand how the information for precise and dynamic gene expression programs is encoded in the genome. The interactions of transcription factors (TFs) with DNA regulatory elements clearly play an important role in determining gene expression outputs, yet the regulatory logic underlying functional transcription factor binding is poorly understood. Many studies have focused on characterizing the genomic locations of TF binding, yet it is unclear to what extent TF binding at any specific locus has functional consequences with respect to gene expression output. To evaluate the context of functional TF binding we knocked down 59 TFs and chromatin modifiers in one HapMap lymphoblastoid cell line. We then identified genes whose expression was affected by the knockdowns. We intersected the gene expression data with transcription factor binding data (based on ChIP-seq and DNase-seq) within 10 kb of the transcription start sites of expressed genes. This combination of data allowed us to infer functional TF binding. Using this approach, we found that only a small subset of genes bound by a factor were differentially expressed following the knockdown of that factor, suggesting that most interactions between TF and chromatin do not result in measurable changes in gene expression levels of putative target genes. We found that functional TF binding is enriched in regulatory elements that harbor a large number of TF binding sites, at sites with predicted higher binding affinity, and at sites that are enriched in genomic regions annotated as “active enhancers.” An important question in genomics is to understand how a class of proteins called “transcription factors” controls the expression level of other genes in the genome in a cell-type-specific manner – a process that is essential to human development. 
One major approach to this problem is to study where these transcription factors bind in the genome, but this does not tell us about the effect of that binding on gene expression levels and it is generally accepted that much of the binding does not strongly influence gene expression. To address this issue, we artificially reduced the concentration of 59 different transcription factors in the cell and then examined which genes were impacted by the reduced transcription factor level. Our results implicate some attributes that might influence what binding is functional, but they also suggest that a simple model of functional vs. non-functional binding may not suffice.
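The intersection step described above (binding within 10 kb of a TSS, then overlap with genes differentially expressed after knockdown) can be sketched as follows. A minimal Python sketch, assuming peaks and TSS positions are given as plain tuples with inclusive coordinates; the function names, gene IDs, and coordinates are hypothetical toy values, and the study's actual peak calling and differential-expression analysis are not reproduced. The nested loop is quadratic and only meant for illustration; genome-scale data would call for an interval index.

```python
Peak = tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates


def genes_bound_near_tss(
    tss: dict[str, tuple[str, int]], peaks: list[Peak], window: int = 10_000
) -> set[str]:
    """Genes with at least one peak overlapping [TSS - window, TSS + window]."""
    bound = set()
    for gene, (chrom, pos) in tss.items():
        for peak_chrom, start, end in peaks:
            if peak_chrom == chrom and start <= pos + window and end >= pos - window:
                bound.add(gene)
                break  # one overlapping peak is enough
    return bound


def functional_fraction(bound: set[str], differentially_expressed: set[str]) -> float:
    """Share of bound genes that change expression after knockdown of the factor."""
    return len(bound & differentially_expressed) / len(bound) if bound else 0.0


tss = {
    "geneA": ("chr1", 50_000),
    "geneB": ("chr1", 200_000),
    "geneC": ("chr2", 50_000),
    "geneD": ("chr2", 90_000),
}
peaks = [("chr1", 58_000, 58_500), ("chr2", 80_000, 80_400)]
de = {"geneA", "geneB"}

bound = genes_bound_near_tss(tss, peaks)
print(sorted(bound))                  # ['geneA', 'geneD']
print(functional_fraction(bound, de))  # 0.5
```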
Affiliation(s)
- Darren A Cusanovich
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Bryan Pavlovic
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America; Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, United States of America
- Jonathan K Pritchard
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America; Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, United States of America; Departments of Genetics and Biology and Howard Hughes Medical Institute, Stanford University, Stanford, California, United States of America
- Yoav Gilad
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
35
On the use of knowledge-based potentials for the evaluation of models of protein-protein, protein-DNA, and protein-RNA interactions. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 94:77-120. [PMID: 24629186 DOI: 10.1016/b978-0-12-800168-4.00004-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022]
Abstract
Proteins are the bricks and mortar of cells, playing structural and functional roles. To perform their function, they interact with each other as well as with other biomolecules such as DNA or RNA. Therefore, to understand the function of a protein, we need to know its partners and the atomic details of its interactions (i.e., the structure of the complex). However, only a small fraction of protein interactions have an experimentally determined three-dimensional structure. Therefore, computational techniques such as homology modeling are essential to fill this gap. Protein interactions can be modeled either by using the interactions of homologous proteins as templates, if the structure of the complex is known, or by using docking methods. In both approaches, estimating the quality of the models is essential. There are several ways to address this problem. In this review, we focus on the use of knowledge-based potentials for the analysis of protein interactions. We describe the procedure to derive statistical potentials and split them into different energetic terms that can be used for different purposes. We extensively discuss the fields in which knowledge-based potentials have been successfully applied to (1) model protein-protein, protein-DNA, and protein-RNA interactions and (2) predict binding sites (in the protein and in the DNA). Moreover, we provide ready-to-use resources for docking and benchmarking protein interactions.
36
Yang TH, Wu WS. Inferring functional transcription factor-gene binding pairs by integrating transcription factor binding data with transcription factor knockout data. BMC SYSTEMS BIOLOGY 2013; 7 Suppl 6:S13. [PMID: 24565265 PMCID: PMC4029220 DOI: 10.1186/1752-0509-7-s6-s13] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
Abstract
Background: Chromatin immunoprecipitation (ChIP) experiments are now the most comprehensive experimental approaches for mapping the binding of transcription factors (TFs) to their target genes. However, ChIP data alone are insufficient for identifying the functional binding targets of TFs, for two reasons. First, ChIP-chip and ChIP-seq experiments have inherently high false-positive and false-negative rates. Second, binding signals in ChIP data do not necessarily imply functionality. Methods: It is known that ChIP-chip data and TF knockout (TFKO) data reveal complementary information on gene regulation. While ChIP-chip data can provide TF-gene binding pairs, TFKO data can provide TF-gene regulation pairs. Therefore, we propose a novel network approach for identifying functional TF-gene binding pairs by integrating ChIP-chip data with TFKO data. In our method, a TF-gene binding pair from the ChIP-chip data is regarded as functional if it also has a high-confidence curated TFKO TF-gene regulatory relation or a deduced hypostatic TF-gene regulatory relation. Results and conclusions: We first validated our method on a gathered ground-truth set. Then we applied our method to the ChIP-chip data to identify functional TF-gene binding pairs. The biological significance of the identified functional TF-gene binding pairs was shown by assessing their functional enrichment, the prevalence of protein-protein interactions, and expression coherence. Our results outperformed those of three existing methods across all measures, and the identified functional targets of TFs also showed statistical significance over randomly assigned TF-gene pairs. We also showed that our method is dataset-independent and can be applied to ChIP-seq data and to the E. coli genome. Finally, we provided an example showing the biological applicability of our approach.
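The core filtering rule, keeping a ChIP binding pair only when knockout data also support a regulatory relation for the same TF-gene pair, plus a precision check against a ground-truth set, can be sketched with set operations. A minimal Python sketch; the function names and toy pairs are hypothetical, and only the directly curated TFKO relation is modelled here, not the deduced hypostatic relations that the method also uses.

```python
Pair = tuple[str, str]  # (transcription factor, target gene)


def functional_binding_pairs(binding: set[Pair], ko_regulation: set[Pair]) -> set[Pair]:
    """Binding pairs that are also supported by a knockout-derived regulatory relation."""
    return binding & ko_regulation


def precision(predicted: set[Pair], ground_truth: set[Pair]) -> float:
    """Fraction of predicted pairs that appear in the ground-truth set."""
    return len(predicted & ground_truth) / len(predicted) if predicted else 0.0


chip_binding = {("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")}
tfko_regulation = {("TF1", "g1"), ("TF2", "g3"), ("TF2", "g4")}
truth = {("TF1", "g1")}

functional = functional_binding_pairs(chip_binding, tfko_regulation)
print(sorted(functional))            # [('TF1', 'g1'), ('TF2', 'g3')]
print(precision(functional, truth))  # 0.5
```

Pairs found only in the knockout data (("TF2", "g4") above) are not reported, because the goal is to qualify binding events rather than to add new regulatory edges.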
37
Sakabe NJ, Nobrega MA. Beyond the ENCODE project: using genomics and epigenomics strategies to study enhancer evolution. Philos Trans R Soc Lond B Biol Sci 2013; 368:20130022. [PMID: 24218635 DOI: 10.1098/rstb.2013.0022] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
The complex expression patterns observed for many genes are often regulated by distal transcription enhancers. Changes in the nucleotide sequences of enhancers may therefore lead to changes in gene expression, representing a central mechanism by which organisms evolve. With the development of chromatin immunoprecipitation (ChIP), an experimental technique that identifies discrete regions of the genome bound by specific proteins, it is now possible to identify transcription factor binding events (putative cis-regulatory elements) in entire genomes. Comparing protein-DNA binding maps allows us, for the first time, to attempt to identify regulatory differences and infer global patterns of change in gene expression across species. Here, we review studies that used genome-wide ChIP to study the evolution of enhancers. The trend is one of high divergence of cis-regulatory elements between species, possibly compensated by extensive creation and loss of regulatory elements and rewiring of their target genes. We speculate on the meaning of the differences observed and argue that, although ChIP experiments identify the biochemical event of protein-DNA interaction, they cannot determine whether the event results in a biological function; therefore, more studies are required to establish the effect of divergence of binding events on species-specific gene expression.
Affiliation(s)
- Noboru Jo Sakabe
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
38
Li J, Liu G, Chen M, Li Z, Qin Y, Qu Y. Cellodextrin transporters play important roles in cellulase induction in the cellulolytic fungus Penicillium oxalicum. Appl Microbiol Biotechnol 2013; 97:10479-88. [PMID: 24132667 DOI: 10.1007/s00253-013-5301-3] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 07/02/2013] [Revised: 09/25/2013] [Accepted: 09/27/2013] [Indexed: 12/18/2022]
Abstract
Cellodextrin transporters (cellodextrin permeases) have been identified in fungi in recent years. However, the functions of these transporters in cellulose utilization and cellulase expression have not been well studied. In this study, three cellodextrin transporters, namely, CdtC, CdtD, and CdtG, in the cellulolytic fungus Penicillium oxalicum (formerly classified as P. decumbens) were identified, and their functions were analyzed. The deletion of a single cellodextrin transporter gene slightly decreased cellobiose consumption but had no observable effect on cellulase expression, which was attributed to the overlapping activity of isozymes. Further simultaneous deletion of cdtC and cdtD resulted in significantly decreased cellobiose consumption and poor growth on cellulose. The extracellular activity and transcription level of cellulases in the mutant without cdtC and cdtD were significantly lower than those in the wild-type strain when grown on cellulose. This result provides direct evidence of the crucial function of cellodextrin transporters in the induction of cellulase expression by insoluble cellulose.
Affiliation(s)
- Jie Li
- State Key Laboratory of Microbial Technology, School of Life Science, Shandong University, 27 Shanda South Road, Jinan, Shandong, 250100, People's Republic of China
39
Srinivasan R, Chandraprakash D, Krishnamurthi R, Singh P, Scolari VF, Krishna S, Seshasayee ASN. Genomic analysis reveals epistatic silencing of "expensive" genes in Escherichia coli K-12. Mol Biosyst 2013; 9:2021-33. [PMID: 23661089 DOI: 10.1039/c3mb70035f] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
Abstract
A barrier for horizontal gene transfer is high gene expression, which is metabolically expensive. Silencing of horizontally-acquired genes in the bacterium Escherichia coli is caused by the global transcriptional repressor H-NS. The activity of H-NS is enhanced or diminished by other proteins, including its homologue StpA as well as Hha and YdgT. The interconnections of H-NS with these regulators and their role in silencing gene expression in E. coli are not well understood on a genomic scale. In this study, we use transcriptome sequencing to show that there is a bi-layered gene silencing system, involving the homologous H-NS and StpA, operating on horizontally-acquired genes among others. We show that H-NS-repressed genes belong to two types, termed "epistatic" and "unilateral". In the absence of H-NS, the expression of "epistatically controlled genes" is repressed by StpA, whereas that of "unilaterally controlled genes" is not. Epistatic genes show a higher tendency to be non-essential and recently acquired, when compared to unilateral genes. Epistatic genes reach much higher expression levels than unilateral genes in the absence of the silencing system. Finally, epistatic genes contain more high-affinity H-NS binding motifs than unilateral genes. Therefore, both the DNA binding sites of H-NS and the function of StpA as a backup system might be selected for silencing highly transcribable genes.
Affiliation(s)
- Rajalakshmi Srinivasan
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK, Bellary Road, Bangalore 560065, India
40
Haynes BC, Maier EJ, Kramer MH, Wang PI, Brown H, Brent MR. Mapping functional transcription factor networks from gene expression data. Genome Res 2013; 23:1319-28. [PMID: 23636944 PMCID: PMC3730105 DOI: 10.1101/gr.150904.112] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 11/25/2022]
Abstract
A critical step in understanding how a genome functions is determining which transcription factors (TFs) regulate each gene. Accordingly, extensive effort has been devoted to mapping TF networks. In Saccharomyces cerevisiae, protein–DNA interactions have been identified for most TFs by ChIP-chip, and expression profiling has been done on strains deleted for most TFs. These studies revealed that there is little overlap between the genes whose promoters are bound by a TF and those whose expression changes when the TF is deleted, leaving us without a definitive TF network for any eukaryote and without an efficient method for mapping functional TF networks. This paper describes NetProphet, a novel algorithm that improves the efficiency of network mapping from gene expression data. NetProphet exploits a fundamental observation about the nature of TF networks: The response to disrupting or overexpressing a TF is strongest on its direct targets and dissipates rapidly as it propagates through the network. Using S. cerevisiae data, we show that NetProphet can predict thousands of direct, functional regulatory interactions, using only gene expression data. The targets that NetProphet predicts for a TF are at least as likely to have sites matching the TF's binding specificity as the targets implicated by ChIP. Unlike most ChIP targets, the NetProphet targets also show evidence of functional regulation. This suggests a surprising conclusion: The best way to begin mapping direct, functional TF-promoter interactions may not be by measuring binding. We also show that NetProphet yields new insights into the functions of several yeast TFs, including a well-studied TF, Cbf1, and a completely unstudied TF, Eds1.
Affiliation(s)
- Brian C Haynes
- Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, Missouri 63108, USA
41
Michoel T, Nachtergaele B. Alignment and integration of complex networks by hypergraph-based spectral clustering. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 86:056111. [PMID: 23214847 DOI: 10.1103/physreve.86.056111] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 05/16/2012] [Indexed: 06/01/2023]
Abstract
Complex networks possess a rich, multiscale structure reflecting the dynamical and functional organization of the systems they model. Often there is a need to analyze multiple networks simultaneously, to model a system by more than one type of interaction, or to go beyond simple pairwise interactions, but currently there is a lack of theoretical and computational methods to address these problems. Here we introduce a framework for clustering and community detection in such systems using hypergraph representations. Our main result is a generalization of the Perron-Frobenius theorem from which we derive spectral clustering algorithms for directed and undirected hypergraphs. We illustrate our approach with applications for local and global alignment of protein-protein interaction networks between multiple species, for tripartite community detection in folksonomies, and for detecting clusters of overlapping regulatory pathways in directed networks.
Affiliation(s)
- Tom Michoel
- Freiburg Institute for Advanced Studies (FRIAS), University of Freiburg, Albertstrasse 19, D-79104 Freiburg, Germany
42
Gitter A, Carmi M, Barkai N, Bar-Joseph Z. Linking the signaling cascades and dynamic regulatory networks controlling stress responses. Genome Res 2012; 23:365-76. [PMID: 23064748 PMCID: PMC3561877 DOI: 10.1101/gr.138628.112] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Indexed: 11/24/2022]
Abstract
Accurate models of the cross-talk between signaling pathways and transcriptional regulatory networks within cells are essential to understand complex response programs. We present a new computational method that combines condition-specific time-series expression data with general protein interaction data to reconstruct dynamic and causal stress response networks. These networks characterize the pathways involved in the response, their time of activation, and the affected genes. The signaling and regulatory components of our networks are linked via a set of common transcription factors that serve as targets in the signaling network and as regulators of the transcriptional response network. Detailed case studies of stress responses in budding yeast demonstrate the predictive power of our method. Our method correctly identifies the core signaling proteins and transcription factors of the response programs. It further predicts the involvement of additional transcription factors and other proteins not previously implicated in the response pathways. We experimentally verify several of these predictions for the osmotic stress response network. Our approach requires little condition-specific data: only a partial set of upstream initiators and time-series gene expression data, which are readily available for many conditions and species. Consequently, our method is widely applicable and can be used to derive accurate, dynamic response models in several species.
Affiliation(s)
- Anthony Gitter
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
43
Yang TH, Wu WS. Identifying biologically interpretable transcription factor knockout targets by jointly analyzing the transcription factor knockout microarray and the ChIP-chip data. BMC Syst Biol 2012; 6:102. [PMID: 22898448 PMCID: PMC3465233 DOI: 10.1186/1752-0509-6-102] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 05/22/2012] [Accepted: 08/02/2012] [Indexed: 12/17/2022]
Abstract
Background Transcription factor knockout microarrays (TFKMs) provide useful information about gene regulation. By using statistical methods to detect differentially expressed genes between the gene expression microarray data of the mutant and wild-type strains, the TF knockout targets of the knocked-out TF can be identified. However, the identified TF knockout targets may contain a certain number of false positives due to the experimental noise inherent in high-throughput microarray technology. Even if the identified TF knockout targets are true, such statistical approaches cannot reveal the molecular mechanisms by which a TF regulates its TF knockout targets. Results To solve these two problems, we developed a method to filter out the false positives in the original TF knockout targets (identified by statistical approaches) so that the biologically interpretable TF knockout targets can be extracted. Our method can further generate experimentally testable hypotheses of the molecular mechanisms by which a TF regulates its biologically interpretable TF knockout targets. The details of our method are as follows. First, a TF binding network was constructed using the ChIP-chip data deposited in the YEASTRACT database. Then each original TF knockout target is considered biologically interpretable if our path search algorithm can identify a path (in the TF binding network) from the knocked-out TF to this target. The identified path explains how the TF may regulate this target either directly by binding to its promoter or indirectly through intermediate TFs. After checking all the original TF knockout targets, the biologically interpretable ones could be extracted and the false positives could be filtered out. We validated the biological significance of our refined (i.e., biologically interpretable) TF knockout targets by assessing their functional enrichment, expression coherence, and the prevalence of protein-protein interactions. 
Our refined TF knockout targets outperform the original TF knockout targets across all measures. Conclusions By jointly analyzing the TFKM and ChIP-chip data, our method can extract the biologically interpretable TF knockout targets by identifying paths (in the TF binding network) from the knocked-out TF to these targets. The identified paths form experimentally testable hypotheses regarding the molecular mechanisms of how a TF may regulate its knockout targets. About seven hundred hypotheses generated by our methods have been experimentally validated in the literature. Our work demonstrates that integrating different data sources is a powerful approach to study complex biological systems.
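The filtering rule in this abstract, which keeps a knockout target only when the knocked-out TF can reach it through the TF binding network, can be read as a graph reachability test. The sketch below assumes a plain breadth-first search with an optional cap on path length; the function names, the toy network and the `max_len` parameter are hypothetical illustrations, not the authors' published path search algorithm or the YEASTRACT data format.

```python
from collections import deque


def find_path(binding, tf, target, max_len=None):
    # Breadth-first search from the knocked-out TF over a directed binding
    # network (dict: TF -> set of bound genes/TFs). Returns the shortest path
    # [tf, ..., target], or None if target is unreachable within max_len edges.
    parents = {tf: None}
    queue = deque([(tf, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_len is not None and depth >= max_len:
            continue  # do not extend paths beyond the length cap
        for child in binding.get(node, ()):
            if child in parents:
                continue  # already reached by an equal or shorter path
            parents[child] = node
            if child == target:
                # Walk back through the parent pointers to rebuild the path.
                path = [child]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append((child, depth + 1))
    return None


def interpretable_targets(binding, tf, ko_targets, max_len=None):
    # Keep knockout targets explained by a direct (one-edge) or indirect
    # (via intermediate TFs) binding path; the rest are treated as likely
    # false positives. Each kept target maps to its explanatory path.
    kept = {}
    for gene in ko_targets:
        path = find_path(binding, tf, gene, max_len)
        if path is not None:
            kept[gene] = path
    return kept


# Hypothetical toy network: TF_A binds gene1 and TF_B; TF_B binds gene2.
binding = {"TF_A": {"gene1", "TF_B"}, "TF_B": {"gene2"}}
ko_targets = ["gene1", "gene2", "gene3"]
print(interpretable_targets(binding, "TF_A", ko_targets))
# prints {'gene1': ['TF_A', 'gene1'], 'gene2': ['TF_A', 'TF_B', 'gene2']}
```

Returning the path rather than a yes/no answer keeps the explanatory side of the idea: each retained target carries a direct or indirect regulatory hypothesis. Setting `max_len=1` would restrict the filter to direct binding only; a real implementation would presumably also need to weigh binding evidence, which this sketch ignores.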
Affiliation(s)
- Tzu-Hsien Yang
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
44
Garcia-Garcia J, Bonet J, Guney E, Fornes O, Planas J, Oliva B. Networks of Protein-Protein Interactions: From Uncertainty to Molecular Details. Mol Inform 2012; 31:342-62. [PMID: 27477264 DOI: 10.1002/minf.201200005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 01/16/2012] [Accepted: 03/09/2012] [Indexed: 11/08/2022]
Abstract
Proteins are the bricks and mortar of cells. The work of proteins is structural and functional, as they are the principal element of the organization of the cell architecture, but they also play a relevant role in its metabolism and regulation. To perform all these functions, proteins need to interact with each other and with other bio-molecules, either to form complexes or to recognize precise targets of their action. For instance, a particular transcription factor may activate one gene or another depending on its interactions with other proteins and not only with DNA. Hence, the ability of a protein to interact with other bio-molecules, and the partners it has at each particular time and location, can be crucial to characterize its role. Proteins rarely act alone; instead, they form an intertwined network of physical interactions, other types of relationships (such as metabolic and regulatory), or signaling cascades. In this context, understanding the function of a protein requires recognizing the members of its neighborhood and grasping how they associate, at both the systemic and the atomic level. The network of physical interactions between the proteins of a system, cell or organism, is defined as the interactome. The purpose of this review is to deepen the description of interactomes at different levels of detail: from the molecular structure of complexes to the global topology of the network of interactions. The approaches and techniques applied experimentally and computationally to attain each level are described. The limits of each technique and its integration into a model network, the challenges and current problems of completeness of an interactome, and the reliability of the interactions are reviewed and summarized. Finally, the application of current knowledge of protein-protein interactions to modern network medicine and protein function annotation is also explored.
Affiliation(s)
- Javier Garcia-Garcia
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Jaume Bonet
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Emre Guney
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Oriol Fornes
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Joan Planas
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Baldo Oliva
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
45
Integrating phosphorylation network with transcriptional network reveals novel functional relationships. PLoS One 2012; 7:e33160. [PMID: 22432002 PMCID: PMC3303811 DOI: 10.1371/journal.pone.0033160] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 10/20/2011] [Accepted: 02/04/2012] [Indexed: 12/18/2022]
Abstract
Phosphorylation and transcriptional regulation events are critical for cells to transmit and respond to signals. Despite their importance, systems-level strategies that couple these two networks have yet to be presented. Here we introduce a novel approach that integrates the physical and functional aspects of the phosphorylation network with the transcription network in S. cerevisiae, and demonstrate that different network motifs are involved in these networks, which should be considered when interpreting and integrating large-scale datasets. Based on this understanding, we introduce a HeRS score (hetero-regulatory similarity score) to systematically characterize the functional relevance of kinase/phosphatase involvement with transcription factors, and present an algorithm that predicts hetero-regulatory modules. When extended to signaling networks, this approach confirmed the structure and cross talk of MAPK pathways, inferred a novel functional transcription factor, Sok2, in the high osmolarity glycerol pathway, and explained the mechanism of reduced mating efficiency upon Fus3 deletion. This strategy is applicable to other organisms as large-scale datasets become available, providing a means to identify the functional relationships between kinases/phosphatases and transcription factors.
46
Wu M, Chan C. Learning transcriptional regulation on a genome scale: a theoretical analysis based on gene expression data. Brief Bioinform 2012; 13:150-61. [PMID: 21622543 PMCID: PMC3294238 DOI: 10.1093/bib/bbr029] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 02/10/2011] [Revised: 04/23/2011] [Indexed: 12/17/2022]
Abstract
The recent advent of high-throughput microarray data has enabled the global analysis of the transcriptome, driving the development and application of computational approaches to study transcriptional regulation on the genome scale by reconstructing in silico the regulatory interactions of the gene network. Although there are many in-depth reviews of such 'reverse-engineering' methodologies, most have focused on the practical aspect of data mining, and few on the biological problem and the biological relevance of the methodology. Therefore, in this review we took a biological perspective and used a set of yeast microarray data as a working example to evaluate the fundamental assumptions implicit in associating transcription factor (TF) and target gene expression levels and in estimating TF activity, and we further explored cooperative models. Finally, we confirm that the detailed transcription mechanism is too complex for expression data alone to reveal; nevertheless, future network reconstruction studies could benefit from the incorporation of context-specific information, the modeling of multiple layers of regulation (e.g. micro-RNA), or the development of approaches for context-dependent analysis, to uncover the mechanisms of gene regulation.
Affiliation(s)
- Ming Wu
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
47
Sakabe NJ, Aneas I, Shen T, Shokri L, Park SY, Bulyk ML, Evans SM, Nobrega MA. Dual transcriptional activator and repressor roles of TBX20 regulate adult cardiac structure and function. Hum Mol Genet 2012; 21:2194-204. [PMID: 22328084 DOI: 10.1093/hmg/dds034] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Indexed: 12/18/2022]
Abstract
The ongoing requirement in adult heart for transcription factors with key roles in cardiac development is not well understood. We recently demonstrated that TBX20, a transcriptional regulator required for cardiac development, has key roles in the maintenance of functional and structural phenotypes in adult mouse heart. Conditional ablation of Tbx20 in adult cardiomyocytes leads to a rapid onset and progression of heart failure, with prominent conduction and contractility phenotypes that lead to death. Here we describe a more comprehensive molecular characterization of the functions of TBX20 in adult mouse heart. Coupling genome-wide chromatin immunoprecipitation and transcriptome analyses (RNA-Seq), we identified a subset of genes that change expression in Tbx20 adult cardiomyocyte-specific knockout hearts and are direct downstream targets of TBX20. This analysis revealed that TBX20 has a dual role as both a transcriptional activator and a repressor, and that each of these functions regulates genes with very specialized and distinct molecular roles. We also show how TBX20 binds to its targets genome-wide in a context-dependent manner, using various cohorts of co-factors to either promote or repress distinct genetic programs within adult heart. Our integrative approach has uncovered several novel aspects of TBX20 and T-box protein function within adult heart. Sequencing data accession number (http://www.ncbi.nlm.nih.gov/geo): GSE30943.
Affiliation(s)
- Noboru J Sakabe
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
48
Algorithms in nature: the convergence of systems biology and computational thinking. Mol Syst Biol 2011; 7:546. [PMID: 22068329 PMCID: PMC3261700 DOI: 10.1038/msb.2011.78] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Received: 05/17/2011] [Accepted: 09/07/2011] [Indexed: 01/30/2023]
Abstract
Biologists rely on computational methods to analyze and integrate large data sets, while several computational methods were inspired by the high-level design principles of biological systems. This Perspectives discusses the recent convergence of these two ways of thinking. Computer science and biology have enjoyed a long and fruitful relationship for decades. Recently, these two directions have been converging. In this review, we argue that thinking computationally about biological processes may lead to more accurate models, which in turn can be used to improve the design of algorithms. We discuss the similar mechanisms and requirements shared by computational and biological processes and then present several recent studies that apply this joint analysis strategy to problems related to coordination, network analysis, and tracking and vision. We also discuss additional biological processes that can be studied in a similar manner and link them to potential computational problems. With the rapid accumulation of data detailing the inner workings of biological systems, we expect this direction of coupling biological and computational studies to greatly expand in the future.
49
Pettersson M, Besnier F, Siegel PB, Carlborg Ö. Replication and explorations of high-order epistasis using a large advanced intercross line pedigree. PLoS Genet 2011; 7:e1002180. [PMID: 21814519 PMCID: PMC3140984 DOI: 10.1371/journal.pgen.1002180] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 11/17/2010] [Accepted: 05/26/2011] [Indexed: 12/11/2022]
Abstract
Dissection of the genetic architecture of complex traits persists as a major challenge in biology; despite considerable efforts, much remains unclear, including the role and importance of genetic interactions. This study provides empirical evidence for a strong and persistent contribution of both second- and third-order epistatic interactions to long-term selection response for body weight in two divergently selected chicken lines. We earlier reported a network of interacting loci with large effects on body weight in an F2 intercross between these high- and low-body weight lines. Here, most pair-wise interactions in the network are replicated in an independent eight-generation advanced intercross line (AIL). The original report showed an important contribution of capacitating epistasis to growth, meaning that the genotype at a hub in the network releases the effects of one or several peripheral loci. After fine-mapping of the loci in the AIL, we show that these interactions were persistent over time. The replication of five of six originally reported epistatic loci, as well as the capacitating epistasis, provides strong empirical evidence that the originally observed epistasis is of biological importance and contributes to the genetic architecture of this population. The stability of genetic interaction mechanisms over time indicates a non-transient role of epistasis on phenotypic change. Third-order epistasis was examined for the first time in this study and was shown to make an important contribution to growth, which suggests that the genetic architecture of growth is more complex than can be explained by two-locus interactions only. Our results illustrate the importance of designing studies that facilitate exploration of epistasis in populations for obtaining a comprehensive understanding of the genetics underlying a complex trait.
Affiliation(s)
- Mats Pettersson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
- Francois Besnier
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
- Paul B. Siegel
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Örjan Carlborg
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
50
Unraveling condition-dependent networks of transcription factors that control metabolic pathway activity in yeast. Mol Syst Biol 2011; 6:432. [PMID: 21119627 PMCID: PMC3010106 DOI: 10.1038/msb.2010.91] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Received: 07/22/2010] [Accepted: 10/02/2010] [Indexed: 01/17/2023]
Abstract
While typically many expression levels change in transcription factor mutants, only a few of these changes lead to functional changes. The predictive capability of expression and DNA binding data for such functional changes in metabolism is very limited. Large-scale 13C-flux data reveal the condition specificity of transcriptional control of metabolic function. Transcriptional control in yeast focuses on the switch between respiration and fermentation. Follow-up modeling on the basis of transcriptomics and proteomics data suggests that the newly discovered Gcn4 control of respiration is mediated via PKA and Snf1.
Effective control and modulation of cellular behavior is of paramount importance in medicine (Kreeger and Lauffenburger, 2010) and biotechnology (Haynes and Silver, 2009), and requires profound understanding of control mechanisms. In this study, we aim to elucidate the extent to which transcription factors control the operation of yeast metabolism. As a quantitative readout of metabolic function, we monitored the traffic of small molecules through various pathways of central metabolism by 13C-flux analysis (Sauer, 2006). The chosen growth conditions represent two different regulatory states of reduced (galactose) and maximal (glucose) carbon source repression, as well as a different nitrogen metabolism and two common, permanent stress conditions. Depending on the growth condition, between 7 and 13% of the deleted transcription factors altered the determined flux ratios (Figure 3). Of the six quantified flux ratios, only two were controlled by the deleted transcription factors: the split between glycolysis and the pentose phosphate pathway, and the convergent ratio of anaplerosis and TCA cycle. Thus, we concluded that 23 transcription factors control flux distributions under at least one of the tested growth conditions, leading to 42 condition-dependent interactions of transcription factors with metabolic pathway activity (Figure 4). With two exceptions, all identified transcription factor interactions controlled the TCA cycle flux. This condition-specific active control of metabolic function could not have been predicted from DNA binding and expression data; that is, 26.1% false negatives, 48.6% true positives. Of the 23 transcription factors that controlled TCA cycle flux distributions under the tested conditions, only Bas1, Gcn4, Gcr2 and Pho2 exerted control under more than one condition. We identified Cit1, Mdh1 and Idh1/2 with a proteomics approach as the relevant target enzymes that increase the TCA cycle flux. 
Next, we asked whether Bas1, Gcr2, Gcn4 and Pho2 act directly on the TCA cycle or mediate their effect indirectly. Based on the transcriptomics data, the pattern of differentially activated transcription factors, inferred from the differential expression of their target genes, suggested reduced glucose repression in all four mutants as the common mechanism. Starting from the currently largest set of 13C-based flux distributions, we identified networks of individual transcription factors that control metabolic pathway activity. These networks of active metabolic control have the following properties. First, they are highly condition dependent, as at most four transcription factors control the same metabolic flux distribution under more than one growth condition. Second, they focus almost exclusively on the TCA cycle, thereby controlling the switch between respiratory and fermentative metabolism. Third, with four to 14 active transcription factors, they are small compared with gene regulation networks that were obtained from expression and DNA binding data. For the metabolic network studied here, robustness is also apparent from the fact that upregulated TCA cycle fluxes were not sufficient to achieve full respiratory metabolism; that is, absent or low ethanol formation. Several explanations for the observed robustness are possible. The most likely is that environmental signals might be transmitted by different signaling pathways to several transcription factors, whose orchestrated action on multiple target genes is necessary to achieve a functional flux response. This hypothesis would explain why several transcription factors exert flux effects on the same pathway, but each flux effect is relatively small, as further coordinated manipulations would be necessary to further improve the respiratory flux. Our findings demonstrate the importance of identifying and quantifying the extent to which regulatory effectors alter cellular function. 
Which transcription factors control the distribution of metabolic fluxes under a given condition? We address this question by systematically quantifying metabolic fluxes in 119 transcription factor deletion mutants of Saccharomyces cerevisiae under five growth conditions. While most knockouts did not affect fluxes, we identified 42 condition-dependent interactions that were mediated by a total of 23 transcription factors that control almost exclusively the cellular decision between respiration and fermentation. This relatively sparse, condition-specific network of active metabolic control contrasts with the much larger gene regulation network inferred from expression and DNA binding data. Based on protein and transcript analyses in key mutants, we identified three enzymes in the tricarboxylic acid cycle as the key targets of this transcriptional control. For the transcription factor Gcn4, we demonstrate that this control is mediated through the PKA and Snf1 signaling cascade. The discrepancy between flux response predictions, based on the known regulatory network architecture and our functional 13C-data, demonstrates the importance of identifying and quantifying the extent to which regulatory effectors alter cellular functions.