1
Su C, Pastor WA, Emad A. Deciphering lineage-relevant gene regulatory networks during endoderm formation by InPheRNo-ChIP. Brief Bioinform 2024; 25:bbae592. [PMID: 39535258 PMCID: PMC11558691 DOI: 10.1093/bib/bbae592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 10/09/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024] Open
Abstract
Deciphering the underlying gene regulatory networks (GRNs) that govern early human embryogenesis is critical for understanding developmental mechanisms yet remains challenging due to limited sample availability and the inherent complexity of the biological processes involved. To address this, we developed InPheRNo-ChIP, a computational framework that integrates multimodal data, including RNA-seq, transcription factor (TF)-specific ChIP-seq, and phenotypic labels, to reconstruct phenotype-relevant GRNs associated with endoderm development. The core of this method is a probabilistic graphical model that models the simultaneous effect of TFs on their putative target genes to influence a particular phenotypic outcome. Unlike the majority of existing GRN inference methods that are agnostic to the phenotypic outcomes, InPheRNo-ChIP directly incorporates phenotypic information during GRN inference, enabling the distinction between lineage-specific and general regulatory interactions. We integrated data from three experimental studies and applied InPheRNo-ChIP to infer the GRN governing the differentiation of human embryonic stem cells into definitive endoderm. Benchmarking against a scRNA-seq CRISPRi study demonstrated InPheRNo-ChIP's ability to identify regulatory interactions involving endoderm markers FOXA2, SMAD2, and SOX17, outperforming other methods. This highlights the importance of incorporating the phenotypic context during network inference. Furthermore, an ablation study confirms the synergistic contribution of ChIP-seq, RNA-seq, and phenotypic data, highlighting the value of multimodal integration for accurate phenotype-relevant GRN reconstruction.
Affiliation(s)
- Chen Su
  - Department of Electrical and Computer Engineering, McGill University, 845 Sherbrooke Street West, Montreal, Quebec H3A 0G4, Canada
- William A Pastor
  - Department of Biochemistry, McGill University, 845 Sherbrooke Street West, Montreal, Quebec H3A 0G4, Canada
  - The Rosalind and Morris Goodman Cancer Institute, 1160 Pine Avenue, Montreal, Quebec H3A 1A3, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, 845 Sherbrooke Street West, Montreal, Quebec H3A 0G4, Canada
  - The Rosalind and Morris Goodman Cancer Institute, 1160 Pine Avenue, Montreal, Quebec H3A 1A3, Canada
  - Mila, Quebec AI Institute, 6666 St-Urbain Street #200, Montreal, Quebec H2S 3H1, Canada
2
Loers JU, Vermeirssen V. A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data. Brief Bioinform 2024; 25:bbae382. [PMID: 39207727 PMCID: PMC11359808 DOI: 10.1093/bib/bbae382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024] Open
Abstract
Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples and even on the same cells has lifted the field of GRN inference to the next stage. Combinations of (single-cell) transcriptomics and chromatin accessibility allow the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data, exemplified by state-of-the-art methods, as well as open challenges and future developments. Moreover, we address preprocessing strategies, metacell generation and computational omics pairing, transcription factor binding site detection, and linear and three-dimensional approaches to identify chromatin interactions, as well as dynamic and causal eGRN inference. We believe that the integration of transcriptomics with epigenomics data at the single-cell level is the new standard for mechanistic network inference, and that it can be further advanced by integrating additional omics layers and spatiotemporal data, as well as by shifting the focus towards more quantitative and causal modeling strategies.
Affiliation(s)
- Jens Uwe Loers
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Vanessa Vermeirssen
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
3
Cassan O, Lecellier CH, Martin A, Bréhélin L, Lèbre S. Optimizing data integration improves gene regulatory network inference in Arabidopsis thaliana. Bioinformatics 2024; 40:btae415. [PMID: 38913855 PMCID: PMC11227367 DOI: 10.1093/bioinformatics/btae415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 06/12/2024] [Accepted: 06/21/2024] [Indexed: 06/26/2024] Open
Abstract
MOTIVATIONS: Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process.
RESULTS: We address this issue for two regression-based GRN inference models, a weighted random forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis in order to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and allows the recovery of master regulators of nitrate induction.
AVAILABILITY AND IMPLEMENTATION: The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction.
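As a rough illustration of what a weighted LASSO penalty with a tunable integration strength can look like, the sketch below fits one target gene by coordinate descent, penalising each candidate regulator by its own factor. It is not the authors' R implementation: the function name `weighted_lasso`, the `prior` vector, the rule `1 - alpha * prior`, and the toy data are all assumptions made for illustration, and stability selection is omitted.

```python
import numpy as np

def weighted_lasso(X, y, penalty_weights, lam=0.1, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j| by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of regulator j with the residual that excludes its own effect.
            rho = X[:, j] @ (residual + X[:, j] * beta[j]) / n
            threshold = lam * penalty_weights[j]
            # Soft-thresholding: a smaller weight means a weaker shrinkage.
            new_beta = np.sign(rho) * max(abs(rho) - threshold, 0.0) / col_sq[j]
            residual += X[:, j] * (beta[j] - new_beta)
            beta[j] = new_beta
    return beta

# Toy data (assumed): 4 candidate TFs, only TF 0 truly drives the target gene.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 4))
y = 1.5 * X[:, 0] + 0.1 * rng.standard_normal(60)

prior = np.array([1.0, 0.0, 0.0, 0.0])   # hypothetical: TF 0 has a binding motif
for alpha in (0.0, 0.5, 1.0):            # hypothetical integration strength
    weights = 1.0 - alpha * prior        # motif-supported TFs are penalised less
    print(alpha, np.round(weighted_lasso(X, y, weights), 3))
```

Sweeping `alpha` per gene and comparing prediction error against a null model is the kind of choice the abstract describes; the sketch only shows the penalty mechanics.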
Affiliation(s)
- Océane Cassan
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
- Charles-Henri Lecellier
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IGMM, Univ Montpellier, CNRS, Montpellier, 34090, France
- Antoine Martin
  - IPSIM, CNRS, INRAE, Institut Agro, Univ Montpellier, 34060, Montpellier, France
- Sophie Lèbre
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IMAG, Univ Montpellier, CNRS, Montpellier, 34090, France
  - Université Paul-Valéry-Montpellier 3, Montpellier, 34090, France
4
Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol 2023; 19:e1011254. [PMID: 37561790 PMCID: PMC10414591 DOI: 10.1371/journal.pcbi.1011254] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 08/12/2023] Open
Abstract
Inference of gene regulatory networks has been an active area of research for around 20 years, leading to the development of sophisticated inference algorithms based on a variety of assumptions and approaches. With the ever increasing demand for more accurate and powerful models, the inference problem remains of broad scientific interest. The abstract representation of biological systems through gene regulatory networks represents a powerful method to study such systems, encoding different amounts and types of information. In this review, we summarize the different types of inference algorithms specifically based on time-series transcriptomics, giving an overview of the main applications of gene regulatory networks in computational biology. This review is intended to give an updated reference of regulatory networks inference tools to biologists and researchers new to the topic and guide them in selecting the appropriate inference method that best fits their questions, aims, and experimental data.
Affiliation(s)
- Malvina Marku
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Vera Pancaldi
  - CRCT, Université de Toulouse, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
  - Barcelona Supercomputing Center, Barcelona, Spain
5
van der Sande M, Frölich S, van Heeringen SJ. Computational approaches to understand transcription regulation in development. Biochem Soc Trans 2023; 51:1-12. [PMID: 36695505 PMCID: PMC9988001 DOI: 10.1042/bst20210145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/07/2023] [Accepted: 01/13/2023] [Indexed: 01/26/2023]
Abstract
Gene regulatory networks (GRNs) serve as useful abstractions to understand transcriptional dynamics in developmental systems. Computational prediction of GRNs has been successfully applied to genome-wide gene expression measurements with the advent of microarrays and RNA-sequencing. However, these inferred networks are inaccurate and mostly based on correlative rather than causative interactions. In this review, we highlight three approaches that significantly impact GRN inference: (1) moving from one genome-wide functional modality, gene expression, to multi-omics, (2) single cell sequencing, to measure cell type-specific signals and predict context-specific GRNs, and (3) neural networks as flexible models. Together, these experimental and computational developments have the potential to significantly impact the quality of inferred GRNs. Ultimately, accurately modeling the regulatory interactions between transcription factors and their target genes will be essential to understand the role of transcription factors in driving developmental gene expression programs and to derive testable hypotheses for validation.
Affiliation(s)
- Simon J. van Heeringen
  - Radboud University, Department of Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, 6525GA Nijmegen, The Netherlands
6
Wang Y, Lee H, Fear JM, Berger I, Oliver B, Przytycka TM. NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Commun Biol 2022; 5:1282. [PMID: 36418514 PMCID: PMC9684490 DOI: 10.1038/s42003-022-04226-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 11/04/2022] [Indexed: 11/25/2022] Open
Abstract
The inference of Gene Regulatory Networks (GRNs) is one of the key challenges in systems biology. Leading algorithms utilize, in addition to gene expression, prior knowledge such as Transcription Factor (TF) DNA binding motifs or results of TF binding experiments. However, such prior knowledge is typically incomplete; therefore, integrating it with gene expression to infer GRNs remains difficult. To address this challenge, we introduce NetREX-CF (Regulatory Network Reconstruction using EXpression and Collaborative Filtering), a GRN reconstruction approach that brings together Collaborative Filtering to address the incompleteness of the prior knowledge and a biologically justified model of gene expression (a sparse Network Component Analysis based model). We validated NetREX-CF using Yeast data and then used it to construct the GRN for Drosophila Schneider 2 (S2) cells. To corroborate the GRN, we performed a large-scale RNA-Seq analysis followed by a high-throughput RNAi treatment against all 465 expressed TFs in the cell line. Our knockdown result has not only extensively validated the GRN we built, but also provides a benchmark that our community can use for evaluating GRNs. Finally, using previously published human data, we demonstrate that NetREX-CF can infer GRNs from single-cell RNA-Seq and outperforms other methods.
Affiliation(s)
- Yijie Wang
  - Computer Science Department, Indiana University, Bloomington, IN, 47408, USA
- Hangnoh Lee
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Justin M Fear
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Isabelle Berger
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Brian Oliver
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA
7
Hawe JS, Saha A, Waldenberger M, Kunze S, Wahl S, Müller-Nurasyid M, Prokisch H, Grallert H, Herder C, Peters A, Strauch K, Theis FJ, Gieger C, Chambers J, Battle A, Heinig M. Network reconstruction for trans acting genetic loci using multi-omics data and prior information. Genome Med 2022; 14:125. [PMID: 36344995 PMCID: PMC9641770 DOI: 10.1186/s13073-022-01124-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/15/2022] [Accepted: 10/11/2022] [Indexed: 11/09/2022] Open
Abstract
BACKGROUND: Molecular measurements of the genome, the transcriptome, and the epigenome, often termed multi-omics data, provide an in-depth view on biological systems, and their integration is crucial for gaining insights into complex regulatory processes. These data can be used to explain disease-related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor, and use of biological prior information can improve network inference. However, previous efforts were limited in the types of priors used or have only been applied to model systems. In this study, we reconstruct the regulatory networks underlying trans-QTL hotspots using human cohort data and data-driven prior information.
METHODS: We devised a new strategy to integrate QTL with human population-scale multi-omics data. State-of-the-art network inference methods including BDgraph and glasso were applied to these data. Comprehensive prior information to guide network inference was manually curated from large-scale biological databases. The inference approach was extensively benchmarked using simulated data and cross-cohort replication analyses. Best performing methods were subsequently applied to real-world human cohort data.
RESULTS: Our benchmarks showed that prior-based strategies outperform methods without prior information in simulated data and show better replication across datasets. Application of our approach to human cohort data highlighted two novel regulatory networks related to schizophrenia and lean body mass for which we generated novel functional hypotheses.
CONCLUSIONS: We demonstrate that existing biological knowledge can improve the integrative analysis of networks underlying trans associations and generate novel hypotheses about regulatory mechanisms.
Affiliation(s)
- Johann S Hawe
  - Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - German Heart Centre Munich, Department of Cardiology, Technical University Munich, Munich, Germany
  - Department of Informatics, Technical University of Munich, Garching, Germany
- Ashis Saha
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Melanie Waldenberger
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Sonja Kunze
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Simone Wahl
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Martina Müller-Nurasyid
  - Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - IBE, Faculty of Medicine, LMU Munich, 81377, Munich, Germany
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
  - Department of Internal Medicine I (Cardiology), Hospital of the Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
- Holger Prokisch
  - Institute of Human Genetics, School of Medicine, Technische Universität München, Munich, Germany
- Harald Grallert
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Christian Herder
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Düsseldorf, Germany
  - Division of Endocrinology and Diabetology, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Annette Peters
  - Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Konstantin Strauch
  - Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
  - Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU Munich, Munich, Germany
- Fabian J Theis
  - Department of Informatics, Technical University of Munich, Garching, Germany
  - Department of Mathematics, Technical University of Munich, Garching, Germany
- Christian Gieger
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- John Chambers
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
  - Lee Kong Chian School of Medicine, Nanyang Technological University, 308232, Singapore, Singapore
- Alexis Battle
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Matthias Heinig
  - Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Department of Informatics, Technical University of Munich, Garching, Germany
  - Munich Heart Association, Partner Site Munich, DZHK (German Centre for Cardiovascular Research), 10785, Berlin, Germany
8
Zhivkoplias EK, Vavulov O, Hillerton T, Sonnhammer ELL. Generation of Realistic Gene Regulatory Networks by Enriching for Feed-Forward Loops. Front Genet 2022; 13:815692. [PMID: 35222536 PMCID: PMC8872634 DOI: 10.3389/fgene.2022.815692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 01/13/2022] [Indexed: 11/13/2022] Open
Abstract
The regulatory relationships between genes and proteins in a cell form a gene regulatory network (GRN) that controls the cellular response to changes in the environment. A number of inference methods to reverse engineer the original GRN from large-scale expression data have recently been developed. However, the absence of ground-truth GRNs when evaluating the performance makes realistic simulations of GRNs necessary. One aspect of this is that local network motif analysis of real GRNs indicates that the feed-forward loop (FFL) is significantly enriched. To simulate this properly, we developed a novel motif-based preferential attachment algorithm, FFLatt, which outperformed the popular GeneNetWeaver network generation tool in reproducing the FFL motif occurrence observed in literature-based biological GRNs. It also preserves important topological properties such as scale-free topology, sparsity, and average in/out-degree per node. We conclude that FFLatt is well-suited as a network generation module for a benchmarking framework with the aim to provide fair and robust performance evaluation of GRN inference methods.
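To make the feed-forward loop (FFL) motif concrete, the following minimal sketch counts FFLs in a directed edge list, where an FFL is a triple (a, b, c) with edges a→b, b→c, and a→c. It only illustrates the motif being enriched; it is not the FFLatt generator, and the function name and toy network are assumptions.

```python
def count_feed_forward_loops(edges):
    """Count triples (a, b, c) with edges a->b, b->c and a->c."""
    targets = {}
    for src, dst in edges:
        if src != dst:  # ignore self-loops (autoregulation)
            targets.setdefault(src, set()).add(dst)
    count = 0
    for a, a_targets in targets.items():
        for b in a_targets:
            # c must be regulated both by the intermediate b and directly by a.
            for c in targets.get(b, ()):
                if c != a and c in a_targets:
                    count += 1
    return count

# Toy network (assumed): TF1 -> TF2 -> G1 together with TF1 -> G1 forms one FFL.
toy_grn = [("TF1", "TF2"), ("TF2", "G1"), ("TF1", "G1"), ("TF2", "G2")]
print(count_feed_forward_loops(toy_grn))
```

A generator can be compared with a reference network by running the same count on both and normalising by network size.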
Affiliation(s)
- Erik K. Zhivkoplias
  - Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
- Oleg Vavulov
  - Bioinformatics Institute, St. Petersburg, Russia
- Thomas Hillerton
  - Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
- Erik L. L. Sonnhammer
  - Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
  - Correspondence: Erik L. L. Sonnhammer
9
Suriyalaksh M, Raimondi C, Mains A, Segonds-Pichon A, Mukhtar S, Murdoch S, Aldunate R, Krueger F, Guimerà R, Andrews S, Sales-Pardo M, Casanueva O. Gene regulatory network inference in long-lived C. elegans reveals modular properties that are predictive of novel aging genes. iScience 2022; 25:103663. [PMID: 35036864 PMCID: PMC8753122 DOI: 10.1016/j.isci.2021.103663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 09/09/2021] [Accepted: 12/15/2021] [Indexed: 11/24/2022] Open
Abstract
We design a “wisdom-of-the-crowds” GRN inference pipeline and couple it to complex network analysis to understand the organizational principles governing gene regulation in long-lived glp-1/Notch Caenorhabditis elegans. The GRN has three layers (input, core, and output) and is topologically equivalent to bow-tie/hourglass structures prevalent among metabolic networks. To assess the functional importance of structural layers, we screened 80% of regulators and discovered 50 new aging genes, 86% with human orthologues. Genes essential for longevity, including ones involved in insulin-like signaling (ILS), are at the core, indicating that the GRN's structure is predictive of functionality. We used in vivo reporters and a novel functional network covering 5,497 genetic interactions to make mechanistic predictions. We used genetic epistasis to test some of these predictions, uncovering a novel transcriptional regulator, sup-37, that works alongside DAF-16/FOXO. We present a framework with predictive power that can accelerate discovery in C. elegans and potentially humans.
Highlights:
- Gene-regulatory inference provides a global network of long-lived animals
- The large-scale topology of the network has an hourglass structure
- Membership in the core of the hourglass is a good predictor of functionality
- Discovered 50 novel aging genes, including sup-37, a DAF-16-dependent gene
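One common way to build a wisdom-of-the-crowds consensus is to average the rank each inference method assigns to an edge. The abstract does not say which aggregation scheme this pipeline uses, so the sketch below is illustrative only; the function name, the handling of missing edges, and the toy rankings are assumptions.

```python
def average_rank_consensus(rankings):
    """Merge edge rankings (most confident first) by their mean rank."""
    all_edges = set().union(*map(set, rankings))
    scores = {}
    for edge in all_edges:
        ranks = []
        for ranking in rankings:
            # Assumption: an edge a method did not report is placed just after its last rank.
            ranks.append(ranking.index(edge) + 1 if edge in ranking else len(ranking) + 1)
        scores[edge] = sum(ranks) / len(ranks)
    # Lower mean rank means stronger consensus; ties are broken by edge name.
    return sorted(scores, key=lambda e: (scores[e], e))

# Hypothetical output of three inference methods on the same data.
method_1 = [("daf-16", "geneA"), ("daf-16", "geneB"), ("tf-x", "geneC")]
method_2 = [("daf-16", "geneB"), ("daf-16", "geneA"), ("tf-x", "geneC")]
method_3 = [("daf-16", "geneA"), ("tf-x", "geneC")]
consensus = average_rank_consensus([method_1, method_2, method_3])
print(consensus)
```

Averaging ranks rather than raw scores avoids having to calibrate confidence values that different methods report on different scales.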
Affiliation(s)
- Abraham Mains
  - Babraham Institute, Babraham, Cambridge CB22 3AT, UK
- Rebeca Aldunate
  - Escuela de Biotecnología, Facultad de Ciencias, Universidad Santo Tomas, Santiago, Chile
- Felix Krueger
  - Babraham Institute, Babraham, Cambridge CB22 3AT, UK
- Roger Guimerà
  - ICREA, Barcelona 08010, Catalonia, Spain
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
- Simon Andrews
  - Babraham Institute, Babraham, Cambridge CB22 3AT, UK
- Marta Sales-Pardo
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
10
Johnson JS, De Veaux N, Rives AW, Lahaye X, Lucas SY, Perot BP, Luka M, Garcia-Paredes V, Amon LM, Watters A, Abdessalem G, Aderem A, Manel N, Littman DR, Bonneau R, Ménager MM. A Comprehensive Map of the Monocyte-Derived Dendritic Cell Transcriptional Network Engaged upon Innate Sensing of HIV. Cell Rep 2021; 30:914-931.e9. [PMID: 31968263 PMCID: PMC7039998 DOI: 10.1016/j.celrep.2019.12.054] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/11/2019] [Revised: 06/25/2019] [Accepted: 12/13/2019] [Indexed: 01/12/2023] Open
Abstract
Transcriptional programming of the innate immune response is pivotal for host protection. However, the transcriptional mechanisms that link pathogen sensing with innate activation remain poorly understood. During HIV-1 infection, human dendritic cells (DCs) can detect the virus through an innate sensing pathway, leading to antiviral interferon and DC maturation. Here, we develop an iterative experimental and computational approach to map the HIV-1 innate response circuitry in monocyte-derived DCs (MDDCs). By integrating genome-wide chromatin accessibility with expression kinetics, we infer a gene regulatory network that links 542 transcription factors with 21,862 target genes. We observe that an interferon response is required, yet insufficient, to drive MDDC maturation and identify PRDM1 and RARA as essential regulators of the interferon response and MDDC maturation, respectively. Our work provides a resource for interrogation of regulators of HIV replication and innate immunity, highlighting complexity and cooperativity in the regulatory circuit controlling the response to infection.
In brief: Pathogen sensing leads to host transcriptional reprogramming to protect against infection. However, it is unclear how transcription factor activity is coordinated across the genome. Johnson et al. integrate chromatin accessibility and gene expression data to infer and validate a gene regulatory network that directs the innate immune response to HIV.
Affiliation(s)
- Jarrod S Johnson
- Department of Biochemistry, University of Utah, Salt Lake City, UT 84112, USA; Center for Infectious Disease Research, Seattle, WA 98109, USA.
| | - Nicholas De Veaux
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
| | - Alexander W Rives
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
| | - Xavier Lahaye
- Immunity and Cancer Department, Institut Curie, PSL Research University, INSERM U932, 75005 Paris, France
| | - Sasha Y Lucas
- Center for Infectious Disease Research, Seattle, WA 98109, USA
| | - Brieuc P Perot
- Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France
| | - Marine Luka
- Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France
| | - Victor Garcia-Paredes
- Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France
| | - Lynn M Amon
- Center for Infectious Disease Research, Seattle, WA 98109, USA
| | - Aaron Watters
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
| | - Ghaith Abdessalem
- Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France
| | - Alan Aderem
- Center for Infectious Disease Research, Seattle, WA 98109, USA; Department of Immunology, University of Washington School of Medicine, Seattle, WA 98109, USA
| | - Nicolas Manel
- Immunity and Cancer Department, Institut Curie, PSL Research University, INSERM U932, 75005 Paris, France
| | - Dan R Littman
- The Kimmel Center for Biology and Medicine of the Skirball Institute, New York University School of Medicine, New York, NY 10016, USA; Howard Hughes Medical Institute, New York University School of Medicine, New York, NY 10016, USA
| | - Richard Bonneau
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA; Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; Center for Data Science, New York University, New York, NY 10011, USA
| | - Mickaël M Ménager
- Laboratory of Inflammatory Responses and Transcriptomic Networks in Diseases, Imagine Institute, INSERM UMR 1163, ATIP-Avenir Team, Université de Paris, 24 Boulevard du Montparnasse, 75015 Paris, France; The Kimmel Center for Biology and Medicine of the Skirball Institute, New York University School of Medicine, New York, NY 10016, USA.
|
11
|
Emad A, Sinha S. Inference of phenotype-relevant transcriptional regulatory networks elucidates cancer type-specific regulatory mechanisms in a pan-cancer study. NPJ Syst Biol Appl 2021; 7:9. [PMID: 33558504 PMCID: PMC7870953 DOI: 10.1038/s41540-021-00169-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/15/2019] [Accepted: 01/05/2021] [Indexed: 01/30/2023] Open
Abstract
Reconstruction of transcriptional regulatory networks (TRNs) is a powerful approach to unravel the gene expression programs involved in healthy and disease states of a cell. However, these networks are usually reconstructed independent of the phenotypic (or clinical) properties of the samples. Therefore, they may confound regulatory mechanisms that are specifically related to a phenotypic property with more general mechanisms underlying the full complement of the analyzed samples. In this study, we develop a method called InPheRNo to identify "phenotype-relevant" TRNs. This method is based on a probabilistic graphical model that models the simultaneous effects of multiple transcription factors (TFs) on their target genes and the statistical relationship between the target genes' expression and the phenotype. Extensive comparison of InPheRNo with related approaches using primary tumor samples of 18 cancer types from The Cancer Genome Atlas reveals that InPheRNo can accurately reconstruct cancer type-relevant TRNs and identify cancer driver TFs. In addition, survival analysis reveals that the activity level of TFs with many target genes could distinguish patients with poor prognosis from those with better prognosis.
Affiliation(s)
- Amin Emad
- Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada.
| | - Saurabh Sinha
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
|
12
|
Ghaffari S, Hanson C, Schmidt RE, Bouchonville KJ, Offer SM, Sinha S. An integrated multi-omics approach to identify regulatory mechanisms in cancer metastatic processes. Genome Biol 2021; 22:19. [PMID: 33413550 PMCID: PMC7789593 DOI: 10.1186/s13059-020-02213-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/31/2020] [Accepted: 11/25/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Metastatic progress is the primary cause of death in most cancers, yet the regulatory dynamics driving the cellular changes necessary for metastasis remain poorly understood. Multi-omics approaches hold great promise for addressing this challenge; however, current analysis tools have limited capabilities to systematically integrate transcriptomic, epigenomic, and cistromic information to accurately define the regulatory networks critical for metastasis. RESULTS: To address this limitation, we use a purposefully generated cellular model of colon cancer invasiveness to generate multi-omics data, including expression, accessibility, and selected histone modification profiles, for increasing levels of invasiveness. We then adopt a rigorous probabilistic framework for joint inference from the resulting heterogeneous data, along with transcription factor binding profiles. Our approach uses probabilistic graphical models to leverage the functional information provided by specific epigenomic changes, models the influence of multiple transcription factors simultaneously, and automatically learns the activating or repressive roles of cis-regulatory events. Global analysis of these relationships reveals key transcription factors driving invasiveness, as well as their likely target genes. Disrupting the expression of one of the highly ranked transcription factors, JunD, an AP-1 complex protein, confirms functional relevance to colon cancer cell migration and invasion. Transcriptomic profiling confirms key regulatory targets of JunD, and a gene signature derived from the model demonstrates strong prognostic potential in TCGA colorectal cancer data. CONCLUSIONS: Our work sheds new light on the complex molecular processes driving colon cancer metastasis and presents a statistically sound integrative approach to analyze multi-omics profiles of a dynamic biological process.
Affiliation(s)
- Saba Ghaffari
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
| | - Casey Hanson
- Department of Genetics, Stanford University, Stanford, USA
| | - Remington E Schmidt
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St SW, Rochester, MN, 55905, USA
| | - Kelly J Bouchonville
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St SW, Rochester, MN, 55905, USA
| | - Steven M Offer
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St SW, Rochester, MN, 55905, USA.
| | - Saurabh Sinha
- Department of Computer Science, Carl R. Woese Institute of Genomic Biology, and Cancer Center of Illinois, University of Illinois at Urbana-Champaign, 2122, Siebel Center, 201 N. Goodwin Ave., Urbana, IL, 61801, USA.
|
13
|
Shi M, Tan S, Xie XP, Li A, Yang W, Zhu T, Wang HQ. Globally learning gene regulatory networks based on hidden atomic regulators from transcriptomic big data. BMC Genomics 2020; 21:711. [PMID: 33054712 PMCID: PMC7559338 DOI: 10.1186/s12864-020-07079-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/02/2020] [Accepted: 09/18/2020] [Indexed: 12/02/2022] Open
Abstract
Background: Genes are regulated by various types of regulators, most of which are still unknown or unobserved. Current gene regulatory network (GRN) reverse engineering methods often neglect the unknown regulators and infer regulatory relationships in a local and sub-optimal manner. Results: This paper proposes a global GRN inference framework based on dictionary learning, named dlGRN. The method learns atomic regulators (ARs) from gene expression data using a modified dictionary learning (DL) algorithm, which reflects the whole gene regulatory system, and predicts the regulation between a known regulator and a target gene by global regression. The modified DL algorithm fits the scale-free property of biological networks, allowing dlGRN to intrinsically discern direct from indirect regulation. Conclusions: Extensive experimental results on simulated and real-world data demonstrate the effectiveness and efficiency of dlGRN in reverse engineering GRNs. A novel predicted transcriptional regulation between the TF TFAP2C and the oncogene EGFR was experimentally verified in lung cancer cells. Furthermore, the real-data application reveals the prevalence of DNA methylation regulation in the gene regulatory system. dlGRN can serve as a standalone tool for GRN inference owing to its global formulation and robustness.
Affiliation(s)
- Ming Shi
- MICB Laboratory, Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China; Current Address: MOE Key Laboratory of Bioinformatics, Division of Bioinformatics and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Sheng Tan
- The CAS Key Laboratory of Innate Immunity and Chronic Disease, Division of Life Sciences and Medicine, University of Science and Technology of China, 96 Jinzhai Road, Hefei, Anhui, 230026, P. R. China
| | - Xin-Ping Xie
- School of Mathematics and Physics, Anhui Jianzhu University, 856 Jinzhai Road, Hefei, Anhui, 230022, P. R. China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, 96 Jinzhai Road, Hefei, Anhui, 230026, P. R. China
| | - Wulin Yang
- Cancer hospital & Anhui Province Key Laboratory of Medical Physics and Technology, Center of Medical Physics and Technology, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China
| | - Tao Zhu
- Current Address: MOE Key Laboratory of Bioinformatics, Division of Bioinformatics and Center for Synthetic and Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084, China.
| | - Hong-Qiang Wang
- MICB Laboratory, Institute of Intelligent Machines, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China; Cancer hospital & Anhui Province Key Laboratory of Medical Physics and Technology, Center of Medical Physics and Technology, Hefei Institutes of Physical Science, CAS, 350 Shushanghu Road, Hefei, Anhui, 230031, P. R. China.
|
14
|
Fan K, Chen Y, Mao Z, Fang Y, Li Z, Lin W, Zhang Y, Liu J, Huang J, Lin W. Pervasive duplication, biased molecular evolution and comprehensive functional analysis of the PP2C family in Glycine max. BMC Genomics 2020; 21:465. [PMID: 32631220 PMCID: PMC7339511 DOI: 10.1186/s12864-020-06877-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/03/2019] [Accepted: 07/01/2020] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Soybean (Glycine max) is an important oil provider and ecosystem participant. The protein phosphatase 2C (PP2C) plays important roles in key biological processes. Molecular evolution and functional analysis of the PP2C family in soybean are yet to be reported. RESULTS: The present study identified 134 GmPP2Cs in 10 subfamilies in soybean. Duplication events were prominent in the GmPP2C family, and all duplicated gene pairs were involved in segmental duplication events. The legume-common duplication event and soybean-specific tetraploid have primarily led to the expansion of GmPP2C members in soybean. Sub-functionalization was the main evolutionary fate of duplicated GmPP2C members. Meanwhile, many genes were lost in the GmPP2C family, especially from the F subfamily. Compared with other genes, the evolutionary rates were slower in the GmPP2C family. The PP2C members from the H subfamily resembled their ancestral genes. In addition, some GmPP2Cs were identified as putative key regulators that could control plant growth and development. CONCLUSIONS: A total of 134 GmPP2Cs were identified in soybean, and their expansion, molecular evolution and putative functions were comprehensively analyzed. Our findings provide detailed information on the evolutionary history of the GmPP2C family, and the candidate genes can be used in soybean breeding.
Affiliation(s)
- Kai Fan
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
| | - Yunrui Chen
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
| | - Zhijun Mao
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
| | - Yao Fang
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
| | - Zhaowei Li
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
| | - Weiwei Lin
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
| | - Yongqiang Zhang
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
| | - Jianping Liu
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
| | - Jinwen Huang
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
| | - Wenxiong Lin
- Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Fujian Provincial Key Laboratory of Agroecological Processing and Safety Monitoring, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, 350002 P. R. China
- Key Laboratory of Crop Ecology and Molecular Physiology (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou, 35002 P. R. China
|
15
|
Farahmand S, O’Connor C, Macoska JA, Zarringhalam K. Causal Inference Engine: a platform for directional gene set enrichment analysis and inference of active transcriptional regulators. Nucleic Acids Res 2019; 47:11563-11573. [PMID: 31701125 PMCID: PMC7145661 DOI: 10.1093/nar/gkz1046] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/10/2019] [Revised: 09/19/2019] [Accepted: 10/28/2019] [Indexed: 02/07/2023] Open
Abstract
Inference of active regulatory mechanisms underlying specific molecular and environmental perturbations is essential for understanding cellular response. The success of inference algorithms relies on the quality and coverage of the underlying network of regulator-gene interactions. Several commercial platforms provide large and manually curated regulatory networks and functionality to perform inference on these networks. Adaptation of such platforms for open-source academic applications has been hindered by the lack of availability of accurate, high-coverage networks of regulatory interactions and integration of efficient causal inference algorithms. In this work, we present CIE, an integrated platform for causal inference of active regulatory mechanisms from differential gene expression data. Using a regularized Gaussian Graphical Model (GGM), we construct a transcriptional regulatory network by integrating publicly available ChIP-seq experiments with gene-expression data from tissue-specific RNA-seq experiments. Our GGM approach identifies high confidence transcription factor (TF)-gene interactions and annotates the interactions with information on mode of regulation (activation vs. repression). Benchmarks against manually curated databases of TF-gene interactions show that our method can accurately detect mode of regulation. We demonstrate the ability of our platform to identify active transcriptional regulators by using controlled in vitro overexpression and stem-cell differentiation studies and utilize our method to investigate transcriptional mechanisms of fibroblast phenotypic plasticity.
Affiliation(s)
- Saman Farahmand
- Computational Sciences PhD program, University of Massachusetts Boston, Boston, MA 02125, USA
| | - Corey O’Connor
- Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
| | - Jill A Macoska
- Center for Personalized Cancer Therapy, University of Massachusetts Boston, Boston, MA 02125, USA
| | - Kourosh Zarringhalam
- Computational Sciences PhD program, University of Massachusetts Boston, Boston, MA 02125, USA
- Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA
|
16
|
Mercatelli D, Scalambra L, Triboli L, Ray F, Giorgi FM. Gene regulatory network inference resources: A practical overview. Biochim Biophys Acta Gene Regul Mech 2019; 1863:194430. [PMID: 31678629 DOI: 10.1016/j.bbagrm.2019.194430] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 05/31/2019] [Revised: 09/06/2019] [Accepted: 09/09/2019] [Indexed: 02/08/2023]
Abstract
Transcriptional regulation is a fundamental molecular mechanism involved in almost every aspect of life, from homeostasis to development, from metabolism to behavior, from reaction to stimuli to disease progression. In recent years, the concept of Gene Regulatory Networks (GRNs) has grown popular as an effective applied biology approach for describing the complex and highly dynamic set of transcriptional interactions, due to its easy-to-interpret features. Since cataloguing, predicting and understanding every GRN connection in all species and cellular contexts remains a great challenge for biology, researchers have developed numerous tools and methods to infer regulatory processes. In this review, we catalogue these methods in six major areas, based on the dominant underlying information leveraged to infer GRNs: Coexpression, Sequence Motifs, Chromatin Immunoprecipitation (ChIP), Orthology, Literature and Protein-Protein Interaction (PPI) specifically focused on transcriptional complexes. The methods described here cover a wide range of user-friendliness: from web tools that require no prior computational expertise to command line programs and algorithms for large scale GRN inferences. Each method for GRN inference described herein effectively illustrates a type of transcriptional relationship, with many methods being complementary to others. While a truly holistic approach for inferring and displaying GRNs remains one of the greatest challenges in the field of systems biology, we believe that the integration of multiple methods described herein provides an effective means with which experimental and computational biologists alike may obtain the most complete pictures of transcriptional relationships. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Daniele Mercatelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Laura Scalambra
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Luca Triboli
- Centre for Integrative Biology (CIBIO), University of Trento, Italy
| | - Forest Ray
- Department of Systems Biology, Columbia University Medical Center, New York, NY, United States
| | - Federico M Giorgi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.
|
17
|
Chasman D, Iyer N, Fotuhi Siahpirani A, Estevez Silva M, Lippmann E, McIntosh B, Probasco MD, Jiang P, Stewart R, Thomson JA, Ashton RS, Roy S. Inferring Regulatory Programs Governing Region Specificity of Neuroepithelial Stem Cells during Early Hindbrain and Spinal Cord Development. Cell Syst 2019; 9:167-186.e12. [PMID: 31302154 DOI: 10.1016/j.cels.2019.05.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/23/2018] [Revised: 05/05/2019] [Accepted: 05/30/2019] [Indexed: 12/19/2022]
Abstract
Neuroepithelial stem cells (NSC) from different anatomical regions of the embryonic neural tube's rostrocaudal axis can differentiate into diverse central nervous system tissues, but the transcriptional regulatory networks governing these processes are incompletely understood. Here, we measure region-specific NSC gene expression along the rostrocaudal axis in a human pluripotent stem cell model of early central nervous system development over a 72-h time course, spanning the hindbrain to cervical spinal cord. We introduce Escarole, a probabilistic clustering algorithm for non-stationary time series, and combine it with prior-based regulatory network inference to identify genes that are regulated dynamically and predict their upstream regulators. We identify known regulators of patterning and neural development, including the HOX genes, and predict a direct regulatory connection between the transcription factor POU3F2 and target gene STMN2. We demonstrate that POU3F2 is required for expression of STMN2, suggesting that this regulatory connection is important for region specificity of NSCs.
Affiliation(s)
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
| | - Nisha Iyer
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Alireza Fotuhi Siahpirani
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Maria Estevez Silva
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Ethan Lippmann
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Brian McIntosh
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Mitchell D Probasco
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Peng Jiang
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Ron Stewart
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - James A Thomson
- Regenerative Biology Theme, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Randolph S Ashton
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA.
| | - Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, USA.
|
18
|
Miraldi ER, Pokrovskii M, Watters A, Castro DM, De Veaux N, Hall JA, Lee JY, Ciofani M, Madar A, Carriero N, Littman DR, Bonneau R. Leveraging chromatin accessibility for transcriptional regulatory network inference in T Helper 17 Cells. Genome Res 2019; 29:449-463. [PMID: 30696696 PMCID: PMC6396413 DOI: 10.1101/gr.238253.118] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 04/08/2018] [Accepted: 01/15/2019] [Indexed: 12/13/2022]
Abstract
Transcriptional regulatory networks (TRNs) provide insight into cellular behavior by describing interactions between transcription factors (TFs) and their gene targets. The assay for transposase-accessible chromatin (ATAC)–seq, coupled with TF motif analysis, provides indirect evidence of chromatin binding for hundreds of TFs genome-wide. Here, we propose methods for TRN inference in a mammalian setting, using ATAC-seq data to improve gene expression modeling. We test our methods in the context of T Helper Cell Type 17 (Th17) differentiation, generating new ATAC-seq data to complement existing Th17 genomic resources. In this resource-rich mammalian setting, our extensive benchmarking provides quantitative, genome-scale evaluation of TRN inference, combining ATAC-seq and RNA-seq data. We refine and extend our previous Th17 TRN, using our new TRN inference methods to integrate all Th17 data (gene expression, ATAC-seq, TF knockouts, and ChIP-seq). We highlight newly discovered roles for individual TFs and groups of TFs (“TF–TF modules”) in Th17 gene regulation. Given the popularity of ATAC-seq, which provides high-resolution with low sample input requirements, we anticipate that our methods will improve TRN inference in new mammalian systems, especially in vivo, for cells directly from humans and animal models.
Affiliation(s)
- Emily R Miraldi
- Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, Ohio 45229, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio 45257, USA
| | - Maria Pokrovskii
- Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York, New York 10016, USA
| | - Aaron Watters
- Center for Computational Biology, Flatiron Institute, New York, New York 10010, USA
| | - Dayanne M Castro
- Department of Biology, New York University, New York, New York 10012, USA
| | - Nicholas De Veaux
- Center for Computational Biology, Flatiron Institute, New York, New York 10010, USA
| | - Jason A Hall
- Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York, New York 10016, USA
| | - June-Yong Lee
- Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York, New York 10016, USA
| | - Maria Ciofani
- Department of Immunology, Duke University School of Medicine, Durham, North Carolina 27710, USA
| | - Aviv Madar
- Department of Biology, New York University, New York, New York 10012, USA
| | - Nick Carriero
- Center for Computational Biology, Flatiron Institute, New York, New York 10010, USA
| | - Dan R Littman
- Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York, New York 10016, USA; The Howard Hughes Medical Institute
| | - Richard Bonneau
- Center for Computational Biology, Flatiron Institute, New York, New York 10010, USA; Department of Biology, New York University, New York, New York 10012, USA; Center for Data Science, New York University, New York, New York 10010, USA
|
19
|
Castro DM, de Veaux NR, Miraldi ER, Bonneau R. Multi-study inference of regulatory networks for more accurate models of gene regulation. PLoS Comput Biol 2019; 15:e1006591. [PMID: 30677040 PMCID: PMC6363223 DOI: 10.1371/journal.pcbi.1006591] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 03/06/2018] [Revised: 02/05/2019] [Accepted: 10/23/2018] [Indexed: 12/16/2022] Open
Abstract
Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell-types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization and limited network recovery. In this study, we explore previous integration strategies, such as batch-correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method initially estimates the activities of transcription factors, and subsequently, infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets. Due to increasing availability of biological data, methods to properly integrate data generated across the globe become essential for extracting reproducible insights into relevant research questions. 
In this work, we developed a framework to reconstruct gene regulatory networks from expression datasets generated in separate studies, which are challenging to integrate because of technical variation (different dates, handlers, laboratories, protocols, etc.). Since regulatory mechanisms are often shared across conditions, we hypothesized that drawing conclusions from various data sources would improve the performance of gene regulatory network inference. By transferring knowledge among regulatory models, our method is able to detect weaker patterns that are conserved across datasets, while also being able to detect dataset-unique interactions. We also allow incorporation of prior knowledge on network structure to favor models that are somewhat similar to the prior itself. Using two model organisms, we show that joint network inference outperforms inference from a single dataset. We also demonstrate that our method is robust to false edges in the prior and to low condition overlap across datasets, and that it can outperform current data integration strategies.
Affiliation(s)
- Nicholas R de Veaux
- Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA
- Emily R Miraldi
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Divisions of Immunobiology & Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA
- Richard Bonneau
- New York University, New York, NY 10003, USA; Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA
20
Dai Z, Iqbal M, Lawrence ND, Rattray M. Efficient inference for sparse latent variable models of transcriptional regulation. Bioinformatics 2018; 33:3776-3783. [PMID: 28961802 PMCID: PMC5860323 DOI: 10.1093/bioinformatics/btx508] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/27/2017] [Accepted: 08/25/2017] [Indexed: 12/23/2022] Open
Abstract
Motivation: Regulation of gene expression in prokaryotes involves complex co-regulatory mechanisms involving large numbers of transcriptional regulatory proteins and their target genes. Uncovering these genome-scale interactions constitutes a major bottleneck in systems biology. Sparse latent factor models, which treat the activity of transcription factors (TFs) as unobserved, provide a biologically interpretable modelling framework that integrates gene expression and genome-wide binding data, but at the same time pose a hard computational inference problem. Existing probabilistic inference methods for such models rely on subjective filtering and suffer from scalability issues, and thus are not well suited to realistic genome-scale applications.
Results: We present a fast Bayesian sparse factor model, which takes as input gene expression and binding-site data, either from ChIP-seq experiments or motif predictions, and outputs active TF-gene links as well as latent TF activities. Our method employs an efficient variational Bayes scheme for model inference, enabling its application to large datasets, which was not feasible with existing MCMC-based inference methods for such models. We validate our method on synthetic data against a similar model in the literature that employs MCMC for inference, and obtain comparable results in a small fraction of the computational time. We also apply our method to large-scale data from Mycobacterium tuberculosis involving ChIP-seq data on 113 TFs and matched gene expression data for 3863 putative target genes. We evaluate our predictions using an independent transcriptomics experiment involving over-expression of TFs.
Availability and implementation: An easy-to-use Jupyter notebook demo of our method with data is available at https://github.com/zhenwendai/SITAR.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhenwen Dai
- Department of Computer Science, University of Sheffield, Sheffield, UK; Amazon Research, Cambridge, UK
- Mudassar Iqbal
- Division of Informatics, Imaging & Data Sciences, Faculty of Biology, Medicine, and Health Sciences, University of Manchester, Manchester, UK
- Neil D Lawrence
- Department of Computer Science, University of Sheffield, Sheffield, UK; Amazon Research, Cambridge, UK
- Magnus Rattray
- Division of Informatics, Imaging & Data Sciences, Faculty of Biology, Medicine, and Health Sciences, University of Manchester, Manchester, UK
21
Koch C, Konieczka J, Delorey T, Lyons A, Socha A, Davis K, Knaack SA, Thompson D, O'Shea EK, Regev A, Roy S. Inference and Evolutionary Analysis of Genome-Scale Regulatory Networks in Large Phylogenies. Cell Syst 2017; 4:543-558.e8. [PMID: 28544882 PMCID: PMC5515301 DOI: 10.1016/j.cels.2017.04.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 09/23/2016] [Revised: 02/20/2017] [Accepted: 04/26/2017] [Indexed: 11/22/2022]
Abstract
Changes in transcriptional regulatory networks can significantly contribute to species evolution and adaptation. However, identification of genome-scale regulatory networks is an open challenge, especially in non-model organisms. Here, we introduce multi-species regulatory network learning (MRTLE), a computational approach that uses phylogenetic structure, sequence-specific motifs, and transcriptomic data, to infer the regulatory networks in different species. Using simulated data from known networks and transcriptomic data from six divergent yeasts, we demonstrate that MRTLE predicts networks with greater accuracy than existing methods because it incorporates phylogenetic information. We used MRTLE to infer the structure of the transcriptional networks that control the osmotic stress responses of divergent, non-model yeast species and then validated our predictions experimentally. Interrogating these networks reveals that gene duplication promotes network divergence across evolution. Taken together, our approach facilitates study of regulatory network evolutionary dynamics across multiple poorly studied species.
Affiliation(s)
- Christopher Koch
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Jay Konieczka
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Toni Delorey
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Ana Lyons
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Amanda Socha
- Dartmouth College, Biology department, Hanover, NH 03755, USA
- Kathleen Davis
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Sara A Knaack
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Erin K O'Shea
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
- Howard Hughes Medical Institute, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Department of Molecular and Cellular Biology, Harvard University, Northwest Laboratory, Cambridge, Massachusetts, USA
- Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
22
Chasman D, Roy S. Inference of cell type specific regulatory networks on mammalian lineages. Curr Opin Syst Biol 2017; 2:130-139. [PMID: 29082337 DOI: 10.1016/j.coisb.2017.04.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/28/2022]
Abstract
Transcriptional regulatory networks are at the core of establishing cell type specific gene expression programs. In mammalian systems, such regulatory networks are determined by multiple levels of regulation, including by transcription factors, chromatin environment, and three-dimensional organization of the genome. Recent efforts to measure diverse regulatory genomic datasets across multiple cell types and tissues offer unprecedented opportunities to examine the context-specificity and dynamics of regulatory networks at a greater resolution and scale than before. In parallel, numerous computational approaches to analyze these data have emerged that serve as important tools for understanding mammalian cell type specific regulation. In this article, we review recent computational approaches to predict the expression and sequence-based regulators of a gene's expression level and examine long-range gene regulation. We highlight promising approaches, insights gained, and open challenges that need to be overcome to build a comprehensive picture of cell type specific transcriptional regulatory networks.
Affiliation(s)
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792