1. Guo S, Du J, Li D, Xiong J, Chen Y. Versatile xylose and arabinose genetic switches development for yeasts. Metab Eng 2025; 87:21-36. [PMID: 39537022 DOI: 10.1016/j.ymben.2024.11.004] [Received: 09/07/2024] [Revised: 10/31/2024] [Accepted: 11/10/2024] [Indexed: 11/16/2024]
Abstract
Inducible transcription systems are essential tools in genetic engineering, where tight control, strong inducibility and fast response with cost-effective inducers are highly desired. However, existing systems in yeasts are rarely used in large-scale fermentations due to either cost-prohibitive inducers or incompatible performance. Here, we developed powerful xylose and arabinose induction systems in Saccharomyces cerevisiae, utilizing eukaryotic activators XlnR and AraRA from Aspergillus species and bacterial repressors XylR and AraRR. By integrating these signals into a highly structured synthetic promoter, we created dual-mode systems with strong outputs and minimal leakiness. These systems demonstrated over 4000- and 300-fold regulation with strong activation and rapid response. The dual-mode xylose system was fully activated by xylose-rich agricultural residues like corncob hydrolysate, outperforming existing systems in terms of leakiness, inducibility, dynamic range, induction rate, and growth impact on host. We validated their utility in metabolic engineering with high-titer linalool production and demonstrated the transferability of the XlnR-based xylose induction system to Pichia pastoris, Candida glabrata and Candida albicans. This work provides robust genetic switches for yeasts and a general strategy for integrating activation-repression signals into synthetic promoters to achieve optimal performance.
Affiliation(s)
- Shuhui Guo
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Juhua Du
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Donghan Li
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China; College of Biological Science, China Agricultural University, Beijing, 100193, China
- Jinghui Xiong
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Ye Chen
  - Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
2. Mindel V, Brodsky S, Yung H, Manadre W, Barkai N. Revisiting the model for coactivator recruitment: Med15 can select its target sites independent of promoter-bound transcription factors. Nucleic Acids Res 2024; 52:12093-12111. [PMID: 39187372 PMCID: PMC11551773 DOI: 10.1093/nar/gkae718] [Received: 05/15/2024] [Revised: 07/08/2024] [Accepted: 08/09/2024] [Indexed: 08/28/2024]
Abstract
Activation domains (ADs) within transcription factors (TFs) induce gene expression by recruiting coactivators such as the Mediator complex. Coactivators lack DNA binding domains (DBDs) and are assumed to passively follow their recruiting TFs. This is supported by direct AD-coactivator interactions seen in vitro but has not yet been tested in living cells. To examine that, we targeted two Med15-recruiting ADs to a range of budding yeast promoters through fusion with different DBDs. The DBD-AD fusions localized to hundreds of genomic sites but recruited Med15 and induced transcription in only a subset of bound promoters, characterized by a fuzzy-nucleosome architecture. Direct DBD-Med15 fusions shifted DBD localization towards fuzzy-nucleosome promoters, including promoters devoid of the endogenous Mediator. We propose that Med15, and perhaps other coactivators, possess inherent promoter preference and thus actively contribute to the selection of TF-induced genes.
Affiliation(s)
- Vladimir Mindel
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Sagie Brodsky
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Hadas Yung
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Wajd Manadre
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Naama Barkai
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
3. Saha E, Fanfani V, Mandros P, Ben Guebila M, Fischer J, Shutta KH, DeMeo DL, Lopes-Ramos CM, Quackenbush J. Bayesian inference of sample-specific coexpression networks. Genome Res 2024; 34:1397-1410. [PMID: 39134413 PMCID: PMC11529861 DOI: 10.1101/gr.279117.124] [Received: 02/15/2024] [Accepted: 07/31/2024] [Indexed: 08/28/2024]
Abstract
Gene regulatory networks (GRNs) are effective tools for inferring complex interactions between molecules that regulate biological processes and hence can provide insights into drivers of biological systems. Inferring coexpression networks is a critical element of GRN inference, as the correlation between expression patterns may indicate that genes are coregulated by common factors. However, methods that estimate coexpression networks generally derive an aggregate network representing the mean regulatory properties of the population and so fail to fully capture population heterogeneity. Bayesian optimized networks obtained by assimilating omic data (BONOBO) is a scalable Bayesian model for deriving individual sample-specific coexpression matrices that recognizes variations in molecular interactions across individuals. For each sample, BONOBO assumes a Gaussian distribution on the log-transformed centered gene expression and a conjugate prior distribution on the sample-specific coexpression matrix constructed from all other samples in the data. Combining the sample-specific gene coexpression with the prior distribution, BONOBO yields a closed-form solution for the posterior distribution of the sample-specific coexpression matrices, thus allowing the analysis of large data sets. We demonstrate BONOBO's utility in several contexts, including analyzing gene regulation in yeast transcription factor knockout studies, the prognostic significance of miRNA-mRNA interaction in human breast cancer subtypes, and sex differences in gene regulation within human thyroid tissue. We find that BONOBO outperforms other methods that have been used for sample-specific coexpression network inference and provides insight into individual differences in the drivers of biological processes.
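The conjugate update summarized in this abstract can be illustrated with a minimal sketch. It assumes an inverse-Wishart prior on the sample-specific covariance whose scale is chosen so that the prior mean equals the covariance of all other samples, centers expression with gene means taken over all samples, and treats the prior degrees of freedom `nu` as a free, user-chosen parameter. BONOBO's own rule for choosing `nu` and its exact preprocessing are not reproduced here, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def covariance(samples: List[List[float]]) -> Matrix:
    """Unbiased sample covariance (n - 1 denominator); rows are samples, columns are genes."""
    n, p = len(samples), len(samples[0])
    means = [sum(s[g] for s in samples) / n for g in range(p)]
    return [
        [sum((s[a] - means[a]) * (s[b] - means[b]) for s in samples) / (n - 1) for b in range(p)]
        for a in range(p)
    ]


def sample_specific_posterior_mean(data: List[List[float]], j: int, nu: float) -> Matrix:
    """Posterior mean of sample j's covariance under a hypothetical inverse-Wishart prior.

    Prior:      Sigma_j ~ IW(Psi, nu) with Psi = (nu - p - 1) * S_others, so E[Sigma_j] = S_others.
    Likelihood: x_j ~ N(0, Sigma_j) for the centered expression vector x_j.
    Posterior:  IW(Psi + x_j x_j^T, nu + 1), whose mean is (Psi + x_j x_j^T) / (nu - p).
    """
    n, p = len(data), len(data[0])
    if nu <= p + 1:
        raise ValueError("nu must exceed p + 1 for the prior mean to exist")
    # Prior built from every sample except j.
    others = [row for i, row in enumerate(data) if i != j]
    prior_mean = covariance(others)
    # Assumption: center sample j with gene means computed over all samples.
    means = [sum(row[g] for row in data) / n for g in range(p)]
    x = [data[j][g] - means[g] for g in range(p)]
    scale = nu - p - 1
    # Closed-form posterior mean: no sampling or iterative optimization needed.
    return [
        [(scale * prior_mean[a][b] + x[a] * x[b]) / (nu - p) for b in range(p)]
        for a in range(p)
    ]


data = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 6.0]]  # 4 samples x 2 genes (toy values)
post = sample_specific_posterior_mean(data, j=0, nu=5)
```

Under these assumptions the posterior mean is a weighted average of the other samples' covariance (weight (nu - p - 1)/(nu - p)) and the sample's own outer product x_j x_j^T (weight 1/(nu - p)), so a larger `nu` pulls each sample-specific network toward the population estimate.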
Affiliation(s)
- Enakshi Saha
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Viola Fanfani
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Panagiotis Mandros
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Marouen Ben Guebila
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Jonas Fischer
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Katherine H Shutta
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
- Dawn L DeMeo
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
- Camila M Lopes-Ramos
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA
- John Quackenbush
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
4. Hossain I, Fanfani V, Fischer J, Quackenbush J, Burkholz R. Biologically informed NeuralODEs for genome-wide regulatory dynamics. Genome Biol 2024; 25:127. [PMID: 38773638 PMCID: PMC11106922 DOI: 10.1186/s13059-024-03264-0] [Received: 05/05/2023] [Accepted: 04/30/2024] [Indexed: 05/24/2024]
Abstract
BACKGROUND Gene regulatory network (GRN) models that are formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such gene regulatory ODEs is challenging, since we want to predict the evolution of gene expression in a way that accurately encodes the underlying GRN governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, both of which impede scalability, explainability, or both.
RESULTS We developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics, that overcomes limitations of other methods by flexibly incorporating prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of GRN ODEs. We tested the accuracy of PHOENIX in a series of in silico experiments, benchmarking it against several currently used tools. We demonstrated PHOENIX's flexibility by modeling regulation of oscillating expression profiles obtained from synchronized yeast cells. We also assessed the scalability of PHOENIX by modeling genome-scale GRNs for breast cancer samples ordered in pseudotime and for B cells treated with Rituximab.
CONCLUSIONS PHOENIX uses a combination of user-defined prior knowledge and functional forms from systems biology to encode biological "first principles" as soft constraints on the GRN, allowing us to predict subsequent gene expression patterns in a biologically explainable manner.
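To make "Hill-Langmuir kinetics" concrete, here is a toy gene regulatory ODE: each gene decays at its own rate and is driven by a signed, weighted sum of Hill functions of its regulators, integrated with forward Euler. This is an illustrative assumption only. PHOENIX embeds such kinetics in a neural ODE with learned parameters and prior-knowledge constraints, which this sketch does not attempt to reproduce; all names and parameter values are hypothetical.

```python
from typing import List


def hill(x: float, k: float, n: float) -> float:
    """Hill-Langmuir activation x^n / (k^n + x^n); equals 0.5 when x == k."""
    return x ** n / (k ** n + x ** n)


def derivative(x: List[float], w: List[List[float]], decay: List[float],
               k: float, n: float) -> List[float]:
    """dx_i/dt = sum_j w[i][j] * hill(x_j) - decay[i] * x_i.

    w[i][j] > 0 means gene j activates gene i; w[i][j] < 0 means it represses gene i.
    """
    h = [hill(xj, k, n) for xj in x]
    return [
        sum(w[i][j] * h[j] for j in range(len(x))) - decay[i] * x[i]
        for i in range(len(x))
    ]


def simulate(x0: List[float], w: List[List[float]], decay: List[float],
             k: float = 1.0, n: float = 2.0, dt: float = 0.01, steps: int = 1000) -> List[float]:
    """Forward-Euler integration of the toy GRN ODE; returns the final state."""
    x = list(x0)
    for _ in range(steps):
        dx = derivative(x, w, decay, k, n)
        # Clamp at zero so expression stays non-negative and Hill terms stay real.
        x = [max(0.0, xi + dt * dxi) for xi, dxi in zip(x, dx)]
    return x


# Gene 0 is constant (no inputs, no decay) and activates gene 1 with weight 1.
final = simulate([1.0, 0.0], w=[[0.0, 0.0], [1.0, 0.0]], decay=[0.0, 1.0])
```

In this example gene 1 relaxes toward w * hill(1.0) / decay = 0.5, the half-saturation level, because gene 0 sits exactly at the threshold k.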
Affiliation(s)
- Viola Fanfani
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jonas Fischer
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Rebekka Burkholz
  - CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
5. Meng W, Pan H, Sha Y, Zhai X, Xing A, Lingampelly SS, Sripathi SR, Wang Y, Li K. Metabolic Connectome and Its Role in the Prediction, Diagnosis, and Treatment of Complex Diseases. Metabolites 2024; 14:93. [PMID: 38392985 PMCID: PMC10890086 DOI: 10.3390/metabo14020093] [Received: 12/15/2023] [Revised: 01/17/2024] [Accepted: 01/25/2024] [Indexed: 02/25/2024]
Abstract
The interconnectivity of advanced biological systems is essential for their proper functioning. In modern connectomics, biological entities such as proteins, genes, RNA, DNA, and metabolites are often represented as nodes, while the physical, biochemical, or functional interactions between them are represented as edges. Among these entities, metabolites are particularly significant because they are more closely related to an organism's phenotype than genes or proteins are. Moreover, the metabolome can amplify small proteomic and transcriptomic changes, including those arising from minor genomic changes. Metabolic networks, complex systems comprising hundreds of metabolites and their interactions, mediate energy conversion and chemical reactions within cells and play a critical role in biological research. This review introduces common metabolic network models and their construction methods. It also explores the diverse applications of metabolic networks in elucidating disease mechanisms, predicting and diagnosing diseases, and facilitating drug development. Additionally, it discusses potential future directions for research on metabolic networks. Ultimately, this review serves as a valuable reference for researchers interested in metabolic network modeling, analysis, and applications.
Affiliation(s)
- Weiyu Meng
  - Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
- Hongxin Pan
  - Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
- Yuyang Sha
  - Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
- Xiaobing Zhai
  - Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
- Abao Xing
  - Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
- Srinivasa R. Sripathi
  - Henderson Ocular Stem Cell Laboratory, Retina Foundation of the Southwest, Dallas, TX 75231, USA
- Yuefei Wang
  - National Key Laboratory of Chinese Medicine Modernization, State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
  - Haihe Laboratory of Modern Chinese Medicine, Tianjin 301617, China
- Kefeng Li
  - Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
6. Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure-primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. Genome Biol 2024; 25:24. [PMID: 38238840 PMCID: PMC10797903 DOI: 10.1186/s13059-023-03134-1] [Received: 02/28/2023] [Accepted: 11/30/2023] [Indexed: 01/22/2024]
Abstract
BACKGROUND Modeling of gene regulatory networks (GRNs) is limited by a lack of direct measurements of genome-wide transcription factor activity (TFA), making it difficult to separate covariance from regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic, as it disconnects GRN inference from TFA estimation and cannot account for, for example, contextual transcription factor-transcription factor interactions and other higher-order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features.
RESULTS We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC).
CONCLUSION Here we present SupirFactor, a framework for structure-primed inference and interpretation of GRNs, and demonstrate its interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis, yielding novel functional and regulatory insights.
Affiliation(s)
- Andreas Tjärnberg
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
  - Department of Neuro-Science, University of Wisconsin-Madison - Waisman Center, Madison, USA
- Maggie Beheler-Amass
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Christopher A Jackson
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Lionel A Christiaen
  - Center for Developmental Genetics, New York University, New York, NY, 10003, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
  - Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
- Richard Bonneau
  - Center For Genomics and Systems Biology, NYU, New York, NY, 10008, USA
  - Department of Biology, NYU, New York, NY, 10008, USA
  - Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA
  - Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY, 10003, USA
  - Center For Data Science, NYU, New York, NY, 10008, USA
  - Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
7. Dubinkina V, Bhogale S, Hsieh PH, Dibaeinia P, Nambiar A, Maslov S, Yoshikuni Y, Sinha S. A transcriptomic atlas of acute stress response to low pH in multiple Issatchenkia orientalis strains. Microbiol Spectr 2024; 12:e0253623. [PMID: 38018981 PMCID: PMC10783018 DOI: 10.1128/spectrum.02536-23] [Received: 06/16/2023] [Accepted: 10/27/2023] [Indexed: 11/30/2023]
Abstract
IMPORTANCE Issatchenkia orientalis is a promising industrial chassis for producing biofuels and bioproducts because of its high tolerance to multiple environmental stresses, such as low pH, heat, and chemicals that are toxic to the most widely used microbes. Yet little is known about the specific mechanisms of this tolerance, which hinders our ability to engineer the species to produce valuable biochemicals. Here, we report a comprehensive study of the mechanisms of acid tolerance in this species via transcriptome profiling across variable pH for 12 strains with different phenotypes. We found multiple regulatory mechanisms involved in tolerance to low pH in different strains of I. orientalis, marking potential targets for future gene editing and perturbation experiments.
Affiliation(s)
- Veronika Dubinkina
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - The Gladstone Institute of Data Science and Biotechnology, San Francisco, California, USA
- Shounak Bhogale
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Ping-Hung Hsieh
  - Center for Advanced Bioenergy and Bioproducts Innovation, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Payam Dibaeinia
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Ananthan Nambiar
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Sergei Maslov
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Department of Physics, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Yasuo Yoshikuni
  - Center for Advanced Bioenergy and Bioproducts Innovation, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Global Institution for Collaborative Research and Education, Hokkaido University, Hokkaido, Japan
  - Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Saurabh Sinha
  - Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Cancer Center at Illinois, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
  - Department of Biomedical Engineering at Georgia Tech and Emory University, Atlanta, Georgia, USA
  - Department of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
8. Hecker D, Lauber M, Behjati Ardakani F, Ashrafiyan S, Manz Q, Kersting J, Hoffmann M, Schulz MH, List M. Computational tools for inferring transcription factor activity. Proteomics 2023; 23:e2200462. [PMID: 37706624 DOI: 10.1002/pmic.202200462] [Received: 05/17/2023] [Revised: 08/11/2023] [Accepted: 08/22/2023] [Indexed: 09/15/2023]
Abstract
Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell type-specific manner and each TF has its own characteristics, untangling their regulatory interactions experimentally is laborious and convoluted. Computational tools are therefore being developed to estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can provide insights into TF regulation and help prioritize candidates for experimental validation. We give an overview of available computational tools that estimate TFA, illustrate examples of their application, discuss common strategies for validating results, and examine their assumptions and concomitant limitations.
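As an illustration of the first family mentioned (TF activity estimated from a mapping of TFs to putative target genes), one simple formulation treats each sample's target-gene expression as a linear combination of TF activities weighted by a prior gene-by-TF network, x ≈ P a, and solves for a by least squares. The sketch below is a hypothetical, minimal version of that idea using the normal equations (PᵀP) a = Pᵀx; it is not taken from any specific tool covered in the review, and real tools add normalization, regularization, and handling of TFs that the prior cannot identify.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: some TF activities are not identifiable")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_tfa(prior: List[List[float]], expression: List[float]) -> List[float]:
    """Least-squares TF activities for one sample.

    prior[g][t] is the hypothetical prior weight of TF t on gene g (e.g. 1 = known target);
    expression[g] is the measured expression of gene g in this sample.
    """
    n_tf = len(prior[0])
    ptp = [[sum(row[i] * row[j] for row in prior) for j in range(n_tf)] for i in range(n_tf)]
    ptx = [sum(row[i] * x for row, x in zip(prior, expression)) for i in range(n_tf)]
    return solve(ptp, ptx)


# Four genes, two TFs; gene 3 is a shared target of both.
prior = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
activity = estimate_tfa(prior, [2.0, 2.0, -1.0, 1.0])
```

Here the expression values were generated exactly from activities (2, -1), so the estimate recovers them; with noisy data the result is the best linear fit given the prior network.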
Affiliation(s)
- Dennis Hecker
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Michael Lauber
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Fatemeh Behjati Ardakani
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Shamim Ashrafiyan
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Quirin Manz
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Johannes Kersting
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - GeneSurge GmbH, München, Germany
- Markus Hoffmann
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - Institute for Advanced Study, Technical University of Munich, Garching, Germany
  - National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Marcel H Schulz
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Markus List
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
9. Saha E, Fanfani V, Mandros P, Ben-Guebila M, Fischer J, Hoff-Shutta K, Glass K, DeMeo DL, Lopes-Ramos C, Quackenbush J. Bayesian Optimized sample-specific Networks Obtained By Omics data (BONOBO). bioRxiv 2023:2023.11.16.567119. [PMID: 38014256 PMCID: PMC10680741 DOI: 10.1101/2023.11.16.567119] [Indexed: 11/29/2023]
Abstract
Gene regulatory networks (GRNs) are effective tools for inferring complex interactions between molecules that regulate biological processes and hence can provide insights into drivers of biological systems. Inferring co-expression networks is a critical element of GRN inference as the correlation between expression patterns may indicate that genes are coregulated by common factors. However, methods that estimate co-expression networks generally derive an aggregate network representing the mean regulatory properties of the population and so fail to fully capture population heterogeneity. To address these concerns, we introduce BONOBO (Bayesian Optimized Networks Obtained By assimilating Omics data), a scalable Bayesian model for deriving individual sample-specific co-expression networks by recognizing variations in molecular interactions across individuals. For every sample, BONOBO assumes a Gaussian distribution on the log-transformed centered gene expression and a conjugate prior distribution on the sample-specific co-expression matrix constructed from all other samples in the data. Combining the sample-specific gene expression with the prior distribution, BONOBO yields a closed-form solution for the posterior distribution of the sample-specific co-expression matrices, thus making the method extremely scalable. We demonstrate the utility of BONOBO in several contexts, including analyzing gene regulation in yeast transcription factor knockout studies, prognostic significance of miRNA-mRNA interaction in human breast cancer subtypes, and sex differences in gene regulation within human thyroid tissue. We find that BONOBO outperforms other sample-specific co-expression network inference methods and provides insight into individual differences in the drivers of biological processes.
Affiliation(s)
- Enakshi Saha
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Viola Fanfani
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Panagiotis Mandros
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Marouen Ben-Guebila
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Jonas Fischer
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Katherine Hoff-Shutta
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Kimberly Glass
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Dawn Lisa DeMeo
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Camila Lopes-Ramos
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- John Quackenbush
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
10. Nambiar A, Dubinkina V, Liu S, Maslov S. FUN-PROSE: A deep learning approach to predict condition-specific gene expression in fungi. PLoS Comput Biol 2023; 19:e1011563. [PMID: 37971967 PMCID: PMC10653424 DOI: 10.1371/journal.pcbi.1011563] [Received: 02/19/2023] [Accepted: 09/30/2023] [Indexed: 11/19/2023]
Abstract
The mRNA levels of all genes in a genome are a critical piece of information defining the overall state of the cell in a given environmental condition. Being able to reconstruct such condition-specific expression in fungal genomes is particularly important for metabolically engineering these organisms to produce desired chemicals in industrially scalable conditions. Most previous deep learning approaches focused on predicting the average expression level of a gene from its promoter sequence, ignoring its variation across conditions. Here we present FUN-PROSE, a deep learning model trained to predict differential expression of individual genes across various conditions using their promoter sequences and the expression levels of all transcription factors. We train and test our model on three fungal species and obtain correlations between predicted and observed condition-specific gene expression as high as 0.85. We then interpret our model to extract promoter sequence motifs responsible for the variable expression of individual genes. We also carry out input feature importance analysis to connect individual transcription factors to their gene targets. A sizeable fraction of both the sequence motifs and the TF-gene interactions learned by our model agree with previously known biological information, while the rest correspond to either novel biological facts or indirect correlations.
Affiliation(s)
- Ananthan Nambiar
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Carl R. Woese Institute for Genomic Biology, Urbana, Illinois, United States of America
- Veronika Dubinkina
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Carl R. Woese Institute for Genomic Biology, Urbana, Illinois, United States of America
- The Gladstone Institute of Data Science and Biotechnology, San Francisco, California, United States of America
- Simon Liu
- Carl R. Woese Institute for Genomic Biology, Urbana, Illinois, United States of America
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Sergei Maslov
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Carl R. Woese Institute for Genomic Biology, Urbana, Illinois, United States of America
- Department of Physics, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, Illinois, United States of America
11
Jackson CA, Beheler-Amass M, Tjärnberg A, Suresh I, Hickey ASM, Bonneau R, Gresham D. Simultaneous estimation of gene regulatory network structure and RNA kinetics from single cell gene expression. bioRxiv 2023:2023.09.21.558277. [PMID: 37790443 PMCID: PMC10542544 DOI: 10.1101/2023.09.21.558277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Cells respond to environmental and developmental stimuli by remodeling their transcriptomes through regulation of both mRNA transcription and mRNA decay. A central goal of biology is to identify the global set of regulatory relationships between the factors that control mRNA production and degradation and their target transcripts, and to construct a predictive model of gene expression. Regulatory relationships are typically identified using transcriptome measurements and causal inference algorithms. RNA kinetic parameters are determined experimentally by employing run-on or metabolic labeling (e.g., 4-thiouracil) methods that allow transcription and decay rates to be measured separately. Here, we develop a deep learning model, trained with single-cell RNA-seq data, that both infers causal regulatory relationships and estimates RNA kinetic parameters. The resulting in silico model predicts future gene expression states and can be perturbed to simulate the effect of transcription factor changes. We acquired model training data by sequencing the transcriptomes of 175,000 individual Saccharomyces cerevisiae cells that were subjected to an external perturbation and continuously sampled over a one-hour period. The rate of change for each transcript was calculated on a per-cell basis to estimate RNA velocity. We then trained a deep learning model with transcriptome and RNA velocity data to calculate time-dependent estimates of mRNA production and decay rates. By separating RNA velocity into transcription and decay rates, we show that rapamycin treatment causes existing ribosomal protein transcripts to be rapidly destabilized, while production of new transcripts gradually slows over the course of an hour. The neural network framework we present is designed to explicitly model causal regulatory relationships between transcription factors and their target genes, and it outperforms existing models in recovering known regulatory relationships. We validated the predictive power of the model by perturbing transcription factors in silico and comparing transcriptome-wide effects with experimental data. Our study represents the first step in constructing a complete, predictive, biophysical model of gene expression regulation.
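As a minimal illustration of the kinetic decomposition described above (not the authors' deep learning model), assume the simple first-order model dx/dt = α − λx, in which RNA velocity equals production α minus decay at rate λ. If α and λ are treated as constant across a group of cells, both can be estimated from per-cell expression and velocity by ordinary least squares. The function name and the simulated data are hypothetical.

```python
import numpy as np

def fit_production_decay(x, v):
    """Fit v = alpha - lam * x by ordinary least squares (hypothetical helper).

    x: expression of one transcript across cells (1-D array)
    v: RNA velocity dx/dt of that transcript in the same cells
    Returns (alpha, lam): constant production rate and first-order decay rate.
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    # Design matrix: an intercept column for alpha and a -x column for lam
    design = np.column_stack([np.ones_like(x), -x])
    (alpha, lam), *_ = np.linalg.lstsq(design, v, rcond=None)
    return alpha, lam

# Simulated cells with alpha = 5 and lam = 0.5, plus small velocity noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 20, size=500)
v = 5.0 - 0.5 * x + rng.normal(0, 0.1, size=500)
alpha, lam = fit_production_decay(x, v)
print(round(alpha, 1), round(lam, 2))
```

Splitting velocity this way only works if production and decay really are shared across the cells being pooled; a time-dependent model such as the one described in the abstract relaxes that assumption.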
Affiliation(s)
- Christopher A Jackson
- Center For Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- Maggie Beheler-Amass
- Center For Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- Andreas Tjärnberg
- Center For Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- Ina Suresh
- Center For Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- Angela Shang-mei Hickey
- Center For Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- David Gresham
- Center For Genomics and Systems Biology, New York University, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
12
Turco G, Chang C, Wang RY, Kim G, Stoops EH, Richardson B, Sochat V, Rust J, Oughtred R, Thayer N, Kang F, Livstone MS, Heinicke S, Schroeder M, Dolinski KJ, Botstein D, Baryshnikova A. Global analysis of the yeast knockout phenome. Sci Adv 2023; 9:eadg5702. [PMID: 37235661 PMCID: PMC11326039 DOI: 10.1126/sciadv.adg5702] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/06/2023] [Accepted: 04/20/2023] [Indexed: 05/28/2023]
Abstract
Genome-wide phenotypic screens in the budding yeast Saccharomyces cerevisiae, enabled by its knockout collection, have produced the largest, richest, and most systematic phenotypic description of any organism. However, integrative analyses of this rich data source have been virtually impossible because of the lack of a central data repository and consistent metadata annotations. Here, we describe the aggregation, harmonization, and analysis of ~14,500 yeast knockout screens, which we call Yeast Phenome. Using this unique dataset, we characterized two unknown genes (YHR045W and YGL117W) and showed that tryptophan starvation is a by-product of many chemical treatments. Furthermore, we uncovered an exponential relationship between phenotypic similarity and intergenic distance, which suggests that gene positions in both yeast and human genomes are optimized for function.
Affiliation(s)
- Gina Turco
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Christie Chang
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Griffin Kim
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Brianna Richardson
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Vanessa Sochat
- Lawrence Livermore National Laboratory, Livermore, CA, USA
- Jennifer Rust
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Rose Oughtred
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Fan Kang
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Michael S Livstone
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Sven Heinicke
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Mark Schroeder
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Kara J Dolinski
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
13
Hossain I, Fanfani V, Quackenbush J, Burkholz R. Biologically informed NeuralODEs for genome-wide regulatory dynamics. Res Sq 2023:rs.3.rs-2675584. [PMID: 36993392 PMCID: PMC10055646 DOI: 10.21203/rs.3.rs-2675584/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Models that are formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such ODEs is challenging, since we want to predict the evolution of gene expression in a way that accurately encodes the causal gene-regulatory network (GRN) governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, both of which impede scalability and/or explainability. To overcome these limitations, we developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics that can flexibly incorporate prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of ODEs. We test the accuracy of PHOENIX in a series of in silico experiments, benchmarking it against several currently used tools for ODE estimation. We also demonstrate PHOENIX's flexibility by studying oscillating expression data from synchronized yeast cells and assess its scalability by modelling genome-scale breast cancer expression for samples ordered in pseudotime. Finally, we show how the combination of user-defined prior knowledge and functional forms from systems biology allows PHOENIX to encode key properties of the underlying GRN, and subsequently predict expression patterns in a biologically explainable way.
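The Hill-Langmuir functional form mentioned above can be sketched as follows. This is only the kinetic building block, not PHOENIX itself (which embeds such terms in a neural ODE); the single-target model dy/dt = β·f(x) − γy, the function names, and all parameter values are hypothetical.

```python
def hill_activation(x, k, n):
    """Hill-Langmuir activation: fraction of promoter occupied by activator x."""
    return x**n / (k**n + x**n)

def hill_repression(x, k, n):
    """Complementary repression form, equal to 1 - hill_activation(x, k, n)."""
    return k**n / (k**n + x**n)

def target_rhs(y, tf, beta, gamma, k, n):
    """dy/dt for a target gene activated by a TF at level tf, with first-order decay."""
    return beta * hill_activation(tf, k, n) - gamma * y

def simulate(tf, beta=2.0, gamma=0.5, k=1.0, n=2, y0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the target ODE at a fixed TF level."""
    y = y0
    for _ in range(steps):
        y += dt * target_rhs(y, tf, beta, gamma, k, n)
    return y

# At tf == k the activation term is exactly 0.5, so the steady state is beta / gamma * 0.5
y_end = simulate(tf=1.0)
print(y_end)  # close to 2.0
```

The half-saturation constant k and Hill coefficient n are the interpretable parameters that make this form attractive for biologically constrained ODE models.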
Affiliation(s)
- Intekhab Hossain
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Viola Fanfani
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- John Quackenbush
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Rebekka Burkholz
- Helmholtz Center for Information Security (CISPA), Saarbrücken, Germany
14
Nabuco Leva Ferreira de Freitas JA, Bischof O. Dynamic modeling of the cellular senescence gene regulatory network. Heliyon 2023; 9:e14007. [PMID: 36938415 PMCID: PMC10015196 DOI: 10.1016/j.heliyon.2023.e14007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 02/13/2023] [Accepted: 02/17/2023] [Indexed: 02/27/2023]
Abstract
Cellular senescence is a cell fate that prominently impacts physiological and pathophysiological processes. Diverse cellular stresses induce it, and dramatic gene expression changes accompany it. However, determining the interactions comprising the gene regulatory network (GRN) governing senescence remains challenging. Recent advances in signal processing techniques provide opportunities to reconstruct GRNs. Here, we describe a GRN for senescence integrating time-series transcriptome and transcription factor depletion datasets. Specifically, we infer a set of differential equations using the "Sparse Identification of Nonlinear Dynamics" (SINDy) algorithm, discriminate genes with potential hidden regulators, validate the inferred GRN for time-points not included in the training data, and comprehensively benchmark our approach. Our work is a proof of concept for a data-driven GRN reconstruction method, consolidating an iterative, powerful mathematical platform for senescence modeling that can be used to test hypotheses in silico and has the potential for future discoveries of clinical impact.
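SINDy is commonly fitted with sequentially thresholded least squares (STLSQ): regress the time derivatives on a library of candidate terms, zero the coefficients below a threshold, and refit on the surviving terms. The sketch below assumes exact derivatives and a hypothetical two-gene system with a polynomial library; it is not the authors' pipeline, and with real time-series data the derivatives would have to be estimated and the threshold tuned.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares, the sparse regression step of SINDy.

    theta: (m, p) library of candidate functions evaluated at m samples
    dxdt:  (m, d) time derivatives of the d state variables
    Returns xi: (p, d) sparse coefficients such that dxdt is close to theta @ xi.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each equation using only the library terms that survived
        for j in range(dxdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                xi[keep, j] = np.linalg.lstsq(theta[:, keep], dxdt[:, j], rcond=None)[0]
    return xi

# Hypothetical system: dx1/dt = 1 - 0.5*x1, dx2/dt = 0.8*x1 - 0.3*x2 - 0.2*x1*x2
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(0, 3, size=(2, 200))
# Library columns: 1, x1, x2, x1*x2, x1**2, x2**2
theta = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
dxdt = np.column_stack([1.0 - 0.5 * x1, 0.8 * x1 - 0.3 * x2 - 0.2 * x1 * x2])
xi = stlsq(theta, dxdt)
print(np.round(xi, 3))  # rows follow the library order; columns are dx1/dt, dx2/dt
```

Each nonzero entry of `xi` reads directly as a regulatory term in the inferred differential equations, which is what makes this kind of sparse regression attractive for GRN reconstruction.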
Affiliation(s)
- José Américo Nabuco Leva Ferreira de Freitas
- IMRB, Mondor Institute for Biomedical Research, INSERM U955 – Université Paris Est Créteil, UPEC, Faculté de Médecine de Créteil 8, rue du Général Sarrail, 94010 Créteil
- Sorbonne Université, UMR 8256, Biological Adaptation and Ageing B2A–IBPS, F-75005, Paris, France
- INSERM U1164, F-75005, Paris, France
- Oliver Bischof
- IMRB, Mondor Institute for Biomedical Research, INSERM U955 – Université Paris Est Créteil, UPEC, Faculté de Médecine de Créteil 8, rue du Général Sarrail, 94010 Créteil
- Corresponding author.
15
Abid D, Brent MR. NetProphet 3: a machine learning framework for transcription factor network mapping and multi-omics integration. Bioinformatics 2023; 39:7000334. [PMID: 36692138 PMCID: PMC9912366 DOI: 10.1093/bioinformatics/btad038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 01/11/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
MOTIVATION Many methods have been proposed for mapping the targets of transcription factors (TFs) from gene expression data. It is known that combining outputs from multiple methods can improve performance. To date, outputs have been combined by using either simplistic formulae, such as geometric mean, or carefully hand-tuned formulae that may not generalize well to new inputs. Finally, the evaluation of accuracy has been challenging due to the lack of genome-scale, ground-truth networks. RESULTS We developed NetProphet3, which combines scores from multiple analyses automatically, using a tree boosting algorithm trained on TF binding location data. We also developed three independent, genome-scale evaluation metrics. By these metrics, NetProphet3 is more accurate than other commonly used packages, including NetProphet 2.0, when gene expression data from direct TF perturbations are available. Furthermore, its integration mode can forge a consensus network from gene expression data and TF binding location data. AVAILABILITY AND IMPLEMENTATION All data and code are available at https://zenodo.org/record/7504131#.Y7Wu3i-B2x8. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dhoha Abid
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Michael R Brent
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
16
Tjärnberg A, Beheler-Amass M, Jackson CA, Christiaen LA, Gresham D, Bonneau R. Structure primed embedding on the transcription factor manifold enables transparent model architectures for gene regulatory network and latent activity inference. bioRxiv 2023:2023.02.02.526909. [PMID: 36778259 PMCID: PMC9915715 DOI: 10.1101/2023.02.02.526909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
The modeling of gene regulatory networks (GRNs) is limited by a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets using expression as a proxy for the upstream independent features, which complicates the validation of modeling frameworks and of the predictions they produce. Separating covariance from regulatory influence requires aggregating independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g., TF activity (TFA), is unknown because it is not experimentally feasible to measure, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher-order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single-cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA, and elucidates functional regulatory pathways through aggregation of TFs.
Affiliation(s)
- Andreas Tjärnberg
- Center for Developmental Genetics, New York University, New York 10003 NY, USA
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, 10010, USA
- Maggie Beheler-Amass
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Christopher A Jackson
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Lionel A Christiaen
- Center for Developmental Genetics, New York University, New York 10003 NY, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Department of Heart Disease, Haukeland University Hospital, Bergen, Norway
- David Gresham
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Richard Bonneau
- Center For Genomics and Systems Biology, NYU, New York, NY 10008, USA
- Department of Biology, NYU, New York, NY 10008, USA
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Courant Institute of Mathematical Sciences, Computer Science Department, New York University, New York, NY 10003, USA
- Center For Data Science, NYU, New York, NY 10008, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
17
Sarmah D, Smith GR, Bouhaddou M, Stern AD, Erskine J, Birtwistle MR. Network inference from perturbation time course data. NPJ Syst Biol Appl 2022; 8:42. [PMID: 36316338 PMCID: PMC9622863 DOI: 10.1038/s41540-022-00253-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 10/18/2022] [Indexed: 11/05/2022]
Abstract
Networks underlie much of biology, from subcellular to ecological scales. Yet understanding what experimental data are needed, and how to use them to unambiguously identify the structure of even small networks, remains a broad challenge. Here, we integrate a dynamic least squares framework into established modular response analysis (DL-MRA), which specifies sufficient experimental perturbation time course data to robustly infer arbitrary two- and three-node networks. DL-MRA considers important network properties that current methods often struggle to capture: (i) edge sign and directionality; (ii) cycles with feedback or feedforward loops, including self-regulation; (iii) dynamic network behavior; (iv) edges external to the network; and (v) robust performance with experimental noise. We evaluate the performance of the approach, and the extent to which it applies, on cell state transition networks, intracellular signaling networks, and gene regulatory networks. Although signaling networks are a common application of network reconstruction methods, the results suggest that they can be robustly inferred only under quite restricted conditions. For gene regulatory networks, the results suggest that incomplete knockdown is often more informative than full knockout perturbation, which may change experimental strategies for gene regulatory network reconstruction. Overall, the results give a rational basis for the experimental data requirements of network reconstruction and can be applied to any such problem where perturbation time course experiments are possible.
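DL-MRA builds on classical modular response analysis (MRA). As a point of reference only, the steady-state MRA relation r = −(dg(R⁻¹))⁻¹R⁻¹ recovers the local (direct) response matrix r from the global response matrix R, where column j of R holds the steady-state changes after a perturbation that acts directly only on node j. The sketch assumes a hypothetical linear three-node system with Jacobian A, for which R = −A⁻¹ and the expected result is r_ij = −A_ij / A_ii; it does not implement the dynamic least-squares part of DL-MRA.

```python
import numpy as np

def local_response_matrix(global_response):
    """Classical steady-state modular response analysis (MRA).

    global_response[i, j]: steady-state change in node i after a small
    perturbation acting directly only on node j.
    Returns r with r[i, i] = -1 and r[i, j] the direct (local) influence of j on i.
    """
    inv = np.linalg.inv(global_response)
    # Divide row i of R^-1 by its diagonal entry and negate
    return -inv / np.diag(inv)[:, None]

# Hypothetical linear network dx/dt = A x + p; perturbing p_j shifts the steady state by -A^-1 e_j
A = np.array([[-1.0, 0.0, 0.5],
              [0.8, -1.0, 0.0],
              [0.0, -0.6, -2.0]])
R = -np.linalg.inv(A)
r = local_response_matrix(R)
print(np.round(r, 3))
```

The signs of the off-diagonal entries of `r` give edge sign and direction, and nonzero entries on both sides of the diagonal reveal cycles, two of the properties the abstract highlights.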
Affiliation(s)
- Deepraj Sarmah
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Gregory R Smith
- Department of Neurology, Center for Advanced Research on Diagnostic Assays, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mehdi Bouhaddou
- J. David Gladstone Institutes, San Francisco, CA, 94158, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, 94158, USA
- Alan D Stern
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- James Erskine
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Marc R Birtwistle
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Department of Bioengineering, Clemson University, Clemson, SC, USA
18
Yazdani A, Yazdani A, Mendez-Giraldez R, Samiei A, Kosorok MR, Schaid DJ. From classical mendelian randomization to causal networks for systematic integration of multi-omics. Front Genet 2022; 13:990486. [PMID: 36186433 PMCID: PMC9520987 DOI: 10.3389/fgene.2022.990486] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/10/2022] [Accepted: 08/17/2022] [Indexed: 11/17/2022]
Abstract
The number of studies with information at multiple biological levels of granularity, such as genomics, proteomics, and metabolomics, is increasing each year, and a biomedical question is how to systematically integrate these data to discover new biological mechanisms that have the potential to elucidate the processes of health and disease. Causal frameworks, such as Mendelian randomization (MR), provide a foundation for integrating data toward new biological discoveries. Despite the growing number of MR applications in a wide variety of biomedical studies, there are few approaches for the systematic analysis of omic data. The large number and diverse types of molecular components involved in complex diseases interact through complex networks, and classical MR approaches targeting individual components do not consider the underlying relationships. In contrast, causal network models grounded in the principles of MR offer significant improvements over the classical MR framework for understanding omic data. Integration of these mostly distinct branches of statistics is a recent development, and here we review the current progress. To set the stage for causal network models, we review some recent progress in the classical MR framework. We then explain how to transition from the classical MR framework to causal networks. We discuss the identification of causal networks and evaluate the underlying assumptions. We also introduce some tests for sensitivity analysis and stability assessment of causal networks. We then review practical details of performing real data analysis to identify causal networks, and highlight some of their utility. Applications with validated novel findings reveal the full potential of causal networks as a systems approach that will become necessary for integrating large-scale omic data.
Affiliation(s)
- Azam Yazdani
- Center of Perioperative Genetics and Genomics, Brigham Women's Hospital, Harvard Medical School, Boston, MA, United States
- Akram Yazdani
- Health Science Center at Houston, McGovern Medical School, Division of Clinical and Translational Sciences, University of Texas, Houston, TX, United States
- Raul Mendez-Giraldez
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, United States
- Ahmad Samiei
- Division of Pulmonary Medicine, Boston Children's Hospital, Boston, MA, United States
- Michael R Kosorok
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Daniel J Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
19
Gupta A, Martin-Rufino JD, Jones TR, Subramanian V, Qiu X, Grody EI, Bloemendal A, Weng C, Niu SY, Min KH, Mehta A, Zhang K, Siraj L, Al' Khafaji A, Sankaran VG, Raychaudhuri S, Cleary B, Grossman S, Lander ES. Inferring gene regulation from stochastic transcriptional variation across single cells at steady state. Proc Natl Acad Sci U S A 2022; 119:e2207392119. [PMID: 35969771 PMCID: PMC9407670 DOI: 10.1073/pnas.2207392119] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/09/2022] [Accepted: 07/20/2022] [Indexed: 12/24/2022]
Abstract
Regulatory relationships between transcription factors (TFs) and their target genes lie at the heart of cellular identity and function; however, uncovering these relationships is often labor-intensive and requires perturbations. Here, we propose a principled framework to systematically infer gene regulation for all TFs simultaneously in cells at steady state by leveraging the intrinsic variation in the transcriptional abundance across single cells. Through modeling and simulations, we characterize how transcriptional bursts of a TF gene are propagated to its target genes, including the expected ranges of time delay and magnitude of maximum covariation. We distinguish these temporal trends from the time-invariant covariation arising from cell states, and we delineate the experimental and technical requirements for leveraging these small but meaningful cofluctuations in the presence of measurement noise. While current technology does not yet allow adequate power for definitively detecting regulatory relationships for all TFs simultaneously in cells at steady state, we investigate a small-scale dataset to inform future experimental design. This study supports the potential value of mapping regulatory connections through stochastic variation, and it motivates further technological development to achieve its full potential.
Affiliation(s)
- Anika Gupta
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
- Jorge D. Martin-Rufino
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115
- Dana-Farber Cancer Institute, Boston, MA 02215
- Xiaojie Qiu
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142
- HHMI, Massachusetts Institute of Technology, Cambridge, MA 02139
- Chen Weng
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115
- Dana-Farber Cancer Institute, Boston, MA 02215
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142
- Kyung Hoi Min
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Arnav Mehta
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Dana-Farber Cancer Institute, Boston, MA 02215
- Department of Medicine, Massachusetts General Hospital, Boston, MA 02114
- Kaite Zhang
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Layla Siraj
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Vijay G. Sankaran
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115
- Dana-Farber Cancer Institute, Boston, MA 02215
- Soumya Raychaudhuri
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
- Center for Data Sciences, Brigham and Women’s Hospital, Boston, MA 02115
- Brian Cleary
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Eric S. Lander
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115
20
Kang Y, Jung WJ, Brent MR. Predicting which genes will respond to transcription factor perturbations. G3 (Bethesda) 2022; 12:jkac144. [PMID: 35666184 PMCID: PMC9339286 DOI: 10.1093/g3journal/jkac144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Accepted: 05/25/2022] [Indexed: 11/13/2022]
Abstract
The ability to predict which genes will respond to the perturbation of a transcription factor serves as a benchmark for our systems-level understanding of transcriptional regulatory networks. In previous work, machine learning models have been trained to predict static gene expression levels in a biological sample by using data from the same or similar samples, including data on their transcription factor binding locations, histone marks, or DNA sequence. We report on a different challenge: training machine learning models to predict which genes will respond to the perturbation of a transcription factor without using any data from the perturbed cells. We find that existing transcription factor location data (ChIP-seq) from human cells have very little detectable utility for predicting which genes will respond to perturbation of a transcription factor. Features of genes, including their preperturbation expression level and expression variation, are very useful for predicting responses to perturbation of any transcription factor. This shows that some genes are poised to respond to transcription factor perturbations and others are resistant, shedding light on why it has been so difficult to predict responses from binding locations. Certain histone marks, including H3K4me1 and H3K4me3, have some predictive power when located downstream of the transcription start site. However, the predictive power of histone marks is much less than that of gene expression level and expression variation. Sequence-based or epigenetic properties of genes strongly influence their tendency to respond to direct transcription factor perturbations, partially explaining the oft-noted difficulty of predicting responsiveness from transcription factor binding location data. These molecular features are largely reflected in and summarized by the gene's expression level and expression variation. Code is available at https://github.com/BrentLab/TFPertRespExplainer.
Affiliation(s)
- Yiming Kang
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63108, USA
- Wooseok J Jung
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63108, USA
- Michael R Brent
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63108, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
21
Kumar S, Song M. Overcoming biases in causal inference of molecular interactions. Bioinformatics 2022; 38:2818-2825. [PMID: 35561208 DOI: 10.1093/bioinformatics/btac206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2021] [Revised: 02/03/2022] [Accepted: 04/04/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION Computer inference of biological mechanisms is increasingly approachable due to dynamically rich data sources such as single-cell genomics. Inferred molecular interactions can prioritize hypotheses for wet-lab experiments to expedite biological discovery. However, complex data often come with unwanted biological or technical variations, exposing biases in current methods, related to marginal distributions and sample size, that favor spurious causal relationships. RESULTS Considering function direction and strength as evidence for causality, we present an adapted functional chi-squared test (AdpFunChisq) that rewards functional patterns over non-functional or independent patterns. On synthetic data and three biological datasets, we demonstrate the advantages of AdpFunChisq over 10 methods in overcoming biases that give rise to wide fluctuations in the performance of alternative approaches. On single-cell multiomics data of multiple phenotype acute leukemia, we found that the T-cell surface glycoprotein CD3 delta chain may causally mediate specific genes in the viral carcinogenesis pathway. Using the causality-by-functionality principle, AdpFunChisq offers a viable option for robust causal inference in dynamical systems. AVAILABILITY AND IMPLEMENTATION The AdpFunChisq test is implemented in the R package 'FunChisq' (2.5.2 or above) at https://cran.r-project.org/package=FunChisq. All other source code along with pre-processed data is available at Code Ocean https://doi.org/10.24433/CO.2907738.v1. SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Sajal Kumar
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Mingzhou Song
- Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
- Molecular Biology and Interdisciplinary Life Sciences Graduate Program, New Mexico State University, Las Cruces, NM 88003, USA
22
Wu Y, Judge MT, Edison AS, Arnold J. Uncovering in vivo biochemical patterns from time-series metabolic dynamics. PLoS One 2022; 17:e0268394. [PMID: 35550643 PMCID: PMC9098013 DOI: 10.1371/journal.pone.0268394] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/10/2021] [Accepted: 04/28/2022] [Indexed: 11/19/2022]
Abstract
Systems biology relies on holistic biomolecule measurements, and untangling biochemical networks requires time-series metabolomics profiling. With current metabolomic approaches, time-series measurements can be taken for hundreds of metabolic features, which decode underlying metabolic regulation. Such a metabolomic dataset is untargeted, with most features unannotated and inaccessible to statistical analysis and computational modeling. The high dimensionality of the metabolic space also makes mechanistic modeling computationally cumbersome. We implemented a faster exploratory workflow to visualize and extract chemical and biochemical dependencies. Time-series metabolic features (about 300 for each dataset) were extracted by Ridge Tracking-based Extract (RTExtract) on measurements from continuous in vivo monitoring of metabolism by NMR (CIVM-NMR) in Neurospora crassa under different conditions. The metabolic profiles were then smoothed and projected into lower dimensions, enabling a comparison of metabolic trends in the cultures. Next, we expanded incomplete metabolite annotation using a correlation network. Lastly, we uncovered meaningful metabolic clusters by estimating dependencies between smoothed metabolic profiles. We thus sidestepped the processes of time-consuming mechanistic modeling, difficult global optimization, and labor-intensive annotation. Multiple clusters guided insights into central energy metabolism and membrane synthesis. Dense connections with glucose 1-phosphate indicated its central position in metabolism in N. crassa. Our approach was benchmarked on simulated random network dynamics and provides a novel exploratory approach to analyzing high-dimensional metabolic dynamics.
Affiliation(s)
- Yue Wu
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Michael T. Judge
- Department of Genetics, University of Georgia, Athens, GA, United States of America
- Arthur S. Edison
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Department of Genetics, University of Georgia, Athens, GA, United States of America
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, United States of America
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States of America
- Jonathan Arnold
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Department of Genetics, University of Georgia, Athens, GA, United States of America
- Department of Statistics, University of Georgia, Athens, GA, United States of America
- Department of Physics and Astronomy, University of Georgia, Athens, GA, United States of America
23
Skok Gibbs C, Jackson CA, Saldi GA, Tjärnberg A, Shah A, Watters A, De Veaux N, Tchourine K, Yi R, Hamamsy T, Castro DM, Carriero N, Gorissen BL, Gresham D, Miraldi ER, Bonneau R. High-performance single-cell gene regulatory network inference at scale: the Inferelator 3.0. Bioinformatics 2022; 38:2519-2528. [PMID: 35188184 PMCID: PMC9048651 DOI: 10.1093/bioinformatics/btac117] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/25/2021] [Revised: 12/08/2021] [Accepted: 02/17/2022] [Indexed: 12/04/2022]
Abstract
MOTIVATION Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above. RESULTS In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data. AVAILABILITY AND IMPLEMENTATION The inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as python packages with associated documentation (https://inferelator.readthedocs.io/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claudia Skok Gibbs
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Center for Data Science, New York University, New York, NY 10003, USA
- Christopher A Jackson
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Giuseppe-Antonio Saldi
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Andreas Tjärnberg
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Aashna Shah
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Aaron Watters
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Nicholas De Veaux
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Ren Yi
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
- Tymor Hamamsy
- Center for Data Science, New York University, New York, NY 10003, USA
- Dayanne M Castro
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Nicholas Carriero
- Flatiron Institute, Scientific Computing Core, Simons Foundation, New York, NY 10010, USA
- Bram L Gorissen
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- David Gresham
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Emily R Miraldi
- Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- Richard Bonneau
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY 10010, USA
- Center for Data Science, New York University, New York, NY 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Biology, New York University, New York, NY 10003, USA
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
24
Barberis M, Mondeel TD. Unveiling Forkhead-mediated regulation of yeast cell cycle and metabolic networks. Comput Struct Biotechnol J 2022; 20:1743-1751. [PMID: 35495119 PMCID: PMC9024378 DOI: 10.1016/j.csbj.2022.03.033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/08/2021] [Revised: 03/10/2022] [Accepted: 03/29/2022] [Indexed: 11/25/2022]
Abstract
Findings from genome-wide ChIP studies on budding yeast Forkheads are interpreted. The power, challenges, and limitations of ChIP studies are illustrated through target gene analysis. Forkheads regulate metabolic targets through which cell division may be coordinated.
Transcription factors are regulators of the cell’s genomic landscape. By switching single genes or entire molecular pathways on or off, transcription factors modulate the precise timing of their activation. The Forkhead (Fkh) transcription factors are evolutionarily conserved to regulate organismal physiology and cell division. In addition to molecular biology and biochemical efforts, genome-wide studies have been conducted to characterize the genomic landscape potentially regulated by Forkheads in eukaryotes. Here, we discuss and interpret findings reported in six genome-wide Chromatin ImmunoPrecipitation (ChIP) studies, with a particular focus on ChIP-chip and ChIP-exo. We highlight their power and challenges to address Forkhead-mediated regulation of the cellular landscape in budding yeast. Expression changes of the targets identified in the binding assays are investigated by taking expression data for Forkhead deletion and overexpression into account. Forkheads are revealed as regulators of the metabolic network through which cell cycle dynamics may be temporally coordinated further, in addition to their well-known role as regulators of the gene cluster responsible for cell division.
25
Sahoo A, Pechmann S. Functional network motifs defined through integration of protein-protein and genetic interactions. PeerJ 2022; 10:e13016. [PMID: 35223214 PMCID: PMC8877332 DOI: 10.7717/peerj.13016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/11/2021] [Accepted: 02/06/2022] [Indexed: 01/11/2023]
Abstract
Cells are enticingly complex systems. The identification of feedback regulation is critically important for understanding this complexity. Network motifs, defined as small graphlets that occur more frequently than expected by chance, have revolutionized our understanding of feedback circuits in cellular networks. However, because their definition is based solely on statistical over-representation, network motifs often lack biological context, which limits their usefulness. Here, we define functional network motifs (FNMs) through the systematic integration of genetic interaction data that directly inform on functional relationships between genes and encoded proteins. FNMs occur two orders of magnitude less frequently than conventional network motifs, and we found them significantly enriched in genes known to be functionally related. Moreover, our comprehensive analyses of FNMs in yeast showed that they are powerful at capturing both known and putative novel regulatory interactions, thus suggesting a promising strategy towards the systematic identification of feedback regulation in biological networks. Many FNMs appeared as excellent candidates for the prioritization of follow-up biochemical characterization, which is a recurring bottleneck in the targeting of complex diseases. More generally, our work highlights a fruitful avenue for integrating and harnessing genomic network data.
Affiliation(s)
- Amruta Sahoo
- Département de Biochimie, Université de Montréal, Montréal, QC, Canada
26
Jackson CA, Vogel C. New horizons in the stormy sea of multimodal single-cell data integration. Mol Cell 2022; 82:248-259. [PMID: 35063095 PMCID: PMC8830781 DOI: 10.1016/j.molcel.2021.12.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/11/2021] [Revised: 12/08/2021] [Accepted: 12/13/2021] [Indexed: 01/22/2023]
Abstract
While measurements of RNA expression have dominated the world of single-cell analyses, new single-cell techniques increasingly allow collection of different data modalities, measuring different molecules, structural connections, and intermolecular interactions. Integrating the resulting multimodal single-cell datasets is a new bioinformatics challenge. Equally important, it is a new experimental design challenge for the bench scientist, who is not only choosing from a myriad of techniques for each data modality but also faces new challenges in experimental design. The ultimate goal is to design, execute, and analyze multimodal single-cell experiments that are more than just descriptive but enable the learning of new causal and mechanistic biology. This objective requires strict consideration of the goals behind the analysis, which might range from mapping the heterogeneity of a cellular population to assembling system-wide causal networks that can further our understanding of cellular functions and eventually lead to models of tissues and organs. We review steps and challenges toward this goal. Single-cell transcriptomics is now a mature technology, and methods to measure proteins, lipids, small-molecule metabolites, and other molecular phenotypes at the single-cell level are rapidly developing. Integrating these single-cell readouts so that each cell has measurements of multiple types of data, e.g., transcriptomes, proteomes, and metabolomes, is expected to allow identification of highly specific cellular subpopulations and to provide the basis for inferring causal biological mechanisms.
Affiliation(s)
- Christopher A Jackson
- New York University, Department of Biology, Center for Genomics and Systems Biology, New York, NY, USA
- Christine Vogel
- New York University, Department of Biology, Center for Genomics and Systems Biology, New York, NY, USA
27
Transcription Factor Action Orchestrates the Complex Expression Pattern of CRABS CLAW in Arabidopsis. Genes (Basel) 2021; 12:genes12111663. [PMID: 34828269 PMCID: PMC8653963 DOI: 10.3390/genes12111663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/06/2021] [Revised: 10/14/2021] [Accepted: 10/15/2021] [Indexed: 01/08/2023]
Abstract
Angiosperm flowers are the most complex organs that plants generate, and in their center, the gynoecium forms, assuring sexual reproduction. Gynoecium development requires tight regulation of developmental regulators across time and tissues. How simple on and off regulation of gene expression is achieved in plants was described previously, but the molecular mechanisms generating complex expression patterns remain unclear. We use the gynoecium developmental regulator CRABS CLAW (CRC) to study factors contributing to its sophisticated expression pattern. We combine in silico promoter analyses, global TF-DNA interaction screens, and mutant analyses. We find that miRNA action, DNA methylation, and chromatin remodeling do not contribute substantially to CRC regulation. However, 119 TFs, including SEP3, ETT, CAL, FUL, NGA2, and JAG, bind to the CRC promoter in yeast. These TFs fine-tune transcript abundance as homodimers by transcriptional activation. Interestingly, temporal–spatial aspects of expression regulation may be under the control of redundantly acting genes and require higher-order complex formation at TF binding sites. Our work shows that endogenous regulation of complex expression patterns requires orchestrated transcription factor action on several conserved promoter sites covering almost 4 kb in length. Our results highlight the utility of comprehensive regulator screens directly linking transcriptional regulators with their targets.
28
Habif M, Corbat AA, Silberberg M, Grecco HE. CASPAM: A Triple-Modality Biosensor for Multiplexed Imaging of Caspase Network Activity. ACS Sens 2021; 6:2642-2653. [PMID: 34191492 DOI: 10.1021/acssensors.1c00554] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Abstract
Understanding signal propagation across biological networks requires simultaneously monitoring the dynamics of several nodes to uncover correlations masked by inherent intercellular variability. Monitoring the enzymatic activity of more than two components over short time scales has proven challenging. Exploiting the narrow spectral width of homo-FRET-based biosensors, up to three activities can be imaged through fluorescence polarization anisotropy microscopy. We introduce Caspase Activity Sensor by Polarization Anisotropy Multiplexing (CASPAM), a single-plasmid triple-modality reporter of key nodes of the apoptotic network. Apoptosis provides an ideal molecular framework to study interactions between its three composing pathways (intrinsic, extrinsic, and effector). We characterized the biosensor performance and demonstrated the advantages that equimolar expression has in both simplifying experimental procedures and reducing observable variation, thus enabling robust data-driven modeling. Tools like CASPAM become essential to analyze molecular pathways where multiple nodes need to be simultaneously monitored.
Affiliation(s)
- Martín Habif
- Department of Physics, FCEN, University of Buenos Aires and IFIBA, CONICET, Buenos Aires C1428EHA, Argentina
- Agustín A. Corbat
- Department of Physics, FCEN, University of Buenos Aires and IFIBA, CONICET, Buenos Aires C1428EHA, Argentina
- Mauro Silberberg
- Department of Physics, FCEN, University of Buenos Aires and IFIBA, CONICET, Buenos Aires C1428EHA, Argentina
- Hernán E. Grecco
- Department of Physics, FCEN, University of Buenos Aires and IFIBA, CONICET, Buenos Aires C1428EHA, Argentina
- Department of Systemic Cell Biology, Max Planck Institute of Molecular Physiology, Dortmund 44227, Germany
29
Multiscale models quantifying yeast physiology: towards a whole-cell model. Trends Biotechnol 2021; 40:291-305. [PMID: 34303549 DOI: 10.1016/j.tibtech.2021.06.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/02/2021] [Revised: 06/26/2021] [Accepted: 06/28/2021] [Indexed: 12/21/2022]
Abstract
The yeast Saccharomyces cerevisiae is widely used as a cell factory and as an important eukaryal model organism for studying cellular physiology related to human health and disease. Yeast was also the first eukaryal organism for which a genome-scale metabolic model (GEM) was developed. In recent years there has been interest in expanding the modeling framework for yeast by incorporating enzymatic parameters and other heterogeneous cellular networks to obtain a more comprehensive description of cellular physiology. We review the latest developments in multiscale models of yeast, and illustrate how a new generation of multiscale models could significantly enhance the predictive performance and expand the applications of classical GEMs in cell factory design and basic studies of yeast physiology.
30
Lee JY, Nguyen B, Orosco C, Styczynski MP. SCOUR: a stepwise machine learning framework for predicting metabolite-dependent regulatory interactions. BMC Bioinformatics 2021; 22:365. [PMID: 34238207 PMCID: PMC8268592 DOI: 10.1186/s12859-021-04281-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/08/2021] [Accepted: 06/30/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND The topology of metabolic networks is both well-studied and remarkably well-conserved across many species. The regulation of these networks, however, is much more poorly characterized, though it is known to be divergent across organisms: two characteristics that make it difficult to model metabolic networks accurately. While many computational methods have been built to unravel transcriptional regulation, few approaches have been developed for systems-scale analysis and study of metabolic regulation. Here, we present a stepwise machine learning framework that applies established algorithms to identify regulatory interactions in metabolic systems based on metabolic data: stepwise classification of unknown regulation, or SCOUR. RESULTS We evaluated our framework on both noiseless and noisy data, using several models of varying sizes and topologies to show that our approach is generalizable. We found that, when testing on data under the most realistic conditions (low sampling frequency and high noise), SCOUR could identify reaction fluxes controlled only by the concentration of a single metabolite (its primary substrate) with high accuracy. The positive predictive value (PPV) for identifying reactions controlled by the concentration of two metabolites ranged from 32 to 88% for noiseless data, 9.2 to 49% for either low sampling frequency/low noise or high sampling frequency/high noise data, and 6.6 to 27% for low sampling frequency/high noise data, with results typically sufficiently high for lab validation to be a practical endeavor. While the PPVs for reactions controlled by three metabolites were lower, they were still in most cases significantly better than random classification. CONCLUSIONS SCOUR uses a novel approach to synthetically generate the training data needed to identify regulators of reaction fluxes in a given metabolic system, enabling metabolomics and fluxomics data to be leveraged for regulatory structure inference.
By identifying and triaging the most likely candidate regulatory interactions, SCOUR can drastically reduce the amount of time needed to identify and experimentally validate metabolic regulatory interactions. As high-throughput experimental methods for testing these interactions are further developed, SCOUR will provide critical impact in the development of predictive metabolic models in new organisms and pathways.
Affiliation(s)
- Justin Y Lee
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Britney Nguyen
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Carlos Orosco
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Mark P Styczynski
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
31
Ma CZ, Brent MR. Inferring TF activities and activity regulators from gene expression data with constraints from TF perturbation data. Bioinformatics 2021; 37:1234-1245. [PMID: 33135076 PMCID: PMC8189679 DOI: 10.1093/bioinformatics/btaa947] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/03/2020] [Revised: 09/26/2020] [Accepted: 10/27/2020] [Indexed: 12/20/2022]
Abstract
Motivation The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods of inferring TF activity from gene expression data have been described, but due to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now. Results We systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. We find that expression data in which the activities of individual TFs have been perturbed are both necessary and sufficient for obtaining good performance. To a considerable extent, control strengths inferred using expression data from one growth condition carry over to other conditions, so the control strength matrices derived here can be used by others. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of yeast TFs Gcr2, Gln3, Gcn4 and Msn2. Availability and implementation Evaluation code and data are available at https://doi.org/10.5281/zenodo.4050573. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cynthia Z Ma
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Michael R Brent
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
32
Arita Y, Kim G, Li Z, Friesen H, Turco G, Wang RY, Climie D, Usaj M, Hotz M, Stoops EH, Baryshnikova A, Boone C, Botstein D, Andrews BJ, McIsaac RS. A genome-scale yeast library with inducible expression of individual genes. Mol Syst Biol 2021; 17:e10207. [PMID: 34096681 PMCID: PMC8182650 DOI: 10.15252/msb.202110207] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 01/06/2021] [Revised: 04/27/2021] [Accepted: 04/30/2021] [Indexed: 11/09/2022]
Abstract
The ability to switch a gene from off to on and monitor dynamic changes provides a powerful approach for probing gene function and elucidating causal regulatory relationships. Here, we developed and characterized YETI (Yeast Estradiol strains with Titratable Induction), a collection in which > 5,600 yeast genes are engineered for transcriptional inducibility with single-gene precision at their native loci and without plasmids. Each strain contains SGA screening markers and a unique barcode, enabling high-throughput genetics. We characterized YETI using growth phenotyping and BAR-seq screens, and we used a YETI allele to identify the regulon of Rof1, showing that it acts to repress transcription. We observed that strains with inducible essential genes that have low native expression can often grow without inducer. Analysis of data from eukaryotic and prokaryotic systems shows that native expression is a variable that can bias promoter-perturbing screens, including CRISPRi. We engineered a second expression system, Z3EB42, that gives lower expression than Z3EV, a feature enabling conditional activation and repression of lowly expressed essential genes that grow without inducer in the YETI library.
Affiliation(s)
- Yuko Arita
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- RIKEN Centre for Sustainable Resource Science, Wako, Saitama, Japan
- Griffin Kim
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Zhijian Li
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Helena Friesen
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Gina Turco
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Dale Climie
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Matej Usaj
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Manuel Hotz
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Charles Boone
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- RIKEN Centre for Sustainable Resource Science, Wako, Saitama, Japan
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Brenda J Andrews
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | | |
33
Sanborn AL, Yeh BT, Feigerle JT, Hao CV, Townshend RJ, Lieberman Aiden E, Dror RO, Kornberg RD. Simple biochemical features underlie transcriptional activation domain diversity and dynamic, fuzzy binding to Mediator. eLife 2021; 10:e68068. [PMID: 33904398 PMCID: PMC8137143 DOI: 10.7554/elife.68068] [Received: 03/03/2021] [Accepted: 04/25/2021] [Indexed: 01/07/2023]
Abstract
Gene activator proteins comprise distinct DNA-binding and transcriptional activation domains (ADs). Because few ADs have been described, we tested domains tiling all yeast transcription factors for activation in vivo and identified 150 ADs. By mRNA display, we showed that 73% of ADs bound the Med15 subunit of Mediator, and that binding strength was correlated with activation. AD-Mediator interaction in vitro was unaffected by a large excess of free activator protein, pointing to a dynamic mechanism of interaction. Structural modeling showed that ADs interact with Med15 without shape complementarity ('fuzzy' binding). ADs shared no sequence motifs, but mutagenesis revealed biochemical and structural constraints. Finally, a neural network trained on AD sequences accurately predicted ADs in human proteins and in other yeast proteins, including chromosomal proteins and chromatin remodeling complexes. These findings solve the longstanding enigma of AD structure and function and provide a rationale for their role in biology.
Cells adapt and respond to changes by regulating the activity of their genes. To turn genes on or off, they use a family of proteins called transcription factors. Transcription factors influence specific but overlapping groups of genes, so that each gene is controlled by several transcription factors that act together like a dimmer switch to regulate gene activity. The presence of transcription factors attracts proteins such as the Mediator complex, which activates genes by gathering the protein machines that read the genes. The more transcription factors are found near a specific gene, the more strongly they attract Mediator and the more active the gene is. A specific region on the transcription factor called the activation domain is necessary for this process. The biochemical sequences of these domains vary greatly between species, yet activation domains from, for example, yeast and human proteins are often interchangeable.
To understand why this is the case, Sanborn et al. analyzed the genome of baker's yeast and identified 150 activation domains, each very different in sequence. Three-quarters of them bound to a subunit of the Mediator complex called Med15. Sanborn et al. then developed a machine learning algorithm to predict activation domains in both yeast and humans. This algorithm also showed that negatively charged and greasy (hydrophobic) regions of the activation domains were essential for activation through the Mediator complex. Further analyses revealed that activation domains used different poses to bind multiple sites on Med15, a behavior known as 'fuzzy' binding. This creates a high overall affinity even though the binding strength at each individual site is low, enabling the protein complexes to remain dynamic. These weak interactions together permit fine control over the activity of several genes, allowing cells to respond quickly and precisely to many changes. The computer algorithm used here provides a new way to identify activation domains across species and could improve our understanding of how living things grow, adapt and evolve. It could also give new insights into mechanisms of disease, particularly cancer, where transcription factors are often faulty.
Affiliation(s)
- Adrian L Sanborn
- Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
- Department of Computer Science, Stanford University, Stanford, United States
- Benjamin T Yeh
- Department of Computer Science, Stanford University, Stanford, United States
- Jordan T Feigerle
- Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
- Cynthia V Hao
- Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
- Erez Lieberman Aiden
- The Center for Genome Architecture, Baylor College of Medicine, Houston, United States
- Center for Theoretical Biological Physics, Rice University, Houston, United States
- Ron O Dror
- Department of Computer Science, Stanford University, Stanford, United States
- Roger D Kornberg
- Department of Structural Biology, Stanford University School of Medicine, Stanford, United States
34
Hackett SR, Baltz EA, Coram M, Wranik BJ, Kim G, Baker A, Fan M, Hendrickson DG, Berndl M, McIsaac RS. Learning causal networks using inducible transcription factors and transcriptome-wide time series. Mol Syst Biol 2020; 16:e9174. [PMID: 32181581 PMCID: PMC7076914 DOI: 10.15252/msb.20199174] [Received: 08/13/2019] [Revised: 02/13/2020] [Accepted: 02/19/2020] [Indexed: 11/27/2022]
Abstract
We present IDEA (the Induction Dynamics gene Expression Atlas), a dataset constructed by independently inducing hundreds of transcription factors (TFs) and measuring timecourses of the resulting gene expression responses in budding yeast. Each experiment captures a regulatory cascade connecting a single induced regulator to the genes it causally regulates. We discuss the regulatory cascade of a single TF, Aft1, in detail; however, IDEA contains > 200 TF induction experiments with 20 million individual observations and 100,000 signal‐containing dynamic responses. As an application of IDEA, we integrate all timecourses into a whole‐cell transcriptional model, which is used to predict and validate multiple new and underappreciated transcriptional regulators. We also find that the magnitudes of coefficients in this model are predictive of genetic interaction profile similarities. In addition to being a resource for exploring regulatory connectivity between TFs and their target genes, our modeling approach shows that combining rapid perturbations of individual genes with genome‐scale time‐series measurements is an effective strategy for elucidating gene regulatory networks.
Affiliation(s)
- Griffin Kim
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Adam Baker
- Calico Life Sciences LLC, South San Francisco, CA, USA
35
Srinivasan R, Walvekar AS, Rashida Z, Seshasayee A, Laxman S. Genome-scale reconstruction of Gcn4/ATF4 networks driving a growth program. PLoS Genet 2020; 16:e1009252. [PMID: 33378328 PMCID: PMC7773203 DOI: 10.1371/journal.pgen.1009252] [Received: 04/20/2020] [Accepted: 11/04/2020] [Indexed: 12/13/2022]
Abstract
Growth and starvation are considered opposite ends of a spectrum. To sustain growth, cells use coordinated gene expression programs and manage biomolecule supply in order to match the demands of metabolism and translation. Global growth programs complement increased ribosomal biogenesis with sufficient carbon metabolism, amino acid and nucleotide biosynthesis. How these resources are collectively managed is a fundamental question. The role of the Gcn4/ATF4 transcription factor has been best studied in contexts where cells encounter amino acid starvation. However, high Gcn4 activity has been observed in contexts of rapid cell proliferation, and the roles of Gcn4 in such growth contexts are unclear. Here, using a methionine-induced growth program in yeast, we show that Gcn4/ATF4 is the fulcrum that maintains metabolic supply in order to sustain translation outputs. By integrating matched transcriptome and ChIP-Seq analysis, we decipher genome-wide direct and indirect roles for Gcn4 in this growth program. Genes that enable metabolic precursor biosynthesis indispensably require Gcn4; by contrast, ribosomal genes are partly repressed by Gcn4. Gcn4 directly binds promoter regions and activates transcription of a subset of metabolic genes, particularly driving lysine and arginine biosynthesis. Gcn4 also globally represses lysine- and arginine-enriched transcripts, which include genes encoding the translation machinery. The Gcn4-dependent lysine and arginine supply thereby maintains the synthesis of the translation machinery, which is required to maintain translation capacity. Gcn4 consequently enables metabolic-precursor supply to bolster protein synthesis and drive a growth program. Thus, we illustrate how growth and starvation outcomes are both controlled using the same Gcn4 transcriptional outputs that function in distinct contexts.
Affiliation(s)
- Rajalakshmi Srinivasan
- Institute for Stem Cell Science and Regenerative Medicine (inStem), GKVK post, Bangalore, India
- Adhish S. Walvekar
- Institute for Stem Cell Science and Regenerative Medicine (inStem), GKVK post, Bangalore, India
- Zeenat Rashida
- Institute for Stem Cell Science and Regenerative Medicine (inStem), GKVK post, Bangalore, India
- Aswin Seshasayee
- National Centre for Biological Sciences–TIFR, GKVK post, Bellary Road, Bangalore, India
- Sunil Laxman
- Institute for Stem Cell Science and Regenerative Medicine (inStem), GKVK post, Bangalore, India
36
Gómez-Schiavon M, Dods G, El-Samad H, Ng AH. Multidimensional Characterization of Parts Enhances Modeling Accuracy in Genetic Circuits. ACS Synth Biol 2020; 9:2917-2926. [PMID: 33166452 DOI: 10.1021/acssynbio.0c00288] [Indexed: 01/03/2023]
Abstract
Mathematical models can aid the design of genetic circuits, but may yield inaccurate results if individual parts are not modeled at the appropriate resolution. To illustrate the importance of this concept, we study transcriptional cascades consisting of two inducible synthetic transcription factors connected in series. Despite the simplicity of this design, we find that accurate prediction of circuit behavior requires mapping the dose responses of each circuit component along the dimensions of both its expression level and its inducer concentration. Using this multidimensional characterization, we were able to computationally explore the behavior of 16 different circuit designs. We experimentally verified a subset of these predictions and found substantial agreement. This method of biological part characterization enables the use of models to identify (un)desired circuit behaviors prior to experimental implementation, thus shortening the design-build-test cycle for more complex circuits.
Affiliation(s)
- Mariana Gómez-Schiavon
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, California 94158, United States
- Galen Dods
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, California 94158, United States
- Hana El-Samad
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, California 94158, United States
- Chan Zuckerberg Biohub, San Francisco, California 94158, United States
- Cell Design Institute, University of California, San Francisco, San Francisco, California 94158, United States
- Andrew H. Ng
- Cell Design Institute, University of California, San Francisco, San Francisco, California 94158, United States
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, California 94158, United States
37
Brodsky S, Jana T, Mittelman K, Chapal M, Kumar DK, Carmi M, Barkai N. Intrinsically Disordered Regions Direct Transcription Factor In Vivo Binding Specificity. Mol Cell 2020; 79:459-471.e4. [DOI: 10.1016/j.molcel.2020.05.032] [Received: 10/29/2019] [Revised: 03/10/2020] [Accepted: 05/21/2020] [Indexed: 11/25/2022]
38
Jackson C, Gresham D. A Bright IDEA. Mol Syst Biol 2020; 16:e9502. [PMID: 32253808 PMCID: PMC7136649 DOI: 10.15252/msb.20209502] [Indexed: 11/23/2022]
Abstract
Transcription factors (TFs) control the rate of mRNA production. Technological advances have made the task of measuring mRNA levels for all genes straightforward, but identifying causal relationships between TFs and their target genes remains an unsolved problem in biology. In their recent study, McIsaac and colleagues (Hackett et al, 2020) apply a method for inducing the overexpression of a TF and studying the dynamics with which all transcripts respond. Using time series analysis, they are able to resolve direct effects of TFs from secondary effects. This new experimental and analytical approach provides an efficient means of defining gene regulatory relationships for all TFs.
Affiliation(s)
- Christopher Jackson
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
- David Gresham
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
39
Kang Y, Patel NR, Shively C, Recio PS, Chen X, Wranik BJ, Kim G, McIsaac RS, Mitra R, Brent MR. Dual threshold optimization and network inference reveal convergent evidence from TF binding locations and TF perturbation responses. Genome Res 2020; 30:459-471. [PMID: 32060051 PMCID: PMC7111528 DOI: 10.1101/gr.259655.119] [Received: 11/27/2019] [Accepted: 02/11/2020] [Indexed: 12/22/2022]
Abstract
A high-confidence map of the direct, functional targets of each transcription factor (TF) requires convergent evidence from independent sources. Two significant sources of evidence are TF binding locations and the transcriptional responses to direct TF perturbations. Systematic data sets of both types exist for yeast and human, but they rarely converge on a common set of direct, functional targets for a TF. Even the few genes that are both bound and responsive may not be direct functional targets. Our analysis shows that when there are many nonfunctional binding sites and many indirect targets, nonfunctional sites are expected to occur in the cis-regulatory DNA of indirect targets by chance. To address this problem, we introduce dual threshold optimization (DTO), a new method for setting significance thresholds on binding and perturbation-response data, and show that it improves convergence. It also enables comparison of binding data to perturbation-response data that have been processed by network inference algorithms, which further improves convergence. The combination of dual threshold optimization and network inference greatly expands the high-confidence TF network map in both yeast and human. Next, we analyze a comprehensive new data set measuring the transcriptional response shortly after inducing overexpression of a yeast TF. We also present a new yeast binding location data set obtained by transposon calling cards and compare it to recent ChIP-exo data. These new data sets improve convergence and expand the high-confidence network synergistically.
Affiliation(s)
- Yiming Kang
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
- Nikhil R Patel
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
- Christian Shively
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Pamela Samantha Recio
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Xuhua Chen
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Bernd J Wranik
- Calico Life Sciences LLC, South San Francisco, California 94080, USA
- Griffin Kim
- Calico Life Sciences LLC, South San Francisco, California 94080, USA
- R Scott McIsaac
- Calico Life Sciences LLC, South San Francisco, California 94080, USA
- Robi Mitra
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Michael R Brent
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA