1
Pan RW, Röschinger T, Faizi K, Garcia H, Phillips R. Deciphering regulatory architectures from synthetic single-cell expression patterns. bioRxiv: The Preprint Server for Biology 2024:2024.01.28.577658. [PMID: 38352569; PMCID: PMC10862715; DOI: 10.1101/2024.01.28.577658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
For the vast majority of genes in sequenced genomes, there is limited understanding of how they are regulated. Without such knowledge, it is not possible to perform a quantitative theory-experiment dialogue on how such genes give rise to physiological and evolutionary adaptation. One category of high-throughput experiments used to understand the sequence-phenotype relationship of the transcriptome is massively parallel reporter assays (MPRAs). However, to improve the versatility and scalability of MPRA pipelines, we need a "theory of the experiment" to help us better understand the impact of various biological and experimental parameters on the interpretation of experimental data. These parameters include binding site copy number, where a large number of specific binding sites may titrate away transcription factors, as well as the presence of overlapping binding sites, which may affect analysis of the degree of mutual dependence between mutations in the regulatory region and expression levels. To that end, in this paper we create tens of thousands of synthetic single-cell gene expression outputs using both equilibrium and out-of-equilibrium models. These models make it possible to imitate the summary statistics (information footprints and expression shift matrices) used to characterize the output of MPRAs and from this summary statistic to infer the underlying regulatory architecture. Specifically, we use a more refined implementation of the so-called thermodynamic models in which the binding energies of each sequence variant are derived from energy matrices. Our simulations reveal important effects of the parameters on MPRA data and we demonstrate our ability to optimize MPRA experimental designs with the goal of generating thermodynamic models of the transcriptome with base-pair specificity. 
Further, this approach makes it possible to carefully examine the mapping between mutations in binding sites and their corresponding expression profiles, a tool useful not only for better designing MPRAs, but also for exploring regulatory evolution.
Affiliation(s)
- Rosalind Wenshan Pan
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
- Tom Röschinger
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
- Kian Faizi
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
- Hernan Garcia
  - Biophysics Graduate Group, University of California, Berkeley, CA
  - Department of Physics, University of California, Berkeley, CA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA
  - Institute for Quantitative Biosciences-QB3, University of California, Berkeley, CA
- Rob Phillips
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA
  - Department of Physics, California Institute of Technology, Pasadena, CA
2
Lally P, Gómez-Romero L, Tierrafría VH, Aquino P, Rioualen C, Zhang X, Kim S, Baniulyte G, Plitnick J, Smith C, Babu M, Collado-Vides J, Wade JT, Galagan JE. Predictive Biophysical Neural Network Modeling of a Compendium of in vivo Transcription Factor DNA Binding Profiles for Escherichia coli. bioRxiv: The Preprint Server for Biology 2024:2024.05.23.594371. [PMID: 38826350; PMCID: PMC11142182; DOI: 10.1101/2024.05.23.594371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We used these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We used BoltzNet to quantitatively design novel binding sites, which we validated with biophysical experiments on purified protein. We have generated models for 125 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
Affiliation(s)
- Patrick Lally
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Laura Gómez-Romero
  - Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Ciudad de México 14610, México
  - Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, Ciudad de México, México
- Víctor H. Tierrafría
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Patricia Aquino
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Claire Rioualen
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Xiaoman Zhang
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Sunyoung Kim
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Jonathan Plitnick
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Carol Smith
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Mohan Babu
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Julio Collado-Vides
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Joseph T. Wade
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
  - Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY, USA
- James E. Galagan
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Bioinformatics Program, Boston University, 24 Cummington Mall, Boston, MA 02215
3
Miyakoshi M. Multilayered regulation of amino acid metabolism in Escherichia coli. Curr Opin Microbiol 2024; 77:102406. [PMID: 38061078; DOI: 10.1016/j.mib.2023.102406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 11/08/2023] [Accepted: 11/09/2023] [Indexed: 02/12/2024]
Abstract
Amino acid metabolism in Escherichia coli has long been studied and has established the basis for regulatory mechanisms at the transcriptional, posttranscriptional, and posttranslational levels. In addition to the classical signal transduction cascade involving posttranslational modifications (PTMs), novel PTMs in the two primary nitrogen assimilation pathways have recently been uncovered. The regulon of the master transcriptional regulator NtrC is further expanded by a small RNA derived from the 3′UTR of glutamine synthetase mRNA, which coordinates central carbon and nitrogen metabolism. Furthermore, recent advances in sequencing technologies have revealed the global regulatory networks of transcriptional and posttranscriptional regulators, Lrp and GcvB. This review provides an update of the multilayered and interconnected regulatory networks governing amino acid metabolism in E. coli.
Affiliation(s)
- Masatoshi Miyakoshi
  - Department of Infection Biology, Institute of Medicine, University of Tsukuba, 305-8575 Ibaraki, Japan