1
Padmakumar JP, Sun JJ, Cho W, Zhou Y, Krenz C, Han WZ, Densmore D, Sontag ED, Voigt CA. Partitioning of a 2-bit hash function across 66 communicating cells. Nat Chem Biol 2024:10.1038/s41589-024-01730-1. [PMID: 39317847 DOI: 10.1038/s41589-024-01730-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 08/14/2024] [Indexed: 09/26/2024]
Abstract
Powerful distributed computing can be achieved by communicating cells that individually perform simple operations. Here, we report design software to divide a large genetic circuit across cells as well as the genetic parts to implement the subcircuits in their genomes. These tools were demonstrated using a 2-bit version of the MD5 hashing algorithm, which is an early predecessor to the cryptographic functions underlying cryptocurrency. One iteration requires 110 logic gates, which were partitioned across 66 Escherichia coli strains, requiring the introduction of a total of 1.1 Mb of recombinant DNA into their genomes. The strains were individually experimentally verified to integrate their assigned input signals, process this information correctly and propagate the result to the cell in the next layer. This work demonstrates the potential to obtain programmable control of multicellular biological processes.
Affiliation(s)
- Jai P Padmakumar
  - MIT Microbiology Program, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jessica J Sun
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- William Cho
  - Department of Bioengineering, Northeastern University, Boston, MA, USA
- Yangruirui Zhou
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA
- Christopher Krenz
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA
- Woo Zhong Han
  - Department of Computer Science, Boston University, Boston, MA, USA
- Douglas Densmore
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
- Eduardo D Sontag
  - Department of Bioengineering, Northeastern University, Boston, MA, USA
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
- Christopher A Voigt
  - MIT Microbiology Program, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
2
Ghose DA, Przydzial KE, Mahoney EM, Keating AE, Laub MT. Marginal specificity in protein interactions constrains evolution of a paralogous family. Proc Natl Acad Sci U S A 2023; 120:e2221163120. [PMID: 37098061 PMCID: PMC10160972 DOI: 10.1073/pnas.2221163120] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/15/2022] [Accepted: 03/24/2023] [Indexed: 04/26/2023]
Abstract
The evolution of novel functions in biology relies heavily on gene duplication and divergence, creating large paralogous protein families. Selective pressure to avoid detrimental cross-talk often results in paralogs that exhibit exquisite specificity for their interaction partners. But how robust or sensitive is this specificity to mutation? Here, using deep mutational scanning, we demonstrate that a paralogous family of bacterial signaling proteins exhibits marginal specificity, such that many individual substitutions give rise to substantial cross-talk between normally insulated pathways. Our results indicate that sequence space is locally crowded despite overall sparseness, and we provide evidence that this crowding has constrained the evolution of bacterial signaling proteins. These findings underscore how evolution selects for "good enough" rather than optimized phenotypes, leading to restrictions on the subsequent evolution of paralogs.
Affiliation(s)
- Dia A. Ghose
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Kaitlyn E. Przydzial
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Emily M. Mahoney
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Amy E. Keating
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Koch Center for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Michael T. Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
  - HHMI, Massachusetts Institute of Technology, Cambridge, MA 02139
3
Su CJ, Murugan A, Linton JM, Yeluri A, Bois J, Klumpe H, Langley MA, Antebi YE, Elowitz MB. Ligand-receptor promiscuity enables cellular addressing. Cell Syst 2022; 13:408-425.e12. [PMID: 35421362 PMCID: PMC10897978 DOI: 10.1016/j.cels.2022.03.001] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 12/29/2020] [Revised: 11/08/2021] [Accepted: 03/16/2022] [Indexed: 12/24/2022]
Abstract
In multicellular organisms, secreted ligands selectively activate, or "address," specific target cell populations to control cell fate decision-making and other processes. Key cell-cell communication pathways use multiple promiscuously interacting ligands and receptors, provoking the question of how addressing specificity can emerge from molecular promiscuity. To investigate this issue, we developed a general mathematical modeling framework based on the bone morphogenetic protein (BMP) pathway architecture. We find that promiscuously interacting ligand-receptor systems allow a small number of ligands, acting in combinations, to address a larger number of individual cell types, defined by their receptor expression profiles. Promiscuous systems outperform seemingly more specific one-to-one signaling architectures in addressing capability. Combinatorial addressing extends to groups of cell types, is robust to receptor expression noise, grows more powerful with increases in the number of receptor variants, and is maximized by specific biochemical parameter relationships. Together, these results identify design principles governing cellular addressing by ligand combinations.
Affiliation(s)
- Christina J Su
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Arvind Murugan
  - Department of Physics, University of Chicago, Chicago, IL 60637, USA
- James M Linton
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Akshay Yeluri
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Justin Bois
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Heidi Klumpe
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Matthew A Langley
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Yaron E Antebi
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Michael B Elowitz
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
  - Department of Applied Physics, California Institute of Technology, Pasadena, CA 91125, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
4
Transcription factor specificity limits the number of DNA-binding motifs. PLoS One 2022; 17:e0263307. [PMID: 35089985 PMCID: PMC8797260 DOI: 10.1371/journal.pone.0263307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2021] [Accepted: 01/15/2022] [Indexed: 11/19/2022]
Abstract
We study the limits imposed by transcription factor specificity on the maximum number of binding motifs that can coexist in a gene regulatory network, using the SwissRegulon Fantom5 collection of 684 human transcription factor binding sites as a model. We describe transcription factor specificity using regular expressions and find that most human transcription factor binding site motifs are separated in sequence space by one to three motif-discriminating positions. We apply theorems based on the pigeonhole principle to calculate the maximum number of transcription factors that can coexist given this degree of specificity, which is on the order of ten thousand and would fully utilize the space of DNA subsequences. Taking into account an expanded DNA alphabet with modified bases can further raise this limit by several orders of magnitude, at a lower level of sequence space usage. Our results may guide the design of transcription factors at both the molecular and system scale.
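As a rough illustration of what "motif-discriminating positions" could mean when motifs are written as regular expressions, the sketch below counts aligned positions at which two motifs share no allowed base. This definition, the helper names `parse_motif` and `discriminating_positions`, and the example motifs are assumptions made for illustration; they are not taken from the paper or from the SwissRegulon data.

```python
import re

def parse_motif(pattern):
    """Split a simple regex such as 'A[CG]T' into one set of allowed bases per position."""
    # Each token is either a character class like [CG] or a single base.
    tokens = re.findall(r"\[([ACGT]+)\]|([ACGT])", pattern)
    return [set(group or single) for group, single in tokens]

def discriminating_positions(motif_a, motif_b):
    """Count aligned positions whose allowed-base sets do not overlap (assumed definition)."""
    a, b = parse_motif(motif_a), parse_motif(motif_b)
    if len(a) != len(b):
        raise ValueError("this sketch only compares motifs of equal length")
    return sum(1 for x, y in zip(a, b) if not (x & y))

# Two made-up motifs of length 5; positions 2 and 4 have disjoint base sets.
print(discriminating_positions("A[CG]TTA", "A[AT]TGA"))  # 2
```

Under this assumed definition, two motifs with zero discriminating positions can match a common sequence, which is the kind of overlap a pigeonhole-style counting argument has to rule out.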
5
Abstract
Duplication and divergence is a major mechanism by which new proteins and functions emerge in biology. Consequently, most organisms, in all domains of life, have genomes that encode large paralogous families of proteins. For recently duplicated pathways to acquire different, independent functions, the two paralogs must acquire mutations that effectively insulate them from one another. For instance, paralogous signaling proteins must acquire mutations that endow them with different interaction specificities such that they can participate in different signaling pathways without disruptive cross-talk. Although duplicated genes undoubtedly shape each other's evolution as they diverge and attain new functions, it is less clear how other paralogs impact or constrain gene duplication. Does the establishment of a new pathway by duplication and divergence require the system-wide optimization of all paralogs? The answer has profound implications for molecular evolution and our ability to engineer biological systems. Here, we discuss models, experiments, and approaches for tackling this question, and for understanding how new proteins and pathways are born.
Affiliation(s)
- Conor J McClune
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemical Engineering and ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
6
Al-Radhawi MA, Tran AP, Ernst EA, Chen T, Voigt CA, Sontag ED. Distributed Implementation of Boolean Functions by Transcriptional Synthetic Circuits. ACS Synth Biol 2020; 9:2172-2187. [PMID: 32589837 DOI: 10.1021/acssynbio.0c00228] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/08/2023]
Abstract
Starting in the early 2000s, sophisticated technologies have been developed for the rational construction of synthetic genetic networks that implement specified logical functionalities. Despite impressive progress, however, the scaling necessary in order to achieve greater computational power has been hampered by many constraints, including repressor toxicity and the lack of large sets of mutually orthogonal repressors. As a consequence, a typical circuit contains no more than roughly seven repressor-based gates per cell. A possible way around this scalability problem is to distribute the computation among multiple cell types, each of which implements a small subcircuit, which communicate among themselves using diffusible small molecules (DSMs). Examples of DSMs are those employed by quorum sensing systems in bacteria. This paper focuses on systematic ways to implement this distributed approach, in the context of the evaluation of arbitrary Boolean functions. The unique characteristics of genetic circuits and the properties of DSMs require the development of new Boolean synthesis methods, distinct from those classically used in electronic circuit design. In this work, we propose a fast algorithm to synthesize distributed realizations for any Boolean function, under constraints on the number of gates per cell and the number of orthogonal DSMs. The method is based on an exact synthesis algorithm to find the minimal circuit per cell, which in turn allows us to build an extensive database of Boolean functions up to a given number of inputs. For concreteness, we will specifically focus on circuits of up to 4 inputs, which might represent, for example, two chemical inducers and two light inputs at different frequencies. Our method shows that, with a constraint of no more than seven gates per cell, the use of a single DSM increases the total number of realizable circuits by at least 7.58-fold compared to centralized computation. 
Moreover, when allowing two DSMs, one can realize 99.995% of all possible 4-input Boolean functions, still with at most seven gates per cell. The methodology introduced here can be readily adapted to complement recent genetic circuit design automation software. A toolbox that uses the proposed algorithm was created and made available at https://github.com/sontaglab/DBC/.
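The idea of splitting a function across cells under a per-cell gate limit can be sketched with truth tables. The example below is only an illustration and is not the synthesis algorithm from the paper or the DBC toolbox: it assumes NOR as the only gate type, uses a textbook 5-NOR construction of XOR that is not claimed to be minimal, and the names `input_column`, `nor`, and `xor_from_nor` are hypothetical. It computes 3-input parity with two cells linked by one diffusible signal, so that each cell stays within seven gates even though the whole circuit does not.

```python
N_INPUTS = 4
N_ROWS = 2 ** N_INPUTS        # 16 input combinations
MASK = (1 << N_ROWS) - 1      # a 4-input function is a 16-bit truth table

def input_column(j):
    """Truth table (as an integer) of the function that returns input x_j."""
    return sum(1 << row for row in range(N_ROWS) if (row >> j) & 1)

def nor(a, b):
    """NOR of two truth tables, evaluated on all 16 rows at once."""
    return ~(a | b) & MASK

def xor_from_nor(a, b):
    """XOR built from 5 NOR gates; returns (truth table, gate count)."""
    n1 = nor(a, b)
    xnor = nor(nor(a, n1), nor(b, n1))
    return nor(xnor, xnor), 5

x0, x1, x2 = (input_column(j) for j in range(3))

# Cell A computes x0 XOR x1 and exports the result as a diffusible signal.
dsm, gates_cell_a = xor_from_nor(x0, x1)
# Cell B senses that signal together with x2.
out, gates_cell_b = xor_from_nor(dsm, x2)

print(2 ** N_ROWS)                    # 65536 distinct 4-input Boolean functions
print(out == (x0 ^ x1 ^ x2))          # True
print(gates_cell_a, gates_cell_b)     # 5 5
```

With this particular construction the single-cell version would need ten NOR gates, above the seven-gate limit quoted in the abstract, whereas the two-cell version needs five per cell.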
Affiliation(s)
- M. Ali Al-Radhawi
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts 02115, United States
- Anh Phong Tran
  - Department of Chemical Engineering, Northeastern University, Boston, Massachusetts 02115, United States
- Elizabeth A. Ernst
  - Department of Mathematics, Statistics, and Computer Science, Macalester College, Saint Paul, Minnesota 55105, United States
- Tianchi Chen
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
- Christopher A. Voigt
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Eduardo D. Sontag
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts 02115, United States
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
  - Laboratory of Systems Pharmacology, Program in Therapeutic Science, Harvard Medical School, Boston, Massachusetts 02115, United States
7
The relation between crosstalk and gene regulation form revisited. PLoS Comput Biol 2020; 16:e1007642. [PMID: 32097416 PMCID: PMC7059967 DOI: 10.1371/journal.pcbi.1007642] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/20/2019] [Revised: 03/06/2020] [Accepted: 01/08/2020] [Indexed: 01/11/2023]
Abstract
Genes differ in the frequency at which they are expressed and in the form of regulation used to control their activity. In particular, positive or negative regulation can lead to activation of a gene in response to an external signal. Previous works proposed that the form of regulation of a gene correlates with its frequency of usage: positive regulation when the gene is frequently expressed and negative regulation when infrequently expressed. Such network design means that, in the absence of their regulators, the genes are found in their least required activity state, hence regulatory intervention is often necessary. Due to the multitude of genes and regulators, spurious binding and unbinding events, called “crosstalk”, could occur. To determine how the form of regulation affects the global crosstalk in the network, we used a mathematical model that includes multiple regulators and multiple target genes. We found that crosstalk depends non-monotonically on the availability of regulators. Our analysis showed that excess use of regulation entailed by the formerly suggested network design caused high crosstalk levels in a large part of the parameter space. We therefore considered the opposite ‘idle’ design, where the default unregulated state of genes is their frequently required activity state. We found that the ‘idle’ design minimized the use of regulation and thus minimized crosstalk. In addition, we estimated the global crosstalk of S. cerevisiae using transcription factor binding data. We demonstrated that even partial network data could suffice to estimate its global crosstalk, suggesting its applicability to additional organisms. We found that the estimated crosstalk of S. cerevisiae is lower than that of a random network, suggesting that natural selection reduces crosstalk. In summary, our study highlights a new type of protein production cost which is typically overlooked: that of regulatory interference caused by the presence of excess regulators in the cell. It demonstrates the importance of whole-network descriptions, which could show effects missed by single-gene models.

Genes differ in the frequency at which they are expressed and in the form of regulation used to control their activity. The basic level of regulation is mediated by different types of DNA-binding proteins, where each type regulates particular gene(s). We distinguish between two basic forms of regulation: positive, if a gene is activated by the binding of its regulatory protein, and negative, if it is active unless bound by its regulatory protein. Due to the multitude of genes and regulators, spurious binding and unbinding events, called “crosstalk”, could occur. How does the form of regulation, positive or negative, affect the extent of regulatory crosstalk? To address this question, we used a mathematical model integrating many genes and many regulators. As intuition suggests, we found that in most of the parameter space, crosstalk increased with the availability of regulators. We propose that crosstalk is usually reduced when networks are designed such that minimal regulation is needed, which we call the ‘idle’ design. In other words, a frequently needed gene will use negative regulation and, conversely, a scarcely needed gene will employ positive regulation. In both cases, the requirement for the regulators is minimized. In addition, we demonstrate how crosstalk can be calculated from available datasets and discuss the technical challenges in such calculation, specifically data incompleteness.
8
Cheng X, Li M, Abdullah M, Li G, Zhang J, Manzoor MA, Wang H, Jin Q, Jiang T, Cai Y, Li D, Lin Y. In Silico Genome-Wide Analysis of the Pear (Pyrus bretschneideri) KNOX Family and the Functional Characterization of PbKNOX1, an Arabidopsis BREVIPEDICELLUS Orthologue Gene, Involved in Cell Wall and Lignin Biosynthesis. Front Genet 2019; 10:632. [PMID: 31333718 PMCID: PMC6624237 DOI: 10.3389/fgene.2019.00632] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/14/2019] [Accepted: 06/17/2019] [Indexed: 12/18/2022]
Abstract
Stone cells are a characteristic trait of pear fruit, but the contents and sizes of stone cells negatively correlate with fruit texture and flavor. Secondary cell wall thickening and lignification have been established as key steps of stone cell development. KNOTTED-LIKE HOMEOBOX (KNOX) proteins play important roles in plant cell growth and development, including cell wall formation and lignification. Although the characteristics and biological functions of KNOX proteins have been investigated in other plants, this gene family has not been functionally characterized in pear. Eighteen PbKNOX genes were identified in the present study, and all of the identified family members contained the KNOX I and/or KNOX II domains. Based on the phylogenetic tree and chromosomal localization, the 18 PbKNOX genes were divided into five subfamilies [SHOOT MERISTEMLESS (STM)-like, BREVIPEDICELLUS (BP)-like, KNOTTED ARABIDOPSIS THALIANA 2/6 (KNAT2/6)-like, KNAT7-like, and KNAT3-5-like] and were distributed among 10 chromosomes. In addition, we identified 9, 11, and 11 KNOX genes in the genomes of grape, mei, and strawberry, respectively, and the greatest number of collinear KNOX gene pairs formed between pears and peaches. Analyses of the spatiotemporal expression patterns showed that the tissue specificity of PbKNOX gene expression was not very significant and that the level of the PbKNOX1 transcript showed an opposite trend to the levels of stone cells and lignin accumulation. Furthermore, PbKNOX1 has high sequence identity and similarity with Arabidopsis BP. Compared with wild-type Arabidopsis, plants overexpressing PbKNOX1 not only showed an approximately 19% decrease in the secondary cell wall thickness of vessel cells but also exhibited an approximately 13% reduction in the lignin content of inflorescence stems. Moreover, the expression of several genes involved in lignin biosynthesis was downregulated in transgenic lines. 
Based on our results, PbKNOX1/BP participates in cell wall-thickening and lignin biosynthesis and represses the transcription of key structural genes involved in lignin synthesis, providing genetic evidence for the roles of KNOX in cell wall thickening and lignin biosynthesis in pear.
Affiliation(s)
- Xi Cheng
  - School of Life Science, Anhui Agricultural University, Hefei, China
- Manli Li
  - School of Life Science, Anhui Agricultural University, Hefei, China
- Guohui Li
  - School of Life Science, Anhui Agricultural University, Hefei, China
- Jingyun Zhang
  - School of Life Science, Anhui Agricultural University, Hefei, China
  - Horticultural Institute, Anhui Academy of Agricultural Sciences, Hefei, China
- Han Wang
  - School of Life Science, Anhui Agricultural University, Hefei, China
- Qing Jin
  - School of Life Science, Anhui Agricultural University, Hefei, China
- Taoshan Jiang
  - School of Life Science, Anhui Agricultural University, Hefei, China
- Yongping Cai
  - School of Life Science, Anhui Agricultural University, Hefei, China
- Dahui Li
  - School of Life Science, Anhui Agricultural University, Hefei, China
- Yi Lin
  - School of Life Science, Anhui Agricultural University, Hefei, China
9
Bradley D, Beltrao P. Evolution of protein kinase substrate recognition at the active site. PLoS Biol 2019; 17:e3000341. [PMID: 31233486 PMCID: PMC6611643 DOI: 10.1371/journal.pbio.3000341] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 01/23/2019] [Revised: 07/05/2019] [Accepted: 06/12/2019] [Indexed: 02/05/2023]
Abstract
Protein kinases catalyse the phosphorylation of target proteins, controlling most cellular processes. The specificity of serine/threonine kinases is partly determined by interactions with a few residues near the phospho-acceptor residue, forming the so-called kinase-substrate motif. Kinases have been extensively duplicated throughout evolution, but little is known about when in time new target motifs have arisen. Here, we show that sequence variation occurring early in the evolution of kinases is dominated by changes in specificity-determining residues. We then analysed kinase specificity models, based on known target sites, observing that specificity has remained mostly unchanged for recent kinase duplications. Finally, analysis of phosphorylation data from a taxonomically broad set of 48 eukaryotic species indicates that most phosphorylation motifs are broadly distributed in eukaryotes but are not present in prokaryotes. Overall, our results suggest that the set of eukaryotic kinase motifs present today was acquired around the time of the eukaryotic last common ancestor and that early expansions of the protein kinase fold rapidly explored the space of possible target motifs.
Affiliation(s)
- David Bradley
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Pedro Beltrao
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
10
Rivera-Gómez N, Martínez-Núñez MA, Pastor N, Rodriguez-Vazquez K, Perez-Rueda E. Dissecting the protein architecture of DNA-binding transcription factors in bacteria and archaea. MICROBIOLOGY-SGM 2017; 163:1167-1178. [PMID: 28777072 DOI: 10.1099/mic.0.000504] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
Gene regulation at the transcriptional level is a central process in all organisms where DNA-binding transcription factors play a fundamental role. This class of proteins binds specifically at DNA sequences, activating or repressing gene expression as a function of the cell's metabolic status, operator context and ligand-binding status, among other factors, through the DNA-binding domain (DBD). In addition, TFs may contain partner domains (PaDos), which are involved in ligand binding and protein-protein interactions. In this work, we systematically evaluated the distribution, abundance and domain organization of DNA-binding TFs in 799 non-redundant bacterial and archaeal genomes. We found that the distributions of the DBDs and their corresponding PaDos correlated with the size of the genome. We also identified specific combinations between the DBDs and their corresponding PaDos. Within each class of DBDs there are differences in the actual angle formed at the dimerization interface, responding to the presence/absence of ligands and/or crystallization conditions, setting the orientation of the resulting helices and wings facing the DNA. Our results highlight the importance of PaDos as central elements that enhance the diversity of regulatory functions in all bacterial and archaeal organisms, and our results also demonstrate the role of PaDos in sensing diverse signal compounds. The highly specific interactions between DBDs and PaDos observed in this work, together with our structural analysis highlighting the difficulty in predicting both inter-domain geometry and quaternary structure, suggest that these systems appeared once and evolved with diverse duplication events in all the analysed organisms.
Affiliation(s)
- Nancy Rivera-Gómez
  - Centro de Investigaciones en Biotecnología, Universidad Autónoma del Estado de Morelos, Cuernavaca, México
- Mario Alberto Martínez-Núñez
  - Laboratorio de Estudios Ecogenómicos, Facultad de Ciencias, Unidad Académica de Ciencias y Tecnología de Yucatán, Universidad Nacional Autónoma de México, Mérida, Yucatán, México
- Nina Pastor
  - Centro de Investigación en Dinámica Celular, IICBA. Universidad Autónoma del Estado de Morelos Av. Universidad 1001, Col. Chamilpa, Cuernavaca, Morelos 62209, México
- Katya Rodriguez-Vazquez
  - Departamento de Ingeniería de Sistemas Computacionales y Automatización. Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas. Ciudad Universitaria, Universidad Nacional Autónoma de México, México, D.F, México
- Ernesto Perez-Rueda
  - Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mérida, Yucatán, México
11
Modelling the evolution of transcription factor binding preferences in complex eukaryotes. Sci Rep 2017; 7:7596. [PMID: 28790414 PMCID: PMC5548724 DOI: 10.1038/s41598-017-07761-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 03/08/2017] [Accepted: 06/30/2017] [Indexed: 12/27/2022]
Abstract
Transcription factors (TFs) exert their regulatory action by binding to DNA with specific sequence preferences. However, different TFs can partially share their binding sequences due to their common evolutionary origin. This "redundancy" of binding defines a way of organizing TFs in "motif families" by grouping TFs with similar binding preferences. Since these ultimately define the TF target genes, the motif family organization entails information about the structure of transcriptional regulation as it has been shaped by evolution. Focusing on the human TF repertoire, we show that a one-parameter evolutionary model of the Birth-Death-Innovation type can explain the empirical distribution of TFs across motif families, and allows us to highlight the relevant evolutionary forces at the origin of this organization. Moreover, the model allows us to pinpoint a few deviations from the neutral scenario it assumes: three over-expanded families (including HOX and FOX genes), a set of "singleton" TFs for which duplication seems to be selected against, and a higher-than-average rate of diversification of the binding preferences of TFs with a Zinc Finger DNA binding domain. Finally, a comparison of the TF motif family organization in different eukaryotic species suggests an increase of redundancy of binding with organism complexity.
12
Adler M, Szekely P, Mayo A, Alon U. Optimal Regulatory Circuit Topologies for Fold-Change Detection. Cell Syst 2017; 4:171-181.e8. [DOI: 10.1016/j.cels.2016.12.009] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 06/02/2016] [Revised: 09/21/2016] [Accepted: 12/08/2016] [Indexed: 12/29/2022]
13
Sebé-Pedrós A, Ruiz-Trillo I. Evolution and Classification of the T-Box Transcription Factor Family. Curr Top Dev Biol 2017; 122:1-26. [DOI: 10.1016/bs.ctdb.2016.06.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 12/21/2022]
14
Intrinsic limits to gene regulation by global crosstalk. Nat Commun 2016; 7:12307. [PMID: 27489144 PMCID: PMC4976215 DOI: 10.1038/ncomms12307] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 11/20/2015] [Accepted: 06/21/2016] [Indexed: 01/21/2023]
Abstract
Gene regulation relies on the specificity of transcription factor (TF)–DNA interactions. Limited specificity may lead to crosstalk: a regulatory state in which a gene is either incorrectly activated due to noncognate TF–DNA interactions or remains erroneously inactive. As each TF can have numerous interactions with noncognate cis-regulatory elements, crosstalk is inherently a global problem, yet has previously not been studied as such. We construct a theoretical framework to analyse the effects of global crosstalk on gene regulation. We find that crosstalk presents a significant challenge for organisms with low-specificity TFs, such as metazoans. Crosstalk is not easily mitigated by known regulatory schemes acting at equilibrium, including variants of cooperativity and combinatorial regulation. Our results suggest that crosstalk imposes a previously unexplored global constraint on the functioning and evolution of regulatory networks, which is qualitatively distinct from the known constraints that act at the level of individual gene regulatory elements.

Limited specificity of transcription factor-DNA interactions leads to crosstalk in gene regulation. Here the authors consider global crosstalk in regulatory networks of growing size and complexity, and show that it imposes constraints on gene regulation and on the evolution of regulatory networks.
|
15
|
Prediction and Validation of Transcription Factors Modulating the Expression of Sestrin3 Gene Using an Integrated Computational and Experimental Approach. PLoS One 2016; 11:e0160228. [PMID: 27466818 PMCID: PMC4965051 DOI: 10.1371/journal.pone.0160228] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/04/2016] [Accepted: 07/16/2016] [Indexed: 02/03/2023] Open
Abstract
SESN3 has been implicated in multiple biological processes, including protection against oxidative stress and regulation of glucose and lipid metabolism. However, little is known about the factors and mechanisms controlling its gene expression at the transcriptional level. We performed in silico phylogenetic footprinting analysis of 5 kb upstream regions of a diverse set of human SESN3 orthologs for the identification of high-confidence conserved binding motifs (BMo). We further analyzed the predicted BMo by a motif comparison tool to identify the TFs likely to bind these discovered motifs. Predicted TFs were then integrated with experimentally known protein-protein interactions and experimentally validated to delineate the important transcriptional regulators of SESN3. Our study revealed a high-confidence set of BMos (integrated with DNase I hypersensitivity sites) in the upstream regulatory regions of SESN3 that could be bound by transcription factors from multiple families including FOXOs, SMADs, SOXs, TCFs and HNF4A. TF-TF network analysis established hubs of interaction that include SMAD3, TCF3, SMAD2, HDAC2, SOX2, TAL1 and TCF12 as well as the likely protein complexes formed between them. We show using ChIP-PCR as well as over-expression and knockout studies that FOXO3 and SOX2 transcriptionally regulate the expression of the SESN3 gene. Our findings provide an important roadmap to further our understanding of the regulation of SESN3.
|
16
|
Schmitz JF, Zimmer F, Bornberg-Bauer E. Mechanisms of transcription factor evolution in Metazoa. Nucleic Acids Res 2016; 44:6287-97. [PMID: 27288445 PMCID: PMC5291267 DOI: 10.1093/nar/gkw492] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 01/28/2016] [Revised: 05/18/2016] [Accepted: 05/22/2016] [Indexed: 11/12/2022] Open
Abstract
Transcription factors (TFs) are pivotal for the regulation of virtually all cellular processes, including growth and development. Expansions of TF families are causally linked to increases in organismal complexity. Here we study the evolutionary dynamics, genetic causes and functional implications of the five largest metazoan TF families. We find that family expansions dominate across the whole metazoan tree; however, some branches experience exceptional family-specific accelerated expansions. Additionally, we find that such expansions are often predated by modular domain rearrangements, which spur the expansion of a new sub-family by separating it from the rest of the TF family in terms of protein-protein interactions. This separation allows for radical shifts in the functional spectrum of a duplicated TF. We also find functional differentiation inside TF sub-families as changes in expression specificity. Furthermore, accelerated family expansions are facilitated by repeats of sequence motifs such as C2H2 zinc fingers. We quantify whole genome duplications and single gene duplications as sources of TF family expansions, implying that some, but not all, TF duplicates are preferentially retained. We conclude that trans-regulatory changes (domain rearrangements) are instrumental for fundamental functional innovations, that cis-regulatory changes (affecting expression) accomplish wide-spread fine tuning and both jointly contribute to the functional diversification of TFs.
Affiliation(s)
- Jonathan F Schmitz
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, Hüfferstrasse 1, D-48149 Münster, Germany
| | - Fabian Zimmer
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, Hüfferstrasse 1, D-48149 Münster, Germany; Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - Erich Bornberg-Bauer
- Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, Hüfferstrasse 1, D-48149 Münster, Germany
|
17
|
Martin O, Krzywicki A, Zagorski M. Drivers of structural features in gene regulatory networks: From biophysical constraints to biological function. Phys Life Rev 2016; 17:124-58. [DOI: 10.1016/j.plrev.2016.06.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/21/2015] [Revised: 03/25/2016] [Accepted: 04/20/2016] [Indexed: 12/23/2022]
|
18
|
Abstract
Specific interactions are a hallmark feature of self-assembly and signal-processing systems in both synthetic and biological settings. Specificity between components may arise from a wide variety of physical and chemical mechanisms in diverse contexts, from DNA hybridization to shape-sensitive depletion interactions. Despite this diversity, all systems that rely on interaction specificity operate under the constraint that increasing the number of distinct components inevitably increases off-target binding. Here we introduce "capacity," the maximal information encodable using specific interactions, to compare specificity across diverse experimental systems and to compute how specificity changes with physical parameters. Using this framework, we find that "shape" coding of interactions has higher capacity than chemical ("color") coding because the strength of off-target binding is strongly sublinear in binding-site size for shapes while being linear for colors. We also find that different specificity mechanisms, such as shape and color, can be combined in a synergistic manner, giving a capacity greater than the sum of the parts.
|
19
|
Information Limited Oligonucleotide Amplification Assay for Affinity-Based, Parallel Detection Studies. PLoS One 2016; 11:e0151072. [PMID: 26978653 PMCID: PMC4792472 DOI: 10.1371/journal.pone.0151072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2015] [Accepted: 02/23/2016] [Indexed: 11/19/2022] Open
Abstract
Molecular communication systems encounter constraints similar to those of telecommunications. In either case, channel crosstalk at the receiver end will result in information loss that statistical analysis cannot compensate for. This is because in any communication channel there is a physical limit to the amount of information that can be transmitted. We present a novel and simple modified end amplification (MEA) technique to generate reduced and defined amounts of specific information in the form of short fragments from an oligonucleotide source that also contains unrelated and redundant information. Our method can be a valuable tool to investigate information overflow and channel capacity in biomolecular recognition systems.
|
20
|
Tools and Principles for Microbial Gene Circuit Engineering. J Mol Biol 2016; 428:862-88. [DOI: 10.1016/j.jmb.2015.10.004] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Received: 06/12/2015] [Revised: 10/05/2015] [Accepted: 10/06/2015] [Indexed: 12/26/2022]
|
21
|
Ma X, Ezer D, Navarro C, Adryan B. Reliable scaling of position weight matrices for binding strength comparisons between transcription factors. BMC Bioinformatics 2015; 16:265. [PMID: 26289072 PMCID: PMC4545934 DOI: 10.1186/s12859-015-0666-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/17/2015] [Accepted: 07/08/2015] [Indexed: 01/05/2023] Open
Abstract
Background: Scoring DNA sequences against Position Weight Matrices (PWMs) is a widely adopted method to identify putative transcription factor binding sites. While common bioinformatics tools produce scores that can reflect the binding strength between a specific transcription factor and the DNA, these scores are not directly comparable between different transcription factors. Other methods, including p-value associated approaches (Touzet H, Varré J-S. Efficient and accurate p-value computation for position weight matrices. Algorithms Mol Biol. 2007;2:15), provide more rigorous ways to identify potential binding sites, but their results are difficult to interpret in terms of binding energy, which is essential for the modeling of transcription factor binding dynamics and enhancer activities. Results: Here, we provide two different ways to find the scaling parameter λ that allows us to infer binding energy from a PWM score. The first approach uses a PWM and background genomic sequence as input to estimate λ for a specific transcription factor, which we applied to show that λ distributions for different transcription factor families correspond with their DNA binding properties. Our second method can reliably convert λ between different PWMs of the same transcription factor, which allows us to directly compare PWMs that were generated by different approaches. Conclusion: These two approaches provide computationally efficient ways to scale PWM scores and estimate the strength of transcription factor binding sites in quantitative studies of binding dynamics. Their results are consistent with each other and with previous reports in most cases. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0666-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiaoyan Ma
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK; Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK.
| | - Daphne Ezer
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK; Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK.
| | - Carmen Navarro
- Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK; Department of Computer Science and Artificial Intelligence, University of Granada, Periodista Daniel Saucedo Aranda, Granada, Spain.
| | - Boris Adryan
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK; Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK.
|
22
|
The functional landscape bound to the transcription factors of Escherichia coli K-12. Comput Biol Chem 2015; 58:93-103. [PMID: 26094112 DOI: 10.1016/j.compbiolchem.2015.06.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/13/2014] [Revised: 05/31/2015] [Accepted: 06/03/2015] [Indexed: 01/05/2023]
Abstract
Motivated by the experimental evidence accumulated in the last ten years and based on information deposited in RegulonDB, literature searches, and sequence analysis, we analyze the repertoire of 304 DNA-binding transcription factors (TFs) in Escherichia coli K-12. These regulators were grouped into 78 evolutionary families and regulate almost half of the total genes in this bacterium. In structural terms, 60% of TFs are composed of two domains, 30% are monodomain, and 10% have three or four structural domains. As previously noticed, the most abundant DNA-binding domain corresponds to the winged helix-turn-helix, with few alternative DNA-binding structures, resembling the hypothesis of successful protein structures with the emergence of new ones at low scales. In summary, we identified and described the characteristics associated with the DNA-binding TFs in E. coli K-12. We also identified twelve functional modules based on a co-regulated gene matrix. Finally, diverse regulons were predicted based on direct associations between the TFs and potential regulated genes. This analysis should increase our knowledge about gene regulation in the bacterium E. coli K-12 and provide additional clues for comprehensive modelling of transcriptional regulatory networks in other bacteria.
|
23
|
Plaisier CL, Lo FY, Ashworth J, Brooks AN, Beer KD, Kaur A, Pan M, Reiss DJ, Facciotti MT, Baliga NS. Evolution of context dependent regulation by expansion of feast/famine regulatory proteins. BMC Syst Biol 2014; 8:122. [PMID: 25394904 PMCID: PMC4236453 DOI: 10.1186/s12918-014-0122-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/02/2014] [Accepted: 10/16/2014] [Indexed: 11/25/2022]
Abstract
Background: Expansion of transcription factors is believed to have played a crucial role in evolution of all organisms by enabling them to deal with dynamic environments and colonize new environments. We investigated how the expansion of the Feast/Famine Regulatory Protein (FFRP) or Lrp-like proteins into an eight-member family in Halobacterium salinarum NRC-1 has aided in niche-adaptation of this archaeon to a complex and dynamically changing hypersaline environment. Results: We mapped genome-wide binding locations for all eight FFRPs, investigated their preference for binding different effector molecules, and identified the contexts in which they act by analyzing transcriptional responses across 35 growth conditions that mimic different environmental and nutritional conditions this organism is likely to encounter in the wild. Integrative analysis of these data constructed an FFRP regulatory network with conditionally active states that reveal how interrelated variations in DNA-binding domains, effector-molecule preferences, and binding sites in target gene promoters have tuned the functions of each FFRP to the environments in which they act. We demonstrate how conditional regulation of similar genes by two FFRPs, AsnC (an activator) and VNG1237C (a repressor), has striking environment-specific fitness consequences for oxidative stress management and growth, respectively. Conclusions: This study provides a systems perspective into the evolutionary process by which gene duplication within a transcription factor family contributes to environment-specific adaptation of an organism. Electronic supplementary material: The online version of this article (doi:10.1186/s12918-014-0122-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
| | - Fang-Yin Lo
- Institute for Systems Biology, Seattle, WA, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA.
| | | | - Aaron N Brooks
- Institute for Systems Biology, Seattle, WA, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA.
| | - Karlyn D Beer
- Institute for Systems Biology, Seattle, WA, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA.
| | | | - Min Pan
- Institute for Systems Biology, Seattle, WA, USA.
| | | | - Marc T Facciotti
- Department of Biomedical Engineering, University of California, Davis, CA, USA; Genome Center, University of California, Davis, CA, USA.
| | - Nitin S Baliga
- Institute for Systems Biology, Seattle, WA, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA; Department of Microbiology, University of Washington, Seattle, WA, USA; Department of Biology, University of Washington, Seattle, WA, USA.
|
24
|
Abstract
The widespread exchange of genes between bacteria must have consequences on the global architecture of their genomes, which are being found in the abundant genomic data available today. Most of the expansion of bacterial protein families can be attributed to transfer events, which are positively biased for smaller evolutionary distances between genomes, and more frequent for classes that are larger, when summed over all known bacteria. Moreover, “innovation” events where horizontal transfers carry exogenous evolutionary families appear to be less frequent for larger genomes. This dynamic expansion of evolutionary families is interconnected with the acquisition of new biological functions and thus with the size and distribution of the genes’ functional categories found on a genome. This commentary presents our recent contributions to this line of work and possible future directions.
Affiliation(s)
- Luigi Grassi
- Dipartimento di Fisica, Sapienza Università di Roma; Rome, Italy
|
25
|
An intricate network of conserved DNA upstream motifs and associated transcription factors regulate the expression of uromodulin gene. J Urol 2014; 192:981-9. [PMID: 24594405 DOI: 10.1016/j.juro.2014.02.095] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Accepted: 02/21/2014] [Indexed: 12/21/2022]
Abstract
PURPOSE: Uromodulin is a kidney specific glycoprotein whose expression can modulate kidney homeostasis. However, the set of sequence specific transcription factors that regulate the uromodulin gene UMOD and their upstream binding locations are not well characterized. We built a high resolution map of its transcriptional regulation. MATERIALS AND METHODS: We applied in silico phylogenetic footprinting on the upstream regulatory regions of a diverse set of human UMOD orthologs to identify conserved binding motifs and corresponding position specific weight matrices. We further analyzed the predicted binding motifs by motif comparison, which identified transcription factors likely to bind these discovered motifs. Predicted transcription factors were then integrated with experimentally known protein-protein interactions available from public databases and tissue specific expression resources to delineate important regulators controlling UMOD expression. RESULTS: Analysis allowed the identification of a reliable set of binding motifs in the upstream regulatory regions of UMOD to build a high confidence compendium of transcription factors that could bind these motifs, such as GATA3, HNF1B, SP1, SMAD3, RUNX2 and KLF4. ENCODE deoxyribonuclease I hypersensitivity sites in the UMOD upstream region of the mouse kidney confirmed that some of these binding motifs were open to binding by predicted transcription factors. The transcription factor-transcription factor network revealed several highly connected transcription factors, such as SP1, SP3, TP53, POU2F1, RARB, RARA and RXRA, as well as the likely protein complexes formed between them. Expression levels of these transcription factors in the kidney suggest their central role in controlling UMOD expression. CONCLUSIONS: Our findings will form a map for understanding the regulation of uromodulin expression in health and disease.
|
26
|
In silico identification of transcription factors in Medicago sativa using available transcriptomic resources. Mol Genet Genomics 2014; 289:457-68. [PMID: 24556904 DOI: 10.1007/s00438-014-0823-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/27/2013] [Accepted: 01/30/2014] [Indexed: 12/17/2022]
Abstract
Transcription factors (TFs) are proteins that govern organismal development and response to the environment by regulating gene expression. Information on the amount and diversity of TFs within individual plant species is critical for understanding of their biological roles and evolutionary history across the plant kingdom. Currently, only scattered information on separate TFs is available for alfalfa, the most extensively cultivated forage legume in the world. In the meantime, several large transcriptomic resources that can be used to identify and characterize alfalfa TF genes are freely accessible online. In this study, we have performed an in silico analysis of transcriptome data generated in our laboratory and publicly acquirable from other sources to reveal and systematize alfalfa transcription factors. Transcriptome-wide mining enabled prediction of 983 TFs along with their sequence features and putative phylogenies of the largest families. All data were assembled into a simple open-access database named AlfalfaTFDB ( http://plantpathology.ba.ars.usda.gov/alfalfatfdb.html ). Transcriptomic analysis used in this work represents an effective approach for the identification of TF genes in plants with incomplete genomes, such as alfalfa. Integrated TF repertoires of Medicago sativa will provide an important tool for studying regulation of gene expression in other complex non-model species of agricultural significance.
|
27
|
Stanton BC, Nielsen AAK, Tamsir A, Clancy K, Peterson T, Voigt CA. Genomic mining of prokaryotic repressors for orthogonal logic gates. Nat Chem Biol 2014; 10:99-105. [PMID: 24316737 PMCID: PMC4165527 DOI: 10.1038/nchembio.1411] [Citation(s) in RCA: 260] [Impact Index Per Article: 26.0] [Received: 08/01/2013] [Accepted: 10/30/2013] [Indexed: 01/25/2023]
Abstract
Genetic circuits perform computational operations based on interactions between freely diffusing molecules within a cell. When transcription factors are combined to build a circuit, unintended interactions can disrupt its function. Here, we apply 'part mining' to build a library of 73 TetR-family repressors gleaned from prokaryotic genomes. The operators of a subset were determined using an in vitro method, and this information was used to build synthetic promoters. The promoters and repressors were screened for cross-reactions. Of these, 16 were identified that both strongly repress their cognate promoter (5- to 207-fold) and exhibit minimal interactions with other promoters. Each repressor-promoter pair was converted to a NOT gate and characterized. Used as a set of 16 NOT/NOR gates, there are >10^54 circuits that could be built by changing the pattern of input and output promoters. This represents a large set of compatible gates that can be used to construct user-defined circuits.
Affiliation(s)
- Brynne C Stanton
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Alec A K Nielsen
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Alvin Tamsir
- Department of Biochemistry and Biophysics, University of California-San Francisco, San Francisco, California, USA
| | - Kevin Clancy
- Synthetic Biology R&D Unit, Life Technologies, Carlsbad, California, USA
| | - Todd Peterson
- Synthetic Biology R&D Unit, Life Technologies, Carlsbad, California, USA
| | - Christopher A Voigt
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
28
|
Beltrao P, Bork P, Krogan NJ, van Noort V. Evolution and functional cross-talk of protein post-translational modifications. Mol Syst Biol 2013; 9:714. [PMID: 24366814 PMCID: PMC4019982 DOI: 10.1002/msb.201304521] [Citation(s) in RCA: 257] [Impact Index Per Article: 23.4] [Received: 04/30/2013] [Revised: 11/18/2013] [Accepted: 11/22/2013] [Indexed: 12/19/2022] Open
Abstract
Protein post-translational modifications (PTMs) allow the cell to regulate protein activity and play a crucial role in the response to changes in external conditions or internal states. Advances in mass spectrometry now enable proteome wide characterization of PTMs and have revealed a broad functional role for a range of different types of modifications. Here we review advances in the study of the evolution and function of PTMs that were spurred by these technological improvements. We provide an overview of studies focusing on the origin and evolution of regulatory enzymes as well as the evolutionary dynamics of modification sites. Finally, we discuss different mechanisms of altering protein activity via post-translational regulation and progress made in the large-scale functional characterization of PTM function.
Affiliation(s)
- Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max-Delbruck-Centre for Molecular Medicine, Berlin-Buch, Germany
| | - Nevan J. Krogan
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, California, USA
- California Institute for Quantitative Biosciences, San Francisco, California, USA
- J. David Gladstone Institutes, San Francisco, California, USA
| | - Vera van Noort
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
|
29
|
Abstract
Developmental transcription factors are key players in animal multicellularity, with members of the T-box family among the most important. Until recently, T-box transcription factors were thought to be exclusively present in metazoans. Here, we report the presence of T-box genes in several nonmetazoan lineages, including ichthyosporeans, filastereans, and fungi. Our data confirm that Brachyury is the most ancient member of the T-box family and establish that the T-box family diversified at the onset of Metazoa. Moreover, we demonstrate functional conservation of a homolog of Brachyury of the protist Capsaspora owczarzaki in Xenopus laevis. By comparing the molecular phenotype of C. owczarzaki Brachyury with that of homologs of early branching metazoans, we define a clear difference between unicellular holozoan and metazoan Brachyury homologs, suggesting that the specificity of Brachyury emerged at the origin of Metazoa. Experimental determination of the binding preferences of the C. owczarzaki Brachyury results in a similar motif to that of metazoan Brachyury and other T-box classes. This finding suggests that functional specificity between different T-box classes is likely achieved by interaction with alternative cofactors, as opposed to differences in binding specificity.
|
30
|
Price MN, Deutschbauer AM, Skerker JM, Wetmore KM, Ruths T, Mar JS, Kuehl JV, Shao W, Arkin AP. Indirect and suboptimal control of gene expression is widespread in bacteria. Mol Syst Biol 2013; 9:660. [PMID: 23591776 PMCID: PMC3658271 DOI: 10.1038/msb.2013.16] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Received: 11/27/2012] [Accepted: 03/13/2013] [Indexed: 11/09/2022] Open
Abstract
Gene regulation in bacteria is usually described as an adaptive response to an environmental change so that genes are expressed when they are required. We instead propose that most genes are under indirect control: their expression responds to signal(s) that are not directly related to the genes' function. Indirect control should perform poorly in artificial conditions, and we show that gene regulation is often maladaptive in the laboratory. In Shewanella oneidensis MR-1, 24% of genes are detrimental to fitness in some conditions, and detrimental genes tend to be highly expressed instead of being repressed when not needed. In diverse bacteria, there is little correlation between when genes are important for optimal growth or fitness and when those genes are upregulated. Two common types of indirect control are constitutive expression and regulation by growth rate; these occur for genes with diverse functions and often seem to be suboptimal. Because genes that have closely related functions can have dissimilar expression patterns, regulation may be suboptimal in the wild as well as in the laboratory.
Affiliation(s)
- Morgan N Price
- Physical Biosciences Division, Lawrence Berkeley National Lab, Berkeley, CA 94720, USA.
|
31
|
Testone G, Condello E, Verde I, Nicolodi C, Caboni E, Dettori MT, Vendramin E, Bruno L, Bitonti MB, Mele G, Giannino D. The peach (Prunus persica L. Batsch) genome harbours 10 KNOX genes, which are differentially expressed in stem development, and the class 1 KNOPE1 regulates elongation and lignification during primary growth. J Exp Bot 2012; 63:5417-35. [PMID: 22888130 PMCID: PMC3444263 DOI: 10.1093/jxb/ers194] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 05/04/2023]
Abstract
The KNOTTED-like (KNOX) genes encode homeodomain transcription factors and regulate several processes of plant organ development. The peach (Prunus persica L. Batsch) genome was found to contain 10 KNOX members (KNOPE genes); six of them were experimentally located on the Prunus reference map and the class 1 KNOPE1 was found to link to a quantitative trait locus (QTL) for the internode length in the peach×Ferganensis population. All the KNOPE genes were differentially transcribed in the internodes of growing shoots; the KNOPE1 mRNA abundance decreased progressively from primary (elongation) to secondary growth (radial expansion). During primary growth, the KNOPE1 mRNA was localized in the cortex and in the procambium/metaphloem zones, whereas it was undetected in incipient phloem and xylem fibres. KNOPE1 overexpression in the Arabidopsis bp4 loss-of-function background (35S:KNOPE1/bp genotype) restored the rachis length, suggesting, together with the QTL association, a role for KNOPE1 in peach shoot elongation. Several lignin biosynthesis genes were up-regulated in the bp4 internodes but repressed in the 35S:KNOPE1/bp lines similarly to the wild type. Moreover, the lignin deposition pattern of the 35S:KNOPE1/bp and the wild-type internodes were the same. The KNOPE1 protein was found to recognize in vitro one of the typical KNOX DNA-binding sites that recurred in peach and Arabidopsis lignin genes. KNOPE1 expression was inversely correlated with that of lignin genes and lignin deposition along the peach shoot stems and was down-regulated in lignifying vascular tissues. These data strongly support that KNOPE1 prevents cell lignification by repressing lignin genes during peach stem primary growth.
Affiliation(s)
- Giulio Testone
- Institute of Agricultural Biology and Biotechnology, National Research Council of Italy (CNR), via Salaria km 29,300, 00015, Monterotondo Scalo, Rome, Italy
- These authors contributed equally to this work
- Emiliano Condello
- Fruit Tree Research Centre, Agriculture Research Council (CRA), Via di Fioranello 52, 00134 Rome, Italy
- These authors contributed equally to this work
- Ignazio Verde
- Fruit Tree Research Centre, Agriculture Research Council (CRA), Via di Fioranello 52, 00134 Rome, Italy
- Chiara Nicolodi
- Institute of Agricultural Biology and Biotechnology, National Research Council of Italy (CNR), via Salaria km 29,300, 00015, Monterotondo Scalo, Rome, Italy
- Emilia Caboni
- Fruit Tree Research Centre, Agriculture Research Council (CRA), Via di Fioranello 52, 00134 Rome, Italy
- Maria Teresa Dettori
- Fruit Tree Research Centre, Agriculture Research Council (CRA), Via di Fioranello 52, 00134 Rome, Italy
- Elisa Vendramin
- Fruit Tree Research Centre, Agriculture Research Council (CRA), Via di Fioranello 52, 00134 Rome, Italy
- Leonardo Bruno
- Department of Ecology, University of Calabria, Ponte Bucci, 87030 Arcavacata di Rende, Cosenza, Italy
- Maria Beatrice Bitonti
- Department of Ecology, University of Calabria, Ponte Bucci, 87030 Arcavacata di Rende, Cosenza, Italy
- Giovanni Mele
- Institute of Agricultural Biology and Biotechnology, National Research Council of Italy (CNR), via Salaria km 29,300, 00015, Monterotondo Scalo, Rome, Italy
|
32
|
Sasson V, Shachrai I, Bren A, Dekel E, Alon U. Mode of regulation and the insulation of bacterial gene expression. Mol Cell 2012; 46:399-407. [PMID: 22633488 DOI: 10.1016/j.molcel.2012.04.032] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 12/14/2011] [Revised: 04/16/2012] [Accepted: 04/27/2012] [Indexed: 10/28/2022]
Abstract
A gene can be said to be insulated from environmental variations if its expression level depends only on its cognate inducers, and not on variations in conditions. We tested the insulation of the lac promoter of E. coli and of synthetic constructs in which the transcription factor CRP acts as either an activator or a repressor, by measuring their input function (their expression as a function of inducers) in different growth conditions. We find that the promoter activities show sizable variation across conditions of 10%-100% (SD/mean). When the promoter is bound to its cognate regulator(s), variation across conditions is smaller than when it is unbound. Thus, mode of regulation affects insulation: activators seem to show better insulation at high expression levels, and repressors at low expression levels. This may explain the Savageau demand rule, in which E. coli genes needed often in the natural environment tend to be regulated by activators, and rarely needed genes by repressors. The present approach can be used to study insulation in other genes and organisms.
Affiliation(s)
- Vered Sasson
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
|
33
|
Silva-Rocha R, de Lorenzo V. A GFP-lacZ bicistronic reporter system for promoter analysis in environmental gram-negative bacteria. PLoS One 2012; 7:e34675. [PMID: 22493710 PMCID: PMC3321037 DOI: 10.1371/journal.pone.0034675] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 01/23/2012] [Accepted: 03/06/2012] [Indexed: 01/08/2023]
Abstract
Here, we describe a bicistronic reporter system for the analysis of promoter activity in a variety of Gram-negative bacteria at both the population and single-cell levels. This synthetic genetic tool utilizes an artificial operon comprising the gfp and lacZ genes that are assembled in a suicide vector, which is integrated at specific sites within the chromosome of the target bacterium, thereby creating a monocopy reporter system. This tool was instrumental for the complete in vivo characterization of two promoters, Pb and Pc, that drive the expression of the benzoate and catechol degradation pathways, respectively, of the soil bacterium Pseudomonas putida KT2440. The parameterization of these promoters in a population (using β-galactosidase assays) and in single cells (using flow cytometry) was necessary to examine the basic numerical features of these systems, such as the basal and maximal levels and the induction kinetics in response to an inducer (benzoate). Remarkably, GFP afforded a view of the process at a much higher resolution compared with standard lacZ tests; changes in fluorescence faithfully reflected variations in the transcriptional regimes of individual bacteria. The broad host range of the vector/reporter platform is an asset for the characterization of promoters in different bacteria, thereby expanding the diversity of genomic chasses amenable to Synthetic Biology methods.
Affiliation(s)
- Rafael Silva-Rocha
- Systems Biology Program, Centro Nacional de Biotecnología, CSIC, Cantoblanco, Madrid, Spain
- Victor de Lorenzo
- Systems Biology Program, Centro Nacional de Biotecnología, CSIC, Cantoblanco, Madrid, Spain
|
34
|
Probing the informational and regulatory plasticity of a transcription factor DNA-binding domain. PLoS Genet 2012; 8:e1002614. [PMID: 22496663 PMCID: PMC3315485 DOI: 10.1371/journal.pgen.1002614] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 05/23/2011] [Accepted: 02/07/2012] [Indexed: 11/19/2022]
Abstract
Transcription factors have two functional constraints on their evolution: (1) their binding sites must have enough information to be distinguishable from all other sequences in the genome, and (2) they must bind these sites with an affinity that appropriately modulates the rate of transcription. Since both are determined by the biophysical properties of the DNA–binding domain, selection on one will ultimately affect the other. We were interested in understanding how plastic the informational and regulatory properties of a transcription factor are and how transcription factors evolve to balance these constraints. To study this, we developed an in vivo selection system in Escherichia coli to identify variants of the helix-turn-helix transcription factor MarA that bind different sets of binding sites with varying degrees of degeneracy. Unlike previous in vitro methods used to identify novel DNA binders and to probe the plasticity of the binding domain, our selections were done within the context of the initiation complex, selecting for both specific binding within the genome and for a physiologically significant strength of interaction to maintain function of the factor. Using MITOMI, quantitative PCR, and a binding site fitness assay, we characterized the binding, function, and fitness of some of these variants. We observed that a large range of binding preferences, information contents, and activities could be accessed with a few mutations, suggesting that transcriptional regulatory networks are highly adaptable and expandable. The main role of transcription factors is to modulate the expression levels of functionally related genes in response to environmental and cellular cues. For this process to be precise, the transcription factor needs to locate and bind specific DNA sequences in the genome and needs to bind these sites with a strength that appropriately adjusts the amount of gene expressed. 
Both specific protein–DNA interactions and transcription factor activity are intimately coupled, because they are both dependent upon the biochemical properties of the DNA–binding domain. Here we experimentally probe how variable these properties are using a novel in vivo selection assay. We observed that the specific binding preferences for the transcription factor MarA and its transcriptional activity can be altered over a large range with a few mutations and that selection on one function will impact the other. This work helps us to better understand the mechanism of transcriptional regulation and its evolution, and may prove useful for the engineering of transcription factors and regulatory networks.
|
35
|
Perez-Rueda E, Martinez-Nuñez MA. The repertoire of DNA-binding transcription factors in prokaryotes: functional and evolutionary lessons. Sci Prog 2012; 95:315-29. [PMID: 23094327 PMCID: PMC10365527 DOI: 10.3184/003685012x13420097673409] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
The capabilities of organisms to contend with environmental changes depend on their genes and their ability to regulate their expression. DNA-binding transcription factors (TFs) play a central role in this process, because they regulate gene expression positively and/or negatively, depending on the operator context and ligand-binding status. In this review, we summarise recent findings regarding the function and evolution of TFs in prokaryotes. We consider the abundance of TFs in bacteria and archaea, the role of DNA-binding domains and their partner domains, and the effects of duplication events in the evolution of regulatory networks. Finally, a comprehensive picture for how regulatory networks have evolved in prokaryotes is provided.
Affiliation(s)
- Ernesto Perez-Rueda
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos 62100, Mexico.
|
36
|
Andrulis ED. Theory of the origin, evolution, and nature of life. Life (Basel) 2011; 2:1-105. [PMID: 25382118 PMCID: PMC4187144 DOI: 10.3390/life2010001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/15/2011] [Revised: 12/10/2011] [Accepted: 12/13/2011] [Indexed: 12/22/2022]
Abstract
Life is an inordinately complex unsolved puzzle. Despite significant theoretical progress, experimental anomalies, paradoxes, and enigmas have revealed paradigmatic limitations. Thus, the advancement of scientific understanding requires new models that resolve fundamental problems. Here, I present a theoretical framework that economically fits evidence accumulated from examinations of life. This theory is based upon a straightforward and non-mathematical core model and proposes unique yet empirically consistent explanations for major phenomena including, but not limited to, quantum gravity, phase transitions of water, why living systems are predominantly CHNOPS (carbon, hydrogen, nitrogen, oxygen, phosphorus, and sulfur), homochirality of sugars and amino acids, homeoviscous adaptation, triplet code, and DNA mutations. The theoretical framework unifies the macrocosmic and microcosmic realms, validates predicted laws of nature, and solves the puzzle of the origin and evolution of cellular life in the universe.
Affiliation(s)
- Erik D Andrulis
- Department of Molecular Biology and Microbiology, Case Western Reserve University School of Medicine, Wood Building, W212, Cleveland, OH 44106, USA.
|
37
|
Kostadinov I, Kottmann R, Ramette A, Waldmann J, Buttigieg PL, Glöckner FO. Quantifying the effect of environment stability on the transcription factor repertoire of marine microbes. Microbial Informatics and Experimentation 2011; 1:9. [PMID: 22587903 PMCID: PMC3372289 DOI: 10.1186/2042-5783-1-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/01/2011] [Accepted: 09/07/2011] [Indexed: 11/14/2022]
Abstract
BACKGROUND DNA-binding transcription factors (TFs) regulate cellular functions in prokaryotes, often in response to environmental stimuli. Thus, the environment exerts constant selective pressure on the TF gene content of microbial communities. Recently a study on marine Synechococcus strains detected differences in their genomic TF content related to environmental adaptation, but so far the effect of environmental parameters on the content of TFs in bacterial communities has not been systematically investigated. RESULTS We quantified the effect of environment stability on the transcription factor repertoire of marine pelagic microbes from the Global Ocean Sampling (GOS) metagenome using interpolated physico-chemical parameters and multivariate statistics. Thirty-five percent of the difference in relative TF abundances between samples could be explained by environment stability. Six percent was attributable to spatial distance but none to a combination of both spatial distance and stability. Some individual TFs showed a stronger relationship to environment stability and space than the total TF pool. CONCLUSIONS Environmental stability appears to have a clearly detectable effect on TF gene content in bacterioplanktonic communities described by the GOS metagenome. Interpolated environmental parameters were shown to compare well to in situ measurements and were essential for quantifying the effect of the environment on the TF content. It is demonstrated that comprehensive and well-structured contextual data will strongly enhance our ability to interpret the functional potential of microbes from metagenomic data.
Affiliation(s)
- Ivaylo Kostadinov
- Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, 28359 Bremen, Germany.
|
38
|
Beslon G, Parsons D, Sanchez-Dehesa Y, Peña JM, Knibbe C. Scaling laws in bacterial genomes: A side-effect of selection of mutational robustness? Biosystems 2010; 102:32-40. [DOI: 10.1016/j.biosystems.2010.07.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/04/2010] [Accepted: 07/15/2010] [Indexed: 11/25/2022]
|
39
|
Tlusty T. A colorful origin for the genetic code: Information theory, statistical mechanics and the emergence of molecular codes. Phys Life Rev 2010; 7:362-76. [DOI: 10.1016/j.plrev.2010.06.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 09/25/2009] [Revised: 01/25/2010] [Accepted: 02/06/2010] [Indexed: 10/19/2022]
|
40
|
Tlusty T. How could prebiotic molecules make the code and how all this is related to proteins? Phys Life Rev 2010. [DOI: 10.1016/j.plrev.2010.08.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
|
41
|
Grassi L, Fusco D, Sellerio A, Corà D, Bassetti B, Caselle M, Lagomarsino MC. Identity and divergence of protein domain architectures after the yeast whole-genome duplication event. Molecular BioSystems 2010; 6:2305-15. [PMID: 20820472 DOI: 10.1039/c003507f] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022]
Abstract
Gene duplication is a key mechanism in evolution for generating new functionality, and it is known to have produced a large proportion of genes. Duplication mechanisms include small-scale, or "local", events such as unequal crossing over and retroposition, together with global events, such as chromosomal or whole genome duplication (WGD). In particular, different studies confirmed that the yeast S. cerevisiae arose from a 100-150 million-year old whole-genome duplication. Detection and study of duplications are usually based on sequence alignment, synteny and phylogenetic techniques, but protein domains are also useful in assessing protein homology. We develop a simple and computationally efficient protein domain architecture comparison method based on the domain assignments available from public databases. We test the accuracy and the reliability of this method in detecting instances of gene duplication in the yeast S. cerevisiae. In particular, we analyze the evolution of WGD and non-WGD paralogs from the domain viewpoint, in comparison with a more standard functional analysis of the genes. A large number of domains is shared by genes that underwent local and global duplications, indicating the existence of a common set of "duplicable" domains. On the other hand, WGD and non-WGD paralogs tend to have different functions. We find evidence that this comes from functional migration within similar domain superfamilies, but also from the existence of small sets of WGD and non-WGD specific domain superfamilies with largely different functions. This observation gives a novel perspective on the finding that WGD paralogs tend to be functionally different from small-scale paralogs. WGD and non-WGD superfamilies carry distinct functions. Finally, the Gene Ontology similarity of paralogs tends to decrease with duplication age, while this tendency is weaker or not observable by the comparison of the domain architectures of paralogs. 
This suggests that the set of domains composing a protein tends to be maintained, while its function, cellular process or localization diversifies. Overall, the gathered evidence gives a different viewpoint on the biological specificity of the WGD and at the same time points out the validity of domain architecture comparison as a tool for detecting homology.
Affiliation(s)
- Luigi Grassi
- Università degli Studi di Torino, Dip. Fisica Teorica-Via Giuria 1, 10125 Torino, Italy
|
42
|
Santos MA, Turinsky AL, Ong S, Tsai J, Berger MF, Badis G, Talukder S, Gehrke AR, Bulyk ML, Hughes TR, Wodak SJ. Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences. Nucleic Acids Res 2010; 38:7927-42. [PMID: 20705649 PMCID: PMC3001082 DOI: 10.1093/nar/gkq714] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/01/2022]
Abstract
Classifying proteins into subgroups with similar molecular function on the basis of sequence is an important step in deriving reliable functional annotations computationally. So far, however, available classification procedures have been evaluated against protein subgroups that are defined by experts using mainly qualitative descriptions of molecular function. Recently, in vitro DNA-binding preferences to all possible 8-nt DNA sequences have been measured for 178 mouse homeodomains using protein-binding microarrays, offering the unprecedented opportunity of evaluating the classification methods against quantitative measures of molecular function. To this end, we automatically derive homeodomain subtypes from the DNA-binding data and independently group the same domains using sequence information alone. We test five sequence-based methods, which use different sequence-similarity measures and algorithms to group sequences. Results show that methods that optimize the classification robustness reflect well the detailed functional specificity revealed by the experimental data. In some of these classifications, 73–83% of the subfamilies exactly correspond to, or are completely contained in, the function-based subtypes. Our findings demonstrate that certain sequence-based classifications are capable of yielding very specific molecular function annotations. The availability of quantitative descriptions of molecular function, such as DNA-binding data, will be a key factor in exploiting this potential in the future.
Affiliation(s)
- Miguel A Santos
- Molecular Structure and Function Program, Hospital for Sick Children, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
|
43
|
Charoensawan V, Wilson D, Teichmann SA. Lineage-specific expansion of DNA-binding transcription factor families. Trends Genet 2010; 26:388-93. [PMID: 20675012 PMCID: PMC2937223 DOI: 10.1016/j.tig.2010.06.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 02/18/2010] [Revised: 06/11/2010] [Accepted: 06/11/2010] [Indexed: 11/06/2022]
Abstract
DNA-binding domains (DBDs) are essential components of sequence-specific transcription factors (TFs). We have investigated the distribution of all known DBDs in more than 500 completely sequenced genomes from the three major superkingdoms (Bacteria, Archaea and Eukaryota) and documented conserved and specific DBD occurrence in diverse taxonomic lineages. By combining DBD occurrence in different species with taxonomic information, we have developed an automatic method for inferring the origins of DBD families and their specific combinations with other protein families in TFs. We found only three out of 131 (2%) DBD families shared by the three superkingdoms.
|
44
|
Charoensawan V, Wilson D, Teichmann SA. Genomic repertoires of DNA-binding transcription factors across the tree of life. Nucleic Acids Res 2010; 38:7364-77. [PMID: 20675356 PMCID: PMC2995046 DOI: 10.1093/nar/gkq617] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Indexed: 11/14/2022]
Abstract
Sequence-specific transcription factors (TFs) are important to genetic regulation in all organisms because they recognize and directly bind to regulatory regions on DNA. Here, we survey and summarize the TF resources available. We outline the organisms for which TF annotation is provided, and discuss the criteria and methods used to annotate TFs by different databases. By using genomic TF repertoires from ∼700 genomes across the tree of life, covering Bacteria, Archaea and Eukaryota, we review TF abundance with respect to the number of genes, as well as their structural complexity in diverse lineages. While typical eukaryotic TFs are longer than the average eukaryotic proteins, the inverse is true for prokaryotes. Only in eukaryotes does the same family of DNA-binding domain (DBD) occur multiple times within one polypeptide chain. This potentially increases the length and diversity of DNA-recognition sequence by reusing DBDs from the same family. We examined the increase in TF abundance with the number of genes in genomes, using the largest set of prokaryotic and eukaryotic genomes to date. As pointed out before, prokaryotic TFs increase faster than linearly. We further observe a similar relationship in eukaryotic genomes with a slower increase in TFs.
|
45
|
Adryan B, Teichmann SA. The developmental expression dynamics of Drosophila melanogaster transcription factors. Genome Biol 2010; 11:R40. [PMID: 20384991 PMCID: PMC2884543 DOI: 10.1186/gb-2010-11-4-r40] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 11/23/2009] [Revised: 01/22/2010] [Accepted: 04/12/2010] [Indexed: 12/23/2022]
Abstract
BACKGROUND Site-specific transcription factors (TFs) are coordinators of developmental and physiological gene expression programs. Their binding to cis-regulatory modules of target genes mediates the precise cell- and context-specific activation and repression of genes. The expression of TFs should therefore reflect the core expression program of each cell. RESULTS We studied the expression dynamics of about 750 TFs using the available genomics resources in Drosophila melanogaster. We find that 95% of these TFs are expressed at some point during embryonic development, with a peak roughly between 10 and 12 hours after egg laying, the core stages of organogenesis. We address the differential utilization of DNA-binding domains in different developmental programs systematically in a spatio-temporal context, and show that the zinc finger class of TFs is predominantly early expressed, while Homeobox TFs exhibit later expression in embryogenesis. CONCLUSIONS Previous work, dissecting cis-regulatory modules during Drosophila development, suggests that TFs are deployed in groups acting in a cooperative manner. In contrast, we find that there is rapid exchange of co-expressed partners amongst the fly TFs, at rates similar to the genome-wide dynamics of co-expression clusters. This suggests there may also be a high level of combinatorial complexity of TFs at cis-regulatory modules.
Affiliation(s)
- Boris Adryan
- Computational Biology Group, Structural Studies Division, MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK.
|
46
|
wDBTF: an integrated database resource for studying wheat transcription factor families. BMC Genomics 2010; 11:185. [PMID: 20298594 PMCID: PMC2858749 DOI: 10.1186/1471-2164-11-185] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 11/18/2009] [Accepted: 03/18/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Transcription factors (TFs) regulate gene expression by interacting with promoters of their target genes and are classified into families based on their DNA-binding domains. Genes coding for TFs have been identified in the sequences of model plant genomes. The rice (Oryza sativa spp. japonica) genome contains 2,384 TF gene models, which represent the mRNA transcript of a locus, classed into 63 families. RESULTS We have created an extensive list of wheat (Triticum aestivum L) TF sequences based on sequence homology with rice TFs identified and classified in the Database of Rice Transcription Factors (DRTF). We have identified 7,112 wheat sequences (contigs and singletons) from a dataset of 1,033,960 expressed sequence tag and mRNA (ET) sequences available. This number is about three times the number of TFs in rice so proportionally is very similar if allowance is made for the hexaploidy of wheat. Of these sequences 3,820 encode gene products with a DNA-binding domain and thus were confirmed as potential regulators. These 3,820 sequences were classified into 40 families and 84 subfamilies and some members defined orphan families. The results were compiled in the Database of Wheat Transcription Factor (wDBTF), an inventory available on the web http://wwwappli.nantes.inra.fr:8180/wDBFT/. For each accession, a link to its library source and its Affymetrix identification number is provided. The positions of Pfam (protein family database) motifs were given when known. CONCLUSIONS wDBTF collates 3,820 wheat TF sequences validated by the presence of a DNA-binding domain out of 7,112 potential TF sequences identified from publicly available gene expression data. We also incorporated in silico expression data on these TFs into the database. Thus this database provides a major resource for systematic studies of TF families and their expression in wheat as illustrated here in a study of DOF family members expressed during seed development.
|
47
|
Angelini A, Amato A, Bianconi G, Bassetti B, Cosentino Lagomarsino M. Mean-field methods in evolutionary duplication-innovation-loss models for the genome-level repertoire of protein domains. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2010; 81:021919. [PMID: 20365607 DOI: 10.1103/physreve.81.021919] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/22/2009] [Revised: 01/22/2010] [Indexed: 05/29/2023]
Abstract
We present a combined mean-field and simulation approach to different models describing the dynamics of classes formed by elements that can appear, disappear, or copy themselves. These models, related to a paradigm duplication-innovation model known as Chinese restaurant process, are devised to reproduce the scaling behavior observed in the genome-wide repertoire of protein domains of all known species. In view of these data, we discuss the qualitative and quantitative differences of the alternative model formulations, focusing in particular on the roles of element loss and of the specificity of empirical domain classes.
Affiliation(s)
- A Angelini
- Dipartimento di Fisica, Università degli Studi di Milano, Via Celoria 16, 20133 Milano, Italy
|
48
|
McAnally AA, Yampolsky LY. Widespread transcriptional autosomal dosage compensation in Drosophila correlates with gene expression level. Genome Biol Evol 2009; 2:44-52. [PMID: 20333221 PMCID: PMC2839349 DOI: 10.1093/gbe/evp054] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Accepted: 12/15/2009] [Indexed: 11/12/2022]
Abstract
Little is known about dosage compensation in autosomal genes. Transcription-level compensation of deletions and other loss-of-function mutations may be a mechanism of dominance of wild-type alleles, a ubiquitous phenomenon whose nature has been a subject of a long debate. We measured gene expression in two isogenic Drosophila lines heterozygous for long deletions and compared our results with previously published gene expression data in a line heterozygous for a long duplication. We find that a majority of genes are at least partially compensated at transcription, both for (1/2)-fold dosage (in heterozygotes for deletions) and for 1.5-fold dosage (in heterozygotes for a duplication). The degree of compensation does not vary among functional classes of genes. Compensation for deletions is stronger for highly expressed genes. In contrast, the degree of compensation for duplications is stronger for weakly expressed genes. Thus, partial transcriptional compensation appears to be based on regulatory mechanisms that ensure high transcription levels of some genes and low transcription levels of other genes, instead of precise maintenance of a particular homeostatic expression level. Given the ubiquity of transcriptional compensation, dominance of wild-type alleles may be at least partially caused by regulation at the transcription level.
Affiliation(s)
- Ashley A McAnally
- Department of Biological Sciences, East Tennessee State University, USA
|
49
|
Mochida K, Yoshida T, Sakurai T, Yamaguchi-Shinozaki K, Shinozaki K, Tran LSP. In silico analysis of transcription factor repertoire and prediction of stress responsive transcription factors in soybean. DNA Res 2009; 16:353-69. [PMID: 19884168 PMCID: PMC2780956 DOI: 10.1093/dnares/dsp023] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Received: 07/27/2009] [Accepted: 10/05/2009] [Indexed: 12/29/2022]
Abstract
Sequence-specific DNA-binding transcription factors (TFs) are often termed as 'master regulators' which bind to DNA and either activate or repress gene transcription. We have computationally analysed the soybean genome sequence data and constructed a proper set of TFs based on the Hidden Markov Model profiles of DNA-binding domain families. Within the soybean genome, we identified 4342 loci encoding 5035 TF models which grouped into 61 families. We constructed a database named SoybeanTFDB (http://soybeantfdb.psc.riken.jp) containing the full compilation of soybean TFs and significant information such as: functional motifs, full-length cDNAs, domain alignments, promoter regions, genomic organization and putative regulatory functions based on annotations of gene ontology (GO) inferred by comparative analysis with Arabidopsis. With particular interest in abiotic stress signalling, we analysed the promoter regions for all of the TF encoding genes as a means to identify abiotic stress responsive cis-elements as well as all types of cis-motifs provided by the PLACE database. SoybeanTFDB enables scientists to easily access cis-element and GO annotations to aid in the prediction of TF function and selection of TFs with functions of interest. This study provides a basic framework and an important user-friendly public information resource which enables analyses of transcriptional regulation in soybean.
Affiliation(s)
- Keiichi Mochida
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Takuhiro Yoshida
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Tetsuya Sakurai
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kazuo Shinozaki
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Lam-Son Phan Tran
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
|
50
|
A census of human transcription factors: function, expression and evolution. Nat Rev Genet 2009; 10:252-63. [PMID: 19274049 DOI: 10.1038/nrg2538] [Citation(s) in RCA: 1095] [Impact Index Per Article: 73.0] [Indexed: 12/18/2022]
Abstract
Transcription factors are key cellular components that control gene expression: their activities determine how cells function and respond to the environment. Currently, there is great interest in research into human transcriptional regulation. However, surprisingly little is known about these regulators themselves. For example, how many transcription factors does the human genome contain? How are they expressed in different tissues? Are they evolutionarily conserved? Here, we present an analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation. Much remains to be explored, but this study provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.
|