For: Haldane A, Manhart M, Morozov AV. Biophysical fitness landscapes for transcription factor binding sites. PLoS Comput Biol 2014;10:e1003683. [PMID: 25010228 PMCID: PMC4091707 DOI: 10.1371/journal.pcbi.1003683] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 12/03/2013] [Accepted: 05/11/2014] [Indexed: 11/18/2022] Open

Cited by Other Article(s)

Vahab N, Bonu T, Kuhlmann L, Ramialison M, Tyagi S. Uncovering co-regulatory modules and gene regulatory networks in the heart through machine learning-based analysis of large-scale epigenomic data. Comput Biol Med 2024;171:108068. [PMID: 38354497 DOI: 10.1016/j.compbiomed.2024.108068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 12/30/2023] [Accepted: 01/27/2024] [Indexed: 02/16/2024]

Abstract

The availability of large-scale epigenomic data from various cell types and conditions has yielded valuable insights for evaluating and learning features predicting the co-binding of transcription factors (TF). However, prior attempts to develop models predicting motif co-occurrence lacked scalability for globally analyzing any motif combination or making cross-species predictions. Moreover, mapping co-regulatory modules (CRM) to gene regulatory networks (GRN) is crucial for understanding underlying function. Currently, no comprehensive pipeline exists for large-scale, rapid, and accurate CRM and GRN identification. In this study, we analyzed and evaluated different TF binding characteristics facilitating biologically significant co-binding to identify all potential clusters of co-binding TFs. We curated the UniBind database, containing ChIP-Seq data from over 1983 samples and 232 TFs, and implemented two machine learning models to predict CRMs and the potential regulatory networks they operate on. The two models, a Convolutional Neural Network (CNN) and a Random Forest Classifier (RFC), were used to predict co-binding between TFs and were compared using precision-recall Receiver Operating Characteristic (ROC) curves. CNN outperformed RFC (AUC 0.94 vs. 0.88) and achieved higher F1 scores (0.938 vs. 0.872). The CRMs generated by the clustering algorithm were validated against ChipAtlas and MCOT, revealing additional motifs forming CRMs. We predicted 200k CRMs for 50k+ human genes, validated against recent CRM prediction methods with 100% overlap. Further, we narrowed our focus to study heart-related regulatory motifs, filtering the generated CRMs to report 1784 Cardiac CRMs containing at least four cardiac TFs. Identified cardiac CRMs revealed potential novel regulators like ARID3A and RXRB for SCAD, including known TFs like PPARG for F11R. Our findings highlight the importance of the NKX family of transcription factors in cardiac development and provide potential targets for further investigation in cardiac disease.

Srivastava M, Payne JL. On the incongruence of genotype-phenotype and fitness landscapes. PLoS Comput Biol 2022;18:e1010524. [PMID: 36121840 PMCID: PMC9521842 DOI: 10.1371/journal.pcbi.1010524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/05/2022] [Revised: 09/29/2022] [Accepted: 08/30/2022] [Indexed: 11/22/2022] Open

Fernandez-de-Cossio-Diaz J, Uguzzoni G, Pagnani A. Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan. Mol Biol Evol 2021;38:318-328. [PMID: 32770229 PMCID: PMC7783173 DOI: 10.1093/molbev/msaa204] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/28/2022] Open

Gautam P, Kumar Sinha S. Anticipating response function in gene regulatory networks. J R Soc Interface 2021;18:20210206. [PMID: 34062105 DOI: 10.1098/rsif.2021.0206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022] Open

Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021;38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]

Affiliation(s)

Susanna Manrubia: Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
José A Cuesta: Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
Jacobo Aguirre: Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
Sebastian E Ahnert: Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
Lee Altenberg: University of Hawai'i at Manoa, HI, USA
Alejandro V Cano: Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
Pablo Catalán: Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
Ramon Diaz-Uriarte: Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
Santiago F Elena: Instituto de Biología Integrativa de Sistemas, I(2)SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
Juan Antonio García-Martín: Bioinformatics for Genomics and Proteomics, Centro Nacional de Biotecnología (CSIC), Madrid, Spain
Paulien Hogeweg: Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
Bhavin S Khatri: The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
Joachim Krug: Institute for Biological Physics, University of Cologne, Köln, Germany
Ard A Louis: Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
Nora S Martin: Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
Joshua L Payne: Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
Matthew J Tarnowski: School of Biological Sciences, University of Bristol, Bristol, UK
Marcel Weiß: Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK

Ballal A, Laurendon C, Salmon M, Vardakou M, Cheema J, Defernez M, O'Maille PE, Morozov AV. Sparse Epistatic Patterns in the Evolution of Terpene Synthases. Mol Biol Evol 2020;37:1907-1924. [PMID: 32119077 DOI: 10.1093/molbev/msaa052] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/22/2022] Open

The relation between crosstalk and gene regulation form revisited. PLoS Comput Biol 2020;16:e1007642. [PMID: 32097416 PMCID: PMC7059967 DOI: 10.1371/journal.pcbi.1007642] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/20/2019] [Revised: 03/06/2020] [Accepted: 01/08/2020] [Indexed: 01/11/2023] Open

Abstract

Genes differ in the frequency at which they are expressed and in the form of regulation used to control their activity. In particular, positive or negative regulation can lead to activation of a gene in response to an external signal. Previous works proposed that the form of regulation of a gene correlates with its frequency of usage: positive regulation when the gene is frequently expressed and negative regulation when infrequently expressed. Such network design means that, in the absence of their regulators, the genes are found in their least required activity state, hence regulatory intervention is often necessary. Due to the multitude of genes and regulators, spurious binding and unbinding events, called “crosstalk”, could occur. To determine how the form of regulation affects the global crosstalk in the network, we used a mathematical model that includes multiple regulators and multiple target genes. We found that crosstalk depends non-monotonically on the availability of regulators. Our analysis showed that excess use of regulation entailed by the formerly suggested network design caused high crosstalk levels in a large part of the parameter space. We therefore considered the opposite ‘idle’ design, where the default unregulated state of genes is their frequently required activity state. We found that the ‘idle’ design minimized the use of regulation and thus minimized crosstalk. In addition, we estimated the global crosstalk of S. cerevisiae using transcription factor binding data. We demonstrated that even partial network data could suffice to estimate its global crosstalk, suggesting its applicability to additional organisms. We found that the estimated crosstalk of S. cerevisiae is lower than that of a random network, suggesting that natural selection reduces crosstalk. In summary, our study highlights a new type of protein production cost which is typically overlooked: that of regulatory interference caused by the presence of excess regulators in the cell. It demonstrates the importance of whole-network descriptions, which could show effects missed by single-gene models.

Genes differ in the frequency at which they are expressed and in the form of regulation used to control their activity. The basic level of regulation is mediated by different types of DNA-binding proteins, where each type regulates particular gene(s). We distinguish between two basic forms of regulation: positive, if a gene is activated by the binding of its regulatory protein, and negative, if it is active unless bound by its regulatory protein. Due to the multitude of genes and regulators, spurious binding and unbinding events, called “crosstalk”, could occur. How does the form of regulation, positive or negative, affect the extent of regulatory crosstalk? To address this question, we used a mathematical model integrating many genes and many regulators. As intuition suggests, we found that in most of the parameter space, crosstalk increased with the availability of regulators. We propose that crosstalk is usually reduced when networks are designed such that minimal regulation is needed, which we call the ‘idle’ design. In other words: a frequently needed gene will use negative regulation and conversely, a scarcely needed gene will employ positive regulation. In both cases, the requirement for the regulators is minimized. In addition, we demonstrate how crosstalk can be calculated from available datasets and discuss the technical challenges in such calculation, specifically data incompleteness.

Frochaux MV, Bou Sleiman M, Gardeux V, Dainese R, Hollis B, Litovchenko M, Braman VS, Andreani T, Osman D, Deplancke B. cis-regulatory variation modulates susceptibility to enteric infection in the Drosophila genetic reference panel. Genome Biol 2020;21:6. [PMID: 31948474 PMCID: PMC6966807 DOI: 10.1186/s13059-019-1912-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/15/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023] Open

Affiliation(s)

Michael V. Frochaux: Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
Maroun Bou Sleiman: Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland; Current Address: Laboratory of Integrative Systems Physiology, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Vincent Gardeux: Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
Riccardo Dainese: Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
Brian Hollis: Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland; Current Address: Department of Biological Sciences, University of South Carolina, Columbia, South Carolina, USA
Maria Litovchenko: Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
Virginie S. Braman: Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Tommaso Andreani: Computational Biology and Data Mining Group, Institute of Molecular Biology, Johannes Gutenberg-Universität Mainz, Mainz, Germany
Dani Osman: Faculty of Sciences III and Azm Center for Research in Biotechnology and its Applications, LBA3B, EDST, Lebanese University, Tripoli, 1300 Lebanon
Bart Deplancke: Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Khatri BS, Goldstein RA. Biophysics and population size constrains speciation in an evolutionary model of developmental system drift. PLoS Comput Biol 2019;15:e1007177. [PMID: 31335870 PMCID: PMC6677325 DOI: 10.1371/journal.pcbi.1007177] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/30/2018] [Revised: 08/02/2019] [Accepted: 06/13/2019] [Indexed: 02/06/2023] Open

Abstract

Developmental system drift is a likely mechanism for the origin of hybrid incompatibilities between closely related species. We examine here the detailed mechanistic basis of hybrid incompatibilities between two allopatric lineages, for a genotype-phenotype map of developmental system drift under stabilising selection, where an organismal phenotype is conserved, but the underlying molecular phenotypes and genotype can drift. This leads to a number of emergent phenomena not obtainable by modelling genotype or phenotype alone. Our results show that: 1) speciation is more rapid at smaller population sizes with a characteristic, Orr-like, power law, but at large population sizes slow, characterised by a sub-diffusive growth law; 2) the molecular phenotypes under weakest selection contribute to the earliest incompatibilities; and 3) pair-wise incompatibilities dominate over higher order, contrary to previous predictions that the latter should dominate. The population size effect we find is consistent with previous results on allopatric divergence of transcription factor-DNA binding, where smaller populations have common ancestors with a larger drift load because genetic drift favours phenotypes which have a larger number of genotypes (higher sequence entropy) over more fit phenotypes which have far fewer genotypes; this means fewer substitutions are required in either lineage before incompatibilities arise. Overall, our results indicate that biophysics and population size provide a much stronger constraint to speciation than suggested by previous models, and point to a general mechanistic principle of how incompatibilities arise under stabilising selection for an organismal phenotype.

The process of speciation is of fundamental importance to the field of evolution as it is intimately connected to understanding the immense biodiversity of life. There is still relatively little understanding of the underlying genetic mechanisms that give rise to hybrid incompatibilities, with results suggesting that divergence in transcription factor DNA binding and gene expression plays an important role. A key finding from the field of evo-devo is that organismal phenotypes show developmental system drift, where species maintain the same phenotype, but diverge in developmental pathways; this is an important potential source of hybrid incompatibilities. Here, we explore a theoretical framework to understand how incompatibilities arise due to developmental system drift, using a tractable biophysically inspired genotype-phenotype map for spatial gene expression. Modelling the evolution of phenotypes in this way has the key advantage that it mirrors how selection works in nature, i.e. that selection acts on phenotypes, but variation (mutation) arises at the level of genotypes. This results, as we demonstrate, in a number of non-trivial and testable predictions concerning speciation due to developmental system drift, which would not be obtainable by modelling evolution of genotypes or phenotypes alone.

Gilpin W, Feldman MW. Cryptic selection forces and dynamic heritability in generalized phenotypic evolution. Theor Popul Biol 2018;125:20-29. [PMID: 30528351 DOI: 10.1016/j.tpb.2018.11.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/16/2018] [Revised: 11/10/2018] [Accepted: 11/14/2018] [Indexed: 11/26/2022]

Otwinowski J. Biophysical Inference of Epistasis and the Effects of Mutations on Protein Stability and Function. Mol Biol Evol 2018;35:2345-2354. [PMID: 30085303 PMCID: PMC6188545 DOI: 10.1093/molbev/msy141] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 12/14/2022] Open

Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding. Proc Natl Acad Sci U S A 2018;115:E3702-E3711. [PMID: 29588420 PMCID: PMC5910820 DOI: 10.1073/pnas.1715888115] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Indexed: 11/18/2022] Open

Gibert JM, Blanco J, Dolezal M, Nolte V, Peronnet F, Schlötterer C. Strong epistatic and additive effects of linked candidate SNPs for Drosophila pigmentation have implications for analysis of genome-wide association studies results. Genome Biol 2017;18:126. [PMID: 28673357 PMCID: PMC5496195 DOI: 10.1186/s13059-017-1262-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/20/2017] [Accepted: 06/19/2017] [Indexed: 01/01/2023] Open

Teufel AI, Wilke CO. Accelerated simulation of evolutionary trajectories in origin-fixation models. J R Soc Interface 2017;14:20160906. [PMID: 28228542 PMCID: PMC5332577 DOI: 10.1098/rsif.2016.0906] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/10/2016] [Accepted: 01/31/2017] [Indexed: 11/12/2022] Open

A thousand empirical adaptive landscapes and their navigability. Nat Ecol Evol 2017;1:45. [PMID: 28812623 DOI: 10.1038/s41559-016-0045] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 06/06/2016] [Accepted: 12/05/2016] [Indexed: 01/22/2023]

Dresch JM, Zellers RG, Bork DK, Drewell RA. Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome. Gene Regulation and Systems Biology 2016;10:21-33. [PMID: 27330274 PMCID: PMC4907338 DOI: 10.4137/grsb.s38462] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/05/2016] [Revised: 04/17/2016] [Accepted: 04/28/2016] [Indexed: 01/14/2023]

Abstract

A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development.

Tuğrul M, Paixão T, Barton NH, Tkačik G. Dynamics of Transcription Factor Binding Site Evolution. PLoS Genet 2015;11:e1005639. [PMID: 26545200 PMCID: PMC4636380 DOI: 10.1371/journal.pgen.1005639] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 06/17/2015] [Accepted: 10/09/2015] [Indexed: 11/19/2022] Open

Abstract

Evolution of gene regulation is crucial for our understanding of the phenotypic differences between species, populations and individuals. Sequence-specific binding of transcription factors to the regulatory regions on the DNA is a key regulatory mechanism that determines gene expression and hence heritable phenotypic variation. We use a biophysical model for directional selection on gene expression to estimate the rates of gain and loss of transcription factor binding sites (TFBS) in finite populations under both point and insertion/deletion mutations. Our results show that these rates are typically slow for a single TFBS in an isolated DNA region, unless the selection is extremely strong. These rates decrease drastically with increasing TFBS length or increasingly specific protein-DNA interactions, making the evolution of sites longer than ∼ 10 bp unlikely on typical eukaryotic speciation timescales. Similarly, evolution converges to the stationary distribution of binding sequences very slowly, making the equilibrium assumption questionable. The availability of longer regulatory sequences in which multiple binding sites can evolve simultaneously, the presence of “pre-sites” or partially decayed old sites in the initial sequence, and biophysical cooperativity between transcription factors, can all facilitate gain of TFBS and reconcile theoretical calculations with timescales inferred from comparative genomics.

Evolution has produced a remarkable diversity of living forms that manifests in qualitative differences as well as quantitative traits. An essential factor that underlies this variability is transcription factor binding sites, short pieces of DNA that control gene expression levels. Nevertheless, we lack a thorough theoretical understanding of the evolutionary times required for the appearance and disappearance of these sites. By combining a biophysically realistic model for how cells read out information in transcription factor binding sites with a model for DNA sequence evolution, we explore these timescales and ask what factors crucially affect them. We find that the emergence of binding sites from a random sequence is generically slow under point and insertion/deletion mutational mechanisms. Strong selection, sufficient genomic sequence in which the sites can evolve, the existence of partially decayed old binding sites in the sequence, as well as certain biophysical mechanisms such as cooperativity, can accelerate the binding site gain times and make them consistent with the timescales suggested by comparative analyses of genomic data.

Prindull G. Potential Gene Interactions in the Cell Cycles of Gametes, Zygotes, Embryonic Stem Cells and the Development of Cancer. Front Oncol 2015;5:200. [PMID: 26442212 PMCID: PMC4585297 DOI: 10.3389/fonc.2015.00200] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/06/2014] [Accepted: 08/31/2015] [Indexed: 11/13/2022] Open

Simple Biophysical Model Predicts Faster Accumulation of Hybrid Incompatibilities in Small Populations Under Stabilizing Selection. Genetics 2015;201:1525-37. [PMID: 26434721 PMCID: PMC4676520 DOI: 10.1534/genetics.115.181685] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 08/05/2015] [Accepted: 09/23/2015] [Indexed: 01/07/2023] Open

Khatri BS, Goldstein RA. A coarse-grained biophysical model of sequence evolution and the population size dependence of the speciation rate. J Theor Biol 2015;378:56-64. [PMID: 25936759 PMCID: PMC4457359 DOI: 10.1016/j.jtbi.2015.04.027] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 10/03/2014] [Revised: 02/20/2015] [Accepted: 04/20/2015] [Indexed: 11/29/2022]

Manhart M, Morozov AV. Scaling properties of evolutionary paths in a biophysical model of protein adaptation. Phys Biol 2015;12:045001. [PMID: 26020812 DOI: 10.1088/1478-3975/12/4/045001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]

Evolutionary meandering of intermolecular interactions along the drift barrier. Proc Natl Acad Sci U S A 2014;112:E30-8. [PMID: 25535374 DOI: 10.1073/pnas.1421641112] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 11/18/2022] Open