1
|
Vahab N, Bonu T, Kuhlmann L, Ramialison M, Tyagi S. Uncovering co-regulatory modules and gene regulatory networks in the heart through machine learning-based analysis of large-scale epigenomic data. Comput Biol Med 2024; 171:108068. [PMID: 38354497 DOI: 10.1016/j.compbiomed.2024.108068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2023] [Revised: 12/30/2023] [Accepted: 01/27/2024] [Indexed: 02/16/2024]
Abstract
The availability of large-scale epigenomic data from various cell types and conditions has yielded valuable insights for evaluating and learning features predicting the co-binding of transcription factors (TF). However, prior attempts to develop models predicting motif co-occurrence lacked scalability for globally analyzing any motif combination or making cross-species predictions. Moreover, mapping co-regulatory modules (CRM) to gene regulatory networks (GRN) is crucial for understanding underlying function. Currently, no comprehensive pipeline exists for large-scale, rapid, and accurate CRM and GRN identification. In this study, we analyzed and evaluated different TF binding characteristics facilitating biologically significant co-binding to identify all potential clusters of co-binding TFs. We curated the UniBind database, containing ChIP-Seq data from over 1983 samples and 232 TFs, and implemented two machine learning models to predict CRMs and the potential regulatory networks they operate on. Two machine learning models, Convolution Neural Networks (CNN) and Random Forest Classifier(RFC), used to predict co-binding between TFs, were compared using precision-recall Receiver Operating Characteristic (ROC) curves. CNN outperformed RFC (AUC 0.94 vs. 0.88) and achieved higher F1 scores (0.938 vs. 0.872). The CRMs generated by the clustering algorithm were validated against ChipAtlas and MCOT, revealing additional motifs forming CRMs. We predicted 200k CRMs for 50k+ human genes, validated against recent CRM prediction methods with 100% overlap. Further, we narrowed our focus to study heart-related regulatory motifs, filtering the generated CRMs to report 1784 Cardiac CRMs containing at least four cardiac TFs. Identified cardiac CRMs revealed potential novel regulators like ARID3A and RXRB for SCAD, including known TFs like PPARG for F11R. Our findings highlight the importance of the NKX family of transcription factors in cardiac development and provide potential targets for further investigation in cardiac disease.
Collapse
Affiliation(s)
- Naima Vahab
- School of Computational Technologies, RMIT University, Melbourne VIC 3000, Australia; Department of Infectious Diseases, Alfred Hospital, Prahran VIC 3008, Australia
| | - Tarun Bonu
- Faculty of Information Technology, Monash University, Clayton VIC 3800, Australia
| | - Levin Kuhlmann
- Faculty of Information Technology, Monash University, Clayton VIC 3800, Australia
| | | | - Sonika Tyagi
- School of Computational Technologies, RMIT University, Melbourne VIC 3000, Australia; Department of Infectious Diseases, Alfred Hospital, Prahran VIC 3008, Australia.
| |
Collapse
|
2
|
Srivastava M, Payne JL. On the incongruence of genotype-phenotype and fitness landscapes. PLoS Comput Biol 2022; 18:e1010524. [PMID: 36121840 PMCID: PMC9521842 DOI: 10.1371/journal.pcbi.1010524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2022] [Revised: 09/29/2022] [Accepted: 08/30/2022] [Indexed: 11/22/2022] Open
Abstract
The mapping from genotype to phenotype to fitness typically involves multiple nonlinearities that can transform the effects of mutations. For example, mutations may contribute additively to a phenotype, but their effects on fitness may combine non-additively because selection favors a low or intermediate value of that phenotype. This can cause incongruence between the topographical properties of a fitness landscape and its underlying genotype-phenotype landscape. Yet, genotype-phenotype landscapes are often used as a proxy for fitness landscapes to study the dynamics and predictability of evolution. Here, we use theoretical models and empirical data on transcription factor-DNA interactions to systematically study the incongruence of genotype-phenotype and fitness landscapes when selection favors a low or intermediate phenotypic value. Using the theoretical models, we prove a number of fundamental results. For example, selection for low or intermediate phenotypic values does not change simple sign epistasis into reciprocal sign epistasis, implying that genotype-phenotype landscapes with only simple sign epistasis motifs will always give rise to single-peaked fitness landscapes under such selection. More broadly, we show that such selection tends to create fitness landscapes that are more rugged than the underlying genotype-phenotype landscape, but this increased ruggedness typically does not frustrate adaptive evolution because the local adaptive peaks in the fitness landscape tend to be nearly as tall as the global peak. Many of these results carry forward to the empirical genotype-phenotype landscapes, which may help to explain why low- and intermediate-affinity transcription factor-DNA interactions are so prevalent in eukaryotic gene regulation.
Collapse
Affiliation(s)
- Malvika Srivastava
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Joshua L. Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
Collapse
|
3
|
Fernandez-de-Cossio-Diaz J, Uguzzoni G, Pagnani A. Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan. Mol Biol Evol 2021; 38:318-328. [PMID: 32770229 PMCID: PMC7783173 DOI: 10.1093/molbev/msaa204] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
The recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects, and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype-fitness relationship. We tested the method on five large-scale mutational scans, yielding accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and exhibits high generalization power in terms of broader sequence space exploration and higher fitness variant predictions. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold.
Collapse
Affiliation(s)
- Jorge Fernandez-de-Cossio-Diaz
- Systems Biology Department, Center of Molecular Immunology, Havana, Cuba.,Laboratory of Physics of the Ecole Normale Superieure, CNRS UMR 8023 & PSL Research, Paris, France
| | | | - Andrea Pagnani
- Politecnico di Torino, Torino, Italy.,Italian Institute for Genomic Medicine, IRCCS Candiolo, Candiolo, TO, Italy.,INFN, Sezione di Torino, Torino, Italy
| |
Collapse
|
4
|
Gautam P, Kumar Sinha S. Anticipating response function in gene regulatory networks. J R Soc Interface 2021; 18:20210206. [PMID: 34062105 DOI: 10.1098/rsif.2021.0206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
The origin of an ordered genetic response of a complex and noisy biological cell is intimately related to the detailed mechanism of protein-DNA interactions present in a wide variety of gene regulatory (GR) systems. However, the quantitative prediction of genetic response and the correlation between the mechanism and the response curve is poorly understood. Here, we report in silico binding studies of GR systems to show that the transcription factor (TF) binds to multiple DNA sites with high cooperativity spreads from specific binding sites into adjacent non-specific DNA and bends the DNA. Our analysis is not limited only to the isolated model system but also can be applied to a system containing multiple interacting genes. The controlling role of TF oligomerization, TF-ligand interactions, and DNA looping for gene expression has been also characterized. The predictions are validated against detailed grand canonical Monte Carlo simulations and published data for the lac operon system. Overall, our study reveals that the expression of target genes can be quantitatively controlled by modulating TF-ligand interactions and the bending energy of DNA.
Collapse
Affiliation(s)
- Pankaj Gautam
- Theoretical and Computational Biophysical Chemistry Group, Department of Chemistry, Indian Institute of Technology, Ropar 140001, India
| | - Sudipta Kumar Sinha
- Theoretical and Computational Biophysical Chemistry Group, Department of Chemistry, Indian Institute of Technology, Ropar 140001, India
| |
Collapse
|
5
|
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics-cancer progression models or synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Collapse
Affiliation(s)
- Susanna Manrubia
- Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain.
| | - José A Cuesta
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
| | - Jacobo Aguirre
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
| | - Sebastian E Ahnert
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
| | | | - Alejandro V Cano
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Pablo Catalán
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
| | - Ramon Diaz-Uriarte
- Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
| | - Santiago F Elena
- Instituto de Biología Integrativa de Sistemas, I(2)SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
| | | | - Paulien Hogeweg
- Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
| | - Bhavin S Khatri
- The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
| | - Joachim Krug
- Institute for Biological Physics, University of Cologne, Köln, Germany
| | - Ard A Louis
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
| | - Nora S Martin
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
| | - Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | | | - Marcel Weiß
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
| |
Collapse
|
6
|
Ballal A, Laurendon C, Salmon M, Vardakou M, Cheema J, Defernez M, O'Maille PE, Morozov AV. Sparse Epistatic Patterns in the Evolution of Terpene Synthases. Mol Biol Evol 2020; 37:1907-1924. [PMID: 32119077 DOI: 10.1093/molbev/msaa052] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open
Abstract
We explore sequence determinants of enzyme activity and specificity in a major enzyme family of terpene synthases. Most enzymes in this family catalyze reactions that produce cyclic terpenes-complex hydrocarbons widely used by plants and insects in diverse biological processes such as defense, communication, and symbiosis. To analyze the molecular mechanisms of emergence of terpene cyclization, we have carried out in-depth examination of mutational space around (E)-β-farnesene synthase, an Artemisia annua enzyme which catalyzes production of a linear hydrocarbon chain. Each mutant enzyme in our synthetic libraries was characterized biochemically, and the resulting reaction rate data were used as input to the Michaelis-Menten model of enzyme kinetics, in which free energies were represented as sums of one-amino-acid contributions and two-amino-acid couplings. Our model predicts measured reaction rates with high accuracy and yields free energy landscapes characterized by relatively few coupling terms. As a result, the Michaelis-Menten free energy landscapes have simple, interpretable structure and exhibit little epistasis. We have also developed biophysical fitness models based on the assumption that highly fit enzymes have evolved to maximize the output of correct products, such as cyclic products or a specific product of interest, while minimizing the output of byproducts. This approach results in nonlinear fitness landscapes that are considerably more epistatic. Overall, our experimental and computational framework provides focused characterization of evolutionary emergence of novel enzymatic functions in the context of microevolutionary exploration of sequence space around naturally occurring enzymes.
Collapse
Affiliation(s)
- Aditya Ballal
- Department of Physics & Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, NJ
| | - Caroline Laurendon
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom.,Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom
| | - Melissa Salmon
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom.,Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom.,Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | - Maria Vardakou
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom.,Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom.,School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, United Kingdom
| | - Jitender Cheema
- John Innes Centre, Department of Computational and Systems Biology, Norwich Research Park, Norwich, United Kingdom
| | - Marianne Defernez
- Core Science Resources, Quadram Institute, Norwich Research Park, Norwich, United Kingdom
| | - Paul E O'Maille
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich, United Kingdom.,Food & Health Programme, Institute of Food Research, Norwich Research Park, Norwich, United Kingdom.,SRI International, Menlo Park, CA
| | - Alexandre V Morozov
- Department of Physics & Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, NJ
| |
Collapse
|
7
|
The relation between crosstalk and gene regulation form revisited. PLoS Comput Biol 2020; 16:e1007642. [PMID: 32097416 PMCID: PMC7059967 DOI: 10.1371/journal.pcbi.1007642] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2019] [Revised: 03/06/2020] [Accepted: 01/08/2020] [Indexed: 01/11/2023] Open
Abstract
Genes differ in the frequency at which they are expressed and in the form of regulation used to control their activity. In particular, positive or negative regulation can lead to activation of a gene in response to an external signal. Previous works proposed that the form of regulation of a gene correlates with its frequency of usage: positive regulation when the gene is frequently expressed and negative regulation when infrequently expressed. Such network design means that, in the absence of their regulators, the genes are found in their least required activity state, hence regulatory intervention is often necessary. Due to the multitude of genes and regulators, spurious binding and unbinding events, called “crosstalk”, could occur. To determine how the form of regulation affects the global crosstalk in the network, we used a mathematical model that includes multiple regulators and multiple target genes. We found that crosstalk depends non-monotonically on the availability of regulators. Our analysis showed that excess use of regulation entailed by the formerly suggested network design caused high crosstalk levels in a large part of the parameter space. We therefore considered the opposite ‘idle’ design, where the default unregulated state of genes is their frequently required activity state. We found, that ‘idle’ design minimized the use of regulation and thus minimized crosstalk. In addition, we estimated global crosstalk of S. cerevisiae using transcription factors binding data. We demonstrated that even partial network data could suffice to estimate its global crosstalk, suggesting its applicability to additional organisms. We found that S. cerevisiae estimated crosstalk is lower than that of a random network, suggesting that natural selection reduces crosstalk. In summary, our study highlights a new type of protein production cost which is typically overlooked: that of regulatory interference caused by the presence of excess regulators in the cell. It demonstrates the importance of whole-network descriptions, which could show effects missed by single-gene models. Genes differ in the frequency at which they are expressed and in the form of regulation used to control their activity. The basic level of regulation is mediated by different types of DNA-binding proteins, where each type regulates particular gene(s). We distinguish between two basic forms of regulation: positive—if a gene is activated by the binding of its regulatory protein, and negative—if it is active unless bound by its regulatory protein. Due to the multitude of genes and regulators, spurious binding and unbinding events, called “crosstalk”, could occur. How does the form of regulation, positive or negative, affect the extent of regulatory crosstalk? To address this question, we used a mathematical model integrating many genes and many regulators. As intuition suggests, we found that in most of the parameter space, crosstalk increased with the availability of regulators. We propose, that crosstalk is usually reduced when networks are designed such that minimal regulation is needed, which we call the ‘idle’ design. In other words: a frequently needed gene will use negative regulation and conversely, a scarcely needed gene will employ positive regulation. In both cases, the requirement for the regulators is minimized. In addition, we demonstrate how crosstalk can be calculated from available datasets and discuss the technical challenges in such calculation, specifically data incompleteness.
Collapse
|
8
|
Frochaux MV, Bou Sleiman M, Gardeux V, Dainese R, Hollis B, Litovchenko M, Braman VS, Andreani T, Osman D, Deplancke B. cis-regulatory variation modulates susceptibility to enteric infection in the Drosophila genetic reference panel. Genome Biol 2020; 21:6. [PMID: 31948474 PMCID: PMC6966807 DOI: 10.1186/s13059-019-1912-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Resistance to enteric pathogens is a complex trait at the crossroads of multiple biological processes. We have previously shown in the Drosophila Genetic Reference Panel (DGRP) that resistance to infection is highly heritable, but our understanding of how the effects of genetic variants affect different molecular mechanisms to determine gut immunocompetence is still limited. RESULTS To address this, we perform a systems genetics analysis of the gut transcriptomes from 38 DGRP lines that were orally infected with Pseudomonas entomophila. We identify a large number of condition-specific, expression quantitative trait loci (local-eQTLs) with infection-specific ones located in regions enriched for FOX transcription factor motifs. By assessing the allelic imbalance in the transcriptomes of 19 F1 hybrid lines from a large round robin design, we independently attribute a robust cis-regulatory effect to only 10% of these detected local-eQTLs. However, additional analyses indicate that many local-eQTLs may act in trans instead. Comparison of the transcriptomes of DGRP lines that were either susceptible or resistant to Pseudomonas entomophila infection reveals nutcracker as the only differentially expressed gene. Interestingly, we find that nutcracker is linked to infection-specific eQTLs that correlate with its expression level and to enteric infection susceptibility. Further regulatory analysis reveals one particular eQTL that significantly decreases the binding affinity for the repressor Broad, driving differential allele-specific nutcracker expression. CONCLUSIONS Our collective findings point to a large number of infection-specific cis- and trans-acting eQTLs in the DGRP, including one common non-coding variant that lowers enteric infection susceptibility.
Collapse
Affiliation(s)
- Michael V. Frochaux
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Maroun Bou Sleiman
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Current Address: Laboratory of Integrative Systems Physiology, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Vincent Gardeux
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Riccardo Dainese
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Brian Hollis
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Current Address: Department of Biological Sciences, University of South Carolina, Columbia, South Carolina USA
| | - Maria Litovchenko
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Virginie S. Braman
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Tommaso Andreani
- Computational Biology and Data Mining Group, Institute of Molecular Biology, Johannes Gutenberg-Universität Mainz, Mainz, Germany
| | - Dani Osman
- Faculty of Sciences III and Azm Center for Research in Biotechnology and its Applications, LBA3B, EDST, Lebanese University, Tripoli, 1300 Lebanon
| | - Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| |
Collapse
|
9
|
Khatri BS, Goldstein RA. Biophysics and population size constrains speciation in an evolutionary model of developmental system drift. PLoS Comput Biol 2019; 15:e1007177. [PMID: 31335870 PMCID: PMC6677325 DOI: 10.1371/journal.pcbi.1007177] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2018] [Revised: 08/02/2019] [Accepted: 06/13/2019] [Indexed: 02/06/2023] Open
Abstract
Developmental system drift is a likely mechanism for the origin of hybrid incompatibilities between closely related species. We examine here the detailed mechanistic basis of hybrid incompatibilities between two allopatric lineages, for a genotype-phenotype map of developmental system drift under stabilising selection, where an organismal phenotype is conserved, but the underlying molecular phenotypes and genotype can drift. This leads to number of emergent phenomenon not obtainable by modelling genotype or phenotype alone. Our results show that: 1) speciation is more rapid at smaller population sizes with a characteristic, Orr-like, power law, but at large population sizes slow, characterised by a sub-diffusive growth law; 2) the molecular phenotypes under weakest selection contribute to the earliest incompatibilities; and 3) pair-wise incompatibilities dominate over higher order, contrary to previous predictions that the latter should dominate. The population size effect we find is consistent with previous results on allopatric divergence of transcription factor-DNA binding, where smaller populations have common ancestors with a larger drift load because genetic drift favours phenotypes which have a larger number of genotypes (higher sequence entropy) over more fit phenotypes which have far fewer genotypes; this means less substitutions are required in either lineage before incompatibilities arise. Overall, our results indicate that biophysics and population size provide a much stronger constraint to speciation than suggested by previous models, and point to a general mechanistic principle of how incompatibilities arise the under stabilising selection for an organismal phenotype. The process of speciation is of fundamental importance to the field of evolution as it is intimately connected to understanding the immense bio-diversity of life. There is still relatively little understanding of the underlying genetic mechanisms that give rise to hybrid incompatibilities with results suggesting that divergence in transcription factor DNA binding and gene expression play an important role. A key finding from the field of evo-devo is that organismal phenotypes show developmental system drift, where species maintain the same phenotype, but diverge in developmental pathways; this is an important potential source of hybrid incompatibilities. Here, we explore a theoretical framework to understand how incompatibilities arise due to developmental system drift, using a tractable biophysically inspired genotype-phenotype for spatial gene expression. Modelling the evolution of phenotypes in this way has the key advantage that it mirrors how selection works in nature, i.e. that selection acts on phenotypes, but variation (mutation) arise at the level of genotypes. This results, as we demonstrate, in a number of non-trivial and testable predictions concerning speciation due to developmental system drift, which would not be obtainable by modelling evolution of genotypes or phenotypes alone.
Collapse
Affiliation(s)
| | - Richard A. Goldstein
- Division of Infection & Immunity, University College London, London, United Kingdom
| |
Collapse
|
10
|
Gilpin W, Feldman MW. Cryptic selection forces and dynamic heritability in generalized phenotypic evolution. Theor Popul Biol 2018; 125:20-29. [PMID: 30528351 DOI: 10.1016/j.tpb.2018.11.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Revised: 11/10/2018] [Accepted: 11/14/2018] [Indexed: 11/26/2022]
Abstract
Individuals with different phenotypes can have widely-varying responses to natural selection, yet many classical approaches to evolutionary dynamics emphasize only how a population's average phenotype increases in fitness over time. However, recent experimental results have produced examples of populations that have multiple fitness peaks, or that experience frequency-dependence that affects the direction and strength of selection on certain individuals. Here, we extend classical fitness gradient formulations of natural selection in order to describe the dynamics of a phenotype distribution in terms of its moments-such as the mean, variance, and skewness. The number of governing equations in our model can be adjusted in order to capture different degrees of detail about the population. We compare our simplified model to direct Wright-Fisher simulations of evolution in several canonical fitness landscapes, and we find that our model provides a low-dimensional description of complex dynamics not typically explained by classical theory, such as cryptic selection forces due to selection on trait ranges, time-variation of the heritability, and nonlinear responses to stabilizing or disruptive selection due to asymmetric trait distributions. In addition to providing a framework for extending general understanding of common qualitative concepts in phenotypic evolution - such as fitness gradients, selection pressures, and heritability - our approach has practical importance for studying evolution in contexts in which genetic analysis is infeasible.
Collapse
Affiliation(s)
- William Gilpin
- Department of Applied Physics, Stanford University, Stanford, CA, United States.
| | - Marcus W Feldman
- Department of Biology, Stanford University, Stanford, CA, United States
| |
Collapse
|
11
|
Otwinowski J. Biophysical Inference of Epistasis and the Effects of Mutations on Protein Stability and Function. Mol Biol Evol 2018; 35:2345-2354. [PMID: 30085303 PMCID: PMC6188545 DOI: 10.1093/molbev/msy141] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Understanding the relationship between protein sequence, function, and stability is a fundamental problem in biology. The essential function of many proteins that fold into a specific structure is their ability to bind to a ligand, which can be assayed for thousands of mutated variants. However, binding assays do not distinguish whether mutations affect the stability of the binding interface or the overall fold. Here, we introduce a statistical method to infer a detailed energy landscape of how a protein folds and binds to a ligand by combining information from many mutated variants. We fit a thermodynamic model describing the bound, unbound, and unfolded states to high quality data of protein G domain B1 binding to IgG-Fc. We infer distinct folding and binding energies for each mutation providing a detailed view of how mutations affect binding and stability across the protein. We accurately infer the folding energy of each variant in physical units, validated by independent data, whereas previous high-throughput methods could only measure indirect changes in stability. While we assume an additive sequence-energy relationship, the binding fraction is epistatic due its nonlinear relation to energy. Despite having no epistasis in energy, our model explains much of the observed epistasis in binding fraction, with the remaining epistasis identifying conformationally dynamic regions.
Collapse
Affiliation(s)
- Jakub Otwinowski
- Biology Department, University of Pennsylvania, Philadelphia, PA
| |
Collapse
|
12
|
Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding. Proc Natl Acad Sci U S A 2018; 115:E3702-E3711. [PMID: 29588420 PMCID: PMC5910820 DOI: 10.1073/pnas.1715888115] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Transcription factors (TFs) are primary regulators of gene expression in cells, where they bind specific genomic target sites to control transcription. Quantitative measurements of TF-DNA binding energies can improve the accuracy of predictions of TF occupancy and downstream gene expression in vivo and shed light on how transcriptional networks are rewired throughout evolution. Here, we present a sequencing-based TF binding assay and analysis pipeline (BET-seq, for Binding Energy Topography by sequencing) capable of providing quantitative estimates of binding energies for more than one million DNA sequences in parallel at high energetic resolution. Using this platform, we measured the binding energies associated with all possible combinations of 10 nucleotides flanking the known consensus DNA target interacting with two model yeast TFs, Pho4 and Cbf1. A large fraction of these flanking mutations change overall binding energies by an amount equal to or greater than consensus site mutations, suggesting that current definitions of TF binding sites may be too restrictive. By systematically comparing estimates of binding energies output by deep neural networks (NNs) and biophysical models trained on these data, we establish that dinucleotide (DN) specificities are sufficient to explain essentially all variance in observed binding behavior, with Cbf1 binding exhibiting significantly more nonadditivity than Pho4. NN-derived binding energies agree with orthogonal biochemical measurements and reveal that dynamically occupied sites in vivo are both energetically and mutationally distant from the highest affinity sites.
Collapse
|
13
|
Gibert JM, Blanco J, Dolezal M, Nolte V, Peronnet F, Schlötterer C. Strong epistatic and additive effects of linked candidate SNPs for Drosophila pigmentation have implications for analysis of genome-wide association studies results. Genome Biol 2017; 18:126. [PMID: 28673357 PMCID: PMC5496195 DOI: 10.1186/s13059-017-1262-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2017] [Accepted: 06/19/2017] [Indexed: 01/01/2023] Open
Abstract
Background The mapping resolution of genome-wide association studies (GWAS) is limited by historic recombination events and effects are often assigned to haplotype blocks rather than individual SNPs. It is not clear how many of the SNPs in the block, and which ones, are causative. Drosophila pigmentation is a powerful model to dissect the genetic basis of intra-specific and inter-specific phenotypic variation. Three tightly linked SNPs in the t-MSE enhancer have been identified in three D. melanogaster populations as major contributors to female abdominal pigmentation. This enhancer controls the expression of the pigmentation gene tan (t) in the abdominal epidermis. Two of the three SNPs were confirmed in an independent study using the D. melanogaster Genetic Reference Panel established from a North American population. Results We determined the functional impact of SNP1, SNP2, and SNP3 using transgenic lines to test all possible haplotypes in vivo. We show that all three candidate SNPs contribute to female Drosophila abdominal pigmentation. Interestingly, only two SNPs agree with the effect predicted by GWAS; the third one goes in the opposite direction because of linkage disequilibrium between multiple functional SNPs. Our experimental design uncovered strong additive effects for the three SNPs, but we also found significant epistatic effects explaining up to 11% of the total variation. Conclusions Our results suggest that linked causal variants are important for the interpretation of GWAS and functional validation is needed to understand the genetic architecture of traits. Electronic supplementary material The online version of this article (doi:10.1186/s13059-017-1262-7) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Jean-Michel Gibert
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie du Développement Paris Seine-Institut de Biologie Paris Seine (LBD-IBPS), case 24, 9 quai St-Bernard, 75005, Paris, France
| | - Jorge Blanco
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, 1210, Wien, Austria
| | - Marlies Dolezal
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, 1210, Wien, Austria
| | - Viola Nolte
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, 1210, Wien, Austria
| | - Frédérique Peronnet
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie du Développement Paris Seine-Institut de Biologie Paris Seine (LBD-IBPS), case 24, 9 quai St-Bernard, 75005, Paris, France
| | - Christian Schlötterer
- Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, 1210, Wien, Austria.
| |
Collapse
|
14
|
Teufel AI, Wilke CO. Accelerated simulation of evolutionary trajectories in origin-fixation models. J R Soc Interface 2017; 14:20160906. [PMID: 28228542 PMCID: PMC5332577 DOI: 10.1098/rsif.2016.0906] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2016] [Accepted: 01/31/2017] [Indexed: 11/12/2022] Open
Abstract
We present an accelerated algorithm to forward-simulate origin-fixation models. Our algorithm requires, on average, only about two fitness evaluations per fixed mutation, whereas traditional algorithms require, per one fixed mutation, a number of fitness evaluations of the order of the effective population size, Ne Our accelerated algorithm yields the exact same steady state as the original algorithm but produces a different order of fixed mutations. By comparing several relevant evolutionary metrics, such as the distribution of fixed selection coefficients and the probability of reversion, we find that the two algorithms behave equivalently in many respects. However, the accelerated algorithm yields less variance in fixed selection coefficients. Notably, we are able to recover the expected amount of variance by rescaling population size, and we find a linear relationship between the rescaled population size and the population size used by the original algorithm. Considering the widespread usage of origin-fixation simulations across many areas of evolutionary biology, we introduce our accelerated algorithm as a useful tool for increasing the computational complexity of fitness functions without sacrificing much in terms of accuracy of the evolutionary simulation.
Collapse
Affiliation(s)
- Ashley I Teufel
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78712, USA
| | - Claus O Wilke
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78712, USA
| |
Collapse
|
15
|
A thousand empirical adaptive landscapes and their navigability. Nat Ecol Evol 2017; 1:45. [PMID: 28812623 DOI: 10.1038/s41559-016-0045] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2016] [Accepted: 12/05/2016] [Indexed: 01/22/2023]
Abstract
The adaptive landscape is an iconic metaphor that pervades evolutionary biology. It was mostly applied in theoretical models until recent years, when empirical data began to allow partial landscape reconstructions. Here, we exhaustively analyse 1,137 complete landscapes from 129 eukaryotic species, each describing the binding affinity of a transcription factor to all possible short DNA sequences. We find that the navigability of these landscapes through single mutations is intermediate to that of additive and shuffled null models, suggesting that binding affinity-and thereby gene expression-is readily fine-tuned via mutations in transcription factor binding sites. The landscapes have few peaks that vary in their accessibility and in the number of sequences they contain. Binding sites in the mouse genome are enriched in sequences found in the peaks of especially navigable landscapes and the genetic diversity of binding sites in yeast increases with the number of sequences in a peak. Our findings suggest that landscape navigability may have contributed to the enormous success of transcriptional regulation as a source of evolutionary adaptations and innovations.
Collapse
|
16
|
Dresch JM, Zellers RG, Bork DK, Drewell RA. Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome. GENE REGULATION AND SYSTEMS BIOLOGY 2016; 10:21-33. [PMID: 27330274 PMCID: PMC4907338 DOI: 10.4137/grsb.s38462] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2016] [Revised: 04/17/2016] [Accepted: 04/28/2016] [Indexed: 01/14/2023]
Abstract
A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development.
Collapse
Affiliation(s)
- Jacqueline M. Dresch
- Department of Mathematics and Computer Science, Clark University, Worcester, MA, USA
| | - Rowan G. Zellers
- Computer Science Department, Harvey Mudd College, Claremont, CA, USA
- Mathematics Department, Harvey Mudd College, Claremont, CA, USA
| | - Daniel K. Bork
- Computer Science Department, Harvey Mudd College, Claremont, CA, USA
- Mathematics Department, Harvey Mudd College, Claremont, CA, USA
| | | |
Collapse
|
17
|
Tuğrul M, Paixão T, Barton NH, Tkačik G. Dynamics of Transcription Factor Binding Site Evolution. PLoS Genet 2015; 11:e1005639. [PMID: 26545200 PMCID: PMC4636380 DOI: 10.1371/journal.pgen.1005639] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2015] [Accepted: 10/09/2015] [Indexed: 11/19/2022] Open
Abstract
Evolution of gene regulation is crucial for our understanding of the phenotypic differences between species, populations and individuals. Sequence-specific binding of transcription factors to the regulatory regions on the DNA is a key regulatory mechanism that determines gene expression and hence heritable phenotypic variation. We use a biophysical model for directional selection on gene expression to estimate the rates of gain and loss of transcription factor binding sites (TFBS) in finite populations under both point and insertion/deletion mutations. Our results show that these rates are typically slow for a single TFBS in an isolated DNA region, unless the selection is extremely strong. These rates decrease drastically with increasing TFBS length or increasingly specific protein-DNA interactions, making the evolution of sites longer than ∼ 10 bp unlikely on typical eukaryotic speciation timescales. Similarly, evolution converges to the stationary distribution of binding sequences very slowly, making the equilibrium assumption questionable. The availability of longer regulatory sequences in which multiple binding sites can evolve simultaneously, the presence of “pre-sites” or partially decayed old sites in the initial sequence, and biophysical cooperativity between transcription factors, can all facilitate gain of TFBS and reconcile theoretical calculations with timescales inferred from comparative genomics. Evolution has produced a remarkable diversity of living forms that manifests in qualitative differences as well as quantitative traits. An essential factor that underlies this variability is transcription factor binding sites, short pieces of DNA that control gene expression levels. Nevertheless, we lack a thorough theoretical understanding of the evolutionary times required for the appearance and disappearance of these sites. By combining a biophysically realistic model for how cells read out information in transcription factor binding sites with model for DNA sequence evolution, we explore these timescales and ask what factors crucially affect them. We find that the emergence of binding sites from a random sequence is generically slow under point and insertion/deletion mutational mechanisms. Strong selection, sufficient genomic sequence in which the sites can evolve, the existence of partially decayed old binding sites in the sequence, as well as certain biophysical mechanisms such as cooperativity, can accelerate the binding site gain times and make them consistent with the timescales suggested by comparative analyses of genomic data.
Collapse
Affiliation(s)
- Murat Tuğrul
- Institute of Science and Technology Austria, Klosterneuburg, Austria
- * E-mail:
| | - Tiago Paixão
- Institute of Science and Technology Austria, Klosterneuburg, Austria
| | | | - Gašper Tkačik
- Institute of Science and Technology Austria, Klosterneuburg, Austria
| |
Collapse
|
18
|
Prindull G. Potential Gene Interactions in the Cell Cycles of Gametes, Zygotes, Embryonic Stem Cells and the Development of Cancer. Front Oncol 2015; 5:200. [PMID: 26442212 PMCID: PMC4585297 DOI: 10.3389/fonc.2015.00200] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2014] [Accepted: 08/31/2015] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVES This review is to explore whether potential gene interactions in the cell cycles of gametes, zygotes, and embryonic stem (ES) cells are associated with the development of cancer. METHODS MEDPILOT at the Central Library of the University of Cologne, Germany (Zentralbibliothek Köln) that covers 5,800 international medical journals and 4,300 E-journals was used to collect data. The initial searches were done in December 2012 and additional searches in October 2013-May 2015. The search terms included "cancer development," "gene interaction," and "ES cells," and the time period was between 1998 and 2015. A total of 147 articles in English language only were included in this review. RESULTS Transgenerational gene translation is implemented in the zygote through interactions of epigenetic isoforms of transcription factors (TFs) from parental gametes, predominantly during the first two zygote cleavages. Pluripotent transcription factors may provide interacting links with mutated genes during zygote-to-ES cell switches. Translation of post-transcriptional carcinogenic genes is implemented by abnormally spliced, tumor-specific isoforms of gene-encoded mRNA/non-coding RNA variants of TFs employing de novo gene synthesis and neofunctionalization. Post-translationally, mutated genes are preserved in pre-neoplastic ES cell subpopulations that can give rise to overt cancer stem cells. Thus, TFs operate as cell/disease-specific epigenetic messengers triggering clinical expression of neoplasms. CONCLUSION Potential gene interactions in the cell cycle of gametes, zygotes, and ES cells may play some roles in the development of cancer.
Collapse
Affiliation(s)
- Gregor Prindull
- Medical Faculty, University of Göttingen , Göttingen , Germany
| |
Collapse
|
19
|
Simple Biophysical Model Predicts Faster Accumulation of Hybrid Incompatibilities in Small Populations Under Stabilizing Selection. Genetics 2015; 201:1525-37. [PMID: 26434721 PMCID: PMC4676520 DOI: 10.1534/genetics.115.181685] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2015] [Accepted: 09/23/2015] [Indexed: 01/07/2023] Open
Abstract
Speciation is fundamental to the process of generating the huge diversity of life on Earth. However, we are yet to have a clear understanding of its molecular-genetic basis. Here, we examine a computational model of reproductive isolation that explicitly incorporates a map from genotype to phenotype based on the biophysics of protein–DNA binding. In particular, we model the binding of a protein transcription factor to a DNA binding site and how their independent coevolution, in a stabilizing fitness landscape, of two allopatric lineages leads to incompatibilities. Complementing our previous coarse-grained theoretical results, our simulations give a new prediction for the monomorphic regime of evolution that smaller populations should develop incompatibilities more quickly. This arises as (1) smaller populations have a greater initial drift load, as there are more sequences that bind poorly than well, so fewer substitutions are needed to reach incompatible regions of phenotype space, and (2) slower divergence when the population size is larger than the inverse of discrete differences in fitness. Further, we find longer sequences develop incompatibilities more quickly at small population sizes, but more slowly at large population sizes. The biophysical model thus represents a robust mechanism of rapid reproductive isolation for small populations and large sequences that does not require peak shifts or positive selection. Finally, we show that the growth of DMIs with time is quadratic for small populations, agreeing with Orr’s model, but nonpower law for large populations, with a form consistent with our previous theoretical results.
Collapse
|
20
|
Khatri BS, Goldstein RA. A coarse-grained biophysical model of sequence evolution and the population size dependence of the speciation rate. J Theor Biol 2015; 378:56-64. [PMID: 25936759 PMCID: PMC4457359 DOI: 10.1016/j.jtbi.2015.04.027] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2014] [Revised: 02/20/2015] [Accepted: 04/20/2015] [Indexed: 11/29/2022]
Abstract
Speciation is fundamental to understanding the huge diversity of life on Earth. Although still controversial, empirical evidence suggests that the rate of speciation is larger for smaller populations. Here, we explore a biophysical model of speciation by developing a simple coarse-grained theory of transcription factor-DNA binding and how their co-evolution in two geographically isolated lineages leads to incompatibilities. To develop a tractable analytical theory, we derive a Smoluchowski equation for the dynamics of binding energy evolution that accounts for the fact that natural selection acts on phenotypes, but variation arises from mutations in sequences; the Smoluchowski equation includes selection due to both gradients in fitness and gradients in sequence entropy, which is the logarithm of the number of sequences that correspond to a particular binding energy. This simple consideration predicts that smaller populations develop incompatibilities more quickly in the weak mutation regime; this trend arises as sequence entropy poises smaller populations closer to incompatible regions of phenotype space. These results suggest a generic coarse-grained approach to evolutionary stochastic dynamics, allowing realistic modelling at the phenotypic level.
Collapse
Affiliation(s)
- Bhavin S Khatri
- The Francis Crick Institute, Mill Hill Laboratory, The Ridgeway, London NW7 1AA, UK; Division of Infection & Immunity, University College London, London WC1E 6BT, UK.
| | - Richard A Goldstein
- Division of Infection & Immunity, University College London, London WC1E 6BT, UK.
| |
Collapse
|
21
|
Manhart M, Morozov AV. Scaling properties of evolutionary paths in a biophysical model of protein adaptation. Phys Biol 2015; 12:045001. [PMID: 26020812 DOI: 10.1088/1478-3975/12/4/045001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
The enormous size and complexity of genotypic sequence space frequently requires consideration of coarse-grained sequences in empirical models. We develop scaling relations to quantify the effect of this coarse-graining on properties of fitness landscapes and evolutionary paths. We first consider evolution on a simple Mount Fuji fitness landscape, focusing on how the length and predictability of evolutionary paths scale with the coarse-grained sequence length and alphabet. We obtain simple scaling relations for both the weak- and strong-selection limits, with a non-trivial crossover regime at intermediate selection strengths. We apply these results to evolution on a biophysical fitness landscape that describes how proteins evolve new binding interactions while maintaining their folding stability. We combine the scaling relations with numerical calculations for coarse-grained protein sequences to obtain quantitative properties of the model for realistic binding interfaces and a full amino acid alphabet.
Collapse
Affiliation(s)
- Michael Manhart
- Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA
| | | |
Collapse
|
22
|
Evolutionary meandering of intermolecular interactions along the drift barrier. Proc Natl Acad Sci U S A 2014; 112:E30-8. [PMID: 25535374 DOI: 10.1073/pnas.1421641112] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Many cellular functions depend on highly specific intermolecular interactions, for example transcription factors and their DNA binding sites, microRNAs and their RNA binding sites, the interfaces between heterodimeric protein molecules, the stems in RNA molecules, and kinases and their response regulators in signal-transduction systems. Despite the need for complementarity between interacting partners, such pairwise systems seem to be capable of high levels of evolutionary divergence, even when subject to strong selection. Such behavior is a consequence of the diminishing advantages of increasing binding affinity between partners, the multiplicity of evolutionary pathways between selectively equivalent alternatives, and the stochastic nature of evolutionary processes. Because mutation pressure toward reduced affinity conflicts with selective pressure for greater interaction, situations can arise in which the expected distribution of the degree of matching between interacting partners is bimodal, even in the face of constant selection. Although biomolecules with larger numbers of interacting partners are subject to increased levels of evolutionary conservation, their more numerous partners need not converge on a single sequence motif or be increasingly constrained in more complex systems. These results suggest that most phylogenetic differences in the sequences of binding interfaces are not the result of adaptive fine tuning but a simple consequence of random genetic drift.
Collapse
|