1
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION: The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models, but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into the likelihood functions that are commonly used for substitution model selection. RESULTS: We present a method to perform selection among site-dependent SCS models, and between empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach, together with its implementation in the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families, observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION: ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
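The general idea of ABC model selection by rejection (without regression adjustment) can be sketched in a few lines. This is a toy illustration only and is not the ProteinModelerABC implementation: the two "models" `A` and `B`, the Gaussian simulator, the single summary statistic, the tolerance, and all function names are hypothetical stand-ins for substitution models and alignment summary statistics. Each model is simulated the same number of times (equal prior), and the posterior probability of a model is approximated by its share of the accepted simulations.

```python
import random
import statistics


def simulate_summary(model, rng, n_sites=20):
    """Toy simulator (assumption): returns one summary statistic per simulated dataset."""
    mean = {"A": 0.0, "B": 3.0}[model]  # hypothetical model-specific parameter
    draws = [rng.gauss(mean, 1.0) for _ in range(n_sites)]
    return statistics.fmean(draws)


def abc_model_posteriors(observed, models, n_sims=200, tolerance=0.5, seed=1):
    """Rejection ABC: keep simulations whose summary lies within `tolerance` of the observed one."""
    rng = random.Random(seed)
    accepted = {m: 0 for m in models}
    for m in models:  # same number of simulations per model = equal prior
        for _ in range(n_sims):
            if abs(simulate_summary(m, rng) - observed) <= tolerance:
                accepted[m] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase tolerance or n_sims")
    # Posterior model probability is approximated by the share of accepted simulations.
    return {m: accepted[m] / total for m in models}


posteriors = abc_model_posteriors(observed=2.9, models=["A", "B"])
print(posteriors)
```

In a real analysis the simulator would generate protein alignments under each candidate substitution model, and several summary statistics would be compared rather than one.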
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
2
Lamolle G, Simón D, Iriarte A, Musto H. Main Factors Shaping Amino Acid Usage Across Evolution. J Mol Evol 2023:10.1007/s00239-023-10120-5. [PMID: 37264211 DOI: 10.1007/s00239-023-10120-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 05/17/2023] [Indexed: 06/03/2023]
Abstract
The standard genetic code determines that in most species, including viruses, there are 20 amino acids that are coded by 61 codons, while the other three codons are stop triplets. Considering the whole proteome, each species features its own amino acid frequencies; given the slow rate of change, closely related species display similar GC content and amino acid usage. In contrast, distantly related species display different amino acid frequencies. Furthermore, within certain multicellular species, such as mammals, intragenomic differences in the usage of amino acids are evident. In this communication, we summarize some of the most prominent and well-established factors that determine the differences found in amino acid usage, both across evolution and intragenomically.
Affiliation(s)
- Guillermo Lamolle
  - Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Diego Simón
  - Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
  - Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
  - Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Andrés Iriarte
  - Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
  - Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de La República, Montevideo, Uruguay
- Héctor Musto
  - Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
3
Latrille T, Rodrigue N, Lartillot N. Genes and sites under adaptation at the phylogenetic scale also exhibit adaptation at the population-genetic scale. Proc Natl Acad Sci U S A 2023; 120:e2214977120. [PMID: 36897968 PMCID: PMC10089192 DOI: 10.1073/pnas.2214977120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/26/2022] [Accepted: 02/11/2023] [Indexed: 03/12/2023] Open
Abstract
Adaptation in protein-coding sequences can be detected from multiple sequence alignments across species or alternatively by leveraging polymorphism data within a population. Across species, quantification of the adaptive rate relies on phylogenetic codon models, classically formulated in terms of the ratio of nonsynonymous over synonymous substitution rates. Evidence of an accelerated nonsynonymous substitution rate is considered a signature of pervasive adaptation. However, because of the background of purifying selection, these models are potentially limited in their sensitivity. Recent developments have led to more sophisticated mutation-selection codon models aimed at making a more detailed quantitative assessment of the interplay between mutation, purifying, and positive selection. In this study, we conducted a large-scale exome-wide analysis of placental mammals with mutation-selection models, assessing their performance at detecting proteins and sites under adaptation. Importantly, mutation-selection codon models are based on a population-genetic formalism and thus are directly comparable to the McDonald and Kreitman test at the population level to quantify adaptation. Taking advantage of this relationship between phylogenetic and population genetics analyses, we integrated divergence and polymorphism data across the entire exome for 29 populations across 7 genera and showed that proteins and sites detected to be under adaptation at the phylogenetic scale are also under adaptation at the population-genetic scale. Altogether, our exome-wide analysis shows that phylogenetic mutation-selection codon models and the population-genetic test of adaptation can be reconciled and are congruent, paving the way for integrative models and analyses across individuals and populations.
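For readers unfamiliar with the McDonald and Kreitman framework mentioned above, the sketch below computes two standard quantities derived from its 2×2 table of nonsynonymous/synonymous divergence (Dn, Ds) and polymorphism (Pn, Ps) counts: the proportion of adaptive substitutions, α = 1 − (Ds·Pn)/(Dn·Ps), and the neutrality index. Whether the study uses exactly this estimator is an assumption here; the function names and the counts are hypothetical and chosen only for illustration.

```python
def mk_alpha(dn, ds, pn, ps):
    """Proportion of adaptive nonsynonymous substitutions: alpha = 1 - (Ds*Pn)/(Dn*Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


def neutrality_index(dn, ds, pn, ps):
    """Neutrality index: NI = (Pn/Ps) / (Dn/Ds); values below 1 suggest positive selection."""
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero")
    return (pn * ds) / (ps * dn)


# Hypothetical counts, not taken from any dataset.
alpha = mk_alpha(dn=20, ds=40, pn=10, ps=40)
ni = neutrality_index(dn=20, ds=40, pn=10, ps=40)
print(alpha, ni)
```

With these definitions NI = 1 − α, so the two statistics carry the same information on different scales.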
Affiliation(s)
- Thibault Latrille
  - Université de Lyon, Université Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Evolutive, UMR5558, 69100 Villeurbanne, France
  - École Normale Supérieure de Lyon, Université de Lyon, 69342 Lyon, France
  - Department of Computational Biology, Université de Lausanne, 1015 Lausanne, Switzerland
- Nicolas Rodrigue
  - Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, K1S 5B6 Ottawa, Canada
- Nicolas Lartillot
  - Université de Lyon, Université Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Evolutive, UMR5558, 69100 Villeurbanne, France
4
Diaz DJ, Kulikova AV, Ellington AD, Wilke CO. Using machine learning to predict the effects and consequences of mutations in proteins. Curr Opin Struct Biol 2023; 78:102518. [PMID: 36603229 PMCID: PMC9908841 DOI: 10.1016/j.sbi.2022.102518] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/09/2022] [Revised: 11/07/2022] [Accepted: 11/20/2022] [Indexed: 01/05/2023]
Abstract
Machine and deep learning approaches can leverage the increasingly available massive datasets of protein sequences, structures, and mutational effects to predict variants with improved fitness. Many different approaches are being developed, but systematic benchmarking studies indicate that even though the specifics of the machine learning algorithms matter, the more important constraint comes from the data availability and quality utilized during training. In cases where little experimental data are available, unsupervised and self-supervised pre-training with generic protein datasets can still perform well after subsequent refinement via hybrid or transfer learning approaches. Overall, recent progress in this field has been staggering, and machine learning approaches will likely play a major role in future breakthroughs in protein biochemistry and engineering.
Affiliation(s)
- Daniel J Diaz
  - Department of Chemistry, The University of Texas at Austin, 105 E 24th St., Austin, 78712, Texas, USA
  - Department of Molecular Biosciences, The University of Texas at Austin, 100 East 24th St., Stop A5000, Austin, 78712, Texas, USA
  - https://twitter.com/aiproteins
- Anastasiya V Kulikova
  - Department of Integrative Biology, The University of Texas at Austin, 2415 Speedway, Stop C0930, Austin, 78712, Texas, USA
- Andrew D Ellington
  - Department of Molecular Biosciences, The University of Texas at Austin, 100 East 24th St., Stop A5000, Austin, 78712, Texas, USA
  - https://twitter.com/CSSBatUT
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, 2415 Speedway, Stop C0930, Austin, 78712, Texas, USA
5
Susko E. Complex statistical modelling for phylogenetic inference. Can J Stat 2022. [DOI: 10.1002/cjs.11741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada B3H 3J5
6
Dyakin VV, Uversky VN. Arrow of Time, Entropy, and Protein Folding: Holistic View on Biochirality. Int J Mol Sci 2022; 23:ijms23073687. [PMID: 35409047 PMCID: PMC8998916 DOI: 10.3390/ijms23073687] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/08/2022] [Revised: 03/23/2022] [Accepted: 03/25/2022] [Indexed: 02/06/2023] Open
Abstract
Chirality is a universal phenomenon, embracing the space–time domains of non-organic and organic nature. The biological time arrow, evident in the aging of proteins and organisms, should be linked to the prevalent biomolecular chirality. This hypothesis drives our exploration of protein aging, in relation to the biological aging of an organism. Recent advances in the chirality discrimination methods and theoretical considerations of the non-equilibrium thermodynamics clarify the fundamental issues, concerning the biphasic, alternative, and stepwise changes in the conformational entropy associated with protein folding. Living cells represent open, non-equilibrium, self-organizing, and dissipative systems. The non-equilibrium thermodynamics of cell biology are determined by utilizing the energy stored, transferred, and released, via adenosine triphosphate (ATP). At the protein level, the synthesis of a homochiral polypeptide chain of L-amino acids (L-AAs) represents the first state in the evolution of the dynamic non-equilibrium state of the system. At the next step the non-equilibrium state of a protein-centric system is supported and amended by a broad set of posttranslational modifications (PTMs). The enzymatic phosphorylation, being the most abundant and ATP-driven form of PTMs, illustrates the principal significance of the energy-coupling, in maintaining and reshaping the system. However, the physiological functions of phosphorylation are under the permanent risk of being compromised by spontaneous racemization. Therefore, the major distinct steps in protein-centric aging include the biosynthesis of a polypeptide chain, protein folding assisted by the system of PTMs, and age-dependent spontaneous protein racemization and degradation. To the best of our knowledge, we are the first to pay attention to the biphasic, alternative, and stepwise changes in the conformational entropy of protein folding. 
The broader view on protein folding, including the impact of spontaneous racemization, will help in the goal-oriented experimental design in the field of chiral proteomics.
Affiliation(s)
- Victor V. Dyakin
  - Virtual Reality Perception Lab (VRPL), The Nathan S. Kline Institute for Psychiatric Research (NKI), 140 Old Orangeburg Road, Bldg. 35, Orangeburg, NY 10962, USA
- Vladimir N. Uversky
  - Department of Molecular Medicine, Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd., MDC07, Tampa, FL 33612, USA
7
Youssef N, Susko E, Roger AJ, Bielawski JP. Evolution of amino acid propensities under stability-mediated epistasis. Mol Biol Evol 2022; 39:6522130. [PMID: 35134997 PMCID: PMC8896634 DOI: 10.1093/molbev/msac030] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
Site-specific amino acid preferences are influenced by the genetic background of the protein. The preferences for resident amino acids are expected to, on average, increase over time because of replacements at other sites, a nonadaptive phenomenon referred to as the 'evolutionary Stokes shift'. Alternatively, decreases in resident amino acid propensity have recently been viewed as evidence of adaptations to external environmental changes. Using population genetics theory and thermodynamic stability constraints, we show that nonadaptive evolution can lead to both positive and negative shifts in propensities following the fixation of an amino acid, emphasizing that the detection of negative shifts is not conclusive evidence of adaptation. Considering shifts in propensities over windows between substitutions at a focal site, we find that following ≈50% of substitutions the propensity for the new resident amino acid decreases over time, and that positive and negative shifts were comparable in magnitude. Preferences were often conserved via a significant negative autocorrelation in propensity changes: increases in propensities were often followed by decreases, and vice versa. Lastly, we explore the underlying mechanisms that lead propensities to fluctuate. We observe that stabilizing replacements increase the mutational tolerance at a site and in doing so decrease the propensity for the resident amino acid. In contrast, destabilizing substitutions result in more rugged fitness landscapes that tend to favor the resident amino acid. In summary, our results characterize propensity trajectories under nonadaptive stability-constrained evolution against which evidence of adaptations should be calibrated.
Affiliation(s)
- Noor Youssef
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
- Andrew J Roger
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Joseph P Bielawski
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
8
Learning the local landscape of protein structures with convolutional neural networks. J Biol Phys 2021; 47:435-454. [PMID: 34751854 DOI: 10.1007/s10867-021-09593-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/17/2021] [Accepted: 10/18/2021] [Indexed: 10/19/2022] Open
Abstract
One fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding the site of interest. We find that the network can predict the wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely to be correct. Predictions of the consensus are less accurate and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and likely targets for protein engineering.
9
Echave J. Evolutionary coupling range varies widely among enzymes depending on selection pressure. Biophys J 2021; 120:4320-4324. [PMID: 34480927 DOI: 10.1016/j.bpj.2021.08.042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/31/2021] [Revised: 07/19/2021] [Accepted: 08/30/2021] [Indexed: 10/20/2022] Open
Abstract
Recent studies proposed that enzyme-active sites induce evolutionary constraints at long distances. The physical origin of such long-range evolutionary coupling is unknown. Here, I use a recent biophysical model of evolution to study the relationship between physical and evolutionary couplings on a diverse data set of monomeric enzymes. I show that evolutionary coupling is not universally long-range. Rather, range varies widely among enzymes, from 2 to 20 Å. Furthermore, the evolutionary coupling range of an enzyme does not inform on the underlying physical coupling, which is short range for all enzymes. Rather, evolutionary coupling range is determined by functional selection pressure.
Affiliation(s)
- Julian Echave
  - Instituto de Ciencias Físicas, Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, San Martín, Buenos Aires, Argentina
10
Youssef N, Susko E, Roger AJ, Bielawski JP. Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work. Protein Sci 2021; 30:2009-2028. [PMID: 34322924 PMCID: PMC8442975 DOI: 10.1002/pro.4161] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 11/08/2022]
Abstract
Amino acid preferences vary across sites and time. While variation across sites is widely accepted, the extent and frequency of temporal shifts are contentious. Our understanding of the drivers of amino acid preference change is incomplete: To what extent are temporal shifts driven by adaptive versus nonadaptive evolutionary processes? We review phenomena that cause preferences to vary (e.g., evolutionary Stokes shift, contingency, and entrenchment) and clarify how they differ. To determine the extent and prevalence of shifted preferences, we review experimental and theoretical studies. Analyses of natural sequence alignments often detect decreases in homoplasy (convergence and reversions) rates and variation in replacement rates with time, signals that are consistent with temporally changing preferences. While approaches inferring shifts in preferences from patterns in natural alignments are valuable, they are indirect since multiple mechanisms (both adaptive and nonadaptive) could lead to the observed signal. Alternatively, site-directed mutagenesis experiments allow for a more direct assessment of shifted preferences. They corroborate evidence from multiple sequence alignments, revealing that the preference for an amino acid at a site varies depending on the background sequence. However, shifts in preferences are usually minor in magnitude and sites with significantly shifted preferences are low in frequency. The small yet consistent perturbations in preferences could, nevertheless, jeopardize the accuracy of inference procedures that assume constant preferences. We conclude by discussing if and how such shifts in preferences might influence widely used time-homogeneous inference procedures and potential ways to mitigate such effects.
Affiliation(s)
- Noor Youssef
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J. Roger
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Joseph P. Bielawski
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
11
Latrille T, Lartillot N. Quantifying the impact of changes in effective population size and expression level on the rate of coding sequence evolution. Theor Popul Biol 2021; 142:57-66. [PMID: 34563555 DOI: 10.1016/j.tpb.2021.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2021] [Revised: 09/08/2021] [Accepted: 09/11/2021] [Indexed: 02/07/2023]
Abstract
Molecular sequences are shaped by selection, where the strength of selection relative to drift is determined by effective population size (Ne). Populations with high Ne are expected to undergo stronger purifying selection, and consequently to show a lower substitution rate for selected mutations relative to the substitution rate for neutral mutations (ω). However, computational models based on biophysics of protein stability have suggested that ω can also be independent of Ne. Together, the response of ω to changes in Ne depends on the specific mapping from sequence to fitness. Importantly, an increase in protein expression level has been found empirically to result in a decrease of ω, an observation predicted by theoretical models assuming selection for protein stability. Here, we derive a theoretical approximation for the response of ω to changes in Ne and expression level, under an explicit genotype-phenotype-fitness map. The method is generally valid for additive traits and log-concave fitness functions. We applied these results to proteins undergoing selection for their conformational stability and corroborated our findings with simulations under more complex models. We predict a weak response of ω to changes in either Ne or expression level, which are interchangeable. Based on empirical data, we propose that fitness based on conformational stability may not be a sufficient mechanism to explain the empirically observed variation in ω across species. Other aspects of protein biophysics might be explored, such as protein-protein interactions, which can lead to a stronger response of ω to changes in Ne.
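The classical expectation stated at the start of this abstract (higher Ne gives lower ω for deleterious mutations) can be illustrated with the standard weak-selection fixation-rate ratio S / (1 − e^(−S)), where S is the scaled selection coefficient. The sketch below is not the approximation derived in the paper: it assumes a single fixed selection coefficient `s`, the diploid scaling S = 4·Ne·s, and hypothetical parameter values. Under the stability-based fitness maps studied by the authors, the response of ω to Ne is predicted to be much weaker than in this toy case.

```python
import math


def relative_fixation_rate(S):
    """Substitution rate of a mutation with scaled selection coefficient S, relative to neutral."""
    if abs(S) < 1e-8:
        return 1.0  # neutral limit of S / (1 - exp(-S))
    if S < -700.0:
        return 0.0  # avoids overflow in exp(); the rate is vanishingly small here
    return S / (1.0 - math.exp(-S))


# Hypothetical values: one deleterious selection coefficient, three population sizes.
s = -1e-4
population_sizes = [1e3, 1e4, 1e5]
omegas = [relative_fixation_rate(4.0 * ne * s) for ne in population_sizes]
for ne, omega in zip(population_sizes, omegas):
    print(f"Ne = {ne:.0e}  omega = {omega:.3g}")
```

Because the selection coefficient is held constant here, increasing Ne simply increases |S|, which is exactly the assumption that a sequence-to-fitness map with compensatory dynamics can break.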
Affiliation(s)
- T Latrille
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
  - École Normale Supérieure de Lyon, Université de Lyon, Université Lyon 1, Lyon, France
- N Lartillot
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
12
Marcos ML, Echave J. The variation among sites of protein structure divergence is shaped by mutation and scaled by selection. Curr Res Struct Biol 2021; 2:156-163. [PMID: 34235475 PMCID: PMC8244499 DOI: 10.1016/j.crstbi.2020.08.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/17/2020] [Revised: 07/09/2020] [Accepted: 08/17/2020] [Indexed: 12/30/2022] Open
Abstract
Protein structures do not evolve uniformly; rather, the degree of structure divergence varies among sites. The resulting site-dependent structure divergence patterns emerge from a process that involves mutation and selection, which may both, in principle, influence the emergent pattern. In contrast with sequence divergence patterns, which are known to be mainly determined by selection, the relative contributions of mutation and selection to structure divergence patterns are unclear. Here, studying 6 protein families with a mechanistic biophysical model of protein evolution, we untangle the effects of mutation and selection. We found that even in the absence of selection, structure divergence varies from site to site because the mutational sensitivity is not uniform. Selection scales the profile, increasing its amplitude, without changing its shape. This scaling effect follows from the similarity between mutational sensitivity and sequence variability profiles. The degree of evolutionary divergence of protein structures varies among sites. A Mutation-Selection model (MSM) of protein structure evolution with selection for stability is developed. Even in the case of no selection, the sensitivity of the structure to random mutations varies among sites. Selection amplifies this variation but it does not affect its shape. This scaling effect of selection follows from the similarity between the selection-independent mutational sensitivity and the selection-dependent sequence divergence, the two contributions that are combined to produce the observed structural divergence profile.
Affiliation(s)
- María Laura Marcos
  - Instituto de Ciencias Físicas, Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
- Julian Echave
  - Instituto de Ciencias Físicas, Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
13
Latrille T, Lanore V, Lartillot N. Inferring long-term effective population size with Mutation-Selection Models. Mol Biol Evol 2021; 38:4573-4587. [PMID: 34191010 PMCID: PMC8476147 DOI: 10.1093/molbev/msab160] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022] Open
Abstract
Mutation–selection phylogenetic codon models are grounded in population genetics first principles and represent a principled approach for investigating the intricate interplay between mutation, selection, and drift. In their current form, mutation–selection codon models are entirely characterized by the collection of site-specific amino-acid fitness profiles. However, thus far, they have relied on the assumption of constant genetic drift, translating into a single effective population size (Ne) across the phylogeny, clearly an unrealistic assumption. This assumption can be relaxed by introducing variation in Ne between lineages. In addition to Ne, the mutation rate (μ) can also vary between lineages, and both should covary with life-history traits (LHTs). This suggests that the model should more globally account for the joint evolutionary process followed by all of these lineage-specific variables (Ne, μ, and LHTs). In this direction, we introduce an extended mutation–selection model jointly reconstructing in a Bayesian Monte Carlo framework the fitness landscape across sites and long-term trends in Ne, μ, and LHTs along the phylogeny, from an alignment of DNA coding sequences and a matrix of observed LHTs in extant species. The model was tested against simulated data and applied to empirical data in mammals, isopods, and primates. The reconstructed history of Ne in these groups appears to correlate with LHTs or ecological variables in a way that suggests that the reconstruction is reasonable, at least in its global trends. On the other hand, the range of variation in Ne inferred across species is surprisingly narrow. This last point suggests that some of the assumptions of the model, in particular concerning the assumed absence of epistatic interactions between sites, are potentially problematic.
Affiliation(s)
- T Latrille
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
  - École Normale Supérieure de Lyon, Université de Lyon, Université Lyon 1, Lyon, France
- V Lanore
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
- N Lartillot
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
14
Ritchie AM, Stark TL, Liberles DA. Inferring the number and position of changes in selective regime in a non-equilibrium mutation-selection framework. BMC Ecol Evol 2021; 21:39. [PMID: 33691618 PMCID: PMC7944921 DOI: 10.1186/s12862-021-01770-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2020] [Accepted: 02/25/2021] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND: Recovering the historical patterns of selection acting on a protein coding sequence is a major goal of evolutionary biology. Mutation-selection models address this problem by explicitly modelling fixation rates as a function of site-specific amino acid fitness values. However, they are restricted in their utility for investigating directional evolution because they require prior knowledge of the locations of fitness changes in the lineages of a phylogeny. RESULTS: We apply a modified mutation-selection methodology that relaxes assumptions of equilibrium and time-reversibility. Our implementation allows us to identify branches where adaptive or compensatory shifts in the fitness landscape have taken place, signalled by a change in amino acid fitness profiles. Through simulation and analysis of an empirical data set of β-lactamase genes, we test our ability to recover the position of adaptive events within the tree and successfully reconstruct initial codon frequencies and fitness profile parameters generated under the non-stationary model. CONCLUSION: We demonstrate successful detection of selective shifts and identification of the affected branch on partitions of 300 codons or more. We successfully reconstruct fitness parameters and initial codon frequencies in simulated data and demonstrate that failing to account for non-equilibrium evolution can increase the error in fitness profile estimation. We also demonstrate reconstruction of plausible shifts in amino acid fitnesses in the bacterial β-lactamase family and discuss some caveats for interpretation.
Affiliation(s)
- Andrew M Ritchie
  - Department of Biology, Temple University, 1900 North 12th Street, Philadelphia, PA, USA
- Tristan L Stark
  - Department of Biology, Temple University, 1900 North 12th Street, Philadelphia, PA, USA
- David A Liberles
  - Department of Biology, Temple University, 1900 North 12th Street, Philadelphia, PA, USA
|
15
|
Structure and function of naturally evolved de novo proteins. Curr Opin Struct Biol 2021; 68:175-183. [PMID: 33567396 DOI: 10.1016/j.sbi.2020.11.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/01/2020] [Revised: 11/16/2020] [Accepted: 11/27/2020] [Indexed: 01/05/2023]
Abstract
Comparative evolutionary genomics has revealed that novel protein coding genes can emerge randomly from non-coding DNA. While most of the myriad of transcripts which continuously emerge vanish rapidly, some attain regulatory regions, become translated and survive. More surprisingly, sequence properties of de novo proteins are almost indistinguishable from randomly obtained sequences, yet de novo proteins may gain functions and integrate into eukaryotic cellular networks quite easily. We here discuss current knowledge on de novo proteins, their structures, functions and evolution. Since the existence of de novo proteins seems at odds with decade-long attempts to construct proteins with novel structures and functions from scratch, we suggest that a better understanding of de novo protein evolution may fuel new strategies for protein design.
|
16
|
Stolyarova AV, Nabieva E, Ptushenko VV, Favorov AV, Popova AV, Neverov AD, Bazykin GA. Senescence and entrenchment in evolution of amino acid sites. Nat Commun 2020; 11:4603. [PMID: 32929079 PMCID: PMC7490271 DOI: 10.1038/s41467-020-18366-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2019] [Accepted: 08/20/2020] [Indexed: 01/01/2023] Open
Abstract
Amino acid propensities at a site change in the course of protein evolution. This may happen for two reasons. Changes may be triggered by substitutions at epistatically interacting sites elsewhere in the genome. Alternatively, they may arise due to environmental changes that are external to the genome. Here, we design a framework for distinguishing between these alternatives. Using analytical modelling and simulations, we show that they cause opposite dynamics of the fitness of the allele currently occupying the site: it tends to increase with the time since its origin due to epistasis ("entrenchment"), but to decrease due to random environmental fluctuations ("senescence"). By analysing the genomes of vertebrates and insects, we show that the amino acids originating at negatively selected sites experience strong entrenchment. By contrast, the amino acids originating at positively selected sites experience senescence. We propose that senescence of the current allele is a cause of adaptive evolution.
Affiliation(s)
- A V Stolyarova
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143028, Russia
- E Nabieva
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143028, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, 127051, Russia
- V V Ptushenko
  - Department of Photochemistry and Photobiology, N. M. Emanuel Institute of Biochemical Physics of Russian Academy of Sciences, Moscow, 119334, Russia
  - A. N. Belozersky Institute of Physical-Chemical Biology, M. V. Lomonosov Moscow State University, Moscow, 119992, Russia
- A V Favorov
  - Division of Biostatistics and Bioinformatics, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
  - Laboratory of System Biology and Computational Genetics, Vavilov Institute of General Genetics, Moscow, 119991, Russia
- A V Popova
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, 111123, Russia
- A D Neverov
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, 111123, Russia
- G A Bazykin
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143028, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, 127051, Russia
|
17
|
Youssef N, Susko E, Bielawski JP. Consequences of Stability-Induced Epistasis for Substitution Rates. Mol Biol Evol 2020; 37:3131-3148. [DOI: 10.1093/molbev/msaa151] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/18/2023] Open
Abstract
Do interactions between residues in a protein (i.e., epistasis) significantly alter evolutionary dynamics? If so, what consequences might they have on inference from traditional codon substitution models which assume site-independence for the sake of computational tractability? To investigate the effects of epistasis on substitution rates, we employed a mechanistic mutation-selection model in conjunction with a fitness framework derived from protein stability. We refer to this as the stability-informed site-dependent (S-SD) model and developed a new stability-informed site-independent (S-SI) model that captures the average effect of stability constraints on individual sites of a protein. Comparison of S-SI and S-SD offers a novel and direct method for investigating the consequences of stability-induced epistasis on protein evolution. We developed S-SI and S-SD models for three natural proteins and showed that they generate sequences consistent with real alignments. Our analyses revealed that epistasis tends to increase substitution rates compared with the rates under site-independent evolution. We then assessed the epistatic sensitivity of individual sites and discovered a counterintuitive effect: Highly connected sites were less influenced by epistasis relative to exposed sites. Lastly, we show that, despite the unrealistic assumptions, traditional models perform comparably well in the presence and absence of epistasis and provide reasonable summaries of average selection intensities. We conclude that epistatic models are critical to understanding protein evolutionary dynamics, but epistasis might not be required for reasonable inference of selection pressure when averaging over time and sites.
Affiliation(s)
- Noor Youssef
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Edward Susko
  - Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Joseph P Bielawski
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
|
18
|
The Marginal Stability of Proteins: How the Jiggling and Wiggling of Atoms is Connected to Neutral Evolution. J Mol Evol 2020; 88:424-426. [DOI: 10.1007/s00239-020-09940-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/02/2020] [Accepted: 03/19/2020] [Indexed: 01/29/2023]
|
19
|
Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Proc Natl Acad Sci U S A 2020; 117:5873-5882. [PMID: 32123092 PMCID: PMC7084075 DOI: 10.1073/pnas.1913071117] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/27/2022] Open
Abstract
Mathematical models of evolution help us understand mechanisms driving protein-sequence change. Previous models recapitulate a disjoint subset of statistical features of natural sequences. We present a neutral evolution model that unifies features including extreme variance of the molecular clock’s tick rate and the observation of an evolutionary Stokes shift, an irreversible effect of mutations in the fitness landscape during sequence evolution. We show that interactions between amino acid sites, which inform our fitness metric, are required to observe these features. These interactions are inferred by using direct coupling analysis, which has been successfully utilized to predict protein structures, dynamics, and complexes from coevolutionary information. We anticipate our model will have applications in phylogenetics, ancestral reconstruction of sequences, and protein design. We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed by using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while being consistent with neutral evolution theory, thereby unifying observations from previously disjointed evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have undergone evolution under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. 
Perspectives on the implementation of our model in the context of the molecular clock are discussed.
|
20
|
Khatri BS, Goldstein RA. Biophysics and population size constrains speciation in an evolutionary model of developmental system drift. PLoS Comput Biol 2019; 15:e1007177. [PMID: 31335870 PMCID: PMC6677325 DOI: 10.1371/journal.pcbi.1007177] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/30/2018] [Revised: 08/02/2019] [Accepted: 06/13/2019] [Indexed: 02/06/2023] Open
Abstract
Developmental system drift is a likely mechanism for the origin of hybrid incompatibilities between closely related species. We examine here the detailed mechanistic basis of hybrid incompatibilities between two allopatric lineages, for a genotype-phenotype map of developmental system drift under stabilising selection, where an organismal phenotype is conserved, but the underlying molecular phenotypes and genotype can drift. This leads to a number of emergent phenomena not obtainable by modelling genotype or phenotype alone. Our results show that: 1) speciation is more rapid at smaller population sizes with a characteristic, Orr-like, power law, but is slow at large population sizes, characterised by a sub-diffusive growth law; 2) the molecular phenotypes under weakest selection contribute to the earliest incompatibilities; and 3) pair-wise incompatibilities dominate over higher order, contrary to previous predictions that the latter should dominate. The population size effect we find is consistent with previous results on allopatric divergence of transcription factor-DNA binding, where smaller populations have common ancestors with a larger drift load because genetic drift favours phenotypes which have a larger number of genotypes (higher sequence entropy) over more fit phenotypes which have far fewer genotypes; this means fewer substitutions are required in either lineage before incompatibilities arise. Overall, our results indicate that biophysics and population size provide a much stronger constraint to speciation than suggested by previous models, and point to a general mechanistic principle of how incompatibilities arise under stabilising selection for an organismal phenotype. The process of speciation is of fundamental importance to the field of evolution as it is intimately connected to understanding the immense bio-diversity of life.
There is still relatively little understanding of the underlying genetic mechanisms that give rise to hybrid incompatibilities, with results suggesting that divergence in transcription factor DNA binding and gene expression play an important role. A key finding from the field of evo-devo is that organismal phenotypes show developmental system drift, where species maintain the same phenotype, but diverge in developmental pathways; this is an important potential source of hybrid incompatibilities. Here, we explore a theoretical framework to understand how incompatibilities arise due to developmental system drift, using a tractable biophysically inspired genotype-phenotype map for spatial gene expression. Modelling the evolution of phenotypes in this way has the key advantage that it mirrors how selection works in nature, i.e. that selection acts on phenotypes, but variation (mutation) arises at the level of genotypes. This results, as we demonstrate, in a number of non-trivial and testable predictions concerning speciation due to developmental system drift, which would not be obtainable by modelling evolution of genotypes or phenotypes alone.
Affiliation(s)
- Richard A. Goldstein
  - Division of Infection & Immunity, University College London, London, United Kingdom
|
21
|
Zabel WJ, Hagner KP, Livesey BJ, Marsh JA, Setayeshgar S, Lynch M, Higgs PG. Evolution of protein interfaces in multimers and fibrils. J Chem Phys 2019; 150:225102. [PMID: 31202237 PMCID: PMC6561775 DOI: 10.1063/1.5086042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
A majority of cellular proteins function as part of multimeric complexes of two or more subunits. Multimer formation requires interactions between protein surfaces that lead to closed structures, such as dimers and tetramers. If proteins interact in an open-ended way, uncontrolled growth of fibrils can occur, which is likely to be detrimental in most cases. We present a statistical physics model that allows aggregation of proteins as either closed dimers or open fibrils of all lengths. We use pairwise amino-acid contact energies to calculate the energies of interacting protein surfaces. The probabilities of all possible aggregate configurations can be calculated for any given sequence of surface amino acids. We link the statistical physics model to a population genetics model that describes the evolution of the surface residues. When proteins evolve neutrally, without selection for or against multimer formation, we find that a majority of proteins remain as monomers at moderate concentrations, but strong dimer-forming or fibril-forming sequences are also possible. If selection is applied in favor of dimers or in favor of fibrils, then it is easy to select either dimer-forming or fibril-forming sequences. It is also possible to select for oriented fibrils with protein subunits all aligned in the same direction. We measure the propensities of amino acids to occur at interfaces relative to noninteracting surfaces and show that the propensities in our model are strongly correlated with those that have been measured in real protein structures. We also show that there are significant differences between amino acid frequencies at isologous and heterologous interfaces in our model, and we observe that similar effects occur in real protein structures.
Affiliation(s)
- W Jeffrey Zabel
  - Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1, Canada
- Kyle P Hagner
  - Department of Physics, Indiana University, Bloomington, Indiana 47405, USA
- Benjamin J Livesey
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom
- Joseph A Marsh
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom
- Sima Setayeshgar
  - Department of Physics, Indiana University, Bloomington, Indiana 47405, USA
- Michael Lynch
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, Arizona 85287, USA
- Paul G Higgs
  - Department of Physics and Astronomy, McMaster University, Hamilton, Ontario L8S 4M1, Canada
|
22
|
Kuzminkova AA, Sokol AD, Ushakova KE, Popadin KY, Gunbin KV. mtProtEvol: the resource presenting molecular evolution analysis of proteins involved in the function of Vertebrate mitochondria. BMC Evol Biol 2019; 19:47. [PMID: 30813887 PMCID: PMC6391778 DOI: 10.1186/s12862-019-1371-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Heterotachy is the variation in the evolutionary rate of aligned sites in different parts of the phylogenetic tree. It occurs mainly due to epistatic interactions among the substitutions, which are highly complex and make it difficult to study protein evolution. The vast majority of computational evolutionary approaches for studying these epistatic interactions or their evolutionary consequences in proteins require high computational time. However, recently, it has been shown that the evolution of residue solvent accessibility (RSA) is tightly linked with changes in protein fitness and intra-protein epistatic interactions. This provides a computationally fast alternative, based on comparison of evolutionary rates of amino acid replacements with the rates of RSA evolutionary changes in order to recognize any shifts in epistatic interaction. RESULTS Based on RSA information, data randomization and phylogenetic approaches, we constructed a software pipeline, which can be used to analyze the evolutionary consequences of intra-protein epistatic interactions with relatively low computational time. We analyzed the evolution of 512 protein families tightly linked to mitochondrial function in Vertebrates and created "mtProtEvol", the web resource with data on protein evolution. In strict agreement with lifespan and metabolic rate data, we demonstrated that different functional categories of mitochondria-related proteins were subjected to selection on accelerated and decelerated RSA rates in rodents and primates. For example, accelerated RSA evolution in rodents has been shown for Krebs cycle enzymes, respiratory chain and reactive oxygen species metabolism, while in primates these functions are stress-response, translation and mtDNA integrity. Decelerated RSA evolution in rodents has been demonstrated for translational machinery and oxidative stress response components.
CONCLUSIONS mtProtEvol is an interactive resource focused on evolutionary analysis of epistatic interactions in protein families involved in Vertebrata mitochondria function and is available at http://bioinfodbs.kantiana.ru/mtProtEvol/. This resource and the devised software pipeline may be a useful tool for researchers in the area of protein evolution.
Affiliation(s)
- Anastasia A. Kuzminkova
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Anastasia D. Sokol
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Kristina E. Ushakova
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Konstantin Yu. Popadin
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Konstantin V. Gunbin
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
  - Center of Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
|
23
|
Doroshkov AV, Konstantinov DK, Afonnikov DA, Gunbin KV. The evolution of gene regulatory networks controlling Arabidopsis thaliana L. trichome development. BMC Plant Biol 2019; 19:53. [PMID: 30813891 PMCID: PMC6393967 DOI: 10.1186/s12870-019-1640-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 05/09/2023]
Abstract
BACKGROUND The variation in structure and function of gene regulatory networks (GRNs) participating in organismal development is key to understanding species-specific evolutionary strategies. Even the tiniest modification of a developmental GRN might result in a substantial change of a complex morphogenetic pattern. The great variety of trichomes and their accessibility make them a useful model for studying the molecular processes of cell fate determination, cell cycle control and cellular morphogenesis. A large number of genes regulating the morphogenesis of A. thaliana trichomes have now been described. Here we aimed to study the evolution of the GRN defining trichome formation and to evaluate its importance in other developmental processes. RESULTS To study the evolution of the trichome-formation GRN, we combined classical phylogenetic analysis with information on the GRN topology and composition in major plant taxa. This approach allowed us to estimate both the times of evolutionary emergence of the GRN components, which are mainly proteins, and the relative rate of their molecular evolution. Various simplifications of protein structure (based on the position of amino acid residues in the protein globule, secondary structure type, and structural disorder) allowed us to demonstrate the evolutionary associations between changes in protein globules and speciation/duplication events. We discussed their potential involvement in protein-protein interactions and GRN function. CONCLUSIONS We hypothesize that the divergence and/or the specialization of the trichome-forming GRN is linked to the emergence of plant taxa. Information about the structural targets of protein evolution in the GRN may predict switching points in gene network functioning in the course of evolution. We also propose a list of candidate genes responsible for the development of trichomes in a wide range of plant species.
Affiliation(s)
- Alexey V. Doroshkov
  - The Siberian Branch of the Russian Academy of Sciences (IC&G SB RAS), The Institute of Cytology and Genetics, Novosibirsk, Russia
  - Novosibirsk State University (NSU), Novosibirsk, Russia
- Dmitrii K. Konstantinov
  - The Siberian Branch of the Russian Academy of Sciences (IC&G SB RAS), The Institute of Cytology and Genetics, Novosibirsk, Russia
  - Novosibirsk State University (NSU), Novosibirsk, Russia
- Dmitrij A. Afonnikov
  - The Siberian Branch of the Russian Academy of Sciences (IC&G SB RAS), The Institute of Cytology and Genetics, Novosibirsk, Russia
  - Novosibirsk State University (NSU), Novosibirsk, Russia
- Konstantin V. Gunbin
  - Novosibirsk State University (NSU), Novosibirsk, Russia
  - School of Life Science, Immanuel Kant Federal Baltic University, Kaliningrad, Russia
  - Center of Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
|
24
|
Echave J. Beyond Stability Constraints: A Biophysical Model of Enzyme Evolution with Selection on Stability and Activity. Mol Biol Evol 2018; 36:613-620. [DOI: 10.1093/molbev/msy244] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/14/2022] Open
Affiliation(s)
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín (UNSAM), Buenos Aires, Argentina
|
25
|
Hilton SK, Bloom JD. Modeling site-specific amino-acid preferences deepens phylogenetic estimates of viral sequence divergence. Virus Evol 2018; 4:vey033. [PMID: 30425841 PMCID: PMC6220371 DOI: 10.1093/ve/vey033] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 02/06/2023] Open
Abstract
Molecular phylogenetics is often used to estimate the time since the divergence of modern gene sequences. For highly diverged sequences, such phylogenetic techniques sometimes estimate surprisingly recent divergence times. In the case of viruses, independent evidence indicates that the estimates of deep divergence times from molecular phylogenetics are sometimes too recent. This discrepancy is caused in part by inadequate models of purifying selection leading to branch-length underestimation. Here we examine the effect on branch-length estimation of using models that incorporate experimental measurements of purifying selection. We find that models informed by experimentally measured site-specific amino-acid preferences estimate longer deep branches on phylogenies of influenza virus hemagglutinin. This lengthening of branches is due to more realistic stationary states of the models, and is mostly independent of the branch-length extension from modeling site-to-site variation in amino-acid substitution rate. The branch-length extension from experimentally informed site-specific models is similar to that achieved by other approaches that allow the stationary state to vary across sites. However, the improvements from all of these site-specific but time homogeneous and site independent models are limited by the fact that a protein’s amino-acid preferences gradually shift as it evolves. Overall, our work underscores the importance of modeling site-specific amino-acid preferences when estimating deep divergence times—but also shows the inherent limitations of approaches that fail to account for how these preferences shift over time.
Affiliation(s)
- Sarah K Hilton
  - Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center
  - Department of Genome Sciences, University of Washington, USA
- Jesse D Bloom
  - Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center
  - Department of Genome Sciences, University of Washington, USA
  - Howard Hughes Medical Institute, Seattle, WA, USA
|
26
|
Castiglione GM, Chang BS. Functional trade-offs and environmental variation shaped ancient trajectories in the evolution of dim-light vision. eLife 2018; 7:35957. [PMID: 30362942 PMCID: PMC6203435 DOI: 10.7554/elife.35957] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/14/2018] [Accepted: 09/09/2018] [Indexed: 12/11/2022] Open
Abstract
Trade-offs between protein stability and activity can restrict access to evolutionary trajectories, but widespread epistasis may facilitate indirect routes to adaptation. This may be enhanced by natural environmental variation, but in multicellular organisms this process is poorly understood. We investigated a paradoxical trajectory taken during the evolution of tetrapod dim-light vision, where in the rod visual pigment rhodopsin, E122 was fixed 350 million years ago, a residue associated with increased active-state (MII) stability but greatly diminished rod photosensitivity. Here, we demonstrate that high MII stability could likely have evolved without E122, but instead, selection appears to have entrenched E122 in tetrapods via epistatic interactions with nearby coevolving sites. In fishes, by contrast, selection may have exploited these epistatic effects to explore alternative trajectories, but via indirect routes with low MII stability. Our results suggest that within tetrapods, E122 and high MII stability cannot be sacrificed, not even for improvements to rod photosensitivity.
Affiliation(s)
- Gianni M Castiglione
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
- Belinda Sw Chang
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
  - Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Canada
|
27
|
Jiménez-Santos MJ, Arenas M, Bastolla U. Influence of mutation bias and hydrophobicity on the substitution rates and sequence entropies of protein evolution. PeerJ 2018; 6:e5549. [PMID: 30310736 PMCID: PMC6174885 DOI: 10.7717/peerj.5549] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/13/2018] [Accepted: 08/10/2018] [Indexed: 01/13/2023] Open
Abstract
The number of amino acids that occupy a given protein site during evolution reflects the selective constraints operating on the site. This evolutionary variability is strongly influenced by the structural properties of the site in the native structure, and it is quantified either through sequence entropy or through substitution rates. However, while the sequence entropy only depends on the equilibrium frequencies of the amino acids, the substitution rate also depends on the exchangeability matrix that describes mutations in the mathematical model of the substitution process. Here we apply two variants of a mathematical model of protein evolution with selection for protein stability, both against unfolding and against misfolding. Exploiting the approximation of independent sites, these models allow computing site-specific substitution processes that satisfy global constraints on folding stability. We find that site-specific substitution rates do not depend only on the selective constraints acting on the site, quantified through its sequence entropy. In fact, polar sites evolve faster than hydrophobic sites even for equal sequence entropy, because polar amino acids are characterized by higher mutational exchangeability than hydrophobic ones. Accordingly, the model predicts that more polar proteins tend to evolve faster. Nevertheless, these results change if we compare proteins that evolve under different mutation biases, such as orthologous proteins in different bacterial genomes. In this case, the substitution rates are faster in genomes that evolve under a mutational bias that favors hydrophobic amino acids by preferentially incorporating the nucleotide thymine, which is more frequent in hydrophobic codons. This apparently contradictory result arises because buried sites occupied by hydrophobic amino acids are characterized by larger selective factors that largely amplify the substitution rate between hydrophobic amino acids, while the selective factors of exposed sites have a weaker effect. Thus, changes in the mutational bias produce deep effects on the biophysical properties of the protein (hydrophobicity) and on its evolutionary properties (sequence entropy and substitution rate) at the same time. The program Prot_evol that implements the two site-specific substitution processes is freely available at https://ub.cbm.uam.es/prot_fold_evol/prot_fold_evol_soft_main.php#Prot_Evol.
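The distinction drawn in this abstract between sequence entropy (a function of equilibrium frequencies only) and substitution rate (which also depends on exchangeabilities) can be illustrated with a minimal sketch. It assumes a generic reversible model with off-diagonal rates Q_ab = E_ab * pi_b on a toy four-letter alphabet. The alphabet, the exchangeability values, and the function names are hypothetical choices for illustration and are not taken from the paper or from Prot_evol.

```python
import math

# Toy alphabet (assumption): two "polar" letters and two "hydrophobic" letters.
ALPHABET = ["P1", "P2", "H1", "H2"]


def exchangeability(a, b):
    """Symmetric exchangeability E_ab with illustrative, made-up values:
    polar-polar exchanges are easiest, hydrophobic-hydrophobic hardest."""
    polar_a, polar_b = a.startswith("P"), b.startswith("P")
    if polar_a and polar_b:
        return 2.0
    if not polar_a and not polar_b:
        return 0.5
    return 1.0


def sequence_entropy(freqs):
    """Shannon entropy; depends only on the equilibrium frequencies."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)


def substitution_rate(freqs):
    """Equilibrium flux sum_{a != b} pi_a * E_ab * pi_b for Q_ab = E_ab * pi_b."""
    return sum(
        freqs[a] * exchangeability(a, b) * freqs[b]
        for a in ALPHABET
        for b in ALPHABET
        if a != b
    )


# Two sites with the same frequency profile up to relabeling, hence equal entropy.
polar_site = {"P1": 0.45, "P2": 0.45, "H1": 0.05, "H2": 0.05}
hydrophobic_site = {"P1": 0.05, "P2": 0.05, "H1": 0.45, "H2": 0.45}

if __name__ == "__main__":
    for name, site in [("polar", polar_site), ("hydrophobic", hydrophobic_site)]:
        print(name, sequence_entropy(site), substitution_rate(site))
```

Under these assumed values the two sites have identical entropy, yet the polar site has the higher rate, purely because of the exchangeability term. The sketch does not model the selective factors or the mutation-bias effects discussed in the second half of the abstract.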
Affiliation(s)
- Miguel Arenas
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Ugo Bastolla
  - Bioinformatics Unit, Center for Molecular Biology Severo Ochoa, CSIC-UAM, Madrid, Spain
|
28
|
Posfai A, Zhou J, Plotkin JB, Kinney JB, McCandlish DM. Selection for Protein Stability Enriches for Epistatic Interactions. Genes (Basel) 2018; 9:E423. [PMID: 30134605 PMCID: PMC6162820 DOI: 10.3390/genes9090423] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/04/2018] [Revised: 07/30/2018] [Accepted: 08/14/2018] [Indexed: 12/15/2022]
Abstract
A now classical argument for the marginal thermodynamic stability of proteins explains the distribution of observed protein stabilities as a consequence of an entropic pull in protein sequence space. In particular, most sequences that are sufficiently stable to fold will have stabilities near the folding threshold. Here, we extend this argument to consider its predictions for epistatic interactions for the effects of mutations on the free energy of folding. Although there is abundant evidence to indicate that the effects of mutations on the free energy of folding are nearly additive and conserved over evolutionary time, we show that these observations are compatible with the hypothesis that a non-additive contribution to the folding free energy is essential for observed proteins to maintain their native structure. In particular, through both simulations and analytical results, we show that even very small departures from additivity are sufficient to drive this effect.
Affiliation(s)
- Anna Posfai
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
- Juannan Zhou
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
- Joshua B Plotkin
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
|
29
|
Jimenez MJ, Arenas M, Bastolla U. Substitution Rates Predicted by Stability-Constrained Models of Protein Evolution Are Not Consistent with Empirical Data. Mol Biol Evol 2017; 35:743-755. [PMID: 29294047 DOI: 10.1093/molbev/msx327] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 12/12/2022]
Abstract
Protein structures strongly influence molecular evolution. In particular, the evolutionary rate of a protein site depends on the number of its native contacts. Stability-constrained models of protein evolution consider this influence of protein structure on evolution by predicting the effect of mutations on the stability of the native state, but they currently neglect how mutations affect the protein structure. These models predict that buried protein sites with more native contacts are more constrained by natural selection and less variable, as observed. Nevertheless, previous work did not consider the stability against compact misfolded conformations, although it is known that the negative design that destabilizes these misfolded conformations influences protein evolution significantly. Here, we show that stability-constrained models that consider misfolding predict that site-specific sequence entropy and substitution rate peak at amphiphilic sites with an intermediate number of contacts, as these sites are less constrained than exposed sites with few contacts whose hydrophobicity must be limited. This result holds both for a mean-field model with independent sites and for a pairwise model that takes as a reference the wild-type sequence, but it contrasts with the observations that indicate that the entropy and the substitution rate decrease monotonically with the number of contacts. Our work suggests that stability-constrained models overestimate the tolerance of amphiphilic sites against mutations, either because of the limits of the free energy function or, more importantly in our opinion, because they do not consider how mutations perturb the native protein structure.
Affiliation(s)
- María José Jimenez
  - Centro de Biologia Molecular "Severo Ochoa" CSIC-UAM Cantoblanco, Madrid, Spain
- Miguel Arenas
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Ugo Bastolla
  - Centro de Biologia Molecular "Severo Ochoa" CSIC-UAM Cantoblanco, Madrid, Spain
|