1
Saikia B, Baruah A. In silico design of misfolding resistant proteins: the role of structural similarity of a competing conformational ensemble in the optimization of frustration. Soft Matter 2024; 20:3283-3298. [PMID: 38529658 DOI: 10.1039/d4sm00171k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
Most state-of-the-art in silico design methods fail due to misfolding of designed sequences to a conformation other than the target. Thus, a method to design misfolding resistant proteins will provide a better understanding of the misfolding phenomenon and will also increase the success rate of in silico design methods. In this work, we optimize the conformational ensemble to be selected for negative design purposes based on the similarity of the conformational ensemble to the target. Five ensembles with different degrees of similarity to the target are created and destabilized and the target is stabilized while designing sequences using mean field theory and Monte Carlo simulation methods. The results suggest that the degree of similarity of the non-native conformations to the target plays a prominent role in designing misfolding resistant protein sequences. The design procedures that destabilize the conformational ensemble with moderate similarity to the target have proven to be more promising. Incorporation of either highly similar or highly dissimilar conformations to the target conformation into the non-native ensemble to be destabilized may lead to sequences with a higher misfolding propensity. This will significantly reduce the conformational space to be considered in any protein design procedure. Interestingly, the results suggest that a sequence with higher frustration in the target structure does not necessarily lead to a misfold prone sequence. A successful design method may purposefully choose a frustrated sequence in the target conformation if that sequence is even more frustrated in the competing non-native conformations.
Affiliation(s)
- Bondeepa Saikia
  - Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India.
- Anupaul Baruah
  - Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India.

2
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into likelihood functions that are commonly used for substitution model selection. RESULTS We present a method to perform selection among site-dependent SCS models, also among empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation into the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
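The model-selection idea in this abstract can be illustrated with a minimal rejection-ABC sketch. This is a toy illustration only, not the ProteinModelerABC implementation: the two simulators, their change probabilities, the single summary statistic, and the tolerance are all hypothetical placeholders. The sketch draws a model from a uniform prior, simulates a summary statistic, keeps simulations that land close to the observed value, and reports the share of accepted simulations per model as an approximate posterior model probability.

```python
import random

# Hypothetical per-site change probabilities standing in for two
# competing substitution models (assumed values, for illustration only).
TOY_MODELS = {"empirical": 0.30, "scs": 0.10}


def simulate_summary(model, n_sites, rng):
    """Toy simulator: fraction of sites that changed under the given model."""
    p_change = TOY_MODELS[model]
    return sum(rng.random() < p_change for _ in range(n_sites)) / n_sites


def abc_model_choice(observed, models, n_sims=2000, tolerance=0.03,
                     n_sites=200, seed=0):
    """Rejection ABC: approximate posterior probability of each model."""
    rng = random.Random(seed)
    accepted = {m: 0 for m in models}
    for _ in range(n_sims):
        model = rng.choice(models)  # uniform prior over the candidate models
        stat = simulate_summary(model, n_sites, rng)
        if abs(stat - observed) <= tolerance:  # keep simulations near the data
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:  # nothing accepted: tolerance too strict for these models
        return {m: 0.0 for m in models}
    return {m: accepted[m] / total for m in models}


posterior = abc_model_choice(observed=0.11, models=["empirical", "scs"])
print(posterior)
```

A real analysis would use several summary statistics, many more simulations, and optionally a regression adjustment of the accepted simulations, which the abstract mentions but this sketch omits.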
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain

3
Ferreiro D, Khalil R, Sousa SF, Arenas M. Substitution Models of Protein Evolution with Selection on Enzymatic Activity. Mol Biol Evol 2024; 41:msae026. [PMID: 38314876 PMCID: PMC10873502 DOI: 10.1093/molbev/msae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/07/2024] Open
Abstract
Substitution models of evolution are necessary for diverse evolutionary analyses including phylogenetic tree and ancestral sequence reconstructions. At the protein level, empirical substitution models are traditionally used due to their simplicity, but they ignore the variability of substitution patterns among protein sites. Next, in order to improve the realism of the modeling of protein evolution, a series of structurally constrained substitution models were presented, but still they usually ignore constraints on the protein activity. Here, we present a substitution model of protein evolution with selection on both protein structure and enzymatic activity, and that can be applied to phylogenetics. In particular, the model considers the binding affinity of the enzyme-substrate complex as well as structural constraints that include the flexibility of structural flaps, hydrogen bonds, amino acids backbone radius of gyration, and solvent-accessible surface area that are quantified through molecular dynamics simulations. We applied the model to the HIV-1 protease and evaluated it by phylogenetic likelihood in comparison with the best-fitting empirical substitution model and a structurally constrained substitution model that ignores the enzymatic activity. We found that accounting for selection on the protein activity improves the fitting of the modeled functional regions with the real observations, especially in data with high molecular identity, which recommends considering constraints on the protein activity in the development of substitution models of evolution.
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Ruqaiya Khalil
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Sergio F Sousa
  - UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, 4200-319 Porto, Portugal
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain

4
Experimental and Bioinformatic Insights into the Effects of Epileptogenic Variants on the Function and Trafficking of the GABA Transporter GAT-1. Int J Mol Sci 2023; 24:ijms24020955. [PMID: 36674476 PMCID: PMC9862756 DOI: 10.3390/ijms24020955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 12/27/2022] [Accepted: 12/31/2022] [Indexed: 01/06/2023] Open
Abstract
In this article, we identified a novel epileptogenic variant (G307R) of the gene SLC6A1, which encodes the GABA transporter GAT-1. Our main goal was to investigate the pathogenic mechanisms of this variant, located near the neurotransmitter permeation pathway, and compare it with other variants located either in the permeation pathway or close to the lipid bilayer. The mutants G307R and A334P, close to the gates of the transporter, could be glycosylated with variable efficiency and reached the membrane, albeit inactive. Mutants located in the center of the permeation pathway (G297R) or close to the lipid bilayer (A128V, G550R) were retained in the endoplasmic reticulum. Applying an Elastic Network Model to these and to other previously characterized variants, we found that G307R and A334P significantly perturb the structure and dynamics of the intracellular gate, which can explain their reduced activity, while for A228V and G362R, the reduced translocation to the membrane quantitatively accounts for the reduced activity. The addition of a chemical chaperone (4-phenylbutyric acid, PBA), which improves protein folding, increased the activity of GAT-1 WT, as well as most of the assayed variants, including G307R, suggesting that PBA might also assist the conformational changes occurring during the alternative access transport cycle.

5
Del Amparo R, González-Vázquez LD, Rodríguez-Moure L, Bastolla U, Arenas M. Consequences of Genetic Recombination on Protein Folding Stability. J Mol Evol 2023; 91:33-45. [PMID: 36463317 PMCID: PMC9849154 DOI: 10.1007/s00239-022-10080-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/21/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Genetic recombination is a common evolutionary mechanism that produces molecular diversity. However, its consequences on protein folding stability have not attracted the same attention as in the case of point mutations. Here, we studied the effects of homologous recombination on the computationally predicted protein folding stability for several protein families, finding less detrimental effects than we previously expected. Although recombination can affect multiple protein sites, we found that the fraction of recombined proteins that are eliminated by negative selection because of insufficient stability is not significantly larger than the corresponding fraction of proteins produced by mutation events. Indeed, although recombination disrupts epistatic interactions, the mean stability of recombinant proteins is not lower than that of their parents. On the other hand, the difference of stability between recombined proteins is amplified with respect to the parents, promoting phenotypic diversity. As a result, at least one third of recombined proteins present stability between those of their parents, and a substantial fraction have higher or lower stability than those of both parents. As expected, we found that parents with similar sequences tend to produce recombined proteins with stability close to that of the parents. Finally, the simulation of protein evolution along the ancestral recombination graph with empirical substitution models commonly used in phylogenetics, which ignore constraints on protein folding stability, showed that recombination favors the decrease of folding stability, supporting the convenience of adopting structurally constrained models when possible for inferences of protein evolutionary histories with recombination.
Affiliation(s)
- Roberto Del Amparo
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Luis Daniel González-Vázquez
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Laura Rodríguez-Moure
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
- Ugo Bastolla
  - Centre for Molecular Biology Severo Ochoa (CSIC-UAM), 28049 Madrid, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain

6
Abstract
The reconstruction of genetic material of ancestral organisms constitutes a powerful application of evolutionary biology. A fundamental step in this inference is the ancestral sequence reconstruction (ASR), which can be performed with diverse methodologies implemented in computer frameworks. However, most of these methodologies ignore evolutionary properties frequently observed in microbes, such as genetic recombination and complex selection processes, that can bias the traditional ASR. From a practical perspective, here I review methodologies for the reconstruction of ancestral DNA and protein sequences, with particular focus on microbes, and including biases, recommendations, and software implementations. I conclude that microbial ASR is a complex analysis that should be carefully performed and that there is a need for methods to infer more realistic ancestral microbial sequences.
Affiliation(s)
- Miguel Arenas
  - Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain.
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain.
  - Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain.

7
Aadland K, Kolaczkowski B. Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy. Genome Biol Evol 2021; 12:1549-1565. [PMID: 32785673 PMCID: PMC7523730 DOI: 10.1093/gbe/evaa164] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 08/03/2020] [Indexed: 12/31/2022] Open
Abstract
Ancestral sequence reconstruction (ASR) uses an alignment of extant protein sequences, a phylogeny describing the history of the protein family and a model of the molecular-evolutionary process to infer the sequences of ancient proteins, allowing researchers to directly investigate the impact of sequence evolution on protein structure and function. Like all statistical inferences, ASR can be sensitive to violations of its underlying assumptions. Previous studies have shown that, whereas phylogenetic uncertainty has only a very weak impact on ASR accuracy, uncertainty in the protein sequence alignment can more strongly affect inferred ancestral sequences. Here, we show that errors in sequence alignment can produce errors in ASR across a range of realistic and simplified evolutionary scenarios. Importantly, sequence reconstruction errors can lead to errors in estimates of structural and functional properties of ancestral proteins, potentially undermining the reliability of analyses relying on ASR. We introduce an alignment-integrated ASR approach that combines information from many different sequence alignments. We show that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment. Given the growing evidence that sequence alignment errors can impact the reliability of ASR studies, we recommend that future studies incorporate approaches to mitigate the impact of alignment uncertainty. Probabilistic modeling of insertion and deletion events has the potential to radically improve ASR accuracy when the model reflects the true underlying evolutionary history, but further studies are required to thoroughly evaluate the reliability of these approaches under realistic conditions.
Affiliation(s)
- Kelsey Aadland
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida
- Bryan Kolaczkowski
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida

8
Schwersensky M, Rooman M, Pucci F. Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness. BMC Biol 2020; 18:146. [PMID: 33081759 PMCID: PMC7576759 DOI: 10.1186/s12915-020-00870-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/14/2020] [Accepted: 09/16/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND How, and the extent to which, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in the field of molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, as well as through available experimental stability and fitness data. RESULTS At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. The destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, whereas the stabilizing mutations are about 4% in both regions. At the genetic code level, we observed smallest destabilization for mutations that are due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking highly anticorrelates with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both the codon usage and the usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues. CONCLUSION Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and the codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.
Affiliation(s)
- Martin Schwersensky
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium.
  - Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium.
  - Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.

9
Arenas M, Bastolla U. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13341] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Affiliation(s)
- Miguel Arenas
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
  - Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain
- Ugo Bastolla
  - Bioinformatics Unit, Centre for Molecular Biology Severo Ochoa (CSIC), Madrid, Spain

10
Pascual-García A, Arenas M, Bastolla U. The Molecular Clock in the Evolution of Protein Structures. Syst Biol 2019; 68:987-1002. [DOI: 10.1093/sysbio/syz022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/08/2018] [Revised: 03/20/2019] [Accepted: 04/09/2019] [Indexed: 12/11/2022] Open
Abstract
The molecular clock hypothesis, which states that substitutions accumulate in protein sequences at a constant rate, plays a fundamental role in molecular evolution but it is violated when selective or mutational processes vary with time. Such violations of the molecular clock have been widely investigated for protein sequences, but not yet for protein structures. Here, we introduce a novel statistical test (Significant Clock Violations) and perform a large-scale assessment of the molecular clock in the evolution of both protein sequences and structures in three large superfamilies. After validating our method with computer simulations, we find that clock violations are generally consistent in sequence and structure evolution, but they tend to be larger and more significant in structure evolution. Moreover, changes of function assessed through Gene Ontology and InterPro terms are associated with large and significant clock violations in structure evolution. We found that almost one third of significant clock violations are significant in structure evolution but not in sequence evolution, highlighting the advantage of using structure information for assessing accelerated evolution and gathering hints of positive selection. Clock violations between closely related pairs are frequently significant in sequence evolution, consistent with the observed time dependence of the substitution rate attributed to segregation of neutral and slightly deleterious polymorphisms, but not in structure evolution, suggesting that these substitutions do not affect protein structure although they may affect stability. These results are consistent with the view that natural selection, both negative and positive, constrains protein structures more strongly than protein sequences. Our code for computing clock violations is freely available at https://github.com/ugobas/Molecular_clock.
Affiliation(s)
- Alberto Pascual-García
  - Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
  - Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, UK
  - Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Miguel Arenas
  - Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Spain
- Ugo Bastolla
  - Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain

11
The Influence of Protein Stability on Sequence Evolution: Applications to Phylogenetic Inference. Methods Mol Biol 2019; 1851:215-231. [PMID: 30298399 DOI: 10.1007/978-1-4939-8736-8_11] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/13/2023]
Abstract
Phylogenetic inference from protein data is traditionally based on empirical substitution models of evolution that assume that protein sites evolve independently of each other and under the same substitution process. However, it is well known that the structural properties of a protein site in the native state affect its evolution, in particular the sequence entropy and the substitution rate. Starting from the seminal proposal by Halpern and Bruno, where structural properties are incorporated in the evolutionary model through site-specific amino acid frequencies, several models have been developed to tackle the influence of protein structure on sequence evolution. Here we describe stability-constrained substitution (SCS) models that explicitly consider the stability of the native state against both unfolded and misfolded states. One of them, the mean-field model, provides an independent sites approximation that can be readily incorporated in maximum likelihood methods of phylogenetic inference, including ancestral sequence reconstruction. Next, we describe its validation with simulated and real proteins and its limitations and advantages with respect to empirical models that lack site specificity. We finally provide guidelines and recommendations to analyze protein data accounting for stability constraints, including computer simulations and inferences of protein evolution based on maximum likelihood. Some practical examples are included to illustrate these procedures.

12
Jiménez-Santos MJ, Arenas M, Bastolla U. Influence of mutation bias and hydrophobicity on the substitution rates and sequence entropies of protein evolution. PeerJ 2018; 6:e5549. [PMID: 30310736 PMCID: PMC6174885 DOI: 10.7717/peerj.5549] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/13/2018] [Accepted: 08/10/2018] [Indexed: 01/13/2023] Open
Abstract
The number of amino acids that occupy a given protein site during evolution reflects the selective constraints operating on the site. This evolutionary variability is strongly influenced by the structural properties of the site in the native structure, and it is quantified either through sequence entropy or through substitution rates. However, while the sequence entropy only depends on the equilibrium frequencies of the amino acids, the substitution rate also depends on the exchangeability matrix that describes mutations in the mathematical model of the substitution process. Here we apply two variants of a mathematical model of protein evolution with selection for protein stability, both against unfolding and against misfolding. Exploiting the approximation of independent sites, these models allow computing site-specific substitution processes that satisfy global constraints on folding stability. We find that site-specific substitution rates do not depend only on the selective constraints acting on the site, quantified through its sequence entropy. In fact, polar sites evolve faster than hydrophobic sites even for equal sequence entropy, as a consequence of the fact that polar amino acids are characterized by higher mutational exchangeability than hydrophobic ones. Accordingly, the model predicts that more polar proteins tend to evolve faster. Nevertheless, these results change if we compare proteins that evolve under different mutation biases, such as orthologous proteins in different bacterial genomes. In this case, the substitution rates are faster in genomes that evolve under a mutational bias that favors hydrophobic amino acids by preferentially incorporating the nucleotide thymine, which is more frequent in hydrophobic codons. This apparently contradictory result arises because buried sites occupied by hydrophobic amino acids are characterized by larger selective factors that largely amplify the substitution rate between hydrophobic amino acids, while the selective factors of exposed sites have a weaker effect. Thus, changes in the mutational bias produce deep effects on the biophysical properties of the protein (hydrophobicity) and on its evolutionary properties (sequence entropy and substitution rate) at the same time. The program Prot_evol that implements the two site-specific substitution processes is freely available at https://ub.cbm.uam.es/prot_fold_evol/prot_fold_evol_soft_main.php#Prot_Evol.
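As a small illustration of the statement that sequence entropy depends only on amino acid frequencies at a site, the sketch below computes the Shannon entropy of each column of a toy alignment. It is not taken from Prot_evol: the function name, the gap handling, and the four example sequences are assumptions made for illustration, and observed frequencies in a tiny sample stand in for the equilibrium frequencies of the model.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (natural log) of the residues observed at one site.

    Gap characters ('-') are ignored; an empty column yields 0.0.
    """
    counts = Counter(residue for residue in column if residue != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical toy alignment: four sequences, four sites.
alignment = ["MKVL", "MKIL", "MRVL", "MKAL"]

# zip(*alignment) iterates over alignment columns (sites).
entropies = [site_entropy(column) for column in zip(*alignment)]
print(entropies)
```

Fully conserved sites give zero entropy and more variable sites give larger values. The substitution rate, by contrast, cannot be computed from these frequencies alone, since it also depends on the exchangeability matrix.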
Affiliation(s)
- Miguel Arenas
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Ugo Bastolla
  - Bioinformatics Unit, Center for Molecular Biology Severo Ochoa, CSIC-UAM, Madrid, Spain

13
Jimenez MJ, Arenas M, Bastolla U. Substitution Rates Predicted by Stability-Constrained Models of Protein Evolution Are Not Consistent with Empirical Data. Mol Biol Evol 2017; 35:743-755. [PMID: 29294047 DOI: 10.1093/molbev/msx327] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022] Open
Abstract
Protein structures strongly influence molecular evolution. In particular, the evolutionary rate of a protein site depends on the number of its native contacts. Stability-constrained models of protein evolution consider this influence of protein structure on evolution by predicting the effect of mutations on the stability of the native state, but they currently neglect how mutations affect the protein structure. These models predict that buried protein sites with more native contacts are more constrained by natural selection and less variable, as observed. Nevertheless, previous work did not consider the stability against compact misfolded conformations, although it is known that the negative design that destabilizes these misfolded conformations influences protein evolution significantly. Here, we show that stability-constrained models that consider misfolding predict that site-specific sequence entropy and substitution rate peak at amphiphilic sites with an intermediate number of contacts, as these sites are less constrained than exposed sites with few contacts whose hydrophobicity must be limited. This result holds both for a mean-field model with independent sites and for a pairwise model that takes as a reference the wild-type sequence, but it contrasts with the observations that indicate that the entropy and the substitution rate decrease monotonically with the number of contacts. Our work suggests that stability-constrained models overestimate the tolerance of amphiphilic sites against mutations, either because of the limits of the free energy function or, more importantly in our opinion, because they do not consider how mutations perturb the native protein structure.
Affiliation(s)
- María José Jimenez
  - Centro de Biologia Molecular "Severo Ochoa" CSIC-UAM Cantoblanco, Madrid, Spain
- Miguel Arenas
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Ugo Bastolla
  - Centro de Biologia Molecular "Severo Ochoa" CSIC-UAM Cantoblanco, Madrid, Spain

14
de la Higuera I, Ferrer-Orta C, de Ávila AI, Perales C, Sierra M, Singh K, Sarafianos SG, Dehouck Y, Bastolla U, Verdaguer N, Domingo E. Molecular and Functional Bases of Selection against a Mutation Bias in an RNA Virus. Genome Biol Evol 2017; 9:1212-1228. [PMID: 28460010 PMCID: PMC5433387 DOI: 10.1093/gbe/evx075] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Accepted: 04/25/2017] [Indexed: 12/12/2022] Open
Abstract
The selective pressures acting on viruses that replicate under enhanced mutation rates are largely unknown. Here, we describe resistance of foot-and-mouth disease virus to the mutagen 5-fluorouracil (FU) through a single polymerase substitution that prevents an excess of A to G and U to C transitions evoked by FU on the wild-type foot-and-mouth disease virus, while maintaining the same level of mutant spectrum complexity. The polymerase substitution inflicts upon the virus a fitness loss during replication in absence of FU but confers a fitness gain in presence of FU. The compensation of mutational bias was documented by in vitro nucleotide incorporation assays, and it was associated with structural modifications at the N-terminal region and motif B of the viral polymerase. Predictions of the effect of mutations that increase the frequency of G and C in the viral genome and encoded polymerase suggest multiple points in the virus life cycle where the mutational bias in favor of G and C may be detrimental. Application of predictive algorithms suggests adverse effects of the FU-directed mutational bias on protein stability. The results reinforce modulation of nucleotide incorporation as a lethal mutagenesis-escape mechanism (that permits eluding virus extinction despite replication in the presence of a mutagenic agent) and suggest that mutational bias can be a target of selection during virus replication.
Affiliation(s)
- Ignacio de la Higuera
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
  - Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
- Cristina Ferrer-Orta
  - Institut de Biologia Molecular de Barcelona (CSIC), Parc Científic de Barcelona, Barcelona, Spain
- Ana I de Ávila
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
- Celia Perales
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain
  - Liver Unit, Internal Medicine, Laboratory of Malalties Hepàtiques, Vall d'Hebron Institut de Recerca-Hospital Universitari Vall d'Hebron (VHIR-HUVH), Universitat Autònoma de Barcelona, Barcelona, Spain
- Macarena Sierra
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
- Kamalendra Singh
  - Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
- Stefan G Sarafianos
  - Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
- Yves Dehouck
  - Machine Learning Group, Université Libre de Bruxelles (ULB), Brussels, Belgium
- Ugo Bastolla
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
- Nuria Verdaguer
  - Institut de Biologia Molecular de Barcelona (CSIC), Parc Científic de Barcelona, Barcelona, Spain
- Esteban Domingo
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain
|
15
|
Bastolla U, Dehouck Y, Echave J. What evolution tells us about protein physics, and protein physics tells us about evolution. Curr Opin Struct Biol 2017; 42:59-66. [DOI: 10.1016/j.sbi.2016.10.020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 08/23/2016] [Revised: 10/19/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022]
|
16
|
Chi PB, Liberles DA. Selection on protein structure, interaction, and sequence. Protein Sci 2016; 25:1168-78. [PMID: 26808055 PMCID: PMC4918422 DOI: 10.1002/pro.2886] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 11/12/2015] [Revised: 01/18/2016] [Accepted: 01/19/2016] [Indexed: 11/10/2022]
Abstract
Characterizing the probabilities of observing amino acid substitutions at specific sites in a protein over evolutionary time is a major goal in the field of molecular evolution. While purely statistical approaches at different levels of complexity exist, approaches rooted in underlying biological processes are necessary both to characterize the context-dependence of sequence changes (epistasis) and to extrapolate to sequences not observed in biological databases. To develop such approaches, an understanding of the different selective forces that act on amino acid substitution is necessary. Here, an overview is presented of selection on, and corresponding modeling of, folding stability, folding specificity, binding affinity and specificity for ligands, the evolution of new binding sites on protein surfaces, protein dynamics, intrinsic disorder, and protein aggregation, as well as the interplay with protein expression level (concentration) and biased mutational processes.
Affiliation(s)
- Peter B Chi
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
  - Department of Mathematics and Computer Science, Ursinus College, Collegeville, Pennsylvania, 19426
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
|
17
|
Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models. PLoS Comput Biol 2016; 12:e1004889. [PMID: 27177270 PMCID: PMC4866778 DOI: 10.1371/journal.pcbi.1004889] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 08/22/2015] [Accepted: 03/30/2016] [Indexed: 12/05/2022]
Abstract
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use a lattice protein (LP) model to benchmark those inverse statistical approaches. We build MSAs of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSAs. We find that inferred Potts Hamiltonians reproduce many important aspects of ‘true’ LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.

Inverse statistical approaches, modeling pairwise correlations between amino acids in the sequences of homologous proteins across many different organisms, can successfully extract protein structure (contact) information. Here, we benchmark those statistical approaches on exactly solvable models of proteins, folding on a 3D lattice, to assess the reasons underlying their success and their limitations. We show that the inferred parameters (effective pairwise interactions) of the statistical models have clear and quantitative interpretations in terms of positive (favoring the native fold) and negative (disfavoring competing folds) protein sequence design. New sequences randomly drawn from the statistical models are likely to fold into the native structures when effective pairwise interactions are accurately inferred, a performance which cannot be achieved with independent-site models.
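As a rough illustration of the pairwise Potts Hamiltonian mentioned in this abstract, the sketch below scores a sequence with site fields and pairwise couplings. It is not the authors' code: the function name `potts_energy`, the toy two-state alphabet, the parameter values, and the sign convention (lower energy means more probable) are all assumptions made for illustration.

```python
def potts_energy(seq, fields, couplings):
    """Energy of a sequence under a pairwise Potts model.

    seq       -- list of integer states, one per site
    fields    -- fields[i][a]: single-site term for state a at site i
    couplings -- dict mapping (i, j) with i < j to a matrix J[a][b]

    Assumed convention: E = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).
    """
    energy = 0.0
    for i, a in enumerate(seq):
        energy -= fields[i][a]
    for (i, j), J in couplings.items():
        energy -= J[seq[i]][seq[j]]
    return energy


# Hypothetical toy parameters: 3 sites, 2 states per site.
fields = [[0.0, 0.5], [0.2, 0.0], [0.0, 0.0]]
# A single coupling between sites 0 and 2 that rewards equal states.
couplings = {(0, 2): [[1.0, -1.0], [-1.0, 1.0]]}

e_matched = potts_energy([0, 0, 0], fields, couplings)
e_mismatched = potts_energy([1, 0, 0], fields, couplings)
# Passing an empty coupling dict reduces this to an independent-site model.
e_independent = potts_energy([1, 0, 0], fields, {})
```

In this toy case the independent-site model prefers state 1 at site 0 because of its field, while the coupling to site 2 reverses that preference. This is the kind of context dependence that an independent-site model cannot represent.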
|
18
|
Hoffmann J, Wrabl JO, Hilser VJ. The role of negative selection in protein evolution revealed through the energetics of the native state ensemble. Proteins 2016; 84:435-47. [PMID: 26800099 DOI: 10.1002/prot.24989] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/31/2015] [Revised: 12/15/2015] [Accepted: 12/19/2015] [Indexed: 12/14/2022]
Abstract
Knowing the determinants of conformational specificity is essential for understanding protein structure, stability, and fold evolution. To address this issue, a novel statistical measure of energetic compatibility between sequence and structure was developed using an experimentally validated model of the energetics of the native state ensemble. This approach successfully matched sequences from a diverse subset of the human proteome to their respective folds. Unexpectedly, significant energetic compatibility between ostensibly unrelated sequences and structures was also observed. Interrogation of these matches revealed a general framework for understanding the origins of conformational specificity within a proteome: specificity is a complex function of both the ability of a sequence to adopt folds other than the native, and the ability of a fold to accommodate sequences other than the native. The regional variation in energetic compatibility indicates that the compatibility is dominated by incompatibility of sequence for alternative fold segments, suggesting that evolution of protein sequences has involved substantial negative selection, with certain segments serving as "gatekeepers" that presumably prevent alternative structures. Beyond these global trends, a size dependence exists in the degree to which the energetic compatibility is determined from negative selection, with smaller proteins displaying more negative selection. This partially explains how short sequences can adopt unique folds, despite the higher probability in shorter proteins for small numbers of mutations to increase compatibility with other folds. In providing evolutionary ground rules for the thermodynamic relationship between sequence and fold, this framework imparts valuable insight for rational design of unique folds or fold switches.
Affiliation(s)
- Jordan Hoffmann
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, 21218
  - T. C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, 21218
- James O Wrabl
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, 21218
  - T. C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, 21218
- Vincent J Hilser
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, 21218
  - T. C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, 21218
|
19
|
Arenas M, Sánchez-Cobos A, Bastolla U. Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability. Mol Biol Evol 2015; 32:2195-207. [PMID: 25837579 DOI: 10.1093/molbev/msv085] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/15/2022]
Abstract
Despite intense work, incorporating constraints on protein native structures into the mathematical models of molecular evolution remains difficult, because most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and one selection parameter that we fix maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows that the main determinant of the site-specific distributions is the number of native contacts of the site and that the most variable sites are those with an intermediate number of native contacts. The mean-field models obtained, taking into account misfolded conformations, yield larger likelihood than models that only consider the native state, because their average hydrophobicity is more realistic, and they produce on the average stable sequences for most proteins. We evaluated the mean-field model with respect to empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles presented smaller Kullback-Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family. 
We found that the mean-field model performs better than other structurally constrained models with similar or higher complexity. With respect to the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix, our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented into the computer program Prot_Evol that is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php.
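The Kullback-Leibler comparison described in this abstract can be sketched as follows. This is only an illustration of the divergence itself, not the Prot_Evol implementation: the function name `kl_divergence`, the four-class toy alphabet, and all frequencies are hypothetical, and a real analysis would use 20 amino acids per site.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats for two discrete distributions over the same states.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps so that a
    zero model probability gives a large finite penalty instead of an error.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Hypothetical observed profile at one site (toy alphabet of 4 residue classes).
observed = [0.50, 0.30, 0.15, 0.05]
# Hypothetical site-specific model distribution for the same site.
site_specific_model = [0.45, 0.35, 0.15, 0.05]
# Hypothetical site-independent background distribution.
background_model = [0.25, 0.25, 0.25, 0.25]

kl_site = kl_divergence(observed, site_specific_model)
kl_background = kl_divergence(observed, background_model)
```

A smaller divergence means the model distribution is closer to the observed profile, so in this toy case the site-specific distribution fits better than the uniform background.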
Affiliation(s)
- Miguel Arenas
  - Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
- Agustin Sánchez-Cobos
  - Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
- Ugo Bastolla
  - Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
|
20
|
Detecting selection on protein stability through statistical mechanical models of folding and evolution. Biomolecules 2014; 4:291-314. [PMID: 24970217 PMCID: PMC4030984 DOI: 10.3390/biom4010291] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/25/2013] [Revised: 02/13/2014] [Accepted: 02/14/2014] [Indexed: 12/31/2022]
Abstract
The properties of biomolecules depend both on physics and on the evolutionary process that formed them. These two points of view produce a powerful synergism. Physics sets the stage and the constraints that molecular evolution has to obey, and evolutionary theory helps in rationalizing the physical properties of biomolecules, including protein folding thermodynamics. To complete the parallelism, protein thermodynamics is founded on statistical mechanics in the space of protein structures, and molecular evolution can be viewed as statistical mechanics in the space of protein sequences. In this review, we will integrate both points of view, applying them to detecting selection on the stability of the folded state of proteins. We will start by discussing positive design, which strengthens the stability of the folded state against the unfolded state of proteins. Positive design justifies why statistical potentials for protein folding can be obtained from the frequencies of structural motifs. Stability against unfolding is easier to achieve for longer proteins. In contrast, negative design, which consists in destabilizing frequently formed misfolded conformations, is more difficult to achieve for longer proteins. The folding rate can be enhanced by strengthening short-range native interactions, but this requirement conflicts with negative design, and evolution has to trade off between them. Finally, selection can accelerate functional movements by favoring low frequency normal modes of the dynamics of the native state that strongly correlate with the functional conformation change.
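The remark that statistical potentials can be obtained from motif frequencies is commonly illustrated with an inverse-Boltzmann relation, e = -kT ln(f_obs / f_ref). The sketch below assumes that relation; it is not taken from the review, and the function name `statistical_potential`, the motif labels, and the counts are hypothetical.

```python
import math


def statistical_potential(observed_counts, reference_counts, kT=1.0):
    """Inverse-Boltzmann energies per motif: e = -kT * ln(f_obs / f_ref).

    Both arguments map motif labels to counts. This sketch assumes every
    count is positive; a real analysis would need pseudocounts.
    """
    n_obs = sum(observed_counts.values())
    n_ref = sum(reference_counts.values())
    energies = {}
    for motif, count in observed_counts.items():
        f_obs = count / n_obs
        f_ref = reference_counts[motif] / n_ref
        energies[motif] = -kT * math.log(f_obs / f_ref)
    return energies


# Hypothetical contact counts in native structures vs. a reference state.
observed = {"H-H": 60, "H-P": 30, "P-P": 10}
reference = {"H-H": 25, "H-P": 50, "P-P": 25}

energies = statistical_potential(observed, reference)
```

Motifs that are over-represented relative to the reference state receive a negative (favorable) energy, and under-represented motifs receive a positive one.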
|
21
|
Arenas M, Dos Santos HG, Posada D, Bastolla U. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 2013; 29:3020-8. [PMID: 24037213 DOI: 10.1093/bioinformatics/btt530] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Models of molecular evolution aim at describing the evolutionary processes at the molecular level. However, current models rarely incorporate information from protein structure. Conversely, structure-based models of protein evolution have not been commonly applied to simulate sequence evolution in a phylogenetic framework, and they often ignore relevant evolutionary processes such as recombination. A simulation evolutionary framework that integrates substitution models that account for protein structure stability should be able to generate more realistic in silico evolved proteins for a variety of purposes.

RESULTS: We developed a method to simulate protein evolution that combines models of protein folding stability, such that the fitness depends on the stability of the native state both with respect to unfolding and misfolding, with phylogenetic histories that can be either specified by the user or simulated with the coalescent under complex evolutionary scenarios, including recombination, demographics and migration. We have implemented this framework in a computer program called ProteinEvolver. Remarkably, comparing these models with empirical amino acid replacement models, we found that the former produce amino acid distributions closer to distributions observed in real protein families, and proteins that are predicted to be more stable. Therefore, we conclude that evolutionary models that consider protein stability and realistic evolutionary histories constitute a better approximation of the real evolutionary process.
Affiliation(s)
- Miguel Arenas
  - Centre for Molecular Biology 'Severo Ochoa', Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
|