1
|
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into likelihood functions that are commonly used for substitution model selection. RESULTS We present a method to perform selection among site-dependent SCS models, also among empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation into the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
Collapse
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Catarina Branco
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| |
Collapse
|
2
|
Ferreiro D, Khalil R, Sousa SF, Arenas M. Substitution Models of Protein Evolution with Selection on Enzymatic Activity. Mol Biol Evol 2024; 41:msae026. [PMID: 38314876 PMCID: PMC10873502 DOI: 10.1093/molbev/msae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/07/2024] Open
Abstract
Substitution models of evolution are necessary for diverse evolutionary analyses including phylogenetic tree and ancestral sequence reconstructions. At the protein level, empirical substitution models are traditionally used due to their simplicity, but they ignore the variability of substitution patterns among protein sites. Next, in order to improve the realism of the modeling of protein evolution, a series of structurally constrained substitution models were presented, but still they usually ignore constraints on the protein activity. Here, we present a substitution model of protein evolution with selection on both protein structure and enzymatic activity, and that can be applied to phylogenetics. In particular, the model considers the binding affinity of the enzyme-substrate complex as well as structural constraints that include the flexibility of structural flaps, hydrogen bonds, amino acids backbone radius of gyration, and solvent-accessible surface area that are quantified through molecular dynamics simulations. We applied the model to the HIV-1 protease and evaluated it by phylogenetic likelihood in comparison with the best-fitting empirical substitution model and a structurally constrained substitution model that ignores the enzymatic activity. We found that accounting for selection on the protein activity improves the fitting of the modeled functional regions with the real observations, especially in data with high molecular identity, which recommends considering constraints on the protein activity in the development of substitution models of evolution.
Collapse
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Ruqaiya Khalil
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| | - Sergio F Sousa
- UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, 4200-319 Porto, Portugal
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
| |
Collapse
|
3
|
Del Amparo R, Arenas M. Influence of substitution model selection on protein phylogenetic tree reconstruction. Gene 2023; 865:147336. [PMID: 36871672 DOI: 10.1016/j.gene.2023.147336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2023] [Revised: 02/22/2023] [Accepted: 02/28/2023] [Indexed: 03/06/2023]
Abstract
Probabilistic phylogenetic tree reconstruction is traditionally performed under a best-fitting substitution model of molecular evolution previously selected according to diverse statistical criteria. Interestingly, some recent studies proposed that this procedure is unnecessary for phylogenetic tree reconstruction leading to a debate in the field. In contrast to DNA sequences, phylogenetic tree reconstruction from protein sequences is traditionally based on empirical exchangeability matrices that can differ among taxonomic groups and protein families. Considering this aspect, here we investigated the influence of selecting a substitution model of protein evolution on phylogenetic tree reconstruction by the analyses of real and simulated data. We found that phylogenetic tree reconstructions based on a selected best-fitting substitution model of protein evolution are the most accurate, in terms of topology and branch lengths, compared with those derived from substitution models with amino acid replacement matrices far from the selected best-fitting model, especially when the data has large genetic diversity. Indeed, we found that substitution models with similar amino acid replacement matrices produce similar reconstructed phylogenetic trees, suggesting the use of substitution models as similar as possible to a selected best-fitting model when the latter cannot be used. Therefore, we recommend the use of the traditional protocol of selection among substitution models of evolution for protein phylogenetic tree reconstruction.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain.
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain; Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain.
| |
Collapse
|
4
|
Del Amparo R, González-Vázquez LD, Rodríguez-Moure L, Bastolla U, Arenas M. Consequences of Genetic Recombination on Protein Folding Stability. J Mol Evol 2023; 91:33-45. [PMID: 36463317 PMCID: PMC9849154 DOI: 10.1007/s00239-022-10080-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022]
Abstract
Genetic recombination is a common evolutionary mechanism that produces molecular diversity. However, its consequences on protein folding stability have not attracted the same attention as in the case of point mutations. Here, we studied the effects of homologous recombination on the computationally predicted protein folding stability for several protein families, finding less detrimental effects than we previously expected. Although recombination can affect multiple protein sites, we found that the fraction of recombined proteins that are eliminated by negative selection because of insufficient stability is not significantly larger than the corresponding fraction of proteins produced by mutation events. Indeed, although recombination disrupts epistatic interactions, the mean stability of recombinant proteins is not lower than that of their parents. On the other hand, the difference of stability between recombined proteins is amplified with respect to the parents, promoting phenotypic diversity. As a result, at least one third of recombined proteins present stability between those of their parents, and a substantial fraction have higher or lower stability than those of both parents. As expected, we found that parents with similar sequences tend to produce recombined proteins with stability close to that of the parents. Finally, the simulation of protein evolution along the ancestral recombination graph with empirical substitution models commonly used in phylogenetics, which ignore constraints on protein folding stability, showed that recombination favors the decrease of folding stability, supporting the convenience of adopting structurally constrained models when possible for inferences of protein evolutionary histories with recombination.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
| | - Luis Daniel González-Vázquez
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
| | - Laura Rodríguez-Moure
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain
| | - Ugo Bastolla
- Centre for Molecular Biology Severo Ochoa (CSIC-UAM), 28049 Madrid, Spain
| | - Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain ,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, 36310 Vigo, Spain ,Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
| |
Collapse
|
5
|
Ferreiro D, Khalil R, Gallego MJ, Osorio NS, Arenas M. The evolution of the HIV-1 protease folding stability. Virus Evol 2022; 8:veac115. [PMID: 36601299 PMCID: PMC9802575 DOI: 10.1093/ve/veac115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 10/10/2022] [Accepted: 12/03/2022] [Indexed: 12/11/2022] Open
Abstract
The evolution of structural proteins is generally constrained by the folding stability. However, little is known about the particular capacity of viral proteins to accommodate mutations that can potentially affect the protein stability and, in general, the evolution of the protein stability over time. As an illustrative model case, here, we investigated the evolution of the stability of the human immunodeficiency virus (HIV-1) protease (PR), which is a common HIV-1 drug target, under diverse evolutionary scenarios that include (1) intra-host virus evolution in a cohort of seventy-five patients sampled over time, (2) intra-host virus evolution sampled before and after specific PR-based treatments, and (3) inter-host evolution considering extant and ancestral (reconstructed) PR sequences from diverse HIV-1 subtypes. We also investigated the specific influence of currently known HIV-1 PR resistance mutations on the PR folding stability. We found that the HIV-1 PR stability fluctuated over time within a constant and wide range in any studied evolutionary scenario, accommodating multiple mutations that partially affected the stability while maintaining activity. We did not identify relationships between change of PR stability and diverse clinical parameters such as viral load, CD4+ T-cell counts, and a surrogate of time from infection. Counterintuitively, we predicted that nearly half of the studied HIV-1 PR resistance mutations do not significantly decrease stability, which, together with compensatory mutations, would allow the protein to adapt without requiring dramatic stability changes. We conclude that the HIV-1 PR presents a wide structural plasticity to acquire molecular adaptations without affecting the overall evolution of stability.
Collapse
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, Vigo 36310, Spain,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, Vigo 36310, Spain
| | - Ruqaiya Khalil
- CINBIO, Universidade de Vigo, Vigo 36310, Spain,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, Vigo 36310, Spain
| | - María J Gallego
- CINBIO, Universidade de Vigo, Vigo 36310, Spain,Departamento de Bioquímica, Genética e Inmunología, Universidade de Vigo, Vigo 36310, Spain
| | - Nuno S Osorio
- Life and Health Sciences Research Institute, School of Medicine, University of Minho, Braga 4710-057, Portugal,ICVS/3Bs—PT Government Associate Laboratory, Guimarães 4806-909, Portugal
| | | |
Collapse
|
6
|
Ayuso-Fernández I, Molpeceres G, Camarero S, Ruiz-Dueñas FJ, Martínez AT. Ancestral sequence reconstruction as a tool to study the evolution of wood decaying fungi. FRONTIERS IN FUNGAL BIOLOGY 2022; 3:1003489. [PMID: 37746217 PMCID: PMC10512382 DOI: 10.3389/ffunb.2022.1003489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/26/2022] [Accepted: 09/22/2022] [Indexed: 09/26/2023]
Abstract
The study of evolution is limited by the techniques available to do so. Aside from the use of the fossil record, molecular phylogenetics can provide a detailed characterization of evolutionary histories using genes, genomes and proteins. However, these tools provide scarce biochemical information of the organisms and systems of interest and are therefore very limited when they come to explain protein evolution. In the past decade, this limitation has been overcome by the development of ancestral sequence reconstruction (ASR) methods. ASR allows the subsequent resurrection in the laboratory of inferred proteins from now extinct organisms, becoming an outstanding tool to study enzyme evolution. Here we review the recent advances in ASR methods and their application to study fungal evolution, with special focus on wood-decay fungi as essential organisms in the global carbon cycling.
Collapse
Affiliation(s)
- Iván Ayuso-Fernández
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
| | - Gonzalo Molpeceres
- Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
| | - Susana Camarero
- Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
| | | | - Angel T. Martínez
- Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
| |
Collapse
|
7
|
Qazi S, Jit BP, Das A, Karthikeyan M, Saxena A, Ray M, Singh AR, Raza K, Jayaram B, Sharma A. BESFA: bioinformatics based evolutionary, structural & functional analysis of prostrate, Placenta, Ovary, Testis, and Embryo (POTE) paralogs. Heliyon 2022; 8:e10476. [PMID: 36132183 PMCID: PMC9483601 DOI: 10.1016/j.heliyon.2022.e10476] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2021] [Revised: 01/25/2022] [Accepted: 08/23/2022] [Indexed: 11/30/2022] Open
Abstract
The POTE family comprises 14 paralogues and is primarily expressed in Prostrate, Placenta, Ovary, Testis, Embryo (POTE), and cancerous cells. The prospective function of the POTE protein family under physiological conditions is less understood. We systematically analyzed their cellular localization and molecular docking analysis to elucidate POTE proteins' structure, function, and Adaptive Divergence. Our results suggest that group three POTE paralogs (POTEE, POTEF, POTEI, POTEJ, and POTEKP (a pseudogene)) exhibits significant variation among other members could be because of their Adaptive Divergence. Furthermore, our molecular docking studies on POTE protein revealed the highest binding affinity with NCI-approved anticancer compounds. Additionally, POTEE, POTEF, POTEI, and POTEJ were subject to an explicit molecular dynamic simulation for 50ns. MM-GBSA and other essential electrostatics were calculated that showcased that only POTEE and POTEF have absolute binding affinities with minimum energy exploitation. Thus, this study’s outcomes are expected to drive cancer research to successful utilization of POTE genes family as a new biomarker, which could pave the way for the discovery of new therapies.
Collapse
Affiliation(s)
- Sahar Qazi
- Department of Biochemistry, All India Institute of Medical Sciences, Delhi 110029, India
- Department of Computer Science, Jamia Millia Islamia, New Delhi 110025, India
| | - Bimal Prasad Jit
- Department of Biochemistry, All India Institute of Medical Sciences, Delhi 110029, India
| | - Abhishek Das
- Department of Biochemistry, All India Institute of Medical Sciences, Delhi 110029, India
| | - Muthukumarasamy Karthikeyan
- National Chemical Laboratory, Council of Scientific and Industrial Research (NCL-CSIR), Pune, Maharashtra, India
| | - Amit Saxena
- Centre for Development of Advanced Computing, Pune, Maharashtra, India
| | - M.D. Ray
- Dr. B.R.A Institute-Rotary Cancer Hospital, All India Institute of Medical Sciences, Delhi 110029, India
| | - Angel Rajan Singh
- Dr. B.R.A Institute-Rotary Cancer Hospital, All India Institute of Medical Sciences, Delhi 110029, India
| | - Khalid Raza
- Department of Computer Science, Jamia Millia Islamia, New Delhi 110025, India
| | - B. Jayaram
- Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Delhi, India
| | - Ashok Sharma
- Department of Biochemistry, All India Institute of Medical Sciences, Delhi 110029, India
- Corresponding author.
| |
Collapse
|
8
|
Peng Y, Yan H, Guo L, Deng C, Wang C, Wang Y, Kang L, Zhou P, Yu K, Dong X, Liu X, Sun Z, Peng Y, Zhao J, Deng D, Xu Y, Li Y, Jiang Q, Li Y, Wei L, Wang J, Ma J, Hao M, Li W, Kang H, Peng Z, Liu D, Jia J, Zheng Y, Ma T, Wei Y, Lu F, Ren C. Reference genome assemblies reveal the origin and evolution of allohexaploid oat. Nat Genet 2022; 54:1248-1258. [PMID: 35851189 PMCID: PMC9355876 DOI: 10.1038/s41588-022-01127-7] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2022] [Accepted: 06/08/2022] [Indexed: 12/13/2022]
Abstract
Common oat (Avena sativa) is an important cereal crop serving as a valuable source of forage and human food. Although reference genomes of many important crops have been generated, such work in oat has lagged behind, primarily owing to its large, repeat-rich polyploid genome. Here, using Oxford Nanopore ultralong sequencing and Hi-C technologies, we have generated a reference-quality genome assembly of hulless common oat, comprising 21 pseudomolecules with a total length of 10.76 Gb and contig N50 of 75.27 Mb. We also produced genome assemblies for diploid and tetraploid Avena ancestors, which enabled the identification of oat subgenomes and provided insights into oat chromosomal evolution. The origin of hexaploid oat is inferred from whole-genome sequencing, chloroplast genomes and transcriptome assemblies of different Avena species. These findings and the high-quality reference genomes presented here will facilitate the full use of crop genetic resources to accelerate oat improvement. A reference-quality genome assembly of hexaploid oat variety ‘Sanfensan’ and genome assemblies of its diploid and tetraploid Avena ancestors provide insights into the evolutionary history of allohexaploid oat.
Collapse
|
9
|
Abstract
The reconstruction of genetic material of ancestral organisms constitutes a powerful application of evolutionary biology. A fundamental step in this inference is the ancestral sequence reconstruction (ASR), which can be performed with diverse methodologies implemented in computer frameworks. However, most of these methodologies ignore evolutionary properties frequently observed in microbes, such as genetic recombination and complex selection processes, that can bias the traditional ASR. From a practical perspective, here I review methodologies for the reconstruction of ancestral DNA and protein sequences, with particular focus on microbes, and including biases, recommendations, and software implementations. I conclude that microbial ASR is a complex analysis that should be carefully performed and that there is a need for methods to infer more realistic ancestral microbial sequences.
Collapse
Affiliation(s)
- Miguel Arenas
- Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain.
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain.
- Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain.
| |
Collapse
|
10
|
Del Amparo R, Arenas M. HIV Protease and Integrase Empirical Substitution Models of Evolution: Protein-Specific Models Outperform Generalist Models. Genes (Basel) 2021; 13:61. [PMID: 35052404 PMCID: PMC8774313 DOI: 10.3390/genes13010061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Revised: 12/22/2021] [Accepted: 12/22/2021] [Indexed: 12/24/2022] Open
Abstract
Diverse phylogenetic methods require a substitution model of evolution that should mimic, as accurately as possible, the real substitution process. At the protein level, empirical substitution models have traditionally been based on a large number of different proteins from particular taxonomic levels. However, these models assume that all of the proteins of a taxonomic level evolve under the same substitution patterns. We believe that this assumption is highly unrealistic and should be relaxed by considering protein-specific substitution models that account for protein-specific selection processes. In order to test this hypothesis, we inferred and evaluated four new empirical substitution models for the protease and integrase of HIV and other viruses. We found that these models more accurately fit, compared with any of the currently available empirical substitution models, the evolutionary process of these proteins. We conclude that evolutionary inferences from protein sequences are more accurate if they are based on protein-specific substitution models rather than taxonomic-specific (generalist) substitution models. We also present four new empirical substitution models of protein evolution that could be useful for phylogenetic inferences of viral protease and integrase.
Collapse
Affiliation(s)
- Roberto Del Amparo
- Centro de Investigacións Biomédicas (CINBIO), University of Vigo, 36310 Vigo, Spain;
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- Centro de Investigacións Biomédicas (CINBIO), University of Vigo, 36310 Vigo, Spain;
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
| |
Collapse
|
11
|
Marcos ML, Echave J. The variation among sites of protein structure divergence is shaped by mutation and scaled by selection. Curr Res Struct Biol 2021; 2:156-163. [PMID: 34235475 PMCID: PMC8244499 DOI: 10.1016/j.crstbi.2020.08.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2020] [Revised: 07/09/2020] [Accepted: 08/17/2020] [Indexed: 12/30/2022] Open
Abstract
Protein structures do not evolve uniformly, but the degree of structure divergence varies among sites. The resulting site-dependent structure divergence patterns emerge from a process that involves mutation and selection, which may both, in principle, influence the emergent pattern. In contrast with sequence divergence patterns, which are known to be mainly determined by selection, the relative contributions of mutation and selection to structure divergence patterns is unclear. Here, studying 6 protein families with a mechanistic biophysical model of protein evolution, we untangle the effects of mutation and selection. We found that even in the absence of selection, structure divergence varies from site to site because the mutational sensitivity is not uniform. Selection scales the profile, increasing its amplitude, without changing its shape. This scaling effect follows from the similarity between mutational sensitivity and sequence variability profiles. The degree of evolutionary divergence of protein structures varies among sites. A Mutation-Selection model (MSM) of protein structure evolution with selection for stability is developed. Even in the case of no selection, the sensitivity of the structure to random mutations varies among sites. Selection amplifies this variation but it does not affect its shape. This scaling effect of selection follows from the similarity between the selection-independent mutational sensitivity and the selection-dependent sequence divergence, the two contributions that are combined to produce the observed structural divergence profile.
Collapse
Affiliation(s)
- María Laura Marcos
- Instituto de Ciencias Físicas, Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
| | - Julian Echave
- Instituto de Ciencias Físicas, Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
| |
Collapse
|
12
|
Norn C, André I, Theobald DL. A thermodynamic model of protein structure evolution explains empirical amino acid substitution matrices. Protein Sci 2021; 30:2057-2068. [PMID: 34218472 PMCID: PMC8442976 DOI: 10.1002/pro.4155] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Revised: 06/25/2021] [Accepted: 06/29/2021] [Indexed: 12/30/2022]
Abstract
Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. These evolutionary pressures are sufficiently consistent over time and across protein families to produce substitution patterns, summarized in global amino acid substitution matrices such as BLOSUM, JTT, WAG, and LG, which can be used to successfully detect homologs, infer phylogenies, and reconstruct ancestral sequences. Although the factors that govern the variation of amino acid substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid substitution matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi‐nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex yet universal pattern observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary driver behind the global amino acid substitution patterns observed in proteins throughout the tree of life.
Collapse
Affiliation(s)
- Christoffer Norn
- Biochemistry and Structural Biology, Lund University, Lund, Sweden
| | - Ingemar André
- Biochemistry and Structural Biology, Lund University, Lund, Sweden
| | - Douglas L Theobald
- Biochemistry Department, Brandeis University, Waltham, Massachusetts, USA
| |
Collapse
|
13
|
Tao Q, Barba-Montoya J, Huuki LA, Durnan MK, Kumar S. Relative Efficiencies of Simple and Complex Substitution Models in Estimating Divergence Times in Phylogenomics. Mol Biol Evol 2021; 37:1819-1831. [PMID: 32119075 PMCID: PMC7253201 DOI: 10.1093/molbev/msaa049] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
The conventional wisdom in molecular evolution is to apply parameter-rich models of nucleotide and amino acid substitutions for estimating divergence times. However, the actual extent of the difference between time estimates produced by highly complex models compared with those from simple models is yet to be quantified for contemporary data sets that frequently contain sequences from many species and genes. In a reanalysis of many large multispecies alignments from diverse groups of taxa, we found that the use of the simplest models can produce divergence time estimates and credibility intervals similar to those obtained from the complex models applied in the original studies. This result is surprising because the use of simple models underestimates sequence divergence for all the data sets analyzed. We found three fundamental reasons for the observed robustness of time estimates to model complexity in many practical data sets. First, the estimates of branch lengths and node-to-tip distances under the simplest model show an approximately linear relationship with those produced by using the most complex models applied on data sets with many sequences. Second, relaxed clock methods automatically adjust rates on branches that experience considerable underestimation of sequence divergences, resulting in time estimates that are similar to those from complex models. And, third, the inclusion of even a few good calibrations in an analysis can reduce the difference in time estimates from simple and complex models. The robustness of time estimates to model complexity in these empirical data analyses is encouraging, because all phylogenomics studies use statistical models that are oversimplified descriptions of actual evolutionary substitution processes.
Collapse
Affiliation(s)
- Qiqing Tao
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA.,Department of Biology, Temple University, Philadelphia, PA
| | - Jose Barba-Montoya
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA.,Department of Biology, Temple University, Philadelphia, PA
| | - Louise A Huuki
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
| | - Mary Kathleen Durnan
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA.,Department of Biology, Temple University, Philadelphia, PA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA.,Department of Biology, Temple University, Philadelphia, PA.,Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| |
Collapse
|
14
|
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739 DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022] Open
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is broadly performed with the metric nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in the dN/dS estimation and provide a detailed guide to estimate this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS emphasizing the importance of considering complementary biological information such as the role of the observed substitutions on the stability and function of proteins. This review is oriented to help evolutionary biologists that aim to accurately estimate selection in protein-coding sequences.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Catarina Branco
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Jesús Arenas
- Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
| | - Alberto Vicens
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| |
Collapse
|
15
|
Evolution of Protein Structure and Stability in Global Warming. Int J Mol Sci 2020; 21:ijms21249662. [PMID: 33352933 PMCID: PMC7767258 DOI: 10.3390/ijms21249662] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2020] [Revised: 12/15/2020] [Accepted: 12/16/2020] [Indexed: 12/12/2022] Open
Abstract
This review focuses on the molecular signatures of protein structures in relation to evolution and survival in global warming. It is based on the premise that the power of evolutionary selection may lead to thermotolerant organisms that will repopulate the planet and continue life in general, but perhaps with different kinds of flora and fauna. Our focus is on molecular mechanisms, whereby known examples of thermoresistance and their physicochemical characteristics were noted. A comparison of interactions of diverse residues in proteins from thermophilic and mesophilic organisms, as well as reverse genetic studies, revealed a set of imprecise molecular signatures that pointed to major roles of hydrophobicity, solvent accessibility, disulfide bonds, hydrogen bonds, ionic and π-electron interactions, and an overall condensed packing of the higher-order structure, especially in the hydrophobic regions. Regardless of mutations, specialized protein chaperones may play a cardinal role. In evolutionary terms, thermoresistance to global warming will likely occur in stepwise mutational changes, conforming to the molecular signatures, such that each "intermediate" fits a temporary niche through punctuated equilibrium, while maintaining protein functionality. Finally, the population response of different species to global warming may vary substantially, and, as such, some may evolve while others will undergo catastrophic mass extinction.
Collapse
|
16
|
Johnson MM, Wilke CO. Site-Specific Amino Acid Distributions Follow a Universal Shape. J Mol Evol 2020; 88:731-741. [PMID: 33230664 PMCID: PMC7717668 DOI: 10.1007/s00239-020-09976-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2020] [Accepted: 11/17/2020] [Indexed: 11/25/2022]
Abstract
In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: Either they provide a coarse-grained description that lacks biophysical realism (e.g., dN/dS models), or they require a large number of parameters to be fitted (e.g., mutation-selection models). Here, we ask whether a middle ground is possible: Can we obtain a realistic description of site-specific amino acid frequencies while severely restricting the number of free parameters in the model? We show that a distribution with a single free parameter can accurately capture the variation in amino acid frequency at most sites in an alignment, as long as we are willing to restrict our analysis to predicting amino acid frequencies by rank rather than by amino acid identity. This result holds equally well both in alignments of empirical protein sequences and of sequences evolved under a biophysically realistic all-atom force field. Our analysis reveals a near universal shape of the frequency distributions of amino acids. This insight has the potential to lead to new models of evolution that have both increased realism and a limited number of free parameters.
Collapse
Affiliation(s)
- Mackenzie M Johnson
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, 78712, USA
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA.
| |
Collapse
|
17
|
Schwersensky M, Rooman M, Pucci F. Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness. BMC Biol 2020; 18:146. [PMID: 33081759 PMCID: PMC7576759 DOI: 10.1186/s12915-020-00870-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2020] [Accepted: 09/16/2020] [Indexed: 12/31/2022] Open
Abstract
Background How, and the extent to which, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in the field of molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, as well as through available experimental stability and fitness data. Results At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. The destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, whereas the stabilizing mutations are about 4% in both regions. At the genetic code level, we observed smallest destabilization for mutations that are due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking highly anticorrelates with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both the codon usage and the usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues. Conclusion Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and the codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.
Collapse
Affiliation(s)
- Martin Schwersensky
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium
| | - Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium. .,Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.
| | - Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium. .,Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.
| |
Collapse
|
18
|
Arenas M, Bastolla U. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13341] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology University of Vigo Vigo Spain
- Biomedical Research Center (CINBIO) University of Vigo Vigo Spain
| | - Ugo Bastolla
- Bioinformatics Unit Centre for Molecular Biology Severo Ochoa (CSIC) Madrid Spain
| |
Collapse
|
19
|
Del Amparo R, Vicens A, Arenas M. The influence of heterogeneous codon frequencies along sequences on the estimation of molecular adaptation. Bioinformatics 2020; 36:430-436. [PMID: 31304972 DOI: 10.1093/bioinformatics/btz558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2019] [Revised: 07/08/2019] [Accepted: 07/11/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION The nonsynonymous/synonymous substitution rate ratio (dN/dS) is a commonly used parameter to quantify molecular adaptation in protein-coding data. It is known that the estimation of dN/dS can be biased if some evolutionary processes are ignored. In this concern, common ML methods to estimate dN/dS assume invariable codon frequencies among sites, despite this characteristic is rare in nature, and it could bias the estimation of this parameter. RESULTS Here we studied the influence of variable codon frequencies among genetic regions on the estimation of dN/dS. We explored scenarios varying the number of genetic regions that differ in codon frequencies, the amount of variability of codon frequencies among regions and the nucleotide frequencies at each codon position among regions. We found that ignoring heterogeneous codon frequencies among regions overall leads to underestimation of dN/dS and the bias increases with the level of heterogeneity of codon frequencies. Interestingly, we also found that varying nucleotide frequencies among regions at the first or second codon position leads to underestimation of dN/dS while variation at the third codon position leads to overestimation of dN/dS. Next, we present a methodology to reduce this bias based on the analysis of partitions presenting similar codon frequencies and we applied it to analyze four real datasets. We conclude that accounting for heterogeneous codon frequencies along sequences is required to obtain realistic estimates of molecular adaptation through this relevant evolutionary parameter. AVAILABILITY AND IMPLEMENTATION The applied frameworks for the computer simulations of protein-coding data and estimation of molecular adaptation are SGWE and PAML, respectively. Both are publicly available and referenced in the study. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Roberto Del Amparo
- Department of Biochemistry, Genetics and Immunology.,Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
| | - Alberto Vicens
- Department of Biochemistry, Genetics and Immunology.,Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology.,Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
| |
Collapse
|
20
|
Pascual-García A, Arenas M, Bastolla U. The Molecular Clock in the Evolution of Protein Structures. Syst Biol 2019; 68:987-1002. [DOI: 10.1093/sysbio/syz022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2018] [Revised: 03/20/2019] [Accepted: 04/09/2019] [Indexed: 12/11/2022] Open
Abstract
Abstract
The molecular clock hypothesis, which states that substitutions accumulate in protein sequences at a constant rate, plays a fundamental role in molecular evolution but it is violated when selective or mutational processes vary with time. Such violations of the molecular clock have been widely investigated for protein sequences, but not yet for protein structures. Here, we introduce a novel statistical test (Significant Clock Violations) and perform a large scale assessment of the molecular clock in the evolution of both protein sequences and structures in three large superfamilies. After validating our method with computer simulations, we find that clock violations are generally consistent in sequence and structure evolution, but they tend to be larger and more significant in structure evolution. Moreover, changes of function assessed through Gene Ontology and InterPro terms are associated with large and significant clock violations in structure evolution. We found that almost one third of significant clock violations are significant in structure evolution but not in sequence evolution, highlighting the advantage to use structure information for assessing accelerated evolution and gathering hints of positive selection. Clock violations between closely related pairs are frequently significant in sequence evolution, consistent with the observed time dependence of the substitution rate attributed to segregation of neutral and slightly deleterious polymorphisms, but not in structure evolution, suggesting that these substitutions do not affect protein structure although they may affect stability. These results are consistent with the view that natural selection, both negative and positive, constrains more strongly protein structures than protein sequences. Our code for computing clock violations is freely available at https://github.com/ugobas/Molecular_clock.
Collapse
Affiliation(s)
- Alberto Pascual-García
- Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, UK
- Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
| | - Miguel Arenas
- Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Spain
| | - Ugo Bastolla
- Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
| |
Collapse
|
21
|
The Influence of Protein Stability on Sequence Evolution: Applications to Phylogenetic Inference. Methods Mol Biol 2019; 1851:215-231. [PMID: 30298399 DOI: 10.1007/978-1-4939-8736-8_11] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Phylogenetic inference from protein data is traditionally based on empirical substitution models of evolution that assume that protein sites evolve independently of each other and under the same substitution process. However, it is well known that the structural properties of a protein site in the native state affect its evolution, in particular the sequence entropy and the substitution rate. Starting from the seminal proposal by Halpern and Bruno, where structural properties are incorporated in the evolutionary model through site-specific amino acid frequencies, several models have been developed to tackle the influence of protein structure on sequence evolution. Here we describe stability-constrained substitution (SCS) models that explicitly consider the stability of the native state against both unfolded and misfolded states. One of them, the mean-field model, provides an independent sites approximation that can be readily incorporated in maximum likelihood methods of phylogenetic inference, including ancestral sequence reconstruction. Next, we describe its validation with simulated and real proteins and its limitations and advantages with respect to empirical models that lack site specificity. We finally provide guidelines and recommendations to analyze protein data accounting for stability constraints, including computer simulations and inferences of protein evolution based on maximum likelihood. Some practical examples are included to illustrate these procedures.
Collapse
|
22
|
Jiménez-Santos MJ, Arenas M, Bastolla U. Influence of mutation bias and hydrophobicity on the substitution rates and sequence entropies of protein evolution. PeerJ 2018; 6:e5549. [PMID: 30310736 PMCID: PMC6174885 DOI: 10.7717/peerj.5549] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2018] [Accepted: 08/10/2018] [Indexed: 01/13/2023] Open
Abstract
The number of amino acids that occupy a given protein site during evolution reflects the selective constraints operating on the site. This evolutionary variability is strongly influenced by the structural properties of the site in the native structure, and it is quantified either through sequence entropy or through substitution rates. However, while the sequence entropy only depends on the equilibrium frequencies of the amino acids, the substitution rate also depends on the exchangeability matrix that describes mutations in the mathematical model of the substitution process. Here we apply two variants of a mathematical model of protein evolution with selection for protein stability, both against unfolding and against misfolding. Exploiting the approximation of independent sites, these models allow computing site-specific substitution processes that satisfy global constraints on folding stability. We find that site-specific substitution rates do not depend only on the selective constraints acting on the site, quantified through its sequence entropy. In fact, polar sites evolve faster than hydrophobic sites even for equal sequence entropy, as a consequence of the fact that polar amino acids are characterized by higher mutational exchangeability than hydrophobic ones. Accordingly, the model predicts that more polar proteins tend to evolve faster. Nevertheless, these results change if we compare proteins that evolve under different mutation biases, such as orthologous proteins in different bacterial genomes. In this case, the substitution rates are faster in genomes that evolve under mutational bias that favor hydrophobic amino acids by preferentially incorporating the nucleotide Thymine that is more frequent in hydrophobic codons. This appearingly contradictory result arises because buried sites occupied by hydrophobic amino acids are characterized by larger selective factors that largely amplify the substitution rate between hydrophobic amino acids, while the selective factors of exposed sites have a weaker effect. Thus, changes in the mutational bias produce deep effects on the biophysical properties of the protein (hydrophobicity) and on its evolutionary properties (sequence entropy and substitution rate) at the same time. The program Prot_evol that implements the two site-specific substitution processes is freely available at https://ub.cbm.uam.es/prot_fold_evol/prot_fold_evol_soft_main.php#Prot_Evol.
Collapse
Affiliation(s)
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
| | - Ugo Bastolla
- Bioinformatics Unit, Center for Molecular Biology Severo Ochoa, CSIC-UAM, Madrid, Spain
| |
Collapse
|
23
|
Using the Mutation-Selection Framework to Characterize Selection on Protein Sequences. Genes (Basel) 2018; 9:genes9080409. [PMID: 30104502 PMCID: PMC6115872 DOI: 10.3390/genes9080409] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2018] [Revised: 08/02/2018] [Accepted: 08/09/2018] [Indexed: 12/13/2022] Open
Abstract
When mutational pressure is weak, the generative process of protein evolution involves explicit probabilities of mutations of different types coupled to their conditional probabilities of fixation dependent on selection. Establishing this mechanistic modeling framework for the detection of selection has been a goal in the field of molecular evolution. Building on a mathematical framework proposed more than a decade ago, numerous methods have been introduced in an attempt to detect and measure selection on protein sequences. In this review, we discuss the structure of the original model, subsequent advances, and the series of assumptions that these models operate under.
Collapse
|
24
|
Beyond Thermodynamic Constraints: Evolutionary Sampling Generates Realistic Protein Sequence Variation. Genetics 2018; 208:1387-1395. [PMID: 29382650 DOI: 10.1534/genetics.118.300699] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2018] [Accepted: 01/25/2018] [Indexed: 01/01/2023] Open
Abstract
Biological evolution generates a surprising amount of site-specific variability in protein sequences. Yet, attempts at modeling this process have been only moderately successful, and current models based on protein structural metrics explain, at best, 60% of the observed variation. Surprisingly, simple measures of protein structure, such as solvent accessibility, are often better predictors of site-specific variability than more complex models employing all-atom energy functions and detailed structural modeling. We suggest here that these more complex models perform poorly because they lack consideration of the evolutionary process, which is, in part, captured by the simpler metrics. We compare protein sequences that are computationally designed to sequences that are computationally evolved using the same protein-design energy function and to homologous natural sequences. We find that, by a wide variety of metrics, evolved sequences are much more similar to natural sequences than are designed sequences. In particular, designed sequences are too conserved on the protein surface relative to natural sequences, whereas evolved sequences are not. Our results suggest that evolutionary simulation produces a realistic sampling of sequence space. By contrast, protein design-at least as currently implemented-does not. Existing energy functions seem to be sufficiently accurate to correctly describe the key thermodynamic constraints acting on protein sequences, but they need to be paired with realistic sampling schemes to generate realistic sequence alignments.
Collapse
|
25
|
Jimenez MJ, Arenas M, Bastolla U. Substitution Rates Predicted by Stability-Constrained Models of Protein Evolution Are Not Consistent with Empirical Data. Mol Biol Evol 2017; 35:743-755. [PMID: 29294047 DOI: 10.1093/molbev/msx327] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Protein structures strongly influence molecular evolution. In particular, the evolutionary rate of a protein site depends on the number of its native contacts. Stability-constrained models of protein evolution consider this influence of protein structure on evolution by predicting the effect of mutations on the stability of the native state, but they currently neglect how mutations affect the protein structure. These models predict that buried protein sites with more native contacts are more constrained by natural selection and less variable, as observed. Nevertheless, previous work did not consider the stability against compact misfolded conformations, although it is known that the negative design that destabilizes these misfolded conformations influences protein evolution significantly. Here, we show that stability-constrained models that consider misfolding predict that site-specific sequence entropy and substitution rate peak at amphiphilic sites with an intermediate number of contacts, as these sites are less constrained than exposed sites with few contacts whose hydrophobicity must be limited. This result holds both for a mean-field model with independent sites and for a pairwise model that takes as a reference the wild-type sequence, but it contrasts with the observations that indicate that the entropy and the substitution rate decrease monotonically with the number of contacts. Our work suggests that stability-constrained models overestimate the tolerance of amphiphilic sites against mutations, either because of the limits of the free energy function or, more importantly in our opinion, because they do not consider how mutations perturb the native protein structure.
Collapse
Affiliation(s)
- María José Jimenez
- Centro de Biologia Molecular "Severo Ochoa" CSIC-UAM Cantoblanco, Madrid, Spain
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
| | - Ugo Bastolla
- Centro de Biologia Molecular "Severo Ochoa" CSIC-UAM Cantoblanco, Madrid, Spain
| |
Collapse
|
26
|
de la Higuera I, Ferrer-Orta C, de Ávila AI, Perales C, Sierra M, Singh K, Sarafianos SG, Dehouck Y, Bastolla U, Verdaguer N, Domingo E. Molecular and Functional Bases of Selection against a Mutation Bias in an RNA Virus. Genome Biol Evol 2017; 9:1212-1228. [PMID: 28460010 PMCID: PMC5433387 DOI: 10.1093/gbe/evx075] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/25/2017] [Indexed: 12/12/2022] Open
Abstract
The selective pressures acting on viruses that replicate under enhanced mutation rates are largely unknown. Here, we describe resistance of foot-and-mouth disease virus to the mutagen 5-fluorouracil (FU) through a single polymerase substitution that prevents an excess of A to G and U to C transitions evoked by FU on the wild-type foot-and-mouth disease virus, while maintaining the same level of mutant spectrum complexity. The polymerase substitution inflicts upon the virus a fitness loss during replication in absence of FU but confers a fitness gain in presence of FU. The compensation of mutational bias was documented by in vitro nucleotide incorporation assays, and it was associated with structural modifications at the N-terminal region and motif B of the viral polymerase. Predictions of the effect of mutations that increase the frequency of G and C in the viral genome and encoded polymerase suggest multiple points in the virus life cycle where the mutational bias in favor of G and C may be detrimental. Application of predictive algorithms suggests adverse effects of the FU-directed mutational bias on protein stability. The results reinforce modulation of nucleotide incorporation as a lethal mutagenesis-escape mechanism (that permits eluding virus extinction despite replication in the presence of a mutagenic agent) and suggest that mutational bias can be a target of selection during virus replication.
Collapse
Affiliation(s)
- Ignacio de la Higuera
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain.,Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
| | - Cristina Ferrer-Orta
- Institut de Biologia Molecular de Barcelona (CSIC), Parc Científic de Barcelona, Barcelona, Spain
| | - Ana I de Ávila
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
| | - Celia Perales
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain.,Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain.,Liver Unit, Internal Medicine, Laboratory of Malalties Hepàtiques, Vall d'Hebron Institut de Recerca-Hospital Universitari Vall d'Hebron (VHIR-HUVH), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Macarena Sierra
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
| | - Kamalendra Singh
- Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
| | - Stefan G Sarafianos
- Christopher S. Bond Life Sciences Center and Department of Molecular Microbiology & Immunology, School of Medicine, University of Missouri, Columbia, Missouri
| | - Yves Dehouck
- Machine Learning Group, Université Libre de Bruxelles (ULB), Brussels, Belgium
| | - Ugo Bastolla
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
| | - Nuria Verdaguer
- Institut de Biologia Molecular de Barcelona (CSIC), Parc Científic de Barcelona, Barcelona, Spain
| | - Esteban Domingo
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain.,Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Barcelona, Spain
| |
Collapse
|
27
|
Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017; 46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.
Collapse
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina; .,Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Texas 78712;
| |
Collapse
|
28
|
Bastolla U, Dehouck Y, Echave J. What evolution tells us about protein physics, and protein physics tells us about evolution. Curr Opin Struct Biol 2017; 42:59-66. [DOI: 10.1016/j.sbi.2016.10.020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2016] [Revised: 10/19/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022]
|
29
|
Bloom JD. Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models. Biol Direct 2017; 12:1. [PMID: 28095902 PMCID: PMC5240389 DOI: 10.1186/s13062-016-0172-z] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2016] [Accepted: 12/14/2016] [Indexed: 12/23/2022] Open
Abstract
Background Sites of positive selection are identified by comparing observed evolutionary patterns to those expected under a null model for evolution in the absence of such selection. For protein-coding genes, the most common null model is that nonsynonymous and synonymous mutations fix at equal rates; this unrealistic model has limited power to detect many interesting forms of selection. Results I describe a new approach that uses a null model based on experimental measurements of a gene’s site-specific amino-acid preferences generated by deep mutational scanning in the lab. This null model makes it possible to identify both diversifying selection for repeated amino-acid change and differential selection for mutations to amino acids that are unexpected given the measurements made in the lab. I show that this approach identifies sites of adaptive substitutions in four genes (lactamase, Gal4, influenza nucleoprotein, and influenza hemagglutinin) far better than a comparable method that simply compares the rates of nonsynonymous and synonymous substitutions. Conclusions As rapid increases in biological data enable increasingly nuanced descriptions of the constraints on individual protein sites, approaches like the one here can improve our ability to identify many interesting forms of selection in natural sequences. Reviewers This article was reviewed by Sebastian Maurer-Stroh, Olivier Tenaillon, and Tal Pupko. All three reviewers are members of the Biology Direct editorial board. Electronic supplementary material The online version of this article (doi:10.1186/s13062-016-0172-z) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Jesse D Bloom
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, 98109, WA, USA.
| |
Collapse
|
30
|
Arenas M. Trends in substitution models of molecular evolution. Front Genet 2015; 6:319. [PMID: 26579193 PMCID: PMC4620419 DOI: 10.3389/fgene.2015.00319] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2015] [Accepted: 10/09/2015] [Indexed: 11/13/2022] Open
Abstract
Substitution models of evolution describe the process of genetic variation through fixed mutations and constitute the basis of the evolutionary analysis at the molecular level. Almost 40 years after the development of first substitution models, highly sophisticated, and data-specific substitution models continue emerging with the aim of better mimicking real evolutionary processes. Here I describe current trends in substitution models of DNA, codon and amino acid sequence evolution, including advantages and pitfalls of the most popular models. The perspective concludes that despite the large number of currently available substitution models, further research is required for more realistic modeling, especially for DNA coding and amino acid data. Additionally, the development of more accurate complex models should be coupled with new implementations and improvements of methods and frameworks for substitution model selection and downstream evolutionary analysis.
Collapse
Affiliation(s)
- Miguel Arenas
- Institute of Molecular Pathology and Immunology of the University of Porto Porto, Portugal
| |
Collapse
|