1
|
Metzger BPH, Park Y, Starr TN, Thornton JW. Epistasis facilitates functional evolution in an ancient transcription factor. eLife 2024; 12:RP88737. [PMID: 38767330 PMCID: PMC11105156 DOI: 10.7554/elife.88737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/22/2024] Open
Abstract
A protein's genetic architecture - the set of causal rules by which its sequence produces its functions - also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest - excluding the vast majority of possible genotypes and evolutionary trajectories - and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
Collapse
Affiliation(s)
- Brian PH Metzger
- Department of Ecology and Evolution, University of ChicagoChicagoUnited States
| | - Yeonwoo Park
- Program in Genetics, Genomics, and Systems Biology, University of ChicagoChicagoUnited States
| | - Tyler N Starr
- Department of Biochemistry and Molecular Biophysics, University of ChicagoChicagoUnited States
| | - Joseph W Thornton
- Department of Ecology and Evolution, University of ChicagoChicagoUnited States
- Department of Human Genetics, University of ChicagoChicagoUnited States
| |
Collapse
|
2
|
Li X, Habibipour S, Chou T, Yang OO. The role of APOBEC3-induced mutations in the differential evolution of monkeypox virus. Virus Evol 2023; 9:vead058. [PMID: 37841642 PMCID: PMC10569380 DOI: 10.1093/ve/vead058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2023] [Revised: 09/03/2023] [Accepted: 09/18/2023] [Indexed: 10/17/2023] Open
Abstract
Recent studies show that newly sampled monkeypox virus (MPXV) genomes exhibit mutations consistent with Apolipoprotein B mRNA Editing Catalytic Polypeptide-like3 (APOBEC3)-mediated editing compared to MPXV genomes collected earlier. It is unclear whether these single-nucleotide polymorphisms (SNPs) result from APOBEC3-induced editing or are a consequence of genetic drift within one or more MPXV animal reservoirs. We develop a simple method based on a generalization of the General-Time-Reversible model to show that the observed SNPs are likely the result of APOBEC3-induced editing. The statistical features allow us to extract lineage information and estimate evolutionary events.
Collapse
Affiliation(s)
- Xiangting Li
- Department of Computational Medicine, UCLA, Los Angeles, CA, United States
| | - Sara Habibipour
- Departments of Medicine and Microbiology, Immunology, and Molecular Genetics, UCLA, Los Angeles, CA, United States
| | - Tom Chou
- Department of Computational Medicine, UCLA, Los Angeles, CA, United States
- Department of Mathematics, UCLA, Los Angeles, CA, United States
| | - Otto O Yang
- Departments of Medicine and Microbiology, Immunology, and Molecular Genetics, UCLA, Los Angeles, CA, United States
| |
Collapse
|
3
|
Li X, Habibipour S, Chou T, Yang OO. The role of APOBEC3-induced mutations in the differential evolution of monkeypox virus. ARXIV 2023:arXiv:2308.03714v1. [PMID: 37608930 PMCID: PMC10441442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 08/24/2023]
Abstract
Recent studies show that newly sampled monkeypox virus (MPXV) genomes exhibit mutations consistent with Apolipoprotein B mRNA Editing Catalytic Polypeptide-like3 (APOBEC3)-mediated editing, compared to MPXV genomes collected earlier. It is unclear whether these single nucleotide polymorphisms (SNPs) result from APOBEC3-induced editing or are a consequence of genetic drift within one or more MPXV animal reservoirs. We develop a simple method based on a generalization of the General-Time-Reversible (GTR) model to show that the observed SNPs are likely the result of APOBEC3-induced editing. The statistical features allow us to extract lineage information and estimate evolutionary events.
Collapse
Affiliation(s)
- Xiangting Li
- Department of Computational Medicine, UCLA, Los Angeles, CA, United States
| | - Sara Habibipour
- Depts. of Medicine and Microbiology, Immunology, and Molecular Genetics, UCLA, Los Angeles, CA, United States
| | - Tom Chou
- Department of Computational Medicine, UCLA, Los Angeles, CA, United States
- Department of Mathematics, UCLA, Los Angeles, CA, United States
| | - Otto O Yang
- Depts. of Medicine and Microbiology, Immunology, and Molecular Genetics, UCLA, Los Angeles, CA, United States
| |
Collapse
|
4
|
Duchemin L, Lanore V, Veber P, Boussau B. Evaluation of Methods to Detect Shifts in Directional Selection at the Genome Scale. Mol Biol Evol 2022; 40:6889995. [PMID: 36510704 PMCID: PMC9940701 DOI: 10.1093/molbev/msac247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Revised: 10/24/2022] [Accepted: 10/26/2022] [Indexed: 12/15/2022] Open
Abstract
Identifying the footprints of selection in coding sequences can inform about the importance and function of individual sites. Analyses of the ratio of nonsynonymous to synonymous substitutions (dN/dS) have been widely used to pinpoint changes in the intensity of selection, but cannot distinguish them from changes in the direction of selection, that is, changes in the fitness of specific amino acids at a given position. A few methods that rely on amino-acid profiles to detect changes in directional selection have been designed, but their performances have not been well characterized. In this paper, we investigate the performance of six of these methods. We evaluate them on simulations along empirical phylogenies in which transition events have been annotated and compare their ability to detect sites that have undergone changes in the direction or intensity of selection to that of a widely used dN/dS approach, codeml's branch-site model A. We show that all methods have reduced performance in the presence of biased gene conversion but not CpG hypermutability. The best profile method, Pelican, a new implementation of Tamuri AU, Hay AJ, Goldstein RA. (2009. Identifying changes in selective constraints: host shifts in influenza. PLoS Comput Biol. 5(11):e1000564), performs as well as codeml in a range of conditions except for detecting relaxations of selection, and performs better when tree length increases, or in the presence of persistent positive selection. It is fast, enabling genome-scale searches for site-wise changes in the direction of selection associated with phenotypic changes.
Collapse
Affiliation(s)
| | - Vincent Lanore
- Laboratoire de Biométrie et Biologie Evolutive, Univ Lyon, Univ Lyon 1, CNRS, VetAgro Sup, UMR5558, Villeurbanne, France
| | - Philippe Veber
- Laboratoire de Biométrie et Biologie Evolutive, Univ Lyon, Univ Lyon 1, CNRS, VetAgro Sup, UMR5558, Villeurbanne, France
| | | |
Collapse
|
5
|
Latrille T, Lartillot N. An Improved Codon Modeling Approach for Accurate Estimation of the Mutation Bias. Mol Biol Evol 2022; 39:6503505. [PMID: 35021218 PMCID: PMC8831783 DOI: 10.1093/molbev/msac005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic codon models are routinely used to characterize selective regimes in coding sequences. Their parametric design, however, is still a matter of debate, in particular concerning the question of how to account for differing nucleotide frequencies and substitution rates. This problem relates to the fact that nucleotide composition in protein-coding sequences is the result of the interactions between mutation and selection. In particular, because of the structure of the genetic code, the nucleotide composition differs between the three coding positions, with the third position showing a more extreme composition. Yet, phylogenetic codon models do not correctly capture this phenomenon and instead predict that the nucleotide composition should be the same for all three positions. Alternatively, some models allow for different nucleotide rates at the three positions, an approach conflating the effects of mutation and selection on nucleotide composition. In practice, it results in inaccurate estimation of the strength of selection. Conceptually, the problem comes from the fact that phylogenetic codon models do not correctly capture the fixation bias acting against the mutational pressure at the mutation–selection equilibrium. To address this problem and to more accurately identify mutation rates and selection strength, we present an improved codon modeling approach where the fixation rate is not seen as a scalar, but as a tensor. This approach gives an accurate representation of how mutation and selection oppose each other at equilibrium and yields a reliable estimate of the mutational process, while disentangling the mean fixation probabilities prevailing in different mutational directions.
Collapse
Affiliation(s)
- T Latrille
- CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR, Université de Lyon, Université Lyon 1, 5558, Villeurbanne, F-69622, France.,École Normale Supérieure de Lyon, Université de Lyon, Université Lyon 1, Lyon, France
| | - N Lartillot
- CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR, Université de Lyon, Université Lyon 1, 5558, Villeurbanne, F-69622, France
| |
Collapse
|
6
|
Youssef N, Susko E, Roger AJ, Bielawski JP. Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work. Protein Sci 2021; 30:2009-2028. [PMID: 34322924 PMCID: PMC8442975 DOI: 10.1002/pro.4161] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 11/08/2022]
Abstract
Amino acid preferences vary across sites and time. While variation across sites is widely accepted, the extent and frequency of temporal shifts are contentious. Our understanding of the drivers of amino acid preference change is incomplete: To what extent are temporal shifts driven by adaptive versus nonadaptive evolutionary processes? We review phenomena that cause preferences to vary (e.g., evolutionary Stokes shift, contingency, and entrenchment) and clarify how they differ. To determine the extent and prevalence of shifted preferences, we review experimental and theoretical studies. Analyses of natural sequence alignments often detect decreases in homoplasy (convergence and reversions) rates, and variation in replacement rates with time-signals that are consistent with temporally changing preferences. While approaches inferring shifts in preferences from patterns in natural alignments are valuable, they are indirect since multiple mechanisms (both adaptive and nonadaptive) could lead to the observed signal. Alternatively, site-directed mutagenesis experiments allow for a more direct assessment of shifted preferences. They corroborate evidence from multiple sequence alignments, revealing that the preference for an amino acid at a site varies depending on the background sequence. However, shifts in preferences are usually minor in magnitude and sites with significantly shifted preferences are low in frequency. The small yet consistent perturbations in preferences could, nevertheless, jeopardize the accuracy of inference procedures, which assume constant preferences. We conclude by discussing if and how such shifts in preferences might influence widely used time-homogenous inference procedures and potential ways to mitigate such effects.
Collapse
Affiliation(s)
- Noor Youssef
- Department of BiologyDalhousie UniversityHalifaxNova ScotiaCanada
| | - Edward Susko
- Department of Mathematics and StatisticsDalhousie UniversityHalifaxNova ScotiaCanada
| | - Andrew J. Roger
- Department of Biochemistry and Molecular BiologyDalhousie UniversityHalifaxNova ScotiaCanada
| | - Joseph P. Bielawski
- Department of BiologyDalhousie UniversityHalifaxNova ScotiaCanada
- Department of Mathematics and StatisticsDalhousie UniversityHalifaxNova ScotiaCanada
| |
Collapse
|
7
|
Spielman SJ. Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics. Mol Biol Evol 2021; 37:2110-2123. [PMID: 32191313 PMCID: PMC7306691 DOI: 10.1093/molbev/msaa075] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness of fit to the data, commonly using an information theoretic criterion. Users then specify the best-ranking model for inference. Although it is often assumed that better-fitting models translate to increase accuracy, recent studies have shown that the specific model employed may not substantially affect inferences. We examine whether there is a systematic relationship between relative model fit and topological inference accuracy in protein phylogenetics, using simulations and real sequences. Simulations employed site-heterogeneous mechanistic codon models that are distinct from protein-level phylogenetic inference models, allowing us to investigate how protein models performs when they are misspecified to the data, as will be the case for any real sequence analysis. We broadly find that phylogenies inferred across models with vastly different fits to the data produce highly consistent topologies. We additionally find that all models infer similar proportions of false-positive splits, raising the possibility that all available models of protein evolution are similarly misspecified. Moreover, we find that the parameter-rich GTR (general time reversible) model, whose amino acid exchangeabilities are free parameters, performs similarly to models with fixed exchangeabilities, although the inference precision associated with GTR models was not examined. We conclude that, although relative model selection may not hinder phylogenetic analysis on protein data, it may not offer specific predictable improvements and is not a reliable proxy for accuracy.
Collapse
|
8
|
Saad-Roy CM, Arinaminpathy N, Wingreen NS, Levin SA, Akey JM, Grenfell BT. Implications of localized charge for human influenza A H1N1 hemagglutinin evolution: Insights from deep mutational scans. PLoS Comput Biol 2020; 16:e1007892. [PMID: 32584807 PMCID: PMC7316228 DOI: 10.1371/journal.pcbi.1007892] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2019] [Accepted: 04/21/2020] [Indexed: 01/01/2023] Open
Abstract
Seasonal influenza A viruses of humans evolve rapidly due to strong selection pressures from host immune responses, principally on the hemagglutinin (HA) viral surface protein. Based on mouse transmission experiments, a proposed mechanism for immune evasion consists of increased avidity to host cellular receptors, mediated by electrostatic charge interactions with negatively charged cell surfaces. In support of this, the HA charge of the globally circulating H3N2 has increased over time since its pandemic. However, the same trend was not seen in H1N1 HA sequences. This is counter-intuitive, since immune escape due to increased avidity (due itself to an increase in charge) was determined experimentally. Here, we explore whether patterns of local charge of H1N1 HA can explain this discrepancy and thus further associate electrostatic charge with immune escape and viral evolutionary dynamics. Measures of site-wise functional selection and expected charge computed from deep mutational scan data on an early H1N1 HA yield a striking division of residues into three groups, separated by charge. We then explored evolutionary dynamics of these groups from 1918 to 2008. In particular, one group increases in net charge over time and consists of sites that are evolving the fastest, that are closest to the receptor binding site (RBS), and that are exposed to solvent (i.e., on the surface). By contrast, another group decreases in net charge and consists of sites that are further away from the RBS and evolving slower, but also exposed to solvent. The last group consists of those sites in the HA core, with no change in net charge and that evolve very slowly. Thus, there is a group of residues that follows the same trend as seen for the entire H3N2 HA. It is possible that the H1N1 HA is under other biophysical constraints that result in compensatory decreases in charge elsewhere on the protein. Our results implicate localized charge in HA interactions with host cells, and highlight how deep mutational scan data can inform evolutionary hypotheses. The hemagglutinin (HA) surface protein of influenza A viruses evolves rapidly to evade host immunity, leading to sizable yearly epidemics in human populations. Previous transmission experiments with H1N1 in mice have tied immune escape to an increase in HA avidity for cellular receptors, mediated by electrostatic charge. Furthermore, retrospective sequence analyses from a previous study confirmed that the HA of circulating global H3N2 has increased in net charge, yet surprisingly, that of H1N1 has not varied significantly. How is a stable net charge related to local patterns of H1N1 HA charge in response to selection? To elucidate the role of local electrostatic charge in host-virus interactions, we investigate characteristics of local charge on the H1N1 HA using functional data from deep mutational scan experiments. Combining measures of functional selection and expected charge at each site on DMS data from a 1933 H1N1 HA yields a striking visual pattern that identifies three groups of sites that have different biophysical properties and that prove to have distinct evolutionary patterns in natural human sequences. Essentially, we find evidence for an increase in charge near the receptor binding site and for compensatory changes elsewhere on the protein. Thus, our findings may reconcile disparate results from transmission experiments and natural sequence analyses, and highlight the importance of local properties of the HA protein. Overall, our findings further support the hypothesis of immune escape due to increased HA avidity to cellular receptors mediated by electrostatic charge. More generally, our work illustrates a novel method of leveraging deep mutational scans in conjunction with natural sequences to shed light on virus-host interactions.
Collapse
Affiliation(s)
- Chadi M. Saad-Roy
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- * E-mail: (CMSR); (BTG)
| | - Nimalan Arinaminpathy
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
| | - Ned S. Wingreen
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Department of Molecular Biology, Princeton University, Princeton, New Jersey, United States of America
| | - Simon A. Levin
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
| | - Joshua M. Akey
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Bryan T. Grenfell
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Woodrow Wilson School of Public and International Affairs, Princeton University, Princeton, New Jersey, United States of America
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
- * E-mail: (CMSR); (BTG)
| |
Collapse
|
9
|
Qiu X, Duvvuri VR, Bahl J. Computational Approaches and Challenges to Developing Universal Influenza Vaccines. Vaccines (Basel) 2019; 7:E45. [PMID: 31141933 PMCID: PMC6631137 DOI: 10.3390/vaccines7020045] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2019] [Revised: 05/15/2019] [Accepted: 05/23/2019] [Indexed: 12/25/2022] Open
Abstract
The traditional design of effective vaccines for rapidly-evolving pathogens, such as influenza A virus, has failed to provide broad spectrum and long-lasting protection. With low cost whole genome sequencing technology and powerful computing capabilities, novel computational approaches have demonstrated the potential to facilitate the design of a universal influenza vaccine. However, few studies have integrated computational optimization in the design and discovery of new vaccines. Understanding the potential of computational vaccine design is necessary before these approaches can be implemented on a broad scale. This review summarizes some promising computational approaches under current development, including computationally optimized broadly reactive antigens with consensus sequences, phylogenetic model-based ancestral sequence reconstruction, and immunomics to compute conserved cross-reactive T-cell epitopes. Interactions between virus-host-environment determine the evolvability of the influenza population. We propose that with the development of novel technologies that allow the integration of data sources such as protein structural modeling, host antibody repertoire analysis and advanced phylodynamic modeling, computational approaches will be crucial for the development of a long-lasting universal influenza vaccine. Taken together, computational approaches are powerful and promising tools for the development of a universal influenza vaccine with durable and broad protection.
Collapse
Affiliation(s)
- Xueting Qiu
- Center for Ecology of Infectious Diseases, Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA 30602, USA.
| | - Venkata R Duvvuri
- Center for Ecology of Infectious Diseases, Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA 30602, USA.
| | - Justin Bahl
- Center for Ecology of Infectious Diseases, Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA 30602, USA.
- Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA 30606, USA.
- Duke-NUS Graduate Medical School, Singapore 169857, Singapore.
| |
Collapse
|
10
|
Hilton SK, Bloom JD. Modeling site-specific amino-acid preferences deepens phylogenetic estimates of viral sequence divergence. Virus Evol 2018; 4:vey033. [PMID: 30425841 PMCID: PMC6220371 DOI: 10.1093/ve/vey033] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
Molecular phylogenetics is often used to estimate the time since the divergence of modern gene sequences. For highly diverged sequences, such phylogenetic techniques sometimes estimate surprisingly recent divergence times. In the case of viruses, independent evidence indicates that the estimates of deep divergence times from molecular phylogenetics are sometimes too recent. This discrepancy is caused in part by inadequate models of purifying selection leading to branch-length underestimation. Here we examine the effect on branch-length estimation of using models that incorporate experimental measurements of purifying selection. We find that models informed by experimentally measured site-specific amino-acid preferences estimate longer deep branches on phylogenies of influenza virus hemagglutinin. This lengthening of branches is due to more realistic stationary states of the models, and is mostly independent of the branch-length extension from modeling site-to-site variation in amino-acid substitution rate. The branch-length extension from experimentally informed site-specific models is similar to that achieved by other approaches that allow the stationary state to vary across sites. However, the improvements from all of these site-specific but time homogeneous and site independent models are limited by the fact that a protein’s amino-acid preferences gradually shift as it evolves. Overall, our work underscores the importance of modeling site-specific amino-acid preferences when estimating deep divergence times—but also shows the inherent limitations of approaches that fail to account for how these preferences shift over time.
Collapse
Affiliation(s)
- Sarah K Hilton
- Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center.,Department of Genome Sciences, University of Washington, USA
| | - Jesse D Bloom
- Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center.,Department of Genome Sciences, University of Washington, USA.,Howard Hughes Medical Institute, Seattle, WA, USA
| |
Collapse
|
11
|
Using the Mutation-Selection Framework to Characterize Selection on Protein Sequences. Genes (Basel) 2018; 9:genes9080409. [PMID: 30104502 PMCID: PMC6115872 DOI: 10.3390/genes9080409] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2018] [Revised: 08/02/2018] [Accepted: 08/09/2018] [Indexed: 12/13/2022] Open
Abstract
When mutational pressure is weak, the generative process of protein evolution involves explicit probabilities of mutations of different types coupled to their conditional probabilities of fixation dependent on selection. Establishing this mechanistic modeling framework for the detection of selection has been a goal in the field of molecular evolution. Building on a mathematical framework proposed more than a decade ago, numerous methods have been introduced in an attempt to detect and measure selection on protein sequences. In this review, we discuss the structure of the original model, subsequent advances, and the series of assumptions that these models operate under.
Collapse
|
12
|
Abstract
Genotype-phenotype relationships are notoriously complicated. Idiosyncratic interactions between specific combinations of mutations occur and are difficult to predict. Yet it is increasingly clear that many interactions can be understood in terms of global epistasis. That is, mutations may act additively on some underlying, unobserved trait, and this trait is then transformed via a nonlinear function to the observed phenotype as a result of subsequent biophysical and cellular processes. Here we infer the shape of such global epistasis in three proteins, based on published high-throughput mutagenesis data. To do so, we develop a maximum-likelihood inference procedure using a flexible family of monotonic nonlinear functions spanned by an I-spline basis. Our analysis uncovers dramatic nonlinearities in all three proteins; in some proteins a model with global epistasis accounts for virtually all of the measured variation, whereas in others we find substantial local epistasis as well. This method allows us to test hypotheses about the form of global epistasis and to distinguish variance components attributable to global epistasis, local epistasis, and measurement error.
Collapse
|
13
|
Haddox HK, Dingens AS, Hilton SK, Overbaugh J, Bloom JD. Mapping mutational effects along the evolutionary landscape of HIV envelope. eLife 2018; 7:34420. [PMID: 29590010 PMCID: PMC5910023 DOI: 10.7554/elife.34420] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2017] [Accepted: 03/15/2018] [Indexed: 01/04/2023] Open
Abstract
The immediate evolutionary space accessible to HIV is largely determined by how single amino acid mutations affect fitness. These mutational effects can shift as the virus evolves. However, the prevalence of such shifts in mutational effects remains unclear. Here, we quantify the effects on viral growth of all amino acid mutations to two HIV envelope (Env) proteins that differ at >100 residues. Most mutations similarly affect both Envs, but the amino acid preferences of a minority of sites have clearly shifted. These shifted sites usually prefer a specific amino acid in one Env, but tolerate many amino acids in the other. Surprisingly, shifts are only slightly enriched at sites that have substituted between the Envs—and many occur at residues that do not even contact substitutions. Therefore, long-range epistasis can unpredictably shift Env’s mutational tolerance during HIV evolution, although the amino acid preferences of most sites are conserved between moderately diverged viral strains. The virus that causes AIDS, or HIV, has a protein called Env on its surface, which is essential for the virus to infect cells. Env can also be recognized by the immune system, which then targets the virus for destruction or blocks it from infecting cells. Unfortunately, Env evolves very quickly, which means that HIV can evade our defenses. However, there are limits to how much this protein can change, since it still needs to perform its essential role in helping viruses enter cells. In the century since HIV first appeared in human populations, the virus has evolved considerably. There are now many HIV strains that infect people, and they bear Env proteins with substantially different sequences. However, it is not clear if these changes in sequence have resulted in Envs from distinct strains being able to tolerate different mutations. To examine this question, Haddox et al. compared how the Envs from two strains of HIV react to modifications in their sequences. They created all possible individual mutations in the proteins, and the resulting collections of mutated viruses were then tested for their ability to infect cells in the laboratory. Most mutations had similar effects in both Env proteins. This allowed Haddox et al. to identify portions of the protein that easily accommodate changes, and portions that must remain unchanged for viruses to remain infectious—at least in the laboratory. Some of these mutations are under different types of pressures when the virus faces the immune system, and those were identified using computational approaches. However, some mutations were tolerated differently by the two Env proteins. Therefore, viral strains differ in how their Env proteins can evolve. The parts of Env that showed differences in mutational tolerance between the strains were not necessarily the parts that differ in sequence. This shows that changes in sequence in one part of the protein can modify how other portions evolve. It remains to be determined whether changes in tolerance to mutations translate into differences in how the virus can escape immunity. This is an important question given that the rapid evolution of Env is a major obstacle to creating a vaccine for HIV.
Collapse
Affiliation(s)
- Hugh K Haddox
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States.,Molecular and Cellular Biology PhD program, University of Washington, Seattle, United States
| | - Adam S Dingens
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States.,Molecular and Cellular Biology PhD program, University of Washington, Seattle, United States
| | - Sarah K Hilton
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States.,Department of Genome Sciences, University of Washington, Seattle, United States
| | - Julie Overbaugh
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, United States.,Epidemiology Program, Fred Hutchinson Cancer Research Center, Seattle, United States
| | - Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, United States.,Department of Genome Sciences, University of Washington, Seattle, United States
| |
Collapse
|
14
|
Hilton SK, Doud MB, Bloom JD. phydms: software for phylogenetic analyses informed by deep mutational scanning. PeerJ 2017; 5:e3657. [PMID: 28785526 PMCID: PMC5541924 DOI: 10.7717/peerj.3657] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2017] [Accepted: 07/15/2017] [Indexed: 11/30/2022] Open
Abstract
It has recently become possible to experimentally measure the effects of all amino-acid point mutations to proteins using deep mutational scanning. These experimental measurements can inform site-specific phylogenetic substitution models of gene evolution in nature. Here we describe software that efficiently performs analyses with such substitution models. This software, phydms, can be used to compare the results of deep mutational scanning experiments to the selection on genes in nature. Given a phylogenetic tree topology inferred with another program, phydms enables rigorous comparison of how well different experiments on the same gene capture actual natural selection. It also enables re-scaling of deep mutational scanning data to account for differences in the stringency of selection in the lab and nature. Finally, phydms can identify sites that are evolving differently in nature than expected from experiments in the lab. As data from deep mutational scanning experiments become increasingly widespread, phydms will facilitate quantitative comparison of the experimental results to the actual selection pressures shaping evolution in nature.
Collapse
Affiliation(s)
- Sarah K Hilton
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.,Department of Genome Sciences, University of Washington, Seattle, WA, United States of America
| | - Michael B Doud
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.,Department of Genome Sciences, University of Washington, Seattle, WA, United States of America.,Medical Scientist Training Program, University of Washington, Seattle, WA, United States of America
| | - Jesse D Bloom
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.,Department of Genome Sciences, University of Washington, Seattle, WA, United States of America
| |
Collapse
|
15
|
Chan YH, Venev SV, Zeldovich KB, Matthews CR. Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints. Nat Commun 2017; 8:14614. [PMID: 28262665 PMCID: PMC5343507 DOI: 10.1038/ncomms14614] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2016] [Accepted: 01/11/2017] [Indexed: 02/07/2023] Open
Abstract
Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. The TIM barrel fold is an evolutionarily conserved motif found in proteins with a variety of enzymatic functions. Here the authors explore the fitness landscape of the TIM barrel protein IGPS and uncover evolutionary constraints on both sequence and structure, accompanied by long range allosteric interactions.
Collapse
Affiliation(s)
- Yvonne H Chan
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA
| | - Sergey V Venev
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, Massachusetts 01605, USA
| | - Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation Street, Worcester, Massachusetts 01605, USA
| | - C Robert Matthews
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA
| |
Collapse
|
16
|
Bloom JD. Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models. Biol Direct 2017; 12:1. [PMID: 28095902 PMCID: PMC5240389 DOI: 10.1186/s13062-016-0172-z] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2016] [Accepted: 12/14/2016] [Indexed: 12/23/2022] Open
Abstract
Background Sites of positive selection are identified by comparing observed evolutionary patterns to those expected under a null model for evolution in the absence of such selection. For protein-coding genes, the most common null model is that nonsynonymous and synonymous mutations fix at equal rates; this unrealistic model has limited power to detect many interesting forms of selection. Results I describe a new approach that uses a null model based on experimental measurements of a gene’s site-specific amino-acid preferences generated by deep mutational scanning in the lab. This null model makes it possible to identify both diversifying selection for repeated amino-acid change and differential selection for mutations to amino acids that are unexpected given the measurements made in the lab. I show that this approach identifies sites of adaptive substitutions in four genes (lactamase, Gal4, influenza nucleoprotein, and influenza hemagglutinin) far better than a comparable method that simply compares the rates of nonsynonymous and synonymous substitutions. Conclusions As rapid increases in biological data enable increasingly nuanced descriptions of the constraints on individual protein sites, approaches like the one here can improve our ability to identify many interesting forms of selection in natural sequences. Reviewers This article was reviewed by Sebastian Maurer-Stroh, Olivier Tenaillon, and Tal Pupko. All three reviewers are members of the Biology Direct editorial board. Electronic supplementary material The online version of this article (doi:10.1186/s13062-016-0172-z) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Jesse D Bloom
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, 98109, WA, USA.
| |
Collapse
|
17
|
Haddox HK, Dingens AS, Bloom JD. Experimental Estimation of the Effects of All Amino-Acid Mutations to HIV's Envelope Protein on Viral Replication in Cell Culture. PLoS Pathog 2016; 12:e1006114. [PMID: 27959955 PMCID: PMC5189966 DOI: 10.1371/journal.ppat.1006114] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2016] [Revised: 12/27/2016] [Accepted: 12/07/2016] [Indexed: 11/18/2022] Open
Abstract
HIV is notorious for its capacity to evade immunity and anti-viral drugs through rapid sequence evolution. Knowledge of the functional effects of mutations to HIV is critical for understanding this evolution. HIV's most rapidly evolving protein is its envelope (Env). Here we use deep mutational scanning to experimentally estimate the effects of all amino-acid mutations to Env on viral replication in cell culture. Most mutations are under purifying selection in our experiments, although a few sites experience strong selection for mutations that enhance HIV's replication in cell culture. We compare our experimental measurements of each site's preference for each amino acid to the actual frequencies of these amino acids in naturally occurring HIV sequences. Our measured amino-acid preferences correlate with amino-acid frequencies in natural sequences for most sites. However, our measured preferences are less concordant with natural amino-acid frequencies at surface-exposed sites that are subject to pressures absent from our experiments such as antibody selection. Our data enable us to quantify the inherent mutational tolerance of each site in Env. We show that the epitopes of broadly neutralizing antibodies have a significantly reduced inherent capacity to tolerate mutations, rigorously validating a pervasive idea in the field. Overall, our results help disentangle the role of inherent functional constraints and external selection pressures in shaping Env's evolution.
Collapse
Affiliation(s)
- Hugh K. Haddox
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Molecular and Cellular Biology PhD Program, University of Washington, Seattle, Washington, United States of America
| | - Adam S. Dingens
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Molecular and Cellular Biology PhD Program, University of Washington, Seattle, Washington, United States of America
| | - Jesse D. Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
| |
Collapse
|
18
|
Rodrigue N, Lartillot N. Detecting Adaptation in Protein-Coding Genes Using a Bayesian Site-Heterogeneous Mutation-Selection Codon Substitution Model. Mol Biol Evol 2016; 34:204-214. [PMID: 27744408 PMCID: PMC5854120 DOI: 10.1093/molbev/msw220] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open
Abstract
Codon substitution models have traditionally attempted to uncover signatures of adaptation within protein-coding genes by contrasting the rates of synonymous and non-synonymous substitutions. Another modeling approach, known as the mutation–selection framework, attempts to explicitly account for selective patterns at the amino acid level, with some approaches allowing for heterogeneity in these patterns across codon sites. Under such a model, substitutions at a given position occur at the neutral or nearly neutral rate when they are synonymous, or when they correspond to replacements between amino acids of similar fitness; substitutions from high to low (low to high) fitness amino acids have comparatively low (high) rates. Here, we study the use of such a mutation–selection framework as a null model for the detection of adaptation. Following previous works in this direction, we include a deviation parameter that has the effect of capturing the surplus, or deficit, in non-synonymous rates, relative to what would be expected under a mutation–selection modeling framework that includes a Dirichlet process approach to account for across-codon-site variation in amino acid fitness profiles. We use simulations, along with a few real data sets, to study the behavior of the approach, and find it to have good power with a low false-positive rate. Altogether, we emphasize the potential of recent mutation–selection models in the detection of adaptation, calling for further model refinements as well as large-scale applications.
Collapse
Affiliation(s)
- Nicolas Rodrigue
- Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, Canada
| | - Nicolas Lartillot
- Université de Lyon, Laboratoire de Biométrie, Biologie Évolutive, Villeurbanne, France
| |
Collapse
|
19
|
Spielman SJ, Wilke CO. Extensively Parameterized Mutation-Selection Models Reliably Capture Site-Specific Selective Constraint. Mol Biol Evol 2016; 33:2990-3002. [PMID: 27512115 DOI: 10.1093/molbev/msw171] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
The mutation-selection model of coding sequence evolution has received renewed attention for its use in estimating site-specific amino acid propensities and selection coefficient distributions. Two computationally tractable mutation-selection inference frameworks have been introduced: One framework employs a fixed-effects, highly parameterized maximum likelihood approach, whereas the other employs a random-effects Bayesian Dirichlet Process approach. While both implementations follow the same model, they appear to make distinct predictions about the distribution of selection coefficients. The fixed-effects framework estimates a large proportion of highly deleterious substitutions, whereas the random-effects framework estimates that all substitutions are either nearly neutral or weakly deleterious. It remains unknown, however, how accurately each method infers evolutionary constraints at individual sites. Indeed, selection coefficient distributions pool all site-specific inferences, thereby obscuring a precise assessment of site-specific estimates. Therefore, in this study, we use a simulation-based strategy to determine how accurately each approach recapitulates the selective constraint at individual sites. We find that the fixed-effects approach, despite its extensive parameterization, consistently and accurately estimates site-specific evolutionary constraint. By contrast, the random-effects Bayesian approach systematically underestimates the strength of natural selection, particularly for slowly evolving sites. We also find that, despite the strong differences between their inferred selection coefficient distributions, the fixed- and random-effects approaches yield surprisingly similar inferences of site-specific selective constraint. We conclude that the fixed-effects mutation-selection framework provides the more reliable software platform for model application and future development.
Collapse
Affiliation(s)
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX Present address: Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
| | - Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
| |
Collapse
|
20
|
Abriata LA, Bovigny C, Dal Peraro M. Detection and sequence/structure mapping of biophysical constraints to protein variation in saturated mutational libraries and protein sequence alignments with a dedicated server. BMC Bioinformatics 2016; 17:242. [PMID: 27315797 PMCID: PMC4912743 DOI: 10.1186/s12859-016-1124-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2016] [Accepted: 06/07/2016] [Indexed: 11/21/2022] Open
Abstract
Background Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Results Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. Discussion We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. Conclusion We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1124-4) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Luciano A Abriata
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland.
| | - Christophe Bovigny
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland.,Present address: Molecular Modeling Group, Swiss Institute of Bioinformatics, UNIL, Bâtiment Génopode, Lausanne, 1015, Switzerland
| | - Matteo Dal Peraro
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland
| |
Collapse
|
21
|
Accurate Measurement of the Effects of All Amino-Acid Mutations on Influenza Hemagglutinin. Viruses 2016; 8:v8060155. [PMID: 27271655 PMCID: PMC4926175 DOI: 10.3390/v8060155] [Citation(s) in RCA: 141] [Impact Index Per Article: 17.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2016] [Revised: 05/21/2016] [Accepted: 05/25/2016] [Indexed: 12/17/2022] Open
Abstract
Influenza genes evolve mostly via point mutations, and so knowing the effect of every amino-acid mutation provides information about evolutionary paths available to the virus. We and others have combined high-throughput mutagenesis with deep sequencing to estimate the effects of large numbers of mutations to influenza genes. However, these measurements have suffered from substantial experimental noise due to a variety of technical problems, the most prominent of which is bottlenecking during the generation of mutant viruses from plasmids. Here we describe advances that ameliorate these problems, enabling us to measure with greatly improved accuracy and reproducibility the effects of all amino-acid mutations to an H1 influenza hemagglutinin on viral replication in cell culture. The largest improvements come from using a helper virus to reduce bottlenecks when generating viruses from plasmids. Our measurements confirm at much higher resolution the results of previous studies suggesting that antigenic sites on the globular head of hemagglutinin are highly tolerant of mutations. We also show that other regions of hemagglutinin—including the stalk epitopes targeted by broadly neutralizing antibodies—have a much lower inherent capacity to tolerate point mutations. The ability to accurately measure the effects of all influenza mutations should enhance efforts to understand and predict viral evolution.
Collapse
|
22
|
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 206] [Impact Index Per Article: 25.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
Collapse
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
| | - Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
| | - Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
| |
Collapse
|
23
|
Winkler ML, Bonomo RA. SHV-129: A Gateway to Global Suppressors in the SHV β-Lactamase Family? Mol Biol Evol 2015; 33:429-41. [PMID: 26531195 DOI: 10.1093/molbev/msv235] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
Enzymes are continually evolving in response to environmental pressures. In order to increase enzyme fitness, amino acid substitutions can occur leading to a changing function or an increased stability. These evolutionary drivers determine the activity of an enzyme and its success in future generations in response to changing conditions such as environmental stressors or to improve physiological function allowing continual persistence of the enzyme. With recent warning reports on antibiotic resistance and multidrug resistant bacterial infections, understanding the evolution of β-lactamase enzymes, which are a large contributor to antibiotic resistance, is increasingly important. Here, we investigated a variant of the SHV β-lactamase identified from a clinical isolate of Escherichia coli in 2011 (SHV-129, G238S-E240K-R275L-N276D) to identify the first instance of a global suppressor substitution in the SHV β-lactamase family. We have used this enzyme to show that several evolutionary principles are conserved in different class A β-lactamases, such as active site mutations reducing stability and requiring compensating suppressor substitutions in order to ensure evolutionary persistence of a given β-lactamase. However, the pathway taken by a given β-lactamase in order to reach its evolutionary peak under a given set of conditions is likely different. We also provide further evidence for a conserved stabilizing substitution among class A β-lactamases, the back to consensus M182T substitution. In addition to expanding the spectrum of β-lactamase activity to include the hydrolysis of cefepime, the amino acid substitutions found in SHV-129 provide the enzyme with an excess of stability, which expands the evolutionary landscape of this enzyme and may result in further evolution to potentially include resistance to carbapenems or β-lactamase inhibitors.
Collapse
Affiliation(s)
- Marisa L Winkler
- Research Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, OH Department of Molecular Biology and Microbiology, Case Western Reserve University
| | - Robert A Bonomo
- Research Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, OH Department of Molecular Biology and Microbiology, Case Western Reserve University Department of Pharmacology, Case Western Reserve University Department of Biochemistry, Case Western Reserve University Department of Medicine, Case Western Reserve University
| |
Collapse
|
24
|
Doud MB, Ashenberg O, Bloom JD. Site-Specific Amino Acid Preferences Are Mostly Conserved in Two Closely Related Protein Homologs. Mol Biol Evol 2015; 32:2944-60. [PMID: 26226986 PMCID: PMC4626756 DOI: 10.1093/molbev/msv167] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open
Abstract
Evolution drives changes in a protein’s sequence over time. The extent to which these changes in sequence lead to shifts in the underlying preference for each amino acid at each site is an important question with implications for comparative sequence-analysis methods, such as molecular phylogenetics. To quantify the extent that site-specific amino acid preferences shift during evolution, we performed deep mutational scanning on two homologs of human influenza nucleoprotein with 94% amino acid identity. We found that only a modest fraction of sites exhibited shifts in amino acid preferences that exceeded the noise in our experiments. Furthermore, even among sites that did exhibit detectable shifts, the magnitude tended to be small relative to differences between nonhomologous proteins. Given the limited change in amino acid preferences between these close homologs, we tested whether our measurements could inform site-specific substitution models that describe the evolution of nucleoproteins from more diverse influenza viruses. We found that site-specific evolutionary models informed by our experiments greatly outperformed nonsite-specific alternatives in fitting phylogenies of nucleoproteins from human, swine, equine, and avian influenza. Combining the experimental data from both homologs improved phylogenetic fit, partly because measurements in multiple genetic contexts better captured the evolutionary average of the amino acid preferences for sites with shifting preferences. Our results show that site-specific amino acid preferences are sufficiently conserved that measuring mutational effects in one protein provides information that can improve quantitative evolutionary modeling of nearby homologs.
Collapse
Affiliation(s)
- Michael B Doud
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA Department of Genome Sciences, University of Washington Medical Scientist Training Program, University of Washington School of Medicine
| | - Orr Ashenberg
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Jesse D Bloom
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA Department of Genome Sciences, University of Washington
| |
Collapse
|
25
|
Software for the analysis and visualization of deep mutational scanning data. BMC Bioinformatics 2015; 16:168. [PMID: 25990960 PMCID: PMC4491876 DOI: 10.1186/s12859-015-0590-4] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2015] [Accepted: 04/22/2015] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection. RESULTS I describe a software package, dms_tools, to infer the impacts of mutations from deep mutational scanning data using a likelihood-based treatment of the mutation counts. I show that dms_tools yields more accurate inferences on simulated data than simply calculating ratios of counts pre- and post-selection. Using dms_tools, one can infer the preference of each site for each amino acid given a single selection pressure, or assess the extent to which these preferences change under different selection pressures. The preferences and their changes can be intuitively visualized with sequence-logo-style plots created using an extension to weblogo. CONCLUSIONS dms_tools implements a statistically principled approach for the analysis and subsequent visualization of deep mutational scanning data.
Collapse
|
26
|
Abstract
Numerous computational methods exist to assess the mode and strength of natural selection in protein-coding sequences, yet how distinct methods relate to one another remains largely unknown. Here, we elucidate the relationship between two widely used phylogenetic modeling frameworks: dN/dS models and mutation-selection (MutSel) models. We derive a mathematical relationship between dN/dS and scaled selection coefficients, the focal parameters of MutSel models, and use this relationship to gain deeper insight into the behaviors, limitations, and applicabilities of these two modeling frameworks. We prove that, if all synonymous changes are neutral, standard MutSel models correspond to dN/dS ≤ 1. However, if synonymous codons differ in fitness, dN/dS can take on arbitrarily high values even if all selection is purifying. Thus, the MutSel modeling framework cannot necessarily accommodate positive, diversifying selection, while dN/dS cannot distinguish between purifying selection on synonymous codons and positive selection on amino acids. We further propose a new benchmarking strategy of dN/dS inferences against MutSel simulations and demonstrate that the widely used Goldman-Yang-style dN/dS models yield substantially biased dN/dS estimates on realistic sequence data. In contrast, the less frequently used Muse-Gaut-style models display much less bias. Strikingly, the least-biased and most precise dN/dS estimates are never found in the models with the best fit to the data, measured through both AIC and BIC scores. Thus, selecting models based on goodness-of-fit criteria can yield poor parameter estimates if the models considered do not precisely correspond to the underlying mechanism that generated the data. In conclusion, establishing mathematical links among modeling frameworks represents a novel, powerful strategy to pinpoint previously unrecognized model limitations and strengths.
Collapse
Affiliation(s)
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin
| | - Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin
| |
Collapse
|
27
|
How structural and physicochemical determinants shape sequence constraints in a functional enzyme. PLoS One 2015; 10:e0118684. [PMID: 25706742 PMCID: PMC4338278 DOI: 10.1371/journal.pone.0118684] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2014] [Accepted: 01/21/2015] [Indexed: 01/14/2023] Open
Abstract
The need for interfacing structural biology and biophysics to molecular evolution is being increasingly recognized. One part of the big problem is to understand how physics and chemistry shape the sequence space available to functional proteins, while satisfying the needs of biology. Here we present a quantitative, structure-based analysis of a high-resolution map describing the tolerance to all substitutions in all positions of a functional enzyme, namely a TEM lactamase previously studied through deep sequencing of mutants growing in competition experiments with selection against ampicillin. Substitutions are rarely observed within 7 Å of the active site, a stringency that is relaxed slowly and extends up to 15–20 Å, with buried residues being especially sensitive. Substitution patterns in over one third of the residues can be quantitatively modeled by monotonic dependencies on amino acid descriptors and predictions of changes in folding stability. Amino acid volume and steric hindrance shape constraints on the protein core; hydrophobicity and solubility shape constraints on hydrophobic clusters underneath the surface, and on salt bridges and polar networks at the protein surface together with charge and hydrogen bonding capacity. Amino acid solubility, flexibility and conformational descriptors also provide additional constraints at many locations. These findings provide fundamental insights into the chemistry underlying protein evolution and design, by quantitating links between sequence and different protein traits, illuminating subtle and unexpected sequence-trait relationships and pinpointing what traits are sacrificed upon gain-of-function mutation.
Collapse
|