1
|
Zhang X, Fang B, Huang YF. Transcription factor binding sites are frequently under accelerated evolution in primates. Nat Commun 2023; 14:783. [PMID: 36774380 PMCID: PMC9922303 DOI: 10.1038/s41467-023-36421-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Accepted: 01/31/2023] [Indexed: 02/13/2023] Open
Abstract
Recent comparative genomic studies have identified many human accelerated elements (HARs) with elevated substitution rates in the human lineage. However, it remains unknown to what extent transcription factor binding sites (TFBSs) are under accelerated evolution in humans and other primates. Here, we introduce two pooling-based phylogenetic methods with dramatically enhanced sensitivity to examine accelerated evolution in TFBSs. Using these new methods, we show that more than 6000 TFBSs annotated in the human genome have experienced accelerated evolution in Hominini, apes, and Old World monkeys. Although these TFBSs individually show relatively weak signals of accelerated evolution, they collectively are more abundant than HARs. Also, we show that accelerated evolution in Pol III binding sites may be driven by lineage-specific positive selection, whereas accelerated evolution in other TFBSs might be driven by nonadaptive evolutionary forces. Finally, the accelerated TFBSs are enriched around developmental genes, suggesting that accelerated evolution in TFBSs may drive the divergence of developmental processes between primates.
Collapse
Affiliation(s)
- Xinru Zhang
- Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA. .,Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA. .,Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, PA, 16802, USA.
| | - Bohao Fang
- Department of Organismic and Evolutionary Biology and the Museum of Comparative Zoology, Harvard University, Boston, MA, 02135, USA
| | - Yi-Fei Huang
- Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA. .,Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA.
| |
Collapse
|
2
|
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739 DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022] Open
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is broadly performed with the metric nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in the dN/dS estimation and provide a detailed guide to estimate this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS emphasizing the importance of considering complementary biological information such as the role of the observed substitutions on the stability and function of proteins. This review is oriented to help evolutionary biologists that aim to accurately estimate selection in protein-coding sequences.
Collapse
Affiliation(s)
- Roberto Del Amparo
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Catarina Branco
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Jesús Arenas
- Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
| | - Alberto Vicens
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.,Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| |
Collapse
|
3
|
Johnson MM, Wilke CO. Site-Specific Amino Acid Distributions Follow a Universal Shape. J Mol Evol 2020; 88:731-741. [PMID: 33230664 PMCID: PMC7717668 DOI: 10.1007/s00239-020-09976-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2020] [Accepted: 11/17/2020] [Indexed: 11/25/2022]
Abstract
In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: Either they provide a coarse-grained description that lacks biophysical realism (e.g., dN/dS models), or they require a large number of parameters to be fitted (e.g., mutation-selection models). Here, we ask whether a middle ground is possible: Can we obtain a realistic description of site-specific amino acid frequencies while severely restricting the number of free parameters in the model? We show that a distribution with a single free parameter can accurately capture the variation in amino acid frequency at most sites in an alignment, as long as we are willing to restrict our analysis to predicting amino acid frequencies by rank rather than by amino acid identity. This result holds equally well both in alignments of empirical protein sequences and of sequences evolved under a biophysically realistic all-atom force field. Our analysis reveals a near universal shape of the frequency distributions of amino acids. This insight has the potential to lead to new models of evolution that have both increased realism and a limited number of free parameters.
Collapse
Affiliation(s)
- Mackenzie M Johnson
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, 78712, USA
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA.
| |
Collapse
|
4
|
Dukler N, Huang YF, Siepel A. Phylogenetic Modeling of Regulatory Element Turnover Based on Epigenomic Data. Mol Biol Evol 2020; 37:2137-2152. [PMID: 32176292 PMCID: PMC7306682 DOI: 10.1093/molbev/msaa073] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
Evolutionary changes in gene expression are often driven by gains and losses of cis-regulatory elements (CREs). The dynamics of CRE evolution can be examined using multispecies epigenomic data, but so far such analyses have generally been descriptive and model-free. Here, we introduce a probabilistic modeling framework for the evolution of CREs that operates directly on raw chromatin immunoprecipitation and sequencing (ChIP-seq) data and fully considers the phylogenetic relationships among species. Our framework includes a phylogenetic hidden Markov model, called epiPhyloHMM, for identifying the locations of multiply aligned CREs, and a combined phylogenetic and generalized linear model, called phyloGLM, for accounting for the influence of a rich set of genomic features in describing their evolutionary dynamics. We apply these methods to previously published ChIP-seq data for the H3K4me3 and H3K27ac histone modifications in liver tissue from nine mammals. We find that enhancers are gained and lost during mammalian evolution at about twice the rate of promoters, and that turnover rates are negatively correlated with DNA sequence conservation, expression level, and tissue breadth, and positively correlated with distance from the transcription start site, consistent with previous findings. In addition, we find that the predicted dosage sensitivity of target genes positively correlates with DNA sequence constraint in CREs but not with turnover rates, perhaps owing to differences in the effect sizes of the relevant mutations. Altogether, our probabilistic modeling framework enables a variety of powerful new analyses.
Collapse
Affiliation(s)
- Noah Dukler
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Physiology, Biophysics, and Systems Biology, Weill Cornell Medical College, New York, NY
| | - Yi-Fei Huang
- Department of Biology and Huck Institute of Life Sciences, Pennsylvania State University, University Park, PA
| | - Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
| |
Collapse
|
5
|
Klingen TR, Loers J, Stanelle-Bertram S, Gabriel G, McHardy AC. Structures and functions linked to genome-wide adaptation of human influenza A viruses. Sci Rep 2019; 9:6267. [PMID: 31000776 PMCID: PMC6472403 DOI: 10.1038/s41598-019-42614-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2018] [Accepted: 03/27/2019] [Indexed: 11/12/2022] Open
Abstract
Human influenza A viruses elicit short-term respiratory infections with considerable mortality and morbidity. While H3N2 viruses circulate for more than 50 years, the recent introduction of pH1N1 viruses presents an excellent opportunity for a comparative analysis of the genome-wide evolutionary forces acting on both subtypes. Here, we inferred patches of sites relevant for adaptation, i.e. being under positive selection, on eleven viral protein structures, from all available data since 1968 and correlated these with known functional properties. Overall, pH1N1 have more patches than H3N2 viruses, especially in the viral polymerase complex, while antigenic evolution is more apparent for H3N2 viruses. In both subtypes, NS1 has the highest patch and patch site frequency, indicating that NS1-mediated viral attenuation of host inflammatory responses is a continuously intensifying process, elevated even in the longtime-circulating subtype H3N2. We confirmed the resistance-causing effects of two pH1N1 changes against oseltamivir in NA activity assays, demonstrating the value of the resource for discovering functionally relevant changes. Our results represent an atlas of protein regions and sites with links to host adaptation, antiviral drug resistance and immune evasion for both subtypes for further study.
Collapse
MESH Headings
- Drug Resistance, Viral/genetics
- Evolution, Molecular
- Genome, Viral/genetics
- Humans
- Influenza A Virus, H1N1 Subtype/genetics
- Influenza A Virus, H1N1 Subtype/pathogenicity
- Influenza A Virus, H3N2 Subtype/genetics
- Influenza A Virus, H3N2 Subtype/pathogenicity
- Influenza, Human/genetics
- Influenza, Human/pathology
- Influenza, Human/virology
- Oseltamivir/therapeutic use
- Respiratory Tract Infections/genetics
- Respiratory Tract Infections/virology
- Viral Nonstructural Proteins/genetics
- Virus Replication/genetics
Collapse
Affiliation(s)
- Thorsten R Klingen
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research (HZI), Braunschweig, Germany
| | - Jens Loers
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research (HZI), Braunschweig, Germany
| | | | - Gülsah Gabriel
- Heinrich Pette Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany
- University of Veterinary Medicine, Hannover, Germany
| | - Alice C McHardy
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research (HZI), Braunschweig, Germany.
- German Center for Infection Research (DZIF), Braunschweig, Germany.
| |
Collapse
|
6
|
Kuzminkova AA, Sokol AD, Ushakova KE, Popadin KY, Gunbin KV. mtProtEvol: the resource presenting molecular evolution analysis of proteins involved in the function of Vertebrate mitochondria. BMC Evol Biol 2019; 19:47. [PMID: 30813887 PMCID: PMC6391778 DOI: 10.1186/s12862-019-1371-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Heterotachy is the variation in the evolutionary rate of aligned sites in different parts of the phylogenetic tree. It occurs mainly due to epistatic interactions among the substitutions, which are highly complex and make it difficult to study protein evolution. The vast majority of computational evolutionary approaches for studying these epistatic interactions or their evolutionary consequences in proteins require high computational time. However, recently, it has been shown that the evolution of residue solvent accessibility (RSA) is tightly linked with changes in protein fitness and intra-protein epistatic interactions. This provides a computationally fast alternative, based on comparison of evolutionary rates of amino acid replacements with the rates of RSA evolutionary changes in order to recognize any shifts in epistatic interaction. RESULTS Based on RSA information, data randomization and phylogenetic approaches, we constructed a software pipeline, which can be used to analyze the evolutionary consequences of intra-protein epistatic interactions with relatively low computational time. We analyzed the evolution of 512 protein families tightly linked to mitochondrial function in Vertebrates and created "mtProtEvol", the web resource with data on protein evolution. In strict agreement with lifespan and metabolic rate data, we demonstrated that different functional categories of mitochondria-related proteins subjected to selection on accelerated and decelerated RSA rates in rodents and primates. For example, accelerated RSA evolution in rodents has been shown for Krebs cycle enzymes, respiratory chain and reactive oxygen species metabolism, while in primates these functions are stress-response, translation and mtDNA integrity. Decelerated RSA evolution in rodents has been demonstrated for translational machinery and oxidative stress response components. CONCLUSIONS mtProtEvol is an interactive resource focused on evolutionary analysis of epistatic interactions in protein families involved in Vertebrata mitochondria function and available at http://bioinfodbs.kantiana.ru/mtProtEvol /. This resource and the devised software pipeline may be useful tool for researchers in area of protein evolution.
Collapse
Affiliation(s)
- Anastasia A. Kuzminkova
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Anastasia D. Sokol
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Kristina E. Ushakova
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Konstantin Yu. Popadin
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Konstantin V. Gunbin
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Center of Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| |
Collapse
|
7
|
Cole CT, Ingvarsson PK. Pathway position constrains the evolution of an ecologically important pathway in aspens (Populus tremula L.). Mol Ecol 2018; 27:3317-3330. [PMID: 29972878 DOI: 10.1111/mec.14785] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2016] [Revised: 01/30/2018] [Accepted: 02/20/2018] [Indexed: 12/22/2022]
Abstract
Many ecological interactions of aspens and their relatives (Populus spp.) are affected by products of the phenylpropanoid pathway synthesizing condensed tannins (CTs), whose production involves trade-offs with other ecologically important compounds and with growth. Genes of this pathway are candidates for investigating the role of selection on ecologically important, polygenic traits. We analysed sequences from 25 genes representing 10 steps of the CT synthesis pathway, which produces CTs used in defence and lignins used for growth, in 12 individuals of European aspen (Populus tremula). We compared these to homologs from P. trichocarpa, to a control set of 77 P. tremula genes, to genome-wide resequencing data and to RNA-seq expression levels, in order to identify signatures of selection distinct from those of demography. In Populus, pathway position exerts a strong influence on the evolution of these genes. Nonsynonymous diversity, divergence and allele frequency shifts (Tajima's D) were much lower than for synonymous measures. Expression levels were higher, and the direction of selection more negative, for upstream genes than for those downstream. Selective constraints act with increasing intensity on upstream genes, despite the presence of multiple paralogs in most gene families. Pleiotropy, expression level, flux control and codon bias appear to interact in determining levels and patterns of variation in genes of this pathway, whose products mediate a wide array of ecological interactions for this widely distributed species.
Collapse
Affiliation(s)
- Christopher T Cole
- Division of Science and Mathematics, University of Minnesota, Morris, Morris, Minnesota
| | - Pär K Ingvarsson
- Department of Ecology and Environmental Science, Umeå University, Umeå, Sweden
| |
Collapse
|
8
|
Chi PB, Kim D, Lai JK, Bykova N, Weber CC, Kubelka J, Liberles DA. A new parameter-rich structure-aware mechanistic model for amino acid substitution during evolution. Proteins 2017; 86:218-228. [PMID: 29178386 DOI: 10.1002/prot.25429] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2017] [Revised: 11/14/2017] [Accepted: 11/22/2017] [Indexed: 02/06/2023]
Abstract
Improvements in the description of amino acid substitution are required to develop better pseudo-energy-based protein structure-aware models for use in phylogenetic studies. These models are used to characterize the probabilities of amino acid substitution and enable better simulation of protein sequences over a phylogeny. A better characterization of amino acid substitution probabilities in turn enables numerous downstream applications, like detecting positive selection, ancestral sequence reconstruction, and evolutionarily-motivated protein engineering. Many existing Markov models for amino acid substitution in molecular evolution disregard molecular structure and describe the amino acid substitution process over longer evolutionary periods poorly. Here, we present a new model upgraded with a site-specific parameterization of pseudo-energy terms in a coarse-grained force field, which describes local heterogeneity in physical constraints on amino acid substitution better than a previous pseudo-energy-based model with minimum cost in runtime. The importance of each weight term parameterization in characterizing underlying features of the site, including contact number, solvent accessibility, and secondary structural elements was evaluated, returning both expected and biologically reasonable relationships between model parameters. This results in the acceptance of proposed amino acid substitutions that more closely resemble those observed site-specific frequencies in gene family alignments. The modular site-specific pseudo-energy function is made available for download through the following website: https://liberles.cst.temple.edu/Software/CASS/index.html.
Collapse
Affiliation(s)
- Peter B Chi
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122.,Department of Mathematics and Computer Science, Ursinus College, Collegeville, Pennsylvania, 19426
| | - Dohyup Kim
- Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071
| | - Jason K Lai
- Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071
| | - Nadia Bykova
- Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071.,Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow, 119234, Russia
| | - Claudia C Weber
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
| | - Jan Kubelka
- Department of Chemistry, University of Wyoming, Laramie, Wyoming, 82071
| | - David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122.,Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071
| |
Collapse
|
9
|
Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017; 46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.
Collapse
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina; .,Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Texas 78712;
| |
Collapse
|
10
|
Bloom JD. Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models. Biol Direct 2017; 12:1. [PMID: 28095902 PMCID: PMC5240389 DOI: 10.1186/s13062-016-0172-z] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2016] [Accepted: 12/14/2016] [Indexed: 12/23/2022] Open
Abstract
Background Sites of positive selection are identified by comparing observed evolutionary patterns to those expected under a null model for evolution in the absence of such selection. For protein-coding genes, the most common null model is that nonsynonymous and synonymous mutations fix at equal rates; this unrealistic model has limited power to detect many interesting forms of selection. Results I describe a new approach that uses a null model based on experimental measurements of a gene’s site-specific amino-acid preferences generated by deep mutational scanning in the lab. This null model makes it possible to identify both diversifying selection for repeated amino-acid change and differential selection for mutations to amino acids that are unexpected given the measurements made in the lab. I show that this approach identifies sites of adaptive substitutions in four genes (lactamase, Gal4, influenza nucleoprotein, and influenza hemagglutinin) far better than a comparable method that simply compares the rates of nonsynonymous and synonymous substitutions. Conclusions As rapid increases in biological data enable increasingly nuanced descriptions of the constraints on individual protein sites, approaches like the one here can improve our ability to identify many interesting forms of selection in natural sequences. Reviewers This article was reviewed by Sebastian Maurer-Stroh, Olivier Tenaillon, and Tal Pupko. All three reviewers are members of the Biology Direct editorial board. Electronic supplementary material The online version of this article (doi:10.1186/s13062-016-0172-z) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Jesse D Bloom
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, 98109, WA, USA.
| |
Collapse
|
11
|
Meyer AG, Wilke CO. The utility of protein structure as a predictor of site-wise dN/dS varies widely among HIV-1 proteins. J R Soc Interface 2016; 12:20150579. [PMID: 26468068 DOI: 10.1098/rsif.2015.0579] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Protein structure acts as a general constraint on the evolution of viral proteins. One widely recognized structural constraint explaining evolutionary variation among sites is the relative solvent accessibility (RSA) of residues in the folded protein. In influenza virus, the distance from functional sites has been found to explain an additional portion of the evolutionary variation in the external antigenic proteins. However, to what extent RSA and distance from a reference site in the protein can be used more generally to explain protein adaptation in other viruses and in the different proteins of any given virus remains an open question. To address this question, we have carried out an analysis of the distribution and structural predictors of site-wise dN/dS in HIV-1. Our results indicate that the distribution of dN/dS in HIV follows a smooth gamma distribution, with no special enrichment or depletion of sites with dN/dS at or above one. The variation in dN/dS can be partially explained by RSA and distance from a reference site in the protein, but these structural constraints do not act uniformly among the different HIV-1 proteins. Structural constraints are highly predictive in just one of the three enzymes and one of three structural proteins in HIV-1. For these two proteins, the protease enzyme and the gp120 structural protein, structure explains between 30 and 40% of the variation in dN/dS. Finally, for the gp120 protein of the receptor-binding complex, we also find that glycosylation sites explain just 2% of the variation in dN/dS and do not explain gp120 evolution independently of either RSA or distance from the apical surface.
Collapse
Affiliation(s)
- Austin G Meyer
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA School of Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA
| | - Claus O Wilke
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA
| |
Collapse
|
12
|
McWhite CD, Meyer AG, Wilke CO. Sequence amplification via cell passaging creates spurious signals of positive adaptation in influenza virus H3N2 hemagglutinin. Virus Evol 2016; 2:vew026. [PMID: 27713835 PMCID: PMC5049878 DOI: 10.1093/ve/vew026] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Clinical influenza A virus isolates are frequently not sequenced directly. Instead, a majority of these isolates (~70% in 2015) are first subjected to passaging for amplification, most commonly in non-human cell culture. Here, we find that this passaging leaves distinct signals of adaptation, which can confound evolutionary analyses of the viral sequences. We find distinct patterns of adaptation to Madin-Darby (MDCK) and monkey cell culture absent from unpassaged hemagglutinin sequences. These patterns also dominate pooled datasets not separated by passaging type, and they increase in proportion to the number of passages performed. By contrast, MDCK-SIAT1 passaged sequences seem mostly (but not entirely) free of passaging adaptations. Contrary to previous studies, we find that using only internal branches of influenza virus phylogenetic trees is insufficient to correct for passaging artifacts. These artifacts can only be safely avoided by excluding passaged sequences entirely from subsequent analysis. We conclude that future influenza virus evolutionary analyses should appropriately control for potentially confounding effects of passaging adaptations.
Collapse
Affiliation(s)
- Claire D. McWhite
- Center for Systems and Synthetic Biology and Institute for Cellular and
Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Molecular Biosciences, The University of Texas at Austin,
Austin, TX 78712, USA
| | - Austin G. Meyer
- Center for Systems and Synthetic Biology and Institute for Cellular and
Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
- Center for Computational Biology and Bioinformatics, The University of Texas
at Austin, Austin, TX 78712, USA
- Department of Integrative Biology, The University of Texas at Austin,
Austin, TX 78712, USA
| | - Claus O. Wilke
- Center for Systems and Synthetic Biology and Institute for Cellular and
Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
- Center for Computational Biology and Bioinformatics, The University of Texas
at Austin, Austin, TX 78712, USA
- Department of Integrative Biology, The University of Texas at Austin,
Austin, TX 78712, USA
| |
Collapse
|
13
|
Abriata LA, Bovigny C, Dal Peraro M. Detection and sequence/structure mapping of biophysical constraints to protein variation in saturated mutational libraries and protein sequence alignments with a dedicated server. BMC Bioinformatics 2016; 17:242. [PMID: 27315797 PMCID: PMC4912743 DOI: 10.1186/s12859-016-1124-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2016] [Accepted: 06/07/2016] [Indexed: 11/21/2022] Open
Abstract
Background Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Results Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. Discussion We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. Conclusion We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1124-4) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Luciano A Abriata
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland.
| | - Christophe Bovigny
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland.,Present address: Molecular Modeling Group, Swiss Institute of Bioinformatics, UNIL, Bâtiment Génopode, Lausanne, 1015, Switzerland
| | - Matteo Dal Peraro
- Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland
| |
Collapse
|
14
|
Shahmoradi A, Wilke CO. Dissecting the roles of local packing density and longer-range effects in protein sequence evolution. Proteins 2016; 84:841-54. [PMID: 26990194 PMCID: PMC5292938 DOI: 10.1002/prot.25034] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2015] [Revised: 02/01/2016] [Accepted: 02/24/2016] [Indexed: 11/07/2022]
Abstract
What are the structural determinants of protein sequence evolution? A number of site-specific structural characteristics have been proposed, most of which are broadly related to either the density of contacts or the solvent accessibility of individual residues. Most importantly, there has been disagreement in the literature over the relative importance of solvent accessibility and local packing density for explaining site-specific sequence variability in proteins. We show that this discussion has been confounded by the definition of local packing density. The most commonly used measures of local packing, such as contact number and the weighted contact number, represent the combined effects of local packing density and longer-range effects. As an alternative, we propose a truly local measure of packing density around a single residue, based on the Voronoi cell volume. We show that the Voronoi cell volume, when calculated relative to the geometric center of amino-acid side chains, behaves nearly identically to the relative solvent accessibility, and each individually can explain, on average, approximately 34% of the site-specific variation in evolutionary rate in a data set of 209 enzymes. An additional 10% of variation can be explained by nonlocal effects that are captured in the weighted contact number. Consequently, evolutionary variation at a site is determined by the combined effects of the immediate amino-acid neighbors of that site and effects mediated by more distant amino acids. We conclude that instead of contrasting solvent accessibility and local packing density, future research should emphasize on the relative importance of immediate contacts and longer-range effects on evolutionary variation. Proteins 2016; 84:841-854. © 2016 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Amir Shahmoradi
- Department of Physics, The University of Texas at Austin
- Center for Computational Biology and Bioinformatics, The University
of Texas at Austin
- Institute for Cellular and Molecular Biology, The University of
Texas at Austin
| | - Claus O. Wilke
- Center for Computational Biology and Bioinformatics, The University
of Texas at Austin
- Institute for Cellular and Molecular Biology, The University of
Texas at Austin
- Department of Integrative Biology, The University of Texas at
Austin
| |
Collapse
|
15
|
Genge CE, Stevens CM, Davidson WS, Singh G, Peter Tieleman D, Tibbits GF. Functional Divergence in Teleost Cardiac Troponin Paralogs Guides Variation in the Interaction of TnI Switch Region with TnC. Genome Biol Evol 2016; 8:994-1011. [PMID: 26979795 PMCID: PMC4860682 DOI: 10.1093/gbe/evw044] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
Gene duplication results in extra copies of genes that must coevolve with their interacting partners in multimeric protein complexes. The cardiac troponin (Tn) complex, containing TnC, TnI, and TnT, forms a distinct functional unit critical for the regulation of cardiac muscle contraction. In teleost fish, the function of the Tn complex is modified by the consequences of differential expression of paralogs in response to environmental thermal challenges. In this article, we focus on the interaction between TnI and TnC, coded for by genes that have independent evolutionary origins, but the co-operation of their protein products has necessitated coevolution. In this study, we characterize functional divergence of TnC and TnI paralogs, specifically the interrelated roles of regulatory subfunctionalization and structural subfunctionalization. We determined that differential paralog transcript expression in response to temperature acclimation results in three combinations of TnC and TnI in the zebrafish heart: TnC1a/TnI1.1, TnC1b/TnI1.1, and TnC1a/TnI1.5. Phylogenetic analysis of these highly conserved proteins identified functionally divergent residues in TnI and TnC. The structural and functional effect of these Tn combinations was modeled with molecular dynamics simulation to link divergent sites to changes in interaction strength. Functional divergence in TnI and TnC were not limited to the residues involved with TnC/TnI switch interaction, which emphasizes the complex nature of Tn function. Patterns in domain-specific divergent selection and interaction energies suggest that substitutions in the TnI switch region are crucial to modifying TnI/TnC function to maintain cardiac contraction with temperature changes. This integrative approach introduces Tn as a model of functional divergence that guides the coevolution of interacting proteins.
Collapse
Affiliation(s)
- Christine E Genge
- Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Charles M Stevens
- Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, British Columbia, Canada Cardiovascular Sciences, Child and Family Research Institute, Vancouver, British Columbia, Canada
| | - William S Davidson
- Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Gurpreet Singh
- Department of Biological Sciences and Centre for Molecular Simulation, University of Calgary, Alberta, Canada
| | - D Peter Tieleman
- Department of Biological Sciences and Centre for Molecular Simulation, University of Calgary, Alberta, Canada
| | - Glen F Tibbits
- Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, British Columbia, Canada Cardiovascular Sciences, Child and Family Research Institute, Vancouver, British Columbia, Canada Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
| |
Collapse
|
16
|
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 180] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
Collapse
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
| | - Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
| | - Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
| |
Collapse
|
17
|
Kratsch C, Klingen TR, Mümken L, Steinbrück L, McHardy AC. Determination of antigenicity-altering patches on the major surface protein of human influenza A/H3N2 viruses. Virus Evol 2016; 2:vev025. [PMID: 27774294 PMCID: PMC4989879 DOI: 10.1093/ve/vev025] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
Human influenza viruses are rapidly evolving RNA viruses that cause short-term respiratory infections with substantial morbidity and mortality in annual epidemics. Uncovering the general principles of viral coevolution with human hosts is important for pathogen surveillance and vaccine design. Protein regions are an appropriate model for the interactions between two macromolecules, but the currently used epitope definition for the major antigen of influenza viruses, namely hemagglutinin, is very broad. Here, we combined genetic, evolutionary, antigenic, and structural information to determine the most relevant regions of the hemagglutinin of human influenza A/H3N2 viruses for interaction with human immunoglobulins. We estimated the antigenic weights of amino acid changes at individual sites from hemagglutination inhibition data using antigenic tree inference followed by spatial clustering of antigenicity-altering protein sites on the protein structure. This approach determined six relevant areas (patches) for antigenic variation that had a key role in the past antigenic evolution of the viruses. Previous transitions between successive predominating antigenic types of H3N2 viruses always included amino acid changes in either the first or second antigenic patch. Interestingly, there was only partial overlap between the antigenic patches and the patches under strong positive selection. Therefore, besides alterations of antigenicity, other interactions with the host may shape the evolution of human influenza A/H3N2 viruses.
Collapse
Affiliation(s)
- Christina Kratsch
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany and
| | - Thorsten R. Klingen
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany and
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
| | - Linda Mümken
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany and
| | - Lars Steinbrück
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany and
| | - Alice C. McHardy
- Department for Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany and
- Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
| |
Collapse
|
18
|
Wu NC, Olson CA, Du Y, Le S, Tran K, Remenyi R, Gong D, Al-Mawsawi LQ, Qi H, Wu TT, Sun R. Functional Constraint Profiling of a Viral Protein Reveals Discordance of Evolutionary Conservation and Functionality. PLoS Genet 2015; 11:e1005310. [PMID: 26132554 PMCID: PMC4489113 DOI: 10.1371/journal.pgen.1005310] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2015] [Accepted: 05/28/2015] [Indexed: 12/31/2022] Open
Abstract
Viruses often encode proteins with multiple functions due to their compact genomes. Existing approaches to identify functional residues largely rely on sequence conservation analysis. Inferring functional residues from sequence conservation can produce false positives, in which the conserved residues are functionally silent, or false negatives, where functional residues are not identified since they are species-specific and therefore non-conserved. Furthermore, the tedious process of constructing and analyzing individual mutations limits the number of residues that can be examined in a single study. Here, we developed a systematic approach to identify the functional residues of a viral protein by coupling experimental fitness profiling with protein stability prediction using the influenza virus polymerase PA subunit as the target protein. We identified a significant number of functional residues that were influenza type-specific and were evolutionarily non-conserved among different influenza types. Our results indicate that type-specific functional residues are prevalent and may not otherwise be identified by sequence conservation analysis alone. More importantly, this technique can be adapted to any viral (and potentially non-viral) protein where structural information is available. The analysis of sequence conservation is a common approach to identify functional residues within a protein. However, not all functional residues are conserved as natural evolution and species diversification permit continuous innovation of protein functionality through the retention of advantageous mutations. Non-conserved functional residues, which are often species-specific, may not be identified by conventional analysis of sequence conservation despite being biologically important. Here we described a novel approach to identify functional residues within a protein by coupling a high-throughput experimental fitness profiling approach with computational protein modeling. Our methodology is independent of sequence conservation and is applicable to any protein where structural information is available. In this study, we systematically mapped the functional residues on the influenza A PA protein and revealed that non-conserved functional residues are prevalent. Our results not only have significant implication on how functionality evolves during natural evolution, but also highlight the caveats when applying conservation-based approaches to identify functional residues within a protein.
Collapse
Affiliation(s)
- Nicholas C. Wu
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California, United States of America,
| | - C. Anders Olson
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
| | - Yushen Du
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
| | - Shuai Le
- Department of Microbiology, Third Military Medical University, Chongqing, 400038, China
| | - Kevin Tran
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
| | - Roland Remenyi
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
| | - Danyang Gong
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
| | - Laith Q. Al-Mawsawi
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
| | - Hangfei Qi
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
| | - Ting-Ting Wu
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
| | - Ren Sun
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America,
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California, United States of America,
- AIDS Institute, University of California, Los Angeles, Los Angeles, California, United States of America
- * E-mail:
| |
Collapse
|
19
|
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 161] [Impact Index Per Article: 16.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Collapse
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8 Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8 Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
| | - Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8 Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8 Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
| |
Collapse
|
20
|
Meyer AG, Wilke CO. Geometric Constraints Dominate the Antigenic Evolution of Influenza H3N2 Hemagglutinin. PLoS Pathog 2015; 11:e1004940. [PMID: 26020774 PMCID: PMC4447415 DOI: 10.1371/journal.ppat.1004940] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2015] [Accepted: 05/07/2015] [Indexed: 11/18/2022] Open
Abstract
We have carried out a comprehensive analysis of the determinants of human influenza A H3 hemagglutinin evolution. We consider three distinct predictors of evolutionary variation at individual sites: solvent accessibility (as a proxy for protein fold stability and/or conservation), Immune Epitope Database (IEDB) epitope sites (as a proxy for host immune bias), and proximity to the receptor-binding region (as a proxy for one of the functions of hemagglutinin-to bind sialic acid). Individually, these quantities explain approximately 15% of the variation in site-wise dN/dS. In combination, solvent accessibility and proximity explain 32% of the variation in dN/dS; incorporating IEDB epitope sites into the model adds only an additional 2 percentage points. Thus, while solvent accessibility and proximity perform largely as independent predictors of evolutionary variation, they each overlap with the epitope-sites predictor. Furthermore, we find that the historical H3 epitope sites, which date back to the 1980s and 1990s, only partially overlap with the experimental sites from the IEDB, and display similar overlap in predictive power when combined with solvent accessibility and proximity. We also find that sites with dN/dS > 1, i.e., the sites most likely driving seasonal immune escape, are not correctly predicted by either historical or IEDB epitope sites, but only by proximity to the receptor-binding region. In summary, a simple geometric model of HA evolution outperforms a model based on epitope sites. These results suggest that either the available epitope sites do not accurately represent the true influenza antigenic sites or that host immune bias may be less important for influenza evolution than commonly thought.
Collapse
MESH Headings
- Antibodies, Viral/immunology
- Antigens, Viral/immunology
- Binding Sites
- Databases, Factual
- Epitope Mapping
- Epitopes/immunology
- Evolution, Molecular
- Genetic Variation/genetics
- Hemagglutinin Glycoproteins, Influenza Virus/chemistry
- Hemagglutinin Glycoproteins, Influenza Virus/genetics
- Hemagglutinin Glycoproteins, Influenza Virus/immunology
- Humans
- Influenza A Virus, H3N2 Subtype/immunology
- Influenza, Human/genetics
- Influenza, Human/immunology
- Influenza, Human/virology
- Protein Folding
- Protein Stability
- Sialic Acids/metabolism
- Solvents/chemistry
Collapse
Affiliation(s)
- Austin G. Meyer
- Department of Integrative Biology, Institute for Cellular and Molecular Biology and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- School of Medicine, Texas Tech University Health Sciences Center, Lubbock, Texas, United States of America
| | - Claus O. Wilke
- Department of Integrative Biology, Institute for Cellular and Molecular Biology and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
| |
Collapse
|
21
|
Abstract
Numerous computational methods exist to assess the mode and strength of natural selection in protein-coding sequences, yet how distinct methods relate to one another remains largely unknown. Here, we elucidate the relationship between two widely used phylogenetic modeling frameworks: dN/dS models and mutation-selection (MutSel) models. We derive a mathematical relationship between dN/dS and scaled selection coefficients, the focal parameters of MutSel models, and use this relationship to gain deeper insight into the behaviors, limitations, and applicabilities of these two modeling frameworks. We prove that, if all synonymous changes are neutral, standard MutSel models correspond to dN/dS ≤ 1. However, if synonymous codons differ in fitness, dN/dS can take on arbitrarily high values even if all selection is purifying. Thus, the MutSel modeling framework cannot necessarily accommodate positive, diversifying selection, while dN/dS cannot distinguish between purifying selection on synonymous codons and positive selection on amino acids. We further propose a new benchmarking strategy of dN/dS inferences against MutSel simulations and demonstrate that the widely used Goldman-Yang-style dN/dS models yield substantially biased dN/dS estimates on realistic sequence data. In contrast, the less frequently used Muse-Gaut-style models display much less bias. Strikingly, the least-biased and most precise dN/dS estimates are never found in the models with the best fit to the data, measured through both AIC and BIC scores. Thus, selecting models based on goodness-of-fit criteria can yield poor parameter estimates if the models considered do not precisely correspond to the underlying mechanism that generated the data. In conclusion, establishing mathematical links among modeling frameworks represents a novel, powerful strategy to pinpoint previously unrecognized model limitations and strengths.
Collapse
Affiliation(s)
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin
| | - Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin
| |
Collapse
|
22
|
How structural and physicochemical determinants shape sequence constraints in a functional enzyme. PLoS One 2015; 10:e0118684. [PMID: 25706742 PMCID: PMC4338278 DOI: 10.1371/journal.pone.0118684] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2014] [Accepted: 01/21/2015] [Indexed: 01/14/2023] Open
Abstract
The need for interfacing structural biology and biophysics to molecular evolution is being increasingly recognized. One part of the big problem is to understand how physics and chemistry shape the sequence space available to functional proteins, while satisfying the needs of biology. Here we present a quantitative, structure-based analysis of a high-resolution map describing the tolerance to all substitutions in all positions of a functional enzyme, namely a TEM lactamase previously studied through deep sequencing of mutants growing in competition experiments with selection against ampicillin. Substitutions are rarely observed within 7 Å of the active site, a stringency that is relaxed slowly and extends up to 15–20 Å, with buried residues being especially sensitive. Substitution patterns in over one third of the residues can be quantitatively modeled by monotonic dependencies on amino acid descriptors and predictions of changes in folding stability. Amino acid volume and steric hindrance shape constraints on the protein core; hydrophobicity and solubility shape constraints on hydrophobic clusters underneath the surface, and on salt bridges and polar networks at the protein surface together with charge and hydrogen bonding capacity. Amino acid solubility, flexibility and conformational descriptors also provide additional constraints at many locations. These findings provide fundamental insights into the chemistry underlying protein evolution and design, by quantitating links between sequence and different protein traits, illuminating subtle and unexpected sequence-trait relationships and pinpointing what traits are sacrificed upon gain-of-function mutation.
Collapse
|
23
|
Meyer AG, Spielman SJ, Bedford T, Wilke CO. Time dependence of evolutionary metrics during the 2009 pandemic influenza virus outbreak. Virus Evol 2015; 1:vev006. [PMID: 26770819 PMCID: PMC4710376 DOI: 10.1093/ve/vev006] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
With the expansion of DNA sequencing technology, quantifying evolution in emerging viral outbreaks has become an important tool for scientists and public health officials. Although it is known that the degree of sequence divergence significantly affects the calculation of evolutionary metrics in viral outbreaks, the extent and duration of this effect during an actual outbreak remains unclear. We have analyzed how limited divergence time during an early viral outbreak affects the accuracy of molecular evolutionary metrics. Using sequence data from the first 25 months of the 2009 pandemic H1N1 (pH1N1) outbreak, we calculated each of three different standard evolutionary metrics-molecular clock rate (i.e., evolutionary rate), whole gene dN/dS, and site-wise dN/dS-for hemagglutinin and neuraminidase, using increasingly longer time windows, from 1 month to 25 months. For the molecular clock rate, we found that at least three to four months of temporal divergence from the start of sampling was required to make precise estimates that also agreed with long-term values. For whole gene dN/dS, we found that at least two months of data were required to generate precise estimates, but six to nine months were required for estimates to approach their long term values. For site-wise dN/dS estimates, we found that at least six months of sampling divergence was required before the majority of sites had at least one mutation and were thus evolutionarily informative. Furthermore, eight months of sampling divergence was required before the site-wise estimates appropriately reflected the distribution of values expected from known protein-structure-based evolutionary pressure in influenza. In summary, we found that evolutionary metrics calculated from gene sequence data in early outbreaks should be expected to deviate from their long-term estimates for at least several months after the initial emergence and sequencing of the virus.
Collapse
Affiliation(s)
- Austin G. Meyer
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA, 78712
- School of Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA, 79430
| | - Stephanie J. Spielman
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA, 78712
| | - Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA, 98109
| | - Claus O. Wilke
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA, 78712
| |
Collapse
|
24
|
Shahmoradi A, Sydykova DK, Spielman SJ, Jackson EL, Dawson ET, Meyer AG, Wilke CO. Predicting evolutionary site variability from structure in viral proteins: buriedness, packing, flexibility, and design. J Mol Evol 2014; 79:130-42. [PMID: 25217382 PMCID: PMC4216736 DOI: 10.1007/s00239-014-9644-x] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2014] [Accepted: 08/31/2014] [Indexed: 12/27/2022]
Abstract
Several recent works have shown that protein structure can predict site-specific evolutionary sequence variation. In particular, sites that are buried and/or have many contacts with other sites in a structure have been shown to evolve more slowly, on average, than surface sites with few contacts. Here, we present a comprehensive study of the extent to which numerous structural properties can predict sequence variation. The quantities we considered include buriedness (as measured by relative solvent accessibility), packing density (as measured by contact number), structural flexibility (as measured by B factors, root-mean-square fluctuations, and variation in dihedral angles), and variability in designed structures. We obtained structural flexibility measures both from molecular dynamics simulations performed on nine non-homologous viral protein structures and from variation in homologous variants of those proteins, where they were available. We obtained measures of variability in designed structures from flexible-backbone design in the Rosetta software. We found that most of the structural properties correlate with site variation in the majority of structures, though the correlations are generally weak (correlation coefficients of 0.1-0.4). Moreover, we found that buriedness and packing density were better predictors of evolutionary variation than structural flexibility. Finally, variability in designed structures was a weaker predictor of evolutionary variability than buriedness or packing density, but it was comparable in its predictive power to the best structural flexibility measures. We conclude that simple measures of buriedness and packing density are better predictors of evolutionary variation than the more complicated predictors obtained from dynamic simulations, ensembles of homologous structures, or computational protein design.
Collapse
Affiliation(s)
- Amir Shahmoradi
- Department of Physics, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Dariya K. Sydykova
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Stephanie J. Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Eleisha L. Jackson
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Eric T. Dawson
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Austin G. Meyer
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Claus O. Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| |
Collapse
|
25
|
Pechmann S, Frydman J. Interplay between chaperones and protein disorder promotes the evolution of protein networks. PLoS Comput Biol 2014; 10:e1003674. [PMID: 24968255 PMCID: PMC4072544 DOI: 10.1371/journal.pcbi.1003674] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2013] [Accepted: 05/03/2014] [Indexed: 11/19/2022] Open
Abstract
Evolution is driven by mutations, which lead to new protein functions but come at a cost to protein stability. Non-conservative substitutions are of interest in this regard because they may most profoundly affect both function and stability. Accordingly, organisms must balance the benefit of accepting advantageous substitutions with the possible cost of deleterious effects on protein folding and stability. We here examine factors that systematically promote non-conservative mutations at the proteome level. Intrinsically disordered regions in proteins play pivotal roles in protein interactions, but many questions regarding their evolution remain unanswered. Similarly, whether and how molecular chaperones, which have been shown to buffer destabilizing mutations in individual proteins, generally provide robustness during proteome evolution remains unclear. To this end, we introduce an evolutionary parameter λ that directly estimates the rate of non-conservative substitutions. Our analysis of λ in Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens sequences reveals how co- and post-translationally acting chaperones differentially promote non-conservative substitutions in their substrates, likely through buffering of their destabilizing effects. We further find that λ serves well to quantify the evolution of intrinsically disordered proteins even though the unstructured, thus generally variable regions in proteins are often flanked by very conserved sequences. Crucially, we show that both intrinsically disordered proteins and highly re-wired proteins in protein interaction networks, which have evolved new interactions and functions, exhibit a higher λ at the expense of enhanced chaperone assistance. Our findings thus highlight an intricate interplay of molecular chaperones and protein disorder in the evolvability of protein networks. Our results illuminate the role of chaperones in enabling protein evolution, and underline the importance of the cellular context and integrated approaches for understanding proteome evolution. We feel that the development of λ may be a valuable addition to the toolbox applied to understand the molecular basis of evolution.
Collapse
Affiliation(s)
- Sebastian Pechmann
- Department of Biology, Stanford University, Stanford, California, United States of America
- * E-mail: (SP); (JF)
| | - Judith Frydman
- Department of Biology, Stanford University, Stanford, California, United States of America
- * E-mail: (SP); (JF)
| |
Collapse
|
26
|
Spielman SJ, Dawson ET, Wilke CO. Limited utility of residue masking for positive-selection inference. Mol Biol Evol 2014; 31:2496-500. [PMID: 24899665 DOI: 10.1093/molbev/msu183] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open
Abstract
Errors in multiple sequence alignments (MSAs) can reduce accuracy in positive-selection inference. Therefore, it has been suggested to filter MSAs before conducting further analyses. One widely used filter, Guidance, allows users to remove MSA positions aligned with low confidence. However, Guidance's utility in positive-selection inference has been disputed in the literature. We have conducted an extensive simulation-based study to characterize fully how Guidance impacts positive-selection inference, specifically for protein-coding sequences of realistic divergence levels. We also investigated whether novel scoring algorithms, which phylogenetically corrected confidence scores, and a new gap-penalization score-normalization scheme improved Guidance's performance. We found that no filter, including original Guidance, consistently benefitted positive-selection inferences. Moreover, all improvements detected were exceedingly minimal, and in certain circumstances, Guidance-based filters worsened inferences.
Collapse
Affiliation(s)
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin
| | - Eric T Dawson
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin
| | - Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin
| |
Collapse
|
27
|
Phylogenetic Gaussian process model for the inference of functionally important regions in protein tertiary structures. PLoS Comput Biol 2014; 10:e1003429. [PMID: 24453956 PMCID: PMC3894161 DOI: 10.1371/journal.pcbi.1003429] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2013] [Accepted: 11/22/2013] [Indexed: 11/30/2022] Open
Abstract
A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins and the extraordinarily low substitution rates have been used as evidence of function. Most of the existing tools, e.g. Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since functionally important sites tend to be clustered together to form functional patches. We have developed a new model, GP4Rate, which incorporates the Gaussian process model with the standard phylogenetic model to identify slowly evolved regions in protein tertiary structures. GP4Rate uses the Gaussian process to define a nonparametric prior distribution of site-specific substitution rates, which naturally captures the spatial correlation of substitution rates. Simulations suggest that GP4Rate can potentially estimate site-specific substitution rates with a much higher accuracy than Rate4Site and tends to report slowly evolved regions rather than individual sites. In addition, GP4Rate can estimate the strength of the spatial correlation of substitution rates from the data. By applying GP4Rate to a set of mammalian B7-1 genes, we found a highly conserved region which coincides with experimental evidence. GP4Rate may be a useful tool for the in silico prediction of functionally important regions in the proteins with known structures. To understand how a protein functions, a critical step is to know which regions in its protein tertiary structure may be functionally important. Functionally important protein regions are typically more conserved than other regions because mutations in these regions are more likely to be deleterious. A number of phylogenetic models have been developed to identify conserved sites or regions in proteins by comparing protein sequences from multiple species. However, most of these methods treat amino acid sites independently and do not consider the spatial clustering of conserved sites in the protein tertiary structure. Therefore, their power of identifying functional protein regions is limited. We develop a new statistical model, GP4Rate, which combines the information from the protein sequences and the protein tertiary structure to infer conserved regions. We demonstrate that GP4Rate outperforms Rate4Site, the most widely used phylogenetic software for inferring functional amino acid sites, via simulations with a case study of B7-1 genes. GP4Rate is a potentially useful tool for guiding mutagenesis experiments or providing insights on the relationship between protein structures and functions.
Collapse
|
28
|
Rodrigue N, Lartillot N. Site-heterogeneous mutation-selection models within the PhyloBayes-MPI package. Bioinformatics 2013; 30:1020-1. [PMID: 24351710 PMCID: PMC3967107 DOI: 10.1093/bioinformatics/btt729] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
Motivation: In recent years, there has been an increasing interest in the potential of codon substitution models for a variety of applications. However, the computational demands of these models have sometimes lead to the adoption of oversimplified assumptions, questionable statistical methods or a limited focus on small data sets. Results: Here, we offer a scalable, message-passing-interface-based Bayesian implementation of site-heterogeneous codon models in the mutation-selection framework. Our software jointly infers the global mutational parameters at the nucleotide level, the branch lengths of the tree and a Dirichlet process governing across-site variation at the amino acid level. We focus on an example estimation of the distribution of selection coefficients from an alignment of several hundred sequences of the influenza PB2 gene, and highlight the site-specific characterization enabled by such a modeling approach. Finally, we discuss future potential applications of the software for conducting evolutionary inferences. Availability and implementation: The models are implemented within the PhyloBayes-MPI package, (available at phylobayes.org) along with usage details in the accompanying manual. Contact:nicolas.rodrigue@ucalgary.ca
Collapse
Affiliation(s)
- Nicolas Rodrigue
- Department of Mathematics and Statistics, University of Calgary, 2500 University Drive NW, Calgary AB T2N 1N4, Canada and UMR CNRS 5558 - LBBE, Université Lyon 1, Villeurbanne Cedex, France
| | | |
Collapse
|
29
|
Maximum allowed solvent accessibilites of residues in proteins. PLoS One 2013; 8:e80635. [PMID: 24278298 PMCID: PMC3836772 DOI: 10.1371/journal.pone.0080635] [Citation(s) in RCA: 276] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2013] [Accepted: 10/04/2013] [Indexed: 11/24/2022] Open
Abstract
The relative solvent accessibility (RSA) of a residue in a protein measures the extent of burial or exposure of that residue in the 3D structure. RSA is frequently used to describe a protein's biophysical or evolutionary properties. To calculate RSA, a residue's solvent accessibility (ASA) needs to be normalized by a suitable reference value for the given amino acid; several normalization scales have previously been proposed. However, these scales do not provide tight upper bounds on ASA values frequently observed in empirical crystal structures. Instead, they underestimate the largest allowed ASA values, by up to 20%. As a result, many empirical crystal structures contain residues that seem to have RSA values in excess of one. Here, we derive a new normalization scale that does provide a tight upper bound on observed ASA values. We pursue two complementary strategies, one based on extensive analysis of empirical structures and one based on systematic enumeration of biophysically allowed tripeptides. Both approaches yield congruent results that consistently exceed published values. We conclude that previously published ASA normalization values were too small, primarily because the conformations that maximize ASA had not been correctly identified. As an application of our results, we show that empirically derived hydrophobicity scales are sensitive to accurate RSA calculation, and we derive new hydrophobicity scales that show increased correlation with experimentally measured scales.
Collapse
|
30
|
Jackson EL, Ollikainen N, Covert AW, Kortemme T, Wilke CO. Amino-acid site variability among natural and designed proteins. PeerJ 2013; 1:e211. [PMID: 24255821 PMCID: PMC3828621 DOI: 10.7717/peerj.211] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2013] [Accepted: 10/24/2013] [Indexed: 11/20/2022] Open
Abstract
Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
Collapse
Affiliation(s)
- Eleisha L. Jackson
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Noah Ollikainen
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
| | - Arthur W. Covert
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Tanja Kortemme
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
- California Institute for Quantitative Biosciences (QB3) and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Claus O. Wilke
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| |
Collapse
|
31
|
The Site-Wise Log-Likelihood Score is a Good Predictor of Genes under Positive Selection. J Mol Evol 2013; 76:280-94. [DOI: 10.1007/s00239-013-9557-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2012] [Accepted: 03/20/2013] [Indexed: 12/21/2022]
|
32
|
Meyer AG, Dawson ET, Wilke CO. Cross-species comparison of site-specific evolutionary-rate variation in influenza haemagglutinin. Philos Trans R Soc Lond B Biol Sci 2013; 368:20120334. [PMID: 23382434 DOI: 10.1098/rstb.2012.0334] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
We investigate the causes of site-specific evolutionary-rate variation in influenza haemagglutinin (HA) between human and avian influenza, for subtypes H1, H3, and H5. By calculating the evolutionary-rate ratio, ω = dN/dS as a function of a residue's solvent accessibility in the three-dimensional protein structure, we show that solvent accessibility has a significant but relatively modest effect on site-specific rate variation. By comparing rates within HA subtypes among host species, we derive an upper limit to the amount of variation that can be explained by structural constraints of any kind. Protein structure explains only 20-40% of the variation in ω. Finally, by comparing ω at sites near the sialic-acid-binding region to ω at other sites, we show that ω near the sialic-acid-binding region is significantly elevated in both human and avian influenza, with the exception of avian H5. We conclude that protein structure, HA subtype, and host biology all impose distinct selection pressures on sites in influenza HA.
Collapse
Affiliation(s)
- Austin G Meyer
- Section of Integrative Biology, Institute for Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, The University of Texas, Austin, Austin, TX 78731, USA
| | | | | |
Collapse
|
33
|
Spielman SJ, Wilke CO. Membrane environment imposes unique selection pressures on transmembrane domains of G protein-coupled receptors. J Mol Evol 2013; 76:172-82. [PMID: 23355009 DOI: 10.1007/s00239-012-9538-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2012] [Accepted: 12/18/2012] [Indexed: 12/25/2022]
Abstract
We have investigated the influence of the plasma membrane environment on the molecular evolution of G protein-coupled receptors (GPCRs), the largest receptor family in Metazoa. In particular, we have analyzed the site-specific rate variation across the two primary structural partitions, transmembrane (TM) and extramembrane (EM), of these membrane proteins. We find that TM domains evolve more slowly than do EM domains, though TM domains display increased rate heterogeneity relative to their EM counterparts. Although the majority of residues across GPCRs experience strong to weak purifying selection, many GPCRs experience positive selection at both TM and EM residues, albeit with a slight bias towards the EM. Further, a subset of GPCRs, chemosensory receptors (including olfactory and taste receptors), exhibit increased rates of evolution relative to other GPCRs, an effect which is more pronounced in their TM spans. Although it has been previously suggested that the TM's low evolutionary rate is caused by their high percentage of buried residues, we show that their attenuated rate seems to stem from the strong biophysical constraints of the membrane itself, or by functional requirements. In spite of the strong evolutionary constraints acting on the TM spans of GPCRs, positive selection and high levels of evolutionary rate variability are common. Thus, biophysical constraints should not be presumed to preclude a protein's ability to evolve.
Collapse
|
34
|
Scherrer MP, Meyer AG, Wilke CO. Modeling coding-sequence evolution within the context of residue solvent accessibility. BMC Evol Biol 2012; 12:179. [PMID: 22967129 PMCID: PMC3527230 DOI: 10.1186/1471-2148-12-179] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2012] [Accepted: 09/03/2012] [Indexed: 11/30/2022] Open
Abstract
Background Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues) tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues). Results Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94) model in which all model parameters can be functions of the relative solvent accessibility (RSA) of a residue. We apply this model to a data set comprised of nearly 600 yeast genes, and find that an evolutionary-rate ratio ω that varies linearly with RSA provides a better model fit than an RSA-independent ω or an ω that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transverion ratio κ also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94) model in which the synonymous and non-synonymous rates individually are linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between ω and RSA, and gene expression level affects both the intercept and the slope. Conclusions Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between ω and RSA implies that genes are better characterized by their ω slope and intercept than by just their mean ω.
Collapse
Affiliation(s)
- Michael P Scherrer
- Center for Computational Biology and Bioinformatics, Institute for Cellular and Molecular Biology, and Section of Integrative Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | | | | |
Collapse
|