1
Yang Y, Braga MV, Dean MD. Insertion-Deletion Events Are Depleted in Protein Regions with Predicted Secondary Structure. Genome Biol Evol 2024; 16:evae093. [PMID: 38735759 PMCID: PMC11102076 DOI: 10.1093/gbe/evae093] [Received: 02/16/2024] [Revised: 04/16/2024] [Accepted: 04/21/2024] [Indexed: 05/14/2024]
Abstract
A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structures. We identified indels across 11,444 sequence alignments in mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four different types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning methods of AlphaFold2. Indels overlapped secondary structures 54% as much as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structures. These skews were stronger in the rodent lineages than in the primate lineages, consistent with population genetic theory predicting that natural selection will be more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequency than indels outside of secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.
Affiliation(s)
- Yi Yang
  - Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew V Braga
  - Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew D Dean
  - Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
2
Yildirim A, Tekpinar M. Building Quantitative Bridges between Dynamics and Sequences of SARS-CoV-2 Main Protease and a Diverse Set of Thirty-Two Proteins. J Chem Inf Model 2023; 63:9-19. [PMID: 36513349 DOI: 10.1021/acs.jcim.2c01206] [Indexed: 12/15/2022]
Abstract
Proteases are major drug targets for many viral diseases. However, mutations can rapidly render several antiprotease drugs ineffective even though these mutations may not alter protein structures significantly. Understanding the relations between quickly mutating residues, protease structures, and protease dynamics is crucial for designing potent drugs. For this reason, we studied the relations between the evolutionary information on residues in the amino acid sequences and protein dynamics for the SARS-CoV-2 main protease. More precisely, we analyzed three dynamical quantities (Schlitter entropy, root-mean-square fluctuations, and dynamical flexibility index) and their relation to the amino acid conservation extracted from multiple sequence alignments of the main protease. We showed that a quantifiable similarity can be built between a sequence-based quantity called Jensen-Shannon conservation and those three dynamical quantities. We validated this similarity for a diverse set of 32 proteins other than the SARS-CoV-2 main protease. We believe that establishing these kinds of quantitative bridges will have broader implications for all viral proteases as well as all proteins.
Affiliation(s)
- Ahmet Yildirim
  - Department of Biology, Siirt University, 56100 Siirt, Turkey
- Mustafa Tekpinar
  - CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, Sorbonne University, 75005 Paris, France
3
Pollet L, Lambourne L, Xia Y. Structural Determinants of Yeast Protein-Protein Interaction Interface Evolution at the Residue Level. J Mol Biol 2022; 434:167750. [PMID: 35850298 DOI: 10.1016/j.jmb.2022.167750] [Received: 08/02/2021] [Revised: 06/09/2022] [Accepted: 07/12/2022] [Indexed: 12/01/2022]
Abstract
Interfaces of contact between proteins play important roles in determining the proper structure and function of protein-protein interactions (PPIs). Therefore, to fully understand PPIs, we need to better understand the evolutionary design principles of PPI interfaces. Previous studies have shown that interfacial sites are more evolutionarily conserved than other surface protein sites. Yet, little is known about the nature and relative importance of evolutionary constraints in PPI interfaces. Here, we explore constraints imposed by the structure of the microenvironment surrounding interfacial residues on residue evolutionary rate using a large dataset of over 700 structural models of baker's yeast PPIs. We find that interfacial residues are, on average, systematically more conserved than all other residues with a similar degree of total burial as measured by relative solvent accessibility (RSA). In addition, we find that the RSA of the residue when the PPI is formed is a better predictor of interfacial residue evolutionary rate than RSA in the monomer state. Furthermore, we investigate four structure-based measures of residue interfacial involvement, including change in RSA upon binding (ΔRSA), number of residue-residue contacts across the interface, and distance from the center or the periphery of the interface. Integrated modeling for evolutionary rate prediction in interfaces shows that ΔRSA plays a dominant role among the four measures of interfacial involvement, with minor but independent contributions from the other measures. These results yield insight into the evolutionary design of interfaces, improving our understanding of the role that structure plays in the molecular evolution of PPIs at the residue level.
Affiliation(s)
- Léah Pollet
  - Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC, Canada
- Luke Lambourne
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Yu Xia
  - Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC, Canada
4
Bzówka M, Mitusińska K, Raczyńska A, Skalski T, Samol A, Bagrowska W, Magdziarz T, Góra A. Evolution of tunnels in α/β-hydrolase fold proteins—What can we learn from studying epoxide hydrolases? PLoS Comput Biol 2022; 18:e1010119. [PMID: 35580137 PMCID: PMC9140254 DOI: 10.1371/journal.pcbi.1010119] [Received: 09/25/2021] [Revised: 05/27/2022] [Accepted: 04/19/2022] [Indexed: 12/27/2022]
Abstract
The evolutionary variability of a protein’s residues is highly dependent on protein region and function. Solvent-exposed residues, excluding those at interaction interfaces, are more variable than buried residues, whereas active site residues are considered to be conserved. These rules also apply to α/β-hydrolase fold proteins, one of the oldest and largest superfamilies of enzymes, which have buried active sites equipped with tunnels linking the reaction site with the exterior. We selected soluble epoxide hydrolases as representatives of this family to conduct the first systematic study on the evolution of tunnels. We hypothesised that tunnels are lined mostly by conserved residues and are equipped with a number of specific variable residues that are able to respond to evolutionary pressure. The hypothesis was confirmed, and we suggested a general and a detailed approach to analysing tunnel evolution based on entropy values calculated for the tunnels’ residues. We also found three different cases of entropy distribution among tunnel-lining residues. These observations can be applied to protein reengineering that mimics the natural evolution process. We propose a ‘perforation’ mechanism for the design of new tunnels via the merging of internal cavities or protein surface perforation. Based on the literature data, such a strategy of new tunnel design could significantly improve an enzyme’s performance and can be applied widely to enzymes with buried active sites. So far, very little is known about the evolution of protein tunnels. The goal of this study is to evaluate the evolution of tunnels in the family of soluble epoxide hydrolases, representatives of the numerous α/β-hydrolase fold enzymes. As a result, two types of tunnel evolution analysis were proposed (a general and a detailed approach), as well as a ‘perforation’ mechanism that can mimic native evolution in proteins and can be used as an additional strategy for enzyme redesign.
Affiliation(s)
- Maria Bzówka
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Karolina Mitusińska
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Agata Raczyńska
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Tomasz Skalski
  - Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Aleksandra Samol
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Weronika Bagrowska
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Tomasz Magdziarz
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Artur Góra
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
5
Youssef N, Susko E, Roger AJ, Bielawski JP. Evolution of amino acid propensities under stability-mediated epistasis. Mol Biol Evol 2022; 39:6522130. [PMID: 35134997 PMCID: PMC8896634 DOI: 10.1093/molbev/msac030] [Indexed: 11/12/2022]
Abstract
Site-specific amino acid preferences are influenced by the genetic background of the protein. The preferences for resident amino acids are expected to, on average, increase over time because of replacements at other sites, a nonadaptive phenomenon referred to as the 'evolutionary Stokes shift'. Alternatively, decreases in resident amino acid propensity have recently been viewed as evidence of adaptations to external environmental changes. Using population genetics theory and thermodynamic stability constraints, we show that nonadaptive evolution can lead to both positive and negative shifts in propensities following the fixation of an amino acid, emphasizing that the detection of negative shifts is not conclusive evidence of adaptation. Considering shifts in propensities over windows between substitutions at a focal site, we find that following ≈ 50% of substitutions the propensity for the new resident amino acid decreases over time, and that positive and negative shifts were comparable in magnitude. Preferences were often conserved via a significant negative autocorrelation in propensity changes: increases in propensities were often followed by decreases, and vice versa. Lastly, we explore the underlying mechanisms that lead propensities to fluctuate. We observe that stabilizing replacements increase the mutational tolerance at a site and in doing so decrease the propensity for the resident amino acid. In contrast, destabilizing substitutions result in more rugged fitness landscapes that tend to favor the resident amino acid. In summary, our results characterize propensity trajectories under nonadaptive stability-constrained evolution against which evidence of adaptations should be calibrated.
Affiliation(s)
- Noor Youssef
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
- Andrew J Roger
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Joseph P Bielawski
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
6
Rahbar MR, Jahangiri A, Khalili S, Zarei M, Mehrabani-Zeinabad K, Khalesi B, Pourzardosht N, Hessami A, Nezafat N, Sadraei S, Negahdaripour M. Hotspots for mutations in the SARS-CoV-2 spike glycoprotein: a correspondence analysis. Sci Rep 2021; 11:23622. [PMID: 34880279 PMCID: PMC8654821 DOI: 10.1038/s41598-021-01655-y] [Received: 07/24/2021] [Accepted: 11/01/2021] [Indexed: 12/19/2022]
Abstract
Spike glycoprotein (Sgp) is responsible for binding of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) to the host receptors. Since Sgp is the main target for vaccine and drug design, elucidating its mutation pattern could help in this regard. This study aimed to investigate the correspondence of specific residues to the functionality of the SARS-CoV-2 Sgp through exploratory interpretation of sequence alignments. Centrality analysis of the Sgp dissects the importance of these residues in the interaction network of the RBD-ACE2 complex (RBD, receptor-binding domain) and the furin cleavage site. Correspondence of the RBD to threonine500 and asparagine501, and of the furin cleavage site to glutamine675, glutamine677, threonine678, and alanine684, was observed; all of these residues are located exactly at the interaction interfaces. The harmonious location of residues dictates the RBD binding property and the flexibility, hydrophobicity, and accessibility of the furin cleavage site. These species-specific residues can be regarded as real targets of evolution, while other substitutions tend to support them. Moreover, all these residues are parts of experimentally identified epitopes. Therefore, their substitution may affect vaccine efficacy. A higher rate of maintenance was predicted for the RBD than for the furin cleavage site. The accumulation of substitutions reinforces the probability of multi-host circulation of the virus and emphasizes the enduring evolutionary events.
Affiliation(s)
- Mohammad Reza Rahbar
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Abolfazl Jahangiri
  - Applied Microbiology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Saeed Khalili
  - Department of Biology Sciences, Shahid Rajaee Teacher Training University, Tehran, Iran
- Mahboubeh Zarei
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Kamran Mehrabani-Zeinabad
  - Department of Biostatistics, Faculty of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Bahman Khalesi
  - Department of Research and Production of Poultry Viral Vaccine, Razi Vaccine, and Serum Research Institute, Agricultural Research Education and Extension Organization (AREEO), Karaj, Iran
- Navid Pourzardosht
  - Cellular and Molecular Research Center, Faculty of Medicine, Guilan University of Medical Sciences, Rasht, Iran
  - Biochemistry Department, Guilan University of Medical Sciences, Rasht, Iran
- Anahita Hessami
  - School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
- Navid Nezafat
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Saman Sadraei
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Manica Negahdaripour
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
  - Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, P.O. Box 71345-1583, Shiraz, Iran
7
Youssef N, Susko E, Bielawski JP. Consequences of Stability-Induced Epistasis for Substitution Rates. Mol Biol Evol 2020; 37:3131-3148. [DOI: 10.1093/molbev/msaa151] [Indexed: 01/18/2023]
Abstract
Do interactions between residues in a protein (i.e., epistasis) significantly alter evolutionary dynamics? If so, what consequences might they have on inference from traditional codon substitution models, which assume site-independence for the sake of computational tractability? To investigate the effects of epistasis on substitution rates, we employed a mechanistic mutation-selection model in conjunction with a fitness framework derived from protein stability. We refer to this as the stability-informed site-dependent (S-SD) model and developed a new stability-informed site-independent (S-SI) model that captures the average effect of stability constraints on individual sites of a protein. Comparison of S-SI and S-SD offers a novel and direct method for investigating the consequences of stability-induced epistasis on protein evolution. We developed S-SI and S-SD models for three natural proteins and showed that they generate sequences consistent with real alignments. Our analyses revealed that epistasis tends to increase substitution rates compared with the rates under site-independent evolution. We then assessed the epistatic sensitivity of individual sites and discovered a counterintuitive effect: highly connected sites were less influenced by epistasis relative to exposed sites. Lastly, we show that, despite the unrealistic assumptions, traditional models perform comparably well in the presence and absence of epistasis and provide reasonable summaries of average selection intensities. We conclude that epistatic models are critical to understanding protein evolutionary dynamics, but epistasis might not be required for reasonable inference of selection pressure when averaging over time and sites.
Affiliation(s)
- Noor Youssef
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Edward Susko
  - Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Joseph P Bielawski
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
8
Serçinoğlu O, Ozbek P. Sequence-structure-function relationships in class I MHC: A local frustration perspective. PLoS One 2020; 15:e0232849. [PMID: 32421728 PMCID: PMC7233585 DOI: 10.1371/journal.pone.0232849] [Received: 12/18/2019] [Accepted: 04/22/2020] [Indexed: 12/22/2022]
Abstract
Class I Major Histocompatibility Complex (MHC) binds short antigenic peptides with the help of the Peptide Loading Complex (PLC), and presents them to T-cell Receptors (TCRs) of cytotoxic T-cells and Killer-cell Immunoglobulin-like Receptors (KIRs) of Natural Killer (NK) cells. With more than 10000 alleles, human MHC (Human Leukocyte Antigen, HLA) is the most polymorphic protein in humans. This allelic diversity provides a wide coverage of peptide sequence space, yet does not affect the three-dimensional structure of the complex. Moreover, TCRs mostly interact with HLA in a common diagonal binding mode, and the KIR-HLA interaction is allele-dependent. With the aim of establishing a framework for understanding the relationships between polymorphism (sequence), structure (conserved fold) and function (protein interactions) of the human MHC, we performed a local frustration analysis on pMHC homology models covering 1436 HLA I alleles. An analysis of local frustration profiles indicated that (1) variations in MHC fold are unlikely due to minimally-frustrated and relatively conserved residues within the HLA peptide-binding groove, (2) high frustration patches on HLA helices are either involved in or near interaction sites of MHC with the TCR, KIR, or tapasin of the PLC, and (3) peptide ligands mainly stabilize the F-pocket of the HLA binding groove.
Affiliation(s)
- Onur Serçinoğlu
  - Department of Bioengineering, Recep Tayyip Erdogan University, Faculty of Engineering, Fener, Rize, Turkey
- Pemra Ozbek
  - Department of Bioengineering, Marmara University, Faculty of Engineering, Goztepe, Istanbul, Turkey
9
Liberles DA, Chang B, Geiler-Samerotte K, Goldman A, Hey J, Kaçar B, Meyer M, Murphy W, Posada D, Storfer A. Emerging Frontiers in the Study of Molecular Evolution. J Mol Evol 2020; 88:211-226. [PMID: 32060574 PMCID: PMC7386396 DOI: 10.1007/s00239-020-09932-6] [Indexed: 02/07/2023]
Abstract
A collection of the editors of Journal of Molecular Evolution have gotten together to pose a set of key challenges and future directions for the field of molecular evolution. Topics include challenges and new directions in prebiotic chemistry and the RNA world, reconstruction of early cellular genomes and proteins, macromolecular and functional evolution, evolutionary cell biology, genome evolution, molecular evolutionary ecology, viral phylodynamics, theoretical population genomics, somatic cell molecular evolution, and directed evolution. While our effort is not meant to be exhaustive, it reflects research questions and problems in the field of molecular evolution that are exciting to our editors.
Affiliation(s)
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Belinda Chang
  - Department of Ecology and Evolutionary Biology and Department of Cell and Systems Biology, University of Toronto, 25 Harbord Street, Toronto, ON, M5S 3G5, Canada
- Kerry Geiler-Samerotte
  - Center for Mechanisms of Evolution, School of Life Sciences, Arizona State University, Tempe, AZ, 85287, USA
- Aaron Goldman
  - Department of Biology, Oberlin College and Conservatory, K123 Science Center, 119 Woodland Street, Oberlin, OH, 44074, USA
- Jody Hey
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Betül Kaçar
  - Department of Molecular and Cell Biology, University of Arizona, Tucson, AZ, 85721, USA
- Michelle Meyer
  - Department of Biology, Boston College, Chestnut Hill, MA, 02467, USA
- William Murphy
  - Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, 77843, USA
- David Posada
  - Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain
- Andrew Storfer
  - School of Biological Sciences, Washington State University, Pullman, WA, 99164, USA
10
Abboud A, Bédoucha P, Byška J, Arnesen T, Reuter N. Dynamics-function relationship in the catalytic domains of N-terminal acetyltransferases. Comput Struct Biotechnol J 2020; 18:532-547. [PMID: 32206212 PMCID: PMC7078549 DOI: 10.1016/j.csbj.2020.02.017] [Received: 11/15/2019] [Revised: 02/14/2020] [Accepted: 02/25/2020] [Indexed: 12/15/2022]
Abstract
N-terminal acetyltransferases (NATs) belong to the superfamily of acetyltransferases. They are enzymes catalysing the transfer of an acetyl group from acetyl coenzyme A to the N-terminus of polypeptide chains. N-terminal acetylation is one of the most common protein modifications. To date, not much is known about the molecular basis for the exclusive substrate specificity of NATs. All NATs share a common fold called GNAT. A characteristic of NATs is the β6β7 hairpin loop, which covers the active site and forms, with the α1α2 loop, a narrow tunnel surrounding the catalytic site in which cofactor and polypeptide meet and exchange an acetyl group. We investigated the dynamics-function relationships of all available structures of NATs covering the three domains of life. Using an elastic network model and normal mode analysis, we found a common dynamics pattern conserved through the GNAT fold: a rigid V-shaped groove formed by the β4 and β5 strands that splits the fold into two dynamical subdomains. Loops α1α2, β3β4 and β6β7 all show clear displacements in the low-frequency normal modes. We characterized the mobility of the loops and show that even limited conformational changes of the loops along the low-frequency modes are able to significantly change the size and shape of the ligand binding sites. Based on the fact that these movements are present in most low-frequency modes, and common to all NATs, we suggest that the α1α2 and β6β7 loops may regulate ligand uptake and the release of the acetylated polypeptide.
Affiliation(s)
- Angèle Abboud
  - Department of Informatics, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Pierre Bédoucha
  - Department of Informatics, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Jan Byška
  - Department of Informatics, University of Bergen, Bergen, Norway
  - Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Thomas Arnesen
  - Department of Biological Sciences, University of Bergen, Bergen, Norway
  - Department of Biomedicine, University of Bergen, Bergen, Norway
  - Department of Surgery, Haukeland University Hospital, Bergen, Norway
- Nathalie Reuter
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
  - Department of Chemistry, University of Bergen, Bergen, Norway
11
Sun Z, Liu Q, Qu G, Feng Y, Reetz MT. Utility of B-Factors in Protein Science: Interpreting Rigidity, Flexibility, and Internal Motion and Engineering Thermostability. Chem Rev 2019; 119:1626-1665. [PMID: 30698416 DOI: 10.1021/acs.chemrev.8b00290] [Indexed: 12/24/2022]
Affiliation(s)
- Zhoutong Sun
  - Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West Seventh Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Qian Liu
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ge Qu
  - Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West Seventh Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Yan Feng
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Manfred T. Reetz
  - Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West Seventh Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
  - Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
  - Chemistry Department, Philipps-University, Hans-Meerwein-Strasse 4, 35032 Marburg, Germany
12
Echave J. Beyond Stability Constraints: A Biophysical Model of Enzyme Evolution with Selection on Stability and Activity. Mol Biol Evol 2018; 36:613-620. [DOI: 10.1093/molbev/msy244] [Indexed: 12/14/2022]
Affiliation(s)
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín (UNSAM), Buenos Aires, Argentina
13
Tiwari SP, Reuter N. Conservation of intrinsic dynamics in proteins — what have computational models taught us? Curr Opin Struct Biol 2018; 50:75-81. [DOI: 10.1016/j.sbi.2017.12.001] [Received: 10/16/2017] [Revised: 11/24/2017] [Accepted: 12/08/2017] [Indexed: 12/12/2022]
14
Lundin E, Tang PC, Guy L, Näsvall J, Andersson DI. Experimental Determination and Prediction of the Fitness Effects of Random Point Mutations in the Biosynthetic Enzyme HisA. Mol Biol Evol 2018; 35:704-718. [PMID: 29294020 PMCID: PMC5850734 DOI: 10.1093/molbev/msx325] [Indexed: 01/15/2023]
Abstract
The distribution of fitness effects of mutations is a factor of fundamental importance in evolutionary biology. We determined the distribution of fitness effects of 510 mutants that each carried between 1 and 10 mutations (synonymous and nonsynonymous) in the hisA gene, encoding an essential enzyme in the L-histidine biosynthesis pathway of Salmonella enterica. For the full set of mutants, the distribution was bimodal with many apparently neutral mutations and many lethal mutations. For a subset of 81 single, nonsynonymous mutants, most mutations appeared neutral at high expression levels, whereas at low expression levels only a few mutations were neutral. Furthermore, we examined how the magnitude of the observed fitness effects was correlated with several measures of biophysical properties and phylogenetic conservation. We conclude that for HisA: (i) the effect of mutations can be masked by high expression levels, such that mutations that are deleterious to the function of the protein can still be neutral with regard to organism fitness if the protein is expressed at a sufficiently high level; (ii) the shape of the fitness distribution is dependent on the extent to which the protein is rate-limiting for growth; (iii) negative epistatic interactions, on average, amplified the combined effect of nonsynonymous mutations; and (iv) no single sequence-based predictor could confidently predict the fitness effects of mutations in HisA, but a combination of multiple predictors could predict the effect with an SD of 0.04, resulting in 80% of the mutations predicted within 12% of their observed selection coefficients.
Affiliation(s)
- Erik Lundin
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Po-Cheng Tang
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Lionel Guy
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Joakim Näsvall
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Dan I Andersson
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
|
15
|
Beyond Thermodynamic Constraints: Evolutionary Sampling Generates Realistic Protein Sequence Variation. Genetics 2018; 208:1387-1395. [PMID: 29382650 DOI: 10.1534/genetics.118.300699] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/04/2018] [Accepted: 01/25/2018] [Indexed: 01/01/2023] Open
Abstract
Biological evolution generates a surprising amount of site-specific variability in protein sequences. Yet, attempts at modeling this process have been only moderately successful, and current models based on protein structural metrics explain, at best, 60% of the observed variation. Surprisingly, simple measures of protein structure, such as solvent accessibility, are often better predictors of site-specific variability than more complex models employing all-atom energy functions and detailed structural modeling. We suggest here that these more complex models perform poorly because they lack consideration of the evolutionary process, which is, in part, captured by the simpler metrics. We compare protein sequences that are computationally designed to sequences that are computationally evolved using the same protein-design energy function and to homologous natural sequences. We find that, by a wide variety of metrics, evolved sequences are much more similar to natural sequences than are designed sequences. In particular, designed sequences are too conserved on the protein surface relative to natural sequences, whereas evolved sequences are not. Our results suggest that evolutionary simulation produces a realistic sampling of sequence space. By contrast, protein design, at least as currently implemented, does not. Existing energy functions seem to be sufficiently accurate to correctly describe the key thermodynamic constraints acting on protein sequences, but they need to be paired with realistic sampling schemes to generate realistic sequence alignments.
|
16
|
Sydykova DK, Jack BR, Spielman SJ, Wilke CO. Measuring evolutionary rates of proteins in a structural context. F1000Res 2017; 6:1845. [PMID: 29167739 DOI: 10.12688/f1000research.12874.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 11/18/2017] [Indexed: 11/20/2022] Open
Abstract
We describe how to measure site-specific rates of evolution in protein-coding genes and how to correlate these rates with structural features of the expressed protein, such as relative solvent accessibility, secondary structure, or weighted contact number. We present two alternative approaches to rate calculations: one based on relative amino-acid rates, and the other based on site-specific codon rates measured as dN/dS. We additionally provide a code repository containing scripts to facilitate the specific analysis protocols we recommend.
Affiliation(s)
- Dariya K Sydykova
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Benjamin R Jack
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Stephanie J Spielman
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
|
17
|
Sydykova DK, Jack BR, Spielman SJ, Wilke CO. Measuring evolutionary rates of proteins in a structural context. F1000Res 2017; 6:1845. [PMID: 29167739 PMCID: PMC5676193 DOI: 10.12688/f1000research.12874.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Accepted: 01/31/2018] [Indexed: 12/14/2022] Open
Abstract
We describe how to measure site-specific rates of evolution in protein-coding genes and how to correlate these rates with structural features of the expressed protein, such as relative solvent accessibility, secondary structure, or weighted contact number. We present two alternative approaches to rate calculations: one based on relative amino-acid rates, and the other based on site-specific codon rates measured as dN/dS. We additionally provide a code repository containing scripts to facilitate the specific analysis protocols we recommend.
Affiliation(s)
- Dariya K Sydykova
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Benjamin R Jack
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Stephanie J Spielman
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
|
18
|
Liu JW, Cheng CW, Lin YF, Chen SY, Hwang JK, Yen SC. Relationships between residue Voronoi volume and sequence conservation in proteins. Biochimica et Biophysica Acta - Proteins and Proteomics 2017; 1866:379-386. [PMID: 28911812 DOI: 10.1016/j.bbapap.2017.09.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/12/2017] [Revised: 08/18/2017] [Accepted: 09/05/2017] [Indexed: 12/31/2022]
Abstract
BACKGROUND: Functional and biophysical constraints can cause different levels of sequence conservation in proteins. Previously, structural properties, e.g., relative solvent accessibility (RSA) and packing density of the weighted contact number (WCN), have been found to be related to protein sequence conservation (CS). The Voronoi volume has recently been recognized as a new structural property of the local protein structural environment reflecting CS. However, for surface residues, it is sensitive to water molecules surrounding the protein structure. Herein, we present a simple structural determinant termed the relative space of Voronoi volume (RSV); it uses the Voronoi volume and the van der Waals volume of particular residues to quantify the local structural environment. METHODS: RSV (range, 0-1) is defined as (Voronoi volume - van der Waals volume)/Voronoi volume of the target residue. The concept of RSV describes the extent of available space for every protein residue. RESULTS: RSV and Voronoi profiles with and without water molecules (RSVw, RSV, VOw, and VO) were compared for 554 non-homologous proteins. RSV (without water) showed better Pearson's correlations with CS than did RSVw, VO, or VOw values. The mean correlation coefficient between RSV and CS was 0.51, which is comparable to the correlation between RSA and CS (0.49) and that between WCN and CS (0.56). CONCLUSIONS: RSV is a robust structural descriptor with and without water molecules and can quantitatively reflect evolutionary information in a single protein structure. Therefore, it may represent a practical structural determinant to study protein sequence, structure, and function relationships.
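The RSV definition in the abstract above can be illustrated with a minimal Python sketch. The function name, the residue labels, and the volume values are hypothetical and chosen only for illustration; the abstract does not say how the authors obtained the Voronoi or van der Waals volumes, so both are taken here as precomputed inputs.

```python
def relative_space_of_voronoi_volume(voronoi_volume: float, vdw_volume: float) -> float:
    """Return RSV = (Voronoi volume - van der Waals volume) / Voronoi volume."""
    if voronoi_volume <= 0:
        raise ValueError("Voronoi volume must be positive")
    return (voronoi_volume - vdw_volume) / voronoi_volume


# Hypothetical per-residue volumes in cubic angstroms (illustrative values only):
# residue label -> (Voronoi volume, van der Waals volume)
residue_volumes = {
    "ALA12": (120.0, 67.0),
    "LEU45": (180.0, 124.0),
}

rsv = {
    name: relative_space_of_voronoi_volume(voronoi, vdw)
    for name, (voronoi, vdw) in residue_volumes.items()
}
```

Under this definition, a residue whose van der Waals volume nearly fills its Voronoi cell has an RSV close to 0, and a residue with a lot of free space around it has an RSV closer to 1.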
Affiliation(s)
- Jen-Wei Liu
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, HsinChu 30050, Taiwan, R.O.C.
- Chih-Wen Cheng
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, HsinChu 30050, Taiwan, R.O.C.
- Yu-Feng Lin
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, HsinChu 30050, Taiwan, R.O.C.
- Shao-Yu Chen
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, HsinChu 30050, Taiwan, R.O.C.
- Jenn-Kang Hwang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, HsinChu 30050, Taiwan, R.O.C.
  - Center for Bioinformatics Research, National Chiao Tung University, HsinChu 30050, Taiwan, R.O.C.
- Shih-Chung Yen
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, HsinChu 30050, Taiwan, R.O.C.
|
19
|
Sydykova DK, Wilke CO. Calculating site-specific evolutionary rates at the amino-acid or codon level yields similar rate estimates. PeerJ 2017; 5:e3391. [PMID: 28584717 PMCID: PMC5452972 DOI: 10.7717/peerj.3391] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 01/19/2017] [Accepted: 05/08/2017] [Indexed: 11/20/2022] Open
Abstract
Site-specific evolutionary rates can be estimated from codon sequences or from amino-acid sequences. For codon sequences, the most popular methods use some variation of the dN/dS ratio. For amino-acid sequences, one widely used method is called Rate4Site, and it assigns a relative conservation score to each site in an alignment. How site-wise dN/dS values relate to Rate4Site scores is not known. Here we elucidate the relationship between these two rate measurements. We simulate sequences with known dN/dS, using either dN/dS models or mutation–selection models for simulation. We then infer Rate4Site scores on the simulated alignments, and we compare those scores to either true or inferred dN/dS values on the same alignments. We find that Rate4Site scores generally correlate well with true dN/dS, and the correlation strengths increase in alignments with greater sequence divergence and more taxa. Moreover, Rate4Site scores correlate very well with inferred (as opposed to true) dN/dS values, even for small alignments with little divergence. Finally, we verify this relationship between Rate4Site and dN/dS in a variety of empirical datasets. We conclude that codon-level and amino-acid-level analysis frameworks are directly comparable and yield very similar inferences.
Affiliation(s)
- Dariya K Sydykova
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
- Claus O Wilke
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
|
20
|
Jackson EL, Spielman SJ, Wilke CO. Computational prediction of the tolerance to amino-acid deletion in green-fluorescent protein. PLoS One 2017; 12:e0164905. [PMID: 28369116 PMCID: PMC5378326 DOI: 10.1371/journal.pone.0164905] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/07/2016] [Accepted: 03/21/2017] [Indexed: 01/29/2023] Open
Abstract
Proteins evolve through two primary mechanisms: substitution, where mutations alter a protein's amino-acid sequence, and insertions and deletions (indels), where amino acids are either added to or removed from the sequence. Protein structure has been shown to influence the rate at which substitutions accumulate across sites in proteins, but whether structure similarly constrains the occurrence of indels has not been rigorously studied. Here, we investigate the extent to which structural properties known to covary with protein evolutionary rates might also predict protein tolerance to indels. Specifically, we analyze a publicly available dataset of single-amino-acid deletion mutations in enhanced green fluorescent protein (eGFP) to assess how well the functional effect of deletions can be predicted from protein structure. We find that weighted contact number (WCN), which measures how densely packed a residue is within the protein's three-dimensional structure, provides the best single predictor for whether eGFP will tolerate a given deletion. We additionally find that using protein design to explicitly model deletions results in improved predictions of functional status when combined with other structural predictors. Our work suggests that structure plays a fundamental role in constraining deletions at sites in proteins, and further that similar biophysical constraints influence both substitutions and deletions. This study therefore provides a solid foundation for future work to examine how protein structure influences tolerance of more complex indel events, such as insertions or large deletions.
Affiliation(s)
- Eleisha L. Jackson
  - Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Stephanie J. Spielman
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Claus O. Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
|
21
|
Bloom JD. Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models. Biol Direct 2017; 12:1. [PMID: 28095902 PMCID: PMC5240389 DOI: 10.1186/s13062-016-0172-z] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 08/15/2016] [Accepted: 12/14/2016] [Indexed: 12/23/2022] Open
Abstract
Background: Sites of positive selection are identified by comparing observed evolutionary patterns to those expected under a null model for evolution in the absence of such selection. For protein-coding genes, the most common null model is that nonsynonymous and synonymous mutations fix at equal rates; this unrealistic model has limited power to detect many interesting forms of selection. Results: I describe a new approach that uses a null model based on experimental measurements of a gene’s site-specific amino-acid preferences generated by deep mutational scanning in the lab. This null model makes it possible to identify both diversifying selection for repeated amino-acid change and differential selection for mutations to amino acids that are unexpected given the measurements made in the lab. I show that this approach identifies sites of adaptive substitutions in four genes (lactamase, Gal4, influenza nucleoprotein, and influenza hemagglutinin) far better than a comparable method that simply compares the rates of nonsynonymous and synonymous substitutions. Conclusions: As rapid increases in biological data enable increasingly nuanced descriptions of the constraints on individual protein sites, approaches like the one here can improve our ability to identify many interesting forms of selection in natural sequences. Reviewers: This article was reviewed by Sebastian Maurer-Stroh, Olivier Tenaillon, and Tal Pupko. All three reviewers are members of the Biology Direct editorial board.
Affiliation(s)
- Jesse D Bloom
  - Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, 98109, WA, USA.
|
22
|
Meyer AG, Wilke CO. The utility of protein structure as a predictor of site-wise dN/dS varies widely among HIV-1 proteins. J R Soc Interface 2016; 12:20150579. [PMID: 26468068 DOI: 10.1098/rsif.2015.0579] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
Protein structure acts as a general constraint on the evolution of viral proteins. One widely recognized structural constraint explaining evolutionary variation among sites is the relative solvent accessibility (RSA) of residues in the folded protein. In influenza virus, the distance from functional sites has been found to explain an additional portion of the evolutionary variation in the external antigenic proteins. However, to what extent RSA and distance from a reference site in the protein can be used more generally to explain protein adaptation in other viruses and in the different proteins of any given virus remains an open question. To address this question, we have carried out an analysis of the distribution and structural predictors of site-wise dN/dS in HIV-1. Our results indicate that the distribution of dN/dS in HIV follows a smooth gamma distribution, with no special enrichment or depletion of sites with dN/dS at or above one. The variation in dN/dS can be partially explained by RSA and distance from a reference site in the protein, but these structural constraints do not act uniformly among the different HIV-1 proteins. Structural constraints are highly predictive in just one of the three enzymes and one of three structural proteins in HIV-1. For these two proteins, the protease enzyme and the gp120 structural protein, structure explains between 30 and 40% of the variation in dN/dS. Finally, for the gp120 protein of the receptor-binding complex, we also find that glycosylation sites explain just 2% of the variation in dN/dS and do not explain gp120 evolution independently of either RSA or distance from the apical surface.
Affiliation(s)
- Austin G Meyer
  - Department of Integrative Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA
  - School of Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA
- Claus O Wilke
  - Department of Integrative Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA
|
23
|
Lowenthal MS, Davis KS, Formolo T, Kilpatrick LE, Phinney KW. Identification of Novel N-Glycosylation Sites at Noncanonical Protein Consensus Motifs. J Proteome Res 2016; 15:2087-101. [PMID: 27246700 DOI: 10.1021/acs.jproteome.5b00733] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Indexed: 11/28/2022]
Abstract
N-glycosylation of proteins is well known to occur at asparagine residues that fall within the canonical consensus sequence N-X-S/T but has also been identified at a small number of asparagine residues within N-X-C motifs, including the N491 residue of human serotransferrin. Here we report novel glycosylation sites within noncanonical consensus motifs, in the conformation N-X-C, based on mass spectrometry analysis of partially deglycosylated glycopeptide targets. Alpha-1-acid glycoprotein (A1AG) and serotransferrin (Tf) were observed for the first time to be N-glycosylated on asparagine residues within a total of six unique noncanonical motifs. N-glycosylation was initially predicted in silico based on the evolutionary conservation of the N-X-C motif among related mammalian species and demonstrated experimentally in A1AG from porcine, canine, and feline sources and in human serotransferrin. High-resolution liquid chromatography-tandem mass spectrometry was employed to collect fragmentation data of predicted GlcNAcylated peptides and to assign modification sites within N-X-C motifs. A combination of targeted analytical techniques that includes complementary mass spectrometry platforms, enzymatic digestions, and partial-deglycosylation procedures was developed to confirm the novel observations. Additionally, we found that A1AG in porcine and canine sources is highly N-glycosylated at a noncanonical motif (N-Q-C) based on semiquantitative multiple reaction monitoring analysis, the first report of an N-X-C motif exhibiting substantial N-glycosylation. Although reports of N-X-C motif N-glycosylation are relatively uncommon in the literature, this work adds to a growing list of glycoproteins reported with glycosylation at various forms of noncanonical motifs.
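As a rough illustration of the two motif classes named in the abstract above, the following Python sketch scans a protein sequence for canonical N-X-S/T and noncanonical N-X-C motifs. It is not the authors' in silico prediction procedure, which also relied on evolutionary conservation. The function name and the toy sequence are hypothetical, and excluding proline at the X position is a common convention assumed here rather than something the abstract states.

```python
import re

# Lookahead patterns so that overlapping motifs are all reported.
# Assumption: X is any residue except proline (common convention, not from the abstract).
CANONICAL = re.compile(r"(?=(N[^P][ST]))")     # N-X-S/T
NONCANONICAL = re.compile(r"(?=(N[^P]C))")     # N-X-C


def find_motifs(sequence: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based position of the asparagine, matched tripeptide) pairs."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Toy sequence invented for illustration; it is not a real protein.
toy_sequence = "MNQCANGTPNPSNAS"

canonical_sites = find_motifs(toy_sequence, CANONICAL)        # [(6, 'NGT'), (13, 'NAS')]
noncanonical_sites = find_motifs(toy_sequence, NONCANONICAL)  # [(2, 'NQC')]
```

A motif match only marks a candidate site; as the abstract makes clear, actual glycosylation has to be confirmed experimentally.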
Affiliation(s)
- Mark S Lowenthal
  - Material Measurement Laboratory, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Stop 8314, Gaithersburg, Maryland 20899, United States
- Kiersta S Davis
  - Material Measurement Laboratory, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Stop 8314, Gaithersburg, Maryland 20899, United States
- Trina Formolo
  - Material Measurement Laboratory, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Stop 8314, Gaithersburg, Maryland 20899, United States
- Lisa E Kilpatrick
  - Material Measurement Laboratory, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Stop 8314, Gaithersburg, Maryland 20899, United States
- Karen W Phinney
  - Material Measurement Laboratory, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Stop 8314, Gaithersburg, Maryland 20899, United States
|
24
|
Shahmoradi A, Wilke CO. Dissecting the roles of local packing density and longer-range effects in protein sequence evolution. Proteins 2016; 84:841-54. [PMID: 26990194 PMCID: PMC5292938 DOI: 10.1002/prot.25034] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/02/2015] [Revised: 02/01/2016] [Accepted: 02/24/2016] [Indexed: 11/07/2022]
Abstract
What are the structural determinants of protein sequence evolution? A number of site-specific structural characteristics have been proposed, most of which are broadly related to either the density of contacts or the solvent accessibility of individual residues. Most importantly, there has been disagreement in the literature over the relative importance of solvent accessibility and local packing density for explaining site-specific sequence variability in proteins. We show that this discussion has been confounded by the definition of local packing density. The most commonly used measures of local packing, such as contact number and the weighted contact number, represent the combined effects of local packing density and longer-range effects. As an alternative, we propose a truly local measure of packing density around a single residue, based on the Voronoi cell volume. We show that the Voronoi cell volume, when calculated relative to the geometric center of amino-acid side chains, behaves nearly identically to the relative solvent accessibility, and each individually can explain, on average, approximately 34% of the site-specific variation in evolutionary rate in a data set of 209 enzymes. An additional 10% of variation can be explained by nonlocal effects that are captured in the weighted contact number. Consequently, evolutionary variation at a site is determined by the combined effects of the immediate amino-acid neighbors of that site and effects mediated by more distant amino acids. We conclude that instead of contrasting solvent accessibility and local packing density, future research should emphasize the relative importance of immediate contacts and longer-range effects on evolutionary variation.
Affiliation(s)
- Amir Shahmoradi
  - Department of Physics, The University of Texas at Austin
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin
- Claus O. Wilke
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin
  - Department of Integrative Biology, The University of Texas at Austin
|
25
|
Jack BR, Meyer AG, Echave J, Wilke CO. Functional Sites Induce Long-Range Evolutionary Constraints in Enzymes. PLoS Biol 2016; 14:e1002452. [PMID: 27138088 PMCID: PMC4854464 DOI: 10.1371/journal.pbio.1002452] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 11/23/2015] [Accepted: 04/04/2016] [Indexed: 12/26/2022] Open
Abstract
Functional residues in proteins tend to be highly conserved over evolutionary time. However, to what extent functional sites impose evolutionary constraints on nearby or even more distant residues is not known. Here, we report pervasive conservation gradients toward catalytic residues in a dataset of 524 distinct enzymes: evolutionary conservation decreases approximately linearly with increasing distance to the nearest catalytic residue in the protein structure. This trend encompasses, on average, 80% of the residues in any enzyme, and it is independent of known structural constraints on protein evolution such as residue packing or solvent accessibility. Further, the trend exists in both monomeric and multimeric enzymes and irrespective of enzyme size and/or location of the active site in the enzyme structure. By contrast, sites in protein-protein interfaces, unlike catalytic residues, are only weakly conserved and induce only minor rate gradients. In aggregate, these observations show that functional sites, and in particular catalytic residues, induce long-range evolutionary constraints in enzymes.
Affiliation(s)
- Benjamin R. Jack
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Austin G. Meyer
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, San Martín, Buenos Aires, Argentina
- Claus O. Wilke
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
|
26
|
Jackson EL, Shahmoradi A, Spielman SJ, Jack BR, Wilke CO. Intermediate divergence levels maximize the strength of structure-sequence correlations in enzymes and viral proteins. Protein Sci 2016; 25:1341-53. [PMID: 26971720 DOI: 10.1002/pro.2920] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/07/2015] [Accepted: 03/04/2016] [Indexed: 12/16/2022]
Abstract
Structural properties such as solvent accessibility and contact number predict site-specific sequence variability in many proteins. However, the strength and significance of these structure-sequence relationships vary widely among different proteins, with absolute correlation strengths ranging from 0 to 0.8. In particular, two recent works have made contradictory observations. Yeh et al. (Mol. Biol. Evol. 31:135-139, 2014) found that both relative solvent accessibility (RSA) and weighted contact number (WCN) are good predictors of sitewise evolutionary rate in enzymes, with WCN clearly out-performing RSA. Shahmoradi et al. (J. Mol. Evol. 79:130-142, 2014) considered these same predictors (as well as others) in viral proteins and found much weaker correlations and no clear advantage of WCN over RSA. Because these two studies had substantial methodological differences, however, a direct comparison of their results is not possible. Here, we reanalyze the datasets of the two studies with one uniform analysis pipeline, and we find that many apparent discrepancies between the two analyses can be attributed to the extent of sequence divergence in individual alignments. Specifically, the alignments of the enzyme dataset are much more diverged than those of the virus dataset, and proteins with higher divergence exhibit, on average, stronger structure-sequence correlations. However, the highest structure-sequence correlations are observed at intermediate divergence levels, where both highly conserved and highly variable sites are present in the same alignment.
Affiliation(s)
- Eleisha L Jackson
  - Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
- Amir Shahmoradi
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
  - Department of Physics, The University of Texas at Austin, Austin, Texas, 78712
- Stephanie J Spielman
  - Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
- Benjamin R Jack
  - Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
  - Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
|
27
|
González MM, Abriata LA, Tomatis PE, Vila AJ. Optimization of Conformational Dynamics in an Epistatic Evolutionary Trajectory. Mol Biol Evol 2016; 33:1768-76. [PMID: 26983555 DOI: 10.1093/molbev/msw052] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022] Open
Abstract
The understanding of protein evolution depends on the ability to relate the impact of mutations on molecular traits to organismal fitness. Biological activity and robustness have been regarded as important features in shaping protein evolutionary landscapes. Conformational dynamics, which is essential for protein function, has received little attention in the context of evolutionary analyses. Here we employ NMR spectroscopy, the chief experimental tool to describe protein dynamics at atomic level in solution at room temperature, to study the intrinsic dynamic features of a metallo-β-lactamase enzyme and three variants identified during a directed evolution experiment that led to an expanded substrate profile. We show that conformational dynamics in the catalytically relevant microsecond to millisecond timescale is optimized along the favored evolutionary trajectory. In addition, we observe that the effects of mutations on dynamics are epistatic. Mutation Gly262Ser introduces slow dynamics on several residues that surround the active site when introduced in the wild-type enzyme. Mutation Asn70Ser removes the slow dynamics observed for few residues of the wild-type enzyme, but increases the number of residues that undergo slow dynamics when introduced in the Gly262Ser mutant. These effects on dynamics correlate with the epistatic interaction between these two mutations on the bacterial phenotype. These findings indicate that conformational dynamics is an evolvable trait, and that proteins endowed with more dynamic active sites also display a larger potential for promoting evolution.
Affiliation(s)
- Mariano M González
- IBR (Instituto de Biología Molecular y Celular de Rosario), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Ocampo y Esmeralda, Rosario, Argentina
- Luciano A Abriata
- IBR (Instituto de Biología Molecular y Celular de Rosario), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Ocampo y Esmeralda, Rosario, Argentina
- Pablo E Tomatis
- IBR (Instituto de Biología Molecular y Celular de Rosario), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Ocampo y Esmeralda, Rosario, Argentina
- Alejandro J Vila
- IBR (Instituto de Biología Molecular y Celular de Rosario), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Ocampo y Esmeralda, Rosario, Argentina; Plataforma Argentina de Biología Estructural y Metabolómica (PLABEM), Ocampo y Esmeralda, Rosario, Argentina
|
28
|
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 206] [Impact Index Per Article: 25.8] [Indexed: 12/13/2022]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
- Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
|
29
|
Wu NC, Olson CA, Du Y, Le S, Tran K, Remenyi R, Gong D, Al-Mawsawi LQ, Qi H, Wu TT, Sun R. Functional Constraint Profiling of a Viral Protein Reveals Discordance of Evolutionary Conservation and Functionality. PLoS Genet 2015; 11:e1005310. [PMID: 26132554 PMCID: PMC4489113 DOI: 10.1371/journal.pgen.1005310] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 02/09/2015] [Accepted: 05/28/2015] [Indexed: 12/31/2022] Open
Abstract
Viruses often encode proteins with multiple functions due to their compact genomes. Existing approaches to identify functional residues largely rely on sequence conservation analysis. Inferring functional residues from sequence conservation can produce false positives, in which the conserved residues are functionally silent, or false negatives, where functional residues are not identified since they are species-specific and therefore non-conserved. Furthermore, the tedious process of constructing and analyzing individual mutations limits the number of residues that can be examined in a single study. Here, we developed a systematic approach to identify the functional residues of a viral protein by coupling experimental fitness profiling with protein stability prediction using the influenza virus polymerase PA subunit as the target protein. We identified a significant number of functional residues that were influenza type-specific and were evolutionarily non-conserved among different influenza types. Our results indicate that type-specific functional residues are prevalent and may not otherwise be identified by sequence conservation analysis alone. More importantly, this technique can be adapted to any viral (and potentially non-viral) protein where structural information is available.

The analysis of sequence conservation is a common approach to identify functional residues within a protein. However, not all functional residues are conserved as natural evolution and species diversification permit continuous innovation of protein functionality through the retention of advantageous mutations. Non-conserved functional residues, which are often species-specific, may not be identified by conventional analysis of sequence conservation despite being biologically important. Here we described a novel approach to identify functional residues within a protein by coupling a high-throughput experimental fitness profiling approach with computational protein modeling. Our methodology is independent of sequence conservation and is applicable to any protein where structural information is available. In this study, we systematically mapped the functional residues on the influenza A PA protein and revealed that non-conserved functional residues are prevalent. Our results not only have significant implication on how functionality evolves during natural evolution, but also highlight the caveats when applying conservation-based approaches to identify functional residues within a protein.
Affiliation(s)
- Nicholas C. Wu
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California, United States of America
- C. Anders Olson
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Yushen Du
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Shuai Le
- Department of Microbiology, Third Military Medical University, Chongqing, 400038, China
- Kevin Tran
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Roland Remenyi
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Danyang Gong
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Laith Q. Al-Mawsawi
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Hangfei Qi
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Ting-Ting Wu
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Ren Sun
- Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California, United States of America
- AIDS Institute, University of California, Los Angeles, Los Angeles, California, United States of America
|
30
|
Meyer AG, Wilke CO. Geometric Constraints Dominate the Antigenic Evolution of Influenza H3N2 Hemagglutinin. PLoS Pathog 2015; 11:e1004940. [PMID: 26020774 PMCID: PMC4447415 DOI: 10.1371/journal.ppat.1004940] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 02/17/2015] [Accepted: 05/07/2015] [Indexed: 11/18/2022] Open
Abstract
We have carried out a comprehensive analysis of the determinants of human influenza A H3 hemagglutinin evolution. We consider three distinct predictors of evolutionary variation at individual sites: solvent accessibility (as a proxy for protein fold stability and/or conservation), Immune Epitope Database (IEDB) epitope sites (as a proxy for host immune bias), and proximity to the receptor-binding region (as a proxy for one of the functions of hemagglutinin: to bind sialic acid). Individually, these quantities explain approximately 15% of the variation in site-wise dN/dS. In combination, solvent accessibility and proximity explain 32% of the variation in dN/dS; incorporating IEDB epitope sites into the model adds only an additional 2 percentage points. Thus, while solvent accessibility and proximity perform largely as independent predictors of evolutionary variation, they each overlap with the epitope-sites predictor. Furthermore, we find that the historical H3 epitope sites, which date back to the 1980s and 1990s, only partially overlap with the experimental sites from the IEDB, and display similar overlap in predictive power when combined with solvent accessibility and proximity. We also find that sites with dN/dS > 1, i.e., the sites most likely driving seasonal immune escape, are not correctly predicted by either historical or IEDB epitope sites, but only by proximity to the receptor-binding region. In summary, a simple geometric model of HA evolution outperforms a model based on epitope sites. These results suggest that either the available epitope sites do not accurately represent the true influenza antigenic sites or that host immune bias may be less important for influenza evolution than commonly thought.
MESH Headings
- Antibodies, Viral/immunology
- Antigens, Viral/immunology
- Binding Sites
- Databases, Factual
- Epitope Mapping
- Epitopes/immunology
- Evolution, Molecular
- Genetic Variation/genetics
- Hemagglutinin Glycoproteins, Influenza Virus/chemistry
- Hemagglutinin Glycoproteins, Influenza Virus/genetics
- Hemagglutinin Glycoproteins, Influenza Virus/immunology
- Humans
- Influenza A Virus, H3N2 Subtype/immunology
- Influenza, Human/genetics
- Influenza, Human/immunology
- Influenza, Human/virology
- Protein Folding
- Protein Stability
- Sialic Acids/metabolism
- Solvents/chemistry
Affiliation(s)
- Austin G. Meyer
- Department of Integrative Biology, Institute for Cellular and Molecular Biology and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- School of Medicine, Texas Tech University Health Sciences Center, Lubbock, Texas, United States of America
- Claus O. Wilke
- Department of Integrative Biology, Institute for Cellular and Molecular Biology and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
|
31
|
Marcos ML, Echave J. Too packed to change: side-chain packing and site-specific substitution rates in protein evolution. PeerJ 2015; 3:e911. [PMID: 25922797 PMCID: PMC4411540 DOI: 10.7717/peerj.911] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 12/31/2014] [Accepted: 04/04/2015] [Indexed: 12/21/2022] Open
Abstract
In protein evolution, due to functional and biophysical constraints, the rates of amino acid substitution differ from site to site. Among the best predictors of site-specific rates are solvent accessibility and packing density. The packing density measure that best correlates with rates is the weighted contact number (WCN), the sum of inverse square distances between a site’s Cα and the Cα of the other sites. According to a mechanistic stress model proposed recently, rates are determined by packing because mutating packed sites stresses and destabilizes the protein’s active conformation. While WCN is a measure of Cα packing, mutations replace side chains. Here, we consider whether a site’s evolutionary divergence is constrained by main-chain packing or side-chain packing. To address this issue, we extended the stress theory to model side chains explicitly. The theory predicts that rates should depend solely on side-chain contact density. We tested this prediction on a data set of structurally and functionally diverse monomeric enzymes. We compared side-chain contact density with main-chain contact density measures and with relative solvent accessibility (RSA). We found that side-chain contact density is the best predictor of rate variation among sites (it explains 39.2% of the variation). Moreover, the independent contribution of main-chain contact density measures and RSA are negligible. Thus, as predicted by the stress theory, site-specific evolutionary rates are determined by side-chain packing.
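The WCN definition given in this abstract (for site i, the sum over all other sites j of 1/r_ij², where r_ij is the Cα–Cα distance) can be illustrated with a minimal sketch. The function name `weighted_contact_number` and the toy coordinates are assumptions made for illustration; this is not the authors' code, and it covers only the main-chain (Cα) measure, not the side-chain contact density the paper introduces.

```python
def weighted_contact_number(ca_coords):
    """Return the Cα-based WCN for each site.

    ca_coords: list of (x, y, z) Cα coordinates, one per site.
    WCN_i = sum over j != i of 1 / r_ij**2 (hypothetical helper, for illustration).
    """
    wcn = []
    for i, (xi, yi, zi) in enumerate(ca_coords):
        total = 0.0
        for j, (xj, yj, zj) in enumerate(ca_coords):
            if i == j:
                continue  # a site does not contribute to its own WCN
            sq_dist = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            total += 1.0 / sq_dist
        wcn.append(total)
    return wcn


# Toy example: three collinear "Cα atoms" spaced one unit apart (not real data).
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_wcn = weighted_contact_number(toy_coords)
print(toy_wcn)  # the middle site has the highest WCN (most densely packed)
```

In this toy case the central site scores highest, which matches the intuition that densely packed sites have larger WCN.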
Affiliation(s)
- María Laura Marcos
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, San Martín, Buenos Aires, Argentina
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, San Martín, Buenos Aires, Argentina
|
32
|
Echave J, Jackson EL, Wilke CO. Relationship between protein thermodynamic constraints and variation of evolutionary rates among sites. Phys Biol 2015; 12:025002. [PMID: 25787027 PMCID: PMC4391963 DOI: 10.1088/1478-3975/12/2/025002] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 01/17/2023]
Abstract
Evolutionary-rate variation among sites within proteins depends on functional and biophysical properties that constrain protein evolution. It is generally accepted that proteins must be able to fold stably in order to function. However, the relationship between stability constraints and among-sites rate variation is not well understood. Here, we present a biophysical model that links the thermodynamic stability changes due to mutations at sites in proteins ([Formula: see text]) to the rate at which mutations accumulate at those sites over evolutionary time. We find that such a 'stability model' generally performs well, displaying correlations between predicted and empirically observed rates of up to 0.75 for some proteins. We further find that our model has comparable predictive power as does an alternative, recently proposed 'stress model' that explains evolutionary-rate variation among sites in terms of the excess energy needed for mutants to adopt the correct active structure ([Formula: see text]). The two models make distinct predictions, though, and for some proteins the stability model outperforms the stress model and vice versa. We conclude that both stability and stress constrain site-specific sequence evolution in proteins.
|
33
|
Meyer AG, Spielman SJ, Bedford T, Wilke CO. Time dependence of evolutionary metrics during the 2009 pandemic influenza virus outbreak. Virus Evol 2015; 1:vev006. [PMID: 26770819 PMCID: PMC4710376 DOI: 10.1093/ve/vev006] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/31/2022] Open
Abstract
With the expansion of DNA sequencing technology, quantifying evolution in emerging viral outbreaks has become an important tool for scientists and public health officials. Although it is known that the degree of sequence divergence significantly affects the calculation of evolutionary metrics in viral outbreaks, the extent and duration of this effect during an actual outbreak remains unclear. We have analyzed how limited divergence time during an early viral outbreak affects the accuracy of molecular evolutionary metrics. Using sequence data from the first 25 months of the 2009 pandemic H1N1 (pH1N1) outbreak, we calculated each of three different standard evolutionary metrics (molecular clock rate, i.e., evolutionary rate; whole gene dN/dS; and site-wise dN/dS) for hemagglutinin and neuraminidase, using increasingly longer time windows, from 1 month to 25 months. For the molecular clock rate, we found that at least three to four months of temporal divergence from the start of sampling was required to make precise estimates that also agreed with long-term values. For whole gene dN/dS, we found that at least two months of data were required to generate precise estimates, but six to nine months were required for estimates to approach their long term values. For site-wise dN/dS estimates, we found that at least six months of sampling divergence was required before the majority of sites had at least one mutation and were thus evolutionarily informative. Furthermore, eight months of sampling divergence was required before the site-wise estimates appropriately reflected the distribution of values expected from known protein-structure-based evolutionary pressure in influenza. In summary, we found that evolutionary metrics calculated from gene sequence data in early outbreaks should be expected to deviate from their long-term estimates for at least several months after the initial emergence and sequencing of the virus.
Affiliation(s)
- Austin G. Meyer
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA, 78712
- School of Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA, 79430
- Stephanie J. Spielman
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA, 78712
- Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA, 98109
- Claus O. Wilke
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA, 78712
|