1
Lipsh-Sokolik R, Fleishman SJ. Addressing epistasis in the design of protein function. Proc Natl Acad Sci U S A 2024; 121:e2314999121. [PMID: 39133844; PMCID: PMC11348311; DOI: 10.1073/pnas.2314999121] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
Affiliation(s)
- Rosalie Lipsh-Sokolik
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
- Sarel J Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
2
Luppino F, Adzhubei IA, Cassa CA, Toth-Petroczy A. DeMAG predicts the effects of variants in clinically actionable genes by integrating structural and evolutionary epistatic features. Nat Commun 2023; 14:2230. [PMID: 37076482; PMCID: PMC10115847; DOI: 10.1038/s41467-023-37661-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 03/27/2023] [Indexed: 04/21/2023]
Abstract
Despite the increasing use of genomic sequencing in clinical practice, the interpretation of rare genetic variants remains challenging even in well-studied disease genes, resulting in many patients with Variants of Uncertain Significance (VUSs). Computational Variant Effect Predictors (VEPs) provide valuable evidence in variant assessment, but they are prone to misclassifying benign variants, contributing to false positives. Here, we develop Deciphering Mutations in Actionable Genes (DeMAG), a supervised classifier for missense variants trained using extensive diagnostic data available in 59 actionable disease genes (American College of Medical Genetics and Genomics Secondary Findings v2.0, ACMG SF v2.0). DeMAG improves performance over existing VEPs by reaching balanced specificity (82%) and sensitivity (94%) on clinical data, and includes a novel epistatic feature, the 'partners score', which leverages evolutionary and structural partnerships of residues. The 'partners score' provides a general framework for modeling epistatic interactions, integrating both clinical and functional information. We provide our tool and predictions for all missense variants in 316 clinically actionable disease genes (demag.org) to facilitate the interpretation of variants and improve clinical decision-making.
Affiliation(s)
- Federica Luppino
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, 01307, Dresden, Germany
- Ivan A Adzhubei
  - Brigham and Women's Hospital Division of Genetics, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Christopher A Cassa
  - Brigham and Women's Hospital Division of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Agnes Toth-Petroczy
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, 01307, Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, 01062, Dresden, Germany
3
Hoehe MR, Herwig R. Analysis of 1276 Haplotype-Resolved Genomes Allows Characterization of Cis- and Trans-Abundant Genes. Methods Mol Biol 2023; 2590:237-272. [PMID: 36335503; DOI: 10.1007/978-1-0716-2819-5_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Many methods for haplotyping have materialized, but their application on a significant scale has been rare to date. Here we summarize analyses that were carried out in 1092 genomes from the 1000 Genomes Consortium and validated in an unprecedented number of 184 PGP genomes that have been experimentally haplotype-resolved by application of the Long-Fragment Read (LFR) technology. These analyses provided first insights into the diplotypic nature of human genomes and its potential functional implications. Thus, protein-changing variants were not randomly distributed between the two homologues of 18,121 autosomal protein-coding genes but occurred significantly more frequently in cis than in trans configurations in virtually each of the 1276 phased genomes. This resulted in global cis/trans ratios of ~60:40, establishing "cis abundance" as a universal characteristic of diploid human genomes. This phenomenon was based on two different classes of genes, a larger one exhibiting cis configurations of protein-changing variants in excess, so-called "cis-abundant" genes, and a smaller one of "trans-abundant" genes. These two gene classes, which together constitute a common diplotypic exome, were further functionally distinguished by means of gene ontology (GO) and pathway enrichment analysis. Moreover, they were distinguishable in terms of their effects on the human interactome, where they constitute distinct cis and trans modules, as shown with network propagation on a large integrated protein-protein interaction network. These analyses, recently performed with updated database and analysis tools, further consolidated the characterization of cis- and trans-abundant genes while expanding previous results. In this chapter, we present the key results along with the materials and methods to motivate readers to investigate these findings independently and gain further insights into the diplotypic nature of genes and genomes.
Affiliation(s)
- Margret R Hoehe
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Ralf Herwig
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
4
Azbukina N, Zharikova A, Ramensky V. Intragenic compensation through the lens of deep mutational scanning. Biophys Rev 2022; 14:1161-1182. [PMID: 36345285; PMCID: PMC9636336; DOI: 10.1007/s12551-022-01005-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/14/2022] [Accepted: 09/26/2022] [Indexed: 12/20/2022]
Abstract
A significant fraction of mutations in proteins are deleterious and result in adverse consequences for protein function, stability, or interaction with other molecules. Intragenic compensation is a specific case of positive epistasis in which a neutral missense mutation cancels the effect of a deleterious mutation in the same protein. Permissive compensatory mutations facilitate protein evolution, since without them all sequences would be extremely conserved. Understanding compensatory mechanisms is an important scientific challenge at the intersection of protein biophysics and evolution. In human genetics, intragenic compensatory interactions are important since they may result in variable penetrance of pathogenic mutations or fixation of pathogenic human alleles in orthologous proteins from related species. The latter phenomenon complicates computational and clinical inference of an allele's pathogenicity. Deep mutational scanning (DMS) is a relatively new technique that enables experimental studies of functional effects of thousands of mutations in proteins. We review the important aspects of the field and discuss existing limitations of current datasets. We reviewed ten published DMS datasets with quantified functional effects of single and double mutations and described rates and patterns of intragenic compensation in eight of them.
Affiliation(s)
- Nadezhda Azbukina
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- Anastasia Zharikova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
  - National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
- Vasily Ramensky
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
  - National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
5
Hoehe MR, Herwig R, Mao Q, Peters BA, Drmanac R, Church GM, Huebsch T. Significant abundance of cis configurations of coding variants in diploid human genomes. Nucleic Acids Res 2019; 47:2981-2995. [PMID: 30698752; PMCID: PMC6451136; DOI: 10.1093/nar/gkz031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/27/2018] [Revised: 12/05/2018] [Accepted: 01/15/2019] [Indexed: 12/12/2022]
Abstract
To fully understand human genetic variation and its functional consequences, the specific distribution of variants between the two chromosomal homologues of genes must be known. The 'phase' of variants can significantly impact gene function and phenotype. To assess patterns of phase at large scale, we have analyzed 18 121 autosomal genes in 1092 statistically phased genomes from the 1000 Genomes Project and 184 experimentally phased genomes from the Personal Genome Project. Here we show that genes with cis-configurations of coding variants are more frequent than genes with trans-configurations in a genome, with global cis/trans ratios of ∼60:40. Significant cis-abundance was observed in virtually all genomes in all populations. Moreover, we identified a large group of genes exhibiting cis-configurations of protein-changing variants in excess, so-called 'cis-abundant genes', and a smaller group of 'trans-abundant genes'. These two gene categories were functionally distinguishable, and exhibited strikingly different distributional patterns of protein-changing variants. Underlying these phenomena was a shared set of phase-sensitive genes of importance for adaptation and evolution. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their functional significance, highlighting the importance of phase for the interpretation of protein-coding genetic variation and gene function.
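The cis/trans bookkeeping described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that a gene counts as "cis" when all of its heterozygous protein-changing variants carry the alternate allele on the same homologue and as "trans" when they are split across both. The function names, the VCF-style genotype strings (`0|1`, `1|0`), and the toy genome are all hypothetical.

```python
from collections import Counter

def gene_configuration(phased_genotypes):
    """Classify one gene from phased genotypes such as "0|1" or "1|0".

    Assumption for this sketch: only heterozygous variants are informative.
    Returns "cis" if every alternate allele sits on the same homologue,
    "trans" if alternate alleles are split across both homologues,
    and None when fewer than two heterozygous variants are present.
    """
    het = [g for g in phased_genotypes if g in ("0|1", "1|0")]
    if len(het) < 2:
        return None
    return "cis" if len(set(het)) == 1 else "trans"

def cis_trans_fractions(genome):
    """genome maps a gene name to its list of phased genotype strings.

    Returns (cis_fraction, trans_fraction) over informative genes,
    or None if no gene has two or more heterozygous variants.
    """
    counts = Counter(gene_configuration(g) for g in genome.values())
    informative = counts["cis"] + counts["trans"]
    if informative == 0:
        return None
    return counts["cis"] / informative, counts["trans"] / informative

# Hypothetical toy genome: three cis genes, two trans genes, one uninformative.
toy_genome = {
    "GENE_A": ["0|1", "0|1"],         # cis
    "GENE_B": ["1|0", "1|0", "1|0"],  # cis
    "GENE_C": ["0|1", "1|0"],         # trans
    "GENE_D": ["0|1", "0|1", "1|0"],  # trans
    "GENE_E": ["1|0", "1|0"],         # cis
    "GENE_F": ["0|1"],                # fewer than two variants, skipped
}

cis_fraction, trans_fraction = cis_trans_fractions(toy_genome)
print(f"cis: {cis_fraction:.2f}, trans: {trans_fraction:.2f}")
```

The toy data were chosen so that the fractions mirror the reported global ratio. Real analyses would also need variant annotation and phasing quality control, which this sketch omits.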
Affiliation(s)
- Margret R Hoehe
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Ralf Herwig
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Qing Mao
  - Complete Genomics, Inc., San Jose, CA 95112, USA
- Brock A Peters
  - Complete Genomics, Inc., San Jose, CA 95112, USA; BGI-Shenzhen, Shenzhen 518083, China
- Radoje Drmanac
  - Complete Genomics, Inc., San Jose, CA 95112, USA; BGI-Shenzhen, Shenzhen 518083, China
- George M Church
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Thomas Huebsch
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
6
Marín Ò, Aguirre J, de la Cruz X. Compensated pathogenic variants in coagulation factors VIII and IX present complex mapping between molecular impact and hemophilia severity. Sci Rep 2019; 9:9538. [PMID: 31267011; PMCID: PMC6606640; DOI: 10.1038/s41598-019-45916-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/07/2018] [Accepted: 06/18/2019] [Indexed: 01/07/2023]
Abstract
Compensated pathogenic deviations (CPDs) are sequence variants that are pathogenic in humans but neutral in other species. In recent years, our molecular understanding of CPDs has advanced substantially. For example, it is known that their impact on human proteins is generally milder than that of average pathogenic mutations and that their impact is suppressed in non-human carriers by compensatory mutations. However, prior studies have ignored the evolutionarily relevant relationship between molecular impact and organismal phenotype. Here, we explore this topic using CPDs from FVIII and FIX and data concerning carriers' hemophilia severity. We find that, regardless of their molecular impact, these mutations can be associated with either mild or severe disease phenotypes. Only a weak relationship is found between protein stability changes and severity. We also characterize the population variability of hemostasis proteins, which constitute the genetic background of FVIII and FIX, using data from the 1000 Genomes Project. We observe that genetic background can vary substantially between individuals in terms of both the amount and nature of genetic variants. Finally, we discuss how these results highlight the need to include new terms in present models of protein evolution to explain the origin of CPDs.
Affiliation(s)
- Òscar Marín
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035, Barcelona, Spain
- Josu Aguirre
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035, Barcelona, Spain
- Xavier de la Cruz
  - Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, P/Vall d'Hebron, 119-129, 08035, Barcelona, Spain; ICREA, Barcelona, Spain
7
Storz JF. Compensatory mutations and epistasis for protein function. Curr Opin Struct Biol 2018; 50:18-25. [PMID: 29100081; PMCID: PMC5936477; DOI: 10.1016/j.sbi.2017.10.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 09/06/2017] [Revised: 10/05/2017] [Accepted: 10/12/2017] [Indexed: 01/09/2023]
Abstract
Adaptive protein evolution may be facilitated by neutral amino acid mutations that confer no benefit when they first arise but which potentiate subsequent function-altering mutations via direct or indirect structural mechanisms. Theoretical and empirical results indicate that such compensatory interactions (intramolecular epistasis) can exert a strong influence on trajectories of protein evolution. For this reason, assessing the form and prevalence of intramolecular epistasis and characterizing biophysical mechanisms of compensatory interaction are important research goals at the nexus of structural biology and molecular evolution. Here I review recent insights derived from protein-engineering studies, and I describe an approach for identifying and characterizing mechanisms of epistasis that integrates experimental data on structure-function relationships with analyses of comparative sequence data.
Affiliation(s)
- Jay F Storz
  - University of Nebraska, School of Biological Sciences, Lincoln, NE 68588-0114, United States
8
Tiberti M, Pandini A, Fraternali F, Fornili A. In silico identification of rescue sites by double force scanning. Bioinformatics 2018; 34:207-214. [PMID: 28961796; PMCID: PMC5860198; DOI: 10.1093/bioinformatics/btx515] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/15/2017] [Revised: 06/23/2017] [Accepted: 08/10/2017] [Indexed: 01/03/2023]
Abstract
Motivation: A deleterious amino acid change in a protein can be compensated by a second-site rescue mutation. These compensatory mechanisms can be mimicked by drugs. In particular, the location of rescue mutations can be used to identify protein regions that can be targeted by small molecules to reactivate a damaged mutant. Results: We present the first general computational method to detect rescue sites. By mimicking the effect of mutations through the application of forces, the double force scanning (DFS) method identifies the second-site residues that make the protein structure most resilient to the effect of pathogenic mutations. We tested DFS predictions against two datasets containing experimentally validated and putative evolutionary-related rescue sites. A remarkably good agreement was found between predictions and experimental data. Indeed, almost half of the rescue sites in p53 were correctly predicted by DFS, with 65% of remaining sites in contact with DFS predictions. Similar results were found for other proteins in the evolutionary dataset. Availability and implementation: The DFS code is available under GPL at https://fornililab.github.io/dfs/.
Affiliation(s)
- Matteo Tiberti
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Alessandro Pandini
  - Department of Computer Science, College of Engineering, Design and Physical Sciences and Synthetic Biology Theme, Institute of Environment, Health and Societies, Brunel University London, Uxbridge, London, UK
- Franca Fraternali
  - Randall Division of Cell and Molecular Biophysics, King's College London, London, UK
  - The Francis Crick Institute, London, UK
  - The Thomas Young Centre for Theory and Simulation of Materials, London, UK
- Arianna Fornili
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
  - The Thomas Young Centre for Theory and Simulation of Materials, London, UK
9
Snouwaert JN, Nguyen M, Repenning PW, Dye R, Livingston EW, Kovarova M, Moy SS, Brigman BE, Bateman TA, Ting JPY, Koller BH. An NLRP3 Mutation Causes Arthropathy and Osteoporosis in Humanized Mice. Cell Rep 2017; 17:3077-3088. [PMID: 27974218; DOI: 10.1016/j.celrep.2016.11.052] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 11/22/2015] [Revised: 08/29/2016] [Accepted: 11/16/2016] [Indexed: 01/14/2023]
Abstract
The NLRP3 inflammasome plays a critical role in host defense by facilitating caspase-1 activation and maturation of IL-1β and IL-18, whereas dysregulation of inflammasome activity results in autoinflammatory disease. Factors regulating human NLRP3 activity that contribute to the phenotypic heterogeneity of NLRP3-related diseases have largely been inferred from the study of Nlrp3 mutant mice. By generating a mouse line in which the NLRP3 locus is humanized by syntenic replacement, we show the functioning of the human NLRP3 proteins in vivo, demonstrating the ability of the human inflammasome to orchestrate immune reactions in response to innate stimuli. Humanized mice expressing disease-associated mutations develop normally but display acute sensitivity to endotoxin and develop progressive and debilitating arthritis characterized by granulocytic infiltrates, elevated cytokines, erosion of bones, and osteoporosis. This NLRP3-dependent arthritis model provides a platform for testing therapeutic reagents targeting the human inflammasome.
Affiliation(s)
- John N Snouwaert
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- MyTrang Nguyen
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Peter W Repenning
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Rebecca Dye
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Eric W Livingston
  - Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Martina Kovarova
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Sheryl S Moy
  - Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Brian E Brigman
  - Department of Orthopedic Surgery and Pediatrics, Duke University, Durham, NC 27705, USA
- Ted A Bateman
  - Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jenny P-Y Ting
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Beverly H Koller
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
10
Barber MF, Lee EM, Griffin H, Elde NC. Rapid Evolution of Primate Type 2 Immune Response Factors Linked to Asthma Susceptibility. Genome Biol Evol 2017; 9:1757-1765. [PMID: 28854632; PMCID: PMC5569703; DOI: 10.1093/gbe/evx120] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Accepted: 07/03/2017] [Indexed: 02/06/2023]
Abstract
Host immunity pathways evolve rapidly in response to antagonism by pathogens. Microbial infections can also trigger excessive inflammation that contributes to diverse autoimmune disorders including asthma, lupus, diabetes, and arthritis. Definitive links between immune system evolution and human autoimmune disease remain unclear. Here we provide evidence that several components of the type 2 immune response pathway have been subject to recurrent positive selection in the primate lineage. Notably, substitutions in the central immune regulator IL13 correspond to a polymorphism linked to asthma susceptibility in humans. We also find evidence of accelerated amino acid substitutions as well as gene gain and loss events among eosinophil granule proteins, which act as toxic antimicrobial effectors that promote asthma pathology by damaging airway tissues. These results support the hypothesis that evolutionary conflicts with pathogens promote tradeoffs for increasingly robust immune responses during animal evolution. Our findings are also consistent with the view that natural selection has contributed to the spread of autoimmune disease alleles in humans.
Affiliation(s)
- Elliott M. Lee
  - Department of Human Genetics, University of Utah School of Medicine
- Hayden Griffin
  - Department of Human Genetics, University of Utah School of Medicine
- Nels C. Elde
  - Department of Human Genetics, University of Utah School of Medicine
11
The genomic landscape of evolutionary convergence in mammals, birds and reptiles. Nat Ecol Evol 2017; 1:41. [PMID: 28812724; DOI: 10.1038/s41559-016-0041] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/02/2016] [Accepted: 11/23/2016] [Indexed: 01/11/2023]
Abstract
Many lineage-defining (nodal) mutations possess high functionality. However, differentiating adaptive nodal mutations from those that are functionally compensated remains challenging. To address this challenge, we identified functional nodal mutations (fNMs) in ~3,400 nuclear DNA (nDNA) and 4 mitochondrial DNA (mtDNA) protein structures from 91 and 1,003 species, respectively, representing the entire mammalian, bird and reptile phylogeny. A screen for candidate compensatory mutations among co-occurring amino acid changes in close structural proximity revealed that such compensated fNMs encompass 37% and 27% of the mtDNA and nDNA datasets, respectively. Analysis of the remaining (non-compensated) mutations, which are enriched for adaptive mutations, showed that birds and mammals share most such recurrent fNMs (N = 51). Among the latter, we discovered mutations in thermoregulation-related genes. These represent the best candidates to explain the molecular basis of convergent body thermoregulation in birds and mammals. Our analysis reveals the landscape of possible mutational compensation and convergence in amniote phylogeny.
12
Starr TN, Thornton JW. Epistasis in protein evolution. Protein Sci 2016; 25:1204-18. [PMID: 26833806; PMCID: PMC4918427; DOI: 10.1002/pro.2897] [Citation(s) in RCA: 301] [Impact Index Per Article: 37.6] [Received: 12/02/2015] [Revised: 01/25/2016] [Accepted: 01/27/2016] [Indexed: 01/18/2023]
Abstract
The structure, function, and evolution of proteins depend on physical and genetic interactions among amino acids. Recent studies have used new strategies to explore the prevalence, biochemical mechanisms, and evolutionary implications of these interactions, called epistasis, within proteins. Here we describe an emerging picture of pervasive epistasis in which the physical and biological effects of mutations change over the course of evolution in a lineage-specific fashion. Epistasis can restrict the trajectories available to an evolving protein or open new paths to sequences and functions that would otherwise have been inaccessible. We describe two broad classes of epistatic interactions, which arise from different physical mechanisms and have different effects on evolutionary processes. Specific epistasis, in which one mutation influences the phenotypic effect of few other mutations, is caused by direct and indirect physical interactions between mutations, which nonadditively change the protein's physical properties, such as conformation, stability, or affinity for ligands. In contrast, nonspecific epistasis describes mutations that modify the effect of many others; these typically behave additively with respect to the physical properties of a protein but exhibit epistasis because of a nonlinear relationship between the physical properties and their biological effects, such as function or fitness. Both types of interaction are rampant, but specific epistasis has stronger effects on the rate and outcomes of evolution, because it imposes stricter constraints and modulates evolutionary potential more dramatically; it therefore makes evolution more contingent on low-probability historical events and leaves stronger marks on the sequences, structures, and functions of protein families.
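The nonspecific epistasis described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper. It assumes a simple two-state folding model in which mutations add linearly in folding free energy while the phenotype (fraction of folded protein) depends on that energy through a sigmoid. The function names and the numerical values are hypothetical.

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K

def fraction_folded(dg_fold):
    """Two-state model: dg_fold is the folding free energy in kcal/mol.

    Negative values mean the folded state is favoured.
    """
    return 1.0 / (1.0 + math.exp(dg_fold / RT))

def phenotype_epistasis(dg_wt, ddg_a, ddg_b):
    """Deviation of the double mutant from additivity on the phenotype scale.

    The two mutations are strictly additive in free energy (assumption),
    so any non-zero result comes only from the nonlinear mapping
    between stability and fraction folded.
    """
    f_wt = fraction_folded(dg_wt)
    f_a = fraction_folded(dg_wt + ddg_a)
    f_b = fraction_folded(dg_wt + ddg_b)
    f_ab = fraction_folded(dg_wt + ddg_a + ddg_b)
    expected_additive = f_a + f_b - f_wt
    return f_ab - expected_additive

# Marginally stable background: two destabilizing mutations that are each
# tolerated alone push the protein over the folding threshold together.
print(phenotype_epistasis(dg_wt=-3.0, ddg_a=2.0, ddg_b=2.0))

# Very stable background: the same two mutations are nearly additive.
print(phenotype_epistasis(dg_wt=-10.0, ddg_a=2.0, ddg_b=2.0))
```

Under these assumptions the first call gives a clearly negative value and the second a value close to zero, showing how the same pair of mutations can appear epistatic or additive depending on the stability of the background.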
Affiliation(s)
- Tyler N Starr
  - Graduate Program in Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, 60637
- Joseph W Thornton
  - Departments of Ecology and Evolution and Human Genetics, University of Chicago, Chicago, Illinois, 60637
13
Abstract
To what extent is the convergent evolution of protein function attributable to convergent or parallel changes at the amino acid level? The mutations that contribute to adaptive protein evolution may represent a biased subset of all possible beneficial mutations owing to mutation bias and/or variation in the magnitude of deleterious pleiotropy. A key finding is that the fitness effects of amino acid mutations are often conditional on genetic background. This context dependence (epistasis) can reduce the probability of convergence and parallelism because it reduces the number of possible mutations that are unconditionally acceptable in divergent genetic backgrounds. Here, I review factors that influence the probability of replicated evolution at the molecular level.
Affiliation(s)
- Jay F Storz
  - School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588, USA
14
Abstract
Deleterious or 'disease-associated' mutations are mutations that lead to disease with high phenotype penetrance: they are inherited in a simple Mendelian manner, or, in the case of cancer, accumulate in somatic cells, leading directly to disease. However, in some cases, the amino acid that is substituted, resulting in disease, is the wild-type native residue in the functionally equivalent protein in another species. Such examples are known as 'compensated pathogenic deviations' (CPDs) because, somewhere in the second species, there must be compensatory mutations that allow the protein to function normally despite having a residue which would cause disease in the first species. Depending on the nature of the mutations, compensation can occur in the same protein, or in a different protein with which it interacts. In principle, compensation can be achieved by a single mutation (most probably structurally close to the CPD), or by the cumulative effect of several mutations. Although it is clear that these effects occur in proteins, compensatory mutations are also important in RNA, potentially having an impact on disease. As a much simpler molecule, RNA provides an interesting model for understanding mechanisms of compensatory effects, both by looking at naturally occurring RNA molecules and as a means of computational simulation. This review surveys the rather limited literature that has explored these effects. Understanding the nature of CPDs is important in understanding traversal along fitness landscape valleys in evolution. It could also have applications in treating diseases that result from such mutations.
15
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599; DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Indexed: 01/05/2023]
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8 Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8 Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8 Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8 Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
16
Ivankov DN, Finkelstein AV, Kondrashov FA. A structural perspective of compensatory evolution. Curr Opin Struct Biol 2014; 26:104-12. [PMID: 24981969 PMCID: PMC4141909 DOI: 10.1016/j.sbi.2014.05.004] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 02/03/2014] [Revised: 04/11/2014] [Accepted: 05/16/2014] [Indexed: 11/25/2022]
Abstract
The study of molecular evolution is important because it reveals how protein functions emerge and evolve. Recently, several types of studies indicated that substitutions in molecular evolution occur in a compensatory manner, whereby the occurrence of a substitution depends on the amino acid residues at other sites. However, a molecular or structural basis behind the compensation often remains obscure. Here, we review studies on the interface of structural biology and molecular evolution that revealed novel aspects of compensatory evolution. In many cases structural studies benefit from evolutionary data while structural data often add a functional dimension to the study of molecular evolution.
Affiliation(s)
- Dmitry N Ivankov
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, 4 Institutskaya str., Pushchino, Moscow Region, 142290, Russia
- Alexei V Finkelstein
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, 4 Institutskaya str., Pushchino, Moscow Region, 142290, Russia
- Fyodor A Kondrashov
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), 23 Pg. Lluís Companys, 08010 Barcelona, Spain.
17
Xu J, Zhang J. Why human disease-associated residues appear as the wild-type in other species: genome-scale structural evidence for the compensation hypothesis. Mol Biol Evol 2014; 31:1787-92. [PMID: 24723421 DOI: 10.1093/molbev/msu130] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022] Open
Abstract
Many human-disease associated amino acid residues (DARs) appear as the wild-type in other species. This phenomenon is commonly explained by the presence of compensatory residues in these other species that alleviate the deleterious effects of the DARs. The general validity of this hypothesis, however, is unclear, because few compensatory residues have been identified. Here we test the compensation hypothesis by assembling and analyzing 1,077 DARs located in 177 proteins of known crystal structures. Because destabilizing protein structures is a primary reason why DARs are deleterious, we focus on protein stability in this analysis. We discover that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This and other findings provide genome-scale evidence for the compensation hypothesis and have important implications for understanding epistasis in protein evolution and for using animal models of human diseases.
Affiliation(s)
- Jinrui Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan
18
Riera C, Lois S, de la Cruz X. Prediction of pathological mutations in proteins: the challenge of integrating sequence conservation and structure stability principles. Wiley Interdisciplinary Reviews: Computational Molecular Science 2013. [DOI: 10.1002/wcms.1170] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Affiliation(s)
- Casandra Riera
- Laboratory of Translational Bioinformatics in Neuroscience; VHIR; Barcelona Spain
- Sergio Lois
- Laboratory of Translational Bioinformatics in Neuroscience; VHIR; Barcelona Spain
- Xavier de la Cruz
- Laboratory of Translational Bioinformatics in Neuroscience; VHIR; Barcelona Spain
- Institució Catalana per la Recerca i Estudis Avançats (ICREA); Barcelona Spain
19
Soylemez O, Kondrashov FA. Estimating the rate of irreversibility in protein evolution. Genome Biol Evol 2013; 4:1213-22. [PMID: 23132897 PMCID: PMC3542581 DOI: 10.1093/gbe/evs096] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022] Open
Abstract
Whether or not evolutionary change is inherently irreversible remains a controversial topic. Some examples of evolutionary irreversibility are known; however, this question has not been comprehensively addressed at the molecular level. Here, we use data from 221 human genes with known pathogenic mutations to estimate the rate of irreversibility in protein evolution. For these genes, we reconstruct ancestral amino acid sequences along the mammalian phylogeny and identify ancestral amino acid states that match known pathogenic mutations. Such cases represent inherent evolutionary irreversibility because, at the present moment, reversals to these ancestral amino acid states are impossible for the human lineage. We estimate that approximately 10% of all amino acid substitutions along the mammalian phylogeny are irreversible, such that a return to the ancestral amino acid state would lead to a pathogenic phenotype. For a subset of 51 genes with high rates of irreversibility, as much as 40% of all amino acid evolution was estimated to be irreversible. Because pathogenic phenotypes do not resemble ancestral phenotypes, the molecular nature of the high rate of irreversibility in proteins is best explained by evolution with a high prevalence of compensatory, epistatic interactions between amino acid sites. Under such mode of protein evolution, once an amino acid substitution is fixed, the probability of its reversal declines as the protein sequence accumulates changes that affect the phenotypic manifestation of the ancestral state. The prevalence of epistasis in evolution indicates that the observed high rate of irreversibility in protein evolution is an inherent property of protein structure and function.
Affiliation(s)
- Onuralp Soylemez
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
20
Gong LI, Suchard MA, Bloom JD. Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2013; 2:e00631. [PMID: 23682315 PMCID: PMC3654441 DOI: 10.7554/elife.00631] [Citation(s) in RCA: 256] [Impact Index Per Article: 23.3] [Received: 02/12/2013] [Accepted: 04/09/2013] [Indexed: 11/28/2022] Open
Abstract
John Maynard Smith compared protein evolution to the game where one word is converted into another a single letter at a time, with the constraint that all intermediates are words: WORD→WORE→GORE→GONE→GENE. In this analogy, epistasis constrains evolution, with some mutations tolerated only after the occurrence of others. To test whether epistasis similarly constrains actual protein evolution, we created all intermediates along a 39-mutation evolutionary trajectory of influenza nucleoprotein, and also introduced each mutation individually into the parent. Several mutations were deleterious to the parent despite becoming fixed during evolution without negative impact. These mutations were destabilizing, and were preceded or accompanied by stabilizing mutations that alleviated their adverse effects. The constrained mutations occurred at sites enriched in T-cell epitopes, suggesting they promote viral immune escape. Our results paint a coherent portrait of epistasis during nucleoprotein evolution, with stabilizing mutations permitting otherwise inaccessible destabilizing mutations which are sometimes of adaptive value. DOI:http://dx.doi.org/10.7554/eLife.00631.001

During evolution, the effect of one mutation on a protein can depend on whether another mutation is also present. This phenomenon is similar to the game in which one word is converted to another word, one letter at a time, subject to the rule that all the intermediate steps are also valid words: for example, the word WORD can be converted to the word GENE as follows: WORD→WORE→GORE→GONE→GENE. In this example, the D must be changed to an E before the W is changed to a G, because GORD is not a valid word. Similarly, during the evolution of a virus, a mutation that helps the virus evade the human immune system might only be tolerated if the virus has acquired another mutation beforehand. This type of mutational interaction would constrain the evolution of the virus, since its capacity to take advantage of the second mutation depends on the first mutation having already occurred.

Gong et al. examined whether such interactions have indeed constrained evolution of the influenza virus. Between 1968 and 2007, the nucleoprotein, which acts as a scaffold for the replication of genetic material, in the human H3N2 influenza virus underwent a series of 39 mutations. To test whether all of these mutations could have been tolerated by the 1968 virus, Gong et al. introduced each one individually into the 1968 nucleoprotein. They found that several mutations greatly reduced the fitness of the 1968 virus when introduced on their own, which strongly suggests that these ‘constrained mutations’ became part of the virus’s genetic makeup as a result of interactions with ‘enabling’ mutations. The constrained mutations decreased the stability of the nucleoprotein at high temperatures, while the enabling mutations counteracted this effect. It may, therefore, be possible to identify enabling mutations based on their effects on thermal stability. Intriguingly, the constrained mutations helped the virus overcome one form of human immunity to influenza, suggesting that interactions between mutations might limit the rate at which viruses evolve to evade the immune system.

Overall, these results show that interactions among mutations constrain the evolution of the influenza nucleoprotein in a fashion that can be largely understood in terms of protein stability. If the same is true for other proteins and viruses, this work could lead to a deeper understanding of the constraints that govern evolution at the molecular level. DOI:http://dx.doi.org/10.7554/eLife.00631.002
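The word-game constraint described in this abstract can be illustrated with a short sketch. It is not part of the cited study: the function name `accessible_orders` and the five-word toy dictionary `VALID_WORDS` are assumptions made here for illustration, and a real English dictionary would contain further words (and possibly further dead ends or routes). The sketch enumerates every order in which the differing letters could be changed and keeps only the orders whose intermediates are all valid words.

```python
from itertools import permutations

# Toy dictionary: an assumption for illustration only, containing just the
# words named in the abstract. It is not data from the cited study.
VALID_WORDS = {"WORD", "WORE", "GORE", "GONE", "GENE"}


def accessible_orders(start, target, valid_words):
    """Return every path from start to target, changing one letter at a time,
    in which all intermediate words are in valid_words."""
    # Each position where the two words differ is one "mutation".
    sites = [i for i, (a, b) in enumerate(zip(start, target)) if a != b]
    paths = []
    for order in permutations(sites):
        word = start
        path = [word]
        for i in order:
            # Apply the mutation at position i.
            word = word[:i] + target[i] + word[i + 1:]
            if word not in valid_words:
                break  # invalid intermediate: this order is inaccessible
            path.append(word)
        else:
            # Loop finished without hitting an invalid intermediate.
            paths.append(path)
    return paths


paths = accessible_orders("WORD", "GENE", VALID_WORDS)
for p in paths:
    print("→".join(p))
```

Under this toy dictionary, any order that changes W to G before D to E fails at GORD, which mirrors the ordering constraint the abstract describes.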
Affiliation(s)
- Lizhi Ian Gong
- Division of Basic Sciences , Fred Hutchinson Cancer Research Center , Seattle , United States
21
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
22
Zhang G, Pei Z, Ball EV, Mort M, Kehrer-Sawatzki H, Cooper DN. Cross-comparison of the genome sequences from human, chimpanzee, Neanderthal and a Denisovan hominin identifies novel potentially compensated mutations. Hum Genomics 2012; 5:453-84. [PMID: 21807602 PMCID: PMC3525967 DOI: 10.1186/1479-7364-5-5-453] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
The recent publication of the draft genome sequences of the Neanderthal and a ~50,000-year-old archaic hominin from Denisova Cave in southern Siberia has ushered in a new age in molecular archaeology. We previously cross-compared the human, chimpanzee and Neanderthal genome sequences with respect to a set of disease-causing/disease-associated missense and regulatory mutations (Human Gene Mutation Database) and succeeded in identifying genetic variants which, although apparently pathogenic in humans, may represent a 'compensated' wild-type state in at least one of the other two species. Here, in an attempt to identify further 'potentially compensated mutations' (PCMs) of interest, we have compared our dataset of disease-causing/disease-associated mutations with their corresponding nucleotide positions in the Denisovan hominin, Neanderthal and chimpanzee genomes. Of the 15 human putatively disease-causing mutations that were found to be compensated in chimpanzee, Denisovan or Neanderthal, only a solitary F5 variant (Val1736Met) was specific to the Denisovan. In humans, this missense mutation is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage. It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this ancient hominin.
Affiliation(s)
- Guojie Zhang
- Bioinformatics Department, Beijing Genomics Institute at Shenzhen, China.
23
Zhang G, Pei Z, Krawczak M, Ball EV, Mort M, Kehrer-Sawatzki H, Cooper DN. Triangulation of the human, chimpanzee, and Neanderthal genome sequences identifies potentially compensated mutations. Hum Mutat 2010; 31:1286-93. [DOI: 10.1002/humu.21389] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/11/2022]
24
Cooper DN, Chen JM, Ball EV, Howells K, Mort M, Phillips AD, Chuzhanova N, Krawczak M, Kehrer-Sawatzki H, Stenson PD. Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Hum Mutat 2010; 31:631-55. [PMID: 20506564 DOI: 10.1002/humu.21260] [Citation(s) in RCA: 117] [Impact Index Per Article: 8.4] [Indexed: 12/12/2022]
Abstract
The number of reported germline mutations in human nuclear genes, either underlying or associated with inherited disease, has now exceeded 100,000 in more than 3,700 different genes. The availability of these data has both revolutionized the study of the morbid anatomy of the human genome and facilitated "personalized genomics." With approximately 300 new "inherited disease genes" (and approximately 10,000 new mutations) being identified annually, it is pertinent to ask how many "inherited disease genes" there are in the human genome, how many mutations reside within them, and where such lesions are likely to be located. To address these questions, it is necessary not only to reconsider how we define human genes but also to explore notions of gene "essentiality" and "dispensability." Answers to these questions are now emerging from recent novel insights into genome structure and function and through complete genome sequence information derived from multiple individual human genomes. However, a change in focus toward screening functional genomic elements as opposed to genes sensu stricto will be required if we are to capitalize fully on recent technical and conceptual advances and identify new types of disease-associated mutation within noncoding regions remote from the genes whose function they disrupt.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom.
25
Palpant NJ, Houang EM, Delport W, Hastings KEM, Onufriev AV, Sham YY, Metzger JM. Pathogenic peptide deviations support a model of adaptive evolution of chordate cardiac performance by troponin mutations. Physiol Genomics 2010; 42:287-99. [PMID: 20423961 DOI: 10.1152/physiolgenomics.00033.2010] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022] Open
Abstract
In cardiac muscle, the troponin (cTn) complex is a key regulator of myofilament calcium sensitivity because it serves as a molecular switch required for translating myocyte calcium fluxes into sarcomeric contraction and relaxation. Studies of several species suggest that ectotherm chordates have myofilaments with heightened calcium responsiveness. However, genetic polymorphisms in cTn that cause increased myofilament sensitivity to activating calcium in mammals result in cardiac disease including arrhythmias, diastolic dysfunction, and increased susceptibility to sudden cardiac death. We hypothesized that specific residue modifications in the regulatory arm of troponin I (TnI) were critical in mediating the observed decrease in myofilament calcium sensitivity within the mammalian taxa. We performed large-scale phylogenetic analysis, atomic resolution molecular dynamics simulations and modeling, and computational alanine scanning. This study provides evidence that a His to Ala substitution within mammalian cardiac TnI (cTnI) reduced the thermodynamic potential at the interface between cTnI and cardiac TnC (cTnC) in the calcium-saturated state by disrupting a strong intermolecular electrostatic interaction. This key residue modification reduced myofilament calcium sensitivity by making cTnI molecularly untethered from cTnC. To meet the requirements for refined mammalian adult cardiac performance, we propose that compensatory evolutionary pressures favored mutations that enhanced the relaxation properties of cTn by decreasing its sensitivity to activating calcium.
Affiliation(s)
- Nathan J Palpant
- Department of Integrative Biology and Physiology, University of Minnesota Academic Health Center, 321 Church Street SE, Minneapolis, MN 55455, USA