1
Todd BP, Downard KM. Structural Phylogenetics with Protein Mass Spectrometry: A Proof-of-Concept. Protein J 2024; 43:997-1008. [PMID: 39078529 DOI: 10.1007/s10930-024-10227-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/18/2024] [Indexed: 07/31/2024]
Abstract
It is demonstrated, for the first time, that a mass spectrometry approach (known as phylonumerics) can be successfully implemented for structural phylogenetics investigations to chart the evolution of a protein's structure and function. Illustrated for the compact globular protein myoglobin, peptide masses produced from the proteolytic digestion of the protein across animal species generate trees congruent with their sequence tree counterparts. Single point mutations calculated during the same mass tree building step can be followed along interconnected branches of the tree and represent a viable structural metric. A mass tree built for 15 diverse animal species easily resolves the birds from the mammals, and the ruminant mammals from the remainder of the animals. Mutations within helix-spanning peptide segments alter both the mass and structure of the protein in these segments. Greater evolution is found in the B-helix than in the A, E, F, G and H helices. A further mass tree study, of six more closely related primate species, resolves gorilla from the other primates based on a P22S mutation within the B-helix. The remaining five primates are resolved into two groups based on whether they contain a glycine or serine at position 23 in the same helix. The orangutan is resolved from the gibbon and siamang by its G-helix C110S mutation, while Homo sapiens is resolved from the chimpanzee based on the Q116H mutation. All are associated with structural perturbations in such helices. These structure-altering mutations can be tracked along interconnecting branches of a mass tree, to follow the protein's structure and evolution, and ultimately the evolution of the species in which the proteins are expressed. Those that have the greatest impact on a protein's structure, its function, and ultimately the evolution of the species, can be selectively tracked or monitored.
Affiliation(s)
- Benjamin P Todd
  - Infectious Disease Responses Laboratory, Prince of Wales Clinical Research Sciences, Sydney, NSW, Australia
- Kevin M Downard
  - Infectious Disease Responses Laboratory, Prince of Wales Clinical Research Sciences, Sydney, NSW, Australia
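The abstract above ties each named point mutation to a change in peptide mass. As a rough illustration only, and not the authors' phylonumerics algorithm, the sketch below computes the mass shift implied by a single substitution. The residue masses are standard monoisotopic values; the names `RESIDUE_MASS`, `peptide_mass`, and `mutation_mass_shift` are hypothetical helpers introduced for this example.

```python
# Monoisotopic residue masses (Da) for the amino acids named in the abstract.
# Standard reference values, rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146,
    "S": 87.03203,
    "P": 97.05276,
    "C": 103.00919,
    "Q": 128.05858,
    "H": 137.05891,
}
WATER = 18.01056  # one water is added per free peptide


def peptide_mass(sequence):
    """Neutral monoisotopic mass of a peptide made of the residues above."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mutation_mass_shift(mutation):
    """Mass change (Da) for a substitution written like 'P22S'."""
    original, replacement = mutation[0], mutation[-1]
    return RESIDUE_MASS[replacement] - RESIDUE_MASS[original]


# Mutations mentioned in the abstract; the position is not used for the shift.
shifts = {m: round(mutation_mass_shift(m), 4) for m in ("P22S", "C110S", "Q116H")}
for name, delta in shifts.items():
    print(f"{name}: {delta:+.4f} Da")
```

Each substitution changes the mass of whichever proteolytic peptide contains it by a characteristic amount, which is the kind of signal a mass-based tree can use without first determining the sequence.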
2
Li Z, Ióca LP, He R, Donia MS. Natural diversifying evolution of nonribosomal peptide synthetases in a defensive symbiont reveals nonmodular functional constraints. PNAS NEXUS 2024; 3:pgae384. [PMID: 39346623 PMCID: PMC11428043 DOI: 10.1093/pnasnexus/pgae384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 07/19/2024] [Indexed: 10/01/2024]
Abstract
The modular architecture of nonribosomal peptide synthetases (NRPSs) has inspired efforts to study their evolution and engineering. In this study, we analyze in detail a unique family of NRPSs from the defensive intracellular bacterial symbiont, Candidatus Endobryopsis kahalalidifaciens (Ca. E. kahalalidifaciens). We show that intensive and indiscriminate recombination events erase trivial sequence covariations induced by phylogenetic relatedness, revealing nonmodular functional constraints and clear recombination units. Moreover, we reveal unique substrate specificity determinants for multiple enzymatic domains, allowing us to accurately predict and experimentally discover the products of an orphan NRPS in Ca. E. kahalalidifaciens directly from environmental samples of its algal host. Finally, we expanded our analysis to 1,531 diverse NRPS pathways and revealed similar functional constraints to those observed in Ca. E. kahalalidifaciens' NRPSs. Our findings reveal the sequence bases of genetic exchange, functional constraints, and substrate specificity in Ca. E. kahalalidifaciens' NRPSs, and highlight them as a uniquely primed system for diversifying evolution.
Affiliation(s)
- Zhiyuan Li
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Center for the Physics of Biological Function, Princeton University, Princeton, NJ 08544, USA
- Laura P Ióca
  - Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
- Ruolin He
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Mohamed S Donia
  - Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
3
Dietler N, Abbara A, Choudhury S, Bitbol AF. Impact of phylogeny on the inference of functional sectors from protein sequence data. PLoS Comput Biol 2024; 20:e1012091. [PMID: 39312591 PMCID: PMC11449291 DOI: 10.1371/journal.pcbi.1012091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 10/03/2024] [Accepted: 09/10/2024] [Indexed: 09/25/2024]
Abstract
Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated with functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use these data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data are available. We show that in these natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.
Affiliation(s)
- Nicola Dietler
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alia Abbara
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Subham Choudhury
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
4
Erez K, Jangid A, Feldheim ON, Friedlander T. The role of promiscuous molecular recognition in the evolution of RNase-based self-incompatibility in plants. Nat Commun 2024; 15:4864. [PMID: 38849350 PMCID: PMC11161657 DOI: 10.1038/s41467-024-49163-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 05/22/2024] [Indexed: 06/09/2024]
Abstract
How do biological networks evolve and expand? We study these questions in the context of the plant collaborative-non-self recognition self-incompatibility system. Self-incompatibility evolved to avoid self-fertilization among hermaphroditic plants. It relies on specific molecular recognition between highly diverse proteins of two families: female and male determinants, such that the combination of genes an individual possesses determines its mating partners. Although these proteins are highly polymorphic, previous models struggled to pinpoint the evolutionary trajectories by which new specificities evolved. Here, we construct a novel theoretical framework that crucially affords interaction promiscuity and multiple distinct partners per protein, as is seen in empirical findings disregarded by previous models. We demonstrate spontaneous self-organization of the population into distinct "classes" with full between-class compatibility and a dynamic long-term balance between class emergence and decay. Our work highlights the importance of molecular recognition promiscuity to network evolvability. Promiscuity has been found in additional systems, suggesting that our framework could be more broadly applicable.
Affiliation(s)
- Keren Erez
  - The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, Faculty of Agriculture, The Hebrew University of Jerusalem, P.O. Box 12, Rehovot, 7610001, Israel
- Amit Jangid
  - The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, Faculty of Agriculture, The Hebrew University of Jerusalem, P.O. Box 12, Rehovot, 7610001, Israel
- Ohad Noy Feldheim
  - The Einstein Institute of Mathematics, Faculty of Natural Sciences, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
- Tamar Friedlander
  - The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, Faculty of Agriculture, The Hebrew University of Jerusalem, P.O. Box 12, Rehovot, 7610001, Israel
5
Gerardos A, Dietler N, Bitbol AF. Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences. PLoS Comput Biol 2022; 18:e1010147. [PMID: 35576238 PMCID: PMC9135348 DOI: 10.1371/journal.pcbi.1010147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/22/2021] [Revised: 05/26/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022]
Abstract
Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) make it possible to find interaction partners among paralogs of two protein families. Does successful inference mainly rely on correlations from structural contacts or from phylogeny, or both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced using a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, while deteriorating it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural data set, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance, and that restricting to them preserves performance, evidencing an important role of phylogeny.
Affiliation(s)
- Andonis Gerardos
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nicola Dietler
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
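The abstract above refers to Mutual Information (MI) between alignment sites as one source of signal for partner inference. As a minimal sketch, assuming a plain frequency-count estimator with no pseudocounts, sequence reweighting, or phylogenetic correction (so not the authors' actual pipeline), the MI between two alignment columns could be computed as follows. The toy `alignment` and the function name `mutual_information` are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Plug-in estimate of the mutual information (in nats) of two columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts to limit rounding error
        mi += p_ab * math.log(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# while column 2 varies independently of column 0.
alignment = ["AKL", "AKV", "GEL", "GEV"]
columns = list(zip(*alignment))

mi_covarying = mutual_information(columns[0], columns[1])
mi_independent = mutual_information(columns[0], columns[2])
print(mi_covarying, mi_independent)
```

In real alignments, a raw score like this mixes signal from contacts with signal from shared ancestry, which is the distinction the abstract examines.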
6
Extracting phylogenetic dimensions of coevolution reveals hidden functional signals. Sci Rep 2022; 12:820. [PMID: 35039514 PMCID: PMC8764114 DOI: 10.1038/s41598-021-04260-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/02/2021] [Accepted: 12/17/2021] [Indexed: 11/08/2022]
Abstract
Despite the structural and functional information contained in the statistical coupling between pairs of residues in a protein, coevolution associated with function is often obscured by artifactual signals such as genetic drift, which shapes a protein's phylogenetic history and gives rise to concurrent variation between protein sequences that is not driven by selection for function. Here, we introduce a background model for phylogenetic contributions of statistical coupling that separates the coevolution signal due to inter-clade and intra-clade sequence comparisons and demonstrate that coevolution can be measured on multiple phylogenetic timescales within a single protein. Our method, nested coevolution (NC), can be applied as an extension to any coevolution metric. We use NC to demonstrate that poorly conserved residues can nonetheless have important roles in protein function. Moreover, NC improved the structural-contact predictions of several coevolution-based methods, particularly in subsampled alignments with fewer sequences. NC also lowered the noise in detecting functional sectors of collectively coevolving residues. Sectors of coevolving residues identified after application of NC were more spatially compact and phylogenetically distinct from the rest of the protein, and strongly enriched for mutations that disrupt protein activity. Thus, our conceptualization of the phylogenetic separation of coevolution provides the potential to further elucidate relationships among protein evolution, function, and genetic diseases.
7
Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M, Zitnick CL, Ma J, Fergus R. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci U S A 2021. [PMID: 33876751 DOI: 10.1101/622803] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Indexed: 05/15/2023]
Abstract
In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To this end, we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity. The resulting model contains information about biological properties in its representations. The representations are learned from sequence data alone. The learned representation space has a multiscale organization reflecting structure from the level of biochemical properties of amino acids to remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and can be identified by linear projections. Representation learning produces features that generalize across a range of applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure and improving state-of-the-art features for long-range contact prediction.
Affiliation(s)
- Alexander Rives
  - Facebook AI Research, New York, NY 10003
  - Department of Computer Science, New York University, New York, NY 10012
- Tom Sercu
  - Facebook AI Research, New York, NY 10003
- Zeming Lin
  - Department of Computer Science, New York University, New York, NY 10012
- Jason Liu
  - Facebook AI Research, New York, NY 10003
- Demi Guo
  - Harvard University, Cambridge, MA 02138
- Myle Ott
  - Facebook AI Research, New York, NY 10003
- Jerry Ma
  - Booth School of Business, University of Chicago, Chicago, IL 60637
  - Yale Law School, New Haven, CT 06511
- Rob Fergus
  - Department of Computer Science, New York University, New York, NY 10012
8
Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M, Zitnick CL, Ma J, Fergus R. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci U S A 2021; 118:e2016239118. [PMID: 33876751 PMCID: PMC8053943 DOI: 10.1073/pnas.2016239118] [Citation(s) in RCA: 899] [Impact Index Per Article: 224.8] [Indexed: 12/18/2022]
9
Neverov AD, Popova AV, Fedonin GG, Cheremukhin EA, Klink GV, Bazykin GA. Episodic evolution of coadapted sets of amino acid sites in mitochondrial proteins. PLoS Genet 2021; 17:e1008711. [PMID: 33493156 PMCID: PMC7861529 DOI: 10.1371/journal.pgen.1008711] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/03/2020] [Revised: 02/04/2021] [Accepted: 12/07/2020] [Indexed: 11/19/2022]
Abstract
The rate of evolution differs between protein sites and changes with time. However, the link between these two phenomena remains poorly understood. Here, we design a phylogenetic approach for distinguishing pairs of amino acid sites that evolve concordantly, i.e., such that substitutions at one site trigger subsequent substitutions at the other; and also pairs of sites that evolve discordantly, so that substitutions at one site impede subsequent substitutions at the other. We distinguish groups of amino acid sites that undergo coordinated evolution and evolve discordantly from other such groups. In mitochondrion-encoded proteins of metazoans and fungi, we show that concordantly evolving sites are clustered in protein structures. By analysing the phylogenetic patterns of substitutions at concordantly and discordantly evolving site pairs, we find that concordant evolution has two distinct causes: epistatic interactions between amino acid substitutions and episodes of selection independently affecting substitutions at different sites. The rate of substitutions at concordantly evolving groups of protein sites changes in the course of evolution, indicating episodes of selection limited to some of the lineages. The phylogenetic positions of these changes are consistent between proteins, suggesting common selective forces underlying them.
The mode and rate of evolution of a protein site depend on the effect of its mutations on protein fitness. The fitness effect of a mutation itself can change in the course of evolution for at least two reasons. First, it can be modulated by substitutions occurring at other sites, a phenomenon called epistasis. Second, changes in selection can be non-epistatic, affecting sites independently of one another. Here, we analyse substitutions accumulated by the evolving lineages of the five proteins encoded by the mitochondrial genomes of thousands of species of metazoans and fungi. We show that substitutions at different amino acid sites occur in a coordinated fashion, and this coordination is caused both by epistasis and by episodes of selection affecting groups of sites. We partition each protein into several groups of concordantly evolving sites such that evolution of sites from different groups is discordant, and show that the proteins encoded by the mitochondrial genome consist of coevolving structural blocks. Some of these blocks have a clear functional specialization, e.g. are associated with interfaces between proteins composing respiratory complexes. Together, our results reveal a previously unrecognized complexity in the causes of variation in evolutionary rates between protein sites.
Affiliation(s)
- Alexey D. Neverov
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Anfisa V. Popova
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Gennady G. Fedonin
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow region, Russia
- Galya V. Klink
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Georgii A. Bazykin
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
10
Abstract
Living systems evolve one mutation at a time, but a single mutation can alter the effect of subsequent mutations. The underlying mechanistic determinants of such epistasis are unclear. Here, we demonstrate that the physical dynamics of a biological system can generically constrain epistasis. We analyze models and experimental data on proteins and regulatory networks. In each, we find that if the long-time physical dynamics is dominated by a slow, collective mode, then the dimensionality of mutational effects is reduced. Consequently, epistatic coefficients for different combinations of mutations are no longer independent, even if individually strong. Such epistasis can be summarized as resulting from a global nonlinearity applied to an underlying linear trait, that is, as global epistasis. This constraint, in turn, reduces the ruggedness of the sequence-to-function map. By providing a generic mechanistic origin for experimentally observed global epistasis, our work suggests that slow collective physical modes can make biological systems evolvable.
Affiliation(s)
- Kabir Husain
  - Department of Physics, University of Chicago, Chicago, IL
- Arvind Murugan
  - Department of Physics, University of Chicago, Chicago, IL
11
Engineering allosteric communication. Curr Opin Struct Biol 2020; 63:115-122. [DOI: 10.1016/j.sbi.2020.05.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/01/2019] [Revised: 05/06/2020] [Accepted: 05/08/2020] [Indexed: 11/18/2022]
12
Bravi B, Ravasio R, Brito C, Wyart M. Direct coupling analysis of epistasis in allosteric materials. PLoS Comput Biol 2020; 16:e1007630. [PMID: 32119660 PMCID: PMC7067494 DOI: 10.1371/journal.pcbi.1007630] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/15/2019] [Revised: 03/12/2020] [Accepted: 01/03/2020] [Indexed: 11/22/2022]
Abstract
In allosteric proteins, the binding of a ligand modifies function at a distant active site. Such allosteric pathways can be used as targets for drug design, generating considerable interest in inferring them from sequence alignment data. Currently, different methods lead to conflicting results, in particular on the existence of long-range evolutionary couplings between distant amino acids mediating allostery. Here we propose a resolution of this conundrum, by studying epistasis and its inference in models where an allosteric material is evolved in silico to perform a mechanical task. We find in our model the four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range and have a simple mechanical interpretation. We perform a Direct Coupling Analysis (DCA) and find that DCA predicts the cost of point mutations well but is a rather poor generative model. Strikingly, it can predict short-range epistasis but fails to capture long-range epistasis, consistent with empirical findings. We propose that such failure is generic when function requires subparts to work in concert. We illustrate this idea with a simple model, which suggests that other methods may be better suited to capture long-range effects.
Allostery in proteins is the property of highly specific responses to ligand binding at a distant site. To inform protocols of de novo drug design, it is fundamental to understand the impact of mutations on allosteric regulation and whether it can be predicted from evolutionary correlations. In this work we consider allosteric architectures artificially evolved to optimize the cooperativity of binding at the allosteric and active sites. We first characterize the emergent pattern of epistasis as well as the underlying mechanical phenomena, finding the four types of epistasis (Synergistic, Sign, Antagonistic, Saturation), which can be either short- or long-range. The numerical evolution of these allosteric architectures allows us to benchmark Direct Coupling Analysis, a method which relies on co-evolution in sequence data to infer direct evolutionary couplings, in connection to allostery. We show that Direct Coupling Analysis quantitatively predicts point mutation costs but underestimates strong long-range epistasis. We provide an argument, based on a simplified model, illustrating the reasons for this discrepancy. Our analysis suggests neural networks as a more promising tool to measure epistasis.
Affiliation(s)
- Barbara Bravi
  - Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Riccardo Ravasio
  - Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Carolina Brito
  - Instituto de Física, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Matthieu Wyart
  - Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
13
Campitelli P, Modi T, Kumar S, Ozkan SB. The Role of Conformational Dynamics and Allostery in Modulating Protein Evolution. Annu Rev Biophys 2020; 49:267-288. [PMID: 32075411 DOI: 10.1146/annurev-biophys-052118-115517] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Indexed: 11/09/2022]
Abstract
Advances in sequencing techniques and statistical methods have made it possible not only to predict sequences of ancestral proteins but also to identify thousands of mutations in the human exome, some of which are disease associated. These developments have motivated numerous theories and raised many questions regarding the fundamental principles behind protein evolution, which have been traditionally investigated horizontally using the tip of the phylogenetic tree through comparative studies of extant proteins within a family. In this article, we review a vertical comparison of the modern and resurrected ancestral proteins. We focus mainly on the dynamical properties responsible for a protein's ability to adapt new functions in response to environmental changes. Using the Dynamic Flexibility Index and the Dynamic Coupling Index to quantify the relative flexibility and dynamic coupling at a site-specific, single-amino-acid level, we provide evidence that the migration of hinges, which are often functionally critical rigid sites, is a mechanism through which proteins can rapidly evolve. Additionally, we show that disease-associated mutations in proteins often result in flexibility changes even at positions distal from mutational sites, particularly in the modulation of active site dynamics.
Affiliation(s)
- Paul Campitelli
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85281, USA
| | - Tushar Modi
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85281, USA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania 19122, USA; Department of Biology, Temple University, Philadelphia, Pennsylvania 19122, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - S Banu Ozkan
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona 85281, USA
|
14
|
Deep Analysis of Residue Constraints (DARC): identifying determinants of protein functional specificity. Sci Rep 2020; 10:1691. [PMID: 32015389 PMCID: PMC6997377 DOI: 10.1038/s41598-019-55118-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2019] [Accepted: 11/23/2019] [Indexed: 01/03/2023] Open
Abstract
Protein functional constraints are manifest as superfamily and functional-subgroup conserved residues, and as pairwise correlations. Deep Analysis of Residue Constraints (DARC) aids the visualization of these constraints, characterizes how they correlate with each other and with structure, and estimates statistical significance. This can identify determinants of protein functional specificity, as we illustrate for bacterial DNA clamp loader ATPases. These load ring-shaped sliding clamps onto DNA to keep polymerase attached during replication and contain one δ, three γ, and one δ’ AAA+ subunits semi-circularly arranged in the order δ-γ1-γ2-γ3-δ’. Only γ is active, though both γ and δ’ functionally influence an adjacent γ subunit. DARC identifies, as functionally-congruent features linking allosterically the ATP, DNA, and clamp binding sites: residues distinctive of γ and of γ/δ’ that mutually interact in trans, centered on the catalytic base; several γ/δ’-residues and six γ/δ’-covariant residue pairs within the DNA binding N-termini of helices α2 and α3; and γ/δ’-residues associated with the α2 C-terminus and the clamp-binding loop. Most notable is a trans-acting γ/δ’ hydroxyl group that 99% of other AAA+ proteins lack. Mutation of this hydroxyl to a methyl group impedes clamp binding and opening, DNA binding, and ATP hydrolysis—implying a remarkably clamp-loader-specific function.
|
15
|
Ravasio R, Flatt SM, Yan L, Zamuner S, Brito C, Wyart M. Mechanics of Allostery: Contrasting the Induced Fit and Population Shift Scenarios. Biophys J 2019; 117:1954-1962. [PMID: 31653447 PMCID: PMC7031744 DOI: 10.1016/j.bpj.2019.10.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2019] [Revised: 09/27/2019] [Accepted: 10/03/2019] [Indexed: 12/11/2022] Open
Abstract
In allosteric proteins, binding a ligand can affect function at a distant location, for example, by changing the binding affinity of a substrate at the active site. The induced fit and population shift models, which differ by the assumed number of stable configurations, explain such cooperative binding from a thermodynamic viewpoint. Yet, understanding what mechanical principles constrain these models remains a challenge. Here, we provide an empirical study on 34 proteins supporting the idea that allosteric conformational change generally occurs along a soft elastic mode presenting extended regions of high shear. We argue, based on a detailed analysis of how the energy profile along such a mode depends on binding, that in the induced fit scenario, there is an optimal stiffness $k_a^* \sim 1/N$ for cooperative binding, where N is the number of residues. We find that the population shift scenario is more robust to mutations affecting stiffness because binding becomes more and more cooperative with stiffness up to the same characteristic value $k_a^*$, beyond which cooperativity saturates instead of decaying. We numerically confirm these findings in a nonlinear mechanical model. Dynamical considerations suggest that a stiffness of order $k_a^*$ is favorable in that scenario as well, supporting that for proper function, proteins must evolve a functional elastic mode that is softer as their size increases. In consistency with this view, we find a fair anticorrelation between the stiffness of the allosteric response and protein size in our data set.
Affiliation(s)
- Riccardo Ravasio
- Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Solange Marie Flatt
- Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Le Yan
- Kavli Institute for Theoretical Physics, University of California, Santa Barbara, California
| | - Stefano Zamuner
- Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Carolina Brito
- Instituto de Física, Universidade Federal do Rio Grande do Sul, Porto Alegre, State of Rio Grande do Sul, Brazil
| | - Matthieu Wyart
- Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
|