1
|
Schmelkin L, Carnevale V, Haldane A, Townsend JP, Chung S, Levy RM, Kumar S. Entrenchment and contingency in neutral protein evolution with epistasis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.01.09.632266. [PMID: 39868204 PMCID: PMC11761135 DOI: 10.1101/2025.01.09.632266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2025]
Abstract
Protein sequence evolution in the presence of epistasis makes many previously acceptable amino acid residues at a site unfavorable over time. This phenomenon of entrenchment has also been observed with neutral substitutions using Potts Hamiltonian models. Here, we show that simulations using these models often evolve non-neutral proteins. We introduce a Neutral-with-Epistasis (N×E) model that incorporates purifying selection to conserve fitness, a requirement of neutral evolution. N×E protein evolution revealed a surprising lack of entrenchment, with site-specific amino-acid preferences remaining remarkably conserved, in biologically realistic time frames despite extensive residue coupling. Moreover, we found that the overdispersion of the molecular clock is caused by rate variation across sites introduced by epistasis in individual lineages, rather than by historical contingency. Therefore, substitutional entrenchment and rate contingency may indicate that adaptive and other non-neutral evolutionary processes were at play during protein evolution.
Collapse
Affiliation(s)
- Lisa Schmelkin
- Institute for Genomics and Evolutionary Medicine, Temple University; Philadelphia, PA 19122, USA
- Department of Biology, Temple University; Philadelphia, PA 19122, USA
| | - Vincenzo Carnevale
- Institute for Genomics and Evolutionary Medicine, Temple University; Philadelphia, PA 19122, USA
- Department of Biology, Temple University; Philadelphia, PA 19122, USA
- Institute of Computational Molecular Science, Temple University; Philadelphia, PA 19122, USA
| | - Allan Haldane
- Institute of Computational Molecular Science, Temple University; Philadelphia, PA 19122, USA
- Department of Chemistry, Temple University; Philadelphia, Pennsylvania 19122, USA
- Center for Biophysics and Computational Biology, Temple University; Philadelphia, Pennsylvania 19122, USA
| | | | - Sarah Chung
- Institute for Genomics and Evolutionary Medicine, Temple University; Philadelphia, PA 19122, USA
- Department of Biology, Temple University; Philadelphia, PA 19122, USA
| | - Ronald M. Levy
- Institute of Computational Molecular Science, Temple University; Philadelphia, PA 19122, USA
- Department of Chemistry, Temple University; Philadelphia, Pennsylvania 19122, USA
- Center for Biophysics and Computational Biology, Temple University; Philadelphia, Pennsylvania 19122, USA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University; Philadelphia, PA 19122, USA
- Department of Biology, Temple University; Philadelphia, PA 19122, USA
| |
Collapse
|
2
|
Di Bari L, Bisardi M, Cotogno S, Weigt M, Zamponi F. Emergent time scales of epistasis in protein evolution. Proc Natl Acad Sci U S A 2024; 121:e2406807121. [PMID: 39325427 PMCID: PMC11459137 DOI: 10.1073/pnas.2406807121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2024] [Accepted: 08/17/2024] [Indexed: 09/27/2024] Open
Abstract
We introduce a data-driven epistatic model of protein evolution, capable of generating evolutionary trajectories spanning very different time scales reaching from individual mutations to diverged homologs. Our in silico evolution encompasses random nucleotide mutations, insertions and deletions, and models selection using a fitness landscape, which is inferred via a generative probabilistic model for protein families. We show that the proposed framework accurately reproduces the sequence statistics of both short-time (experimental) and long-time (natural) protein evolution, suggesting applicability also to relatively data-poor intermediate evolutionary time scales, which are currently inaccessible to evolution experiments. Our model uncovers a highly collective nature of epistasis, gradually changing the fitness effect of mutations in a diverging sequence context, rather than acting via strong interactions between individual mutations. This collective nature triggers the emergence of a long evolutionary time scale, separating fast mutational processes inside a given sequence context, from the slow evolution of the context itself. The model quantitatively reproduces epistatic phenomena such as contingency and entrenchment, as well as the loss of predictability in protein evolution observed in deep mutational scanning experiments of distant homologs. It thereby deepens our understanding of the interplay between mutation and selection in shaping protein diversity and functions, allows one to statistically forecast evolution, and challenges the prevailing independent-site models of protein evolution, which are unable to capture the fundamental importance of epistasis.
Collapse
Affiliation(s)
- Leonardo Di Bari
- Dipartimento Scienza Applicata e Tecnologia, Politecnico di Torino, I-10129Torino, Italy
| | - Matteo Bisardi
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative, ParisF-75005, France
| | - Sabrina Cotogno
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative, ParisF-75005, France
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative, ParisF-75005, France
| | - Francesco Zamponi
- Dipartimento di Fisica, Sapienza Università di Roma, 00185Rome, Italy
| |
Collapse
|
3
|
Dietler N, Abbara A, Choudhury S, Bitbol AF. Impact of phylogeny on the inference of functional sectors from protein sequence data. PLoS Comput Biol 2024; 20:e1012091. [PMID: 39312591 PMCID: PMC11449291 DOI: 10.1371/journal.pcbi.1012091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2024] [Revised: 10/03/2024] [Accepted: 09/10/2024] [Indexed: 09/25/2024] Open
Abstract
Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.
Collapse
Affiliation(s)
- Nicola Dietler
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Alia Abbara
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Subham Choudhury
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
Collapse
|
4
|
Martin J, Lequerica Mateos M, Onuchic JN, Coluzza I, Morcos F. Machine learning in biological physics: From biomolecular prediction to design. Proc Natl Acad Sci U S A 2024; 121:e2311807121. [PMID: 38913893 PMCID: PMC11228481 DOI: 10.1073/pnas.2311807121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/26/2024] Open
Abstract
Machine learning has been proposed as an alternative to theoretical modeling when dealing with complex problems in biological physics. However, in this perspective, we argue that a more successful approach is a proper combination of these two methodologies. We discuss how ideas coming from physical modeling neuronal processing led to early formulations of computational neural networks, e.g., Hopfield networks. We then show how modern learning approaches like Potts models, Boltzmann machines, and the transformer architecture are related to each other, specifically, through a shared energy representation. We summarize recent efforts to establish these connections and provide examples on how each of these formulations integrating physical modeling and machine learning have been successful in tackling recent problems in biomolecular structure, dynamics, function, evolution, and design. Instances include protein structure prediction; improvement in computational complexity and accuracy of molecular dynamics simulations; better inference of the effects of mutations in proteins leading to improved evolutionary modeling and finally how machine learning is revolutionizing protein engineering and design. Going beyond naturally existing protein sequences, a connection to protein design is discussed where synthetic sequences are able to fold to naturally occurring motifs driven by a model rooted in physical principles. We show that this model is "learnable" and propose its future use in the generation of unique sequences that can fold into a target structure.
Collapse
Affiliation(s)
- Jonathan Martin
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX75080
| | - Marcos Lequerica Mateos
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa48940, Spain
| | - José N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX77005
- Department of Physics and Astronomy, Rice University, Houston, TX77005
- Department of Chemistry, Rice University, Houston, TX77005
- Department of BioSciences, Rice University, Houston, TX77005
| | - Ivan Coluzza
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa48940, Spain
- Basque Foundation for Science, Ikerbasque, Bilbao48940, Spain
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX75080
- Department of Bioengineering, Center for Systems Biology, University of Texas at Dallas, Richardson, TX75080
| |
Collapse
|
5
|
Norn C, Oliveira F, André I. Improved prediction of site-rates from structure with averaging across homologs. Protein Sci 2024; 33:e5086. [PMID: 38923241 PMCID: PMC11196898 DOI: 10.1002/pro.5086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2024] [Revised: 05/12/2024] [Accepted: 06/04/2024] [Indexed: 06/28/2024]
Abstract
Variation in mutation rates at sites in proteins can largely be understood by the constraint that proteins must fold into stable structures. Models that calculate site-specific rates based on protein structure and a thermodynamic stability model have shown a significant but modest ability to predict empirical site-specific rates calculated from sequence. Models that use detailed atomistic models of protein energetics do not outperform simpler approaches using packing density. We demonstrate that a fundamental reason for this is that empirical site-specific rates are the result of the average effect of many different microenvironments in a phylogeny. By analyzing the results of evolutionary dynamics simulations, we show how averaging site-specific rates across many extant protein structures can lead to correct recovery of site-rate prediction. This result is also demonstrated in natural protein sequences and experimental structures. Using predicted structures, we demonstrate that atomistic models can improve upon contact density metrics in predicting site-specific rates from a structure. The results give fundamental insights into the factors governing the distribution of site-specific rates in protein families.
Collapse
Affiliation(s)
- Christoffer Norn
- Department of Biochemistry and Structural BiologyLund UniversityLundSweden
- Bioinnovation Institute FoundationKøbenhavnDenmark
| | - Fábio Oliveira
- Department of Biochemistry and Structural BiologyLund UniversityLundSweden
| | - Ingemar André
- Department of Biochemistry and Structural BiologyLund UniversityLundSweden
| |
Collapse
|
6
|
Calvanese F, Lambert CN, Nghe P, Zamponi F, Weigt M. Towards parsimonious generative modeling of RNA families. Nucleic Acids Res 2024; 52:5465-5477. [PMID: 38661206 PMCID: PMC11162787 DOI: 10.1093/nar/gkae289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2023] [Revised: 03/05/2024] [Accepted: 04/05/2024] [Indexed: 04/26/2024] Open
Abstract
Generative probabilistic models emerge as a new paradigm in data-driven, evolution-informed design of biomolecular sequences. This paper introduces a novel approach, called Edge Activation Direct Coupling Analysis (eaDCA), tailored to the characteristics of RNA sequences, with a strong emphasis on simplicity, efficiency, and interpretability. eaDCA explicitly constructs sparse coevolutionary models for RNA families, achieving performance levels comparable to more complex methods while utilizing a significantly lower number of parameters. Our approach demonstrates efficiency in generating artificial RNA sequences that closely resemble their natural counterparts in both statistical analyses and SHAPE-MaP experiments, and in predicting the effect of mutations. Notably, eaDCA provides a unique feature: estimating the number of potential functional sequences within a given RNA family. For example, in the case of cyclic di-AMP riboswitches (RF00379), our analysis suggests the existence of approximately 1039 functional nucleotide sequences. While huge compared to the known <4000 natural sequences, this number represents only a tiny fraction of the vast pool of nearly 1082 possible nucleotide sequences of the same length (136 nucleotides). These results underscore the promise of sparse and interpretable generative models, such as eaDCA, in enhancing our understanding of the expansive RNA sequence space.
Collapse
Affiliation(s)
- Francesco Calvanese
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
| | - Camille N Lambert
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
| | - Philippe Nghe
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
| | - Francesco Zamponi
- Dipartimento di Fisica, Sapienza Università di Roma, Rome, Italy
- Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France
| |
Collapse
|
7
|
Jaafari H, Bueno C, Schafer NP, Martin J, Morcos F, Wolynes PG. The physical and evolutionary energy landscapes of devolved protein sequences corresponding to pseudogenes. Proc Natl Acad Sci U S A 2024; 121:e2322428121. [PMID: 38739795 PMCID: PMC11127006 DOI: 10.1073/pnas.2322428121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Accepted: 03/26/2024] [Indexed: 05/16/2024] Open
Abstract
Protein evolution is guided by structural, functional, and dynamical constraints ensuring organismal viability. Pseudogenes are genomic sequences identified in many eukaryotes that lack translational activity due to sequence degradation and thus over time have undergone "devolution." Previously pseudogenized genes sometimes regain their protein-coding function, suggesting they may still encode robust folding energy landscapes despite multiple mutations. We study both the physical folding landscapes of protein sequences corresponding to human pseudogenes using the Associative Memory, Water Mediated, Structure and Energy Model, and the evolutionary energy landscapes obtained using direct coupling analysis (DCA) on their parent protein families. We found that generally mutations that have occurred in pseudogene sequences have disrupted their native global network of stabilizing residue interactions, making it harder for them to fold if they were translated. In some cases, however, energetic frustration has apparently decreased when the functional constraints were removed. We analyzed this unexpected situation for Cyclophilin A, Profilin-1, and Small Ubiquitin-like Modifier 2 Protein. Our analysis reveals that when such mutations in the pseudogene ultimately stabilize folding, at the same time, they likely alter the pseudogenes' former biological activity, as estimated by DCA. We localize most of these stabilizing mutations generally to normally frustrated regions required for binding to other partners.
Collapse
Affiliation(s)
- Hana Jaafari
- Center for Theoretical Biophysics, Rice University, Houston, TX77005
- Applied Physics Graduate Program, Smalley-Curl Institute, Rice University, Houston, TX77005
- Department of Chemistry, Rice University, Houston, TX77005
| | - Carlos Bueno
- Center for Theoretical Biophysics, Rice University, Houston, TX77005
| | | | - Jonathan Martin
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX75080
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX75080
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX75080
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX75080
| | - Peter G. Wolynes
- Center for Theoretical Biophysics, Rice University, Houston, TX77005
- Department of Chemistry, Rice University, Houston, TX77005
- Department of Physics and Astronomy, Rice University, Houston, TX77005
- Department of Biochemistry and Cell Biology, Rice University, Houston, TX77005
| |
Collapse
|
8
|
Ose NJ, Campitelli P, Modi T, Kazan IC, Kumar S, Ozkan SB. Some mechanistic underpinnings of molecular adaptations of SARS-COV-2 spike protein by integrating candidate adaptive polymorphisms with protein dynamics. eLife 2024; 12:RP92063. [PMID: 38713502 PMCID: PMC11076047 DOI: 10.7554/elife.92063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/08/2024] Open
Abstract
We integrate evolutionary predictions based on the neutral theory of molecular evolution with protein dynamics to generate mechanistic insight into the molecular adaptations of the SARS-COV-2 spike (S) protein. With this approach, we first identified candidate adaptive polymorphisms (CAPs) of the SARS-CoV-2 S protein and assessed the impact of these CAPs through dynamics analysis. Not only have we found that CAPs frequently overlap with well-known functional sites, but also, using several different dynamics-based metrics, we reveal the critical allosteric interplay between SARS-CoV-2 CAPs and the S protein binding sites with the human ACE2 (hACE2) protein. CAPs interact far differently with the hACE2 binding site residues in the open conformation of the S protein compared to the closed form. In particular, the CAP sites control the dynamics of binding residues in the open state, suggesting an allosteric control of hACE2 binding. We also explored the characteristic mutations of different SARS-CoV-2 strains to find dynamic hallmarks and potential effects of future mutations. Our analyses reveal that Delta strain-specific variants have non-additive (i.e., epistatic) interactions with CAP sites, whereas the less pathogenic Omicron strains have mostly additive mutations. Finally, our dynamics-based analysis suggests that the novel mutations observed in the Omicron strain epistatically interact with the CAP sites to help escape antibody binding.
Collapse
Affiliation(s)
- Nicholas James Ose
- Department of Physics and Center for Biological Physics, Arizona State UniversityTempeUnited States
| | - Paul Campitelli
- Department of Physics and Center for Biological Physics, Arizona State UniversityTempeUnited States
| | - Tushar Modi
- Department of Physics and Center for Biological Physics, Arizona State UniversityTempeUnited States
| | - I Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State UniversityTempeUnited States
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple UniversityPhiladelphiaUnited States
- Department of Biology, Temple UniversityPhiladelphiaUnited States
- Center for Genomic Medicine Research, King Abdulaziz UniversityJeddahSaudi Arabia
| | - Sefika Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State UniversityTempeUnited States
| |
Collapse
|
9
|
Biswas A, Choudhuri I, Arnold E, Lyumkis D, Haldane A, Levy RM. Kinetic coevolutionary models predict the temporal emergence of HIV-1 resistance mutations under drug selection pressure. Proc Natl Acad Sci U S A 2024; 121:e2316662121. [PMID: 38557187 PMCID: PMC11009627 DOI: 10.1073/pnas.2316662121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2023] [Accepted: 02/23/2024] [Indexed: 04/04/2024] Open
Abstract
Drug resistance in HIV type 1 (HIV-1) is a pervasive problem that affects the lives of millions of people worldwide. Although records of drug-resistant mutations (DRMs) have been extensively tabulated within public repositories, our understanding of the evolutionary kinetics of DRMs and how they evolve together remains limited. Epistasis, the interaction between a DRM and other residues in HIV-1 protein sequences, is key to the temporal evolution of drug resistance. We use a Potts sequence-covariation statistical-energy model of HIV-1 protein fitness under drug selection pressure, which captures epistatic interactions between all positions, combined with kinetic Monte-Carlo simulations of sequence evolutionary trajectories, to explore the acquisition of DRMs as they arise in an ensemble of drug-naive patient protein sequences. We follow the time course of 52 DRMs in the enzymes protease, RT, and integrase, the primary targets of antiretroviral therapy. The rates at which DRMs emerge are highly correlated with their observed acquisition rates reported in the literature when drug pressure is applied. This result highlights the central role of epistasis in determining the kinetics governing DRM emergence. Whereas rapidly acquired DRMs begin to accumulate as soon as drug pressure is applied, slowly acquired DRMs are contingent on accessory mutations that appear only after prolonged drug pressure. We provide a foundation for using computational methods to determine the temporal evolution of drug resistance using Potts statistical potentials, which can be used to gain mechanistic insights into drug resistance pathways in HIV-1 and other infectious agents.
Collapse
Affiliation(s)
- Avik Biswas
- Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA19122
- Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA92037
- Department of Physics, University of California San Diego, La Jolla, CA92093
| | - Indrani Choudhuri
- Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA19122
- Department of Chemistry, Temple University, Philadelphia, PA19122
| | - Eddy Arnold
- Department of Chemistry and Chemical Biology, Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, NJ08854
| | - Dmitry Lyumkis
- Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA92037
- Graduate School of Biological Sciences, Department of Molecular Biology, University of California San Diego, La Jolla, CA92093
| | - Allan Haldane
- Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA19122
- Department of Physics, Temple University, Philadelphia, PA19122
| | - Ronald M. Levy
- Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA19122
- Department of Chemistry, Temple University, Philadelphia, PA19122
| |
Collapse
|
10
|
Nartey C, Koo HJ, Laurendon C, Shaik HZ, O’maille P, Noel JP, Morcos F. Coevolutionary Information Captures Catalytic Functions and Reveals Divergent Roles of Terpene Synthase Interdomain Connections. Biochemistry 2024; 63:355-366. [PMID: 38206111 PMCID: PMC10851433 DOI: 10.1021/acs.biochem.3c00578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2023] [Revised: 12/22/2023] [Accepted: 12/27/2023] [Indexed: 01/12/2024]
Abstract
Inferring the historical and biophysical causes of diversity within protein families is a complex puzzle. A key to unraveling this problem is characterizing the rugged topography of sequence-function adaptive landscapes. Using biochemical data from a 29 = 512 combinatorial library of tobacco 5-epi-aristolochene synthase (TEAS) mutants engineered to make the native major product of Egyptian henbane premnaspirodiene synthase (HPS) and a complementary 512 mutant HPS library, we address the question of how product specificity is controlled. These data sets reveal that HPS is far more robust and resistant to mutations than TEAS, where most mutants are promiscuous. We also combine experimental data with a sequence Potts Hamiltonian model and direct coupling analysis to quantify mutant fitness. Our results demonstrate that the Hamiltonian captures variation in product outputs across both libraries, clusters native family members based on their substrate specificities, and exposes the divergent catalytic roles of couplings between the catalytic and noncatalytic domains of TEAS versus HPS. Specifically, we found that the role of the interdomain connectivities in specifying product output is more important in TEAS than connectivities within the catalytic domain. Despite being 75% identical, this property is not shared by HPS, where connectivities within the catalytic domain are more important for specificity. By solving the X-ray crystal structure of HPS, we assessed structural bases for their interdomain network differences. Last, we calculate the product profile Shannon entropies of the two libraries, which showcases that site-site connectivities also play divergent roles in catalytic accuracy.
Collapse
Affiliation(s)
- Charisse
M. Nartey
- Department
of Biological Sciences, The University of
Texas at Dallas, Richardson, Texas 75080, United States
| | - Hyun Jo Koo
- Howard
Hughes Medical Institute, The Salk Institute for Biological Studies, Jack H. Skirball Center for Chemical Biology and Proteomics, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Caroline Laurendon
- John
Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich NR4 7UH, U.K.
| | - Hana Z. Shaik
- Department
of Bioengineering, The University of Texas
at Dallas, Richardson, Texas 75080, United States
| | - Paul O’maille
- John
Innes Centre, Institute of Food Research, Food & Health Programme, Norwich Research Park, Norwich NR4 7UA, U.K.
| | - Joseph P. Noel
- Howard
Hughes Medical Institute, The Salk Institute for Biological Studies, Jack H. Skirball Center for Chemical Biology and Proteomics, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Faruck Morcos
- Department
of Biological Sciences, The University of
Texas at Dallas, Richardson, Texas 75080, United States
- Department
of Bioengineering, The University of Texas
at Dallas, Richardson, Texas 75080, United States
- Center for
Systems Biology, The University of Texas
at Dallas, Richardson, Texas 75080, United States
| |
Collapse
|
11
|
Alvarez S, Nartey CM, Mercado N, de la Paz JA, Huseinbegovic T, Morcos F. In vivo functional phenotypes from a computational epistatic model of evolution. Proc Natl Acad Sci U S A 2024; 121:e2308895121. [PMID: 38285950 PMCID: PMC10861889 DOI: 10.1073/pnas.2308895121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Accepted: 12/19/2023] [Indexed: 01/31/2024] Open
Abstract
Computational models of evolution are valuable for understanding the dynamics of sequence variation, to infer phylogenetic relationships or potential evolutionary pathways and for biomedical and industrial applications. Despite these benefits, few have validated their propensities to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. We demonstrate the power of epistasis inferred from natural protein families to evolve sequence variants in an algorithm we developed called sequence evolution with epistatic contributions (SEEC). Utilizing the Hamiltonian of the joint probability of sequences in the family as fitness metric, we sampled and experimentally tested for in vivo [Formula: see text]-lactamase activity in Escherichia coli TEM-1 variants. These evolved proteins can have dozens of mutations dispersed across the structure while preserving sites essential for both catalysis and interactions. Remarkably, these variants retain family-like functionality while being more active than their wild-type predecessor. We found that depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. SEEC has the potential to explore the dynamics of neofunctionalization, characterize viral fitness landscapes, and facilitate vaccine development.
Collapse
Affiliation(s)
- Sophia Alvarez
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX75080
| | - Charisse M. Nartey
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX75080
| | - Nicholas Mercado
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX75080
| | | | - Tea Huseinbegovic
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX75080
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX75080
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX75080
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX75080
| |
Collapse
|
12
|
Ose NJ, Campitelli P, Modi T, Can Kazan I, Kumar S, Banu Ozkan S. Some mechanistic underpinnings of molecular adaptations of SARS-COV-2 spike protein by integrating candidate adaptive polymorphisms with protein dynamics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.09.14.557827. [PMID: 37745560 PMCID: PMC10515954 DOI: 10.1101/2023.09.14.557827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
We integrate evolutionary predictions based on the neutral theory of molecular evolution with protein dynamics to generate mechanistic insight into the molecular adaptations of the SARS-COV-2 Spike (S) protein. With this approach, we first identified Candidate Adaptive Polymorphisms (CAPs) of the SARS-CoV-2 Spike protein and assessed the impact of these CAPs through dynamics analysis. Not only have we found that CAPs frequently overlap with well-known functional sites, but also, using several different dynamics-based metrics, we reveal the critical allosteric interplay between SARS-CoV-2 CAPs and the S protein binding sites with the human ACE2 (hACE2) protein. CAPs interact far differently with the hACE2 binding site residues in the open conformation of the S protein compared to the closed form. In particular, the CAP sites control the dynamics of binding residues in the open state, suggesting an allosteric control of hACE2 binding. We also explored the characteristic mutations of different SARS-CoV-2 strains to find dynamic hallmarks and potential effects of future mutations. Our analyses reveal that Delta strain-specific variants have non-additive (i.e., epistatic) interactions with CAP sites, whereas the less pathogenic Omicron strains have mostly additive mutations. Finally, our dynamics-based analysis suggests that the novel mutations observed in the Omicron strain epistatically interact with the CAP sites to help escape antibody binding.
Collapse
Affiliation(s)
- Nicholas J. Ose
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
| | - Paul Campitelli
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
| | - Tushar Modi
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
| | - I. Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Center for Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - S. Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
| |
Collapse
|
13
|
Ose NJ, Campitelli P, Patel R, Kumar S, Ozkan SB. Protein dynamics provide mechanistic insights about epistasis among common missense polymorphisms. Biophys J 2023; 122:2938-2947. [PMID: 36726312 PMCID: PMC10398253 DOI: 10.1016/j.bpj.2023.01.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2022] [Revised: 12/20/2022] [Accepted: 01/26/2023] [Indexed: 02/03/2023] Open
Abstract
Sequencing of the protein coding genome has revealed many different missense mutations of human proteins and different population frequencies of corresponding haplotypes, which consist of different sets of those mutations. Here, we present evidence for pairwise intramolecular epistasis (i.e., nonadditive interactions) between many such mutations through an analysis of protein dynamics. We suggest that functional compensation for conserving protein dynamics is a likely evolutionary mechanism that maintains high-frequency mutations that are individually nonneutral but epistatically compensating within proteins. This analysis is the first of its type to look at human proteins with specific high population frequency mutations and examine the relationship between mutations that make up that observed high-frequency protein haplotype. Importantly, protein dynamics revealed a separation between high and low frequency haplotypes within a target protein cytochrome P450 2A7, with the high-frequency haplotypes showing behavior closer to the wild-type protein. Common protein haplotypes containing two mutations display dynamic compensation in which one mutation can correct for the dynamic effects of the other. We also utilize a dynamics-based metric, EpiScore, that evaluates the epistatic interactions and allows us to see dynamic compensation within many other proteins.
Collapse
Affiliation(s)
- Nicholas J Ose
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona
| | - Paul Campitelli
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona
| | - Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania; Department of Biology, Temple University, Philadelphia, Pennsylvania
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania; Department of Biology, Temple University, Philadelphia, Pennsylvania; Center for Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia.
| | - S Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona.
| |
Collapse
|
14
|
Alvarez S, Nartey CM, Mercado N, de la Paz A, Huseinbegovic T, Morcos F. In vivo functional phenotypes from a computational epistatic model of evolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.24.542176. [PMID: 37292895 PMCID: PMC10245989 DOI: 10.1101/2023.05.24.542176] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Computational models of evolution are valuable for understanding the dynamics of sequence variation, to infer phylogenetic relationships or potential evolutionary pathways and for biomedical and industrial applications. Despite these benefits, few have validated their propensities to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. We demonstrate the power of epistasis inferred from natural protein families to evolve sequence variants in an algorithm we developed called Sequence Evolution with Epistatic Contributions. Utilizing the Hamiltonian of the joint probability of sequences in the family as fitness metric, we sampled and experimentally tested for in vivo β -lactamase activity in E. coli TEM-1 variants. These evolved proteins can have dozens of mutations dispersed across the structure while preserving sites essential for both catalysis and interactions. Remarkably, these variants retain family-like functionality while being more active than their WT predecessor. We found that depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. SEEC has the potential to explore the dynamics of neofunctionalization, characterize viral fitness landscapes and facilitate vaccine development.
Collapse
Affiliation(s)
- Sophia Alvarez
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
| | - Charisse M. Nartey
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
| | - Nicholas Mercado
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
| | - Alberto de la Paz
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
| | - Tea Huseinbegovic
- School of Natural Sciences and Mathematics, University of Texas at Dallas, Richardson, TX 75080, USA
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX 75080, USA
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
| |
Collapse
|
15
|
Ziegler C, Martin J, Sinner C, Morcos F. Latent generative landscapes as maps of functional diversity in protein sequence space. Nat Commun 2023; 14:2222. [PMID: 37076519 PMCID: PMC10113739 DOI: 10.1038/s41467-023-37958-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2022] [Accepted: 04/05/2023] [Indexed: 04/21/2023] Open
Abstract
Variational autoencoders are unsupervised learning models with generative capabilities, when applied to protein data, they classify sequences by phylogeny and generate de novo sequences which preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here, we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings, functional and fitness properties of several systems including Globins, β-lactamases, ion channels, and transcription factors. We provide support on how the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights on directed and natural protein evolution. We propose that combining generative properties and functional predictive power of variational autoencoders and coevolutionary analysis could be beneficial in applications for protein engineering and design.
Collapse
Affiliation(s)
- Cheyenne Ziegler
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
| | - Jonathan Martin
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
| | - Claude Sinner
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA.
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX, 75080, USA.
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA.
| |
Collapse
|
16
|
Sgarbossa D, Lupo U, Bitbol AF. Generative power of a protein language model trained on multiple sequence alignments. eLife 2023; 12:e79854. [PMID: 36734516 PMCID: PMC10038667 DOI: 10.7554/elife.79854] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Accepted: 02/02/2023] [Indexed: 02/04/2023] Open
Abstract
Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families and learn constraints associated to protein structure and function. They thus open the possibility for generating novel sequences belonging to protein families. Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end. We propose and test an iterative method that directly employs the masked language modeling objective to generate sequences using MSA Transformer. We demonstrate that the resulting sequences score as well as natural sequences, for homology, coevolution, and structure-based measures. For large protein families, our synthetic sequences have similar or better properties compared to sequences generated by Potts models, including experimentally validated ones. Moreover, for small protein families, our generation method based on MSA Transformer outperforms Potts models. Our method also more accurately reproduces the higher-order statistics and the distribution of sequences in sequence space of natural data than Potts models. MSA Transformer is thus a strong candidate for protein sequence generation and protein design.
Collapse
Affiliation(s)
- Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL)LausanneSwitzerland
- SIB Swiss Institute of BioinformaticsLausanneSwitzerland
| | - Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL)LausanneSwitzerland
- SIB Swiss Institute of BioinformaticsLausanneSwitzerland
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL)LausanneSwitzerland
- SIB Swiss Institute of BioinformaticsLausanneSwitzerland
| |
Collapse
|
17
|
Dietler N, Lupo U, Bitbol AF. Impact of phylogeny on structural contact inference from protein sequence data. J R Soc Interface 2023; 20:20220707. [PMID: 36751926 PMCID: PMC9905998 DOI: 10.1098/rsif.2022.0707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Accepted: 01/09/2023] [Indexed: 02/09/2023] Open
Abstract
Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference.
Collapse
Affiliation(s)
- Nicola Dietler
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| |
Collapse
|
18
|
Choudhuri I, Biswas A, Haldane A, Levy RM. Contingency and Entrenchment of Drug-Resistance Mutations in HIV Viral Proteins. J Phys Chem B 2022; 126:10622-10636. [PMID: 36493468 PMCID: PMC9841799 DOI: 10.1021/acs.jpcb.2c06123] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
The ability of HIV-1 to rapidly mutate leads to antiretroviral therapy (ART) failure among infected patients. Drug-resistance mutations (DRMs), which cause a fitness penalty to intrinsic viral fitness, are compensated by accessory mutations with favorable epistatic interactions which cause an evolutionary trapping effect, but the kinetics of this overall process has not been well characterized. Here, using a Potts Hamiltonian model describing epistasis combined with kinetic Monte Carlo simulations of evolutionary trajectories, we explore how epistasis modulates the evolutionary dynamics of HIV DRMs. We show how the occurrence of a drug-resistance mutation is contingent on favorable epistatic interactions with many other residues of the sequence background and that subsequent mutations entrench DRMs. We measure the time-autocorrelation of fluctuations in the likelihood of DRMs due to epistatic coupling with the sequence background, which reveals the presence of two evolutionary processes controlling DRM kinetics with two distinct time scales. Further analysis of waiting times for the evolutionary trapping effect to reverse reveals that the sequences which entrench (trap) a DRM are responsible for the slower time scale. We also quantify the overall strength of epistatic effects on the evolutionary kinetics for different mutations and show these are much larger for DRM positions than polymorphic positions, and we also show that trapping of a DRM is often caused by the collective effect of many accessory mutations, rather than a few strongly coupled ones, suggesting the importance of multiresidue sequence variations in HIV evolution. The analysis presented here provides a framework to explore the kinetic pathways through which viral proteins like HIV evolve under drug-selection pressure.
Collapse
Affiliation(s)
| | | | - Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States; Department of Physics, Temple University, Philadelphia, Pennsylvania 19122-6008, United States
| | - Ronald M. Levy
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States; Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States
| |
Collapse
|
19
|
Evolutionary entrenchment in immune proteins selects against stabilizing mutations. Proc Natl Acad Sci U S A 2022; 119:e2216087119. [PMID: 36449544 PMCID: PMC9894131 DOI: 10.1073/pnas.2216087119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open
|
20
|
Ravishankar K, Jiang X, Leddin EM, Morcos F, Cisneros GA. Computational compensatory mutation discovery approach: Predicting a PARP1 variant rescue mutation. Biophys J 2022; 121:3663-3673. [PMID: 35642254 PMCID: PMC9617126 DOI: 10.1016/j.bpj.2022.05.036] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2021] [Revised: 05/20/2022] [Accepted: 05/23/2022] [Indexed: 11/02/2022] Open
Abstract
The prediction of protein mutations that affect function may be exploited for multiple uses. In the context of disease variants, the prediction of compensatory mutations that reestablish functional phenotypes could aid in the development of genetic therapies. In this work, we present an integrated approach that combines coevolutionary analysis and molecular dynamics (MD) simulations to discover functional compensatory mutations. This approach is employed to investigate possible rescue mutations of a poly(ADP-ribose) polymerase 1 (PARP1) variant, PARP1 V762A, associated with lung cancer and follicular lymphoma. MD simulations show PARP1 V762A exhibits noticeable changes in structural and dynamical behavior compared with wild-type (WT) PARP1. Our integrated approach predicts A755E as a possible compensatory mutation based on coevolutionary information, and molecular simulations indicate that the PARP1 A755E/V762A double mutant exhibits similar structural and dynamical behavior to WT PARP1. Our methodology can be broadly applied to a large number of systems where single-nucleotide polymorphisms have been identified as connected to disease and can shed light on the biophysical effects of such changes as well as provide a way to discover potential mutants that could restore WT-like functionality. This can, in turn, be further utilized in the design of molecular therapeutics that aim to mimic such compensatory effect.
Collapse
Affiliation(s)
| | - Xianli Jiang
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
| | - Emmett M Leddin
- Department of Chemistry, University of North Texas, Denton, Texas
| | - Faruck Morcos
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas.
| | - G Andrés Cisneros
- Department of Chemistry, University of North Texas, Denton, Texas; Department of Physics, The University of Texas at Dallas, Richardson, Texas; Department of Chemistry, The University of Texas at Dallas, Richardson, Texas.
| |
Collapse
|
21
|
Abstract
Repeat proteins are made with tandem copies of similar amino acid stretches that fold into elongated architectures. These proteins constitute excellent model systems to investigate how evolution relates to structure, folding, and function. Here, we propose a scheme to map evolutionary information at the sequence level to a coarse-grained model for repeat-protein folding and use it to investigate the folding of thousands of repeat proteins. We model the energetics by a combination of an inverse Potts-model scheme with an explicit mechanistic model of duplications and deletions of repeats to calculate the evolutionary parameters of the system at the single-residue level. These parameters are used to inform an Ising-like model that allows for the generation of folding curves, apparent domain emergence, and occupation of intermediate states that are highly compatible with experimental data in specific case studies. We analyzed the folding of thousands of natural Ankyrin repeat proteins and found that a multiplicity of folding mechanisms are possible. Fully cooperative all-or-none transitions are obtained for arrays with enough sequence-similar elements and strong interactions between them, while noncooperative element-by-element intermittent folding arose if the elements are dissimilar and the interactions between them are energetically weak. Additionally, we characterized nucleation-propagation and multidomain folding mechanisms. We show that the global stability and cooperativity of the repeating arrays can be predicted from simple sequence scores.
Collapse
|
22
|
Vigué L, Croce G, Petitjean M, Ruppé E, Tenaillon O, Weigt M. Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes. Nat Commun 2022; 13:4030. [PMID: 35821377 PMCID: PMC9276797 DOI: 10.1038/s41467-022-31643-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2021] [Accepted: 06/27/2022] [Indexed: 12/05/2022] Open
Abstract
Characterizing the effect of mutations is key to understand the evolution of protein sequences and to separate neutral amino-acid changes from deleterious ones. Epistatic interactions between residues can lead to a context dependence of mutation effects. Context dependence constrains the amino-acid changes that can contribute to polymorphism in the short term, and the ones that can accumulate between species in the long term. We use computational approaches to accurately predict the polymorphisms segregating in a panel of 61,157 Escherichia coli genomes from the analysis of distant homologues. By comparing a context-aware Direct-Coupling Analysis modelling to a non-epistatic approach, we show that the genetic context strongly constrains the tolerable amino acids in 30% to 50% of amino-acid sites. The study of more distant species suggests the gradual build-up of genetic context over long evolutionary timescales by the accumulation of small epistatic contributions.
Collapse
Affiliation(s)
- Lucile Vigué
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
| | - Giancarlo Croce
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics-SIB, Lausanne, Switzerland
| | - Marie Petitjean
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
| | - Etienne Ruppé
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
- Laboratoire de Bactériologie, Hôpital Bichat, APHP, Paris, France
| | - Olivier Tenaillon
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France.
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology-LCQB, Paris, France.
| |
Collapse
|
23
|
Patel R, Carnevale V, Kumar S. Epistasis Creates Invariant Sites and Modulates the Rate of Molecular Evolution. Mol Biol Evol 2022; 39:msac106. [PMID: 35575390 PMCID: PMC9156017 DOI: 10.1093/molbev/msac106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Invariant sites are a common feature of amino acid sequence evolution. The presence of invariant sites is frequently attributed to the need to preserve function through site-specific conservation of amino acid residues. Amino acid substitution models without a provision for invariant sites often fit the data significantly worse than those that allow for an excess of invariant sites beyond those predicted by models that only incorporate rate variation among sites (e.g., a Gamma distribution). An alternative is epistasis between sites to preserve residue interactions that can create invariant sites. Through computer-simulated sequence evolution, we evaluated the relative effects of site-specific preferences and site-site couplings in the generation of invariant sites and the modulation of the rate of molecular evolution. In an analysis of ten major families of protein domains with diverse sequence and functional properties, we find that the negative selection imposed by epistasis creates many more invariant sites than site-specific residue preferences alone. Further, epistasis plays an increasingly larger role in creating invariant sites over longer evolutionary periods. Epistasis also dictates rates of domain evolution over time by exerting significant additional purifying selection to preserve site couplings. These patterns illuminate the mechanistic role of epistasis in the processes underlying observed site invariance and evolutionary rates.
Collapse
Affiliation(s)
- Ravi Patel
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
| | - Vincenzo Carnevale
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| |
Collapse
|
24
|
Morcos F. Characterizing the landscape of evolvability. Nat Ecol Evol 2022; 6:500-501. [PMID: 35361891 PMCID: PMC9150722 DOI: 10.1038/s41559-022-01731-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
A framework to experimentally traverse the large space of functionally neutral variants in a toxin–antitoxin protein complex reveals insights on evolvability and entrenchment of molecular interactions.
Collapse
Affiliation(s)
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA.
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA.
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX, USA.
| |
Collapse
|
25
|
Gerardos A, Dietler N, Bitbol AF. Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences. PLoS Comput Biol 2022; 18:e1010147. [PMID: 35576238 PMCID: PMC9135348 DOI: 10.1371/journal.pcbi.1010147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2021] [Revised: 05/26/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022] Open
Abstract
Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) allow to find interaction partners among paralogs of two protein families. Does successful inference mainly rely on correlations from structural contacts or from phylogeny, or both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced using a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, while deteriorating it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural data set, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance, and that restricting to them preserves performance, evidencing an important role of phylogeny.
Collapse
Affiliation(s)
- Andonis Gerardos
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Nicola Dietler
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
Collapse
|
26
|
Youssef N, Susko E, Roger AJ, Bielawski JP. Evolution of amino acid propensities under stability-mediated epistasis. Mol Biol Evol 2022; 39:6522130. [PMID: 35134997 PMCID: PMC8896634 DOI: 10.1093/molbev/msac030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Site-specific amino acid preferences are influenced by the genetic background of the protein. The preferences for resident amino acids are expected to, on average, increase over time because of replacements at other sites - a nonadaptive phenomenon referred to as the 'evolutionary Stokes shift'. Alternatively, decreases in resident amino acid propensity have recently been viewed as evidence of adaptations to external environmental changes. Using population genetics theory and thermodynamic stability-constraints, we show that nonadaptive evolution can lead to both positive and negative shifts in propensities following the fixation of an amino acid, emphasizing that the detection of negative shifts is not conclusive evidence of adaptation. Considering shifts in propensities over windows between substitutions at a focal site, we find that following ≈ 50% of substitutions the propensity for the new resident amino acid decreases over time, and both positive and negative shifts were comparable in magnitude. Preferences were often conserved via a significant negative autocorrelation in propensity changes-increases in propensities often followed by decreases, and vice versa. Lastly, we explore the underlying mechanisms that lead propensities to fluctuate. We observe that stabilizing replacements increase the mutational tolerance at a site and in doing so decrease the propensity for the resident amino acid. In contrast, destabilizing substitutions result in more rugged fitness landscapes that tend to favor the resident amino acid. In summary, our results characterize propensity trajectories under nonadaptive stability-constrained evolution against which evidence of adaptations should be calibrated.
Collapse
Affiliation(s)
- Noor Youssef
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
| | - Andrew J Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
| | - Joseph P Bielawski
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| |
Collapse
|
27
|
Bisardi M, Rodriguez-Rivas J, Zamponi F, Weigt M. Modeling sequence-space exploration and emergence of epistatic signals in protein evolution. Mol Biol Evol 2021; 39:6424001. [PMID: 34751386 PMCID: PMC8789065 DOI: 10.1093/molbev/msab321] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins, to propose stochastic models of experimental protein evolution. These models predict quantitatively important features of experimentally evolved sequence libraries, like fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters like sequence divergence, selection strength, and library size. We showcase the potential of the approach in reanalyzing two recent experiments to determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, opening thereby a way to computationally optimize experimental protocols.
Collapse
Affiliation(s)
- M Bisardi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, F-75005, France.,Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
| | - J Rodriguez-Rivas
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
| | - F Zamponi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, F-75005, France
| | - M Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
| |
Collapse
|
28
|
Trinquier J, Uguzzoni G, Pagnani A, Zamponi F, Weigt M. Efficient generative modeling of protein sequences using simple autoregressive models. Nat Commun 2021; 12:5800. [PMID: 34608136 PMCID: PMC8490405 DOI: 10.1038/s41467-021-25756-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2021] [Accepted: 08/23/2021] [Indexed: 02/08/2023] Open
Abstract
Generative models emerge as promising candidates for novel sequence-data driven approaches to protein design, and for the extraction of structural and functional information about proteins deeply hidden in rapidly growing sequence databases. Here we propose simple autoregressive models as highly accurate but computationally efficient generative sequence models. We show that they perform similarly to existing approaches based on Boltzmann machines or deep generative models, but at a substantially lower computational cost (by a factor between 102 and 103). Furthermore, the simple structure of our models has distinctive mathematical advantages, which translate into an improved applicability in sequence generation and evaluation. Within these models, we can easily estimate both the probability of a given sequence, and, using the model's entropy, the size of the functional sequence space related to a specific protein family. In the example of response regulators, we find a huge number of ca. 1068 possible sequences, which nevertheless constitute only the astronomically small fraction 10-80 of all amino-acid sequences of the same length. These findings illustrate the potential and the difficulty in exploring sequence space via generative sequence models.
Collapse
Affiliation(s)
- Jeanne Trinquier
- grid.503253.20000 0004 0520 7190Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France ,grid.462608.e0000 0004 0384 7821Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
| | - Guido Uguzzoni
- grid.4800.c0000 0004 1937 0343Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy ,grid.428948.b0000 0004 1784 6598Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy
| | - Andrea Pagnani
- grid.4800.c0000 0004 1937 0343Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy ,grid.428948.b0000 0004 1784 6598Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy ,grid.470222.10000 0004 7471 9712INFN Sezione di Torino, Via P. Giuria 1, I-10125 Torino, Italy
| | - Francesco Zamponi
- grid.462608.e0000 0004 0384 7821Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
| | - Martin Weigt
- grid.503253.20000 0004 0520 7190Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France
| |
Collapse
|
29
|
Prabh N, Tautz D. Frequent lineage-specific substitution rate changes support an episodic model for protein evolution. G3-GENES GENOMES GENETICS 2021; 11:6372692. [PMID: 34542594 PMCID: PMC8664490 DOI: 10.1093/g3journal/jkab333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/06/2021] [Accepted: 09/13/2021] [Indexed: 12/04/2022]
Abstract
Since the inception of the molecular clock model for sequence evolution, the investigation of protein divergence has revolved around the question of a more or less constant change of amino acid sequences, with specific overall rates for each family. Although anomalies in clock-like divergence are well known, the assumption of a constant decay rate for a given protein family is usually taken as the null model for protein evolution. However, systematic tests of this null model at a genome-wide scale have lagged behind, despite the databases’ enormous growth. We focus here on divergence rate comparisons between very closely related lineages since this allows clear orthology assignments by synteny and reliable alignments, which are crucial for determining substitution rate changes. We generated a high-confidence dataset of syntenic orthologs from four ape species, including humans. We find that despite the appearance of an overall clock-like substitution pattern, several hundred protein families show lineage-specific acceleration and deceleration in divergence rates, or combinations of both in different lineages. Hence, our analysis uncovers a rather dynamic history of substitution rate changes, even between these closely related lineages, implying that one should expect that a large fraction of proteins will have had a history of episodic rate changes in deeper phylogenies. Furthermore, each of the lineages has a separate set of particularly fast diverging proteins. The genes with the highest percentage of branch-specific substitutions are ADCYAP1 in the human lineage (9.7%), CALU in chimpanzees (7.1%), SLC39A14 in the internal branch leading to humans and chimpanzees (4.1%), RNF128 in gorillas (9%), and S100Z in gibbons (15.2%). The mutational pattern in ADCYAP1 suggests a biased mutation process, possibly through asymmetric gene conversion effects. We conclude that a null model of constant change can be problematic for predicting the evolutionary trajectories of individual proteins.
Collapse
Affiliation(s)
- Neel Prabh
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
| | - Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, 24306 Plön, Germany
| |
Collapse
|
30
|
Epistasis produces an excess of invariant sites in neutral molecular evolution. Proc Natl Acad Sci U S A 2021; 118:2018767118. [PMID: 33875549 DOI: 10.1073/pnas.2018767118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open
|
31
|
Reply to Patel and Kumar: Epistasis not critical to rate distributions but key in heterotachy, overdispersion, and evolutionary Stokes shift. Proc Natl Acad Sci U S A 2021; 118:2100107118. [PMID: 33875550 DOI: 10.1073/pnas.2100107118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
|
32
|
Crippa M, Andreghetti D, Capelli R, Tiana G. Evolution of frustrated and stabilising contacts in reconstructed ancient proteins. EUROPEAN BIOPHYSICS JOURNAL 2021; 50:699-712. [PMID: 33569610 PMCID: PMC8260555 DOI: 10.1007/s00249-021-01500-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/17/2020] [Revised: 12/14/2020] [Accepted: 01/13/2021] [Indexed: 11/30/2022]
Abstract
Energetic properties of a protein are a major determinant of its evolutionary fitness. Using a reconstruction algorithm, dating the reconstructed proteins and calculating the interaction network between their amino acids through a coevolutionary approach, we studied how the interactions that stabilise 890 proteins, belonging to five families, evolved for billions of years. In particular, we focused our attention on the network of most strongly attractive contacts and on that of poorly optimised, frustrated contacts. Our results support the idea that the cluster of most attractive interactions extends its size along evolutionary time, but from the data, we cannot conclude that protein stability or that the degree of frustration tends always to decrease.
Collapse
Affiliation(s)
- Martina Crippa
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
| | - Damiano Andreghetti
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
| | - Riccardo Capelli
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
| | - Guido Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy.
| |
Collapse
|
33
|
Zeng HL, Dichio V, Rodríguez Horta E, Thorell K, Aurell E. Global analysis of more than 50,000 SARS-CoV-2 genomes reveals epistasis between eight viral genes. Proc Natl Acad Sci U S A 2020; 117:31519-31526. [PMID: 33203681 PMCID: PMC7733830 DOI: 10.1073/pnas.2012331117] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
Genome-wide epistasis analysis is a powerful tool to infer gene interactions, which can guide drug and vaccine development and lead to deeper understanding of microbial pathogenesis. We have considered all complete severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes deposited in the Global Initiative on Sharing All Influenza Data (GISAID) repository until four different cutoff dates, and used direct coupling analysis together with an assumption of quasi-linkage equilibrium to infer epistatic contributions to fitness from polymorphic loci. We find eight interactions, of which three are between pairs where one locus lies in gene ORF3a, both loci holding nonsynonymous mutations. We also find interactions between two loci in gene nsp13, both holding nonsynonymous mutations, and four interactions involving one locus holding a synonymous mutation. Altogether, we infer interactions between loci in viral genes ORF3a and nsp2, nsp12, and nsp6, between ORF8 and nsp4, and between loci in genes nsp2, nsp13, and nsp14. The paper opens the prospect to use prominent epistatically linked pairs as a starting point to search for combinatorial weaknesses of recombinant viral pathogens.
Collapse
Affiliation(s)
- Hong-Li Zeng
- New Energy Technology Engineering Laboratory of Jiangsu Province, School of Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Nordic Institute for Theoretical Physics, Royal Institute of Technology and Stockholm University, 10691 Stockholm, Sweden
| | - Vito Dichio
- Nordic Institute for Theoretical Physics, Royal Institute of Technology and Stockholm University, 10691 Stockholm, Sweden
- Department of Physics, University of Trieste, 34151 Trieste, Italy
- Department of Computational Science and Technology, AlbaNova University Center, 10691 Stockholm, Sweden
| | - Edwin Rodríguez Horta
- Group of Complex Systems and Statistical Physics, Department of Theoretical Physics, Physics Faculty, University of Havana, 10400 Havana, Cuba
| | - Kaisa Thorell
- Department of Infectious Diseases, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 40530 Gothenburg, Sweden
- Center for Translational Microbiome Research, Department of Microbiology, Cell and Tumor Biology, Karolinska Institutet, 17177 Stockholm, Sweden
| | - Erik Aurell
- Department of Computational Science and Technology, AlbaNova University Center, 10691 Stockholm, Sweden;
| |
Collapse
|
34
|
Youssef N, Susko E, Bielawski JP. Consequences of Stability-Induced Epistasis for Substitution Rates. Mol Biol Evol 2020; 37:3131-3148. [DOI: 10.1093/molbev/msaa151] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open
Abstract
AbstractDo interactions between residues in a protein (i.e., epistasis) significantly alter evolutionary dynamics? If so, what consequences might they have on inference from traditional codon substitution models which assume site-independence for the sake of computational tractability? To investigate the effects of epistasis on substitution rates, we employed a mechanistic mutation-selection model in conjunction with a fitness framework derived from protein stability. We refer to this as the stability-informed site-dependent (S-SD) model and developed a new stability-informed site-independent (S-SI) model that captures the average effect of stability constraints on individual sites of a protein. Comparison of S-SI and S-SD offers a novel and direct method for investigating the consequences of stability-induced epistasis on protein evolution. We developed S-SI and S-SD models for three natural proteins and showed that they generate sequences consistent with real alignments. Our analyses revealed that epistasis tends to increase substitution rates compared with the rates under site-independent evolution. We then assessed the epistatic sensitivity of individual site and discovered a counterintuitive effect: Highly connected sites were less influenced by epistasis relative to exposed sites. Lastly, we show that, despite the unrealistic assumptions, traditional models perform comparably well in the presence and absence of epistasis and provide reasonable summaries of average selection intensities. We conclude that epistatic models are critical to understanding protein evolutionary dynamics, but epistasis might not be required for reasonable inference of selection pressure when averaging over time and sites.
Collapse
Affiliation(s)
- Noor Youssef
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Edward Susko
- Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Joseph P Bielawski
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| |
Collapse
|