1
Arcos S, Han AX, te Velthuis AJW, Russell CA, Lauring AS. Mutual information networks reveal evolutionary relationships within the influenza A virus polymerase. Virus Evol 2023; 9:vead037. [PMID: 37325086 PMCID: PMC10263469 DOI: 10.1093/ve/vead037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 03/27/2023] [Accepted: 05/24/2023] [Indexed: 06/17/2023]
Abstract
The influenza A virus (IAV) RNA polymerase is an essential driver of IAV evolution. Mutations that the polymerase introduces into viral genome segments during replication are the ultimate source of genetic variation, including within the three subunits of the IAV polymerase (polymerase basic protein 2, polymerase basic protein 1, and polymerase acidic protein). Evolutionary analysis of the IAV polymerase is complicated, because changes in mutation rate, replication speed, and drug resistance involve epistatic interactions among its subunits. In order to study the evolution of the human seasonal H3N2 polymerase since the 1968 pandemic, we identified pairwise evolutionary relationships among ∼7000 H3N2 polymerase sequences using mutual information (MI), which measures the information gained about the identity of one residue when a second residue is known. To account for uneven sampling of viral sequences over time, we developed a weighted MI (wMI) metric and demonstrate that wMI outperforms raw MI through simulations using a well-sampled severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) dataset. We then constructed wMI networks of the H3N2 polymerase to extend the inherently pairwise wMI statistic to encompass relationships among larger groups of residues. We included hemagglutinin (HA) in the wMI network to distinguish between functional wMI relationships within the polymerase and those potentially due to hitch-hiking on antigenic changes in HA. The wMI networks reveal coevolutionary relationships among residues with roles in replication and encapsidation. Inclusion of HA highlighted polymerase-only subgraphs containing residues with roles in the enzymatic functions of the polymerase and host adaptability. This work provides insight into the factors that drive and constrain the rapid evolution of influenza viruses.
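The MI statistic described in this abstract can be illustrated with a short sketch. This is not the authors' implementation: the function name `mutual_information` and its `weights` argument are hypothetical, and the optional weighting shown here is only a generic per-sequence weighting (for example, down-weighting heavily sampled years), not the paper's wMI definition.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b, weights=None):
    """Mutual information (in bits) between two alignment columns.

    col_a, col_b: equal-length sequences of residues, one per aligned sequence.
    weights: optional per-sequence weights (hypothetical sampling correction);
             uniform weights reproduce the ordinary MI.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    if weights is None:
        weights = [1.0] * len(col_a)
    total = float(sum(weights))

    joint, marg_a, marg_b = Counter(), Counter(), Counter()
    for a, b, w in zip(col_a, col_b, weights):
        joint[(a, b)] += w
        marg_a[a] += w
        marg_b[b] += w

    mi = 0.0
    for (a, b), w in joint.items():
        if w > 0:
            p_ab = w / total
            p_a = marg_a[a] / total
            p_b = marg_b[b] / total
            mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Two perfectly coupled columns share 1 bit; two independent columns share 0.
coupled = mutual_information("AAGG", "TTCC")
independent = mutual_information("AAGG", "TCTC")
```

Pairs of columns with high MI would then become candidate edges in a residue network; thresholds and significance testing are left out of this sketch.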
2
Choudhuri I, Biswas A, Haldane A, Levy RM. Contingency and Entrenchment of Drug-Resistance Mutations in HIV Viral Proteins. J Phys Chem B 2022; 126:10622-10636. [PMID: 36493468 PMCID: PMC9841799 DOI: 10.1021/acs.jpcb.2c06123] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Abstract
The ability of HIV-1 to rapidly mutate leads to antiretroviral therapy (ART) failure among infected patients. Drug-resistance mutations (DRMs), which cause a fitness penalty to intrinsic viral fitness, are compensated by accessory mutations with favorable epistatic interactions which cause an evolutionary trapping effect, but the kinetics of this overall process has not been well characterized. Here, using a Potts Hamiltonian model describing epistasis combined with kinetic Monte Carlo simulations of evolutionary trajectories, we explore how epistasis modulates the evolutionary dynamics of HIV DRMs. We show how the occurrence of a drug-resistance mutation is contingent on favorable epistatic interactions with many other residues of the sequence background and that subsequent mutations entrench DRMs. We measure the time-autocorrelation of fluctuations in the likelihood of DRMs due to epistatic coupling with the sequence background, which reveals the presence of two evolutionary processes controlling DRM kinetics with two distinct time scales. Further analysis of waiting times for the evolutionary trapping effect to reverse reveals that the sequences which entrench (trap) a DRM are responsible for the slower time scale. We also quantify the overall strength of epistatic effects on the evolutionary kinetics for different mutations and show these are much larger for DRM positions than polymorphic positions, and we also show that trapping of a DRM is often caused by the collective effect of many accessory mutations, rather than a few strongly coupled ones, suggesting the importance of multiresidue sequence variations in HIV evolution. The analysis presented here provides a framework to explore the kinetic pathways through which viral proteins like HIV evolve under drug-selection pressure.
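For orientation, a pairwise Potts model of the kind this abstract refers to can be written as follows. This is one common convention, stated here as an assumption: sign and gauge choices differ between papers, and the symbols $h_i$, $J_{ij}$, $L$ and $Z$ are introduced only for illustration (with $J_{ij}(a,b) = J_{ji}(b,a)$).

```latex
% Probability of a sequence S = (S_1, ..., S_L) under a pairwise Potts model
P(S) = \frac{1}{Z} \exp\bigl(-E(S)\bigr), \qquad
E(S) = \sum_{i=1}^{L} h_i(S_i) + \sum_{1 \le i < j \le L} J_{ij}(S_i, S_j)

% Energy change when residue \alpha at position i is replaced by \beta
% in the sequence background S (all other positions unchanged)
\Delta E_i^{\alpha \to \beta}(S) = h_i(\beta) - h_i(\alpha)
  + \sum_{j \ne i} \bigl[ J_{ij}(\beta, S_j) - J_{ij}(\alpha, S_j) \bigr]
```

Under this convention, the coupling sum is what makes the favorability of a mutation depend on the rest of the sequence, which is the sense in which a background can favor or entrench a DRM.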
Affiliation(s)
- Allan Haldane
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States
  - Department of Physics, Temple University, Philadelphia, Pennsylvania 19122-6008, United States
- Ronald M. Levy
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States
3
Adami C, C G N. Emergence of functional information from multivariate correlations. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2022; 380:20210250. [PMID: 35599555 DOI: 10.1098/rsta.2021.0250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2023]
Abstract
The information content of symbolic sequences (such as nucleic or amino acid sequences, but also neuronal firings or strings of letters) can be calculated from an ensemble of such sequences, but because information cannot be assigned to single sequences, we cannot correlate information to other observables attached to the sequence. Here we show that an information score obtained from multivariate (multiple-variable) correlations within sequences of a 'training' ensemble can be used to predict observables of out-of-sample sequences with an accuracy that scales with the complexity of correlations, showing that functional information emerges from a hierarchy of multi-variable correlations. This article is part of the theme issue 'Emergent phenomena in complex physical and socio-technical systems: from cells to societies'.
Affiliation(s)
- Christoph Adami
  - Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA
  - BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, MI 48824, USA
  - Program in Ecology, Evolution, and Behavior, Michigan State University, East Lansing, MI 48824, USA
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA
- Nitash C G
  - BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, MI 48824, USA
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
4
Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins. Cell Syst 2022; 13:274-285.e6. [PMID: 35120643 DOI: 10.1016/j.cels.2022.01.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 08/05/2021] [Revised: 11/15/2021] [Accepted: 01/12/2022] [Indexed: 11/23/2022]
Abstract
The degree to which evolution is predictable is a fundamental question in biology. Previous attempts to predict the evolution of protein sequences have been limited to specific proteins and to small changes, such as single-residue mutations. Here, we demonstrate that by using a protein language model to predict the local evolution within protein families, we recover a dynamic "vector field" of protein evolution that we call evolutionary velocity (evo-velocity). Evo-velocity generalizes to evolution over vastly different timescales, from viral proteins evolving over years to eukaryotic proteins evolving over geologic eons, and can predict the evolutionary dynamics of proteins that were not used to develop the original model. Evo-velocity also yields new evolutionary insights by predicting strategies of viral-host immune escape, resolving conflicting theories on the evolution of serpins, and revealing a key role of horizontal gene transfer in the evolution of eukaryotic glycolysis.
5
Biswas A, Haldane A, Levy RM. Limits to detecting epistasis in the fitness landscape of HIV. PLoS One 2022; 17:e0262314. [PMID: 35041711 PMCID: PMC8765623 DOI: 10.1371/journal.pone.0262314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2021] [Accepted: 12/20/2021] [Indexed: 02/05/2023]
Abstract
The rapid evolution of HIV is constrained by interactions between mutations which affect viral fitness. In this work, we explore the role of epistasis in determining the mutational fitness landscape of HIV for multiple drug target proteins, including Protease, Reverse Transcriptase, and Integrase. Epistatic interactions between residues modulate the mutation patterns involved in drug resistance, with unambiguous signatures of epistasis best seen in the comparison of the Potts model predicted and experimental HIV sequence "prevalences" expressed as higher-order marginals (beyond triplets) of the sequence probability distribution. In contrast, experimental measures of fitness such as viral replicative capacities generally probe fitness effects of point mutations in a single background, providing weak evidence for epistasis in viral systems. The detectable effects of epistasis are obscured by higher evolutionary conservation at sites. While double mutant cycles, in principle, provide one of the best ways to probe epistatic interactions experimentally without reference to a particular background, we show that the analysis is complicated by the small dynamic range of measurements. Overall, we show that global pairwise interaction Potts models are necessary for predicting the mutational landscape of viral proteins.
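The double mutant cycle mentioned in this abstract can be made concrete with a small sketch. It assumes fitness is compared on a log scale, so that zero means the two mutations combine multiplicatively; the function name `pairwise_epistasis` and the example values are hypothetical and are not taken from the paper.

```python
from math import log


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Epistasis from a double mutant cycle on a log-fitness scale.

    f_wt: fitness of the wild type
    f_a, f_b: fitness of each single mutant
    f_ab: fitness of the double mutant
    Returns ln(f_ab) + ln(f_wt) - ln(f_a) - ln(f_b); zero means no epistasis
    under a multiplicative null model, positive means the double mutant is
    fitter than the single mutants predict.
    """
    return log(f_ab) + log(f_wt) - log(f_a) - log(f_b)


# Two mutations that each halve fitness and combine multiplicatively.
no_epistasis = pairwise_epistasis(1.0, 0.5, 0.5, 0.25)
# Same single mutants, but the double mutant is as fit as either single.
positive_epistasis = pairwise_epistasis(1.0, 0.5, 0.5, 0.5)
```

Because the estimate combines four measurements, their errors add up, which is one way to see why a small dynamic range in replicative-capacity assays makes the signal hard to resolve.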
Affiliation(s)
- Avik Biswas
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Allan Haldane
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Ronald M. Levy
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
  - Department of Chemistry, Temple University, Philadelphia, PA, United States of America
6
C G N, Adami C. Information-theoretic characterization of the complete genotype-phenotype map of a complex pre-biotic world: Comment on "From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics" by Susanna Manrubia et al. Phys Life Rev 2021; 38:111-114. [PMID: 34272193 DOI: 10.1016/j.plrev.2021.06.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/22/2021] [Accepted: 06/22/2021] [Indexed: 11/16/2022]
Abstract
How information is encoded in bio-molecular sequences is difficult to quantify since such an analysis usually requires sampling an exponentially large genetic space. Here we show how information theory reveals both robust and compressed encodings in the largest complete genotype-phenotype map (over 5 trillion sequences) obtained to date.
Affiliation(s)
- Nitash C G
  - Department of Computer Science and Engineering, Michigan State University, USA
  - BEACON Center for the Study of Evolution in Action, Michigan State University, USA
- Christoph Adami
  - BEACON Center for the Study of Evolution in Action, Michigan State University, USA
  - Department of Microbiology and Molecular Genetics, Michigan State University, USA
  - Department of Physics and Astronomy, Michigan State University, USA
  - Program in Ecology, Evolution, and Behavior, Michigan State University, USA
7
Ben-David M, Soskine M, Dubovetskyi A, Cherukuri KP, Dym O, Sussman JL, Liao Q, Szeler K, Kamerlin SCL, Tawfik DS. Enzyme Evolution: An Epistatic Ratchet versus a Smooth Reversible Transition. Mol Biol Evol 2019; 37:1133-1147. [DOI: 10.1093/molbev/msz298] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/15/2022]
Abstract
Evolutionary trajectories are deemed largely irreversible. In a newly diverged protein, reversion of mutations that led to the functional switch typically results in loss of both the new and the ancestral functions. Nonetheless, evolutionary transitions where reversions are viable have also been described. The structural and mechanistic causes of reversion compatibility versus incompatibility therefore remain unclear. We examined two laboratory evolution trajectories of mammalian paraoxonase-1, a lactonase with promiscuous organophosphate hydrolase (OPH) activity. Both trajectories began with the same active-site mutant, His115Trp, which lost the native lactonase activity and acquired higher OPH activity. A neo-functionalization trajectory amplified the promiscuous OPH activity, whereas the re-functionalization trajectory restored the native activity, thus generating a new lactonase that lacks His115. The His115 revertants of these trajectories indicated opposite trends. Revertants of the neo-functionalization trajectory lost both the evolved OPH and the original lactonase activity. Revertants of the trajectory that restored the original lactonase function were, however, fully active. Crystal structures and molecular simulations show that in the newly diverged OPH, the reverted His115 and other catalytic residues are displaced, thus causing loss of both the original and the new activity. In contrast, in the re-functionalization trajectory, reversion compatibility of the original lactonase activity derives from mechanistic versatility whereby multiple residues can fulfill the same task. This versatility enables unique sequence-reversible compositions that are inaccessible when the active site was repurposed toward a new function.
Affiliation(s)
- Moshe Ben-David
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Misha Soskine
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Artem Dubovetskyi
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Orly Dym
  - Department of Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot, Israel
- Joel L Sussman
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Qinghua Liao
  - Department of Chemistry – BMC, Uppsala University, Uppsala, Sweden
- Klaudia Szeler
  - Department of Chemistry – BMC, Uppsala University, Uppsala, Sweden
- Dan S Tawfik
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
8
Biswas A, Haldane A, Arnold E, Levy RM. Epistasis and entrenchment of drug resistance in HIV-1 subtype B. eLife 2019; 8:e50524. [PMID: 31591964 PMCID: PMC6783267 DOI: 10.7554/elife.50524] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/25/2019] [Accepted: 09/09/2019] [Indexed: 12/17/2022]
Abstract
The development of drug resistance in HIV is the result of primary mutations whose effects on viral fitness depend on the entire genetic background, a phenomenon called 'epistasis'. Based on protein sequences derived from drug-experienced patients in the Stanford HIV database, we use a co-evolutionary (Potts) Hamiltonian model to provide direct confirmation of epistasis involving many simultaneous mutations. Building on earlier work, we show that primary mutations leading to drug resistance can become highly favored (or entrenched) by the complex mutation patterns arising in response to drug therapy despite being disfavored in the wild-type background, and provide the first confirmation of entrenchment for all three drug-target proteins: protease, reverse transcriptase, and integrase; a comparative analysis reveals that NNRTI-induced mutations behave differently from the others. We further show that the likelihood of resistance mutations can vary widely in patient populations, and from the population average compared to specific molecular clones.
Affiliation(s)
- Avik Biswas
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
  - Department of Physics, Temple University, Philadelphia, United States
- Allan Haldane
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
  - Department of Physics, Temple University, Philadelphia, United States
- Eddy Arnold
  - Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, United States
  - Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, United States
- Ronald M Levy
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
  - Department of Physics, Temple University, Philadelphia, United States
  - Department of Chemistry, Temple University, Philadelphia, United States
9
Lima ENDC, Piqueira JRC, Camargo M, Galinskas J, Sucupira MC, Diaz RS. Impact of antiretroviral resistance and virological failure on HIV-1 informational entropy. J Antimicrob Chemother 2019; 73:1054-1059. [PMID: 29373694 DOI: 10.1093/jac/dkx508] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/03/2017] [Accepted: 12/07/2017] [Indexed: 11/13/2022]
Abstract
Objectives: The present study investigated the relationship between genomic variability and resistance of HIV-1 sequences in protease (PR) and reverse transcriptase (RT) regions of the pol gene. In addition, we analysed the resistance among 651 individuals presenting antiretroviral virological failure, from 2009 to 2011, in the state of São Paulo, Brazil.
Methods: The genomic variability was quantified by using informational entropy methods and the relationship between resistance and replicative fitness, as inferred by the residual viral load and CD4+ T cell count.
Results: The number of antiretroviral schemes is related to the number of resistance mutations in the HIV-1 PR (α = 0.2511, P = 0.0003, R² = 0.8672) and the RT (α = 0.7892, P = 0.0001, R² = 0.9141). Increased informational entropy rate is related to lower levels of HIV-1 viral loads (α = -0.0121, P = 0.0471, R² = 0.7923), lower levels of CD4+ T cell counts (α = -0.0120, P = 0.0335, R² = 0.8221) and a higher number of antiretroviral resistance-related mutations.
Conclusions: Less organized HIV genomes as inferred by higher levels of informational entropy relate to less competent host immune systems, lower levels of HIV replication and HIV genetic evolution as a consequence of antiretroviral resistance.
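As a rough illustration of what an entropy-based variability measure looks like, the sketch below computes Shannon entropy per alignment column. The abstract does not specify the exact estimator the authors used, so this is an assumed, generic version; the function names `column_entropy` and `alignment_entropy` and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one position."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def alignment_entropy(sequences):
    """Per-position entropy for equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Toy alignment: position 0 is conserved, position 1 is split evenly.
profile = alignment_entropy(["AC", "AG", "AC", "AG"])
```

A conserved position scores 0 bits, and a position with four equally frequent residues scores 2 bits; higher values indicate more variable sites.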
Affiliation(s)
- Elidamar Nunes de Carvalho Lima
  - Division of Infectious Diseases, Paulista School of Medicine, Federal University of São Paulo-UNIFESP, São Paulo, SP, Brazil
  - Telecommunication and Control Engineering Department, Engineering School, University of São Paulo, São Paulo, SP, Brazil
- José Roberto Castilho Piqueira
  - Telecommunication and Control Engineering Department, Engineering School, University of São Paulo, São Paulo, SP, Brazil
- Michelle Camargo
  - Division of Infectious Diseases, Paulista School of Medicine, Federal University of São Paulo-UNIFESP, São Paulo, SP, Brazil
- Juliana Galinskas
  - Division of Infectious Diseases, Paulista School of Medicine, Federal University of São Paulo-UNIFESP, São Paulo, SP, Brazil
- Maria Cecilia Sucupira
  - Division of Infectious Diseases, Paulista School of Medicine, Federal University of São Paulo-UNIFESP, São Paulo, SP, Brazil
- Ricardo Sobhie Diaz
  - Division of Infectious Diseases, Paulista School of Medicine, Federal University of São Paulo-UNIFESP, São Paulo, SP, Brazil
10
Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons. PLoS Comput Biol 2018; 14:e1006498. [PMID: 30543621 PMCID: PMC6314628 DOI: 10.1371/journal.pcbi.1006498] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/04/2018] [Revised: 01/02/2019] [Accepted: 09/10/2018] [Indexed: 01/07/2023]
Abstract
Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. 
The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.
Viral populations constantly evolve and diversify. In this article we introduce a method, FLEA, for reconstructing and visualizing the details of evolutionary changes. FLEA specifically processes data from sequencing platforms that generate reads that are long, but error-prone. To study the evolutionary dynamics of entire genes during viral infection, data is collected via long-read sequencing at discrete time points, allowing us to understand how the virus changes over time. However, the experimental and sequencing process is imperfect, so the resulting data contain not only real evolutionary changes, but also mutations and other genetic artifacts caused by sequencing errors. Our method corrects most of these errors by combining thousands of erroneous sequences into a much smaller number of unique consensus sequences that represent biologically meaningful variation. The resulting high-quality sequences are used for further analysis, such as building an evolutionary tree that tracks and interprets the genetic changes in the viral population over time. FLEA is open source, and is freely available online.
11
Hartman EC, Lobba MJ, Favor AH, Robinson SA, Francis MB, Tullman-Ercek D. Experimental Evaluation of Coevolution in a Self-Assembling Particle. Biochemistry 2018; 58:1527-1538. [DOI: 10.1021/acs.biochem.8b00948] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]
Affiliation(s)
- Emily C. Hartman
  - Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
- Marco J. Lobba
  - Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
- Andrew H. Favor
  - Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
- Stephanie A. Robinson
  - Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
- Matthew B. Francis
  - Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
  - Materials Sciences Division, Lawrence Berkeley National Laboratories, Berkeley, California 94720-1460, United States
- Danielle Tullman-Ercek
  - Department of Chemical and Biological Engineering, Northwestern University, 2145 Sheridan Road, Technological Institute E136, Evanston, Illinois 60208-3120, United States
12
Nelson ED, Grishin NV. Inference of epistatic effects in a key mitochondrial protein. Phys Rev E 2018; 97:062404. [PMID: 30011480 DOI: 10.1103/physreve.97.062404] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/02/2017] [Indexed: 12/17/2022]
Abstract
We use Potts model inference to predict pair epistatic effects in a key mitochondrial protein, cytochrome c oxidase subunit 2, for ray-finned fishes. We examine the effect of phylogenetic correlations on our predictions using a simple exact fitness model, and we find that, although epistatic effects are underpredicted, they maintain a roughly linear relationship to their true (model) values. After accounting for this correction, epistatic effects in the protein are still relatively weak, leading to fitness valleys of depth 2Ns ≃ −5 in compensatory double mutants. Interestingly, positive epistasis is more pronounced than negative epistasis, and the strongest positive effects capture nearly all sites subject to positive selection in fishes, similar to virus proteins evolving under selection pressure in the context of drug therapy.
Affiliation(s)
- Erik D Nelson
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
- Nick V Grishin
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
13
Obolski U, Ram Y, Hadany L. Key issues review: evolution on rugged adaptive landscapes. Reports on Progress in Physics. Physical Society (Great Britain) 2018; 81:012602. [PMID: 29051394 DOI: 10.1088/1361-6633/aa94d4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 06/07/2023]
Abstract
Adaptive landscapes represent a mapping between genotype and fitness. Rugged adaptive landscapes contain two or more adaptive peaks: allele combinations with higher fitness than any of their neighbors in the genetic space. How do populations evolve on such rugged landscapes? Evolutionary biologists have struggled with this question since it was first introduced in the 1930s by Sewall Wright. Discoveries in the fields of genetics and biochemistry inspired various mathematical models of adaptive landscapes. The development of landscape models led to numerous theoretical studies analyzing evolution on rugged landscapes under different biological conditions. The large body of theoretical work suggests that adaptive landscapes are major determinants of the progress and outcome of evolutionary processes. Recent technological advances in molecular biology and microbiology allow experimenters to measure adaptive values of large sets of allele combinations and construct empirical adaptive landscapes for the first time. Such empirical landscapes have already been generated in bacteria, yeast, viruses, and fungi, and are contributing to new insights about evolution on adaptive landscapes. In this Key Issues Review we will: (i) introduce the concept of adaptive landscapes; (ii) review the major theoretical studies of evolution on rugged landscapes; (iii) review some of the recently obtained empirical adaptive landscapes; (iv) discuss recent mathematical and statistical analyses motivated by empirical adaptive landscapes, as well as provide the reader with instructions and source code to implement simulations of evolution on adaptive landscapes; and (v) discuss possible future directions for this exciting field.
14
Crona K, Gavryushkin A, Greene D, Beerenwinkel N. Inferring genetic interactions from comparative fitness data. eLife 2017; 6. [PMID: 29260711 PMCID: PMC5737811 DOI: 10.7554/elife.28629] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 05/15/2017] [Accepted: 11/21/2017] [Indexed: 01/13/2023]
Abstract
Darwinian fitness is a central concept in evolutionary biology. In practice, however, it is hardly possible to measure fitness for all genotypes in a natural population. Here, we present quantitative tools to make inferences about epistatic gene interactions when the fitness landscape is only incompletely determined due to imprecise measurements or missing observations. We demonstrate that genetic interactions can often be inferred from fitness rank orders, where all genotypes are ordered according to fitness, and even from partial fitness orders. We provide a complete characterization of rank orders that imply higher order epistasis. Our theory applies to all common types of gene interactions and facilitates comprehensive investigations of diverse genetic interactions. We analyzed various genetic systems comprising HIV-1, the malaria-causing parasite Plasmodium vivax, the fungus Aspergillus niger, and the TEM-family of β-lactamase associated with antibiotic resistance. For all systems, our approach revealed higher order interactions among mutations.
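The idea that a fitness rank order alone can imply epistasis is easiest to see in the two-locus case. The sketch below is an illustration under stated assumptions, not the authors' software: it treats only pairwise epistasis u = w00 + w11 - w01 - w10 on an additive scale, and the function name `epistasis_sign_from_ranking` is hypothetical. The sign of u is forced by the ranking exactly when each genotype in one pair {00, 11} can be matched to a distinct lower-ranked genotype in the other pair {01, 10}, or the reverse.

```python
def epistasis_sign_from_ranking(ranking):
    """Sign of two-locus epistasis implied by a fitness rank order.

    ranking: the genotypes "00", "01", "10", "11" listed from fittest to
             least fit, with no ties.
    Returns +1 if the order implies w00 + w11 > w01 + w10,
            -1 if it implies w00 + w11 < w01 + w10,
             0 if the order is compatible with either sign.
    """
    if sorted(ranking) != ["00", "01", "10", "11"]:
        raise ValueError("ranking must contain each two-locus genotype once")
    position = {genotype: i for i, genotype in enumerate(ranking)}
    even = sorted(position[g] for g in ("00", "11"))  # smaller index = fitter
    odd = sorted(position[g] for g in ("01", "10"))
    if even[0] < odd[0] and even[1] < odd[1]:
        return 1
    if odd[0] < even[0] and odd[1] < even[1]:
        return -1
    return 0


# w11 > w01 and w00 > w10, so w00 + w11 must exceed w01 + w10.
implied_positive = epistasis_sign_from_ranking(["11", "01", "00", "10"])
# Double mutant fittest and wild type least fit: either sign is possible.
undetermined = epistasis_sign_from_ranking(["11", "01", "10", "00"])
```

The higher-order and partial-order cases described in the abstract need more machinery than this two-locus check.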
Affiliation(s)
- Kristina Crona
  - Department of Mathematics and Statistics, American University, Washington, DC, United States
- Alex Gavryushkin
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Devin Greene
  - Department of Mathematics and Statistics, American University, Washington, DC, United States
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
Collapse
|
15
|
Dasmeh P, Girard É, Serohijos AWR. Highly expressed genes evolve under strong epistasis from a proteome-wide scan in E. coli. Sci Rep 2017; 7:15844. [PMID: 29158562 PMCID: PMC5696520 DOI: 10.1038/s41598-017-16030-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/10/2017] [Accepted: 11/06/2017] [Indexed: 11/11/2022] Open
Abstract
Epistasis, or the non-additivity of mutational effects, is a major force in protein evolution, but it has not been systematically quantified at the level of a proteome. Here, we estimated the extent of epistasis for 2,382 genes in E. coli using several hundred orthologs for each gene within the class Gammaproteobacteria. We found that the average epistasis is ~41% across genes in the proteome and that epistasis is stronger among highly expressed genes. This trend is quantitatively explained by the prevailing model of sequence evolution based on minimizing the fitness cost of protein unfolding and aggregation. The genes with the highest epistasis are also functionally involved in the maintenance of proteostasis, translation and central metabolism. In contrast, genes evolving with low epistasis mainly encode membrane proteins and are involved in transport activity. Our results highlight the coupling between selection and epistasis in the long-term evolution of a proteome.
Affiliation(s)
- Pouria Dasmeh
  - Departement de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
  - Centre Robert Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
- Éric Girard
  - Departement de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
  - Centre Robert Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
- Adrian W R Serohijos
  - Departement de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
  - Centre Robert Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
|
16
|
Flynn WF, Haldane A, Torbett BE, Levy RM. Inference of Epistatic Effects Leading to Entrenchment and Drug Resistance in HIV-1 Protease. Mol Biol Evol 2017; 34:1291-1306. [PMID: 28369521 PMCID: PMC5435099 DOI: 10.1093/molbev/msx095] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 12/27/2022] Open
Abstract
Understanding the complex mutation patterns that give rise to drug resistant viral strains provides a foundation for developing more effective treatment strategies for HIV/AIDS. Multiple sequence alignments of drug-experienced HIV-1 protease sequences contain networks of many pair correlations which can be used to build a (Potts) Hamiltonian model of these mutation patterns. Using this Hamiltonian model, we translate HIV-1 protease sequence covariation data into quantitative predictions for the probability of observing specific mutation patterns which are in agreement with the observed sequence statistics. We find that the statistical energies of the Potts model are correlated with the fitness of individual proteins containing therapy-associated mutations as estimated by in vitro measurements of protein stability and viral infectivity. We show that the penalty for acquiring primary resistance mutations depends on the epistatic interactions with the sequence background. Primary mutations which lead to drug resistance can become highly advantageous (or entrenched) by the complex mutation patterns which arise in response to drug therapy despite being destabilizing in the wildtype background. Anticipating epistatic effects is important for the design of future protease inhibitor therapies.
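The background dependence described in this abstract can be illustrated with a toy Potts-style statistical energy. The sketch below is not the authors' model: the sign convention (lower energy means more probable, E = -sum of fields - sum of couplings), the function names `potts_energy` and `mutation_cost`, and all parameter values are assumptions made up for illustration. Published Potts models differ in sign conventions and are inferred from large alignments.

```python
from typing import Dict, List, Tuple

Fields = List[Dict[str, float]]
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def potts_energy(seq: str, fields: Fields, couplings: Couplings) -> float:
    """Statistical energy under the assumed convention E = -sum(h) - sum(J)."""
    energy = 0.0
    # Single-site terms: one field value per (position, residue); missing means 0.
    for i, residue in enumerate(seq):
        energy -= fields[i].get(residue, 0.0)
    # Pairwise terms: one coupling value per (position pair, residue pair).
    for (i, j), table in couplings.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy


def mutation_cost(background: str, site: int, new_residue: str,
                  fields: Fields, couplings: Couplings) -> float:
    """Energy change caused by one substitution in a given sequence background."""
    mutant = background[:site] + new_residue + background[site + 1:]
    return (potts_energy(mutant, fields, couplings)
            - potts_energy(background, fields, couplings))


# Made-up toy parameters for a three-residue "protein".
TOY_FIELDS: Fields = [{"A": 0.5}, {"L": 0.2}, {"V": 0.1}]
TOY_COUPLINGS: Couplings = {(0, 1): {("A", "L"): 1.0}}
```

With these toy numbers, `mutation_cost("ALV", 0, "G", TOY_FIELDS, TOY_COUPLINGS)` is about 1.5, while the same substitution in the background `"AMV"` costs about 0.5. The penalty for the same mutation therefore depends on the residue at the coupled site, which is the sense in which such models capture epistasis.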
Affiliation(s)
- William F. Flynn
  - Department of Physics and Astronomy, Rutgers University, New Brunswick, NJ
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
- Allan Haldane
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
  - Department of Chemistry, Temple University, Philadelphia, PA
- Bruce E. Torbett
  - Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA
- Ronald M. Levy
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
  - Department of Chemistry, Temple University, Philadelphia, PA
|
17
|
Gupta A, Adami C. Shared Information between Residues Is Sufficient to Detect Pairwise Epistasis in a Protein. PLoS Genet 2016; 12:e1006471. [PMID: 28005913 PMCID: PMC5179016 DOI: 10.1371/journal.pgen.1006471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2016] [Accepted: 11/08/2016] [Indexed: 11/19/2022] Open
Affiliation(s)
- Aditi Gupta
  - Center for Infectious Diseases, New Jersey Medical School, Rutgers University, Newark, New Jersey, United States of America
- Christoph Adami
  - Department of Microbiology & Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
  - BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, Michigan, United States of America
|
18
|
Abstract
Epistasis is a key concept in the theory of adaptation. Indicators of epistasis are of interest for large systems where systematic fitness measurements may not be possible. Some recent approaches for identifying epistasis from sequence data depend on information theory. We show that considering shared entropy for pairs of loci can be misleading. The reason is that shared entropy does not imply epistasis for the pair. This observation holds true also in the absence of higher order epistasis. We discuss a method for reducing the number of false positives in the proposed method. However, our main conclusion is that entropy-based approaches have serious limitations in this context. Shared entropy for pairs of loci is difficult to interpret. Gene frequencies reflect interactions in the entire system, and there is no natural way to decompose frequency data.
Affiliation(s)
- Kristina Crona
  - American University, Washington, D.C., United States of America
|
19
|
Levy RM, Haldane A, Flynn WF. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr Opin Struct Biol 2016; 43:55-62. [PMID: 27870991 DOI: 10.1016/j.sbi.2016.11.004] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 08/19/2016] [Accepted: 11/03/2016] [Indexed: 11/17/2022]
Abstract
Potts Hamiltonian models of protein sequence co-variation are statistical models constructed from the pair correlations observed in a multiple sequence alignment (MSA) of a protein family. These models are powerful because they capture higher order correlations induced by mutations evolving under constraints and help quantify the connections between protein sequence, structure, and function maintained through evolution. We review recent work with Potts models to predict protein structure and sequence-dependent conformational free energy landscapes, to survey protein fitness landscapes and to explore the effects of epistasis on fitness. We also comment on the numerical methods used to infer these models for each application.
Affiliation(s)
- Ronald M Levy
  - Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States
- Allan Haldane
  - Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States
- William F Flynn
  - Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States; Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
|
20
|
Adami C. What is information? Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2016; 374:rsta.2015.0230. [PMID: 26857663 PMCID: PMC4760127 DOI: 10.1098/rsta.2015.0230] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 05/07/2023]
Abstract
Information is a precise concept that can be defined mathematically, but its relationship to what we call 'knowledge' is not always made clear. Furthermore, the concepts 'entropy' and 'information', while deeply related, are distinct and must be used with care, something that is not always achieved in the literature. In this elementary introduction, the concepts of entropy and information are laid out one by one, explained intuitively, but defined rigorously. I argue that a proper understanding of information in terms of prediction is key to a number of disciplines beyond engineering, such as physics and biology.
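As a concrete companion to the distinction between entropy and information, the following minimal sketch computes Shannon entropy and mutual information for two alignment columns. It uses simple plug-in frequency estimates, which is an assumption of this example rather than anything prescribed by the article; the function names `entropy` and `mutual_information` are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy in bits, estimated from observed symbol frequencies."""
    n = len(symbols)
    if n == 0:
        raise ValueError("entropy of an empty sample is undefined")
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y): uncertainty about X removed by knowing Y."""
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same number of sequences")
    joint = list(zip(xs, ys))  # paired observations, one per sequence
    return entropy(xs) + entropy(ys) - entropy(joint)
```

For two columns that always change together, such as `"AAGG"` and `"TTCC"`, knowing one residue fully predicts the other and the mutual information equals the entropy of either column (1 bit). For independent columns such as `"AAGG"` and `"TCTC"`, it is 0. Plug-in estimates of this kind are biased upward for small samples, so real analyses usually apply corrections or significance tests.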
Affiliation(s)
- Christoph Adami
  - Department of Microbiology and Molecular Genetics, Department of Physics and Astronomy, BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, MI 48824, USA
|
21
|
Adami C. Information-Theoretic Considerations Concerning the Origin of Life. Orig Life Evol Biosph 2015; 45:309-17. [PMID: 26062909 DOI: 10.1007/s11084-015-9439-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 08/14/2014] [Accepted: 12/17/2014] [Indexed: 10/23/2022]
Abstract
Research investigating the origins of life usually focuses either on exploring possible life-bearing chemistries in the pre-biotic Earth or on synthetic approaches. Comparatively little work has explored fundamental issues concerning the spontaneous emergence of life using only concepts (such as information and evolution) that are divorced from any particular chemistry. Here, I advocate studying the probability of spontaneous molecular self-replication as a function of the information contained in the replicator, and the environmental conditions that might enable this emergence. I show (under certain simplifying assumptions) that the probability of discovering a self-replicator by chance depends exponentially on the relative rate of formation of the monomers. If the rate at which monomers are formed is somewhat similar to the rate at which they would occur in a self-replicating polymer, the likelihood of discovering such a replicator by chance is increased by many orders of magnitude. I document such an increase in searches for a self-replicator within the digital life system Avida.
Affiliation(s)
- Christoph Adami
  - Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA
|